Monday, May 11, 2015

Simple CloudFlare bypass

I accidentally discovered a simple way to bypass CloudFlare's anti-DDoS protection, which will be handy for future website scraping.
As an example, let's take http://skidpaste.org.


If you try to fetch the main page with the Python requests module:
>>> import requests
>>> r = requests.get('http://skidpaste.org')
>>> r.status_code
503
>>>

or with the mechanize module:
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.set_handle_robots(False)
>>> br.open('http://skidpaste.org')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 403: Forbidden
>>>
The response codes differ, but the main point is clear: you don't get the website content.
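
Both failures can be recognized in code before falling back to a real browser. The following is only a heuristic sketch, not documented CloudFlare behaviour: it assumes a blocked request returns one of the two status codes seen above and names CloudFlare in its Server header. The function name looks_like_cloudflare_block and the header check are assumptions made for illustration.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True if a response looks like a CloudFlare challenge/block.

    Assumption: blocked requests return 403 or 503 (the codes seen above)
    and identify CloudFlare in the Server header.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = dict((k.lower(), v) for k, v in headers.items())
    server = lowered.get('server', '')
    return status_code in (403, 503) and server.lower().startswith('cloudflare')
```

With requests it would be called as looks_like_cloudflare_block(r.status_code, r.headers).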

If we open the site in Firefox and wait for the actual website to load, we receive CloudFlare cookies, which are checked every time we access the site.



So the idea is to get these cookies and pass them to my lovely requests module :)

The pseudocode looks like this:
1. Open the website with Selenium.
2. Wait for 10 seconds.
3. Get the CloudFlare cookies.
4. Close the Selenium browser.
5. Pass the cookies to requests.
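
The fixed wait in step 2 could also be replaced by polling. This is a sketch under one assumption: that the challenge has finished once a cookie named cf_clearance is set (that name appears in the output further down). The helper wait_for_cookie and its parameters are hypothetical. It takes any callable that returns cookie dicts, so it does not depend on Selenium itself.

```python
import time

def wait_for_cookie(get_cookies, name='cf_clearance', timeout=30.0, poll=0.5):
    """Poll get_cookies() until a cookie called `name` appears.

    get_cookies is any zero-argument callable returning a list of dicts
    with a 'name' key, e.g. a Selenium driver's get_cookies method.
    Returns the full cookie list, or None if the timeout expires.
    """
    deadline = time.time() + timeout
    while True:
        cookies = get_cookies()
        if any(c.get('name') == name for c in cookies):
            return cookies
        if time.time() >= deadline:
            return None
        time.sleep(poll)
```

With a Selenium driver named browser, wait_for_cookie(browser.get_cookies) would stand in for the sleep and the cookie fetch.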

Python example:

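#!/usr/bin/python
# Python 2 script: get the CloudFlare cookies with Firefox, then reuse them in requests.

from selenium import webdriver
from time import sleep
import cookielib
import requests

print 'Launching Firefox..'
browser = webdriver.Firefox()
print 'Entering to skidpaste.org...'
browser.get('http://skidpaste.org')
# Give the CloudFlare check page time to finish and redirect.
print 'Waiting 10 seconds...'
sleep(10)
a = browser.get_cookies()
print 'Got cloudflare cookies:\n'
print a
print 'Closing Firefox..'
browser.quit()  # quit() also ends the driver session, not just the window

# Send a Firefox User-Agent; CloudFlare may reject the cookies if it differs
# from the browser that obtained them.
h = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0'}

# Copy each Selenium cookie dict into a cookielib jar that requests accepts.
b = cookielib.CookieJar()

for i in a:
  # .get() because a session cookie may come back without an 'expiry' key
  ck = cookielib.Cookie(name=i['name'], value=i['value'], domain=i['domain'], path=i['path'], secure=i['secure'], rest={}, version=0,port=None,port_specified=False,domain_specified=False,domain_initial_dot=False,path_specified=True,expires=i.get('expiry'),discard=True,comment=None,comment_url=None,rfc2109=False)
  b.set_cookie(ck)

r = requests.get('http://skidpaste.org', cookies=b, headers=h)
print len(r.content)
print r.status_code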
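
If per-cookie domain and path do not matter, as in a single-site scrape, a shorter route is possible, because requests also accepts a plain name-to-value dict for its cookies argument. A minimal sketch; the helper name cookies_to_dict is chosen here for illustration.

```python
def cookies_to_dict(selenium_cookies):
    """Flatten Selenium's list of cookie dicts into {name: value}.

    Domain, path and expiry are dropped, so this only suits requests
    sent to the same site the cookies came from.
    """
    return dict((c['name'], c['value']) for c in selenium_cookies)
```

The final request would then read requests.get('http://skidpaste.org', cookies=cookies_to_dict(a), headers=h).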

The output:
# ./cloudflare_bypass.py 
Launching Firefox..
Entering to skidpaste.org...
Waiting 10 seconds...
Got cloudflare cookies:

[{u'domain': u'.skidpaste.org', u'name': u'__cfduid', u'value': u'd8af70c3b49361a5a1b818e91171e598d1431355518', u'expiry': 1462891518, u'path': u'/', u'secure': False}, {u'domain': u'.skidpaste.org', u'name': u'cf_clearance', u'value': u'5857af9797c612cde4ac590fe900e0e9f3d7098f-1431355526-57600', u'expiry': 1431416726, u'path': u'/', u'secure': False}, {u'domain': u'skidpaste.org', u'name': u'PHPSESSID', u'value': u'eefc5d29f6cea1ddb70ca5a0baaf60e1', u'expiry': None, u'path': u'/', u'secure': False}]
Closing Firefox..
115026
200
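
The expiry field in the output is a Unix timestamp, so the cookies could be saved and Firefox relaunched only after cf_clearance has run out. A sketch under that assumption; the helper clearance_is_valid is hypothetical, and treating a missing expiry as "still valid" is a choice made here, not something CloudFlare specifies.

```python
import time

def clearance_is_valid(cookies, now=None, name='cf_clearance'):
    """True if `cookies` holds an unexpired cookie called `name`.

    cookies is the list of dicts returned by Selenium's get_cookies();
    a missing or None 'expiry' is treated as a session cookie (valid).
    """
    if now is None:
        now = time.time()
    for c in cookies:
        if c.get('name') == name:
            expiry = c.get('expiry')
            return expiry is None or expiry > now
    return False
```

For the run above, the check stays true until 1431416726, the expiry listed for cf_clearance.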

2 comments:

  1. Is it okay to use https://github.com/Anorov/cloudflare-scrape?

  2. After the sleep(10), the redirect is done and you can work directly on the updated page.
