API usage

Base URL

[GET|POST] https://webscraperapi.datashake.com/

Request parameters

url *

string A URL to scrape.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/"
}
response = requests.get(
    url=url,
    params=params
)
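
The curl example passes the target URL percent-encoded, while the requests example relies on the library to encode it. The sketch below shows the same encoding done by hand with the standard library, which may help when a client does not encode query parameters for you. The helper name build_request_url is made up for this illustration and is not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://webscraperapi.datashake.com/"


def build_request_url(target_url: str, api_key: str) -> str:
    """Return the full GET URL with both query parameters percent-encoded.

    Hypothetical helper, not part of the API itself.
    """
    query = urlencode({"url": target_url, "api_key": api_key})
    return f"{BASE_URL}?{query}"


# Produces the same URL that the curl example passes to --url.
print(build_request_url("https://www.example.com/", "qwerty"))
```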

country

string A geographic proxy location. See the list of currently supported proxy geolocations.

If no geolocation is specified, one is picked at random.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&country=us&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "country": "us"
}
response = requests.get(
    url=url,
    params=params
)

render

boolean Enables JavaScript rendering.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True
}
response = requests.get(
    url=url,
    params=params
)

wait_for

integer|string Sleep for a fixed time or wait for a CSS selector.

The value can be:

a time (in milliseconds) to wait before returning the page content. This is useful when you want the page to load all necessary elements before the content is returned.
a valid CSS selector that must appear on the page before the page content is returned.

This parameter can only be used with rendering enabled.
An integer value cannot exceed 15 seconds (15000 ms).
If the selector is not found on the page, you'll receive an unsuccessful response. To get the page body even when your selector is not found, include a selector that is always present in the HTML, such as body (e.g. #some-selector,body).

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&wait_for=10000&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True,
    "wait_for": 10000
}
response = requests.get(
    url=url,
    params=params
)
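
The example above covers only the integer form of wait_for. The sketch below covers both forms and applies the documented rules on the client side: rendering is switched on, integers are capped at 15000 ms, and a selector can optionally be combined with an always-present fallback such as body. The helper wait_for_params and its fallback argument are assumptions made for this illustration, not part of the API.

```python
from typing import Optional, Union

MAX_WAIT_MS = 15000  # documented upper bound for integer values


def wait_for_params(value: Union[int, str], fallback: Optional[str] = None) -> dict:
    """Build the render/wait_for parameters for one request.

    Hypothetical helper: integers are treated as milliseconds, strings as
    CSS selectors. When `fallback` is given it is appended to the selector,
    producing a selector list such as "#some-selector,body".
    """
    if isinstance(value, bool):
        raise TypeError("wait_for must be an integer or a CSS selector string")
    if isinstance(value, int):
        if not 0 <= value <= MAX_WAIT_MS:
            raise ValueError(f"wait_for must be between 0 and {MAX_WAIT_MS} ms")
        return {"render": True, "wait_for": value}
    selector = value.strip()
    if not selector:
        raise ValueError("CSS selector must not be empty")
    if fallback:
        selector = f"{selector},{fallback}"
    return {"render": True, "wait_for": selector}


print(wait_for_params(10000))
print(wait_for_params("#some-selector", fallback="body"))
```

The returned dictionary can be merged into the params dictionary used in the example above, next to api_key and url.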

click_on

string A CSS selector of the element to click on.

This parameter can only be used with rendering enabled.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&click_on=div.wrapper-more-availability%20button&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True,
    "click_on": "div.wrapper-more-availability button"
}
response = requests.get(
    url=url,
    params=params
)

request_headers

string Add custom request headers in the following format:

[header_name]:[header_value]|[header_name]:[header_value]

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?request_headers=accept-language%3Aen-GB%7Chost%3Awebscraperapi.com&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

headers = {
    "accept-language": "en-GB",
    "host": "www.example.com"
}
encoded_headers = "|".join(f"{k}:{v}" for k, v in headers.items())

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_headers": encoded_headers
}
response = requests.get(
    url=url,
    params=params
)

request_cookies

string Add custom request cookies in the following format:

[cookie_name]:[cookie_value]|[cookie_name]:[cookie_value]

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?request_cookies=x-session%3A1234%7Cx-user%3Axyz&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

cookies = {
    "x-session": 1234,
    "x-user": "xyz"
}
encoded_cookies = "|".join(f"{k}:{v}" for k, v in cookies.items())

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_cookies": encoded_cookies
}
response = requests.get(
    url=url,
    params=params
)
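
request_headers and request_cookies share the same name:value|name:value format, so one helper can serve both. The sketch below is a minimal version of such a helper. The documentation does not describe an escape mechanism for the | and : delimiters, so rejecting them is an assumption made here, and the name encode_pairs is hypothetical.

```python
from typing import Mapping


def encode_pairs(pairs: Mapping[str, object]) -> str:
    """Join name/value pairs into the name:value|name:value format.

    Hypothetical helper. Assumes the API has no escaping for the delimiters,
    so a "|" anywhere or a ":" in a name is rejected instead of being sent.
    """
    parts = []
    for name, value in pairs.items():
        name, value = str(name), str(value)
        if "|" in name or ":" in name or "|" in value:
            raise ValueError(f"delimiter character in pair {name!r}: {value!r}")
        parts.append(f"{name}:{value}")
    return "|".join(parts)


print(encode_pairs({"accept-language": "en-GB", "host": "www.example.com"}))
print(encode_pairs({"x-session": 1234, "x-user": "xyz"}))
```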

response_format

string Response format. See the Response section to decide which response format best suits your needs.

Default value is html.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?response_format=json&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "response_format": "json"
}
response = requests.get(
    url=url,
    params=params
)

default_options [WIP]

boolean Use API default scraping options.

Learn more about scraping options here.

Code example

curl:
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&default_options=true&api_key=qwerty'

python:
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "default_options": True
}
response = requests.get(
    url=url,
    params=params
)

Rules and restrictions

The url parameter is required.

Payload

Web Scraper API supports POST requests. Currently, supported payload types are:

application/json
application/x-www-form-urlencoded

The content type has to be passed using the request_headers parameter.

Code example

curl:
curl --request POST --url 'https://webscraperapi.datashake.com/?request_headers=content-type%3Aapplication%2Fjson&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty' --data '{"hello": "world"}'

python:
import requests

payload = {
    "hello": "world"
}
url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_headers": "content-type:application/json"
}
response = requests.post(
    url=url,
    params=params,
    json=payload  # send the payload as a JSON body, matching the declared content type
)
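
The example above covers the application/json payload type. For application/x-www-form-urlencoded, a request could be prepared along the lines of the sketch below. The helper build_form_post is hypothetical, and the assumption that the body is forwarded to the target unchanged is not confirmed by this page.

```python
from urllib.parse import urlencode


def build_form_post(target_url: str, api_key: str, form: dict) -> tuple:
    """Return (query parameters, request body) for a form-encoded POST.

    Hypothetical helper. The content type is declared through
    request_headers, as the Payload section requires.
    """
    params = {
        "api_key": api_key,
        "url": target_url,
        "request_headers": "content-type:application/x-www-form-urlencoded",
    }
    body = urlencode(form)  # e.g. {"hello": "world"} becomes "hello=world"
    return params, body


params, body = build_form_post("https://www.example.com/", "qwerty", {"hello": "world"})
print(params["request_headers"])
print(body)
```

With requests, the pair can then be sent as requests.post(url, params=params, data=body).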

Response

The wsa-original-status header contains the original status code returned by the target URL.

To help you keep track of jobs and debug them, we provide a unique identifier for every request you submit. You can use this ID to refer to your request; you'll always find it in the response headers under wsa-task-id.

Response format: HTML

GET https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.google.com%2F&api_key=qwerty

Response body

<!doctype html>
    <html itemscope="" itemtype="http://schema.org/WebPage" lang="en">
    <head><meta charset="UTF-8"><meta content="origin" name="referrer"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">
    <link href="/manifest?pwa=webhp" crossorigin="use-credentials" rel="manifest">
        <title>Google</title>
        <script nonce="higSK9pYDY087Sf7Xy__CA">(function(){window.google={kEI:'b33GY_CELcXOkPIP0uqlmAw',kEXPI:'31',kBL:'DWTW'};google.sn='webhp';google.kHL='en';})();(function()
    ...

Response headers

date: Tue, 17 Jan 2023 10:50:24 GMT
content-type: text/html; charset=utf-8
content-length: 142228
wsa-task-id: 16733242-8854b345
wsa-original-status: 200
wsa-url: https%3A//www.google.com/
wsa-set-cookie: 1P_JAR=2023-01-17-10; Domain=google.com; expires=Thu, 16-Feb-2023 10:50:23 GMT; Path=/; SameSite=none; Secure
wsa-set-cookie: AEC=ARSKqsL6lDkrvnqKA0fUF8GZlxkjs-d34tRYFcoSLuTmr8PFCM-4INqtfA; Domain=google.com; expires=Sun, 16-Jul-2023 10:50:23 GMT; HttpOnly; Path=/; SameSite=lax; Secure
wsa-set-cookie: NID=511=NA_YjaUbg4Kjv-aI0OdQmrzqa2J_7VpIFxREdAo0iZE-itboMkEw57kIvg9594Yhm2NpCZ0vMWY-WnAoFtd0oCXHGC1vleqNVHCkngL5rZGv9mWKl_M_JTMJCjHVTV9HqGthLsOrzm-Qy0NdFu5q9ozLCP3GMa1ih_9wxMDaS10; Domain=google.com; expires=Wed, 19-Jul-2023 10:50:23 GMT; HttpOnly; Path=/; SameSite=none; Secure
strict-transport-security: max-age=15724800; includeSubDomains
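
The wsa-* response headers listed above can be collected into a small summary for logging or debugging. The sketch below works on any mapping of header names to values, such as response.headers from requests. The helper name read_wsa_headers and the shape of its result are assumptions made for this illustration.

```python
from typing import Mapping
from urllib.parse import unquote


def read_wsa_headers(headers: Mapping[str, str]) -> dict:
    """Extract the task ID, original status and target URL from the headers.

    Hypothetical helper. Header names are lowercased first so that a plain
    dict works the same way as a case-insensitive header mapping.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("wsa-original-status")
    url = lowered.get("wsa-url")
    return {
        "task_id": lowered.get("wsa-task-id"),
        "original_status": int(status) if status is not None else None,
        "url": unquote(url) if url is not None else None,
    }


sample = {
    "wsa-task-id": "16733242-8854b345",
    "wsa-original-status": "200",
    "wsa-url": "https%3A//www.google.com/",
}
print(read_wsa_headers(sample))
```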

Response format: JSON

GET https://webscraperapi.datashake.com/?response_format=json&url=https%3A%2F%2Fwww.google.com%2F&api_key=qwerty

Response body

{
    "success": true,
    "details": "Scraping successful",
    "response": {
        "url": "https://www.google.com/",
        "headers": {
            "Date": "Tue, 17 Jan 2023 10:45:02 GMT",
            "Expires": "-1",
            "Cache-Control": "private, max-age=0",
            "Content-Type": "text/html; charset=UTF-8",
            "Strict-Transport-Security": "max-age=31536000",
            ...
        },
        "cookies": [
            {
                "name": "1P_JAR",
                "value": "2023-01-17-10",
                "expires": "Thu, 16-Feb-2023 10:45:02 GMT",
                "path": "/"
            },
            {
                "name": "AEC",
                "value": "ARSKqsK2pwpuSbuO42ga70tjjf639w4a4jzGn7TTmRqCav-Y8xSyaAE_7sg",
                "expires": "Sun, 16-Jul-2023 10:45:02 GMT",
                "path": "/"
            },
            {
                "name": "NID",
                "value": "511=I_L3hpVRy4FJQXtJRThEN699BfG-7gD42mUrl63vwn7g4jGKSsIfEznn5qr6N8Qj0zZW-A4kb51iSRYFMkiVI3c3LmsqBCxOqMYEJcjcqVTjRMSjFMU0_YxeehNBtvz5qxamOsW2o8xFjt9Q0sJZQgDz7tPWnsjGSgCDTggxV9E",
                "expires": "Wed, 19-Jul-2023 10:45:02 GMT",
                "path": "/"
            }
        ],
        "json_body": null,
        "body": "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage\" lang=\"en\"><head><meta charset=\"UTF-8\"><meta content=\"origin\" name=\"referrer\"><meta content=\"/images/branding/googleg/1x/googleg_standard_color_128dp.png\" itemprop=\"image\"><link href=\"/manifest?pwa=webhp\" crossorigin=\"use-credentials\" rel=\"manifest\"><title>Google</title><script...",
        "status_code": 200
    }
}

Response headers

date: Tue, 17 Jan 2023 10:45:03 GMT
content-type: application/json
content-length: 151802
wsa-task-id: 16733242-8854bdfa
wsa-original-status: 200
wsa-url: https%3A//www.google.com/
wsa-set-cookie: 1P_JAR=2023-01-17-10; Domain=google.com; expires=Thu, 16-Feb-2023 10:45:02 GMT; Path=/; SameSite=none; Secure
wsa-set-cookie: AEC=ARSKqsK2pwpuSbuO42ga70tjjf639w4a4jzGn7TTmRqCav-Y8xSyaAE_7sg; Domain=google.com; expires=Sun, 16-Jul-2023 10:45:02 GMT; HttpOnly; Path=/; SameSite=lax; Secure
wsa-set-cookie: NID=511=I_L3hpVRy4FJQXtJRThEN699BfG-7gD42mUrl63vwn7g4jGKSsIfEznn5qr6N8Qj0zZW-A4kb51iSRYFMkiVI3c3LmsqBCxOqMYEJcjcqVTjRMSjFMU0_YxeehNBtvz5qxamOsW2o8xFjt9Q0sJZQgDz7tPWnsjGSgCDTggxV9E; Domain=google.com; expires=Wed, 19-Jul-2023 10:45:02 GMT; HttpOnly; Path=/; SameSite=none; Secure
strict-transport-security: max-age=15724800; includeSubDomains
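
The JSON response body shown above can be unpacked with the standard library. The sketch below pulls out the page body, the original status code and the cookies, and also joins the cookies into the name:value|name:value format expected by request_cookies so they can be sent with a follow-up request. The helper name summarize_json_response is hypothetical, and the handling of "success": false is an assumption, since this page only shows a successful response.

```python
import json


def summarize_json_response(raw: str) -> dict:
    """Parse a response_format=json body and return its main parts.

    Hypothetical helper. Raising on "success": false is an assumption;
    only the successful shape is documented.
    """
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("details", "Scraping failed"))
    page = payload["response"]
    cookies = {c["name"]: c["value"] for c in page.get("cookies") or []}
    return {
        "status_code": page["status_code"],
        "body": page["body"],
        "cookies": cookies,
        # Same format as the request_cookies parameter.
        "request_cookies": "|".join(f"{n}:{v}" for n, v in cookies.items()),
    }


sample = json.dumps({
    "success": True,
    "details": "Scraping successful",
    "response": {
        "url": "https://www.google.com/",
        "cookies": [{"name": "1P_JAR", "value": "2023-01-17-10", "path": "/"}],
        "body": "<!doctype html>",
        "status_code": 200,
    },
})
print(summarize_json_response(sample)["request_cookies"])
```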