
API usage

Base URL

[GET|POST] https://webscraperapi.datashake.com/

Request parameters

url *

string The URL to scrape.

Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/"
}
response = requests.get(
    url=url,
    params=params
)
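
The curl example passes the target URL percent-encoded, whereas requests encodes the params dictionary automatically. As a minimal sketch of producing the same query string by hand with the standard library (build_request_url is a hypothetical helper name used for illustration, not part of the API):

```python
from urllib.parse import urlencode

BASE_URL = "https://webscraperapi.datashake.com/"


def build_request_url(target_url: str, api_key: str) -> str:
    """Return the full API URL with the target URL percent-encoded."""
    # urlencode escapes ':' and '/' so the target URL survives as one parameter value
    query = urlencode({"url": target_url, "api_key": api_key})
    return f"{BASE_URL}?{query}"


print(build_request_url("https://www.example.com/", "qwerty"))
```

The printed URL should match the one used in the curl example above.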
country

string A geographic proxy location. See the currently supported proxy geolocations.

If no geolocation is specified, one is picked at random.

Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&country=us&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "country": "us"
}
response = requests.get(
    url=url,
    params=params
)
render

boolean Enables JavaScript rendering.

Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True
}
response = requests.get(
    url=url,
    params=params
)
wait_for

integer|string Time to sleep, or a CSS selector to wait for.

The value can be:

  • a time in milliseconds to wait before returning the page content. This is useful when you want the page to load all necessary elements before the content is returned.
  • a valid CSS selector that must appear on the page before the page content is returned.

Restrictions:

  • This parameter can only be used with rendering enabled.
  • An integer value cannot exceed 15000 ms (15 seconds).
  • If the selector is not found on the page, you'll receive an unsuccessful response. To get the page body even when your selector is not found, add a selector that is always present in the HTML, such as body (e.g. #some-selector,body).
Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&wait_for=10000&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True,
    "wait_for": 10000
}
response = requests.get(
    url=url,
    params=params
)
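
The example above waits for a fixed time. As a rough sketch of the selector form with the body fallback described earlier, the helper below builds the query parameters and checks the documented limits before any request is made. The name build_wait_params and the client-side validation are assumptions for illustration; the API itself enforces the actual rules.

```python
from typing import Union

MAX_WAIT_MS = 15000  # documented upper bound for integer values


def build_wait_params(api_key: str, target_url: str, wait_for: Union[int, str]) -> dict:
    """Build query parameters for a rendered request that uses wait_for."""
    if isinstance(wait_for, bool):
        raise TypeError("wait_for must be an integer or a CSS selector string")
    if isinstance(wait_for, int):
        if not 0 <= wait_for <= MAX_WAIT_MS:
            raise ValueError(f"wait_for must be between 0 and {MAX_WAIT_MS} ms")
    elif not wait_for.strip():
        raise ValueError("wait_for selector must not be empty")
    return {
        "api_key": api_key,
        "url": target_url,
        "render": True,  # wait_for can only be used with rendering enabled
        "wait_for": wait_for,
    }


# Appending ",body" keeps the request from failing when #some-selector is absent
params = build_wait_params("qwerty", "https://www.example.com/", "#some-selector,body")
print(params)
```

The resulting dictionary can be passed as params to requests.get in the same way as in the example above.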
click_on

string A CSS selector of the element to click on.

  • This parameter can only be used with rendering enabled
Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?render=True&click_on=div.wrapper-more-availability%20button&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "render": True,
    "click_on": "div.wrapper-more-availability button"
}
response = requests.get(
    url=url,
    params=params
)
request_headers

string Add custom request headers in the following format:

[header_name]:[header_value]|[header_name]:[header_value]
Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?request_headers=accept-language%3Aen-GB%7Chost%3Awebscraperapi.com&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

headers = {
    "accept-language": "en-GB",
    "host": "www.example.com"
}
encoded_headers = "|".join(f"{k}:{v}" for k, v in headers.items())

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_headers": encoded_headers
}
response = requests.get(
    url=url,
    params=params
)
request_cookies

string Add custom request cookies in the following format:

[cookie_name]:[cookie_value]|[cookie_name]:[cookie_value]
Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?request_cookies=x-session%3A1234%7Cx-user%3Axyz&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

cookies = {
    "x-session": 1234,
    "x-user": "xyz"
}
encoded_cookies = "|".join(f"{k}:{v}" for k, v in cookies.items())

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_cookies": encoded_cookies
}
response = requests.get(
    url=url,
    params=params
)
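
request_headers and request_cookies share the same name:value|name:value format, so one helper can encode both. The sketch below is a hypothetical convenience function (encode_pairs is not part of the API). Rejecting the separator characters is a defensive assumption, since the documentation does not describe any escaping mechanism.

```python
def encode_pairs(pairs: dict) -> str:
    """Encode a mapping as name:value|name:value for request_headers or request_cookies."""
    parts = []
    for name, value in pairs.items():
        name, value = str(name), str(value)
        # Assumption: '|' separates pairs and the first ':' separates name from value,
        # so these characters are refused where they would make the string ambiguous.
        if "|" in name or ":" in name or "|" in value:
            raise ValueError(f"cannot encode pair {name!r}: contains a separator character")
        parts.append(f"{name}:{value}")
    return "|".join(parts)


encoded_headers = encode_pairs({"accept-language": "en-GB"})
encoded_cookies = encode_pairs({"x-session": 1234, "x-user": "xyz"})
print(encoded_headers)
print(encoded_cookies)
```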
response_format

string Response format. See the Response section to decide which response format best suits your needs.

Default value is html.

Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?response_format=json&url=https%3A%2F%2Fwww.example.com%2F&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "response_format": "json"
}
response = requests.get(
    url=url,
    params=params
)
default_options [WIP]

boolean Use API default scraping options.

Learn more about scraping options here.

Code example
curl --request GET --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&default_options=true&api_key=qwerty'
import requests

url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "default_options": True
}
response = requests.get(
    url=url,
    params=params
)

Rules and restrictions

  • The url parameter is required.

Payload

The Web Scraper API supports POST requests. The currently supported payload types are:

  • application/json
  • application/x-www-form-urlencoded

The content type has to be passed using the request_headers parameter.

Code example
curl --request POST --url 'https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.example.com%2F&request_headers=content-type%3Aapplication%2Fjson&api_key=qwerty' --data '{"hello": "world"}'
import requests

payload = {
    "hello": "world"
}
url = "https://webscraperapi.datashake.com/"
params = {
    "api_key": "qwerty",
    "url": "https://www.example.com/",
    "request_headers": "content-type:application/json"
}
response = requests.post(
    url=url,
    params=params,
    json=payload  # send the payload as the JSON request body
)
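
The example above covers application/json. For application/x-www-form-urlencoded, a minimal sketch using only the standard library might look like the following. build_form_post is a hypothetical helper, and the sketch assumes the API forwards the request body to the target URL unchanged. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(api_key: str, target_url: str, form_fields: dict) -> Request:
    """Build (but do not send) a POST request carrying a form-encoded payload."""
    query = urlencode({
        "api_key": api_key,
        "url": target_url,
        # The content type is passed through the request_headers parameter
        "request_headers": "content-type:application/x-www-form-urlencoded",
    })
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        f"https://webscraperapi.datashake.com/?{query}",
        data=body,
        method="POST",
    )


request = build_form_post("qwerty", "https://www.example.com/", {"hello": "world"})
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the object to urllib.request.urlopen would send it. With requests, the equivalent is to pass data=form_fields instead of json=payload.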

Response

The wsa-original-status header contains the original status code returned by the target URL.

To help you track and debug jobs, every request you submit is assigned a unique identifier. Use this ID when referring to your request; you'll always find it in the response headers under wsa-task-id.
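
As a small sketch of reading these two headers, the helper below accepts any header mapping and returns the task ID together with the original status code as an integer. read_wsa_headers is a hypothetical name, and the sample values are taken from the response headers shown later in this section.

```python
from typing import Mapping, Optional, Tuple


def read_wsa_headers(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[int]]:
    """Return (task_id, original_status) from the API response headers."""
    # HTTP header names are case-insensitive, so normalise them before lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    task_id = lowered.get("wsa-task-id")
    status = lowered.get("wsa-original-status")
    return task_id, (int(status) if status is not None else None)


example_headers = {
    "content-type": "text/html; charset=utf-8",
    "wsa-task-id": "16733242-8854b345",
    "wsa-original-status": "200",
}
task_id, original_status = read_wsa_headers(example_headers)
print(task_id, original_status)
```

With requests, response.headers can be passed to the helper directly.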

Response format: HTML

GET https://webscraperapi.datashake.com/?url=https%3A%2F%2Fwww.google.com%2F&api_key=qwerty
Response body
<!doctype html>
    <html itemscope="" itemtype="http://schema.org/WebPage" lang="en">
    <head><meta charset="UTF-8"><meta content="origin" name="referrer"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">
    <link href="/manifest?pwa=webhp" crossorigin="use-credentials" rel="manifest">
        <title>Google</title>
        <script nonce="higSK9pYDY087Sf7Xy__CA">(function(){window.google={kEI:'b33GY_CELcXOkPIP0uqlmAw',kEXPI:'31',kBL:'DWTW'};google.sn='webhp';google.kHL='en';})();(function()
    ...
Response headers
date: Tue, 17 Jan 2023 10:50:24 GMT
content-type: text/html; charset=utf-8
content-length: 142228
wsa-task-id: 16733242-8854b345
wsa-original-status: 200
wsa-url: https%3A//www.google.com/
wsa-set-cookie: 1P_JAR=2023-01-17-10; Domain=google.com; expires=Thu, 16-Feb-2023 10:50:23 GMT; Path=/; SameSite=none; Secure
wsa-set-cookie: AEC=ARSKqsL6lDkrvnqKA0fUF8GZlxkjs-d34tRYFcoSLuTmr8PFCM-4INqtfA; Domain=google.com; expires=Sun, 16-Jul-2023 10:50:23 GMT; HttpOnly; Path=/; SameSite=lax; Secure
wsa-set-cookie: NID=511=NA_YjaUbg4Kjv-aI0OdQmrzqa2J_7VpIFxREdAo0iZE-itboMkEw57kIvg9594Yhm2NpCZ0vMWY-WnAoFtd0oCXHGC1vleqNVHCkngL5rZGv9mWKl_M_JTMJCjHVTV9HqGthLsOrzm-Qy0NdFu5q9ozLCP3GMa1ih_9wxMDaS10; Domain=google.com; expires=Wed, 19-Jul-2023 10:50:23 GMT; HttpOnly; Path=/; SameSite=none; Secure
strict-transport-security: max-age=15724800; includeSubDomains

Response format: JSON

GET https://webscraperapi.datashake.com/?response_format=json&url=https%3A%2F%2Fwww.google.com%2F&api_key=qwerty
Response body
{
    "success": true,
    "details": "Scraping successful",
    "response": {
        "url": "https://www.google.com/",
        "headers": {
            "Date": "Tue, 17 Jan 2023 10:45:02 GMT",
            "Expires": "-1",
            "Cache-Control": "private, max-age=0",
            "Content-Type": "text/html; charset=UTF-8",
            "Strict-Transport-Security": "max-age=31536000",
            ...
        },
        "cookies": [
            {
                "name": "1P_JAR",
                "value": "2023-01-17-10",
                "expires": "Thu, 16-Feb-2023 10:45:02 GMT",
                "path": "/"
            },
            {
                "name": "AEC",
                "value": "ARSKqsK2pwpuSbuO42ga70tjjf639w4a4jzGn7TTmRqCav-Y8xSyaAE_7sg",
                "expires": "Sun, 16-Jul-2023 10:45:02 GMT",
                "path": "/"
            },
            {
                "name": "NID",
                "value": "511=I_L3hpVRy4FJQXtJRThEN699BfG-7gD42mUrl63vwn7g4jGKSsIfEznn5qr6N8Qj0zZW-A4kb51iSRYFMkiVI3c3LmsqBCxOqMYEJcjcqVTjRMSjFMU0_YxeehNBtvz5qxamOsW2o8xFjt9Q0sJZQgDz7tPWnsjGSgCDTggxV9E",
                "expires": "Wed, 19-Jul-2023 10:45:02 GMT",
                "path": "/"
            }
        ],
        "json_body": null,
        "body": "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage\" lang=\"en\"><head><meta charset=\"UTF-8\"><meta content=\"origin\" name=\"referrer\"><meta content=\"/images/branding/googleg/1x/googleg_standard_color_128dp.png\" itemprop=\"image\"><link href=\"/manifest?pwa=webhp\" crossorigin=\"use-credentials\" rel=\"manifest\"><title>Google</title><script...",
        "status_code": 200
    }
} 
Response headers
date: Tue, 17 Jan 2023 10:45:03 GMT
content-type: application/json
content-length: 151802
wsa-task-id: 16733242-8854bdfa
wsa-original-status: 200
wsa-url: https%3A//www.google.com/
wsa-set-cookie: 1P_JAR=2023-01-17-10; Domain=google.com; expires=Thu, 16-Feb-2023 10:45:02 GMT; Path=/; SameSite=none; Secure
wsa-set-cookie: AEC=ARSKqsK2pwpuSbuO42ga70tjjf639w4a4jzGn7TTmRqCav-Y8xSyaAE_7sg; Domain=google.com; expires=Sun, 16-Jul-2023 10:45:02 GMT; HttpOnly; Path=/; SameSite=lax; Secure
wsa-set-cookie: NID=511=I_L3hpVRy4FJQXtJRThEN699BfG-7gD42mUrl63vwn7g4jGKSsIfEznn5qr6N8Qj0zZW-A4kb51iSRYFMkiVI3c3LmsqBCxOqMYEJcjcqVTjRMSjFMU0_YxeehNBtvz5qxamOsW2o8xFjt9Q0sJZQgDz7tPWnsjGSgCDTggxV9E; Domain=google.com; expires=Wed, 19-Jul-2023 10:45:02 GMT; HttpOnly; Path=/; SameSite=none; Secure
strict-transport-security: max-age=15724800; includeSubDomains
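
As a sketch of consuming the JSON format, the helper below pulls the status code, body and cookies out of a response shaped like the one above. summarize_json_response is a hypothetical name and the sample is a trimmed copy of that response. The handling of success being false is an assumption, since only a successful response is documented here.

```python
import json


def summarize_json_response(raw: str) -> dict:
    """Extract the status code, body and cookies from a JSON-format API response."""
    data = json.loads(raw)
    if not data.get("success"):
        # Assumption: failed scrapes set "success" to false and explain why in "details"
        raise RuntimeError(data.get("details", "scraping failed"))
    inner = data["response"]
    return {
        "status_code": inner["status_code"],
        "body": inner["body"],
        "cookies": {cookie["name"]: cookie["value"] for cookie in inner.get("cookies", [])},
    }


# Trimmed sample modelled on the response body shown above
sample = json.dumps({
    "success": True,
    "details": "Scraping successful",
    "response": {
        "url": "https://www.google.com/",
        "cookies": [{"name": "1P_JAR", "value": "2023-01-17-10", "path": "/"}],
        "json_body": None,
        "body": "<!doctype html><html><head><title>Google</title></head></html>",
        "status_code": 200,
    },
})
summary = summarize_json_response(sample)
print(summary["status_code"], summary["cookies"])
```

With requests, response.text can be passed in as raw.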