Scraping Fish API

Scraping Fish API is simple to use while still offering powerful features.

Below is the full reference for the Scraping Fish API.

GET /api/v1/

Scrape website

This endpoint retrieves the HTML content of the desired website.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.
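
To illustrate the encoding requirement for url, the sketch below builds the request URL by hand with Python's standard library. The helper name build_request_url and the sample target URL are made up for this example. When parameters are passed through params= in the requests library, as in the request examples in this reference, requests percent-encodes them for you, so manual encoding is mainly relevant when you assemble the URL yourself.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the API request URL with every query value percent-encoded."""
    # urlencode escapes reserved characters (":", "/", "?", "&", "=")
    # and encodes non-ASCII characters as UTF-8 percent-escapes.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Hypothetical target URL containing reserved and non-ASCII characters.
example = build_request_url("YOUR_API_KEY", "https://example.com/search?q=żółw&page=1")
print(example)
```

Without this step, the & and = characters of the target URL would be read as separators of the API's own query string.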

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering to wait for background network requests. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    intercept_request
    Type
    string
    Description

    Intercept background XHR/Fetch request matching specified pattern and obtain its response. See: Intercept XHR request.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward original status in Sf-Original-Status-Code response header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired data in JSON format. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds) and greater than or equal to trial_timeout_ms. Defaults to 90000 (90 seconds). See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds before timing out for a single request execution trial. It must be between 10000 (10 seconds) and 600000 (600 seconds) and lower than or equal to total_timeout_ms. Defaults to 30000 (30 seconds). See: Timeouts

  • Name
    render_js_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as bytes. See: Screenshot

  • Name
    screenshot_base64
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in JSON response. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/values that will be preloaded as localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    A session identifier. For requests with the same value of this parameter, cookies and localStorage will be preserved. Must be between 1 and 512 characters long. See: Sticky Session.
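
The limits stated for total_timeout_ms and trial_timeout_ms can be checked on the client before a request is sent. The following is a minimal sketch under that reading of the parameter descriptions; the helper timeout_params is a hypothetical name, and the API may validate these values differently on its side.

```python
MIN_TIMEOUT_MS = 10_000   # documented lower bound (10 seconds)
MAX_TIMEOUT_MS = 600_000  # documented upper bound (600 seconds)


def timeout_params(total_timeout_ms: int = 90_000, trial_timeout_ms: int = 30_000) -> dict:
    """Validate both timeouts against the documented limits and return them as query parameters."""
    for name, value in (
        ("total_timeout_ms", total_timeout_ms),
        ("trial_timeout_ms", trial_timeout_ms),
    ):
        if not MIN_TIMEOUT_MS <= value <= MAX_TIMEOUT_MS:
            raise ValueError(f"{name} must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS}, got {value}")
    if trial_timeout_ms > total_timeout_ms:
        raise ValueError("trial_timeout_ms must be lower than or equal to total_timeout_ms")
    return {"total_timeout_ms": total_timeout_ms, "trial_timeout_ms": trial_timeout_ms}


# Merge the validated timeouts into the usual request parameters.
payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
    **timeout_params(total_timeout_ms=120_000, trial_timeout_ms=40_000),
}
print(payload)
```

The defaults in the helper mirror the documented defaults (90000 and 30000), so calling it without arguments reproduces the API's default behavior.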

Request

GET
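/api/v1/

import requests

# Query parameters for the Scraping Fish API
payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
# response.content holds the raw bytes of the scraped page
print(response.content)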

Response

<!doctype html>
<html>

<body>
<div>
  <h1>Example Domain</h1>
  <p>
    This domain is for use in illustrative examples in documents. You may
    use this domain in literature without prior coordination or asking for
    permission.
  </p>
  <p>
    <a href="https://www.iana.org/domains/example">More information...</a>
  </p>
</div>
</body>
</html>
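
When forward_original_status is enabled, the target website's status is reported in the Sf-Original-Status-Code response header. The sketch below shows one way to read it from a headers mapping such as response.headers. The helper name is hypothetical, and it assumes the header value is a plain integer string; the sample headers are a stand-in, not captured API output.

```python
from typing import Mapping, Optional

ORIGINAL_STATUS_HEADER = "Sf-Original-Status-Code"


def original_status_code(headers: Mapping[str, str]) -> Optional[int]:
    """Return the target website's status code from the response headers, or None if absent."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == ORIGINAL_STATUS_HEADER.lower():
            return int(value)  # assumption: the value is a plain integer string
    return None


# Stand-in for response.headers of a call made with forward_original_status enabled.
sample_headers = {"Content-Type": "text/html", "sf-original-status-code": "404"}
print(original_status_code(sample_headers))
```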

POST /api/v1/

POST to website

This endpoint sends a POST request to the desired website and returns the content of its response.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering to wait for background network requests. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    intercept_request
    Type
    string
    Description

    Intercept background XHR/Fetch request matching specified pattern and obtain its response. See: Intercept XHR request.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward original status in Sf-Original-Status-Code response header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired content. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds) and greater than or equal to trial_timeout_ms. Defaults to 90000 (90 seconds). See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds before timing out for a single request execution trial. It must be between 10000 (10 seconds) and 600000 (600 seconds) and lower than or equal to total_timeout_ms. Defaults to 30000 (30 seconds). See: Timeouts

  • Name
    render_js_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as bytes. See: Screenshot

  • Name
    screenshot_base64
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in JSON response. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/values that will be preloaded as localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    A session identifier. For requests with the same value of this parameter, cookies and localStorage will be preserved. Must be between 1 and 512 characters long. See: Sticky Session.
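
The session parameter ties several requests together so that cookies and localStorage carry over between them. As a rough sketch, one identifier can be generated once and reused in every payload. The helper session_params and the two example.com paths are hypothetical; only the 1 to 512 character limit comes from the parameter description.

```python
import uuid


def session_params(session_id: str) -> dict:
    """Return the session query parameter after checking the documented length limit."""
    if not 1 <= len(session_id) <= 512:
        raise ValueError("session must be between 1 and 512 characters long")
    return {"session": session_id}


# One identifier shared by every request that should keep the same cookies and localStorage.
session_id = uuid.uuid4().hex

login_payload = {
    "api_key": "[your API key]",
    "url": "https://example.com/login",    # hypothetical path
    **session_params(session_id),
}
account_payload = {
    "api_key": "[your API key]",
    "url": "https://example.com/account",  # hypothetical path
    **session_params(session_id),
}
print(login_payload["session"] == account_payload["session"])
```

Each payload would then be passed as params= to requests.post or requests.get in the same way as in the request examples.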

Request

POST
/api/v1/

import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    # The headers object is passed as a JSON-encoded string
    "headers": json.dumps({"x-custom-header": "value"}),
}

# Request body forwarded to the target URL
data = {"key1": "value1", "key2": "value2"}

response = requests.post("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)

Response

{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c1f6c-4c8f5f7f312dd7d91700cd06",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/post"
}

PUT /api/v1/

PUT to website

This endpoint sends a PUT request to the desired website and returns the content of its response.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering to wait for background network requests. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    intercept_request
    Type
    string
    Description

    Intercept background XHR/Fetch request matching specified pattern and obtain its response. See: Intercept XHR request.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward original status in Sf-Original-Status-Code response header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired content. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds) and greater than or equal to trial_timeout_ms. Defaults to 90000 (90 seconds). See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds before timing out for a single request execution trial. It must be between 10000 (10 seconds) and 600000 (600 seconds) and lower than or equal to total_timeout_ms. Defaults to 30000 (30 seconds). See: Timeouts

  • Name
    render_js_timeout_ms
    Type
    integer
    Description

    Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as bytes. See: Screenshot

  • Name
    screenshot_base64
    Type
    bool
    Description

    If set to true, screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in JSON response. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/values that will be preloaded as localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    A session identifier. For requests with the same value of this parameter, cookies and localStorage will be preserved. Must be between 1 and 512 characters long. See: Sticky Session.
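
Several parameters above have the type object. The request examples in this reference pass headers as a JSON string produced by json.dumps, and the sketch below assumes the same encoding applies to other object-typed parameters such as preload_local_storage. That is an assumption by analogy, not something this reference states; the helper encode_object_params and the localStorage keys are made up for illustration.

```python
import json


def encode_object_params(params: dict) -> dict:
    """Serialize dict and list values to JSON strings; leave other values untouched."""
    return {
        name: json.dumps(value) if isinstance(value, (dict, list)) else value
        for name, value in params.items()
    }


payload = encode_object_params({
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    "headers": {"x-custom-header": "value"},
    # Hypothetical localStorage entries; the key names are made up.
    "preload_local_storage": {"theme": "dark", "consent": "accepted"},
})
print(payload["preload_local_storage"])
```

Keeping the serialization in one helper means the payload can be written with native Python dicts and encoded once, just before it is passed as params=.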

Request

PUT
/api/v1/

import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    # The headers object is passed as a JSON-encoded string
    "headers": json.dumps({"x-custom-header": "value"}),
}

# Request body forwarded to the target URL
data = {"key1": "value1", "key2": "value2"}

response = requests.put("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)

Response

{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c200f-53995abb051f677b10480cc3",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/put"
}