Scraping Fish API

Scraping Fish API is simple to use while still exposing powerful features such as JavaScript rendering, extraction rules, and sticky sessions.

Below is the full reference for the Scraping Fish API.

GET /api/v1/

Scrape website

This endpoint retrieves the HTML content of the target website.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it must be URL-encoded.
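
As a rough illustration of the encoding requirement, the sketch below builds the request URL with Python's standard library. The helper name build_request_url and the sample target URL are hypothetical; the endpoint is the one used in the request examples in this reference. Passing the parameters through params= in requests performs the same encoding for you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the API URL with both required parameters percent-encoded."""
    # urlencode escapes reserved characters (":", "/", "?", "&", "=") and
    # non-ASCII characters (as UTF-8), so the target URL's own query string
    # is not mixed up with the API's query string.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Hypothetical target containing reserved and non-ASCII characters.
target = "https://example.com/search?q=łosoś&page=2"
request_url = build_request_url("[your API key]", target)
print(request_url)

# Decoding the query string recovers the original target URL unchanged.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["url"][0] == target)
```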

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward the original status code in the Sf-Original-Status-Code header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired content. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total time, in milliseconds, before the request times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is lowered to this value. See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum time, in milliseconds, before a single trial times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is lowered to the value of total_timeout_ms. See: Timeouts

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/value pairs that will be preloaded into localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    Session identifier. Requests that share the same value preserve cookies and localStorage between them. Must be between 1 and 512 characters long. See: Sticky Session.
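
The interaction between total_timeout_ms and trial_timeout_ms can be mirrored on the client side. This is only a sketch of the documented rule: the helper name resolve_timeouts is hypothetical, and raising ValueError for out-of-range values is a choice made here, since this reference does not state how the API responds to them.

```python
MIN_TIMEOUT_MS = 10_000   # 10 seconds
MAX_TIMEOUT_MS = 600_000  # 600 seconds


def resolve_timeouts(total_timeout_ms: int = 90_000,
                     trial_timeout_ms: int = 30_000) -> dict:
    """Validate both timeouts and apply the documented capping rule."""
    for name, value in (("total_timeout_ms", total_timeout_ms),
                        ("trial_timeout_ms", trial_timeout_ms)):
        if not MIN_TIMEOUT_MS <= value <= MAX_TIMEOUT_MS:
            # Assumption: rejecting locally; the API's own behavior is not documented here.
            raise ValueError(
                f"{name} must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS}, got {value}"
            )
    # A single trial can never outlast the whole request, so the trial
    # timeout is capped at the total timeout.
    return {
        "total_timeout_ms": total_timeout_ms,
        "trial_timeout_ms": min(trial_timeout_ms, total_timeout_ms),
    }


print(resolve_timeouts())                         # documented defaults
print(resolve_timeouts(total_timeout_ms=20_000))  # trial timeout capped at the total
```

The returned dictionary can be merged into the request parameters alongside api_key and url.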

Request

GET
/api/v1/

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)

Response

<!doctype html>
<html>

<body>
<div>
  <h1>Example Domain</h1>
  <p>
    This domain is for use in illustrative examples in documents. You may
    use this domain in literature without prior coordination or asking for
    permission.
  </p>
  <p>
    <a href="https://www.iana.org/domains/example">More information...</a>
  </p>
</div>
</body>
</html>

POST /api/v1/

POST to website

This endpoint sends a POST request to the target website and returns the content of its response.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it must be URL-encoded.

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward the original status code in the Sf-Original-Status-Code header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired content. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total time, in milliseconds, before the request times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is lowered to this value. See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum time, in milliseconds, before a single trial times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is lowered to the value of total_timeout_ms. See: Timeouts

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/value pairs that will be preloaded into localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    Session identifier. Requests that share the same value preserve cookies and localStorage between them. Must be between 1 and 512 characters long. See: Sticky Session.
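
The request example for this endpoint passes headers as a JSON string built with json.dumps. The sketch below generalizes that into a small helper. The name encode_params is hypothetical, and two points are assumptions not confirmed by this reference: that the other object and array parameters (for example cookies or preload_local_storage) are JSON-encoded the same way, and that bool parameters are sent as the lowercase strings "true" and "false".

```python
import json


def encode_params(params: dict) -> dict:
    """Convert Python values into query-string-friendly strings."""
    encoded = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase "true" / "false".
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            # Objects and arrays are serialized to JSON, as the headers
            # parameter is in the request example.
            encoded[name] = json.dumps(value)
        else:
            encoded[name] = value
    return encoded


payload = encode_params({
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    "render_js": True,
    "headers": {"x-custom-header": "value"},
    "total_timeout_ms": 60_000,
})
print(payload)
```

The resulting dictionary can be passed as params= to requests.post, in place of the hand-built payload.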

Request

POST
/api/v1/

import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    # headers is an object parameter, so it is passed as a JSON-encoded string.
    "headers": json.dumps({"x-custom-header": "value"}),
}

# Request body that is forwarded to the target URL.
data = {"key1": "value1", "key2": "value2"}

response = requests.post("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)

Response

{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c1f6c-4c8f5f7f312dd7d91700cd06",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/post"
}
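
Because httpbin.org/post echoes back the request it received, the response above can be used to check what actually reached the target. The sketch below parses a trimmed copy of that response; the variable names are illustrative, and only fields shown in the response are used.

```python
import json

# Trimmed copy of the response shown above.
raw_response = r'''
{
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "headers": {
    "Content-Length": "33",
    "X-Custom-Header": "value"
  },
  "json": {"key1": "value1", "key2": "value2"}
}
'''

echo = json.loads(raw_response)

# "data" holds the raw request body as a string; "json" is the same body parsed.
body_as_sent = echo["data"]
print(json.loads(body_as_sent) == echo["json"])

# The custom header passed through the headers parameter reached the target.
print(echo["headers"]["X-Custom-Header"])

# The raw body length agrees with the Content-Length the target saw.
print(len(body_as_sent) == int(echo["headers"]["Content-Length"]))
```

With requests, calling response.json() on the live response yields the same kind of dictionary as json.loads does here.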

PUT /api/v1/

PUT to website

This endpoint sends a PUT request to the target website and returns the content of its response.

Required parameters

  • Name
    api_key
    Type
    string
    Description

    Your Scraping Fish API key. If you don't have one, you can get it by buying a Request Pack.

  • Name
    url
    Type
    string
    Description

    URL to scrape. If it contains non-ASCII or reserved characters, it must be URL-encoded.

Optional parameters

  • Name
    render_js
    Type
    bool
    Description

    Enable JavaScript rendering. See: JS Rendering.

  • Name
    js_scenario
    Type
    object
    Description

    Actions to execute after the website is loaded. See: JS Scenario.

  • Name
    headers
    Type
    object
    Description

    Headers to be set or overridden. See: Forward HTTP headers.

  • Name
    forward_original_status
    Type
    bool
    Description

    Whether to forward the original status code in the Sf-Original-Status-Code header. See: Original status code

  • Name
    extract_rules
    Type
    object
    Description

    Rules to be applied to resulting HTML to extract desired content. See: Extraction rules

  • Name
    cookies
    Type
    array[object]
    Description

    Cookies to be sent with the request. See: Cookies

  • Name
    total_timeout_ms
    Type
    integer
    Description

    Maximum total time, in milliseconds, before the request times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is lowered to this value. See: Timeouts

  • Name
    trial_timeout_ms
    Type
    integer
    Description

    Maximum time, in milliseconds, before a single trial times out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is lowered to the value of total_timeout_ms. See: Timeouts

  • Name
    screenshot
    Type
    bool
    Description

    If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot

  • Name
    preload_local_storage
    Type
    object
    Description

    Object of key/value pairs that will be preloaded into localStorage. See: Preload localStorage

  • Name
    session
    Type
    string
    Description

    Session identifier. Requests that share the same value preserve cookies and localStorage between them. Must be between 1 and 512 characters long. See: Sticky Session.
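
One possible way to produce and reuse a session value is sketched below. The helper names are hypothetical, the length limit is assumed to be counted in characters, and using a random UUID is just one convenient choice; any string within the length limit should serve as an identifier.

```python
import uuid

MIN_SESSION_LENGTH = 1
MAX_SESSION_LENGTH = 512


def validate_session_id(session_id: str) -> str:
    """Return session_id unchanged if its length is within the documented range."""
    # Assumption: the limit is measured in characters.
    if not MIN_SESSION_LENGTH <= len(session_id) <= MAX_SESSION_LENGTH:
        raise ValueError(
            f"session must be {MIN_SESSION_LENGTH} to {MAX_SESSION_LENGTH} characters long"
        )
    return session_id


def new_session_id() -> str:
    """Generate a random 32-character identifier."""
    return validate_session_id(uuid.uuid4().hex)


session = new_session_id()

# Both requests carry the same session value, so cookies and localStorage
# set during the first are preserved for the second.
first_request = {"api_key": "[your API key]", "url": "https://example.com/login", "session": session}
second_request = {"api_key": "[your API key]", "url": "https://example.com/account", "session": session}
print(first_request["session"] == second_request["session"])
```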

Request

PUT
/api/v1/

import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    # headers is an object parameter, so it is passed as a JSON-encoded string.
    "headers": json.dumps({"x-custom-header": "value"}),
}

# Request body that is forwarded to the target URL.
data = {"key1": "value1", "key2": "value2"}

response = requests.put("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)

Response

{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c200f-53995abb051f677b10480cc3",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/put"
}