Scraping Fish API
The Scraping Fish API is simple to use while still enabling powerful features. Below is the full API reference.
Scrape website
This endpoint retrieves the HTML content of the desired website.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
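If the target URL contains non-ASCII or reserved characters, it can be percent-encoded with the Python standard library before being passed as the `url` parameter. A minimal sketch (note that `requests` also encodes query parameters passed via `params` for you):

```python
from urllib.parse import quote

# Percent-encode everything except characters that delimit URL structure
raw_url = "https://example.com/search?q=zażółć gęślą"
encoded_url = quote(raw_url, safe=":/?=&")
print(encoded_url)
```

The space becomes `%20` and each non-ASCII character becomes its UTF-8 percent-escape, while `:/?=&` are left intact so the URL structure survives.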
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired data in JSON format. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
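The two timeout parameters are interdependent, so it can be useful to validate them client-side before sending a request. A minimal sketch (this helper is not part of the API, it just mirrors the documented constraints):

```python
def validate_timeouts(total_timeout_ms=90000, trial_timeout_ms=30000):
    """Mirror the documented constraints: both values in [10000, 600000]
    and trial_timeout_ms <= total_timeout_ms. Defaults match the API."""
    for name, value in (("total_timeout_ms", total_timeout_ms),
                        ("trial_timeout_ms", trial_timeout_ms)):
        if not 10000 <= value <= 600000:
            raise ValueError(f"{name} must be between 10000 and 600000")
    if trial_timeout_ms > total_timeout_ms:
        raise ValueError("trial_timeout_ms must not exceed total_timeout_ms")
    return {"total_timeout_ms": total_timeout_ms, "trial_timeout_ms": trial_timeout_ms}
```

The returned dict can be merged into the request's `params` payload.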
Request
```python
import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.text)
```
Response
```html
<!doctype html>
<html>
  …
  <body>
    <div>
      <h1>Example Domain</h1>
      <p>
        This domain is for use in illustrative examples in documents. You may
        use this domain in literature without prior coordination or asking for
        permission.
      </p>
      <p>
        <a href="https://www.iana.org/domains/example">More information...</a>
      </p>
    </div>
  </body>
</html>
```
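The response body is plain HTML, so any HTML parser can process it. A minimal sketch using only the standard library to pull the `<h1>` text out of the sample response above:

```python
from html.parser import HTMLParser

class H1Extractor(HTMLParser):
    """Collect the text content of every <h1> element."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings.append(data.strip())

# In practice this would be response.text from the request above
html = "<html><body><div><h1>Example Domain</h1><p>…</p></div></body></html>"
parser = H1Extractor()
parser.feed(html)
print(parser.headings)
```

For more involved extraction, the `extract_rules` parameter can do this server-side instead.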
POST to website
This endpoint allows you to send a POST request to the desired website and retrieve the content of its response.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired content. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
Request
```python
import requests
import json

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    "headers": json.dumps({"x-custom-header": "value"}),
}
data = {"key1": "value1", "key2": "value2"}
response = requests.post("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.text)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c1f6c-4c8f5f7f312dd7d91700cd06",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/post"
}
```
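Note how the sample response echoes the request body verbatim in its `data` field, and how its `Content-Length` header reads 33. A quick sanity check reproducing that body with compact JSON separators (the serialization httpbin echoed above):

```python
import json

data = {"key1": "value1", "key2": "value2"}
# Compact separators yield the exact body shown in the "data" field above
body = json.dumps(data, separators=(",", ":"))
print(body, len(body))
```

This kind of round-trip check against httpbin.org is a handy way to confirm which headers and body bytes actually reach the target site through the proxy.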
PUT to website
This endpoint allows you to send a PUT request to the desired website and retrieve the content of its response.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired content. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
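Object parameters such as `preload_local_storage` are passed as JSON strings, following the same pattern the request below uses for `headers`. A minimal sketch (the key/value pairs here are purely illustrative):

```python
import json

# Illustrative localStorage entries; real keys depend on the target site
storage = {"theme": "dark", "cookie_consent": "accepted"}
preload_param = json.dumps(storage)
print(preload_param)
```

The resulting string goes into the `params` payload under the `preload_local_storage` key.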
Request
```python
import requests
import json

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    "headers": json.dumps({"x-custom-header": "value"}),
}
data = {"key1": "value1", "key2": "value2"}
response = requests.put("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.text)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c200f-53995abb051f677b10480cc3",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/put"
}
```
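The `session` parameter makes cookies and localStorage stick across any mix of the GET, POST, and PUT endpoints, as long as the identifier stays within the documented 1 to 512 character range. A tiny helper (a sketch, not part of any client library) to validate an identifier before reusing it:

```python
def is_valid_session(session_id):
    """Check the documented constraint: a string between 1 and 512 characters."""
    return isinstance(session_id, str) and 1 <= len(session_id) <= 512

print(is_valid_session("user-42-checkout"))
```

Reusing the same valid identifier across requests is what keeps the session sticky; any change in the value starts a fresh session.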