Scraping Fish API
The Scraping Fish API is simple to use while still enabling powerful features. Below is the full API reference.
Scrape website
This endpoint retrieves the HTML content of the desired website.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
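If the target URL contains non-ASCII or reserved characters, it can be percent-encoded with the Python standard library before being passed as the `url` parameter. A minimal sketch (note that `requests` also encodes query parameters passed via `params` for you):

```python
from urllib.parse import quote

# Percent-encode everything except characters that delimit URL structure
raw_url = "https://example.com/search?q=zażółć gęślą"
encoded_url = quote(raw_url, safe=":/?=&")
print(encoded_url)
```

The space becomes `%20` and each non-ASCII character becomes its UTF-8 percent-escape, while `:/?=&` are left intact so the URL structure survives.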
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired data in JSON format. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
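The two timeout parameters are interdependent, so it can be useful to validate them client-side before sending a request. A minimal sketch (this helper is not part of the API, it just mirrors the documented constraints):

```python
def validate_timeouts(total_timeout_ms=90000, trial_timeout_ms=30000):
    """Mirror the documented constraints: both values in [10000, 600000]
    and trial_timeout_ms <= total_timeout_ms. Defaults match the API."""
    for name, value in (("total_timeout_ms", total_timeout_ms),
                        ("trial_timeout_ms", trial_timeout_ms)):
        if not 10000 <= value <= 600000:
            raise ValueError(f"{name} must be between 10000 and 600000")
    if trial_timeout_ms > total_timeout_ms:
        raise ValueError("trial_timeout_ms must not exceed total_timeout_ms")
    return {"total_timeout_ms": total_timeout_ms, "trial_timeout_ms": trial_timeout_ms}
```

The returned dict can be merged into the request's `params` payload.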
Request
```python
import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.text)
```
Response
```html
<!doctype html>
<html>
  …
  <body>
    <div>
      <h1>Example Domain</h1>
      <p>
        This domain is for use in illustrative examples in documents. You may
        use this domain in literature without prior coordination or asking for
        permission.
      </p>
      <p>
        <a href="https://www.iana.org/domains/example">More information...</a>
      </p>
    </div>
  </body>
</html>
```
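The response body is plain HTML, so any HTML parser can process it. A minimal sketch using only the standard library to pull the `<h1>` text out of the sample response above:

```python
from html.parser import HTMLParser

class H1Extractor(HTMLParser):
    """Collect the text content of every <h1> element."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings.append(data.strip())

# In practice this would be response.text from the request above
html = "<html><body><div><h1>Example Domain</h1><p>…</p></div></body></html>"
parser = H1Extractor()
parser.feed(html)
print(parser.headings)
```

For more involved extraction, the `extract_rules` parameter can do this server-side instead.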
POST to website
This endpoint allows you to send a POST request to the desired website and retrieve the content of its response.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired content. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
Request
```python
import requests
import json

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    "headers": json.dumps({"x-custom-header": "value"}),
}
data = {"key1": "value1", "key2": "value2"}
response = requests.post("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.text)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c1f6c-4c8f5f7f312dd7d91700cd06",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/post"
}
```
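Note how the sample response echoes the request body verbatim in its `data` field, and how its `Content-Length` header reads 33. A quick sanity check reproducing that body with compact JSON separators (the serialization httpbin echoed above):

```python
import json

data = {"key1": "value1", "key2": "value2"}
# Compact separators yield the exact body shown in the "data" field above
body = json.dumps(data, separators=(",", ":"))
print(body, len(body))
```

This kind of round-trip check against httpbin.org is a handy way to confirm which headers and body bytes actually reach the target site through the proxy.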
PUT to website
This endpoint allows you to send a PUT request to the desired website and retrieve the content of its response.
Required parameters
- `api_key` (string): Your Scraping Fish API key. If you don't have it, you can get one by buying a Request Pack.
- `url` (string): URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL-encoded.
Optional parameters
- `render_js` (bool): Enable JavaScript rendering to wait for background network requests. See: JS Rendering.
- `js_scenario` (object): Actions to execute after the website is loaded. See: JS Scenario.
- `intercept_request` (string): Intercept a background XHR/Fetch request matching the specified pattern and obtain its response. See: Intercept XHR request.
- `headers` (object): Headers to be set or overridden. See: Forward HTTP headers.
- `forward_original_status` (bool): Whether to forward the original status code in the `Sf-Original-Status-Code` response header. See: Original status code.
- `extract_rules` (object): Rules applied to the resulting HTML to extract the desired content. See: Extraction rules.
- `total_timeout_ms` (integer): Maximum total number of milliseconds before timing out. Must be between `10000` (10 seconds) and `600000` (600 seconds), and greater than or equal to `trial_timeout_ms`. Defaults to `90000` (90 seconds). See: Timeouts.
- `trial_timeout_ms` (integer): Maximum number of milliseconds before timing out for a single request execution trial. Must be between `10000` (10 seconds) and `600000` (600 seconds), and lower than or equal to `total_timeout_ms`. Defaults to `30000` (30 seconds). See: Timeouts.
- `render_js_timeout_ms` (integer): Maximum number of milliseconds to wait for background network requests to finish when JS rendering is enabled. See: JS Rendering timeout.
- `screenshot` (bool): If set to true, a screenshot of the website will be taken and returned as bytes. See: Screenshot.
- `screenshot_base64` (bool): If set to true, a screenshot of the website will be taken and returned as base64-encoded image data together with the HTML content in a JSON response. See: Screenshot.
- `preload_local_storage` (object): Object of key/value pairs that will be preloaded into `localStorage`. See: Preload localStorage.
- `session` (string): A session identifier. For requests with the same value of this parameter, cookies and localStorage are preserved. Must be between 1 and 512 characters long. See: Sticky Session.
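Object parameters such as `preload_local_storage` are passed as JSON strings, following the same pattern the request below uses for `headers`. A minimal sketch (the key/value pairs here are purely illustrative):

```python
import json

# Illustrative localStorage entries; real keys depend on the target site
storage = {"theme": "dark", "cookie_consent": "accepted"}
preload_param = json.dumps(storage)
print(preload_param)
```

The resulting string goes into the `params` payload under the `preload_local_storage` key.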
Request
```python
import requests
import json

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    "headers": json.dumps({"x-custom-header": "value"}),
}
data = {"key1": "value1", "key2": "value2"}
response = requests.put("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.text)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c200f-53995abb051f677b10480cc3",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/put"
}
```
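The `session` parameter makes cookies and localStorage stick across any mix of the GET, POST, and PUT endpoints, as long as the identifier stays within the documented 1 to 512 character range. A tiny helper (a sketch, not part of any client library) to validate an identifier before reusing it:

```python
def is_valid_session(session_id):
    """Check the documented constraint: a string between 1 and 512 characters."""
    return isinstance(session_id, str) and 1 <= len(session_id) <= 512

print(is_valid_session("user-42-checkout"))
```

Reusing the same valid identifier across requests is what keeps the session sticky; any change in the value starts a fresh session.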