Scraping Fish API
Scraping Fish API is simple to use, yet it supports powerful features.
Below is the full API reference for the Scraping Fish API.
Scrape website
This endpoint allows you to retrieve the HTML content of the desired website.
Required parameters
- Name
api_key
- Type
- string
- Description
Your Scraping Fish API key. If you don't have one yet, you can get it by buying a Request Pack.
- Name
url
- Type
- string
- Description
URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.
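To illustrate the encoding requirement for url: Python's urllib.parse.urlencode percent-encodes every query value, so a target URL containing non-ASCII or reserved characters can be passed in unchanged. requests applies the same encoding when you pass a dict via params=, as the Request examples in this reference do. The helper name build_request_url is hypothetical; this is a minimal sketch, not part of any Scraping Fish client.

```python
from urllib.parse import urlencode

API_BASE = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the full API URL with api_key and url percent-encoded (hypothetical helper)."""
    # urlencode() quotes each value with quote_plus, so ':', '/', '?', '='
    # and non-ASCII characters (encoded as UTF-8) all become %XX escapes.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_BASE}?{query}"


print(build_request_url("[your API key]", "https://example.com/päth?q=1"))
```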
Optional parameters
- Name
render_js
- Type
- bool
- Description
Enable JavaScript rendering. See: JS Rendering.
- Name
js_scenario
- Type
- object
- Description
Actions to execute after the website is loaded. See: JS Scenario.
- Name
headers
- Type
- object
- Description
Headers to be set or overridden. See: Forward HTTP headers.
- Name
forward_original_status
- Type
- bool
- Description
Whether to forward the original status code of the target website in the Sf-Original-Status-Code header. See: Original status code.
- Name
extract_rules
- Type
- object
- Description
Rules to be applied to the resulting HTML to extract the desired content. See: Extraction rules.
- Name
cookies
- Type
- array[object]
- Description
Cookies to be sent with the request. See: Cookies
- Name
total_timeout_ms
- Type
- integer
- Description
Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is set to this value. See: Timeouts.
- Name
trial_timeout_ms
- Type
- integer
- Description
Maximum number of milliseconds before timing out for a single trial. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is set to the value of total_timeout_ms. See: Timeouts.
- Name
screenshot
- Type
- bool
- Description
If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot.
- Name
preload_local_storage
- Type
- object
- Description
Object of key/value pairs to preload into localStorage. See: Preload localStorage.
- Name
session
- Type
- string
- Description
Session identifier. Cookies and localStorage are preserved across requests that use the same value of this parameter. Must be between 1 and 512 characters long. See: Sticky Session.
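The POST and PUT examples in this reference pass the headers object as a JSON string in the query. The sketch below generalizes that to every object or array parameter (headers, js_scenario, extract_rules, cookies, preload_local_storage) and, as an assumption not stated in this reference, sends booleans such as render_js as the lowercase strings "true"/"false". The helper encode_options is hypothetical.

```python
import json
from typing import Any, Dict


def encode_options(options: Dict[str, Any]) -> Dict[str, str]:
    """Turn optional parameters into query-string values (hypothetical helper).

    Assumptions: object/array parameters are sent as JSON strings, like
    headers in the POST example, and booleans are sent as "true"/"false".
    """
    encoded: Dict[str, str] = {}
    for name, value in options.items():
        if isinstance(value, bool):
            # Handled explicitly because str(True) would give "True".
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            encoded[name] = json.dumps(value)
        else:
            encoded[name] = str(value)
    return encoded


payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
    **encode_options({
        "render_js": True,
        "headers": {"accept-language": "en-US"},
        "total_timeout_ms": 60000,
    }),
}
```

The resulting payload can then be passed to requests.get(..., params=payload) in the same way as in the Request example.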
Request
```python
import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}
# requests URL-encodes every value in params, including the target url.
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)
```
Response
```html
<!doctype html>
<html>
…
<body>
<div>
    <h1>Example Domain</h1>
    <p>
        This domain is for use in illustrative examples in documents. You may
        use this domain in literature without prior coordination or asking for
        permission.
    </p>
    <p>
        <a href="https://www.iana.org/domains/example">More information...</a>
    </p>
</div>
</body>
</html>
```
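The timeout rules described for total_timeout_ms and trial_timeout_ms can be mirrored client-side to catch invalid values before a request is sent. This is a sketch of the documented behaviour, not the service's own code: resolve_timeouts is a hypothetical helper that applies the defaults, rejects values outside 10000 to 600000 ms (the reference does not say how the API itself reacts to such values), and lowers the trial timeout to the total when it is larger.

```python
from typing import Optional, Tuple

MIN_TIMEOUT_MS = 10_000            # 10 seconds
MAX_TIMEOUT_MS = 600_000           # 600 seconds
DEFAULT_TOTAL_TIMEOUT_MS = 90_000  # 90 seconds
DEFAULT_TRIAL_TIMEOUT_MS = 30_000  # 30 seconds


def resolve_timeouts(total_timeout_ms: Optional[int] = None,
                     trial_timeout_ms: Optional[int] = None) -> Tuple[int, int]:
    """Return (total, trial) after applying the documented defaults and limits."""
    total = DEFAULT_TOTAL_TIMEOUT_MS if total_timeout_ms is None else total_timeout_ms
    trial = DEFAULT_TRIAL_TIMEOUT_MS if trial_timeout_ms is None else trial_timeout_ms
    for name, value in (("total_timeout_ms", total), ("trial_timeout_ms", trial)):
        if not MIN_TIMEOUT_MS <= value <= MAX_TIMEOUT_MS:
            raise ValueError(
                f"{name} must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS}, got {value}"
            )
    # A single trial can never exceed the total budget: clamp it down.
    return total, min(trial, total)


print(resolve_timeouts(total_timeout_ms=20_000))  # (20000, 20000)
```

If you also set a client-side timeout (for example the timeout= argument of requests, which is in seconds), it should be somewhat larger than total_timeout_ms so the client does not give up before the API does.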
POST to website
This endpoint allows you to retrieve the HTML content that the desired website returns in response to a POST request.
Required parameters
- Name
api_key
- Type
- string
- Description
Your Scraping Fish API key. If you don't have one yet, you can get it by buying a Request Pack.
- Name
url
- Type
- string
- Description
URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.
Optional parameters
- Name
render_js
- Type
- bool
- Description
Enable JavaScript rendering. See: JS Rendering.
- Name
js_scenario
- Type
- object
- Description
Actions to execute after the website is loaded. See: JS Scenario.
- Name
headers
- Type
- object
- Description
Headers to be set or overridden. See: Forward HTTP headers.
- Name
forward_original_status
- Type
- bool
- Description
Whether to forward the original status code of the target website in the Sf-Original-Status-Code header. See: Original status code.
- Name
extract_rules
- Type
- object
- Description
Rules to be applied to the resulting HTML to extract the desired content. See: Extraction rules.
- Name
cookies
- Type
- array[object]
- Description
Cookies to be sent with the request. See: Cookies
- Name
total_timeout_ms
- Type
- integer
- Description
Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is set to this value. See: Timeouts.
- Name
trial_timeout_ms
- Type
- integer
- Description
Maximum number of milliseconds before timing out for a single trial. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is set to the value of total_timeout_ms. See: Timeouts.
- Name
screenshot
- Type
- bool
- Description
If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot.
- Name
preload_local_storage
- Type
- object
- Description
Object of key/value pairs to preload into localStorage. See: Preload localStorage.
- Name
session
- Type
- string
- Description
Session identifier. Cookies and localStorage are preserved across requests that use the same value of this parameter. Must be between 1 and 512 characters long. See: Sticky Session.
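For Sticky Session, the same session value simply has to be repeated across requests. A minimal sketch, assuming a random UUID is an acceptable identifier (per the parameter description, any string of 1 to 512 characters should be); new_session_id and with_session are hypothetical helpers.

```python
import uuid


def new_session_id() -> str:
    """Generate a random session identifier (32 hex characters)."""
    return uuid.uuid4().hex


def with_session(payload: dict, session_id: str) -> dict:
    """Return a copy of payload that carries the session parameter."""
    if not 1 <= len(session_id) <= 512:
        raise ValueError("session must be between 1 and 512 characters long")
    return {**payload, "session": session_id}


session_id = new_session_id()
base = {"api_key": "[your API key]"}
# Both payloads share one session, so cookies and localStorage set during the
# first request should still be present for the second one.
first = with_session({**base, "url": "https://example.com/login"}, session_id)
second = with_session({**base, "url": "https://example.com/account"}, session_id)
```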
Request
```python
import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/post",
    # Headers for the target website are passed as a JSON string.
    "headers": json.dumps({"x-custom-header": "value"}),
}
# Request body forwarded to the target website.
data = {"key1": "value1", "key2": "value2"}
response = requests.post("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c1f6c-4c8f5f7f312dd7d91700cd06",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/post"
}
```
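When forward_original_status is enabled, the target website's status code is reported in the Sf-Original-Status-Code header. The sketch below reads it from any mapping of response headers (with requests, response.headers); original_status is a hypothetical helper, and returning None when the header is missing is a design choice of this sketch, not documented behaviour.

```python
from collections.abc import Mapping
from typing import Optional

ORIGINAL_STATUS_HEADER = "Sf-Original-Status-Code"


def original_status(headers: Mapping) -> Optional[int]:
    """Return the target site's status code, or None if the header is absent."""
    # Header names are case-insensitive, so compare them lowercased; this also
    # works for plain dicts, not only for requests' case-insensitive headers.
    wanted = ORIGINAL_STATUS_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return int(value)
    return None
```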
PUT to website
This endpoint allows you to retrieve the HTML content that the desired website returns in response to a PUT request.
Required parameters
- Name
api_key
- Type
- string
- Description
Your Scraping Fish API key. If you don't have one yet, you can get it by buying a Request Pack.
- Name
url
- Type
- string
- Description
URL to scrape. If it contains non-ASCII or reserved characters, it needs to be URL encoded.
Optional parameters
- Name
render_js
- Type
- bool
- Description
Enable JavaScript rendering. See: JS Rendering.
- Name
js_scenario
- Type
- object
- Description
Actions to execute after the website is loaded. See: JS Scenario.
- Name
headers
- Type
- object
- Description
Headers to be set or overridden. See: Forward HTTP headers.
- Name
forward_original_status
- Type
- bool
- Description
Whether to forward the original status code of the target website in the Sf-Original-Status-Code header. See: Original status code.
- Name
extract_rules
- Type
- object
- Description
Rules to be applied to the resulting HTML to extract the desired content. See: Extraction rules.
- Name
cookies
- Type
- array[object]
- Description
Cookies to be sent with the request. See: Cookies
- Name
total_timeout_ms
- Type
- integer
- Description
Maximum total number of milliseconds before timing out. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 90000 (90 seconds). If its value is smaller than trial_timeout_ms, trial_timeout_ms is set to this value. See: Timeouts.
- Name
trial_timeout_ms
- Type
- integer
- Description
Maximum number of milliseconds before timing out for a single trial. It must be between 10000 (10 seconds) and 600000 (600 seconds). Defaults to 30000 (30 seconds). If its value is higher than total_timeout_ms, it is set to the value of total_timeout_ms. See: Timeouts.
- Name
screenshot
- Type
- bool
- Description
If set to true, a screenshot of the website is taken and returned as bytes. See: Screenshot.
- Name
preload_local_storage
- Type
- object
- Description
Object of key/value pairs to preload into localStorage. See: Preload localStorage.
- Name
session
- Type
- string
- Description
Session identifier. Cookies and localStorage are preserved across requests that use the same value of this parameter. Must be between 1 and 512 characters long. See: Sticky Session.
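With screenshot enabled, the image is returned as bytes, but this reference does not state the image format. The sketch below therefore picks a file extension from the PNG or JPEG signature before saving and falls back to .bin otherwise; image_extension and save_screenshot are hypothetical helpers.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def image_extension(data: bytes) -> str:
    """Guess a file extension from the leading magic bytes of an image."""
    if data.startswith(PNG_SIGNATURE):
        return ".png"
    if data.startswith(JPEG_SIGNATURE):
        return ".jpg"
    return ".bin"  # unknown format: save the raw bytes without guessing


def save_screenshot(data: bytes, stem: str) -> Path:
    """Write screenshot bytes (for example response.content) to stem + extension."""
    path = Path(f"{stem}{image_extension(data)}")
    path.write_bytes(data)
    return path
```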
Request
```python
import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://httpbin.org/put",
    # Headers for the target website are passed as a JSON string.
    "headers": json.dumps({"x-custom-header": "value"}),
}
# Request body forwarded to the target website.
data = {"key1": "value1", "key2": "value2"}
response = requests.put("https://scraping.narf.ai/api/v1/", params=payload, json=data)
print(response.content)
```
Response
```json
{
  "args": {},
  "data": "{\"key1\":\"value1\",\"key2\":\"value2\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "33",
    "Content-Type": "application/octet-stream",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:124.0) Gecko/20100101 Firefox/124.0",
    "X-Amzn-Trace-Id": "Root=1-661c200f-53995abb051f677b10480cc3",
    "X-Custom-Header": "value"
  },
  "json": {
    "key1": "value1",
    "key2": "value2"
  },
  "origin": "1.2.3.4",
  "url": "https://httpbin.org/put"
}
```
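The GET, POST and PUT endpoints differ only in the HTTP method and the optional body, so one function can cover all three. This sketch uses the standard library's urllib.request instead of requests; scraping_fish_request is a hypothetical name, and it only builds the request object so that it can be inspected. Passing it to urllib.request.urlopen sends it.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://scraping.narf.ai/api/v1/"


def scraping_fish_request(method: str, api_key: str, url: str,
                          body: Optional[Any] = None,
                          headers: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a GET, POST or PUT request to the API."""
    params = {"api_key": api_key, "url": url}
    if headers:
        # Headers for the target website travel as a JSON string, as in the
        # POST and PUT examples.
        params["headers"] = json.dumps(headers)
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = Request(f"{API_BASE}?{urlencode(params)}", data=data, method=method)
    if data is not None:
        # Mirrors what requests' json= argument sets on the call to the API.
        request.add_header("Content-Type", "application/json")
    return request


req = scraping_fish_request("PUT", "[your API key]", "https://httpbin.org/put",
                            body={"key1": "value1", "key2": "value2"},
                            headers={"x-custom-header": "value"})
# To send it:
# from urllib.request import urlopen
# with urlopen(req, timeout=120) as resp:
#     print(resp.read())
```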