Timeouts
Scraping Fish API allows you to set three types of timeouts:
- For the entire request, including JS scenario execution time, with the total_timeout_ms parameter (defaults to 90 seconds).
- For one trial of request execution with the trial_timeout_ms parameter (defaults to 30 seconds).
- For JavaScript rendering to wait for background requests to finish with the render_js_timeout_ms parameter (defaults to 5 seconds).
Timeouts should be specified in milliseconds, and it is possible to set all of total_timeout_ms, trial_timeout_ms, and render_js_timeout_ms for the same request.
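As a minimal sketch of a request that sets all three timeouts at once, the snippet below builds the query string with the standard library and prints the resulting URL. The helper name build_request_url and the timeout values are illustrative, and the render_js parameter used to enable JS rendering is an assumption that this section does not confirm.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key, url, total_timeout_ms, trial_timeout_ms, render_js_timeout_ms):
    """Hypothetical helper: return a request URL carrying all three timeouts."""
    params = {
        "api_key": api_key,
        "url": url,
        # Assumption: JS rendering is switched on with a `render_js` parameter;
        # without JS rendering, render_js_timeout_ms is ignored.
        "render_js": "true",
        "total_timeout_ms": total_timeout_ms,
        "trial_timeout_ms": trial_timeout_ms,
        "render_js_timeout_ms": render_js_timeout_ms,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Illustrative values, all in milliseconds.
request_url = build_request_url(
    "[your API key]", "https://www.example.com", 120_000, 40_000, 8_000
)
print(request_url)
```

The printed URL can be passed to any HTTP client, for example requests.get(request_url).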
Total timeout
The total request timeout is set to 90,000 ms (90 seconds) by default and can be set to any value between 10 and 600 seconds, but not smaller than the trial timeout.
This is an approximate value, and the actual timeout can happen within a 1,000 ms margin.
If you have a complex JS scenario use case and need more time, you have to adjust the total_timeout_ms parameter as in the example below.
To simulate a long JS scenario, we go to example.com and simply wait for 100 seconds.
To make sure that we have enough time for the website to load and then to execute our dummy JS scenario, we have to set both trial_timeout_ms and total_timeout_ms to 110,000 ms.
import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://www.example.com",
    # JS scenario with a single step: wait for 100 seconds.
    "js_scenario": json.dumps(
        {"steps": [{"wait": 100_000}]}
    ),
    # Both timeouts must cover the page load plus the 100-second wait.
    "trial_timeout_ms": 110_000,
    "total_timeout_ms": 110_000,
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)
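For long requests like this one, it can also help to give the HTTP client its own timeout that is no shorter than the API's total timeout, so the client does not give up first. The sketch below derives such a value from total_timeout_ms plus the 1,000 ms margin mentioned above. The function name and the extra network_slack_ms allowance are assumptions for illustration, not part of the API.

```python
def client_timeout_s(total_timeout_ms, api_margin_ms=1_000, network_slack_ms=5_000):
    """Hypothetical helper: client-side timeout in seconds.

    api_margin_ms mirrors the approximate 1,000 ms margin of the total timeout;
    network_slack_ms is an arbitrary allowance for connection overhead.
    """
    return (total_timeout_ms + api_margin_ms + network_slack_ms) / 1000


print(client_timeout_s(110_000))  # 116.0
```

The result can be passed as the timeout argument of requests.get, which accepts a value in seconds.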
Single trial timeout
In addition to the total request timeout, it is possible to set a timeout for one trial of request execution using the trial_timeout_ms query parameter.
By default, it is set to 30,000 ms (30 seconds) and can be adjusted to any value between 10 and 600 seconds, but not larger than the total timeout.
This also includes time needed for the JS scenario execution, if provided.
In case loading the website fails for any reason within one trial timeout, Scraping Fish attempts to load it again until it succeeds (or until interrupted by the total request timeout).
In the example below, we expect example.com to load very quickly and want to force Scraping Fish API to retry the request if the page fails to load within 10 seconds.
import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://www.example.com",
    "trial_timeout_ms": 10_000,
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)
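To get a feel for how the two timeouts interact, the sketch below counts how many trials could run to their full timeout before the total timeout is exhausted. It assumes trials run back to back with no delay between them, which this section does not state. Trials that fail sooner leave room for more attempts. The function name is hypothetical.

```python
def max_full_trials(total_timeout_ms=90_000, trial_timeout_ms=30_000):
    """Number of trials that can each use their full timeout within the total budget."""
    return total_timeout_ms // trial_timeout_ms


print(max_full_trials())                         # 3 with both defaults
print(max_full_trials(trial_timeout_ms=10_000))  # 9 for the request above
```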
The value of total_timeout_ms must be greater than or equal to the value of trial_timeout_ms.
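The constraints on the two timeouts can be checked on the client before a request is sent. The sketch below is one way to do that, using the 10 to 600 second range and the ordering rule described in this section. The function and constant names are hypothetical, and how the API itself reports invalid values is not covered here.

```python
MIN_TIMEOUT_MS = 10_000   # 10 seconds
MAX_TIMEOUT_MS = 600_000  # 600 seconds


def check_timeouts(total_timeout_ms=90_000, trial_timeout_ms=30_000):
    """Return a list of problems found; an empty list means the values are consistent."""
    problems = []
    for name, value in (
        ("total_timeout_ms", total_timeout_ms),
        ("trial_timeout_ms", trial_timeout_ms),
    ):
        if not MIN_TIMEOUT_MS <= value <= MAX_TIMEOUT_MS:
            problems.append(
                f"{name} must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS} ms"
            )
    if total_timeout_ms < trial_timeout_ms:
        problems.append(
            "total_timeout_ms must be greater than or equal to trial_timeout_ms"
        )
    return problems


print(check_timeouts(110_000, 110_000))  # []
print(check_timeouts(20_000, 30_000))    # one problem: total is smaller than trial
```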
JS rendering timeout
Finally, the render_js_timeout_ms query parameter controls how long we wait for background requests to finish.
It is set to 5,000 ms (5 seconds) by default and is only applicable when JS rendering is enabled and ignored otherwise.
Adjusting the JS rendering timeout might be needed when the website executes a large number of long-running background requests and you want to wait for one of them that can take more than 5 seconds to complete. This is a rare situation, and you usually don't have to worry about it.