Intercept XHR request
The Scraping Fish API lets you intercept a background XHR/Fetch request and obtain its response. There are two ways to use request interception:
- Return the response of a specific XHR/Fetch request as the response from the Scraping Fish API.
- Wait for a specific XHR/Fetch request to complete, and then return the HTML content of the requested URL.
If the request you want to intercept is never made, your Scraping Fish request will keep waiting for it until it times out.
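Because a pattern that never matches keeps the call waiting, it can help to bound the wait on the client side as well. The following is a minimal sketch, not part of the Scraping Fish API: the helper names `build_intercept_url` and `fetch_intercepted` and the 90-second default are assumptions chosen for illustration, and the pattern is passed through the intercept_request query parameter described in the next section. It uses only the standard library; `requests.get` accepts an equivalent `timeout` argument.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_intercept_url(api_key: str, url: str, pattern: str) -> str:
    """Return the full API URL with the intercept_request parameter set."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "url": url, "intercept_request": pattern}
    )
    return f"{API_ENDPOINT}?{query}"


def fetch_intercepted(api_key: str, url: str, pattern: str, timeout_s: float = 90.0):
    """Return the response body, or None if the call timed out or could not connect.

    timeout_s is a hypothetical client-side limit, not a documented API value.
    """
    request_url = build_intercept_url(api_key, url, pattern)
    try:
        with urllib.request.urlopen(request_url, timeout=timeout_s) as resp:
            return resp.read()
    except urllib.error.HTTPError:
        # The API answered with an error status; let the caller inspect it.
        raise
    except (socket.timeout, urllib.error.URLError):
        # No answer within timeout_s, or the connection itself failed.
        return None
```

A call such as `fetch_intercepted("[your API key]", "http://httpbin.org", "**/spec**")` would then return `None` instead of blocking indefinitely on the client when no answer arrives in time.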
Response from intercepted request
To intercept an XHR/Fetch request and get its response, use the intercept_request query parameter and provide a pattern that matches the desired request.
Scraping Fish automatically waits for the request to be made and for its response to become available, so no additional waiting logic is needed on your side.
Example
For the http://httpbin.org URL, the API specification is loaded dynamically via a background request to http://httpbin.org/spec.json, which we will intercept to get its response in JSON format.
This can be achieved by specifying the intercept_request=**/spec** query parameter:
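import requests

payload = {
    "api_key": "[your API key]",
    "url": "http://httpbin.org",
    # Pattern matching the background request to http://httpbin.org/spec.json
    "intercept_request": "**/spec**",
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
# The body is forwarded from the intercepted request, which returns JSON here
print(response.json())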
The response to this Scraping Fish API request is forwarded from the intercepted background request to http://httpbin.org/spec.json, since it is the first (and only) completed request matching the **/spec** pattern:
Response
{
"basePath": "/",
"definitions": {},
"host": "httpbin.org",
"info": {
...
}
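Since the body is forwarded from whichever request matched the pattern, it is not guaranteed to be JSON, and calling `response.json()` on a non-JSON body raises an error. The sketch below shows one way to handle both cases. The helper name `parse_intercepted_body` is hypothetical, and the sample bytes are a hand-written, shortened stand-in for the response shown above rather than real API output.

```python
import json


def parse_intercepted_body(body: bytes):
    """Return parsed JSON if the body is valid JSON, otherwise the decoded text."""
    text = body.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The matched request returned something else (HTML, plain text, ...).
        return text


# Hand-written sample standing in for the intercepted spec.json body.
sample = b'{"basePath": "/", "definitions": {}, "host": "httpbin.org"}'
spec = parse_intercepted_body(sample)
print(spec["host"])  # httpbin.org
```

With the `requests` example above, the same helper could be applied to `response.content`.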
Wait for response
To wait until a specific XHR/Fetch request completes and then get the HTML content of the requested URL, use the wait_for_response step in a JS Scenario.
Example
The example below sends a request to the http://httpbin.org URL and waits for the background request matching the **/spec** pattern.
The response is the HTML content of the requested URL.
import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "http://httpbin.org",
    # The JS Scenario is sent as a JSON-encoded string
    "js_scenario": json.dumps(
        {"steps": [{"wait_for_response": "**/spec**"}]}
    ),
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
# The body is the page HTML, not the intercepted request's response
print(response.content)
The response to this request is the HTML content of the requested URL, captured after the first request matching the **/spec** pattern has completed.
Response
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>httpbin.org</title>
...
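The two modes described on this page differ only in which parameter carries the pattern. As a rough side-by-side sketch, the hypothetical helpers `intercept_payload` and `wait_payload` below (names chosen for illustration, not part of the API) build the query parameters for each mode from the same inputs.

```python
import json


def intercept_payload(api_key: str, url: str, pattern: str) -> dict:
    """Parameters that return the intercepted request's own response."""
    return {"api_key": api_key, "url": url, "intercept_request": pattern}


def wait_payload(api_key: str, url: str, pattern: str) -> dict:
    """Parameters that wait for the request, then return the page HTML."""
    scenario = {"steps": [{"wait_for_response": pattern}]}
    return {"api_key": api_key, "url": url, "js_scenario": json.dumps(scenario)}


a = intercept_payload("[your API key]", "http://httpbin.org", "**/spec**")
b = wait_payload("[your API key]", "http://httpbin.org", "**/spec**")

# Keys that appear in only one of the two payloads
print(sorted(set(a) ^ set(b)))  # ['intercept_request', 'js_scenario']
```

Either dictionary can be passed as `params` to `requests.get("https://scraping.narf.ai/api/v1/", ...)` as in the examples above.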