Intercept XHR request

The Scraping Fish API lets you intercept a background XHR/Fetch request and obtain its response. There are two ways to use request interception:

  1. Return the response of a specific XHR/Fetch request as the response from the Scraping Fish API.
  2. Wait for a specific XHR/Fetch request to complete, then return the HTML content of the requested URL.
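As a minimal sketch, the two options differ only in which query parameter carries the pattern. The helper names `intercept_params` and `wait_params` are hypothetical and not part of the Scraping Fish API; the parameter names `intercept_request` and `js_scenario` and the `wait_for_response` step are the ones used in the examples on this page.

```python
import json


def intercept_params(api_key: str, url: str, pattern: str) -> dict:
    """Option 1: the API response is the body of the matched background request."""
    return {"api_key": api_key, "url": url, "intercept_request": pattern}


def wait_params(api_key: str, url: str, pattern: str) -> dict:
    """Option 2: the API response is the page HTML, returned after the matched request completes."""
    # The JS Scenario is passed as a JSON-encoded string in a single query parameter.
    scenario = {"steps": [{"wait_for_response": pattern}]}
    return {"api_key": api_key, "url": url, "js_scenario": json.dumps(scenario)}
```

Either dictionary can be passed as `params` to `requests.get("https://scraping.narf.ai/api/v1/", params=...)`.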

Response from intercepted request

To intercept an XHR/Fetch request and get its response, use the intercept_request query parameter and provide a pattern that matches the desired request.

Example

For the URL http://httpbin.org, the API specification is loaded dynamically by a background request to http://httpbin.org/spec.json. We will intercept that request to get its JSON response by specifying the intercept_request=**/spec** query parameter:

GET /api/v1/
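import requests

payload = {
  "api_key": "[your API key]",
  "url": "http://httpbin.org",
  # Pattern matching the background request whose response should be returned
  "intercept_request": "**/spec**",
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
# The body is the intercepted spec.json response, so it can be parsed as JSON
print(response.json())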

The response content for this request to the Scraping Fish API is forwarded from the intercepted background request to http://httpbin.org/spec.json, since it is the first (and only) completed request matching the **/spec** pattern:

Response

{
    "basePath": "/",
    "definitions": {},
    "host": "httpbin.org",
    "info": {
    ...
}
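The intercepted request in this example happens to return JSON, but a matched request could return any content type, and `response.json()` raises an exception when the body is not valid JSON. A small defensive decoder is sketched below; `decode_intercepted` is a hypothetical helper, not something the Scraping Fish API provides.

```python
import json


def decode_intercepted(body: bytes):
    """Return parsed JSON when the body is valid JSON, otherwise the decoded text."""
    text = body.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON (for example HTML or plain text): hand back the raw text instead.
        return text
```

With the example above, this would be called as `decode_intercepted(response.content)`.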

Wait for response

To wait until a specific XHR/Fetch request completes and then get the HTML content of the requested URL, use the wait_for_response step in a JS Scenario.

Example

The example below sends a request to http://httpbin.org and waits for the request matching the **/spec** pattern. The response is the HTML content of the requested URL.

GET /api/v1/
import requests
import json

payload = {
  "api_key": "[your API key]",
  "url": "http://httpbin.org",
  # The JS Scenario is passed as a JSON-encoded string
  "js_scenario": json.dumps(
    {"steps": [{"wait_for_response": "**/spec**"}]}
  ),
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
# The body is the page HTML; .text decodes it to a string
print(response.text)

The response for this request is the HTML content of the requested URL, captured after the first request matching the **/spec** pattern has completed.

Response

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>httpbin.org</title>

...
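Because the response in this mode is ordinary HTML, it can be handed to any HTML parser. As a sketch using only the Python standard library, the snippet below pulls the page title out of the returned markup; `TitleParser` and `extract_title` are hypothetical names introduced here for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

For the example above, `extract_title(response.text)` would be applied to the returned page.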