Intercept XHR request

The Scraping Fish API lets you intercept a background XHR/Fetch request and obtain its response. There are two ways to use request interception:

Get the response from a specific XHR/Fetch request as the response from the Scraping Fish API.
Wait for a specific XHR/Fetch request to complete and then get the HTML content of the requested URL.
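
As a rough sketch of how the two options differ on the client side, the helper below builds the query parameters for either one. The function name `build_params` and its `mode` argument are hypothetical and exist only for this illustration; the `intercept_request` and `js_scenario` parameters and the `wait_for_response` step are the ones described in this document.

```python
import json


def build_params(api_key: str, url: str, pattern: str, mode: str) -> dict:
    """Build Scraping Fish query parameters for one of the two options.

    `mode` is a hypothetical switch used only in this sketch:
      - "response": return the intercepted request's response
      - "wait":     wait for the request, then return the page HTML
    """
    params = {"api_key": api_key, "url": url}
    if mode == "response":
        params["intercept_request"] = pattern
    elif mode == "wait":
        # js_scenario is passed as a JSON-encoded string.
        params["js_scenario"] = json.dumps(
            {"steps": [{"wait_for_response": pattern}]}
        )
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return params
```

The resulting dictionary can be passed as `params` to `requests.get`, in the same way as the payloads in the examples on this page.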

If the request you want to intercept is never executed, your Scraping Fish request will keep waiting for it until it times out.
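
Because a pattern that never matches keeps the request open until the timeout, it may be worth bounding the wait on the client side as well. The following is a minimal sketch under assumptions: the helper name `fetch_or_none`, the 90-second default, and the injectable `opener` argument are all hypothetical, and it uses only the standard library rather than `requests`. How the API itself reports a timeout is not covered by this sketch.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def fetch_or_none(params, timeout_s=90.0, opener=urllib.request.urlopen):
    """Return the response body, or None if the client-side timeout expires.

    `timeout_s` is an arbitrary value chosen for this sketch, not a documented limit.
    `opener` is injectable so the function can be exercised without network access.
    """
    url = API_ENDPOINT + "?" + urllib.parse.urlencode(params)
    try:
        with opener(url, timeout=timeout_s) as resp:
            return resp.read()
    except socket.timeout:
        # The timeout expired, typically while reading the response.
        return None
    except urllib.error.URLError as exc:
        # urllib may wrap a timeout in URLError; re-raise any other error.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise
```

The same idea applies when using `requests`, where the `timeout` argument of `requests.get` plays the role of `timeout_s`.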

Response from intercepted request

To intercept an XHR/Fetch request and get its response, use the intercept_request query parameter and provide a pattern that matches the desired request.

Scraping Fish automatically waits for the request to be sent and for its response to become available. You do not need to take any other measures to ensure this.

Example

For the http://httpbin.org URL, the API specification is loaded dynamically via a background request to http://httpbin.org/spec.json, which we will intercept to get its response in JSON format. This can be achieved by specifying the intercept_request=**/spec** query parameter:

GET /api/v1/

import requests

payload = {
  "api_key": "[your API key]",
  "url": "http://httpbin.org",
  # Pattern matching the background request to intercept
  "intercept_request": "**/spec**",
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.json())

The response content for this request to the Scraping Fish API is forwarded from the intercepted background request to http://httpbin.org/spec.json, since it is the first (and only) completed request matching the **/spec** pattern:

Response

{
    "basePath": "/",
    "definitions": {},
    "host": "httpbin.org",
    "info": {
    ...
}
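
To get a feel for which background requests a pattern such as **/spec** would select, the sketch below approximates the matching with Python's `fnmatch` module and returns the first matching URL from a list. This is an assumption made for illustration: the exact matching rules used by Scraping Fish are not specified in this document and may differ from `fnmatch`, and the helper name `first_match` and the list of URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def first_match(urls: Iterable[str], pattern: str) -> Optional[str]:
    """Return the first URL matching `pattern`, or None if nothing matches.

    fnmatchcase is only an approximation of the service's matching:
    here `*` matches any run of characters, including `/`.
    """
    for url in urls:
        if fnmatchcase(url, pattern):
            return url
    return None


# Hypothetical requests a page might make, in completion order.
completed = [
    "http://httpbin.org/static/favicon.ico",
    "http://httpbin.org/spec.json",
]
print(first_match(completed, "**/spec**"))  # http://httpbin.org/spec.json
```

Checking a pattern against known URLs in this way can help avoid a pattern that matches nothing, in which case the request would wait until it times out.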

Wait for response

To wait until a specific XHR/Fetch request has completed and then get the HTML content of the requested URL, use the wait_for_response step in a JS Scenario.

Example

The example below sends a request to the http://httpbin.org URL and waits for the background request matching the **/spec** pattern. The response is the HTML content of the requested URL.

GET /api/v1/

import requests
import json

payload = {
  "api_key": "[your API key]",
  "url": "http://httpbin.org",
  "js_scenario": json.dumps(
    {"steps": [{"wait_for_response": "**/spec**"}]}
  ),
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)

The response for this request is the HTML content of the requested URL, extracted after the first request matching the **/spec** pattern has completed.

Response

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>httpbin.org</title>

...
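
The returned HTML can then be processed like any other page source. As a small illustration, the sketch below pulls the title out of markup shaped like the truncated response above, using the standard library's `html.parser`. The class name `TitleParser` and the sample string are hypothetical; with a real request, the text would come from `response.text` instead.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Sample markup modeled on the start of the response shown above.
sample_html = """<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>httpbin.org</title>
</head>
</html>"""

parser = TitleParser()
parser.feed(sample_html)
print(parser.title)  # httpbin.org
```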