Scraping URLs containing query parameters
If the URL you want to scrape contains query parameters, you must URL-encode it first. This is important because otherwise our API would not be able to differentiate between the query params of the Scraping Fish API call itself and the query params you want to pass to the target web page.
import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
}
# requests URL-encodes the values passed via `params` automatically
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)
URL Encoding
When using the requests package in Python or axios in NodeJS, parameters are automatically URL-encoded, unless you construct the URL manually with template strings. In that case, you should not encode the parameters yourself, because that would result in double encoding and you would get an error response from the API stating that the URL you provided is malformed.
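To see why double encoding is a problem, here is a minimal sketch using Python's standard library (urllib.parse.quote, not part of the API) of what happens when a URL is encoded once versus twice:

```python
from urllib.parse import quote

# A target URL with its own query parameters.
target = "https://example.com?example=param&second=parameter"

# Encoding once with safe="" percent-encodes every reserved character,
# so the API can tell the target's params from its own.
encoded = quote(target, safe="")
print(encoded)
# https%3A%2F%2Fexample.com%3Fexample%3Dparam%26second%3Dparameter

# Encoding an already-encoded URL turns every "%" into "%25" (double
# encoding); the API would reject the resulting URL as malformed.
double_encoded = quote(encoded, safe="")
print(double_encoded)
```

Encode exactly once: either let your HTTP client do it, or do it yourself, never both.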
If you are using cURL, however, you need to encode URLs containing query parameters yourself. With cURL, it is best practice to always encode your URL, regardless of whether it contains query parameters or not.
cURL
curl -G --data-urlencode 'url=https://example.com?example=param&second=parameter' 'https://scraping.narf.ai/api/v1/?api_key=[your API key]'
While Python's requests and NodeJS' axios both automatically encode parameters, your language or environment is not guaranteed to do the same. In such a case, make sure that you correctly encode the URL.
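In that situation you can build the query string yourself. A sketch using Python's standard library, where urlencode percent-encodes each value exactly once:

```python
from urllib.parse import urlencode

# urlencode percent-encodes each value exactly once, so the target
# URL's own query parameters survive intact.
params = {
    "api_key": "[your API key]",
    "url": "https://example.com?example=param&second=parameter",
}
request_url = "https://scraping.narf.ai/api/v1/?" + urlencode(params)
print(request_url)
```

The resulting request_url can then be passed to whatever HTTP client you use.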
Encoding of other parameters
When using cURL (or any HTTP client that does not automatically URL-encode parameters), you should also encode other API parameters that contain non-ASCII or reserved characters. For example, when using a JS Scenario, the js_scenario parameter is a JSON string full of reserved characters, so it must be encoded too.
cURL
curl -G --data-urlencode 'url=https://example.com' \
--data-urlencode 'js_scenario={"steps": [{"wait": 1000}, {"click_and_wait_for_navigation": "p > a"}]}' \
'https://scraping.narf.ai/api/v1/?api_key=[your API key]'
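For reference, this is roughly what cURL's --data-urlencode does to the js_scenario value, sketched with Python's standard library: quote with safe="" percent-encodes every reserved character, including the JSON braces and quotes.

```python
import json
from urllib.parse import quote

# Serialize the JS scenario to JSON, then percent-encode it once so the
# braces, quotes, and spaces survive inside the query string.
scenario = {
    "steps": [
        {"wait": 1000},
        {"click_and_wait_for_navigation": "p > a"},
    ]
}
encoded = quote(json.dumps(scenario), safe="")
print(encoded)
```

If you send the request with Python's requests and pass js_scenario via the params dict instead, skip this manual step: requests performs the encoding for you.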