Scraping URLs containing query parameters
If the URL you want to scrape contains query parameters, you must URL-encode it first. Otherwise, our API cannot tell which query parameters belong to the Scraping Fish API call and which ones you want to pass to the web page.
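import requests

# requests URL-encodes every value in params, so pass the target URL unencoded.
payload = {
    "api_key": "[your API key]",
    "url": "https://example.com?example=param&second=parameter",
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)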
URL Encoding
When using the requests Python package or axios in Node.js, parameters provided in the params argument are automatically URL-encoded, as in the example above.
In this case, do not encode the parameters yourself: that would result in double encoding, and the API would respond with an error stating that the URL you provided is malformed.
You only have to apply URL-encoding when you construct the URL manually with template strings.
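As a minimal sketch of the manual case, the snippet below builds the API call URL with an f-string and encodes each value exactly once using urllib.parse.quote_plus from the Python standard library. The helper name build_request_url is hypothetical. The final lines only illustrate what double encoding looks like, and no request is sent.

```python
from urllib.parse import quote_plus

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Hypothetical helper: build the API call URL by hand.

    Each value is encoded exactly once, because an f-string does no
    encoding on its own.
    """
    return f"{API_ENDPOINT}?api_key={quote_plus(api_key)}&url={quote_plus(target_url)}"


target = "https://example.com?example=param&second=parameter"
request_url = build_request_url("[your API key]", target)
print(request_url)

# Pitfall: encoding an already encoded value escapes the "%" signs again
# ("%3A" becomes "%253A"), which is what double encoding looks like.
double_encoded = quote_plus(quote_plus(target))
print(double_encoded)
```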
If using cURL, however, you need to encode URLs containing query parameters.
With cURL, it is recommended to always encode your URL, regardless of whether it contains query parameters or not.
cURL
curl -G --data-urlencode 'url=https://example.com?example=param&second=parameter' \
'https://scraping.narf.ai/api/v1/?api_key=[your API key]'
While Python's requests and Node.js's axios both encode parameters automatically, it's not guaranteed that your language or environment does so too. In such a case, make sure that you correctly encode the URL yourself.
Below is a list of links to documentation for URL-encoding methods in selected popular programming languages:
- Python: urllib.parse.quote_plus
- JavaScript: encodeURIComponent
- Java: java.net.URLEncoder.encode
- Ruby: URI.encode_www_form_component (URI.escape was removed in Ruby 3.0)
- Rust: urlencoding::encode
- Go: url.QueryEscape
- PHP: urlencode
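To illustrate what these functions do, here is a short sketch using the first entry in the list, Python's urllib.parse.quote_plus, together with its inverse unquote_plus. The sample URL is made up for illustration. The point is that reserved and non-ASCII characters are replaced with percent escapes, and that decoding restores the original value.

```python
from urllib.parse import quote_plus, unquote_plus

# Made-up target URL with reserved characters (?, &, =), spaces, and non-ASCII text.
target = "https://example.com/search?q=café au lait&page=2"

encoded = quote_plus(target)
print(encoded)

# The encoded value is plain ASCII and no longer contains characters that
# could be mistaken for separators in the API call's own query string.
assert encoded.isascii()
assert not any(ch in encoded for ch in "?&= ")

# Decoding restores the original URL unchanged.
assert unquote_plus(encoded) == target
```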
Encoding of other parameters
When using cURL (or an HTTP client that does not automatically URL-encode parameters), you should also encode other API parameters that contain non-ASCII or reserved characters.
For example, a JS Scenario is passed as JSON, which contains reserved characters, so it must be encoded as well.
cURL
curl -G --data-urlencode 'url=https://example.com' \
--data-urlencode 'js_scenario={"steps": [{"wait": 1000}, {"click_and_wait_for_navigation": "p > a"}]}' \
'https://scraping.narf.ai/api/v1/?api_key=[your API key]'
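For comparison, here is a rough Python counterpart of the cURL call above. The scenario is serialized with json.dumps and the whole payload is encoded with urllib.parse.urlencode, which stands in for the encoding an HTTP client applies to its params argument. Only the query string is built and no request is sent; the variable names are illustrative.

```python
import json
from urllib.parse import parse_qs, urlencode

# Same steps as in the cURL example above.
scenario = {
    "steps": [
        {"wait": 1000},
        {"click_and_wait_for_navigation": "p > a"},
    ]
}

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
    # Serialize to a JSON string first; the braces, quotes, and spaces it
    # contains are then percent-encoded like any other parameter value.
    "js_scenario": json.dumps(scenario),
}

query_string = urlencode(payload)
request_url = f"https://scraping.narf.ai/api/v1/?{query_string}"
print(request_url)

# Decoding the query string recovers the original JSON document.
decoded = parse_qs(query_string)
assert json.loads(decoded["js_scenario"][0]) == scenario
```

With requests, the same payload dictionary can be passed directly as params, since the library encodes it for you.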