JS Scenario
In this guide, we will look at how to use JS Scenario to perform activities on the scraped website.
Scraping Fish API allows you to specify a series of steps to execute once the page is loaded.
You can use it, for example, to click a button or fill in a form.
The steps to perform are passed as JSON in the js_scenario query parameter.
Remember to encode this parameter as shown in the examples below.
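As a minimal sketch of what encoding the parameter involves, the snippet below serializes a scenario to JSON and percent-encodes it with Python's standard library. The scenario contents are placeholders. When you pass a dictionary through the params argument of requests, this encoding is done for you.

```python
import json
from urllib.parse import urlencode, parse_qs

# Placeholder scenario used only to illustrate the encoding
scenario = {"steps": [{"wait": 1000}, {"click": "#a-button"}]}

# Serialize the scenario to a compact JSON string
scenario_json = json.dumps(scenario, separators=(",", ":"))

# urlencode percent-encodes characters such as {, }, ", [ and #
query = urlencode({"js_scenario": scenario_json})
print(query)

# Decoding the query string gives back the original scenario
decoded = json.loads(parse_qs(query)["js_scenario"][0])
```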
Example
To give you an idea of how you can use this feature, let's look at an example scenario which, once the page is loaded, waits for 1 second (1000 ms), clicks on the element selected by the p > a CSS selector, and waits for navigation to complete.
Execute example JS Scenario
import requests
import json

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
    # The scenario is serialized to a JSON string; requests URL-encodes it.
    "js_scenario": json.dumps({
        "steps": [
            {"wait": 1000},
            {"click_and_wait_for_navigation": "p > a"}
        ]
    })
}

response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
print(response.content)
Response
<!doctype html>
<html>
<head>
<title>Example Domains</title>
…
</head>
<body>
…
</body>
</html>
Steps
The top-level key of the JS Scenario JSON must be steps.
It is an array of objects which define the action steps to be executed in sequence.
Each object has a single key, which is the name of the action to perform, and the value is the action's argument.
For example:
{
  "steps": [
    {
      "wait_for": "#button-id"
    },
    {
      "select": {
        "selector": "#select-id",
        "options": "value1"
      }
    },
    {
      "click": "#button-id"
    }
  ]
}
Execution of this scenario starts by waiting until the #button-id element is available, then selects the value1 option from the select element (drop-down list) with the #select-id id, and finally clicks the button.
In the following section, we provide all available predefined actions which you can use as steps in a JS scenario.
If you need to execute custom JavaScript code, use the evaluate action.
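The structure described above can be sketched as a small Python helper. The function name build_params and its check are hypothetical and not part of the Scraping Fish API; the helper only assembles query parameters in the shape used in the example request and verifies that each step has exactly one key.

```python
import json

def build_params(api_key, url, steps):
    """Return query parameters with the scenario serialized as JSON."""
    for step in steps:
        # Each step object carries exactly one key: the action name.
        if not isinstance(step, dict) or len(step) != 1:
            raise ValueError(f"each step must be an object with one key: {step!r}")
    return {
        "api_key": api_key,
        "url": url,
        "js_scenario": json.dumps({"steps": steps}),
    }

params = build_params(
    "[your API key]",
    "https://example.com",
    [
        {"wait_for": "#button-id"},
        {"select": {"selector": "#select-id", "options": "value1"}},
        {"click": "#button-id"},
    ],
)
```

The resulting dictionary can be passed as the params argument of requests.get, as in the example request.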
Available actions
- Name
click
- Type
- string
- Description
Clicks an element specified by a selector.
Click
{ "steps": [ {"click": "#a-button"} ] }
- Name
click_if_exists
- Type
- string
- Description
Clicks an element specified by a selector, but only if the element exists; otherwise this step is skipped. It can be useful if you want to close a cookie banner or another modal window which does not appear every time.
Click if exists
{ "steps": [ {"click_if_exists": "#a-button"} ] }
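A sketch of the cookie banner use case, built in Python. The selectors #cookie-accept and #content are hypothetical placeholders for whatever the target page uses.

```python
import json

steps = [
    # Dismiss the banner when it is shown; skipped when it is absent
    {"click_if_exists": "#cookie-accept"},
    # Continue with the rest of the scenario either way
    {"wait_for": "#content"},
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```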
- Name
input
- Type
- object
- Description
Fills the given values into the input elements specified by selectors. The argument is an object mapping selectors to the desired input values. If the order of filling in the inputs matters in your use case, you should specify each input field as a separate input action. You can optionally specify an option to "humanize" an input action. If set, actual key press events are sent. This may only be necessary if the page handles keyboard events differently from regular input.
Input
{ "steps": [ { "input": { "#input1": "value1", "#input2": "value2" } } ] }
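When the fill order matters, the advice above can be sketched by generating one input action per field. The selectors and values are the placeholders from the example.

```python
import json

# Fields in the order they should be filled in
fields = [("#input1", "value1"), ("#input2", "value2")]

# One input action per field, so steps run in list order
steps = [{"input": {selector: value}} for selector, value in fields]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```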
- Name
select
- Type
- object
- Description
Selects option(s) from a given <select> element (drop-down list). The argument for this action must be an object with "selector" specifying the selector to find a desired <select> element and "options" (string or array) specifying the options. Selecting multiple options is supported by using an array instead of a string.
Select
{ "steps": [ { "select": { "selector": "#select1", "options": "1" } } ] }
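A sketch of the multiple-selection form, where "options" is an array instead of a string. The option values "1" and "2" are placeholders.

```python
import json

steps = [
    {
        "select": {
            "selector": "#select1",
            # An array selects several options at once
            "options": ["1", "2"],
        }
    }
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```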
- Name
set_local_storage
- Type
- object
- Description
Sets key/value pairs in localStorage. Each key of the provided object is set in localStorage with its corresponding value.
Set localStorage values
{ "steps": [ { "set_local_storage": { "key1": "value1", "key2": "value2" } } ] }
- Name
scroll
- Type
- integer
- Description
Scrolls the web page vertically by a given number of pixels.
Scroll
{ "steps": [ {"scroll": 1000} ] }
- Name
wait
- Type
- integer | object
- Description
Waits for a fixed amount of time, specified in milliseconds. The argument for this action must be either a number or an object for random wait configuration. You may specify a range to randomize the time of wait. To do so, specify a config object with min_ms and max_ms values.
Wait for timeout
{ "steps": [ {"wait": 1000} ] }
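A sketch of the randomized form, using a config object with min_ms and max_ms. The bounds of 500 and 1500 ms are arbitrary example values.

```python
import json

steps = [
    # Wait for a random duration between 500 ms and 1500 ms
    {"wait": {"min_ms": 500, "max_ms": 1500}},
    {"click": "#a-button"},
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```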
- Name
wait_for
- Type
- string | object
- Description
Waits for an element specified by a selector to become visible (default) or attached. The argument for this action must be either a string containing a valid selector or an object with "selector" and "state" keys, where "state" is one of "visible" or "attached". If "state" is set to "visible" (default), the element you want to wait for must have a non-empty bounding box (i.e. no "display: none") and no "visibility: hidden". If you want to wait for an element to be present in the DOM (but not necessarily visible), use "state": "attached".
Wait for selector
{ "steps": [ {"wait_for": "#some-button"} ] }
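A sketch of the object form, waiting for an element to be attached to the DOM rather than visible. The selector is the placeholder from the example.

```python
import json

steps = [
    {
        "wait_for": {
            "selector": "#some-button",
            # Present in the DOM is enough; it does not have to be visible
            "state": "attached",
        }
    }
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```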
- Name
wait_for_any
- Type
- array[string | object]
- Description
Waits for any of the specified elements to become visible (default) or attached. If you need to wait for any of the specified elements to be visible, you can use a simpler form and only provide selectors.
Wait for any
{ "steps": [ { "wait_for_any": ["#some-button", "#some-other-button"] } ] }
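A sketch mixing the two item forms. It assumes that an object item takes the same "selector" and "state" keys as the wait_for action; the text here does not spell out the object shape, so treat that as an assumption.

```python
import json

steps = [
    {
        "wait_for_any": [
            # Plain selector: wait until visible (default)
            "#some-button",
            # Assumed object form, mirroring the wait_for action
            {"selector": "#some-other-button", "state": "attached"},
        ]
    }
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```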
- Name
evaluate
- Type
- string
- Description
If the predefined actions we provide don't fit your needs and you want to evaluate custom JavaScript, this is a special action which you can use to execute arbitrary JavaScript code.
Custom JavaScript evaluation
{ "steps": [ { "evaluate": "console.log('Hello from Scraping Fish!')" } ] }
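For longer scripts, a sketch like the following lets json.dumps handle the escaping of quotes and newlines. The script itself is a hypothetical example that scrolls a button into view before a click step, and the #a-button selector is a placeholder.

```python
import json

# Hypothetical script; json.dumps escapes its quotes and newlines
script = """
const button = document.querySelector('#a-button');
if (button) { button.scrollIntoView(); }
""".strip()

steps = [
    {"evaluate": script},
    {"click": "#a-button"},
]
js_scenario = json.dumps({"steps": steps})
print(js_scenario)
```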
Timeout
All the steps of your JS Scenario must complete within a single trial timeout; otherwise, the request will time out.