JS Scenario
In this guide, we will look at how to use JS Scenario to perform activities on the scraped website.
Scraping Fish API allows you to specify a series of steps to execute once the page is loaded. You can use it, for example, to click a button or fill in a form. The steps to perform are passed as JSON in the js_scenario query parameter.
Remember to URL-encode this parameter, as in the examples below.
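As a minimal sketch of what the encoding involves, the scenario can be serialized with json.dumps and then percent-encoded using only the Python standard library. The API key and URL values are placeholders. When you pass a dictionary to requests via params, the same encoding is applied for you.

```python
import json
from urllib.parse import urlencode, parse_qs

scenario = {"steps": [{"wait": 1000}, {"click": "#a-button"}]}

# Serialize the scenario to a JSON string, then percent-encode all parameters.
query = urlencode({
    "api_key": "[your API key]",   # placeholder
    "url": "https://example.com",  # placeholder
    "js_scenario": json.dumps(scenario),
})

# Decoding the query string gives back the original scenario.
decoded = json.loads(parse_qs(query)["js_scenario"][0])
```

The resulting query string contains no raw braces or quotes, so it can be appended safely to the API endpoint after a question mark.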
Example
To give you an idea of how you can use this feature, let's look at an example scenario which, once the page is loaded, waits for 1 second (1000 ms), clicks on the item selected by the p > a CSS selector, and waits for navigation to complete.
- Name
steps
- Type
- array
- Description
An array of objects that define the actions to execute. Steps are executed in sequence. In this example we use wait, which waits for a given number of milliseconds, and click_and_wait_for_navigation, which clicks an element specified by the given selector and waits for navigation to complete.
- Name
url
- Type
- string
- Description
URL to navigate to. This is a standard parameter required even if you don't perform any actions.
- Name
api_key
- Type
- string
- Description
Your Scraping Fish API key. Required to authenticate your requests.
Execute JS Scenario
import json

import requests

payload = {
    "api_key": "[your API key]",
    "url": "https://example.com",
    # The scenario is serialized to a JSON string; requests URL-encodes it.
    "js_scenario": json.dumps({
        "steps": [
            {"wait": 1000},  # wait for 1 second
            {"click_and_wait_for_navigation": "p > a"}
        ]
    })
}
response = requests.get("https://scraping.narf.ai/api/v1/", params=payload)
Response
<!doctype html>
<html>
<head>
<title>Example Domains</title>
…
</head>
<body>
…
</body>
</html>
Steps
steps is a list of objects, each of which defines an action to execute in a JS scenario. Each object's only key is the name of the action to perform, and the value is its argument. For example:
{
  "steps": [
    {
      "wait_for": "#button-id"
    },
    {
      "select": {
        "selector": "#select-id",
        "options": "value1"
      }
    },
    {
      "click": "#button-id"
    }
  ]
}
Execution of this scenario starts by waiting until the #button-id element is available, then selects the value1 option from the select element (drop-down list) matched by the #select-id selector, and finally clicks the button.
In the following section, we provide all available predefined actions which you can use as steps in a JS scenario.
If you need to execute custom JavaScript code, use the evaluate action.
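Because each step must be an object with exactly one key, a malformed scenario is easy to catch before sending a request. The following is a minimal client-side sketch, not part of the Scraping Fish API: check_steps is a hypothetical helper, and KNOWN_ACTIONS simply mirrors the action names listed in this guide.

```python
KNOWN_ACTIONS = {
    "click", "click_if_exists", "click_and_wait_for_navigation", "input",
    "select", "set_local_storage", "scroll", "wait", "wait_for",
    "wait_for_any", "evaluate",
}

def check_steps(scenario):
    """Return a list of problems found in a scenario dict (empty if none)."""
    problems = []
    for index, step in enumerate(scenario.get("steps", [])):
        if not isinstance(step, dict) or len(step) != 1:
            problems.append(f"step {index}: expected an object with exactly one key")
            continue
        (action,) = step  # the single key is the action name
        if action not in KNOWN_ACTIONS:
            problems.append(f"step {index}: unknown action {action!r}")
    return problems

good = {"steps": [{"wait_for": "#button-id"}, {"click": "#button-id"}]}
bad = {"steps": [{"click": "#button-id", "wait": 100}, {"hover": "#x"}]}
```

Here check_steps(good) returns an empty list, while check_steps(bad) reports one problem for the step with two keys and one for the action name that this guide does not list.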
Available actions
- Name
click
- Type
- string
- Description
- Clicks an element specified by a selector.
- Name
click_if_exists
- Type
- string
- Description
- Clicks an element specified by a selector, but only if the element exists; otherwise the step is skipped. This is useful if you want to close a cookie banner or another modal window which does not appear every time.
- Name
click_and_wait_for_navigation
- Type
- string
- Description
- Clicks an element specified by a selector and waits for the navigation to complete.
- Name
input
- Type
- object
- Description
- Fills in the given values in the input elements specified by selectors. The argument is an object mapping selectors to the desired input values. If the order in which the inputs are filled matters in your use case, specify each input field as a separate input action. You can optionally specify an option to "humanize" an input action; if set, actual key press events are sent. This should only be necessary if the page handles keyboard events differently from regular input.
- Name
select
- Type
- object
- Description
- Selects option(s) from a given <select> element (drop-down list). The argument for this action must be an object with "selector", specifying the selector used to find the desired <select> element, and "options" (string or array), specifying the options to select. To select multiple options, pass an array instead of a string.
- Name
set_local_storage
- Type
- object
- Description
- Sets key/value pairs in localStorage. Each key of the provided object is set in localStorage with its corresponding value.
- Name
scroll
- Type
- integer
- Description
- Scrolls the web page vertically by a given number of pixels.
- Name
wait
- Type
- integer | object
- Description
- Waits for a fixed amount of time, specified in milliseconds. The argument for this action must be either a number or a config object for a random wait. To randomize the wait time, specify a config object with min_ms and max_ms values that define the range.
- Name
wait_for
- Type
- string | object
- Description
- Waits for an element specified by a selector to become visible (default) or attached. The argument for this action must be either a string containing a valid selector or an object with "selector" and "state" keys, where "state" is one of "visible" or "attached". If "state" is set to "visible" (default), the element you want to wait for must have a non-empty bounding box (i.e. no "display: none") and no "visibility: hidden". If you want to wait for an element to be present in the DOM (but not necessarily visible), use "state": "attached".
- Name
wait_for_any
- Type
- array[string | object]
- Description
- Waits for any of the specified elements to become visible (default) or attached. If you only need to wait for any of the elements to become visible, you can use the simpler form and provide only selectors.
- Name
evaluate
- Type
- string
- Description
- Executes arbitrary JavaScript code. Use this special action when the predefined actions don't fit your needs and you want to evaluate custom JavaScript.
Click
{
  "steps": [
    {"click": "#a-button"}
  ]
}
Click if exists
{
  "steps": [
    {"click_if_exists": "#a-button"}
  ]
}
Click and wait for navigation
{
  "steps": [
    {"click_and_wait_for_navigation": "#a-button"}
  ]
}
Input
{
  "steps": [
    {
      "input": {
        "#input1": "value1",
        "#input2": "value2"
      }
    }
  ]
}
Select
{
  "steps": [
    {
      "select": {
        "selector": "#select1",
        "options": "1"
      }
    }
  ]
}
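Since "options" also accepts an array, several options can be selected in one step. The sketch below builds such a scenario in Python; the #select1 selector and the option values are placeholders, and the element is assumed to allow multiple selections.

```python
import json

scenario = {
    "steps": [
        {
            "select": {
                "selector": "#select1",  # placeholder selector
                "options": ["1", "2"],   # an array selects several options
            }
        }
    ]
}

# JSON string to pass as the js_scenario query parameter.
js_scenario = json.dumps(scenario)
```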
Set localStorage values
{
  "steps": [
    {
      "set_local_storage": {
        "key1": "value1",
        "key2": "value2"
      }
    }
  ]
}
Scroll
{
  "steps": [
    {"scroll": 1000}
  ]
}
Wait for timeout
{
  "steps": [
    {"wait": 1000}
  ]
}
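The wait action also accepts a config object with min_ms and max_ms for a randomized wait. A minimal sketch of such a step, built in Python with arbitrary example values for the range:

```python
import json

scenario = {
    "steps": [
        # Wait for a random duration between 500 ms and 1500 ms.
        {"wait": {"min_ms": 500, "max_ms": 1500}}
    ]
}

# JSON string to pass as the js_scenario query parameter.
js_scenario = json.dumps(scenario)
```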
Wait for selector
{
  "steps": [
    {"wait_for": "#some-button"}
  ]
}
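To wait for an element that only needs to be present in the DOM, the object form with "state" set to "attached" can be used instead of a plain selector string. A short sketch, with a placeholder selector:

```python
import json

scenario = {
    "steps": [
        # Object form: wait until the element is attached, visible or not.
        {"wait_for": {"selector": "#some-button", "state": "attached"}}
    ]
}

# JSON string to pass as the js_scenario query parameter.
js_scenario = json.dumps(scenario)
```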
Wait for any
{
  "steps": [
    {
      "wait_for_any": ["#some-button", "#some-other-button"]
    }
  ]
}
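The argument type array[string | object] suggests that plain selectors and objects can be mixed in one list. The sketch below assumes that the object form takes the same "selector" and "state" keys as wait_for; this guide does not spell that out, so treat it as an assumption.

```python
import json

scenario = {
    "steps": [
        {
            "wait_for_any": [
                "#some-button",  # plain selector: waits for visibility
                # Assumed object form, mirroring wait_for.
                {"selector": "#some-other-button", "state": "attached"},
            ]
        }
    ]
}

# JSON string to pass as the js_scenario query parameter.
js_scenario = json.dumps(scenario)
```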
Custom JavaScript evaluation
{
  "steps": [
    {
      "evaluate": "console.log('Hello from Scraping Fish!')"
    }
  ]
}
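Longer snippets often contain double quotes and line breaks, which must be escaped inside a JSON string. One way to avoid escaping by hand, sketched here in Python, is to keep the JavaScript in a regular string and let json.dumps do the escaping. The JavaScript itself is only an illustration.

```python
import json

# JavaScript source containing double quotes and a line break.
js_code = """const label = "Hello";
console.log(label + " from Scraping Fish!");"""

scenario = {"steps": [{"evaluate": js_code}]}

# json.dumps escapes the quotes and the newline inside the string value.
js_scenario = json.dumps(scenario)
```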
Timeout
All the steps in your JavaScript scenario must complete within a single trial timeout; otherwise, the request will time out.
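Explicit wait steps count toward this limit, so it can help to add them up before sending a request. The helper below is a hypothetical client-side sketch: it sums only the wait steps (using min_ms for random waits), which gives a lower bound on the scenario's duration. The actual timeout value is not stated in this guide, so compare the result against the limit that applies to your requests.

```python
def min_wait_ms(scenario):
    """Lower bound on scenario duration: the sum of its explicit wait steps."""
    total = 0
    for step in scenario["steps"]:
        wait = step.get("wait")
        if isinstance(wait, dict):
            total += wait["min_ms"]  # random wait: count the shortest case
        elif wait is not None:
            total += wait
    return total

scenario = {
    "steps": [
        {"wait": 1000},
        {"click": "#a-button"},
        {"wait": {"min_ms": 500, "max_ms": 1500}},
    ]
}
lower_bound = min_wait_ms(scenario)  # 1000 + 500 = 1500
```

Clicks, navigation, and element waits take additional time that this estimate does not capture.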