We developed a benchmark to test selected Web Scraping APIs. It involves scraping various web pages that are commonly targeted in web scraping workflows. The results let us evaluate Web Scraping APIs in terms of reliability, proxy quality, speed and cost.
The Python script we used to run the benchmark is publicly available in a GitHub repository. It can also run a scraping job with the Scraping Fish API when given an input file with a list of URLs to scrape.
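As a rough illustration of that workflow (not the script from the repository), the sketch below reads URLs from an input file and builds one API request URL per target. The function names and the `api_key` and `url` query parameter names are assumptions made for this example; check the documentation of the API you call.

```python
from pathlib import Path
from urllib.parse import urlencode


def load_urls(path):
    """Return the non-empty, stripped lines of a text file, one URL per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_request_url(api_base, api_key, target_url):
    """Append the API key and the percent-encoded target URL to the API base URL.

    The parameter names `api_key` and `url` are assumptions for illustration.
    """
    # Pick the right separator depending on how the base URL ends.
    if api_base.endswith(("?", "&")):
        separator = ""
    elif "?" in api_base:
        separator = "&"
    else:
        separator = "?"
    query = urlencode({"api_key": api_key, "url": target_url})
    return api_base + separator + query
```

Encoding the target URL matters because it usually contains its own `?` and `&` characters, which would otherwise be read as parameters of the API call.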
The benchmark includes URLs from 5 categories: Alexa, Amazon, Google, Instagram, and Similarweb.
For each category, we made 1,000 requests and recorded the share of successful, failed, and blocked requests, the processing time, and the cost.
More details on the methodology and instructions for reproducing the results are provided in the GitHub repository.
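The per-category figures in the result tables (successful, failed, blocked, processing time) could be derived from raw request records along the following lines. This is a minimal sketch, not the repository's implementation: the `RequestResult` type, the outcome labels, and the choice to average time over successful requests only are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RequestResult:
    outcome: str    # one of "success", "failed", "blocked" (hypothetical labels)
    seconds: float  # wall-clock time spent on the request


def summarize(results):
    """Return the percentage of each outcome and the mean time of successful requests."""
    if not results:
        raise ValueError("no results to summarize")
    counts = Counter(r.outcome for r in results)
    total = len(results)
    # Assumption: only successful requests contribute to the processing time.
    ok_times = [r.seconds for r in results if r.outcome == "success"]
    return {
        "successful_pct": 100 * counts["success"] / total,
        "failed_pct": 100 * counts["failed"] / total,
        "blocked_pct": 100 * counts["blocked"] / total,
        "avg_seconds": sum(ok_times) / len(ok_times) if ok_times else 0.0,
    }
```

With 1,000 requests per category, each single request moves a percentage by 0.1 points, which matches the granularity of figures such as 99.9%.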
Test | Successful | Failed | Blocked | Processing time (s) | Cost |
---|---|---|---|---|---|
Alexa | 99.9% | 0.1% | 0% | 2.63 | $2 |
Amazon | 100% | 0% | 0% | 3.37 | $2 |
Google | 100% | 0% | 0% | 1.63 | $2 |
Instagram | 99.9% | 0.1% | 0% | 1.9 | $2 |
Similarweb | 100% | 0% | 0% | 2.50 | $2 |
Total | 99.96% | 0.04% | 0% | 2.4 | $10 |
Scraping Fish costs $0.002 per successfully scraped URL and achieved the highest overall success rate and the best processing time of the tested APIs.
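The Total row and the per-URL price are consistent with simple arithmetic over the five category rows. The check below assumes the Total row is an unweighted mean of the per-category values (which is equivalent to a pooled rate here, since every category has 1,000 requests); the variable names are illustrative.

```python
from statistics import mean

# Per-category values from the table above, in row order.
success_pct = [99.9, 100, 100, 99.9, 100]
seconds = [2.63, 3.37, 1.63, 1.9, 2.50]
cost_usd = [2, 2, 2, 2, 2]
requests_per_category = 1_000

total_success_pct = round(mean(success_pct), 2)  # 99.96
total_seconds = round(mean(seconds), 1)          # 2.4
total_cost_usd = sum(cost_usd)                   # 10

# Price per URL: total cost divided by the 5,000 requests made.
cost_per_url = total_cost_usd / (requests_per_category * len(cost_usd))  # 0.002
```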
ScrapingAnt benchmarks were run with the `--api "https://api.scrapingant.com/v1/general/?proxy_type=residential&"` parameter, and the code was adjusted to pass the API key as a header instead of a query parameter.
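The adjustment amounts to moving the key out of the query string and into a request header. The sketch below shows the difference with the standard library only; it builds the requests without sending them. The header name `x-api-key`, the `api_key` and `url` parameter names, and both function names are assumptions for illustration, not taken from the benchmark code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def request_with_key_in_query(api_base, api_key, target_url):
    """Default style: the API key travels as a query parameter."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return Request(api_base + query)


def request_with_key_in_header(api_base, api_key, target_url, header_name="x-api-key"):
    """Adjusted style: the API key travels in a header (name is an assumption)."""
    query = urlencode({"url": target_url})
    return Request(api_base + query, headers={header_name: api_key})
```

Both helpers expect `api_base` to end with `?` or `&`, as the `--api` value above does, so the remaining parameters can be appended directly.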
Test | Successful | Failed | Blocked | Processing time (s) | Cost |
---|---|---|---|---|---|
Alexa | 100% | 0% | 0% | 6.92 | $19 |
Amazon | 98% | 2% | 0% | 9.84 | $19 |
Google | 95% | 5% | 0% | 13.8 | $19 |
Instagram | 99.5% | 0.5% | 0% | 6.76 | $19 |
Similarweb | 96% | 4% | 0% | 7.40 | $19 |
Total | 97.7% | 2.3% | 0% | 8.94 | $49 |
A $49 Startup subscription is required to scrape 5,000 URLs in total (each consuming 50 or 250 API credits) with 5 concurrent connections.
ScrapingBee benchmarks were run with the `--api "https://app.scrapingbee.com/api/v1/?premium_proxy=true&"` parameter, with the `custom_google` parameter set to true for the Google benchmark.
Test | Successful | Failed | Blocked | Processing time (s) | Cost |
---|---|---|---|---|---|
Alexa | 81% | 18% | 1% | 4.86 | $99 |
Amazon | 99% | 1% | 0% | 11.48 | $99 |
Google | 100% | 0% | 0% | 3.74 | $99 |
Instagram | 99% | 1% | 0% | 18.52 | $59 |
Similarweb | 90% | 8% | 2% | 4.70 | $99 |
Total | 93.8% | 5.6% | 0.6% | 8.66 | $99 |
A $99 Startup subscription is required to scrape 5,000 URLs in total (each consuming 10, 20, or 25 API credits) with 5 concurrent connections.
ScraperAPI benchmarks were run with the `--api "http://api.scraperapi.com/?premium=true"` parameter.
Test | Successful | Failed | Blocked | Processing time (s) | Cost |
---|---|---|---|---|---|
Alexa | 95.5% | 4.5% | 0% | 7.19 | $49 |
Amazon | 96% | 4% | 0% | 10.97 | $49 |
Google | 100% | 0% | 0% | 4.5 | $49 |
Instagram* | 0% | 100% | 0% | 0 | - |
Similarweb | 90% | 8% | 2% | 4.70 | $49 |
Total | 76.3% | 23.3% | 0.4% | 6.84 | $49 |
* Scraping Instagram is not allowed and returns a 403 status code.
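The Total row of this table treats Instagram differently per column. The arithmetic below suggests that the success rate is averaged over all five categories, while the processing time is averaged only over the four categories that returned data. This reading is inferred from the numbers, not stated in the source, and the variable names are illustrative.

```python
from statistics import mean

# Per-category values from the table above, in row order (Instagram is 0 / 0).
success_pct = [95.5, 96, 100, 0, 90]
seconds = [7.19, 10.97, 4.5, 0, 4.70]

# Success rate over all five categories matches the Total row.
total_success_pct = round(mean(success_pct), 1)                  # 76.3

# Averaging time over all five rows does not match the Total row...
seconds_all_rows = round(mean(seconds), 2)                       # 5.47

# ...but skipping the zero-time Instagram row does.
seconds_nonzero_rows = round(mean([t for t in seconds if t > 0]), 2)  # 6.84
```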
A $49 Hobby subscription is required to scrape 5,000 URLs in total (each consuming 10 or 25 API credits) with 5 concurrent connections.
Scraping Fish 🐟 achieved the highest total success rate of 99.96% with the best average processing time of 2.4 seconds/URL. Moreover, thanks to the Scraping Fish API's simple and transparent pricing, the total cost of running the benchmark was 5-10 times lower than with the other tested APIs.
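The cost comparison can be checked against the Total rows of the four tables. In this small sketch the other APIs are keyed by the domain of the `--api` URL used for each benchmark; the dictionary and variable names are illustrative.

```python
# Total benchmark cost in USD, taken from the Total row of each table.
benchmark_cost_usd = {
    "Scraping Fish": 10,
    "scrapingant.com": 49,
    "scrapingbee.com": 99,
    "scraperapi.com": 49,
}

baseline = benchmark_cost_usd["Scraping Fish"]

# How many times more each other API cost for the same 5,000 URLs.
cost_ratio = {
    name: cost / baseline
    for name, cost in benchmark_cost_usd.items()
    if name != "Scraping Fish"
}
```

The ratios come out at 4.9 and 9.9, which is where the rounded "5-10 times" figure comes from.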
To run the scraping script for your use case, you can get a starter pack of 1,000 API requests for only $2.