Web Scraping Benchmark

We developed a benchmark to test selected Web Scraping APIs. It involves scraping various web pages that are commonly targeted in web scraping workflows. The results let us evaluate Web Scraping APIs in terms of reliability, proxy quality, speed and cost.

The Python script we used to run the benchmark is publicly available in a GitHub repository. It can also be used to run a scraping job with the Scraping Fish API by providing an input file with a list of URLs to scrape.
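As a rough illustration of the input-file workflow (this is not the repository's script), the sketch below reads URLs from lines of text and builds one API request URL per target. The endpoint `https://api.example.com/v1/` is a placeholder, and the `api_key` and `url` query parameter names are assumptions; check the API documentation for the real values.

```python
from urllib.parse import urlencode

# Placeholder endpoint and assumed parameter names; not taken from the repository.
API_ENDPOINT = "https://api.example.com/v1/"


def read_urls(lines):
    """Return stripped, non-empty URLs from an iterable of lines, skipping # comments."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def build_request_url(api_key, target_url, endpoint=API_ENDPOINT):
    """Build the GET URL that asks the API to scrape target_url."""
    # urlencode percent-encodes the target so its own query string stays intact.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{endpoint}?{query}"
```

With a real input file, `read_urls` can be given the open file object directly, since iterating over a file yields its lines.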

Methodology overview

The benchmark includes URLs from 5 categories:

Alexa: URLs from the Alexa top 1,000 ranking
Amazon: Amazon product URLs
Google: Google search queries
Instagram: the top 10 Instagram profiles (as of 2022)
Similarweb: websites from the Similarweb ranking (excluding adult and Russian websites)

For each category, we made 1,000 requests and recorded:

Successful requests
Failed requests
Blocked requests
Average request processing time (seconds / URL)
Cost of running the benchmark (1,000 requests)
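The recorded metrics can be aggregated along these lines. This is a minimal sketch, not the benchmark's own code: the `RequestResult` type and the three status labels are hypothetical, and it assumes the average time is taken over all requests, which the methodology summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    status: str      # assumed labels: "success", "failed", or "blocked"
    seconds: float   # wall-clock processing time for this URL


def summarize(results):
    """Return the percentage per outcome and the mean processing time in seconds/URL."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")
    summary = {}
    for status in ("success", "failed", "blocked"):
        count = sum(1 for r in results if r.status == status)
        summary[f"{status}_pct"] = 100.0 * count / total
    # Assumption: every request counts toward the average, not only successes.
    summary["avg_seconds"] = sum(r.seconds for r in results) / total
    return summary
```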

More details on the methodology and instructions for reproducing the results are provided in the GitHub repository.

Results

Scraping Fish

Test	Successful	Failed	Blocked	Processing time (s/URL)	Cost
Alexa	99.9%	0.1%	0%	2.63	$2
Amazon	100%	0%	0%	3.37	$2
Google	100%	0%	0%	1.63	$2
Instagram	99.9%	0.1%	0%	1.9	$2
Similarweb	100%	0%	0%	2.50	$2
Total	99.96%	0.04%	0%	2.4	$10

Cost: $0.002 per successfully scraped URL. Scraping Fish had the highest overall success rate and the lowest average processing time of the APIs tested.
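The Total row and the per-URL price can be checked with a few lines of arithmetic. The sketch assumes the totals are plain means of the five category values (each category has the same 1,000 requests) and that the $10 total is spread over all 5,000 requests.

```python
# Per-category values copied from the Scraping Fish table.
success = {"Alexa": 99.9, "Amazon": 100.0, "Google": 100.0, "Instagram": 99.9, "Similarweb": 100.0}
seconds = {"Alexa": 2.63, "Amazon": 3.37, "Google": 1.63, "Instagram": 1.9, "Similarweb": 2.50}

# Equal request counts per category, so an unweighted mean is enough.
total_success = sum(success.values()) / len(success)
total_seconds = sum(seconds.values()) / len(seconds)

# $10 total for 5 categories x 1,000 requests.
cost_per_url = 10 / (5 * 1000)

print(f"{total_success:.2f}%")          # 99.96%
print(f"{total_seconds:.2f} s/URL")     # 2.41 s/URL, shown as 2.4 in the table
print(f"${cost_per_url:.3f} per URL")   # $0.002 per URL
```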

Other web scraping APIs

Scraping Ant

Benchmarks were run using the --api "https://api.scrapingant.com/v1/general/?proxy_type=residential&" parameter, and the code was adjusted to pass the API key as a header instead of a query parameter.
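The header adjustment might look like the following sketch, which is not the actual patch applied to the benchmark script. The `x-api-key` header name and the `url` query parameter are assumptions here; confirm both against the provider's documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.scrapingant.com/v1/general/?proxy_type=residential&"


def build_request(target_url, api_key, api_base=API_BASE, key_header="x-api-key"):
    """Return a Request with the key in a header and the target URL in the query string."""
    # The base URL already ends with "&", so the encoded target is appended directly.
    full_url = api_base + urlencode({"url": target_url})
    # The key travels in a header and never appears in the URL itself.
    return Request(full_url, headers={key_header: api_key})
```

The request is only constructed here; passing it to `urllib.request.urlopen` would send it.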

Test	Successful	Failed	Blocked	Processing time (s/URL)	Cost
Alexa	100%	0%	0%	6.92	$19
Amazon	98%	2%	0%	9.84	$19
Google	95%	5%	0%	13.8	$19
Instagram	99.5%	0.5%	0%	6.76	$19
Similarweb	96%	4%	0%	7.40	$19
Total	97.7%	2.3%	0%	8.94	$49

A $49 Startup subscription is required to scrape 5,000 URLs in total (each consuming 50 or 250 API credits) using 5 concurrent connections.
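A limit of 5 concurrent connections can be enforced on the client side with a bounded thread pool. The sketch below is one way to do it and is not taken from the benchmark script; `fetch` is a hypothetical callable that performs the actual API request for a single URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(urls, fetch, max_workers=5):
    """Call fetch(url) for every URL with at most max_workers calls in flight.

    Returns (url, result, seconds) tuples in the same order as the input.
    """
    def timed(url):
        start = time.perf_counter()
        result = fetch(url)
        return url, result, time.perf_counter() - start

    # The pool size caps concurrency; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed, urls))
```

Timing each call inside the worker gives the per-URL processing time independently of how long a request waited for a free slot.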

ScrapingBee

Benchmarks were run using the --api "https://app.scrapingbee.com/api/v1/?premium_proxy=true&" parameter, with the custom_google parameter set to true for the Google benchmark.

Test	Successful	Failed	Blocked	Processing time (s/URL)	Cost
Alexa	81%	18%	1%	4.86	$99
Amazon	99%	1%	0%	11.48	$99
Google	100%	0%	0%	3.74	$99
Instagram	99%	1%	0%	18.52	$59
Similarweb	90%	8%	2%	4.70	$99
Total	93.8%	5.6%	0.6%	8.66	$99

A $99 Startup subscription is required to scrape 5,000 URLs in total (each consuming 10, 20, or 25 API credits) using 5 concurrent connections.

ScraperAPI

Benchmarks were run using the --api "http://api.scraperapi.com/?premium=true" parameter.

Test	Successful	Failed	Blocked	Processing time (s/URL)	Cost
Alexa	95.5%	4.5%	0%	7.19	$49
Amazon	96%	4%	0%	10.97	$49
Google	100%	0%	0%	4.5	$49
Instagram*	0%	100%	0%	0	-
Similarweb	90%	8%	2%	4.70	$49
Total	76.3%	23.3%	0.4%	6.84	$49

* Scraping Instagram is not allowed and returns a 403 status code. The Total processing time is averaged over the other four categories.

A $49 Hobby subscription is required to scrape 5,000 URLs in total (each consuming 10 or 25 API credits) using 5 concurrent connections.

Conclusions

Scraping Fish 🐟 achieved the highest total success rate of 99.96% with the best average processing time of 2.4 seconds/URL. Moreover, thanks to the Scraping Fish API's simple and transparent pricing, the total cost of running the benchmark was 5-10 times lower than for the other tested APIs.
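The cost comparison follows from the Total rows of the four tables. As a quick check, under the assumption that those totals are the full cost of running all 5,000 requests with each provider:

```python
# Total benchmark cost in USD, copied from the Total row of each table.
costs = {"Scraping Fish": 10, "Scraping Ant": 49, "ScrapingBee": 99, "ScraperAPI": 49}

baseline = costs["Scraping Fish"]
ratios = {name: cost / baseline for name, cost in costs.items() if name != "Scraping Fish"}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
# Scraping Ant: 4.9x
# ScrapingBee: 9.9x
# ScraperAPI: 4.9x
```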

Try Scraping Fish API

To run the scraping script for your use case, you can get a starter pack of 1,000 API requests for only $2.
