Web Scraping APIs benchmark

We developed a benchmark to test selected Web Scraping APIs. It involves scraping various web pages that are commonly targeted in web scraping workflows. The results let us evaluate Web Scraping APIs in terms of reliability, proxy quality, speed and cost.

note

The Python script we used to run the benchmark is publicly available in a GitHub repository. It can also be used to run a scraping job with the Scraping Fish API by providing an input file with a list of URLs to scrape.

Methodology overview

The benchmark includes URLs from 5 categories:

  1. Alexa: URLs of the top 1,000 websites in the Alexa ranking
  2. Amazon: Amazon product URLs
  3. Google: Google search queries
  4. Instagram: the top 10 Instagram profiles (as of 2022)
  5. Similarweb: websites from the Similarweb ranking (excluding adult and Russian websites)

For each category, we made 1,000 requests and recorded:

  • ✅ successful URLs
  • ❌ failed URLs
  • ⛔️ blocked URLs
  • ⏱ average URL processing time (seconds/URL)
  • 💰 cost of running the benchmark (1,000 requests)

info

More details on the methodology and instructions for reproducing the results are provided in a GitHub repository.
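The metrics listed above can be derived from per-request records. The following is a minimal sketch, not the benchmark script itself: the `RequestResult` type, its field names, and the `summarize` function are hypothetical, and it assumes that each request has already been labeled as successful, failed, or blocked and that processing time is averaged over successful URLs only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestResult:
    """One benchmark request (hypothetical record layout)."""
    url: str
    outcome: str    # "successful", "failed", or "blocked"; labeling rule not shown here
    elapsed: float  # seconds spent processing this URL


def summarize(results: List[RequestResult], cost_usd: float) -> dict:
    """Aggregate one category's results into the reported metrics."""
    counts = Counter(r.outcome for r in results)
    total = len(results)
    # Assumption: only successful URLs contribute to the average time.
    ok_times = [r.elapsed for r in results if r.outcome == "successful"]
    avg_time: Optional[float] = sum(ok_times) / len(ok_times) if ok_times else None
    return {
        "successful_pct": 100 * counts["successful"] / total,
        "failed_pct": 100 * counts["failed"] / total,
        "blocked_pct": 100 * counts["blocked"] / total,
        "avg_seconds_per_url": avg_time,
        "cost_usd": cost_usd,
    }
```

With this layout, a category in which no request succeeds has no average processing time, so `avg_seconds_per_url` is `None`.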

Results

Scraping Fish 🐟

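| Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time (s/URL) | 💰 Cost |
|---|---|---|---|---|---|
| Alexa | 99.9% | 0.1% | 0% | 2.63 | $2 |
| Amazon | 100.0% | 0% | 0% | 3.37 | $2 |
| Google | 100.0% | 0% | 0% | 1.63 | $2 |
| Instagram | 99.9% | 0.1% | 0% | 1.9 | $2 |
| Similarweb | 100.0% | 0% | 0% | 2.50 | $2 |
| Total | 99.96% | 0.04% | 0.0% | 2.4 | $10 |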

📝 $0.002 per successfully scraped URL. Scraping Fish had the highest overall success rate and the best processing time of the tested APIs.
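As a quick check of the Total row and the per-URL price, the figures can be recomputed from the per-category values. This sketch assumes the Total row is an unweighted mean across the five categories (reasonable here because each category has 1,000 requests); the variable names are illustrative only.

```python
from statistics import mean

# Per-category values copied from the Scraping Fish results table.
success_pct = {"Alexa": 99.9, "Amazon": 100.0, "Google": 100.0,
               "Instagram": 99.9, "Similarweb": 100.0}
seconds_per_url = {"Alexa": 2.63, "Amazon": 3.37, "Google": 1.63,
                   "Instagram": 1.9, "Similarweb": 2.50}
cost_usd = {name: 2 for name in success_pct}  # $2 per 1,000 requests
REQUESTS_PER_CATEGORY = 1000

# Assumption: totals are plain averages over the five categories.
total_success = round(mean(list(success_pct.values())), 2)
total_seconds = round(mean(list(seconds_per_url.values())), 1)
total_cost = sum(cost_usd.values())
cost_per_url = total_cost / (REQUESTS_PER_CATEGORY * len(cost_usd))

print(total_success, total_seconds, total_cost, cost_per_url)
```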

Other Web Scraping APIs

ScrapingAnt

Benchmarks were run with the --api "https://api.scrapingant.com/v1/general/?proxy_type=residential&" parameter, and the code was adjusted to pass the API key as a header instead of a query parameter.
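To illustrate the difference between passing the API key as a query parameter and as a header, here is a minimal sketch of how a request could be assembled from an --api base URL. It is not the benchmark script's code: the `build_request` helper, the `url` and `api_key` query parameter names, and the `x-api-key` header name are assumptions for illustration, so check each provider's documentation before relying on them.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_request(api_base: str, target_url: str, api_key: str,
                  key_in_header: bool = False) -> Tuple[str, Dict[str, str]]:
    """Return (request_url, headers) for scraping one target URL.

    Assumes api_base already ends with "?" or "&", like the --api values
    quoted in this document, so parameters can simply be appended.
    """
    params = {"url": target_url}        # assumed parameter name
    headers: Dict[str, str] = {}
    if key_in_header:
        headers["x-api-key"] = api_key  # assumed header name
    else:
        params["api_key"] = api_key     # assumed parameter name
    # urlencode percent-encodes the target URL so it survives as one value.
    return api_base + urlencode(params), headers


url, headers = build_request(
    "https://api.scrapingant.com/v1/general/?proxy_type=residential&",
    "https://example.com/a?b=1",
    "YOUR_API_KEY",
    key_in_header=True,
)
print(url)
print(headers)
```

Keeping the key out of the URL means it does not end up in the request line, which is one common reason an API expects it in a header.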

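| Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time (s/URL) | 💰 Cost |
|---|---|---|---|---|---|
| Alexa | 100.0% | 0% | 0% | 6.92 | $19 |
| Amazon | 98.0% | 2.0% | 0% | 9.84 | $19 |
| Google | 95.0% | 5.0% | 0% | 13.80 | $19 |
| Instagram | 99.5% | 0.5% | 0% | 6.76 | $19 |
| Similarweb | 96.0% | 4.0% | 0% | 7.40 | $19 |
| Total | 97.7% | 2.3% | 0.0% | 8.94 | $49 |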

📝 A $49 Startup subscription is required to scrape 5,000 URLs in total (each consuming 50 or 250 API credits) with 5 concurrent connections.

ScrapingBee

Benchmarks were run with the --api "https://app.scrapingbee.com/api/v1/?premium_proxy=true&" parameter, and with the custom_google parameter set to true for the Google benchmark.

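| Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time (s/URL) | 💰 Cost |
|---|---|---|---|---|---|
| Alexa | 81.0% | 18.0% | 1.0% | 4.86 | $99 |
| Amazon | 99.0% | 1.0% | 0% | 11.48 | $99 |
| Google | 100.0% | 0% | 0% | 3.74 | $99 |
| Instagram | 99.0% | 1.0% | 0% | 18.52 | $99 |
| Similarweb | 90.0% | 8.0% | 2.0% | 4.70 | $99 |
| Total | 93.8% | 5.6% | 0.6% | 8.66 | $99 |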

📝 A $99 Startup subscription is required to scrape 5,000 URLs in total (each consuming 10, 20, or 25 API credits) with 5 concurrent connections.

ScraperAPI

Benchmarks were run with the --api "http://api.scraperapi.com/?premium=true&" parameter.

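| Test | ✅ Successful | ❌ Failed | ⛔️ Blocked | ⏱ Processing time (s/URL) | 💰 Cost |
|---|---|---|---|---|---|
| Alexa | 95.5% | 4.5% | 0% | 7.19 | $49 |
| Amazon | 96.0% | 4.0% | 0% | 10.97 | $49 |
| Google | 100.0% | 0% | 0% | 4.50 | $49 |
| Instagram* | 0.0% | 100.0% | 0% | - | - |
| Similarweb | 90.0% | 8.0% | 2.0% | 4.70 | $49 |
| Total | 76.3% | 23.3% | 0.4% | 6.84 | $49 |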

* ScraperAPI does not allow scraping Instagram and returns a 403 status code.

📝 A $49 Hobby subscription is required to scrape 5,000 URLs in total (each consuming 10 or 25 API credits) with 5 concurrent connections.

Conclusions

Scraping Fish 🐟 achieved the highest total success rate of 99.96% with the best average processing time of 2.4 seconds/URL. Moreover, thanks to the Scraping Fish API's simple and transparent pricing, the total cost of running the benchmark was 5-10 times lower than for the other tested APIs.
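The cost comparison can be checked against the Total rows of the results tables. This is only a back-of-the-envelope sketch with illustrative variable names; it compares the subscription prices listed in this document with the $10 total for Scraping Fish.

```python
# Total benchmark cost in USD, taken from the Total rows of the tables.
scraping_fish_total = 10
other_totals = {"ScrapingAnt": 49, "ScrapingBee": 99, "ScraperAPI": 49}

# How many times more expensive each API was than Scraping Fish.
ratios = {name: cost / scraping_fish_total for name, cost in other_totals.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the cost of Scraping Fish")
```

The ratios come out at roughly 5x for ScrapingAnt and ScraperAPI and roughly 10x for ScrapingBee.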

Try Scraping Fish API

To run the scraping script for your use case, you can get a starter pack of 1,000 API requests for only $2.
