
Mateusz Buda
Paweł Kobojek

Scraping Fish Product Updates 2024

Greetings, web scraping enthusiasts! At Scraping Fish, we've been working tirelessly behind the scenes, and we're thrilled to announce some major enhancements to our platform. These updates are not just incremental improvements; they represent a significant leap forward in making web scraping more efficient, user-friendly, and adaptable to your unique needs. Whether you're a long-time user or considering Scraping Fish for your web scraping tasks, these updates are designed to streamline your experience and expand your capabilities.

Mateusz Buda
Paweł Kobojek
Jordan Hansen

State of Web Scraping 2023 Survey Results

Welcome to the inaugural 2023 Web Scraping Survey!

This survey is our attempt to understand the evolving landscape of web scraping, capturing insights from a diverse pool of developers, data scientists, and hobbyists. We've collected responses on a broad range of topics, from technical skills and tool preferences to the ethical considerations and financial aspects of web scraping.

In the spirit of openness and community learning, we are excited to share the raw data containing all responses. For those interested in a deeper dive, you can access the data at this link.

Our goal with this survey is to paint a detailed picture of the current state of web scraping, the prevailing trends, practices, challenges, and opportunities associated with it. The rich insights garnered from these responses should help inform and stimulate valuable discussions around the future of web scraping. Now, let's dive into the findings and see what the data has to say!

Mateusz Buda
Paweł Kobojek

State of Web Scraping 2023 Survey

Hello everyone,

We are excited to announce our latest initiative: the 'State of Web Scraping 2023' survey. As part of our ongoing efforts to better understand and contribute to the rapidly evolving landscape of web scraping, we are inviting everyone in the community (web scrapers, software developers, business owners, freelancers, and more) to participate in this comprehensive survey.

We understand that your time is valuable. That's why, as a token of our appreciation, all participants will receive a 50% discount on Scraping Fish, our robust web scraping API designed to simplify and streamline your web scraping tasks.

Mateusz Buda
Paweł Kobojek

Scraping Google SERP with geolocation

Google Search Engine Result Pages (SERP) 🔍 are an important source of information for many businesses. SERP includes organic search results, ads, products, places, knowledge graphs, etc. For example, by scraping Google SERP you can learn which brands have paid ads for specific keywords or analyze how your website is positioned.

One of the most important factors affecting what the user can see on a keyword's SERP is geolocation 🗺. We'll show you how to scrape Google SERP for a keyword from any location, regardless of the origin of your IP address. This can be done simply by setting search URL query parameters. It's also possible to parametrize and automate scraping 🤖 using a Python script we share below.
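To make the idea of "setting search URL query parameters" concrete, here is a minimal sketch that builds such a URL. It assumes the commonly used `q` (keyword), `gl` (country), and `hl` (language) parameters plus `uule`, an encoded location name. The `uule` encoding below is a community reverse-engineered format, not a documented Google API, and the helper names `encode_uule` and `build_serp_url` are hypothetical, so treat this as an illustration rather than the exact approach from our script.

```python
import base64
from urllib.parse import urlencode

# Length "key" alphabet used by the community-documented (unofficial) UULE format.
_UULE_KEYS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)
_UULE_PREFIX = "w+CAIQICI"  # assumed fixed prefix of the encoded location


def encode_uule(canonical_name: str) -> str:
    """Encode a location such as 'Boston,Massachusetts,United States' as a uule value.

    Assumed format: prefix + one key character for the byte length + base64(name).
    The single key character only covers names shorter than 64 bytes.
    """
    name_bytes = canonical_name.encode("utf-8")
    if len(name_bytes) >= len(_UULE_KEYS):
        raise ValueError("location name must be shorter than 64 bytes")
    key = _UULE_KEYS[len(name_bytes)]
    return _UULE_PREFIX + key + base64.b64encode(name_bytes).decode("ascii")


def build_serp_url(keyword: str, location: str, country: str = "us", language: str = "en") -> str:
    """Build a Google search URL asking for results as seen from `location`."""
    params = {
        "q": keyword,                   # search keyword
        "gl": country,                  # two-letter country code
        "hl": language,                 # interface language
        "uule": encode_uule(location),  # encoded canonical location name
    }
    # urlencode percent-encodes every value, including the '+' in the uule prefix.
    return "https://www.google.com/search?" + urlencode(params)
```

For example, `build_serp_url("coffee", "Boston,Massachusetts,United States")` yields a search URL whose results should reflect Boston rather than the location of your IP address, provided Google still honors this parameter format.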

Mateusz Buda
Paweł Kobojek

Scraping Walmart

In previous posts, we've already covered scraping publicly available data from Instagram and Airbnb. This time, we'll show you how to scrape walmart.com to collect data about products by category. Walmart is a rich source of data containing:

  • product details (including category and description) 📝
  • price 💰
  • features (e.g. nutrition facts for food) 🥦
  • availability 🛒
  • reviews ⭐️

As always, we share the code in a GitHub repository so you can play with it and apply it to your own use case. To run it and actually scrape the data, you need a Scraping Fish API key, which you can get here: Scraping Fish Requests Packs. A starter pack of 1,000 API requests costs only $2. Without the Scraping Fish API, you are likely to see a CAPTCHA instead of useful product information.
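As a rough sketch of what routing a Walmart request through the API looks like, the snippet below only uses the Python standard library. The endpoint and the `api_key`/`url` parameter names reflect our understanding of the Scraping Fish docs and should be treated as assumptions to double-check; the helper names `build_request_url` and `fetch_html` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names; verify them against the Scraping Fish docs.
API_ENDPOINT = "https://scraping.narf.ai/api/v1/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the API URL that fetches `target_url` through Scraping Fish."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_html(api_key: str, target_url: str, timeout: float = 90.0) -> str:
    """Download the HTML of `target_url` via the API (performs a real network call)."""
    with urllib.request.urlopen(build_request_url(api_key, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of `html = fetch_html("YOUR_API_KEY", "https://www.walmart.com/")`, after which the product data can be parsed from `html`. The generous timeout reflects the assumption that proxied, browser-rendered requests can take noticeably longer than direct ones.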

Scraping use case

As a 🏃‍♂️ running example, we'll identify products in 🍔 food categories and scrape their nutrition facts. Based on data collected for over 15,000 products, we'll find out:

  • What is the share of products having sugar as the main nutrient?
  • Is there any relation between product rating and its nutrients?
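Both questions boil down to simple computations once the nutrition facts are collected. The sketch below assumes a hypothetical `Product` record holding a rating and nutrient amounts in grams per serving, defines "main nutrient" as the nutrient with the largest amount (our assumption, not necessarily the definition used in the post), and measures the rating relation with a plain Pearson correlation.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Sequence


@dataclass
class Product:
    """Hypothetical record: product rating and nutrients in grams per serving."""
    rating: float
    nutrients: Dict[str, float] = field(default_factory=dict)


def main_nutrient(nutrients: Dict[str, float]) -> str:
    """Nutrient with the largest amount (assumed definition of 'main nutrient')."""
    return max(nutrients, key=nutrients.get)


def share_with_main(products: List[Product], nutrient: str) -> float:
    """Fraction of products whose main nutrient is `nutrient`."""
    hits = sum(1 for p in products if main_nutrient(p.nutrients) == nutrient)
    return hits / len(products)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either sequence is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rating_correlation(products: List[Product], nutrient: str) -> float:
    """Correlation between rating and amount of `nutrient` (missing counts as 0 g)."""
    ratings = [p.rating for p in products]
    amounts = [p.nutrients.get(nutrient, 0.0) for p in products]
    return pearson(ratings, amounts)
```

Calling `share_with_main(products, "sugar")` answers the first question, and computing `rating_correlation(products, n)` for each nutrient `n` gives a first look at the second one. A correlation near zero would suggest no linear relation, though it cannot rule out other kinds of dependence.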

Mateusz Buda
Paweł Kobojek

Build Your Own Mobile Proxy for Web Scraping

In this guide, we show how you can build your own mobile proxy pool step by step. The most common use case for mobile proxies is web scraping. If you have a low success rate and keep getting blocked by the websites you want to scrape, this guide is for you.

info

This guide applies only to a small, home-scale mobile proxy setup and does not cover some advanced intricacies of running mobile proxies, such as recovering from various modem failures or rotating proxies.

If you need access to a reliable production-grade mobile proxy pool for web scraping, consider using our product.

What is a mobile proxy

One of the most important factors affecting the success rate of web scraping is proxy quality. There are three main types of proxies:

  • Datacenter: offer a large pool of cheap IP addresses belonging to datacenters and cloud server providers; these are often blacklisted and usually not suitable for web scraping
  • Residential: provide IP addresses from Internet Service Provider (ISP) pools that are shared with other users
  • Mobile: the best class of proxies for web scraping, based on ephemeral IP addresses that are frequently exchanged among mobile network users as they move between Base Transceiver Stations (BTS)

Mobile proxies are the most expensive ones, but it can pay off to build your own pool by following this guide. We will show you how to change the IP address on demand so that you can generate thousands of IP addresses daily.
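Changing the IP address on demand generally means forcing the modem to reconnect and then checking that the public IP actually changed, since a reconnect can occasionally hand back the same address. Below is a minimal sketch of that retry loop. `reconnect` and `get_public_ip` are hypothetical hooks you would implement for your own hardware (for example, toggling the modem's mobile data connection and querying an external "what is my IP" service through the proxy); they are not part of any specific library.

```python
import time
from typing import Callable


def rotate_ip(
    reconnect: Callable[[], None],
    get_public_ip: Callable[[], str],
    max_attempts: int = 5,
    settle_seconds: float = 5.0,
) -> str:
    """Reconnect the modem until its public IP differs from the current one.

    `reconnect` and `get_public_ip` are hypothetical, hardware-specific hooks.
    Returns the new IP address or raises RuntimeError after `max_attempts` tries.
    """
    old_ip = get_public_ip()
    for _ in range(max_attempts):
        reconnect()
        time.sleep(settle_seconds)  # give the modem time to re-attach to the network
        new_ip = get_public_ip()
        if new_ip != old_ip:
            return new_ip
    raise RuntimeError(f"IP address did not change after {max_attempts} reconnects")
```

Keeping the hardware-specific parts behind two small callables makes the rotation logic easy to test with fakes and lets the same loop drive different modem models.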

Paweł Kobojek
Mateusz Buda

Indie Hackers is a wonderful community for founders, a vast ocean of shared knowledge and a place to showcase your product journey. Many people share their products there along with their revenue, often verified by Stripe. We analyzed the available data to check how much revenue these products are making and which categories make the most money!

At the time of our analysis, there were a total of 937 products with Stripe-verified revenue. For each product, we gathered its revenue and categories.
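With only revenue and categories per product, finding the most lucrative categories is a group-by. The sketch below assumes a hypothetical input of `(revenue, categories)` pairs and lets a product count toward every category it is tagged with; that counting rule is our assumption, and other choices (e.g., primary category only) would give different totals.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (revenue in USD, list of category tags) per product.
Product = Tuple[float, List[str]]


def revenue_by_category(products: Iterable[Product]) -> Dict[str, Dict[str, float]]:
    """Summarize revenue per category; a product counts toward each of its categories."""
    per_category: Dict[str, List[float]] = defaultdict(list)
    for revenue, categories in products:
        for category in set(categories):  # ignore duplicated tags on one product
            per_category[category].append(revenue)
    return {
        category: {
            "products": len(revenues),
            "total": sum(revenues),
            "median": median(revenues),
        }
        for category, revenues in per_category.items()
    }


def top_categories(summary: Dict[str, Dict[str, float]], n: int = 3) -> List[str]:
    """Names of the `n` categories with the highest total revenue."""
    return sorted(summary, key=lambda c: summary[c]["total"], reverse=True)[:n]
```

Reporting the median next to the total helps because a single very successful product can dominate a category's total.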

Paweł Kobojek
Mateusz Buda

Are Most Rust Jobs In Crypto?

Rust is a relatively new programming language that does not yet have an established, highly developed job market. However, since it is developers' most loved language, many people would like to have a full-time job using it. There's a prevalent feeling among Rustaceans that most Rust jobs out there involve blockchain tech. Indeed, anyone who has ever looked for a Rust job has surely stumbled upon a cryptocurrency job offer. Is it really true that most offers are in crypto? We scraped job ads to find the answer!
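Once job ads are scraped, the question becomes a classification problem: which ads are crypto-related? A simple, hedged starting point is keyword matching. The pattern below is a hypothetical keyword list of our own, not the classifier used for the results in the post, and it deliberately avoids matching "cryptography" so that security jobs are not miscounted as crypto.

```python
import re
from typing import Iterable

# Hypothetical keyword pattern; a real analysis would need manual review of edge cases.
CRYPTO_PATTERN = re.compile(
    r"\b(blockchain|crypto(?:currenc(?:y|ies))?|web3|defi|smart contracts?|ethereum|solana|nfts?)\b",
    re.IGNORECASE,
)


def is_crypto_job(ad_text: str) -> bool:
    """True if the ad mentions any of the crypto-related keywords."""
    return CRYPTO_PATTERN.search(ad_text) is not None


def crypto_share(ads: Iterable[str]) -> float:
    """Fraction of job ads classified as crypto-related (requires at least one ad)."""
    ads = list(ads)
    return sum(is_crypto_job(ad) for ad in ads) / len(ads)
```

Keyword matching is cheap and transparent, but it misses ads that never name the technology explicitly, so the resulting share is best read as an estimate.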

Paweł Kobojek
Mateusz Buda

Execute JavaScript steps on scraped website

Today, we are pleased to introduce a much-awaited feature: JavaScript scenario execution! Many of our customers asked for the ability to interact with the scraped website: clicking buttons, filling out forms, selecting <select> options, etc. This is now possible with the Scraping Fish API without compromising our commitment to keeping the product as simple to use as possible.

Why do I need it?

In certain web scraping situations, you might want not only to load a website but also to, e.g., select an option, wait for data to load, and click a button that is only enabled after performing some action. Or maybe you need to fill out a form before the data you want becomes available. In all these cases, JavaScript scenario execution is your friend.
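A scenario like "select an option, wait for data, click a button" can be described as an ordered list of steps sent along with the request. The sketch below serializes such a list to JSON and passes it as a `js_scenario` query parameter. The endpoint, the `js_scenario` parameter name, and the step names (`select`, `wait_for`, `click`, `wait`) are assumptions based on our reading of the Scraping Fish docs and may not match the exact schema, and the selectors are hypothetical; please check the JS scenario documentation before using it.

```python
import json
import urllib.parse
from typing import Any, Dict, List

API_ENDPOINT = "https://scraping.narf.ai/api/v1/"  # assumed endpoint, verify in the docs


def build_scenario_url(api_key: str, target_url: str, steps: List[Dict[str, Any]]) -> str:
    """Encode browser steps as a JSON `js_scenario` query parameter (assumed format)."""
    scenario = json.dumps({"steps": steps})
    query = urllib.parse.urlencode(
        {"api_key": api_key, "url": target_url, "js_scenario": scenario}
    )
    return f"{API_ENDPOINT}?{query}"


# Illustrative steps with assumed names and hypothetical selectors: pick an option,
# wait for the results to load, then click a button enabled only after the selection.
steps = [
    {"select": {"selector": "#region", "options": ["europe"]}},
    {"wait_for": "#results"},
    {"click": "#load-more"},
    {"wait": 1000},  # extra pause in milliseconds
]
```

Describing the interaction as data rather than code keeps the request self-contained: the same URL can be retried or logged without shipping any browser automation code of your own.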

Paweł Kobojek
Mateusz Buda

Thanks to its importance for the tech industry and its numerous iconic landmarks, the San Francisco Bay Area is a common destination for Airbnb guests. Have you ever wondered what the hosting landscape looks like there? What price should you expect to pay for a stay (or charge as a host)? These questions can be answered using web scraping and basic data analysis skills. We will show you how to effortlessly scrape Airbnb, even though it uses JavaScript to render its content, and showcase some basic data exploration.

info

It's important to point out that we are scraping Airbnb as of May 2022. If Airbnb changes something on its website that we rely on, the code in this post may no longer work and will have to be adjusted. If you experience any problems, feel free to open an issue on GitHub and we will investigate.

How to get the data? Web scraping to the rescue!

To the best of our knowledge, Airbnb itself doesn't publish its offers in an easily digestible form (e.g., a CSV file or an API). However, Airbnb is a publicly available website, and you can query and browse offers without logging in. With the help of the Scraping Fish API, you don't need to be an expert in web scraping to get data about Bay Area Airbnb apartments and use it to gain the desired insights.
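Once the listings are scraped, answering "what price should I expect?" starts with descriptive statistics. The sketch below assumes the scraped data has already been reduced to nightly prices, optionally paired with a city name; these input shapes and the helper names are hypothetical. It uses the standard library `statistics` module for quartiles and medians.

```python
from collections import defaultdict
from statistics import median, quantiles
from typing import Dict, List, Tuple


def price_summary(nightly_prices: List[float]) -> Dict[str, float]:
    """Basic statistics for nightly prices (needs at least two prices)."""
    q1, q2, q3 = quantiles(nightly_prices, n=4)  # quartile cut points
    return {
        "count": len(nightly_prices),
        "min": min(nightly_prices),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(nightly_prices),
    }


def median_price_by_city(listings: List[Tuple[str, float]]) -> Dict[str, float]:
    """Median nightly price per city from hypothetical (city, price) pairs."""
    prices: Dict[str, List[float]] = defaultdict(list)
    for city, price in listings:
        prices[city].append(price)
    return {city: median(values) for city, values in prices.items()}
```

Medians and quartiles are preferred here over the mean because a handful of luxury listings can skew average prices considerably.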