
Mateusz Buda
Paweł Kobojek

Scraping Walmart

In previous posts, we've already covered scraping publicly available data from Instagram and Airbnb. This time, we'll show you how to scrape walmart.com to collect data about products by category. Walmart is a rich source of data containing:

  • product details (including category and description) 📝
  • price 💰
  • features (e.g. nutrition facts for food) 🥦
  • availability 🛒
  • reviews ⭐️

As always, we share the code in a GitHub repository so you can play with it and apply it to your use case. To run it and actually scrape the data, you need a Scraping Fish API key, which you can get here: Scraping Fish Requests Packs. A starter pack of 1,000 API requests costs only $2. Without the Scraping Fish API, you are likely to see a CAPTCHA instead of useful product information.

Scraping use case

As a 🏃‍♂️ running example, we'll identify products from 🍔 food categories and scrape their nutrition facts. Based on the data collected for over 15,000 products, we'll find out:

  • What is the share of products having sugar as the main nutrient?
  • Is there any relation between product rating and its nutrients?
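To make the first question concrete, here is a minimal sketch of how "sugar as the main nutrient" could be computed once nutrition facts are scraped. It assumes each product's nutrition facts are stored as a hypothetical `{nutrient name: grams per serving}` dict and that "main nutrient" means the macronutrient with the largest amount; the `MACRONUTRIENTS` list and both function names are illustrative assumptions, not the code from the repository.

```python
from typing import Mapping, Optional, Sequence

# Assumption: nutrition facts are stored as {nutrient name: grams per serving}.
# Total carbohydrates are deliberately left out because sugar is part of them.
MACRONUTRIENTS = ("sugar", "fat", "protein", "fiber")


def main_nutrient(nutrition: Mapping[str, float]) -> Optional[str]:
    """Return the macronutrient with the largest amount, or None if none is listed."""
    present = {n: nutrition[n] for n in MACRONUTRIENTS if n in nutrition}
    if not present:
        return None
    # On ties, max() keeps the first nutrient in MACRONUTRIENTS order.
    return max(present, key=present.get)


def sugar_main_share(products: Sequence[Mapping[str, float]]) -> float:
    """Share of products (with any macronutrient data) whose main nutrient is sugar."""
    mains = [main_nutrient(p) for p in products]
    known = [m for m in mains if m is not None]
    if not known:
        return 0.0
    return sum(m == "sugar" for m in known) / len(known)


products = [
    {"sugar": 10.0, "fat": 2.0},     # sugar dominates
    {"sugar": 1.0, "protein": 5.0},  # protein dominates
    {"calories": 100.0},             # no macronutrient data, skipped
]
print(sugar_main_share(products))  # 0.5
```

Products without any macronutrient data are excluded from the denominator rather than counted as "not sugar", so missing nutrition labels do not skew the share downward.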

Mateusz Buda
Paweł Kobojek

Build Your Own Mobile Proxy for Web Scraping

In this guide, we show how you can build your own mobile proxy pool step by step. The most common use case for mobile proxies is web scraping. If you have a low success rate and keep getting blocked by the websites you want to scrape, this guide is for you.

info

This guide is only applicable to a small, home-scale mobile proxy setup and does not cover some advanced intricacies of running mobile proxies, such as recovery from various modem failures or proxy rotation.

If you need access to a reliable production-grade mobile proxy pool for web scraping, consider using our product.

What is a mobile proxy

One of the most important factors affecting the success rate of web scraping is proxy quality. There are three main types of proxies:

  • Datacenter: offer a large pool of cheap IP addresses belonging to datacenters and cloud server providers; these are often blacklisted and usually not suitable for web scraping
  • Residential: provide IP addresses from Internet Service Provider (ISP) pools that are shared with other users
  • Mobile: the best class of proxies for web scraping, based on ephemeral IP addresses that are frequently reassigned among mobile network users as they move between Base Transceiver Stations (BTS)

Mobile proxies are the most expensive ones, but it can pay off to build your own pool by following this guide. We will show you how to change the IP address on demand so that you can generate thousands of IP addresses daily.

Paweł Kobojek
Mateusz Buda

Indie Hackers is a wonderful community for founders, a vast ocean of shared knowledge and a place to showcase your product journey. Many people share their products there along with their revenue, often verified by Stripe. We analyzed the available data to check how much revenue those products are making and which categories make the most money!

At the time of our analysis, there were a total of 937 products with Stripe-verified revenue. For each of them, we gathered its revenue and categories.
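With only revenue and categories per product, finding the top-earning categories is a simple group-by. The sketch below is one possible way to do it, assuming a hypothetical `Product` record with a revenue figure in USD and a list of category tags; the names and the sample data are illustrative, not taken from the actual analysis.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, NamedTuple


class Product(NamedTuple):
    # Hypothetical record: verified revenue in USD and the product's category tags.
    revenue: float
    categories: List[str]


def revenue_by_category(products: List[Product]) -> Dict[str, Dict[str, float]]:
    """Per-category product count, total and median revenue.

    A product listed under several categories counts toward each of them.
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for p in products:
        for category in set(p.categories):  # ignore duplicate tags within one product
            buckets[category].append(p.revenue)
    return {
        c: {"count": len(r), "total": sum(r), "median": median(r)}
        for c, r in buckets.items()
    }


def top_categories(stats: Dict[str, Dict[str, float]], n: int = 3) -> List[str]:
    """Category names sorted by total revenue, highest first."""
    return sorted(stats, key=lambda c: stats[c]["total"], reverse=True)[:n]


stats = revenue_by_category([
    Product(1000.0, ["saas", "developer-tools"]),
    Product(200.0, ["saas"]),
    Product(50.0, ["newsletter"]),
])
print(top_categories(stats, n=2))  # ['saas', 'developer-tools']
```

Because multi-category products are counted in every category they list, per-category totals can add up to more than the overall revenue; the median is often a more robust comparison since a single very successful product can dominate a category's total.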

Paweł Kobojek
Mateusz Buda

Are Most Rust Jobs In Crypto?

Rust is a relatively new programming language without an established and well-developed job market yet. Since it is developers' most loved language, though, many people would like to have a full-time job using it. There's a prevalent feeling among Rustaceans that most of the Rust jobs out there involve blockchain tech. Indeed, anyone who has ever looked for a Rust job has surely come across a cryptocurrency job offer. Is it really true that most offers are in crypto? We scraped job ads to find the answer!
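Once job ads are scraped, one simple way to estimate the crypto share is keyword matching on the ad text. This is only a sketch under assumptions: the `CRYPTO_KEYWORDS` list and the whole-word matching rule are illustrative choices, not the method used in the post, and a real analysis would need a curated list plus manual review of borderline ads.

```python
import re
from typing import Iterable

# Assumed keyword list for illustration; a real analysis needs a curated list
# and manual review of borderline ads.
CRYPTO_KEYWORDS = (
    "blockchain", "crypto", "cryptocurrency", "cryptocurrencies", "web3",
    "defi", "nft", "nfts", "ethereum", "solana", "smart contract", "smart contracts",
)

# Whole-word match, so e.g. "cryptography" is NOT counted as crypto.
_CRYPTO_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in CRYPTO_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_crypto(ad_text: str) -> bool:
    return _CRYPTO_RE.search(ad_text) is not None


def crypto_share(ads: Iterable[str]) -> float:
    """Fraction of job ads mentioning at least one crypto keyword."""
    ads = list(ads)
    if not ads:
        return 0.0
    return sum(is_crypto(a) for a in ads) / len(ads)


ads = [
    "Rust engineer to build DeFi protocols on Solana",
    "Embedded Rust developer for automotive firmware",
    "Maintainer of an open-source cryptography library in Rust",
]
print(crypto_share(ads))
```

The word boundaries matter for Rust specifically: many Rust jobs are in cryptography or security, and a naive substring match on "crypto" would misclassify them as cryptocurrency offers.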

Paweł Kobojek
Mateusz Buda

Execute JavaScript steps on scraped website

Today, we are pleased to introduce a long-awaited feature: JavaScript scenario execution! Many of our customers asked for the possibility of interacting with the scraped website: clicking buttons, filling out forms, selecting <select> options, etc. This is now possible with the Scraping Fish API without compromising our commitment to keeping our product as simple to use as possible.

Why do I need it?

In certain web scraping situations you might want to not only load a website but also, e.g., select an option, wait for data to be loaded and click a button which is only enabled after performing some action. Or maybe you need to fill out a form before the data you desire is available. In all these cases JavaScript scenario execution is your friend.

Paweł Kobojek
Mateusz Buda

The San Francisco Bay Area, due to its importance for the tech business and its numerous iconic landmarks, is a common destination for Airbnb guests. Have you ever wondered what the hosting landscape looks like there? What price should you expect to pay for a stay (or charge as a host)? These questions can be answered using web scraping and basic data analysis skills. We will show you how to effortlessly scrape Airbnb even though it uses JavaScript to render its content, and showcase some basic data exploration.

info

It's important to point out that we are scraping Airbnb as of May 2022. If Airbnb changes something on their website that we rely on, the code in this post may no longer work and will have to be adjusted. If you experience any problem, feel free to open an issue on GitHub and we will investigate it.

How to get the data? Web scraping to the rescue!

To the best of our knowledge, Airbnb by itself doesn't publish its offers in an easily digestible form (e.g. a csv file or an API). However, Airbnb is a publicly available website and you can query and browse offers without logging in. With the help of Scraping Fish API you don't need to be an expert in web scraping to easily get the data about Bay Area's Airbnb apartments and use it to gain the desired insights.
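For the "what price should you expect" question, a median with quartiles is a reasonable first summary, because listing prices tend to have a long tail of luxury stays that would inflate a plain mean. A minimal sketch, assuming the scraped listings have already been reduced to a hypothetical list of nightly prices in USD; `price_summary` is an illustrative name, and it relies only on the standard library's `statistics.quantiles`.

```python
from statistics import quantiles
from typing import Dict, List


def price_summary(nightly_prices: List[float]) -> Dict[str, float]:
    """Quartile summary of nightly prices (assumed to be USD per night)."""
    if len(nightly_prices) < 2:
        raise ValueError("need at least two prices")
    # quantiles() sorts the data itself; n=4 returns the three quartile cut points
    # using the default 'exclusive' method. The middle cut point is the median.
    q1, q2, q3 = quantiles(nightly_prices, n=4)
    return {"p25": q1, "median": q2, "p75": q3, "iqr": q3 - q1}


print(price_summary([100, 200, 300, 400, 500]))
# {'p25': 150.0, 'median': 300.0, 'p75': 450.0, 'iqr': 300.0}
```

The interquartile range (`iqr`) gives a guest or host a "typical" price band, e.g. half of the listings fall between `p25` and `p75`.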

Mateusz Buda
Paweł Kobojek

This blog post is a comprehensive tutorial for scraping public Instagram profile information and posts using Scraping Fish API. We will be scraping posts from a profile that lists old houses for sale to find the best deal.

We prepared an accompanying Python notebook, shared in the GitHub repository instagram-scraping-fish. To run it and actually scrape the data, you will need a Scraping Fish API key, which you can get here: Scraping Fish Requests Packs. A starter pack of 1,000 API requests costs only $2 and will let you run this tutorial and play with the API on your own ⛹️. Without a Scraping Fish API key, you are likely to get blocked instantly ⛔️.

info

It’s important to point out that we are using Instagram's private (undocumented) API for scraping and the code we share works as of August 2022. If Instagram changes something in their API that we rely on, this tutorial may no longer work and will have to be adjusted. If you experience any problem, feel free to open an issue on GitHub and we will investigate it.

Scraping use case

As an example to test Scraping Fish's capabilities for scraping Instagram, we will fetch and parse data from posts shared by the public profile Stare domy 🏚 (Old Houses). It is an aggregated listing of old houses for sale in Poland. Post descriptions in this profile provide fairly structured data about the property, including location, price, size, etc.
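Since the descriptions are fairly structured, regular expressions can go a long way toward extracting price and size. The sketch below is hypothetical: we do not know the exact wording of the Stare domy posts (they are written in Polish), so the assumed formats ("350 000 zł", "199000 PLN", "120 m2", "85,5 m²") and the `parse_description` helper are illustrations to adapt after inspecting real descriptions.

```python
import re
from typing import Optional, TypedDict


class Listing(TypedDict):
    price_pln: Optional[int]
    area_m2: Optional[float]


# Hypothetical formats: prices like "350 000 zł" or "199000 PLN",
# sizes like "120 m2" or "85,5 m²". Real post descriptions may use other forms.
_PRICE_RE = re.compile(r"\b(\d{1,3}(?:[ \u00a0]?\d{3})*)\s*(?:zł|zl|PLN)", re.IGNORECASE)
_AREA_RE = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*m(?:2|²)", re.IGNORECASE)


def parse_description(text: str) -> Listing:
    """Extract the first price and floor area found in a post description."""
    price = _PRICE_RE.search(text)
    area = _AREA_RE.search(text)
    return {
        # Drop thousands separators (regular or non-breaking spaces) before int().
        "price_pln": int(re.sub(r"\s", "", price.group(1))) if price else None,
        # Polish texts often use a decimal comma.
        "area_m2": float(area.group(1).replace(",", ".")) if area else None,
    }


print(parse_description("Dom 120 m2 na działce, cena 350 000 zł"))
# {'price_pln': 350000, 'area_m2': 120.0}
```

Price digits are matched in groups of three so that an unrelated number right before the price (such as the area) is not glued onto it; fields that cannot be found come back as `None` instead of raising, which keeps a batch of scraped posts processable.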