How IPs for web scraping are sourced

Real-life Web Scraping requires IP addresses. They can be obtained in different ways, some of which are clearly unethical. Unfortunately, a large number of proxy providers and Web Scraping APIs source their IPs unethically.

Below we'll show you how IPs may be sourced and why those approaches are unethical.

Why is it important?

Web Scraping is an industry in itself. People and businesses around the world make a living from it by creating real value and contributing to society. We find it important to keep it professional and respectful of the Internet's worldwide community. Ignoring the fact that many IPs in this field are used without the knowledge and consent of the people behind them contributes to the bad image of Web Scraping. A multitude of proxy and Web Scraping API providers do not care where their IPs come from, and they benefit from it.

How do we know about these practices?

This is neither a secret nor restricted knowledge. Part of the issue is that everyone knows about it but almost no one cares to oppose it. One of the credible sources on the topic is the research of Xianghang Mi and his colleagues. We refer to their research extensively and are thankful for their contribution to the scientific community.

Compromised IoT Devices

In the article Resident Evil: Understanding Residential IP Proxy as a Dark Service, the researchers checked residential proxy providers and analyzed over 6 million IP addresses. According to their study, some of the IPs were collected from compromised network or IoT devices such as routers or webcams.

These IPs cannot have been obtained with the users' consent, because the providers using them do not offer any software for such devices.

Mobile SDKs, Desktop SDKs, Browser Extensions

According to the article Your Phone is My Proxy: Detecting and Understanding Mobile Proxy Networks, some mobile proxy providers use mobile SDKs to gather their IPs in an unethical way. They approach mobile app developers (e.g. by email) and offer them payment in exchange for integrating their SDKs. An example amount is $500 per 10,000 users per month, which might be a lucrative opportunity for the owner of the app. After the SDK is integrated, the developer has no control over how the proxy provider uses their users' bandwidth. In theory, users of the app agree to be part of such a proxy pool because it is stated in the Terms of Service. In practice, however, they often don't do so knowingly.
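
To put the example rate into perspective, here is a minimal arithmetic sketch. The $500 per 10,000 users figure comes from the paragraph above; the function name and the user counts are hypothetical and only serve as an illustration, not as data about any real app or provider.

```python
# Illustrative arithmetic only: the rate is the example figure quoted above,
# and the user counts below are hypothetical.

def monthly_payout(active_users: int,
                   rate_usd: float = 500.0,
                   per_users: int = 10_000) -> float:
    """Return the monthly payment in USD for a given number of active users."""
    return active_users * rate_usd / per_users


if __name__ == "__main__":
    # Payout for a few hypothetical app sizes.
    for users in (10_000, 250_000, 1_000_000):
        print(f"{users:>9,} users -> ${monthly_payout(users):,.2f} per month")
    # What a single user's bandwidth is worth to the developer at this rate.
    print(f"per user: ${monthly_payout(1):.2f} per month")
```

At this rate, a single user's bandwidth is worth five cents a month to the app developer.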

A similar story can be told about browser extensions. Browser extension developers are emailed by proxy providers, just like mobile app developers, and offered money in exchange for their users' bandwidth.

We consider a “consent” buried in the ToS, without clearly telling users that their bandwidth is used as part of a proxy pool, to be disrespectful to those users.

Freemium, P2P VPNs

This is, again, a similar approach to getting IPs that preys on uninformed users. Some providers offer a free VPN service. In exchange, users of such a service allow some of their bandwidth to be used by the provider. Compared to other VPN services, these usually offer poor quality, e.g. no encryption, and all requests are tracked and logged.

As with the previous methods, in theory people accept that they provide bandwidth to the pool. In practice, however, they are not properly informed about this fact and what it means.

What About Web Scraping APIs?

Web Scraping APIs are a thin layer over proxies. Building a Web Scraping API is much easier than building a large, international proxy pool. Under the hood, however, these APIs usually use the same proxy providers, effectively supporting and reinforcing bad, unethical practices. This is further reflected in their pricing schemes, which are a direct consequence of the underlying proxy providers of questionable ethics.

At Scraping Fish, we have built our own proxy pool, without any shady tactics.

We do not rely on any proxy provider. We find this to be the only way to ensure you get ethically sourced IPs.

This also allows us to offer you transparent pricing.

Give Us a Try

Don't reinforce bad behavior in Web Scraping. Use trusted, ethically sourced proxies with our Web Scraping API.