Digital marketing professionals have long used data to make better-informed decisions. In a world where search engines are a massive source of data, incorporating data obtained from search engine results pages has become a mainstream business practice.
A common way for businesses to extract data from Google, Bing, Yahoo, and other search engines is scraping. This process gives businesses significant commercial benefits, but it can also raise certain legal concerns. In this article, we’ll break down some key aspects of scraping Google SERPs from a legal viewpoint.
- There are no precedents of Google suing businesses over scraping its results pages.
- Scraping Google SERPs isn’t a violation of the DMCA or the CFAA.
- However, sending automated queries to Google is a violation of its ToS.
- Violation of Google ToS is not necessarily a violation of the law.
- It’s highly unlikely that Google will find a technical way to block DataForSEO from collecting the necessary data in the near future.
- Using DataForSEO services is legal.
- Our customers don’t violate Google ToS or Webmaster Guidelines by using any of DataForSEO services.
What is web scraping?
In a broad sense, web scraping is a form of data scraping used for extracting data from websites. Here’s a brief overview of what data scraping is from Wikipedia:
“Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program.”
Scrapers can access website data either by “reading” the data that users see on a screen or by parsing the underlying HTML code. Because websites are designed to be easily accessible to users, it is technically simple to pull data from them.
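To make the idea of pulling data from the underlying HTML more concrete, here is a minimal sketch in Python. It uses only the standard library and parses a hard-coded HTML fragment instead of fetching a live page. The markup, the class names, and the `HeadlineParser` and `extract` helpers are all invented for illustration; real pages are structured differently, and this is not how any particular scraper works.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded page.
# The markup and class names are invented for this example.
SAMPLE_HTML = """
<html><body>
  <div class="result"><h3>First headline</h3><a href="https://example.com/a">link</a></div>
  <div class="result"><h3>Second headline</h3><a href="https://example.com/b">link</a></div>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text of every <h3> element and the target of every link."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.headlines = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            # Start a new headline; its text arrives via handle_data.
            self.in_h3 = True
            self.headlines.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open headline.
        if self.in_h3:
            self.headlines[-1] += data


def extract(html):
    """Return (headlines, links) found in the given HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines, parser.links


headlines, links = extract(SAMPLE_HTML)
print(headlines)
print(links)
```

The same pattern, applied to HTML downloaded from a real site, is roughly what a simple scraper does: it walks the markup and keeps only the pieces it cares about.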
Scraping has existed for decades and is often cited as a key concept underpinning the Internet. It is widely used in different branches of the web economy. To give you an example, search engines use scraping to index content available on the web, media outlets fetch data from blogs and other news websites, and digital marketers rely on data extracted by scraping to gauge different website performance metrics.
What’s wrong with web scraping?
The reputation of web scraping has deteriorated significantly over the last several years.
- First of all, scraped data is used by many businesses to gain an advantage over their competitors. Instead of building a dataset from scratch and spending lots of money in the process, why not just scrape data, add value to it, and sell something better to your customers? To give you an example, in 2011 Bing was caught red-handed copying Google’s search results.
- Secondly, companies that use web scraping often ignore the copyright of scraped data and the Terms of Service (ToS) of the resources they scrape it from. For instance, in April 2016 Getty Images filed a competition law complaint, accusing Google of scraping copyrighted content and using it in Google Images without prompting users to visit the original source website.
- Thirdly, web scrapers often send far more requests per second than a human would, creating a huge load on scraped sites. Scraping can potentially harm critical website infrastructure (a claim sometimes framed as “electronic trespass”) and breach its security measures. Back in 2000, eBay obtained a court injunction against Bidder’s Edge, preventing the latter from scraping data off of its pages. Bidder’s Edge was accessing eBay listings about 100,000 times a day, constituting about 1.53% of eBay’s total daily requests. Although it may seem like a relatively small number, it was big enough to imply electronic trespassing.
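To put the Bidder’s Edge figures in perspective, the short calculation below derives what they imply. The two inputs are the numbers quoted above; the per-second rate additionally assumes that requests were spread evenly across the day, which is a simplification made only for this sketch.

```python
# Figures quoted in the text for Bidder's Edge.
SCRAPER_REQUESTS_PER_DAY = 100_000
SCRAPER_SHARE = 0.0153  # 1.53% of eBay's total daily requests

SECONDS_PER_DAY = 24 * 60 * 60

# Total daily requests implied by the two figures above.
total_requests_per_day = SCRAPER_REQUESTS_PER_DAY / SCRAPER_SHARE

# Average scraper rate, assuming an even spread over 24 hours (a simplification).
scraper_requests_per_second = SCRAPER_REQUESTS_PER_DAY / SECONDS_PER_DAY

print(f"Implied total daily requests: {total_requests_per_day:,.0f}")
print(f"Average scraper requests per second: {scraper_requests_per_second:.2f}")
```

Under these assumptions, the figures imply roughly 6.5 million requests a day in total and, on average, a little more than one scraper request per second.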
Thousands of companies and individuals leverage web scraping. Search engines, for instance, rely on it to index content on the web, which generally benefits the owners of scraped websites. That, however, doesn’t mean that this technique is never used in an abusive manner or that it can’t create legal issues for users of scrapers. In the following paragraphs, we’ll discuss the most common legal issues of scraping and try to figure out whether scraping search engine (Google’s, in particular) results pages is legal.
Is web scraping legal?
Using web scraping or developing a scraper isn’t illegal in itself. For example, you may want to scrape content on your own website. However, problems are likely to arise if you scrape the content of somebody else’s website without permission. The risks boil down to three things: breaching the source’s ToS, violating copyright legislation (in the United States, the DMCA, or Digital Millennium Copyright Act), and violating the CFAA (Computer Fraud and Abuse Act).
1. Web scraping and CFAA
The Computer Fraud and Abuse Act (CFAA) was introduced back in 1986 as an anti-hacking measure that forbids unauthorized access to computers. It imposes a criminal penalty on “a party who intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains information from any protected computer.” CFAA-based claims are popular among data hosts because they open the door to criminal charges against scrapers.
In 2017 LinkedIn sent a cease-and-desist letter to hiQ Labs, a startup that relied on data scraped from the popular social network to provide analysis of publicly available user profiles. The letter warned that unauthorized access to the LinkedIn website was a violation of the CFAA. Instead of ceasing its operations, the team of hiQ took LinkedIn’s accusations directly to court, claiming that automated access to publicly available data isn’t a violation of the CFAA, while also asking the court to prohibit LinkedIn from “preventing hiQ’s access, copying, or use of public profiles on LinkedIn’s website (i.e., information that LinkedIn members have designated public).”
The court decided in favor of hiQ, allowing the company to scrape LinkedIn’s public, non-password-protected data. In this article we won’t dig deeply into the ruling; its full text is publicly available. It’s sufficient to note that although scraping in many cases breaks the ToS of the scraped website, it’s not necessarily a violation of the Computer Fraud and Abuse Act.
2. Web Scraping and DMCA
Copyright claims are often brought by data hosts against scrapers. In the United States, copyrighted work is protected by the Digital Millennium Copyright Act (DMCA).
Nonetheless, it’s widely known that facts alone can’t be copyrighted, so the DMCA and similar legislation won’t protect data hosts against scrapers unless they have full control over the copyright of the stored content. The point is that the transfer of copyright ownership generally requires a written agreement signed by the copyright owner. It’s true that “the creative selection, coordination and arrangement of information and materials forming a database or compilation may be protected by copyright” (cendi.gov). However, this protection doesn’t extend to the facts stored in the database. Put simply, copyright is meant to protect originality and creativity, not facts.
Let’s take a Google results page as an example. Its design and layout may be regarded as creative work and hence can be copyrighted. At the same time, facts contained in a results page, including headlines and snippets, are not owned by Google; in fact, they are being pulled from other people’s websites without transferring copyright ownership.
3. Web Scraping and ToS
Nowadays almost every well-established website has a Terms of Service (ToS) page, which – in most cases – prohibits scraping and crawling its content. Google isn’t an exception, with its ToS explicitly prohibiting “the sending of automated queries of any sort to our system without express permission in advance from Google.”
While complying with the Terms of Service is widely perceived as a must, the ban on scraping appears to be an exception. There are thousands of companies making use of scraping, including search engines, news aggregators, SEO software companies, etc. So, does it mean that all these companies are doing it illegally? And if so, are they getting sued over it?
Well, this is a rather grey area. On the one hand, by violating a website’s ToS, scrapers may also break the CFAA, which – as we already explained in the previous paragraphs – can result in criminal charges against them. On the other hand, there have been cases in which a court dismissed CFAA violation claims and ruled that people are authorized to access publicly available information (even if they might be scraping it).
For example, in the Craigslist v. 3Taps litigation, the court held 3Taps liable under the CFAA, but also acknowledged that scrapers are authorized to access publicly available data for as long as their access isn’t restricted by the companies that host the data. Such restrictions can include different measures, such as cease-and-desist letters, IP blocking, captchas, etc. However, all three are seldom considered legitimate access restrictions that implicate CFAA violations.
- Cease-and-desist letters are often cited as use restrictions, as opposed to access restrictions.
- IP blocking might be a good way to block a scraper from accessing data, but masking your IP address isn’t a crime. So, it’s only logical that switching IPs when scraping websites isn’t hacking and therefore can’t be deemed a CFAA violation.
- Captchas are not the same as passwords, so bypassing them isn’t a CFAA violation either.
The outcome of the hiQ v. LinkedIn case suggests that the mere violation of a website’s ToS might be a breach of contract, but doesn’t constitute a crime. What’s more, if we take a look at Google’s attitude towards violations of its Terms of Service, we can clearly see that the search engine has never taken any legal action against scrapers.
Nevertheless, there are other measures Google can resort to, including revoking access to its APIs.
- In 2012 the Google AdWords team warned Raven Tools that it could revoke the SEO software’s AdWords API token if the company didn’t comply with Google’s anti-scraping policy. Raven Tools was eventually forced to remove its SERP Reporting feature, which was using data scraped from Google’s results pages to provide ranking and keyword data to its customers.
- By the same token – and also in 2012 – the team of Moz lost access to the AdWords API. The company eventually came up with a workaround, substituting clickstream data for AdWords data.
At the same time, however, what started as Google’s crackdown on SEO software developers’ access to the AdWords API ended up being an inconsistent series of ToS enforcement actions. What’s more, to date, the search engine continues to avoid full-scale enforcement of its anti-scraping policies.
The bottom line
Although scraping is legal in itself, it’s possible for data hosts to bring legal claims against scrapers, including CFAA and DMCA violation claims.
On the other hand, the outcomes of recent lawsuits filed against scrapers show that there are a lot of grey areas in current legislation on this matter, and courts may rule in favor of open access to publicly available information (see the Sandvig v. Sessions ruling). And even though data hosts may prevail against scrapers in court, it’s often against their interest to sue. For example, if it weren’t for crawling public websites and scraping data from them, Google probably wouldn’t even exist.
In short, we don’t violate the CFAA or the DMCA when scraping publicly available information from Google SERPs. You can rest assured that using DataForSEO services is legal. By the same token, you can’t break Google’s ToS simply by getting data through our APIs.
Disclaimer: This article does not constitute legal advice. You should seek the counsel of an attorney on your specific matter to comply with the laws in your jurisdiction.