Is Scraping Google SERPs Legal?
Digital marketing professionals have long used data to make better-informed decisions. In a world where search engines are a massive source of data, incorporating data obtained from search engine results pages has become a mainstream business practice.
A common way for businesses to extract data from Google, Bing, Yahoo, and other search engines is scraping. This process gives businesses significant commercial benefits, but it can also raise legal concerns. In this article, we’ll break down some key aspects of scraping Google SERPs from a legal viewpoint.
TL;DR
- There are no precedents of Google suing businesses over scraping its results pages.
- Scraping Google SERPs isn’t a violation of the DMCA or the CFAA.
- However, sending automated queries to Google is a violation of its ToS.
- A violation of Google’s ToS is not necessarily a violation of the law.
- It’s highly unlikely that Google will find a technical way to block DataForSEO from collecting the necessary data in the near future.
- Using DataForSEO services is legal and doesn’t violate the law.
- Our customers don’t violate Google’s ToS or Webmaster Guidelines by using any DataForSEO services.
What is web scraping?
In a broad sense, web scraping is a form of data scraping used for extracting data from websites. Here’s a brief overview of what data scraping is from Wikipedia:
“Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program.”
Scrapers can access website data either by “reading” the data that users see on a screen or from the underlying HTML code. Since websites are designed to be easily accessible by users, it is technically simple to pull data from them.
Scraping has existed for decades and is often cited as a key concept underpinning the Internet. It is widely used in different branches of the web economy. To give you an example, search engines use scraping to index content available on the web, media outlets fetch data from blogs and other news websites, and digital marketers rely on data extracted by scraping to gauge different website performance metrics.
Here at DataForSEO, we use web scraping to provide SEO software developers, digital marketers, and researchers with up-to-date and accurate SEO data.
What’s wrong with web scraping?
The reputation of web scraping has deteriorated significantly over the last several years.
- First of all, many businesses use scraped data to gain an advantage over their competitors. Instead of building everything from scratch and spending lots of money in the process, why not just scrape data, add value to it, and sell something better to your customers? To give you an example, in 2011 Bing was caught red-handed copying Google’s search results.
- Secondly, companies that use web scraping often ignore the copyright of scraped data and the Terms of Service (ToS) of the resources they scrape it from. For instance, in April 2016 Getty Images filed a competition law complaint, accusing Google of scraping copyrighted content and using it in Google Images without prompting users to visit the original source website.
- Thirdly, web scrapers often send far more requests per second than a human would, creating a huge load on scraped sites. Scraping can potentially harm critical website infrastructure and breach its security measures, which is sometimes called “electronic trespass.” Back in 2000, eBay obtained a court injunction against Bidder’s Edge, preventing the latter from scraping data off its pages. Bidder’s Edge was accessing eBay listings about 100,000 times a day, constituting about 1.53% of eBay’s total daily requests. Although it may seem like a relatively small number, it was big enough to imply electronic trespassing.
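To put the Bidder’s Edge figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes the 100,000 daily requests were spread evenly over 24 hours (the article does not say how they were distributed), and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day: float) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY


def implied_total(requests_per_day: float, share: float) -> float:
    """Total daily requests implied by one client's count and its share."""
    return requests_per_day / share


# Figures quoted above: about 100,000 requests a day, about 1.53% of the total.
scraper_requests = 100_000
scraper_share = 0.0153

rate = average_rate(scraper_requests)
total = implied_total(scraper_requests, scraper_share)

print(f"Average scraper rate: {rate:.2f} requests per second")
print(f"Implied total daily requests: {total:,.0f}")
```

Under these assumptions, the scraper averaged a little over one request per second against an implied total of roughly 6.5 million requests a day, which shows how a share that looks small can still represent a steady, measurable load.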
Thousands of companies and individuals leverage web scraping. Search engines, for instance, rely on it to index content on the web, which generally benefits the owners of scraped websites. That, however, doesn’t mean that this technique isn’t being used in an abusive manner or that it won’t create legal issues for users of scrapers. In the following paragraphs, we’ll discuss the most common legal issues of scraping and try to figure out whether scraping search engine results pages (Google’s, in particular) is legal.
Is web scraping legal?
Using web scraping or developing a scraper isn’t illegal in itself. For example, you may want to scrape content on your own website. However, problems are likely to arise if you try scraping the content of somebody else’s website without permission. The main risks boil down to breaking the source’s ToS, violating copyright legislation (e.g., the DMCA, or Digital Millennium Copyright Act), and violating the CFAA (Computer Fraud and Abuse Act).
1. Web scraping and CFAA
The Computer Fraud and Abuse Act (CFAA) was introduced in 1986 as an anti-hacking measure that forbids unauthorized access to computers. It imposes a criminal penalty on “a party who intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains information from any protected computer.” CFAA-based claims are popular among data hosts because they allow criminal charges to be pressed against scrapers.
In 2017, LinkedIn sent a cease-and-desist letter to hiQ Labs, a startup that relied on data scraped from the popular social network to provide analysis of publicly available user profiles. The letter warned that unauthorized access to the LinkedIn website was a violation of the CFAA. Instead of ceasing its operations, hiQ took the dispute directly to court, claiming that automated access to publicly available data isn’t a violation of the CFAA, while also asking the court to prohibit LinkedIn from “preventing hiQ’s access, copying, or use of public profiles on LinkedIn’s website (i.e., information that LinkedIn members have designated public).”
The court decided in favor of hiQ, allowing the company to scrape LinkedIn’s public, non-password-protected data. We won’t dig deeply into the ruling in this article; the full text is publicly available. It’s sufficient to note that although scraping in many cases breaks the ToS of the scraped website, it’s not necessarily a violation of the Computer Fraud and Abuse Act.
Since information on Google SERPs is also publicly available and not password-protected, DataForSEO doesn’t break the CFAA by scraping it.
2. Web Scraping and DMCA
Copyright claims are often brought by data hosts against scrapers. In the United States, copyrighted work is protected by the Digital Millennium Copyright Act (DMCA).
Nonetheless, it’s widely known that facts alone can’t be copyrighted, so the DMCA and similar legislation won’t protect data hosts against scrapers unless they have full control over the copyright of the stored content. The point is that the transfer of copyright ownership generally requires a written agreement signed by the copyright owner. It’s true that “the creative selection, coordination and arrangement of information and materials forming a database or compilation may be protected by copyright” (cendi.gov). However, this protection doesn’t extend to the facts stored in the database. Put simply, copyright is meant to protect originality and creativity, not facts.
Let’s take a Google results page as an example. Its design and layout may be regarded as creative work and hence can be copyrighted. At the same time, facts contained in a results page, including headlines and snippets, are not owned by Google; in fact, they are pulled from other people’s websites without transferring copyright ownership.
Given that results on Google SERPs are not protected by copyright, it’s only logical that DataForSEO doesn’t violate the DMCA by scraping them.
3. Web Scraping and ToS
Nowadays almost every well-established website has a Terms of Service (ToS) page, which in most cases prohibits scraping and crawling its content. Google isn’t an exception, with its ToS explicitly prohibiting “the sending of automated queries of any sort to our system without express permission in advance from Google.”
While complying with the Terms of Service is widely perceived as a must, the ban on scraping appears to be an exception. Thousands of companies make use of scraping, including search engines, news aggregators, and SEO software companies. So, does it mean that all these companies are doing it illegally? And if so, are they getting sued over it?
Well, this is a rather grey area. On the one hand, by violating a website’s ToS, scrapers may also break the CFAA, which, as explained in the previous paragraphs, can result in criminal charges against scrapers. On the other hand, there have been cases in which a court dismissed CFAA violation claims and ruled that people are authorized to access publicly available information (even if they might be scraping it).
For example, in the Craigslist v. 3Taps litigation, the court held 3Taps liable under the CFAA, but also acknowledged that scrapers are authorized to access publicly available data for as long as their access isn’t restricted by the companies that host the data. Such restrictions can include different measures, such as cease-and-desist letters, IP blocking, and captchas. However, these three are seldom considered legitimate access restrictions that implicate CFAA violations.
- Cease-and-desist letters are often cited as use restrictions, as opposed to access restrictions.
- IP blocking might be a good way to block a scraper from accessing data, but masking your IP address isn’t a crime. So, it’s only logical that switching IPs when scraping websites isn’t hacking and therefore can’t be deemed a CFAA violation.
- Captchas are not the same as passwords, so bypassing them isn’t a CFAA violation either.
The outcome of the hiQ v. LinkedIn case suggests that the mere violation of a website’s ToS might be a breach of contract but doesn’t constitute a crime. What’s more, if we take a look at Google’s attitude towards violations of its Terms of Service, we can see that the search engine has never taken legal action against scrapers.
Nevertheless, there are other measures Google can resort to, including revoking access to its APIs.
- In 2012 the Google AdWords team warned Raven Tools that it might revoke the SEO software’s AdWords API token if the company didn’t comply with Google’s anti-scraping policy. Raven Tools was eventually forced to remove its SERP Reporting feature, which used data scraped from Google’s results pages to provide ranking and keyword data to its customers.
- Similarly, also in 2012, Moz lost access to the AdWords API. The company eventually came up with a workaround, replacing AdWords data with clickstream data.
At the same time, however, what started as Google’s crackdown on SEO software developers’ access to the AdWords API ended up being an inconsistent series of ToS enforcement actions. What’s more, to date, the search engine continues to avoid full-scale enforcement of its anti-scraping policies.
Here at DataForSEO, we’ve taken all the necessary measures to ensure that our clients keep receiving the data they need, regardless of how Google chooses to enforce its Terms of Service.
The bottom line
Although scraping is legal in itself, it’s possible for data hosts to bring legal claims against scrapers, including CFAA and DMCA violation claims.
On the other hand, the outcomes of recent lawsuits filed against scrapers show that there are a lot of grey areas in current legislation on this matter, and courts may stand in favor of open access to publicly available information (see the Sandvig v. Sessions ruling). And even though data hosts may prevail against scrapers in court, it’s often against their interest to sue. For example, if it weren’t for crawling public websites and scraping data from them, Google probably wouldn’t even exist.
After all, we don’t violate the CFAA or the DMCA when scraping publicly available information from Google SERPs. You can rest assured that using DataForSEO services is legal and doesn’t violate the law. By the same token, you can’t break Google’s ToS simply by getting data through our APIs.
Disclaimer: This article does not constitute legal advice. You should seek the counsel of an attorney on your specific matter to comply with the laws in your jurisdiction.