Three biggest legal cases about data scraping
The legality of web scraping is still a grey area. Many businesses forbid scraping in their ToS, but these terms aren’t enforceable in many places. Data scraping businesses need to remain alert and be aware of the implications of legal cases. Let’s review the three biggest legal cases that shape the current web data scraping ecosystem.
Data is valuable, and many businesses try to get their hands on it. One of the most common ways to acquire data is web scraping, which uses bots to automatically access and gather publicly available data. Many websites and services forbid scraping in their ToS, but whether this is enforceable varies around the world. This is why we want to review three seminal cases that shape the current web data scraping ecosystem.
Craigslist vs 3Taps
The case between Craigslist and 3Taps set a number of precedents regarding the legality of data scraping, as well as the right of businesses to deny access to publicly available data. It involved three companies: Craigslist, 3Taps, and PadMapper.
Craigslist is a website and platform that allows users to post classified ads in a variety of categories. Users can and do advertise just about anything on Craigslist. One of the key things that Craigslist is used for is finding housing and spare rooms.
PadMapper is a business that aggregates housing ads and enables users to search for available housing near them. In order to achieve this, PadMapper scrapes data from a variety of different sources. In 2012, Craigslist was one of the many sites that PadMapper was scraping for data, using it to produce a map that showed where available rooms were. Another business, 3Taps, was also scraping Craigslist for data as part of their usual operation, which involves gathering large amounts of data from publicly available datasets.
Both PadMapper and 3Taps stood to benefit financially from the data that they scraped from Craigslist, albeit indirectly. However, the data that was scraped was publicly available and Craigslist had made no attempt to restrict access to it. Craigslist founder Craig Newmark had already written an article lamenting the existence of services that strained Craigslist’s own resources, remarking that, “we take issue with only services which consume a lot of bandwidth”. In June of 2012, Craigslist sent a cease-and-desist letter to PadMapper, insisting that it stop scraping real estate listings from the site.
Craigslist blocked the IP addresses of both PadMapper and 3Taps from accessing the website, effectively cutting off their access to Craigslist’s data. It was Craigslist’s contention that this action constituted restricting access to the data. In other words, Craigslist took the position that, once it had blocked the IP addresses of these businesses, their access to data that was otherwise publicly available should be treated as if that data were password-protected.
Under the Computer Fraud and Abuse Act (CFAA), accessing data without authorization is illegal. It is not illegal to access publicly available data, so the case centered on whether Craigslist could use the act to prevent individual users from accessing otherwise public data. This is a significant question for the entire data scraping industry. In 2012, the industry was still at a relatively early stage; it has evolved considerably over the last seven years.
3Taps used proxy services to bypass the IP block and continue scraping data. PadMapper then began to access Craigslist data through 3Taps. Craigslist then sued both businesses for unlawful behavior, claiming a violation of the CFAA as well as infringement of Craigslist’s copyright. 3Taps denied that it was in violation of the CFAA, arguing that the data in question was publicly available, meaning that everybody was a de facto authorized user. It also argued that Craigslist’s cease-and-desist letter set limits on how data could be used but did not restrict access. Finally, it claimed that enforcing a vague access restriction would have negative consequences, opening the door to other businesses abusing their ability to arbitrarily revoke access to data.
The court sided with Craigslist, affirming that the IP block and the cease-and-desist letter could each be regarded as sufficient notice of access revocation under the CFAA. The case was ultimately settled out of court. Craigslist was paid $1 million, which was donated to the Electronic Frontier Foundation, an organization that had been very critical of Craigslist for pursuing the case.
The precedent set by this case is that if a website blocks your IP address, continued access to their servers through a proxy or VPN could be considered a violation of the CFAA and be classified as unauthorized data access. Clearly, this would be an untenable position for many scraping businesses.
LinkedIn vs hiQ
LinkedIn’s dispute with hiQ Labs, a Silicon Valley data scraping business much like 3Taps, echoes the above case. It revolves around whether LinkedIn can prevent the startup from accessing data that is publicly available across LinkedIn.
Just like Craigslist, LinkedIn sent a cease and desist letter to hiQ, demanding that they immediately halt their scraping of data from LinkedIn’s server. They also claimed that the scraping was a violation of the CFAA, as well as the Digital Millennium Copyright Act.
hiQ responded by filing its own suit against LinkedIn, asking the court to grant an injunction while the case between the two companies was decided. The court granted the injunction, and LinkedIn was forced to allow hiQ access to its servers until the case had been decided. LinkedIn appealed unsuccessfully against the injunction and the case is still pending.
This case hasn’t been decided, but the fact that the court has not simply deferred to the Craigslist vs 3Taps case suggests that the legal system recognizes how the landscape has changed. Data scraping today is very different from seven years ago.
Ryanair vs PR Aviation
This case was argued in the European Court of Justice (ECJ) but involved much the same situation as the two cases above. PR Aviation enabled users to compare flight prices and was scraping Ryanair’s servers for data. Ryanair had argued that the scraping was a violation of its ToS, as well as a copyright infringement. Unlike the US courts, the ECJ was swift in arriving at its decision.
The case centered on whether Ryanair could impose restrictions on access to its publicly available database or whether the database would be covered by the Database Directive. The court ruled that the owners of publicly available databases did have the right to impose their own access restrictions. It would be for national courts to litigate ToS enforcement and decide whether a database was covered by the directive or not.
This means that in the EU, many owners of public databases are allowed to impose their own access restrictions.