Web scraping is the process of using bots to extract information from a website. In recent years, the debate over web scraping has become increasingly complex as business intelligence and data privacy issues have arisen.
Web scraping has been practiced almost as long as there have been websites. To be fair, there is “good” web scraping that is, in fact, part of the foundation of the internet. Here are some examples of “good” web scraping:
- Legitimate search engine crawlers crawl websites to index, analyze and rank their content.
- Price comparison sites deploy bots to automatically gather product prices and descriptions from affiliated seller websites, allowing consumers to compare prices of goods and services and make more informed purchasing choices.
- Market research companies use web scrapers to mine data from forums and social media to gauge public opinion (i.e., report on “what’s trending”).
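To make the price comparison case concrete, here is a minimal sketch of the extraction step such a bot might perform. It uses only Python's standard library and runs against an inline sample page rather than a live site. The markup, the `name` and `price` class names, and the `PriceParser` and `extract_prices` helpers are all assumptions made for illustration; real seller pages vary widely and usually need a more robust parser.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from a hypothetical product listing."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) tuples
        self._field = None   # "name" or "price" right after such a tag opens
        self._current = {}   # fields gathered for the product in progress

    def handle_starttag(self, tag, attrs):
        # Assumed markup: <span class="name"> and <span class="price">.
        css_class = dict(attrs).get("class")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = data.strip()
        self._field = None
        if "name" in self._current and "price" in self._current:
            # Assumes prices are written like "$19.99".
            price = float(self._current["price"].lstrip("$"))
            self.products.append((self._current["name"], price))
            self._current = {}


def extract_prices(html):
    """Return the (name, price) pairs found in an HTML string."""
    parser = PriceParser()
    parser.feed(html)
    return parser.products


# Inline sample page standing in for a seller's product listing.
SAMPLE_PAGE = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$19.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

print(extract_prices(SAMPLE_PAGE))
```

The same few lines of parsing serve a price comparison site and a competitor looking to undercut a seller equally well, which is why the intent behind a scraper matters more than its mechanics.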
The good part of the web scraping story, however, ends here. Bad bots retrieve content from a website with the intention of using it for purposes beyond the control of the site owner. According to the Imperva Bad Bots Report 2022, they accounted for 27.7% of all web, mobile and API traffic, an increase of 2.1% over the previous year. Apart from web scraping, cyber criminals use bad bots to conduct various harmful activities, including denial of service attacks, competitive data mining, online fraud, account takeover, data theft, intellectual property theft, unauthorized vulnerability scans, spam and digital ad fraud.
Malicious actors misuse web scraping in two main ways: undercutting prices to gain an unfair competitive advantage, and stealing copyrighted content and intellectual property. The question remains: is it illegal?
The case of LinkedIn and hiQ Labs
In the summer of 2017, LinkedIn sued hiQ Labs, a San Francisco-based startup. hiQ scrapes publicly available LinkedIn profiles to offer clients, according to its website, “a crystal ball that helps you identify skill gaps or turnover risks months in advance.”
The idea that your public LinkedIn profile could be used against you by your employer is quite troubling. However, on August 14, 2017, a judge ruled in hiQ’s favor. Judge Edward Chen of the U.S. District Court in San Francisco accepted hiQ’s claim that Microsoft-owned LinkedIn violated antitrust laws by blocking the startup from accessing that data. He ordered LinkedIn to remove the barriers within 24 hours. LinkedIn appealed.
The decision runs counter to previous court rulings that favored cracking down on web scraping. It also raises myriad questions about the privacy of social media users and the right of businesses to protect themselves against data breaches. There is also the issue of fairness. LinkedIn has spent years creating something of real value. Why should it have to hand that over to hiQ, paying for servers and bandwidth to host all that bot traffic on top of its own human users, just so hiQ can surf LinkedIn?
What’s the verdict on web scraping?
As discussed here, the legality of web scraping remains unsettled, and website owners continue to pursue legal action to prevent their sites from being scraped. While the courts work to determine the legality of web scraping, your data could be stolen and your website’s business logic abused. Instead of relying on legal remedies to overcome this technological challenge, consider solving it with advanced bot protection and anti-scraping technology today.
*** This is a syndicated blog from the Security Bloggers Network of Blog written by Bruce Lynch. Read the original post at: https://www.imperva.com/blog/is-it-illegal-to-scrape-a-website-for-content/