Web Scraping and Bots

23 November 2020


A few years ago, LinkedIn denied hiQ, a data analytics company, access to publicly available LinkedIn member profiles. hiQ uses automated bots to “scrape” information that LinkedIn members include on their public profiles and sells the collected data to its business clients.

On September 9, 2019, the United States Court of Appeals for the Ninth Circuit ruled in favour of hiQ, holding that LinkedIn could not deny the web scraping company such access. The court held that LinkedIn members who post their information publicly have little expectation of privacy in it and intend it to be accessed by others, including for commercial purposes. However, many lawsuits challenging the practice of web scraping are still pending in other courts in the United States.

But what is web scraping?

According to Govind Kumar Chaturvedi, Trademark Attorney at LS Davar & Co, web scraping is a way of gathering data from third-party websites, in which a crawler or bot visits different websites and collects data from them.

“It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis,” he says. “This is done without the approval of the person concerned or the website, leading to an infringement of the right to privacy.”
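The copying Chaturvedi describes can be illustrated with a minimal sketch that uses only the Python standard library. It assumes the bot has already downloaded a public page whose profile cards use the hypothetical class names `profile`, `name` and `title`. The sketch extracts those fields and copies them into CSV text, which stands in for the spreadsheet. No real site's markup is implied. Fetching pages, and any permission check, is deliberately left out.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a public page the bot has already fetched.
SAMPLE_PAGE = """
<div class="profile"><span class="name">A. Example</span><span class="title">Analyst</span></div>
<div class="profile"><span class="name">B. Sample</span><span class="title">Engineer</span></div>
"""


class ProfileParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="title"> inside profile cards."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per profile card found
        self._field = None   # which field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "profile":
            self.rows.append({"name": "", "title": ""})
        elif tag == "span" and cls in ("name", "title") and self.rows:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only text inside a tracked <span> is copied; whitespace between tags is ignored.
        if self._field:
            self.rows[-1][self._field] += data.strip()


def scrape_to_csv(html: str) -> str:
    """Extract the profile fields from the page and copy them into CSV text."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "title"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_PAGE))
```

Each profile card becomes one row. A real crawler would repeat this across many pages and append the rows to a central store, which is the "later retrieval or analysis" step in the quote.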

The issue is especially significant during the pandemic, because scraping is a useful way to gather COVID-19 data from different government websites, while notifying the websites the data is taken from.

“The main challenge remains to set up a framework requiring permission before data is taken from a third-party website for commercial purposes, so that it does not lead to a loss of privacy for individuals,” says Chaturvedi. “This can be done by putting certain safeguards on these websites, such as blocking bots, as they have identifiable signatures, using CSS sprites for the data hosted on social media platforms, and blocking specific IP addresses.”
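Two of the safeguards Chaturvedi mentions, blocking bots by their identifiable signatures and blocking specific IP addresses, can be sketched as a simple request filter. This is a minimal Python sketch, not any platform's actual defence. The signature strings, the blocked address range (a reserved documentation range) and the function name `should_block` are all hypothetical.

```python
import ipaddress

# Hypothetical blocklists; a real site would maintain and update its own.
BLOCKED_AGENT_SIGNATURES = ("python-requests", "scrapy", "curl")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # reserved documentation range


def should_block(user_agent: str, client_ip: str) -> bool:
    """Return True if the request carries a known bot signature or comes from a blocked IP range."""
    agent = (user_agent or "").lower()
    # Safeguard 1: block clients whose User-Agent contains a known bot signature.
    if any(sig in agent for sig in BLOCKED_AGENT_SIGNATURES):
        return True
    # Safeguard 2: block specific IP addresses (compare only like-for-like IP versions).
    addr = ipaddress.ip_address(client_ip)
    return any(addr.version == net.version and addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(should_block("python-requests/2.31", "198.51.100.7"))  # bot signature
    print(should_block("Mozilla/5.0", "203.0.113.9"))            # blocked address
    print(should_block("Mozilla/5.0", "198.51.100.7"))           # allowed
```

A signature check on its own is easy to evade, since a bot can send any User-Agent string it likes, which is one reason it is usually combined with IP-based blocking.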

He adds that this has been a major issue in the USA, EU and other countries.

“However, what I can predict is that job market growth will be affected, as headhunting will take place,” he says. “It will lead to data reports being generated more quickly and, most of all, it will be used to generate new leads for businesses, giving them more people to sell their products to, which could mean more unwanted calls for consumers.”

 

Excel V. Dyquiangco


