
After five years of lost lawsuits, the trillion-strong crawler army is ready to move

2022-06-11 13:04:00 Deep learning and python

Compiled by | Tina, Nuka Cola

The Internet crawler war will never end.

This is a landmark ruling in the history of the fight over web crawlers. This Monday, a US court ruled in the case brought by data analytics company HiQ against LinkedIn, upholding the original judgment and holding that collecting personal data from public websites is entirely legal.

LinkedIn is a professional social platform owned by Microsoft. Users can create personal profiles on the LinkedIn website, including their educational background, work experience, skills, and other information. HiQ is a data analytics company that crawls public data from LinkedIn, organizes and analyzes it, and sells the results to interested businesses.

Although LinkedIn holds the data, the data itself is supplied to LinkedIn by its users. In the age of big data, some Internet platforms have accumulated large amounts of user data and built a resource advantage on it: in competition with other Internet companies and platforms, the more user data a platform has, the more useful it becomes, the easier it is to attract more users, and the stronger its position. This snowball effect is why Internet companies often treat data as a core competitive asset.

After two or three rounds of litigation, the public interest prevailed

Before this case, anyone who visited the LinkedIn website could access this data. In response to HiQ's long-running crawling of its site, LinkedIn sent HiQ a cease-and-desist letter that cited the Computer Fraud and Abuse Act (CFAA).

In 2017, HiQ struck first. As plaintiff, it sued LinkedIn for using legal, technical, and other means to stop it from copying the public profiles of LinkedIn users, and it asked the court for a temporary injunction.

Although HiQ ran web crawlers against the LinkedIn website, the US court held that this crawling was not against the law. The data on the LinkedIn website is public, and for public data, crawling should be permitted by law even when it disregards the robots protocol (robots.txt) that the site has set up.

It is like walking into an unlocked store during business hours and looking around, which cannot be treated as trespassing. The court therefore not only declined to find HiQ's crawling illegal, but went further and found that LinkedIn's anti-crawler measures were themselves unlawful.

The judge hearing the case granted HiQ a preliminary injunction, barring LinkedIn from interfering with HiQ's data crawling while the case was pending. The judge held that the Computer Fraud and Abuse Act, which makes it a crime to access a protected computer “without authorization” or in a way that “exceeds authorized access”, does not apply to HiQ's collection of public data from the LinkedIn website.

Facing this setback, LinkedIn chose to appeal. In 2019, the court of appeals upheld the lower court's 2017 decision in HiQ v. LinkedIn, finding that web crawling does not amount to “unauthorized access to a protected computer”. LinkedIn chose to appeal again. But two years later, the Ninth Circuit still sided with HiQ and sent the case back to the district court for the Northern District of California.

LinkedIn was naturally not satisfied with this and took the case to the US Supreme Court. In March 2020, LinkedIn asked the Supreme Court to review the Ninth Circuit's decision. The company argued that using technical measures to block web crawling while also sending a cease-and-desist letter should be treated as a normal authorization mechanism. In practice, LinkedIn, as a Microsoft-owned social media site, has long tried to keep its on-site content from being viewed directly by outsiders, yet it does not want to cut itself off from search engines by closing itself off too tightly.

In their petition to the Supreme Court, LinkedIn's lawyers wrote: “Under the Ninth Circuit's ruling, unless a website is locked entirely behind passwords, any company that chooses to make part of its website public, including online retailers such as Ticketmaster and Amazon and even social networking platforms such as Twitter, will be exposed to invasive crawlers deployed in bulk.”

“And once a website chooses to lock itself behind passwords, it can no longer be indexed by search engines, so people will be unable to find its information through the main channel for accessing information on the Internet.”

On June 3, 2021, in another similar case, Van Buren v. United States, the US Supreme Court narrowed the scope of the CFAA. Nathan Van Buren was a police officer in Georgia who was authorized to search computer records of license plates for law enforcement purposes. He was caught in an FBI sting after searching those records for personal purposes (at the request of an FBI informant who offered to pay thousands of dollars for the information). A US court ultimately sentenced him to 18 months in prison. The statute has been criticized for failing to clearly define “without authorization” and “exceeds authorized access”.

In Van Buren, the Supreme Court said that merely violating terms of service does not meet the CFAA's “exceeds authorized access” condition. However, the Court still gave no clear answer on whether a credential-based barrier is the only thing that can make access “without authorization”.

Two weeks later, the Supreme Court sent HiQ v. LinkedIn back to the Ninth Circuit, asking it to re-examine the scope of the CFAA in light of Van Buren. In the end, although the court of appeals took the Van Buren precedent into account, it upheld its original judgment from two years earlier.

The Ninth Circuit wrote in its ruling: “One basic feature of public websites is that their publicly visible sections are not subject to access restrictions; in other words, those sections are open to any visitor with a web browser.”

“That is, if the computers hosting public pages are thought of as houses, a public website never had a ‘gate’ installed in the first place, so there is naturally no gate to raise or lower. Van Buren therefore reinforces our conclusion that the concept of ‘without authorization’ does not apply to public websites.”

But the court's ruling did not settle the dispute between HiQ and LinkedIn. It simply barred LinkedIn from continuing to interfere with HiQ's collection of data from its public website, and it declined to support a CFAA claim against HiQ's analytics business. The real core issues of the case, such as unfair competition and privacy violations, remain unresolved.

In an emailed statement, a LinkedIn spokesperson said the company would not drop the lawsuit and would continue to seek a reasonable outcome in court. “We are disappointed with the result, but this is only a preliminary ruling and the case is far from over. We will continue to work to protect LinkedIn members, and in particular their ability to control their personal information on the site.”

The impact of this case

Data scraping is now widely used across society, not only commercially but also in academic research and other fields, so the judgment in this case has drawn great attention. The ruling has been praised by American media, which see the Ninth Circuit's decision as “a major victory” for archivists, scholars, researchers, and journalists.

The case also prompted some discussion of the contested questions of data ownership and privacy. From the Ninth Circuit's perspective, its ruling supports the view that users own the data: the platform only uses the data as its users authorize, and cannot claim all of it as its own.

On Reddit, users mocked the LinkedIn spokesperson's explanation for continuing the lawsuit: “Such an explanation, if not absurd, is at least presumptuous; the users who provide the data never get anything back from the platform”, “The claim of protecting customers' privacy is exaggerated”, “Who would still believe that such an explanation means anything?”, and so on.

On the other hand, data scraping is also an important part of the modern Internet ecosystem. According to Akamai's statistics, nearly 40% of global Internet traffic is generated by crawlers. In the second quarter of 2021, crawler attacks worldwide reached 70 billion, up 15% year on year. The US court ruling also means that, from now on, 10 billion crawlers scraping the public information of online retailers and social networking platforms will be acting in accordance with US law.

Chinese and US law differ, so use crawler technology with care

Perhaps because data matters so much, disputes over data have been endless in recent years, both in China and abroad. In China, there is no shortage of disputes caused by improper crawling. A lawyer at DeHeng Law Offices published the article 《Crawling Your Way into “Unfair Competition” Comes at a High Price》, which reports that a keyword search for “crawler” and related terms in the PKULaw (Beida Fabao) database turned up 49 crawler-related cases since 2016. Most of them are criminal cases, involving copyright infringement, illegal business operation, infringement of citizens' personal information, fraud, extortion, and similar offenses. The rest are civil and commercial cases, mainly copyright and unfair competition disputes.

One typical case is Dianping v. Baidu.

In 2016, Baidu used large numbers of crawlers to scrape review content from Dianping and displayed it in Baidu Maps, and Dianping later took Baidu to court. The court found that Baidu's conduct violated recognized business ethics and the principle of good faith, and constituted unfair competition.

In the second-instance judgment in Dianping v. Baidu, the judge made it clear: “In a free and open market economic order, business resources and business opportunities are scarce, and operators' rights and interests do not receive the same strength of protection as statutory property rights. Operators must tolerate, to an appropriate degree, damage that results from competition. In this case, the protected interest claimed by Hantao is not an absolute right, and damage to it does not necessarily mean that it should receive legal relief. As long as the other party's competitive conduct is itself legitimate, that conduct is not reprehensible.”

Although technology is neutral, its application has boundaries. At present, platforms' ownership of data cannot be clearly defined, so determining legal liability remains relatively complex. As a result, as Internet technology has developed, the word “crawler” has gradually taken on a derogatory tone in the Chinese context.

For programmers who write web crawlers, scraping data that should not be scraped can mean breaking the law. The existence of the quip “Write your crawler well, and you will be eating prison food early” shows that crawler technology calls for caution. With a platform like LinkedIn, there are generally two ways to obtain public data: use a crawler/scraper (free but risky) or use the API (not free, but safe). If you must use this public data, choose carefully.
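As one concrete precaution, a crawler can consult a site's robots protocol before fetching a page. The following is only a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example-research-bot` user agent, and the `example.com` URLs are all hypothetical, and passing this check says nothing about whether a given crawl is lawful in any jurisdiction.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical crawler name, used only for illustration.
USER_AGENT = "example-research-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the parsed rules allow USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
public_ok = may_fetch(parser, "https://example.com/profile/123")
private_ok = may_fetch(parser, "https://example.com/private/data")
print(public_ok, private_ok)
```

In a real crawler the rules would come from the target site's own robots.txt (for example via `RobotFileParser.set_url` and `read`), and the check would sit alongside rate limiting and a review of the site's terms of service and the applicable law.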

Reference link :

https://www.theregister.com/2022/04/19/scraping_public_data_linkedin/

《Where Is the Boundary of Data Scraping?》: http://rmfyb.chinacourt.org/paper/html/2020-03/19/content_166271.htm?div=-1

《Crawling Your Way into “Unfair Competition” Comes at a High Price》: http://www.dehenglaw.com/CN/tansuocontent/0008/023370/7.aspx?MID=0902
