{"id":40478,"date":"2023-08-29T12:01:31","date_gmt":"2023-08-29T16:01:31","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=40478"},"modified":"2023-08-29T12:01:31","modified_gmt":"2023-08-29T16:01:31","slug":"web-scraping-for-me-but-not-for-thee","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2023\/08\/29\/web-scraping-for-me-but-not-for-thee\/","title":{"rendered":"Web Scraping for Me, But Not for Thee"},"content":{"rendered":"<p><a href=\"https:\/\/blog.ericgoldman.org\/archives\/2023\/08\/web-scraping-for-me-but-not-for-thee-guest-blog-post.htm\">Kieran McCarthy<\/a> (via <a href=\"https:\/\/mas.to\/@carnage4life\/110956475708818415\">Dare Obasanjo<\/a>):<\/p>\n<blockquote cite=\"https:\/\/blog.ericgoldman.org\/archives\/2023\/08\/web-scraping-for-me-but-not-for-thee-guest-blog-post.htm\"><p>Some of the biggest companies on earth&mdash;including Meta and Microsoft&mdash;take aggressive, litigious approaches to prohibiting web scraping on their own properties, while taking liberal approaches to scraping data on other companies&rsquo; properties.\n\nWhen we talk about web scraping, what we&rsquo;re really talking about is data access. All the world&rsquo;s knowledge is available for the taking on the Internet, and web scraping is how companies acquire it at scale. But the question of who can access and use that data, and for what purposes, is a tricky legal question, which gets trickier the deeper you dig.<\/p><p>[&#8230;]<\/p><p>But make no mistake, these companies view this data, generated by their users on their platforms, as <em>their<\/em> property. This is true even though the law does not recognize that they have a property interest in it, and even though they expressly disclaim any property rights in that data in their terms of use.<\/p><p>Since the law does not give them a cognizable property interest in this data, they must resort to other legal theories to prevent others from taking it and using it.<\/p><\/blockquote>\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/08\/14\/zoom-tos-allowed-training-ai-on-user-content-with-no-opt-out\/\">Zoom ToS Allowed Training AI on User Content With No Opt Out<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/07\/28\/the-mess-at-stack-overflow\/\">The Mess at Stack Overflow<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/06\/08\/apollo-shutting-down-june-30th\/\">Apollo Shutting Down June 30th<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/07\/03\/twitter-now-requires-logging-in\/\">Twitter Now Requires Logging In<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/04\/10\/twitter-restricts-substack-links\/\">Twitter Restricts Substack Links<\/a><\/li><li><a href=\"https:\/\/mjtsai.com\/blog\/2018\/03\/19\/cambridge-analytica-harvested-50-million-facebook-profiles\/\">Cambridge Analytica Harvested 50 Million Facebook Profiles<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Kieran McCarthy (via Dare Obasanjo): Some of the biggest companies on earth&mdash;including Meta and Microsoft&mdash;take aggressive, litigious approaches to prohibiting web scraping on their own properties, while taking liberal approaches to scraping data on other companies&rsquo; properties. When we talk about web scraping, what we&rsquo;re really talking about is data access. All the world&rsquo;s knowledge [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2023-08-29T16:01:34Z","apple_news_api_id":"dcf51a19-e228-4210-a382-7c19bf52da46","apple_news_api_modified_at":"2023-08-29T16:01:34Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/A3PUaGeIoQhCjgnwZv1LaRg","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[313,167,25,51,209,436,2137,37,2361,343,1365,49,96],"class_list":["post-40478","post","type-post","status-publish","format-standard","hentry","category-technology","tag-bing","tag-copyright","tag-facebook","tag-google","tag-legal","tag-linkedin","tag-meta","tag-microsoft","tag-openai","tag-search","tag-trademark","tag-twitter","tag-web"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40478","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=40478"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40478\/revisions"}],"predecessor-version":[{"id":40479,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/40478\/revisions\/40479"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=40478"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=40478"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=40478"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}