{"id":46984,"date":"2025-03-07T10:40:59","date_gmt":"2025-03-07T15:40:59","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=46984"},"modified":"2025-03-07T10:40:59","modified_gmt":"2025-03-07T15:40:59","slug":"private-github-data-lingers-in-copilot-training","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2025\/03\/07\/private-github-data-lingers-in-copilot-training\/","title":{"rendered":"Private GitHub Data Lingers in Copilot Training"},"content":{"rendered":"<p><a href=\"https:\/\/techcrunch.com\/2025\/02\/26\/thousands-of-exposed-github-repositories-now-private-can-still-be-accessed-through-copilot\/\">Carly Page<\/a>:<\/p>\n<blockquote cite=\"https:\/\/techcrunch.com\/2025\/02\/26\/thousands-of-exposed-github-repositories-now-private-can-still-be-accessed-through-copilot\/\">\n<p>Security researchers are warning that data exposed to the internet, even for a moment, can linger in online generative AI chatbots like Microsoft Copilot long after the data is made private.<\/p>\n<p>[&#8230;]<\/p>\n<p>Lasso co-founder Ophir Dror told TechCrunch that the company found content from its own GitHub repository appearing in Copilot because it had been indexed and cached by Microsoft&rsquo;s Bing search engine. Dror said the repository, which had been mistakenly made public for a brief period, had since been set to private, and accessing it on GitHub returned a &ldquo;page not found&rdquo; error.<\/p>\n<p>[&#8230;]<\/p>\n<p>Lasso extracted a list of repositories that were public at any point in 2024 and identified the repositories that had since been deleted or set to private. Using Bing&rsquo;s caching mechanism, the company found more than 20,000 since-private GitHub repositories still had data accessible through Copilot, affecting more than 16,000 organizations.<\/p>\n<\/blockquote>\n<p>Any passwords or keys that were ever made public, however briefly, should be revoked. 
However, other sensitive information may now be cached as well, and it was not obvious to me that it would be accessible via Copilot when it doesn&rsquo;t show up in Bing.<\/p>\n\n<p>Previously:<\/p>\n<ul>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2025\/01\/03\/openai-failed-to-deliver-opt-out-tool\/\">OpenAI Failed to Deliver Opt-out Tool<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2024\/07\/01\/microsofts-suleyman-on-ai-scraping\/\">Microsoft&rsquo;s Suleyman on AI Scraping<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2024\/06\/24\/ai-companies-ignoring-robots-txt\/\">AI Companies Ignoring Robots.txt<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2024\/05\/21\/slack-ai-privacy\/\">Slack AI Privacy<\/a><\/li>\n<li><a href=\"https:\/\/mjtsai.com\/blog\/2023\/02\/24\/chatgpt-is-ingesting-corporate-secrets\/\">ChatGPT Is Ingesting Corporate Secrets<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Carly Page: Security researchers are warning that data exposed to the internet, even for a moment, can linger in online generative AI chatbots like Microsoft Copilot long after the data is made private. 
[&#8230;] Lasso co-founder Ophir Dror told TechCrunch that the company found content from its own GitHub repository appearing in Copilot because it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2025-03-07T15:41:01Z","apple_news_api_id":"0aef1076-f8e2-4cff-ba75-2233d263e4e6","apple_news_api_modified_at":"2025-03-07T15:41:02Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/ACu8QdvjiTP-6dSIz0mPk5g","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[2],"tags":[1351,313,2717,524,355,48,96],"class_list":["post-46984","post","type-post","status-publish","format-standard","hentry","category-technology","tag-artificial-intelligence","tag-bing","tag-copilot-ai","tag-github","tag-privacy","tag-security","tag-web"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/46984","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=46984"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/46984\/revisions"}],"predecessor-version":[{"id":46985,"href":"https:\
/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/46984\/revisions\/46985"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=46984"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=46984"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=46984"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}