Rosyna Keller:
I wholly and utterly believe in the principle behind Apple’s App Tracking Transparency initiative. I therefore consider anything that is both
[…]
While Apple has fixed 3-4 (search for my name) of the 21 privacy bugs (and one kernel panic) I reported, Apple decided they weren’t eligible for the bug bounty.
[…]
When I first reported OE11020806152810, it was almost immediately closed as “Not to be fixed”. I had to gently poke a few bears to get it back to “we’ll fix this.”
However, Apple never assigned a CVE while reluctantly fixing this serious bug/privacy leak.
Previously:
App Tracking Transparency Apple Security Bounty iOS iOS 18 iOS 26 Privacy Security
Mike Isaac (PDF):
Practically overnight, a class of companies like SerpApi — known as “data scrapers” — found a new business selling data scraped from Google to companies looking to train their A.I. chatbots.
On Wednesday, the internet message board Reddit decided to fight the data scrapers. It filed a lawsuit in the U.S. District Court for the Southern District of New York claiming that four companies had illegally stolen its data by scraping Google search results in which Reddit content appeared.
Three of those companies — SerpApi; a Lithuanian start-up, Oxylabs; and a Russian company, AWMProxy — sold data to A.I. companies like OpenAI and Meta, according to the lawsuit. The fourth company, Perplexity, is a San Francisco start-up that makes an A.I. search engine.
Via John Gruber (Mastodon):
The entire premise of their business is crazy. SerpApi prints the crime right on the tin, describing their service as a “Google Search API” and “Scrape Google and other search engines from our fast, easy, and complete API.” What makes this so crazy is that Google doesn’t offer a search API. SerpApi is offering the Google search API that Google itself doesn’t offer, and charging companies money for it. Everyone, upon hearing the premise and nature of SerpApi, asks the same question: How is this legal? The answer is, it probably isn’t. But right on SerpApi’s home page they claim to offer customers a “U.S. Legal Shield”[…]
[…]
Why Google hasn’t sued them yet, I don’t understand.
This is a weird case. SerpApi is not like Common Crawl, building an index by scraping the Web. It’s scraping Google search results. Google actually does have legal access to scrape Reddit. And SerpApi is probably right that there’s First Amendment protection for indexing public search results, just as there is for indexing other public content. But, obviously, they’re trying to get at the Reddit data without paying to license it, and maybe the means for doing this violate the DMCA. On the one hand, hiring a hitman is illegal; you don’t get a legal shield by contracting out the crime. On the other hand, it’s not exactly clear to me which step of this chain is illegal, especially if Google seems not to object. Whatever the result, I expect it to have far-reaching consequences for the Web.
Mike Masnick:
Reddit is NOT arguing that these companies are illegally scraping Reddit, but rather that they are illegally scraping… Google (which is not a party to the lawsuit) and in doing so violating the DMCA’s anti-circumvention clause, over content Reddit holds no copyright over. And, then, Perplexity is effectively being sued for linking to Reddit.
[…]
And, incredibly, within their lawsuit, Reddit defends its arguments by claiming it’s filing this lawsuit to protect the open internet. It is not. It is doing the exact opposite.
[…]
Reddit has a license to the content users post in order to operate the service, but they don’t hold the copyright on it. Indeed, Reddit’s terms state clearly that users retain “any ownership rights you have in Your content.” Because of Reddit’s agreement that it can license content, the deal with Google could sorta squeeze under that term, but that doesn’t give Reddit the right to then sue over users’ copyrights (as it’s doing in this case).
[…]
But here, Reddit is doing something even crazier. Because it’s saying that since these companies (allegedly) get around Google’s technological measures, then somehow Reddit can accuse them of violating 1201.
Nick Heer:
I am glad Masnick wrote about this despite my disagreement with his views on how much control a website owner ought to have over scraping. This is a necessary dissection of the suit, though I would appreciate views on it from actual intellectual property lawyers. They might be able to explain how a positive outcome of this case for Reddit would have clear rules delineating this conduct from the ways in which artificial intelligence companies have so far benefitted from a generous reading of fair use and terms of service documents.
Jeff Johnson:
OpenAI is blatantly ignoring my robots.txt:

User-agent: ChatGPT-User
Disallow: /
ClaudeBot too, apparently.
John Gruber (Mastodon):
At the bottom of their “Use Cases” page, SerpApi lists the following companies and organizations as customers (“They trust us. You are in good company. Join them.”)
[…]
Was Apple removed from the list because they’re no longer (or never were?) a customer, or because they remain a customer but don’t want to be listed?
Previously:
Apple Intelligence Artificial Intelligence ChatGPT Claude Copyright Digital Millennium Copyright Act (DMCA) Google Google Search Lawsuit Legal Perplexity Reddit Web Web Crawlers
Howard Oakley:
There’s a bug in Spotlight that can prevent it from indexing any of the contents of susceptible text files. This has been present since macOS 13 Ventura if not before, and is still present in Tahoe 26.0.1.
[…]
To demonstrate this bug, all you need is a single UTF-8 plain text file, created by TextEdit or any other app capable of saving plain text. Start the text with the two characters L and G, both in capitals.
[…]
This isn’t the first bug in the RichText.mdimporter. In macOS Catalina 10.15.6, the same mdimporter (then build 319.60.100) introduced a bug that broke indexing of Rich Text (RTF) files.
Drew:
The same thing happens for ‘HPA’. I suspect this might have something to do with the magic entry for Arhangel archive data (/usr/share/file/magic/archive; see also HPA archive data), or something that is trying to make an equivalent check. Notice that ‘file’ reports such a text file as being ‘Arhangel archive data’.
Like Oakley, I wouldn’t expect this to matter. Spotlight uses the file extension to determine the UTI (and therefore the importer), rather than using “magic” to look at the contents of the file. But it appears the problem is occurring after that and that the importer itself is using “magic.”
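The repro can be sketched in the shell. The misidentification Drew describes comes from libmagic’s two-byte “LG” signature for the Arhangel archive format; the path used below is just an example:

```shell
# Create a UTF-8 plain-text file whose first two bytes are "LG",
# the prefix Oakley found that stops Spotlight indexing the contents.
printf 'LG content that Spotlight will refuse to index\n' > /tmp/lg-test.txt

# libmagic carries a two-byte signature ("LG" at offset 0) for the old
# Arhangel archive format, so content sniffing misidentifies the file.
file /tmp/lg-test.txt

# On macOS, a test import shows which importer and UTI Spotlight picks
# (macOS-only, so commented out here):
# mdimport -t -d2 /tmp/lg-test.txt
```

Per Drew’s report, `file` labels the text file as Arhangel archive data rather than ASCII text; exact output varies with the installed magic database.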
Howard Oakley:
What happens is that saving a text file starting with forbidden characters correctly triggers Spotlight’s indexing service. That identifies the file as having the UTI public.plain-text and hands it over for its contents to be indexed. But the indexer inspects those first few characters, decides it’s a different type of file altogether, and promptly returns an error 4864 for an NSCoderReadCorruptError without going any further.
[…]
It turns out that files starting with the characters Draw were characteristic of a binary vector graphics format used by the !Draw app for RISC OS 2 in 1989. Rather than believing the file’s UTI for one of the most common types of files in macOS, Spotlight’s indexer therefore decided that it was trying to import file data that must now be as rare as hens’ teeth, and wouldn’t go any further.
Previously:
Bug Mac macOS 13 Ventura macOS 14 Sonoma macOS 15 Sequoia macOS Tahoe 26 Spotlight Uniform Type Identifier