Archive for July 25, 2024

Thursday, July 25, 2024

SearchGPT

Kylie Robison (Hacker News):

OpenAI is announcing its much-anticipated entry into the search market, SearchGPT, an AI-powered search engine with real-time access to information across the internet.

The search engine starts with a large textbox that asks the user “What are you looking for?” But rather than returning a plain list of links, SearchGPT tries to organize and make sense of them. In one example from OpenAI, the search engine summarizes its findings on music festivals and then presents short descriptions of the events followed by an attribution link.

[…]

Publishers will have a way to “manage how they appear in OpenAI search features,” the company writes. They can opt out of having their content used to train OpenAI’s models and still be surfaced in search.

Update (2024-07-26): Juli Clover:

SearchGPT is available to a small group of users and publishers at the current time, with OpenAI seeking feedback on the product. The prototype is temporary at the current time, but the "best" of the features will be integrated into ChatGPT in the future.

Only Google Can Crawl Reddit

Emanuel Maiberg (Hacker News):

Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user generated content exclusive to the internet’s already dominant search engine. If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week.

DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that “We would like to show you a description here but the site won't allow us.” Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.

Simon Willison:

Is this a direct result of Google’s deal to license Reddit content for AI training, rumored at $60 million? That’s not been confirmed but it looks likely, especially since accessing that robots.txt using the Google Rich Results testing tool (hence proxied via their IP) appears to return a different file, via this comment, my copy here.

As he says, this is depressing.

Dare Obasanjo:

The pay-to-play internet is here. […] This pretty much kills any chance of disrupting Google with AI as they can outspend everyone on content exclusivity.

Sriram Karra:

“Pay to play” arrived years ago… Just that folks were not paying attention.

Microsoft did this with GitHub. You haven’t been able to find any GitHub responses in Google searches for years.

Update (2024-08-08): Nick Heer:

It is unclear to me whether this is a deal only available to Google, or if it is open to any search engine that wants to pay. Even if it was intended to be exclusive, I have a feeling it might not be for much longer. But it seems like something Reddit would only care about doing with Google because other search engines basically do not matter in the United States or worldwide. What amount of money do you think Microsoft would need to pay for Bing to be the sole permitted crawler of Reddit in exchange for traffic from its measly market share? I bet it is a lot more than $60 million.

Maybe that is one reason this agreement feels uncomfortable to me. Search engines are marketed as finding results across the entire web but, of course, that is not true: they most often obey rules declared in robots.txt files, but they also do not necessarily index everything they are able to, either. These are not explicit limitations. Yet it feels like it violates the premise of a search engine to say that it will be allowed to crawl and link to other webpages. The whole thing about the web is that the links are free. There is no guarantee the actual page will be freely accessible, but the link itself is not restricted. It is the central problem with link tax laws, and this pay-to-index scheme is similarly restrictive.

[…]

The government attorneys said Bing is required to pay for structured data owing to its smaller size, while Google is able to obtain structured data for free because it sends partners so much traffic. The judge ultimately rejected their argument that Microsoft struggled to sign these agreements or it was impeded in doing so, but did not dispute the difference in negotiating power between the two companies.

Emanuel Maiberg:

Microsoft and Reddit are offering conflicting explanations for why Microsoft’s search engine, Bing, is currently blocked from crawling Reddit and offering links from the site in its search results.

Reddit, which now demands payment from anyone crawling the site and using its data to train AI products, claims that Bing’s crawler is being used to power AI products. Microsoft claims it has made it easy for any site to block its crawler that’s used for AI products, while still allowing a crawler that is only used for search results, and that Reddit’s decision to block Bing is “impacting competition” in the search engine space.

The conflicting reasonings behind the block are further proof that the massive, indiscriminate scraping of the internet to create AI training data in a way that violates long-respected norms about how to access information on the web is eroding trust, making the internet less open, and causing tech companies to beef about this issue in public.

Apple Commits to Opening NFC in EU

Tim Hardwick:

The European Union has accepted commitments from Apple to open its mobile payments system and give competitors access to the iPhone's NFC technology, bringing an end to a lengthy investigation by EU regulators into the technology.

According to the announcement, Apple has agreed to open up its payments system to other providers free of charge for a decade. Apple will let users set a third-party wallet app as their default, rather than its own Apple Wallet. It will also allow rivals full access to key iOS features, such as double click to launch wallet apps, along with Face ID, Touch ID, and passcodes for authentication.

As John Siracusa says, it’s unclear what this will mean in practice. Maybe the APIs will be unexpectedly limited or Apple will stonewall or reject apps that attempt to use them. And what happens after 10 years?

Swift’s AnyObject

Jordan Rose:

You can also use AnyObject as a constraint on protocols: protocol MyDelegate: AnyObject. Now the implementers are known to have reference semantics, and with T: MyDelegate you can have weak references to T, as before. You can even have weak references to any MyDelegate, allowing swapping between delegates of different types.

What you might run into, though, is that any MyDelegate is not itself AnyObject.

[…]

Because it carries more information than just a single object reference: it also has a “witness table” pointer, the run-time representation of a protocol conformance.

[…]

But wait, Objective-C never had this problem! The id <MyDelegate> type doesn’t take up more than a single-object-reference to store! But that’s because ObjC protocols aren’t represented as tables of methods; they’re just promises that the implementing class has methods with particular names.
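
To make the distinction concrete, here is a minimal sketch of my own (not from Rose's post). MyDelegate is the name used in the quote. Logger, Downloader, and WeakBox are hypothetical names I made up for illustration, and the any syntax assumes Swift 5.6 or later. It shows a weak reference to any MyDelegate, a generic constrained to AnyObject, and a size comparison between a plain object reference and the existential.

```swift
// Sketch only: Logger, Downloader, and WeakBox are hypothetical names.
// MyDelegate is the protocol name used in the quoted text.
protocol MyDelegate: AnyObject {
    func didFinish()
}

final class Logger: MyDelegate {
    var count = 0
    func didFinish() { count += 1 }
}

final class Downloader {
    // `weak` is allowed because MyDelegate is constrained to AnyObject.
    weak var delegate: (any MyDelegate)?
    func finish() { delegate?.didFinish() }
}

// A generic box that requires its parameter to be a class type.
struct WeakBox<T: AnyObject> {
    weak var value: T?
    init(_ value: T) { self.value = value }
}

let logger = Logger()
let downloader = Downloader()
downloader.delegate = logger
downloader.finish()

// Fine: Logger is a concrete class, so it satisfies T: AnyObject.
let box = WeakBox(logger)

// Per the quoted text, the existential is not itself AnyObject, so this
// line is expected to be rejected by the compiler if uncommented:
// let bad = WeakBox<any MyDelegate>(logger)

// Compare storage sizes: a plain object reference versus the existential,
// which also carries a witness table pointer.
let referenceSize = MemoryLayout<Logger>.size
let existentialSize = MemoryLayout<any MyDelegate>.size
print("reference:", referenceSize, "existential:", existentialSize)
```

I would expect the existential to take two pointers' worth of storage and the plain reference one, which matches the "witness table" explanation above, though the exact numbers depend on the platform.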

Books for iPad Gets the Photos Treatment

Federico Viticci:

So, uhm, the UI changes to the Books app for iPad are pretty concerning…?

The app went from having a rich sidebar in iPadOS 17 with sections and collections always available to a simplified layout where sections are hidden away in a popover. Less flexible and discoverable than before.

Does Apple want to make iPad apps less desktop class now?
