Siri Super Bowl Regression
Paul Kafasis (MacRumors, TidBITS):
With the absolute most charitable interpretation, Siri correctly provided the winner of just 20 of the 58 Super Bowls that have been played. That’s an absolutely abysmal 34% completion percentage.
[…]
At its worst, it got an amazing 15 in a row wrong (Super Bowls XVII through XXXII). Most amusingly, it credited the Philadelphia Eagles with an astonishing 33 Super Bowl wins they haven’t earned, to go with the one they have.
[…]
Inexplicably, for this one lone Super Bowl, Siri offered to search the web or use ChatGPT.
John Gruber (Mastodon, Bluesky, Hacker News):
Other answer engines handle the same questions with aplomb. I haven’t run a comprehensive test from Super Bowls 1 through 60 because I’m lazy, but a spot-check of a few random numbers in that range indicates that every other ask-a-question-get-an-answer agent I personally use gets them all correct. I tried ChatGPT, Kagi, DuckDuckGo, and Google. Those four all even fare well on the arguably trick questions regarding the winners of Super Bowls 59 and 60, which haven’t yet been played.
[…]
New Siri — powered by Apple Intelligence™ with ChatGPT integration enabled — gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It’s also inconsistently wrong — I tried the same question four times, and got a different answer, all of them wrong, each time. It’s a complete failure.
[…]
What makes Siri’s ineptitude baffling is that ChatGPT is Siri’s much-heralded partner for providing “world knowledge” answers. Siri with Apple Intelligence is so bad that it gets the answer to this question wrong even with the ostensible help of ChatGPT, which when used directly gets it perfectly right. And Siri-with-ChatGPT seemingly gets it wrong in a completely different way, citing different winners and losers (all wrong) each time.
[…]
But it’s even worse than that, because old Siri, without Apple Intelligence, at least recognizes that Siri itself doesn’t know the answer and provides a genuinely helpful response by providing a list of links to the web, all of which contain accurate information pertaining to the question. Siri with Apple Intelligence, with ChatGPT integration enabled, is a massive regression.
The regression is notable, though I still care far more about Siri’s failures in responding to basic commands—music and audio control, creating reminders, manipulating timers—than about its lack of world knowledge.
It’s also funny that Siri gives a warning about checking ChatGPT’s answers for mistakes, when it falls back to asking ChatGPT.
Because looking at the results: Siri is WAY worse than ChatGPT.
For comparison, this is where Samsung and Google are now, in a released product.
[John Giannandrea] Scorecard: Hired in 2018 to be the head of AI + ML + Siri.
AI = F. Entirely missed this decade’s innovation.
Siri = F. Has gotten WORSE. All search, including Spotlight, is further behind.
ML = B. Autocorrect was trash for 2 years. Photos search is on par. Music ML is total trash. Camera ML is on par. Siri suggests are unused. Photo Clean Up is industry-leading. Maps routing is on par.
The headline, of course, is meant to be provocative. But I’m also not sure it’s in Betteridge’s Law territory. Because I’m not sure that Apple shouldn’t consider outsourcing their AI layer on the assistant front to a third party, at least temporarily while Siri is brought up to speed.
I worked, fortunately briefly, in Apple’s AI/ML organization.
It was difficult to believe the overhead, inefficiency, and cruft. Status updates in a wiki page tens of thousands of words long in tables too large and ill-formatted for anyone to possibly glean. Several teams clamoring to work on the latest hot topic for that year’s WWDC (in my year it was “privacy-preserving ML”). At least four or five teams that I knew of.
They have too much money and don’t want to do layoffs because they’re afraid of leaks, so they just keep people around forever doing next to nothing, since it’s their brand and high-margin hardware that drives the business. It was baked into the Apple culture to “go with the flow”, a refrain I heard many times, which I understood to mean stand by and pretend to be busy while layers of bureaucracy obscure the fact that a solid half of the engineers could vanish to very little detriment.
Apple Inc. executive Kim Vorrath, a company veteran known for fixing troubled products and bringing major projects to market, has a new job: whipping artificial intelligence and Siri into shape.
Via John Gruber:
My sense is that it’s less about Siri and Apple Intelligence being more important than VisionOS, and more about Siri being a mess. More about urgency than importance. But perhaps it’s both more urgent and more important long-term. Either way, assigning Vorrath — perhaps Apple’s best fixer, and without question one of Apple’s best fixers — makes sense.
1 Comment
One irony is that LLMs are bad for searches and facts, but weirdly helpful for fact-checking and discovering ‘unknown unknowns’. Mike Caulfield has been working on this front recently, to great effect:
https://substack.com/home/post/p-154544059
As an aside, I really wish that instead of over-hiring for ‘AI’, Apple instead over-hired for OS and software engineering… at least then we’d have fewer bugs!