Turmoil Behind Siri
Late last year, a trio of engineers who had just helped Apple modernize its search technology began working on the type of technology underlying ChatGPT, the chatbot from OpenAI that has captivated the public since it launched last November. For Apple, there was only one problem: The engineers no longer worked there. They had left the company to work on the technology, known as large language models, at Google.
A new report from The Information today goes in-depth on the apparent chaos inside teams at Apple working on Siri and artificial intelligence. According to the story, “organizational dysfunction and a lack of ambition” have plagued Apple’s efforts to improve Siri and the backend technology that powers it.
This dysfunction has left Apple falling further and further behind competitors like OpenAI, Microsoft, and Google, leading some Apple employees to question Siri's future.
Today’s report is based on “interviews with more than three dozen former Apple employees who worked in its AI and machine learning groups.” The report follows a similar story from The New York Times earlier this month, which explained how Siri is built on a “clunky” database, meaning it can take “weeks” for Siri to be updated with “basic features.”
Apple’s virtual assistant is apparently “widely derided” inside the company for its lack of functionality and minimal improvement over time.
[…]
Apple executives are said to have dismissed proposals to give Siri the ability to conduct extended back-and-forth conversations, claiming that the feature would be difficult to control and gimmicky. Apple’s uncompromising stance on privacy has also created challenges for enhancing Siri, with the company pushing for more of the virtual assistant’s functions to be performed on-device.
This is weird because I think the main problem with Siri is not the missing sophisticated stuff like conversations but that the basics remain unreliable. And for all the talk of on-device Siri, basic tasks like creating reminders still require network access, and tasks like controlling audio playback, which should definitely run on-device without any sophisticated AI, are still incredibly slow and buggy.
Apple’s design team repeatedly rejected the feature that enabled users to report a concern or issue with the content of a Siri answer, preventing machine-learning engineers from understanding mistakes, because it wanted Siri to appear “all-knowing.”
[…]
Most recently, the group working on Apple’s mixed reality headset was reportedly disappointed by the demonstrations provided by the Siri team on how the virtual assistant could control the headset. At one point in the device’s development, the headset team considered building an alternative method for controlling the device using voice commands because Siri was deemed to be unsatisfactory.
Maybe all that is true. But what I cannot understand is why anyone would think users would want to have a conversation with Siri, when many would probably settle for a version of that basic database association schema working correctly.
[…]
It is not the case that Siri is failing to understand what I am asking it to do. Rather, it is faltering at simple hurdles and functioning as an ad for other Apple services. I would be fine with Siri if it were a database that performed reliably and predictably, and excited for the possibilities of one fronted by more capable artificial intelligence. What I am, though, is doubtful — doubtful that basic tasks like these will become meaningfully better, instead of a different set of bugs and obstacles I will need to learn.
Previously:
- GPT-4
- HomePod Late Adopter
- The Voice Assistant Battle 2023
- Siri’s 10-Year Anniversary
- The Disappointment of On-Device Siri
- Why Apple Believes It’s an AI Leader
- Apple Hires John Giannandrea
- What Went Wrong With Siri
Update (2023-05-03): John Gordon:
Why can’t Siri give me the Apple Music playlist I created? […] I just want Siri to do the simple things it weirdly can’t do. I don’t need Siri to be ChatGPT.
5 Comments
> Maybe all that is true. But what I cannot understand is why anyone would think users would want to have a conversation with Siri, when many would probably settle for a version of that basic database association schema working correctly.
Tell me you are disconnected from the world, without telling me you are disconnected from the world.
"Users" would want to have a conversation with Siri for the same reason "users" want to have that conversation with ChatGPT, Bing Chat and Bard; namely, technical or philosophical intrigue, wanting to explore the edges of the technology, trying to break it, etc. Microsoft was well aware this will happen, and they still created the product, because any news is good news for Bing. At this point, the same is true about Siri, the most mocked of all virtual assistants. It's an embarrassing product for Apple, much more than Bing was an embarrassing product for Microsoft. Now, Bing is leading the conversation for LLM products. Imagine that. But Nick Heer over here, is unable to "understand" simple human nature.
> "Users" would want to have a conversation with Siri for the same reason "users" want to have that conversation with ChatGPT, Bing Chat and Bard; namely, technical or philosophical intrigue, wanting to explore the edges of the technology, trying to break it, etc.
Fair point!
Perhaps I was not clear — that is on me — but what I was attempting to explain was the vast gulf between what Siri is marketed as doing today and the complaints I see and hear in the real world. Those are things which may not be solved by making Siri more conversational. This rationale — "intrigue", as you put it — is interesting, but not necessarily what many people want or expect out of Siri. Bing may be more interesting than it ever has been, but it isn't moving the needle on search (https://gs.statcounter.com/search-engine-market-share).
If Siri magically becomes a conversational tour de force, but is still incapable of reliably messaging someone, it is a more curious product and will make for some nice demos. But it is not a better digital assistant.
> But Nick Heer over here […]
Ay.
> Apple’s design team repeatedly rejected the feature that enabled users to report a concern or issue with the content of a Siri answer, preventing machine-learning engineers from understanding mistakes, because it wanted Siri to appear “all-knowing.”
I can kind of see why Apple mistakenly does this, but why does everyone else in the industry? Google has this problem horribly everywhere from search to photos, with no way to report problems, and Amazon is similar. I assume there's some theory along the lines of being able to data-mine metrics like repeat queries, but there seems to be a pervasive belief that letting users give feedback is bad, even though it's one of the most useful data sources an ML engineer has available.
The idea of being able to tell Siri to pause audiobooks while doing yard work outside in the winter (gloves) and out of cellular/Wi-Fi range seemed really great until I tried it and found it very unreliable. That's a feature I'd like fixed.
As to LLMs, I understand Apple's reluctance. I feel the same way. People are going to believe LLMs' hallucinations, and that will be bad. Remember all the people who got lost, got stuck, or drove into rivers in the early days of SatNavs? The same thing will happen. Many people trust machines more than themselves. And then they'll believe that if it takes 5 hours to dry 5 shirts in the sun, it will take 30 hours to dry 30 shirts, just as the LLM tells them. An LLM passes the bar exam just like a book on passing the bar exam passes the bar exam. It's not what people think, but it'll probably cause a lot of disruption until normies grok that they've been sold a pipe dream.
@OUG Agreed with the first paragraph. Quite simply, even the basics don't work. I don't mind having a "layer" to get me access to some LLM, but the point is that the functionality comes first. And, I dunno, an LLM could easily help with that, e.g., better parsing, like S-GPT, where part of the solution relies on "general" knowledge that would be difficult to extract in a more programmatic way.
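To sketch what that "LLM as parsing layer" idea might look like, here's a toy example in Python. Everything in it is hypothetical — the prompt, the `call_llm` stub, and the intent schema are made up for illustration, not any real Siri or S-GPT API — but it shows the division of labor: the model only translates fuzzy speech into a structured intent, and a deterministic backend does the actual work.

```python
import json

# Hypothetical intent schema: the LLM's only job is to turn a free-form
# request into this structure; ordinary, testable code handles the rest.
PROMPT_TEMPLATE = """Convert the user's request into JSON with keys
"action", "title", and "datetime" (ISO 8601). Reply with JSON only.

Request: {utterance}"""

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call (hosted API, local model, etc.).
    # Returns a canned response so this sketch runs as-is.
    return ('{"action": "create_reminder", '
            '"title": "water the plants", '
            '"datetime": "2023-05-04T09:00:00"}')

def parse_intent(utterance: str) -> dict:
    # The fuzzy natural-language step is delegated to the model...
    raw = call_llm(PROMPT_TEMPLATE.format(utterance=utterance))
    # ...and a real version would validate the result against the schema
    # before handing it to the reminders backend.
    return json.loads(raw)

print(parse_intent("remind me to water the plants tomorrow at 9"))
```

The appeal of this split is exactly the complaint running through the thread: reliability lives in plain code, and the model is confined to the one step (interpretation) where "general" knowledge actually helps.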