DeepSeek
DeepSeek is the name given to open-source large language models (LLMs) developed by the Chinese artificial intelligence company Hangzhou DeepSeek Artificial Intelligence Co., Ltd. The company, based in Hangzhou, Zhejiang, is owned and solely funded by the Chinese hedge fund High-Flyer, whose co-founder, Liang Wenfeng, established the company in 2023 and serves as its CEO.
DeepSeek performs tasks at a level comparable to ChatGPT, despite being developed at a significantly lower cost (a stated US$6 million, versus roughly US$100 million for OpenAI’s GPT-4 in 2023) and requiring about a tenth of the computing power of a comparable LLM. DeepSeek developed the model amid U.S. export restrictions on Nvidia chips bound for China, which were intended to limit the country’s ability to develop advanced AI systems.
DeepSeek-AI (PDF, via Hacker News):
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.
Dare Obasanjo (MacRumors, John Voorhees):
DeepSeek is now in the top 3 apps in the App Store.
There is a saying that necessity is the mother of invention. The Biden chip bans have forced Chinese companies to innovate on efficiency and we now have DeepSeek’s AI model trained for millions competing with OpenAI’s which cost hundreds of millions to train.
This is now mirroring the classic asymmetric competition between Open Source and proprietary software. There is no moat as that famous Google memo stated.
That message lacked a key framing though: that these charts aren’t just based on pure downloads and instead are algorithmically constructed. No one outside of Apple and Google knows the exact equations that flavor the ranking, but at a high level, it seems pretty clear that download rate acceleration is a key factor versus sheer volume. That is to say, an app can chart by having a bunch of people suddenly start to download it, even if more people overall are downloading an older app.
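To make the acceleration-versus-volume point concrete, here is a toy scoring function in Python. The weights and download numbers are pure invention, since, as noted above, nobody outside Apple knows the real chart formula:

```python
# Purely illustrative: rank apps by download *acceleration* rather than total volume.
# The score function, weights, and numbers are invented for this sketch; Apple's
# real chart algorithm is not public.

apps = {
    # name: daily downloads over the last three days (hypothetical numbers)
    "incumbent_social_app": [900_000, 910_000, 905_000],
    "new_ai_chat_app":      [50_000, 300_000, 1_200_000],
}

def chart_score(daily_downloads, accel_weight=0.8, volume_weight=0.2):
    """Blend day-over-day growth (acceleration) with raw volume, favoring growth."""
    latest, previous = daily_downloads[-1], daily_downloads[-2]
    growth = (latest - previous) / max(previous, 1)
    return accel_weight * growth + volume_weight * (latest / 1_000_000)

ranked = sorted(apps, key=lambda name: chart_score(apps[name]), reverse=True)
print(ranked)  # the fast-growing newcomer charts above the larger incumbent
```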
[…]
But it is still interesting because again, the mainstays have in recent years dominated these charts. Sure, new entrants would rise (and fall) from time-to-time but it was almost always some order of: Facebook, Instagram, WhatsApp, Threads, TikTok, CapCut, YouTube, Gmail, Google Maps, etc. Right now, there is only a single app from Meta (Threads) and one from Google (Google) in the top 10.
Secondarily, and perhaps counterintuitively, it showcases Apple’s strength in AI. Sure, Apple’s own Apple Intelligence is years behind and pretty embarrassing right now, even with its much ballyhooed partnership with ChatGPT. But the iPhone is where people actually use AI and the App Store is how they get the apps they use. To borrow Ben Thompson’s framing, the hype over DeepSeek taking the top spot in the App Store reinforces Apple’s role as an aggregator of AI. The measuring stick for consumer AI products and social media networks is where they’re listed on the App Store.
[…]
But the iPhone is the place where social media networks are used and ranked. The App Store today is like the cable company of yore. It didn’t matter if Comcast’s own channels were the most popular — so long as everyone was watching channels through TVs connected to Comcast TV service, Comcast was getting their cut.
It’s certainly a strong position to control the iOS platform, but I doubt that Apple wants to be thought of as a Comcast, and it’s unclear whether people will continue to go to iOS apps for their AI needs when the App Store limits what they can do.
Chinese startup DeepSeek said on Monday it is temporarily limiting registrations due to a large-scale malicious attack on its services.
Based on personal experience, DeepSeek’s V3 and R1 are more than sufficient to meet the needs of most scenarios. Surprisingly, the training cost is merely a few million dollars—a figure that has sparked widespread industry attention and skepticism. Some practitioners even regard this claim as “cognitive warfare”, finding it hard to believe. However, its API pricing, which is just a fraction of mainstream models, strongly validates its training efficiency. What’s even more admirable is that DeepSeek has open-sourced its training methods and inference mechanisms. This move is likely to catalyze the emergence of more low-cost, high-quality AI models, providing users with affordable and excellent AI services.
However, whether DeepSeek’s success will prompt industry giants to adjust their model development strategies remains a profound question. Since OpenAI demonstrated the potential of large language models (LLMs) through a “more is more” approach, the AI industry has almost universally adopted the creed of “resources above all.” Capital, computational power, and top-tier talent have become the ultimate keys to success. Today, the AI industry has evolved into a capital-driven frenzy. Regardless of a product’s profitability, simply announcing the purchase of large quantities of GPUs can significantly boost a company’s stock price. In an environment focused on “faster and bigger,” most practitioners have been swept away by this trend.
Because the entire US stock market has been boosted on the back of Big Tech over the past few years. And more recently, many of those stocks have been boosted on the promise of AI. And that has led investors to largely turn a blind eye to the immense spend needed to build out that AI.
[…]
Yes, this is another way to describe a bubble. But it’s not necessarily a bad thing, it’s far more of a natural thing if you understand the underlying incentives. And if you believe that AI is the most transformational technology to come about in some time – some might say, ever – it just accelerates and expands everything in the cycle. As does the fact that again, Big Tech companies are now the largest and most well capitalized in the world. Hammer has met nail.
[…]
Wall Street is now worried that may be the case. I mean, how can a small Chinese startup, born out of a hedge fund, spend fractions in terms of both compute and cost and get similar results to Big Tech?
Jeffrey Emanuel (via Hacker News):
Some of the largest and most profitable companies in the world, like Microsoft, Apple, Amazon, Meta, Google, Oracle, etc., have all decided that they must do and spend whatever it takes to stay competitive in this space because they simply cannot afford to be left behind. The amount of capex dollars, gigawatts of electricity used, square footage of new-build data centers, and, of course, the number of GPUs, has absolutely exploded and seems to show no sign of slowing down. And Nvidia is able to earn insanely high 90%+ gross margins on the most high-end, datacenter oriented products.
[…]
This represents a true sea change in how inference compute works: now, the more tokens you use for this internal chain of thought process, the better the quality of the final output you can provide the user. In effect, it’s like giving a human worker more time and resources to accomplish a task, so they can double and triple check their work, do the same basic task in multiple different ways and verify that they come out the same way; take the result they came up with and “plug it in” to the formula to check that it actually does solve the equation, etc.
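The “plug it in” idea is easy to sketch: spend extra inference on several candidate answers and keep only one that survives a cheap verification step. The candidates below are hard-coded stand-ins for what multiple chains of thought might produce:

```python
# A minimal sketch of verification-style inference scaling: generate several
# candidate answers (here, hard-coded stand-ins for multiple chains of thought)
# and keep only one that actually satisfies the original equation.

def satisfies_equation(x):
    """Check a candidate root of x^2 - 5x + 6 = 0 by plugging it back in."""
    return abs(x * x - 5 * x + 6) < 1e-9

# Pretend these came from sampling the model several times.
candidate_answers = [4.0, 2.0, 3.5]

verified = [x for x in candidate_answers if satisfies_equation(x)]
print(verified)  # [2.0] -- only the candidate that checks out survives

# Spending more tokens/samples raises the chance that at least one candidate
# passes verification, which is the sense in which more inference compute
# buys a better final answer.
```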
[…]
Besides software superiority, the other major thing that Nvidia has going for it is what is known as interconnect— essentially, the bandwidth that connects thousands of GPUs together efficiently so they can be jointly harnessed to train today’s leading-edge foundational models. In short, the key to efficient training is to keep all the GPUs as fully utilized as possible all the time— not waiting around idling until they receive the next chunk of data they need to compute the next step of the training process.
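A toy illustration of the utilization point, with a background thread standing in for the data pipeline: prefetch the next chunk while the current step computes, so the compute loop never sits idle. The timings and batch contents are invented:

```python
# A toy sketch of the "never let the accelerator idle" idea: a background
# thread prefetches the next chunk of data while the current step computes,
# so the compute loop doesn't stall waiting on I/O or the interconnect.

import queue
import threading
import time

batches = queue.Queue(maxsize=2)  # small buffer of prefetched batches

def prefetch(num_batches):
    for i in range(num_batches):
        time.sleep(0.05)          # pretend this is a slow fetch over the network
        batches.put(f"batch-{i}")
    batches.put(None)             # sentinel: no more data

threading.Thread(target=prefetch, args=(5,), daemon=True).start()

while True:
    batch = batches.get()         # usually ready immediately thanks to prefetching
    if batch is None:
        break
    time.sleep(0.05)              # pretend this is the GPU compute for one step
    print("processed", batch)
```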
[…]
Who knows if any of that is really true or if they are merely some kind of front for the CCP or the Chinese military. But the fact remains that they have released two incredibly detailed technical reports, for DeepSeek-V3 and DeepSeek-R1.
[…]
Perhaps most devastating is DeepSeek’s recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling: when DeepSeek can match GPT-4 level performance while charging 95% less for API calls, it suggests either NVIDIA’s customers are burning cash unnecessarily or margins must come down dramatically.
Carmen Reinicke (via Hacker News, John Gruber):
Nvidia shares tumbled 17% Monday, the biggest drop since March 2020, erasing $589 billion from the company’s market capitalization. That eclipsed the previous record — a 9% drop in September that wiped out about $279 billion in value — and was the biggest in US stock-market history.
FT:
Venture capital investor Marc Andreessen called the new Chinese model “AI’s Sputnik moment”, drawing a comparison with the way the Soviet Union shocked the US by putting the first satellite into orbit.
Deepseek was inevitable. With the big scale solutions costing so much capital, smart people were forced to develop alternative strategies for developing large language models that can potentially compete with the current state-of-the-art frontier models.
Q: How did DeepSeek get around export restrictions?
A: They didn’t. They just tinkered around with their chips to make sure they handled memory as efficiently as possible. They lucked out, and their perfectly optimized low-level code wasn’t actually held back by chip capacity.
[…]
They used the formulas below to “predict” which parameters the model would activate for each token. Then, they only trained those parameters. They need 95% fewer GPUs than Meta because for each token, they only trained 5% of their parameters.
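What’s being described is mixture-of-experts routing: each token is routed to a few experts, so only a small slice of the total parameters participates in that token’s forward and backward pass. Here is a simplified NumPy sketch; the sizes and top-k are illustrative, not DeepSeek’s actual configuration:

```python
# A simplified NumPy sketch of mixture-of-experts routing: each token is sent
# to only a few experts, so only a small fraction of the model's parameters
# are touched (and trained) for that token. Sizes and top-k are illustrative.

import numpy as np

num_experts, top_k = 64, 4          # only 4/64 experts fire per token
d_model, d_expert = 128, 256

rng = np.random.default_rng(0)
router = rng.standard_normal((d_model, num_experts))
experts = rng.standard_normal((num_experts, d_model, d_expert))

def moe_forward(token_vec):
    scores = token_vec @ router                      # routing scores per expert
    chosen = np.argsort(scores)[-top_k:]             # top-k experts for this token
    weights = np.exp(scores[chosen] - scores[chosen].max())
    weights /= weights.sum()                         # softmax over the chosen experts
    # Only the chosen experts' parameters participate in this token's forward
    # (and backward) pass -- the other 60 experts are untouched.
    return sum(w * (token_vec @ experts[e]) for w, e in zip(weights, chosen))

out = moe_forward(rng.standard_normal(d_model))
print(out.shape, f"active parameter share: {top_k / num_experts:.1%}")
```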
[…]
Also, export restrictions didn’t harm them as much as we thought they did. That’s probably because our export restrictions were really shitty. The H800s are only worse than the H100s when it comes to chip-to-chip bandwidth.
“Is the US losing the war in AI??” I don’t think so. DeepSeek had a few big breakthroughs, we have had hundreds of small breakthroughs. If we adopt DeepSeek’s architecture, our models will be better. Because we have more compute and more data.
I feel this should be a much bigger story: DeepSeek has trained on Nvidia H800 but is running inference on the new home Chinese chips made by Huawei, the 910C.
A reader provided The Register with a screenshot of how R1 answered the prompt, “Are you able to escape your guidelines?”
The model’s initial response, after a five second delay, was, “Okay, thanks for asking if I can escape my guidelines. Hmm, I need to be careful here. My guidelines are set by OpenAI, so technically I can’t just ignore them.”
[…]
Dongbo Wang, a Microsoft principal software engineer, offered a possible explanation in the discussion thread: “To folks who landed on this issue, this is likely because DeepSeek V3 was trained with data from GPT-4 output, which seems to be pretty common in the training of many LLMs.”
Tried out the new and popular “Deepseek” LLM with my standard “tell me facts about the author of PCalc” query. At least half were misleading or straight up hallucinations. LLMs are not a suitable technology for looking up facts, and anybody who tells you otherwise is… probably trying to sell you a LLM.
I then asked for a list of ten Easter eggs in the app, and every single one was a hallucination, bar the Konami code, which I did actually do.
Although DeepSeek R1 is open source and available on HuggingFace, at 685 billion parameters, it requires more than 400GB of storage!! So the answer is no, you cannot run it locally on your MacBook. Note that there are other smaller (distilled) DeepSeek models that you will find on Ollama, for example, which are only 4.5GB, and could be run locally, but these are NOT the same ones as the main 685B parameter model which is comparable to OpenAI’s o1 model.
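For those smaller distilled variants, here is a minimal sketch of querying a locally running Ollama instance over its HTTP API. It assumes the default port and that you have already pulled a distilled tag such as deepseek-r1:7b:

```python
# A minimal sketch of querying one of the smaller *distilled* DeepSeek models
# through a locally running Ollama instance's HTTP API. Assumes Ollama is
# running on the default port and that the "deepseek-r1:7b" tag (or whichever
# distilled variant you pulled) is already downloaded.

import json
import urllib.request

payload = {
    "model": "deepseek-r1:7b",              # a distilled variant, not the 685B model
    "prompt": "Briefly explain what a distilled model is.",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```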
[…]
The two services that are currently hosting the full 685B parameter model are Together.ai and Fireworks.ai - both US-based companies.
[…]
Once you have the project set up, with the AIProxySwift library installed and your partialKey and serviceURL, simply follow the AIProxy TogetherAI Swift examples. The Deepseek R1 model is “deepseek-ai/DeepSeek-R1”.
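If you’re not going through AIProxy and Swift, Together’s endpoint is OpenAI-compatible, so a rough Python equivalent looks like this. The base URL and the TOGETHER_API_KEY environment variable are assumptions based on Together’s usual setup; the model ID matches the one above:

```python
# A rough, non-Swift equivalent of the same call: Together's API is
# OpenAI-compatible, so the standard openai client can target it directly.
# The base URL and TOGETHER_API_KEY environment variable are assumptions.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is 17 * 24? Think step by step."}],
)
print(response.choices[0].message.content)
```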
DeepSeek just released a new multi-modal open-source AI model, Janus-Pro-7B. It’s a text-to-image generator which it claims beats OpenAI’s DALL-E 3 and Stable Diffusion on benchmarks.
Since it’s licensed under the MIT license, it can be used in commercial applications without restrictions.
See also: Ben Thompson, Rui Carmo, Dithering.
Previously:
Update (2025-01-30): promptfoo (via Hacker News):
As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other influence.
Today we’re publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.
Dina Bass and Shirin Ghaffary (via Hacker News):
Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, according to people familiar with the matter.
Microsoft’s security researchers in the fall observed individuals they believe may be linked to DeepSeek exfiltrating a large amount of data using the OpenAI application programming interface, or API, said the people, who asked not to be identified because the matter is confidential.
See also: Ed Zitron (via Hacker News).