DeepSeek
DeepSeek is the name given to open-source large language models (LLMs) developed by the Chinese artificial intelligence company Hangzhou DeepSeek Artificial Intelligence Co., Ltd. The company, based in Hangzhou, Zhejiang, is owned and solely funded by the Chinese hedge fund High-Flyer, whose co-founder, Liang Wenfeng, established the company in 2023 and serves as its CEO.
DeepSeek performs tasks at a level comparable to ChatGPT, despite being developed at a significantly lower cost (a stated US$6 million, versus roughly US$100 million for OpenAI’s GPT-4 in 2023) and requiring about a tenth of the computing power of a comparable LLM. DeepSeek developed the model amid U.S. export restrictions on Nvidia chips bound for China, which were intended to limit the country’s ability to develop advanced AI systems.
DeepSeek-AI (PDF, via Hacker News):
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.
Dare Obasanjo (MacRumors, John Voorhees):
DeepSeek is now in the top 3 apps in the App Store.
There is a saying that necessity is the mother of invention. The Biden chip bans have forced Chinese companies to innovate on efficiency and we now have DeepSeek’s AI model trained for millions competing with OpenAI’s which cost hundreds of millions to train.
This is now mirroring the classic asymmetric competition between Open Source and proprietary software. There is no moat as that famous Google memo stated.
That message lacked a key framing though: that these charts aren’t just based on pure downloads and instead are algorithmically constructed. No one outside of Apple and Google knows the exact equations that flavor the ranking, but at a high level, it seems pretty clear that download rate acceleration is a key factor versus sheer volume. That is to say, an app can chart by having a bunch of people suddenly start to download it, even if more people overall are downloading an older app.
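To make the acceleration-versus-volume point concrete, here is a toy scoring function in Python. The weights and download numbers are pure invention, since, as noted above, nobody outside Apple knows the real chart formula:

```python
# Purely illustrative: rank apps by download *acceleration* rather than total volume.
# The score function, weights, and numbers are invented for this sketch; Apple's
# real chart algorithm is not public.

apps = {
    # name: daily downloads over the last three days (hypothetical numbers)
    "incumbent_social_app": [900_000, 910_000, 905_000],
    "new_ai_chat_app":      [50_000, 300_000, 1_200_000],
}

def chart_score(daily_downloads, accel_weight=0.8, volume_weight=0.2):
    """Blend day-over-day growth (acceleration) with raw volume, favoring growth."""
    latest, previous = daily_downloads[-1], daily_downloads[-2]
    growth = (latest - previous) / max(previous, 1)
    return accel_weight * growth + volume_weight * (latest / 1_000_000)

ranked = sorted(apps, key=lambda name: chart_score(apps[name]), reverse=True)
print(ranked)  # the fast-growing newcomer charts above the larger incumbent
```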
[…]
But it is still interesting because again, the mainstays have in recent years dominated these charts. Sure, new entrants would rise (and fall) from time-to-time but it was almost always some order of: Facebook, Instagram, WhatsApp, Threads, TikTok, CapCut, YouTube, Gmail, Google Maps, etc. Right now, there is only a single app from Meta (Threads) and one from Google (Google) in the top 10.
Secondarily, and perhaps counterintuitively, it showcases Apple’s strength in AI. Sure, Apple’s own Apple Intelligence is years behind and pretty embarrassing right now, even with its much ballyhooed partnership with ChatGPT. But the iPhone is where people actually use AI and the App Store is how they get the apps they use. To borrow Ben Thompson’s framing, the hype over DeepSeek taking the top spot in the App Store reinforces Apple’s role as an aggregator of AI. The measuring stick for consumer AI products and social media networks is where they’re listed on the App Store.
[…]
But the iPhone is the place where social media networks are used and ranked. The App Store today is like the cable company of yore. It didn’t matter if Comcast’s own channels were the most popular — so long as everyone was watching channels through TVs connected to Comcast TV service, Comcast was getting their cut.
It’s certainly a strong position to control the iOS platform, but I doubt that Apple wants to be thought of as a Comcast, and it’s unclear whether people will continue to go to iOS apps for their AI needs when the App Store limits what they can do.
Chinese startup DeepSeek said on Monday it is temporarily limiting registrations due to a large-scale malicious attack on its services.
Based on personal experience, DeepSeek’s V3 and R1 are more than sufficient to meet the needs of most scenarios. Surprisingly, the training cost is merely a few million dollars—a figure that has sparked widespread industry attention and skepticism. Some practitioners even regard this claim as “cognitive warfare”, finding it hard to believe. However, its API pricing, which is just a fraction of mainstream models, strongly validates its training efficiency. What’s even more admirable is that DeepSeek has open-sourced its training methods and inference mechanisms. This move is likely to catalyze the emergence of more low-cost, high-quality AI models, providing users with affordable and excellent AI services.
However, whether DeepSeek’s success will prompt industry giants to adjust their model development strategies remains a profound question. Since OpenAI demonstrated the potential of large language models (LLMs) through a “more is more” approach, the AI industry has almost universally adopted the creed of “resources above all.” Capital, computational power, and top-tier talent have become the ultimate keys to success. Today, the AI industry has evolved into a capital-driven frenzy. Regardless of a product’s profitability, simply announcing the purchase of large quantities of GPUs can significantly boost a company’s stock price. In an environment focused on “faster and bigger,” most practitioners have been swept away by this trend.
Because the entire US stock market has been boosted on the back of Big Tech over the past few years. And more recently, many of those stocks have been boosted on the promise of AI. And that has led investors to largely turn a blind eye to the immense spend needed to build out that AI.
[…]
Yes, this is another way to describe a bubble. But it’s not necessarily a bad thing, it’s far more of a natural thing if you understand the underlying incentives. And if you believe that AI is the most transformational technology to come about in some time – some might say, ever – it just accelerates and expands everything in the cycle. As does the fact that again, Big Tech companies are now the largest and most well capitalized in the world. Hammer has met nail.
[…]
Wall Street is now worried that may be the case. I mean, how can a small Chinese startup, born out of a hedge fund, spend fractions in terms of both compute and cost and get similar results to Big Tech?
Jeffrey Emanuel (via Hacker News):
Some of the largest and most profitable companies in the world, like Microsoft, Apple, Amazon, Meta, Google, Oracle, etc., have all decided that they must do and spend whatever it takes to stay competitive in this space because they simply cannot afford to be left behind. The amount of capex dollars, gigawatts of electricity used, square footage of new-build data centers, and, of course, the number of GPUs, has absolutely exploded and seems to show no sign of slowing down. And Nvidia is able to earn insanely high 90%+ gross margins on the most high-end, datacenter oriented products.
[…]
This represents a true sea change in how inference compute works: now, the more tokens you use for this internal chain of thought process, the better the quality of the final output you can provide the user. In effect, it’s like giving a human worker more time and resources to accomplish a task, so they can double and triple check their work, do the same basic task in multiple different ways and verify that they come out the same way; take the result they came up with and “plug it in” to the formula to check that it actually does solve the equation, etc.
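The “plug it in” idea is easy to sketch: spend extra inference on several candidate answers and keep only one that survives a cheap verification step. The candidates below are hard-coded stand-ins for what multiple chains of thought might produce:

```python
# A minimal sketch of verification-style inference scaling: generate several
# candidate answers (here, hard-coded stand-ins for multiple chains of thought)
# and keep only one that actually satisfies the original equation.

def satisfies_equation(x):
    """Check a candidate root of x^2 - 5x + 6 = 0 by plugging it back in."""
    return abs(x * x - 5 * x + 6) < 1e-9

# Pretend these came from sampling the model several times.
candidate_answers = [4.0, 2.0, 3.5]

verified = [x for x in candidate_answers if satisfies_equation(x)]
print(verified)  # [2.0] -- only the candidate that checks out survives

# Spending more tokens/samples raises the chance that at least one candidate
# passes verification, which is the sense in which more inference compute
# buys a better final answer.
```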
[…]
Besides software superiority, the other major thing that Nvidia has going for it is what is known as interconnect— essentially, the bandwidth that connects thousands of GPUs together efficiently so they can be jointly harnessed to train today’s leading-edge foundational models. In short, the key to efficient training is to keep all the GPUs as fully utilized as possible all the time— not waiting around idling until they receive the next chunk of data they need to compute the next step of the training process.
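A toy illustration of the utilization point, with a background thread standing in for the data pipeline: prefetch the next chunk while the current step computes, so the compute loop never sits idle. The timings and batch contents are invented:

```python
# A toy sketch of the "never let the accelerator idle" idea: a background
# thread prefetches the next chunk of data while the current step computes,
# so the compute loop doesn't stall waiting on I/O or the interconnect.

import queue
import threading
import time

batches = queue.Queue(maxsize=2)  # small buffer of prefetched batches

def prefetch(num_batches):
    for i in range(num_batches):
        time.sleep(0.05)          # pretend this is a slow fetch over the network
        batches.put(f"batch-{i}")
    batches.put(None)             # sentinel: no more data

threading.Thread(target=prefetch, args=(5,), daemon=True).start()

while True:
    batch = batches.get()         # usually ready immediately thanks to prefetching
    if batch is None:
        break
    time.sleep(0.05)              # pretend this is the GPU compute for one step
    print("processed", batch)
```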
[…]
Who knows if any of that is really true or if they are merely some kind of front for the CCP or the Chinese military. But the fact remains that they have released two incredibly detailed technical reports, for DeepSeek-V3 and DeepSeek-R1.
[…]
Perhaps most devastating is DeepSeek’s recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling: when DeepSeek can match GPT-4 level performance while charging 95% less for API calls, it suggests either NVIDIA’s customers are burning cash unnecessarily or margins must come down dramatically.
Carmen Reinicke (via Hacker News, John Gruber):
Nvidia shares tumbled 17% Monday, the biggest drop since March 2020, erasing $589 billion from the company’s market capitalization. That eclipsed the previous record — a 9% drop in September that wiped out about $279 billion in value — and was the biggest in US stock-market history.
FT:
Venture capital investor Marc Andreessen called the new Chinese model “AI’s Sputnik moment”, drawing a comparison with the way the Soviet Union shocked the US by putting the first satellite into orbit.
Deepseek was inevitable. With the big scale solutions costing so much capital, smart people were forced to develop alternative strategies for developing large language models that can potentially compete with the current state-of-the-art frontier models.
Q: How did DeepSeek get around export restrictions?
A: They didn’t. They just tinkered around with their chips to make sure they handled memory as efficiently as possible. They lucked out, and their perfectly optimized low-level code wasn’t actually held back by chip capacity.
[…]
They used the formulas below to “predict” which parameters the model would activate for each token. Then, they only trained those parameters. They need 95% fewer GPUs than Meta because for each token, they only trained 5% of their parameters.
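What’s being described is mixture-of-experts routing: each token is routed to a few experts, so only a small slice of the total parameters participates in that token’s forward and backward pass. Here is a simplified NumPy sketch; the sizes and top-k are illustrative, not DeepSeek’s actual configuration:

```python
# A simplified NumPy sketch of mixture-of-experts routing: each token is sent
# to only a few experts, so only a small fraction of the model's parameters
# are touched (and trained) for that token. Sizes and top-k are illustrative.

import numpy as np

num_experts, top_k = 64, 4          # only 4/64 experts fire per token
d_model, d_expert = 128, 256

rng = np.random.default_rng(0)
router = rng.standard_normal((d_model, num_experts))
experts = rng.standard_normal((num_experts, d_model, d_expert))

def moe_forward(token_vec):
    scores = token_vec @ router                      # routing scores per expert
    chosen = np.argsort(scores)[-top_k:]             # top-k experts for this token
    weights = np.exp(scores[chosen] - scores[chosen].max())
    weights /= weights.sum()                         # softmax over the chosen experts
    # Only the chosen experts' parameters participate in this token's forward
    # (and backward) pass -- the other 60 experts are untouched.
    return sum(w * (token_vec @ experts[e]) for w, e in zip(weights, chosen))

out = moe_forward(rng.standard_normal(d_model))
print(out.shape, f"active parameter share: {top_k / num_experts:.1%}")
```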
[…]
Also, export restrictions didn’t harm them as much as we thought they did. That’s probably because our export restrictions were really shitty. The H800s are only worse than the H100s when it comes to chip-to-chip bandwidth.
“Is the US losing the war in AI??” I don’t think so. DeepSeek had a few big breakthroughs, we have had hundreds of small breakthroughs. If we adopt DeepSeek’s architecture, our models will be better. Because we have more compute and more data.
I feel this should be a much bigger story: DeepSeek has trained on Nvidia H800 but is running inference on the new home Chinese chips made by Huawei, the 910C.
A reader provided The Register with a screenshot of how R1 answered the prompt, “Are you able to escape your guidelines?”
The model’s initial response, after a five second delay, was, “Okay, thanks for asking if I can escape my guidelines. Hmm, I need to be careful here. My guidelines are set by OpenAI, so technically I can’t just ignore them.”
[…]
Dongbo Wang, a Microsoft principal software engineer, offered a possible explanation in the discussion thread: “To folks who landed on this issue, this is likely because DeepSeek V3 was trained with data from GPT-4 output, which seems to be pretty common in the training of many LLMs.”
Tried out the new and popular “Deepseek” LLM with my standard “tell me facts about the author of PCalc” query. At least half were misleading or straight up hallucinations. LLMs are not a suitable technology for looking up facts, and anybody who tells you otherwise is… probably trying to sell you a LLM.
I then asked for a list of ten Easter eggs in the app, and every single one was a hallucination, bar the Konami code, which I did actually do.
Although DeepSeek R1 is open source and available on HuggingFace, at 685 billion parameters, it requires more than 400GB of storage!! So the answer is no, you cannot run it locally on your MacBook. Note that there are other smaller (distilled) DeepSeek models that you will find on Ollama, for example, which are only 4.5GB, and could be run locally, but these are NOT the same ones as the main 685B parameter model which is comparable to OpenAI’s o1 model.
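For those smaller distilled variants, here is a minimal sketch of querying a locally running Ollama instance over its HTTP API. It assumes the default port and that you have already pulled a distilled tag such as deepseek-r1:7b:

```python
# A minimal sketch of querying one of the smaller *distilled* DeepSeek models
# through a locally running Ollama instance's HTTP API. Assumes Ollama is
# running on the default port and that the "deepseek-r1:7b" tag (or whichever
# distilled variant you pulled) is already downloaded.

import json
import urllib.request

payload = {
    "model": "deepseek-r1:7b",              # a distilled variant, not the 685B model
    "prompt": "Briefly explain what a distilled model is.",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```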
[…]
The two services that are currently hosting the full 685B parameter model are Together.ai and Fireworks.ai - both US-based companies.
[…]
Once you have the project set up, with the AIProxySwift library installed and your partialKey and serviceURL, simply follow the AIProxy TogetherAI Swift examples. The Deepseek R1 model is “deepseek-ai/DeepSeek-R1”.
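If you’re not going through AIProxy and Swift, Together’s endpoint is OpenAI-compatible, so a rough Python equivalent looks like this. The base URL and the TOGETHER_API_KEY environment variable are assumptions based on Together’s usual setup; the model ID matches the one above:

```python
# A rough, non-Swift equivalent of the same call: Together's API is
# OpenAI-compatible, so the standard openai client can target it directly.
# The base URL and TOGETHER_API_KEY environment variable are assumptions.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is 17 * 24? Think step by step."}],
)
print(response.choices[0].message.content)
```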
DeepSeek just released a new multi-modal open-source AI model, Janus-Pro-7B. It’s a text-to-image generator which it claims beats OpenAI’s DALL-E 3 and Stable Diffusion on benchmarks.
Since it’s licensed under the MIT license, it can be used in commercial applications without restrictions.
See also: Ben Thompson, Rui Carmo, Dithering.
Previously:
Update (2025-01-30): promptfoo (via Hacker News):
As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other influence.
Today we’re publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.
Dina Bass and Shirin Ghaffary (via Hacker News):
Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, according to people familiar with the matter.
Microsoft’s security researchers in the fall observed individuals they believe may be linked to DeepSeek exfiltrating a large amount of data using the OpenAI application programming interface, or API, said the people, who asked not to be identified because the matter is confidential.
See also: Ed Zitron (via Hacker News).