Ars Technica’s AI-Fabricated Quotes
Maggie Harrison Dupré (Hacker News, more context):
The Condé Nast-owned Ars Technica has terminated senior AI reporter Benj Edwards following a controversy over his role in the publication and retraction of an article that included AI-fabricated quotes, Futurism has confirmed.
Earlier this month, Ars retracted the story after it was found to include fake quotes attributed to a real person. The article — a write-up of a viral incident in which an AI agent seemingly published a hit piece about a human engineer named Scott Shambaugh — was initially published on February 13. After Shambaugh pointed out that he’d never said the quotes attributed to him, Ars’ editor-in-chief Ken Fisher apologized in an editor’s note, in which he confirmed that the piece included “fabricated quotations generated by an AI tool and attributed to a source who did not say them” and characterized the error as a “serious failure of our standards.”
I’ve enjoyed Edwards’ work over the years and linked to many of his pieces, but obviously this is a very serious offense.
The individual firing is a distraction from the structural issue. Newsrooms have been cutting editorial staff for a decade, which means the verification layers that would have caught this — fact-checkers, copy editors, senior editors doing source verification — largely don’t exist anymore. Then they adopt AI tools that increase throughput without increasing oversight capacity, and act surprised when fabrication slips through.
This is a classic systems failure: you remove the safety mechanisms, add a new source of risk, and punish the individual operator. It’s the same pattern you see in industrial accidents.
This has been going on for a lot longer than a decade. To me, the takeaway is not that it's the system's fault but that many of these media brands are operating on undeserved trust. There's a lot less checking going on than there used to be, or than you might imagine.
The other interesting point is that Edwards says the intent was not to use AI to fabricate quotes but as a tool for processing quotes he already had:
During the process, I decided to try an experimental Claude Code-based AI tool to help me extract relevant verbatim source material. Not to generate the article but to help list structured references I could put in my outline.
[…]
I inadvertently ended up with a paraphrased version of Shambaugh’s words rather than his actual words.
Being sick and rushing to finish, I failed to verify the quotes in my outline notes against the original blog source before including them in my draft.
This seems not so different from how I commonly hear people say that they use AI to collect/extract or reformat information into a table. I’ve never understood, given the propensity for hallucination and citing papers that don’t exist, why such a distillation should be trusted. Yet I know that people are already relying on such LLM-derived works to make decisions. At least with code, you can compile it and test it and read it to see whether it makes sense. With a number in a table cell, how can you easily check where it came from?
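One partial mitigation is to require that each extracted value come with the verbatim source snippet it was pulled from, and then mechanically verify that the snippet really appears in the source. This is a sketch of only the verification step (the extraction itself, and the field names, are hypothetical):

```python
def verify_extractions(source_text: str, extractions: list[dict]) -> list[dict]:
    """Flag extracted values whose claimed source snippet is not
    actually a verbatim substring of the source document, or whose
    value does not appear inside its own snippet."""
    flagged = []
    for item in extractions:
        snippet = item["snippet"]
        if snippet not in source_text or str(item["value"]) not in snippet:
            flagged.append(item)
    return flagged

source = "Revenue grew to $4.2 million in Q3, up from $3.1 million in Q2."
rows = [
    {"value": "4.2", "snippet": "Revenue grew to $4.2 million in Q3"},
    {"value": "5.0", "snippet": "profit reached $5.0 million"},  # fabricated
]
print(verify_extractions(source, rows))  # only the fabricated row is flagged
```

This doesn't prove the number was interpreted correctly, but it at least catches the case where the model invented a quote or figure out of whole cloth.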
I will say that, for the table/spreadsheet question, there are a lot of commonly used techniques to verify that data hasn't been meaningfully changed during processing steps: checksums, reconciliation, etc. That said, I don't know how much people using AI tools for this are making use of them.
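In its simplest form, reconciliation just means comparing control totals before and after a processing step. A toy sketch of the idea:

```python
def reconcile(before: list[float], after: list[float], tol: float = 1e-9) -> bool:
    """Check that a processing step preserved the row count and the
    column sum, within a small numerical tolerance."""
    return len(before) == len(after) and abs(sum(before) - sum(after)) <= tol

raw = [10.0, 20.5, 30.25]
processed = list(raw)          # some processing step that should preserve values
print(reconcile(raw, processed))        # True: nothing dropped or altered
print(reconcile(raw, processed[:-1]))   # False: a row went missing
```

This only catches gross errors (dropped rows, silently changed totals), but that's exactly the class of error you want a cheap automated check for.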
@gildarts For the table stuff, rather than processing I was thinking of when people say, “Pull these kinds of numbers out of this set of links and collect them in a table.” So I don’t think checksums or reconciliation apply.
If this is the only thing Edwards did, I think he should not have been fired for it. That seems like an honest mistake that lots of people would have made, and that he won't repeat.
I would rather hear what Ars is doing to its publishing process to help prevent something like this from happening again. The answer, though, is probably nothing.
"Pull these kinds of numbers out of this set of links and collect them in a table"
That's nothing. LLMs could, at least in theory, get this right. I have friends who throw in numbers and then have LLMs do statistical analysis on them. Most people have absolutely no grasp of what LLMs actually are, and what they can and cannot do.
While complaints about the systemic dysfunction of newsrooms are valid, they should not detract from the individual responsibility that each editor/contributor has through the editorial chain.
Ironically, on his website, Benj has a link to the following article:
Someone in that position should have known better, or, more likely, knew but did not care enough.
@Michael Tsai: yeah, that is fair. I was thinking about restructuring data rather than fetching it, etc.
Once again, the initials "AI" are used without precisely pinpointing the underlying technology used.
Large Language Models (LLMs) are more prone to confabulations (a more accurate way to describe what they do when they "hallucinate") than Retrieval Augmented Generation (RAG) systems.
To convince yourself, take the same PDF report containing quotes and numerical data, then ask Gemini (an LLM) and NotebookLM (a RAG system) very precise questions about its content. You'll quickly realise that the latter will more accurately pull information from the PDF than the former.
I don't say that LLMs cannot be accurate; what I mean is that their system instructions don't prevent them from pulling information from elsewhere.
Similarly, I don't say that RAG systems cannot be inaccurate, but they are built to pull information from the sources and to link their citations to the part of the source they pulled it from. So the likelihood is much lower.
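A minimal illustration of the structural difference (a hypothetical sketch, not how NotebookLM works internally): a RAG pipeline retrieves a source passage first and only then answers, so every answer can be tied back to a concrete chunk of the document. Here word overlap stands in for real embedding search:

```python
def retrieve(question: str, chunks: list[str]) -> tuple[int, str]:
    """Return the index and text of the chunk sharing the most words
    with the question -- a crude stand-in for embedding search."""
    q_words = set(question.lower().split())
    scores = [len(q_words & set(c.lower().split())) for c in chunks]
    best = scores.index(max(scores))
    return best, chunks[best]

doc = [
    "Q3 revenue was $4.2 million, a record for the company.",
    "Headcount grew from 40 to 55 employees during the quarter.",
]
idx, passage = retrieve("what was the Q3 revenue", doc)
print(f"[source: chunk {idx}] {passage}")
```

The point is not the retrieval quality but the provenance: because the answer is generated against a specific retrieved chunk, the system can cite exactly where the information came from, which a bare LLM prompt cannot.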
> I’ve never understood, given the propensity for hallucination and citing papers that don’t exist, why such a distillation should be trusted.
Yes, this doesn't make sense to me either. AI models don't "think" exactly as we do, but in some ways they are analogous to brains. We don't trust our brains to sort lists of numbers or tabulate data, and AIs should have the same issues. AI agents can use tools like we would use a text buffer or piece of paper, but still the process seems fraught to me.