Friday, July 28, 2023

The Mess at Stack Overflow

Ayhan Fuat Çelik:

Over the past one and a half years, Stack Overflow has lost around 35% of its traffic (Update: Around 15% of the observed loss seems to be related to the recategorization of the Google Analytics Cookie around May 2022.[…]). This decline is similarly reflected in site usage, with approximately a 50% decrease in the number of questions and answers, as well as the number of votes these posts receive.

Via David Mimno:

This is not sad, this is a warning. A company did everything (mostly) right: created a vitally useful free service, worked hard to keep it healthy, shared data for research. Then another company swooped the data and made it compete against itself.

[…]

To be clear, this is really bad for LLM cos as well. You can only pull the “snarf an existing web community” trick once, folks like Reddit have already wised up. Free data is as dead as zero interest rates.

I’m not sure how much of this has to do with AI. What is the evidence for that? It doesn’t seem like a good substitute, at least for the way I use Stack Overflow. Sure, some people are probably asking ChatGPT or Copilot instead, but so many so quickly?

My impression is that Stack Overflow had been kind of losing its way for years. The founders got distracted by other projects, then left, and the new management seemed more interested in business deals than in improving the product. They lost sight of what was best for the user/community.

Jon Ericson (via Hacker News):

I’ve been on vacation, so I haven’t been following the Stack Overflow moderator strike. Not that there has been much progress. Negotiations stalled for a variety of reasons. Meanwhile Stack Overflow’s CEO, Prashanth Chandrasekar, dug the company’s hole a bit deeper during an interview with VentureBeat.

[…]

By contrast Prashanth regularly talks about combining community and AI without going into detail about how that solves the problem at hand. Neither does he go into much detail about the problems the company intends to solve. I suspect one reason is that Prashanth, who has spent most of his career in management, has become something of an architecture astronaut. As Joel puts it, “architecture people are solving problems that they think they can solve, not problems which are useful to solve.” Since there is overlap between a Q&A site and generative artificial intelligence, there must be a way of jamming them together.

But there’s another factor. In May I wrote about Stack Overflow’s business, which lost $42 million over 6 months and had just laid off 10% of its employees. Since then, the company’s fiscal year-end results came out. Despite growing revenue, it lost $84 million over the year ending on March 31, 2023. In fact Prosus’ entire education technology segment lost money despite growing income[…]

[…]

While this might be the company’s public position, Prashanth privately wanted to limit who can access the data. On March 28, 2023, he ordered the data dump not be uploaded to Archive.org. The DBA who turned it off warned that the community would notice and it did. Rather than having an answer prepared, the company publicly struggled for an answer. Internal communication shows most of the company was as surprised as the rest of us.

It seems like they could have been profitable at a smaller size, but they grew way too much and got rid of unique features people liked, such as the jobs board.

Danny Thompson:

STACK OVERFLOW JUST ANNOUNCED THEIR OWN AI!!!

OverflowAI is a tool, that will also have a VS Code plugin. The way this works, if you are on the site and ask a question, it will produce the answer for you while also citing the sources it used to produce the answer.

You can then ask more in the conversational area, even including code, and through Generative AI it can continue building off of the answer.

Update (2023-08-04): Priyam (via Hacker News):

If we look closely, the most drastic drop starts around April of 2022, while ChatGPT came out 7 months later in November. While we do see drops every summer (school breaks) and winter (workplace vacations), this drop in April 2022 is sustained and only getting worse.

[…]

There are 4 reasons that explain the slow decline of Stack Overflow.

Update (2023-08-09): Rob Napier:

When someone on SO asks about the internal details of something, please stop chastising them that it’s internal and they shouldn’t need to know.

They want to know. That is all the reason they need.

Some questions are very advanced when it’s clear something simpler was meant, so probe that. But I’m tired of seeing folks who ask “just to learn” and get fussed at. Where do you think we get the next compiler devs?

10 Comments

I don't understand these companies like StackOverflow (and Twitter pre-Elon for that matter) that are clearly successful and yet manage to lose money endlessly. I guess I blame VC & greed. Any small company knows there is only one important factor in running your business, and that is ensuring that revenue > expenses. If you want to spend more than your revenue, then you damn well better have a plan for ensuring the revenue is increased in the short term. But VC screws with this, it stops the short term being short enough, and demands the increase in revenue be unrealistic. There is no way StackOverflow or pre-Elon Twitter couldn't be a profitable business - just not while expenses are dramatically unrealistic compared to revenue because "they can".

Beatrix Willius

I have no good experience using StackOverflow. They concentrate more on grammar and having the question formatted perfectly.

My last question was closed immediately because it wasn't "development related" or some such nonsense. It was about a CLI command which I wanted to use in my app. In code.

StackOverflow is not a friendly place.

"Sure, some people are probably asking ChatGPT or Copilot instead, but so many so quickly?"

Every single developer I personally know is using LLMs to help with programming. The only reason you wouldn't immediately do that is if your IDE doesn't support it and you can't switch away from it.

"My impression is that Stack Overflow had been kind of losing its way for years"

That is also true.

Jean-Daniel

What I don't get with this AI mode, is that current AI can't be better than the content they are based on.
By destroying the source of user content, they are doomed to keep using older and older content to train the model, and will never be capable of adapting to the rapid evolution of software development. (ChatGPT, which was trained with 2021 data, is not capable of generating Swift code using newer language features, for instance.)

I think the slide started way too early to be caused by chatbots. Declining traffic follows declining posts and votes, which started much earlier.

The kind of questions Stack Overflow is good for, chatbots get wrong, in my experience, i.e. where the right answer isn’t the obvious one. If they learnt from Stack Overflow, they seem to weigh the answers by repetition, not vote counts. Often the only way to get the correct answer is to tell the chatbot that its answer was wrong multiple times. So I often wish there was a feedback mechanism to vote up the right answer. Maybe that’s what Stack Overflow will do.

As a long time user of Stack Overflow, it *has* lost its way. It is no longer a good place for asking general questions about programming. I won't put my finger on exactly why, but I will say that most of the questions I've asked there recently have experienced one or more of the following:

- Downvoted for no reason
- Voted to be closed as being "off topic" even though it was a legitimate question that followed their rules
- Erroneously closed as being a duplicate by people who didn't bother to read and fully comprehend the question
- Answered with low quality responses by people who didn't bother to read and fully comprehend the question
- Received no answers at all
- Erroneously closed and accused of being written by ChatGPT

To be fair, the last question I asked did get a real answer and wasn't downvoted, but it was a more general C++ question. Being that I'm a long time and experienced programmer, and I think I'm a pretty damn good one too, that means that when I have a question to ask, it's probably something complex, nuanced, and perhaps a bit arcane. These questions don't tend to do very well, and are the sorts that attract bad actions from bad or ignorant actors.

I hypothesize that most of Stack Overflow's problem is because of this. Their user experience has tanked over the last decade. I begrudgingly resort to using Stack Overflow when I don't have another option, knowing that it'll most likely not only be useless but also make me frustrated, rather than delighting in using it because it'll connect me with other knowledgeable developers who want to share their knowledge.

Old Unix Geek

Why bother answering questions if they're just going to be pilfered by the owner of some LLM?

Why bother writing open source in your free time if it's just going to be pilfered by the owner of some LLM?

Why bother writing books or magazine articles if it's just going to be pilfered by the owner of some LLM?

People are lazy. They'd rather ask the LLM than read a book and understand problems in depth. But when everyone does that, actually understanding things in depth becomes prohibitively expensive. Why spend years going to university, if your competition can get the job out of high-school, by copying and pasting what the LLM says? And if so, the salary for these jobs will fall since there's no point paying big bucks to people who can easily be replaced -- which will make studying things at university in depth too expensive. So people doing these jobs will have no choice but to rely on LLMs.

I expect the future will be a lot harder for "newbies" who want to understand things deeply. Things will be similar to how it was when I learned (no internet & few hard to get books). And, as when I learned, actually understanding things deeply will be seen as a waste of time. "Why are you wasting your time with that stupid computer when you could learn the Classics?" Shame, since the best engineers understand things deeply... and that's where the insights to revolutionize things come from.

So, ironically, I see ChatGPT as a knowledge destroyer, not a knowledge enhancer the way Google search was.

Oh well. You reap what you sow. Ironically, I wouldn't be surprised if tech advanced fastest in countries that ban LLMs. In those countries you won't be able to fake it by asking an LLM, and instead you will understand things deeply enough to be productive. I'm thinking of Russia (which has a high proportion of engineers and no longer much likes us) and China (which has good reasons to distance itself from us since we're threatening it with war).

Clarence Odbody

I went to Stack Overflow a few hours ago to post a question about SwiftUI. I jumped through a bunch of hoops, using its guided question composer tool, and at the end it wouldn't let me post it. It didn't say why, but it was a perfectly normal question, I stated my problem, included the code at issue, and explained what steps I had already taken.

I have no idea why I couldn't post it. I instead posted the same question into Microsoft's Bing chat thing, and seconds later I got a reply saying though there was ~"nothing obviously wrong with my code, here are a few things to try" with written code examples. The second suggestion solved my issue.

Total turnaround? Less than 5 minutes. Why would I jump through SO's hoops next time?

"Why bother answering questions (...) Why bother answering questions if they're just going to be pilfered by the owner of some LLM?"

I don't really see the causal relationship between LLMs and not wanting to do creative stuff that gives you Internet points. As far as I can tell, contributions to GitHub haven't slowed down since LLMs started to appear, and the reason LLMs harm SO isn't that people no longer want to answer questions, it's that if you have a question, LLMs answer them much faster and with less hassle.

Old Unix Geek

@Plume

It's possible you have not been paying attention, or that you are too young to notice. I am reminded of the insects. Unless you've lived long enough, you probably don't even realize their population is collapsing. At the time the use of insecticide increased, concerns were dismissed: "Oh, the number of insects hasn't fallen since we introduced it. It is safe!" There is always a lag to things. The longer the lag, the harder it is to notice.

The book publishing industry is down 24% this year. That's insane given how long it took to build up that 24%. And it's very worrying since it shows people are reading less. Perhaps because the generations who learned to appreciate reading are getting old or dying. This would be the result of a long lag, where Millennials and the younger generations have shorter attention spans and prefer videos because that is what they grew up with.

A world without books is a very different world. There are parts of the world where so few people read that a bestseller sells 5000 copies for a population of 280 million people. The result is an impoverished culture with few new ideas. There's no reason that this can't happen here. Brave New World projected a future like this.

A lot of GitHub contributions are for "employment portfolio" purposes. Indeed, it's even taught as "the thing successful software engineers do" in programming classes these days. This forgets why great software engineers did it, but instead is a form of cargo-culting. These are the "internet points" people, the "I have a career in high tech" people, and the "jump through the hoops to get a career" people. No doubt some exist on Stack Overflow too.

Another large category of people on GitHub publish work paid for by an employer. Sure, that's nice, but it's not going to be particularly novel or wild. It's going to be incremental improvements to things that exist, for business reasons, made available to others, for business reasons.

The people who actually care to create something because it's new and cool seem to me to be on the way out. They never cared for your "internet points". There was an explosion of such things back in the 90s: apache, nginx, linux, ghc, git, darcs, vim, enlightenment, kde, gnome, webkit, djgpp/cwsdpmi which Quake was built on, to mention but a few. Then many of these things got taken over by corporations. There was a similar burst of creativity among the Mac software indies for a while about 15 years ago, but that too seems over.

I expect the trend to continue, because it does matter to creative people that their creations are appreciated and acknowledged as their creations, if they share them. This is not a new phenomenon you can brush off as "internet points": one sees it in books where "The author asserts his moral right...". In French, it is recognized directly: "les droits d'auteur" (the rights of the author) is a less abstract concept than "copyright". LLMs simply take that which was not offered, and acknowledge no one. It doesn't really matter if courts or governments decide this is fair use -- the result will be the same: less published creativity.

I think there was a phase change in the quality of software when the authors of software stopped being acknowledged: treat creative people as disposable cogs and you find creativity disappears. Instead you get bland corporate products running on Electron, built with the "Extreme Programming" process and a "Scrum master".
