Friday, August 11, 2023

CNET Deletes Thousands of Old Articles to Game Google Search

Thomas Germain (via Slashdot, Hacker News):

Archived copies of CNET’s author pages show the company deleted small batches of articles prior to the second half of July, but then the pace increased. Thousands of articles disappeared in recent weeks. A CNET representative confirmed that the company was culling stories but declined to share exactly how many it has taken down. The move adds to recent controversies over CNET’s editorial strategy, which has included layoffs and experiments with error-riddled articles written by AI chatbots.

“Removing content from our site is not a decision we take lightly. Our teams analyze many data points to determine whether there are pages on CNET that are not currently serving a meaningful audience. This is an industry-wide best practice for large sites like ours that are primarily driven by SEO traffic,” said Taylor Canada, CNET’s senior director of marketing and communications. “In an ideal world, we would leave all of our content on our site in perpetuity. Unfortunately, we are penalized by the modern internet for leaving all previously published content live on our site.”

[…]

Removing, redirecting, or refreshing irrelevant or unhelpful URLs “sends a signal to Google that says CNET is fresh, relevant and worthy of being placed higher than our competitors in search results,” the document reads.
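
As a sketch of what “analyze many data points” might look like in practice, here is a minimal prune-candidate pass over a hypothetical analytics export. The file name, column names, and thresholds are all invented for illustration; CNET’s actual criteria aren’t public.

```python
import csv
from datetime import datetime, timedelta

# Hypothetical analytics export with columns: url, publish_date, pageviews_12mo.
# The thresholds below are invented, purely to illustrate the shape of the analysis.
MIN_PAGEVIEWS = 100                 # "not serving a meaningful audience"
MIN_AGE = timedelta(days=5 * 365)   # only consider pages at least ~5 years old

def prune_candidates(path):
    """Return URLs of old pages with negligible traffic, for human review."""
    now = datetime.now()
    candidates = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            age = now - datetime.fromisoformat(row["publish_date"])
            if age > MIN_AGE and int(row["pageviews_12mo"]) < MIN_PAGEVIEWS:
                candidates.append(row["url"])
    return candidates

if __name__ == "__main__":
    for url in prune_candidates("analytics_export.csv"):
        print(url)
```

Each flagged URL would then be removed, redirected, or refreshed by hand, per the memo’s own framing.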

Danny Sullivan:

Are you deleting content from your site because you somehow believe Google doesn’t like “old” content? That’s not a thing! Our guidance doesn’t encourage this.

Nick Heer:

A bunch of SEO types Germain interviewed swear by it, but they believe in a lot of really bizarre stuff. It sounds like nonsense to me. After all, Google also prioritizes authority, and a well-known website which has chronicled the history of an industry for decades is pretty damn impressive. Why would “a 1996 article about available AOL service tiers” — per the internal memo — cause a negative effect on the site’s rankings, anyhow? I cannot think of a good reason why a news site purging its archives makes any sense whatsoever.

It’s quite possible the consultants were taking them for a ride or are just wrong. But it’s also possible that the SEO people who follow this stuff really closely for a living have figured out something non-intuitive and unexpected. Google obviously doesn’t want to say that it incentivizes sites to delete content, and the algorithms are probably not intentionally designed to do that, but that doesn’t mean this result isn’t an emergent property of complex algorithms and models that no one fully understands.

Danny Sullivan:

Indexing and ranking are two different things.

Indexing is about gathering content. The internet is big, so we don’t index all the pages on it. We try, but there’s a lot. If you have a huge site, similarly, we might not get all your pages. Potentially, if you remove some, we might get more to index. Or maybe not, because we also try to index pages as they seem to need to be indexed. If you have an old page that doesn’t seem to change much, we probably aren’t running back to it every hour to index it again.

[…]

People who believe in removing “old” content aren’t generally thinking that’s going to make the “new” pages get indexed faster. They might think that maybe it means more of their pages overall from a site could get indexed, but that can include “old” pages they’re successful with, too.
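
Sullivan’s description suggests a fixed crawl budget spent on whichever pages seem most likely to have changed. Here is a toy simulation along those lines; every number is invented and Google’s real scheduler is not public, but it shows why a smaller page inventory can leave frequently updated pages less stale:

```python
import random

random.seed(0)
CRAWL_BUDGET = 1_000  # toy number: pages the crawler fetches from this site per day

def make_site(n_pages):
    # change_rate: expected edits per day; most archive pages never change
    return [{"days_since_crawl": random.randint(1, 365),
             "change_rate": random.choice([0.0, 0.0, 0.0, 0.01, 0.5])}
            for _ in range(n_pages)]

def crawl_one_day(site):
    # Recrawl the pages most likely to have changed since the last visit.
    site.sort(key=lambda p: p["days_since_crawl"] * p["change_rate"], reverse=True)
    for page in site[:CRAWL_BUDGET]:
        page["days_since_crawl"] = 0
    for page in site[CRAWL_BUDGET:]:
        page["days_since_crawl"] += 1

def average_staleness(n_pages, days=60):
    site = make_site(n_pages)
    for _ in range(days):
        crawl_one_day(site)
    hot = [p for p in site if p["change_rate"] == 0.5]
    return sum(p["days_since_crawl"] for p in hot) / len(hot)

for n in (200_000, 50_000):
    print(f"{n:>7} pages: hot pages ~{average_staleness(n):.0f} days stale on average")
```

In this toy model the smaller inventory’s frequently updated pages get revisited noticeably sooner; whether anything like this emerges from Google’s actual systems is, of course, the disputed question.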

fshbbdssbbgdd:

Suppose CNET published an article about LK99 a week ago, then they published another article an hour ago. If Google hasn’t indexed the new article yet, won’t CNET rank lower on a search for “LK99” because the only matching page is a week old?

If by pruning old content, CNET can get its new articles in the results faster, it seems this would get CNET higher rankings and more traffic. Google doesn’t need to have a ranking system directly measuring the average age of content on the site for the net effect of Google’s systems to produce that effect. “Indexing and ranking are two different things” is an important implementation detail, but CNET cares about the outcome, which is whether they can show up at the top of the results page.

It would be nice to look at concrete data. Google knows how the CNET pages rank in its index, and CNET knows how its traffic changed (or didn’t) after the deletions. But so far neither is sharing.

Previously:

Update (2023-08-15): Nick Heer:

The whole entire point of a publisher like CNet is to chronicle an industry. It is too bad its new owners do not see that in either its history or its future.

Adam Engst:

Though I’m dubious of most SEO claims based on my experience with the TidBITS and Take Control sites over decades, it’s conceivable that SEO experts have discovered a hack that works—until Google tweaks its algorithms in response. Regardless, I disapprove of deleting legitimate content because there’s no predicting what utility it could provide to the future; at least CNET says it’s sending deleted stories to the Internet Archive.

Update (2023-08-16): Chris Morrell:

I will say that Google has a history of publicly stating things about rankings that were measurably untrue. I would not at all be surprised to find out that “content pruning” is actually effective and is just another way Google’s search algos incentivize bad content decisions.

[…]

Google has claimed for years that they crawl client-side JS just fine, but almost everyone knows that’s not true. They’ve also said very clearly that Core Web Vitals are important but experimentation shows they have minimal impact.

I’m not advocating for deleting content on the web, but I do think that Google has put a lot of publishers in a position to second-guess everything because what they say often doesn’t match the evidence.

Update (2023-08-22): Nik Friedman TeBockhorst:

So speaking as someone who’s adjacent to the SEO industry (not my job, but I’ve spent a couple of decades in publishing, digital media, and analytics), I can share a little detail about what I suspect is going on here.

“Content pruning” is a common practice, and largely consists of taking down out-of-date content so that readers can focus on more current and/or profitable content. This is routine for large sites, and usually includes updating out-of-date but popular articles. It also has the benefit of trimming the amount of content to manage: spring cleaning, if you will.

From an SEO perspective, Google will dedicate limited resources to indexing any given site (its so-called “crawl budget”). If you take down the pages that aren’t doing you any good because they’re unprofitable, Google stops spending resources on those pages, and stops sending traffic to pages that don’t make money. If you’re lucky and have better pages with relevant content, Google will hopefully send those people to those better pages instead.

[…]

As for why Google says this isn’t necessary, well, CNET and Google have different objectives.

8 Comments

"If by pruning old content, CNET can get its new articles in the results faster, it seems this would get CNET higher rankings and more traffic."

That is absolute nonsense.

On a truly massive site with hundreds of thousands or millions of pages, deleting a substantial amount of old content will alter the crawl patterns. That MIGHT improve optimization but there's no guarantee it would.

It comes down to page quality regardless of size of site.

If you have a LOT of content the search algorithms deem to be "low quality" (and no SEO tool or consultant can accurately tell you what the search engines conclude), then deleting that low-quality content SHOULD have a beneficial impact. Various Googlers have said through the years that the more low-quality content (per THEIR definitions) a site contains, the less likely it is to perform well in search.

From the SEO practitioner/Website owners' perspective, it's always going to be a bit of a crapshoot because we don't know which content the search engines' algorithms have deemed to be "low quality".

If your site is all about getting clicks and not about publishing news articles, then just take everything down and join the ranks of sites that post nothing but scammy click-bait articles.

If I read the article correctly, that's the goal, so why keep up the expensive pretense of caring about anything else?

Way back when I first got a hotmail address, CNET and ZDnet were my go-to sites for quality tech news. They became ever more clickbaity with poorer quality articles. This development just cements that slide into irrelevance.

I like how SEO and its priesthood are coming into public focus.

You’re right that expert SEOs stumble upon methods that are counter-intuitive (and user-hostile) but work. There’s plenty of junior SEOs who believe in data-less nonsense, but logic such as “Google likes authority thus it likes old content” is no way to interpret cutting-edge SEO.

As for what Googlers say about SEO, you cannot trust it when you want to sweat the details, even when they have no reason to lie or exaggerate. Counter-examples frequently pop up.

Up until relatively recently I ran both SEO/Audience Development and Editorial for a big publisher, so this is an area I'm intimately familiar with. Unfortunately some of the info that Danny S has put out is leading people to some very bad conclusions.

Why do you delete old content? First of all – you never just delete it. You always redirect it. Deleting it leaves you with a 404, and Google doesn't like lots of 404s suddenly appearing. It also makes it harder for Google to actually crawl your site: in technical SEO-speak, it may lead to "crawl paths" breaking, and content that you *want* Google to find no longer being available to it. So when getting rid of content, you have to evaluate the impact on crawl paths, and you have to use a redirect to tell Google where the most appropriate content is for answering any search queries the old page was getting. At the very least, if there's no appropriate page, you redirect to the home page rather than leaving a 404.
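
To make the audit concrete, here is a minimal sketch of checking that every unpublished URL 301s to a live page (invented URLs; uses the third-party requests library):

```python
import requests
from urllib.parse import urljoin

# Hypothetical list of unpublished URLs; each should 301 to a live page, not 404.
REMOVED_URLS = [
    "https://example.com/1996/aol-service-tiers",
    "https://example.com/reviews/ford-kuga-2019",
]

for url in REMOVED_URLS:
    # Don't follow redirects: we want the status code a crawler sees first.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code in (301, 308):
        target = urljoin(url, resp.headers.get("Location", ""))
        final = requests.get(target, timeout=10)
        verdict = "ok" if final.status_code == 200 else f"broken target ({final.status_code})"
        print(f"{url} -> {target}: {verdict}")
    else:
        print(f"{url}: expected a permanent redirect, got {resp.status_code}")
```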

So why do you unpublish and redirect old content? There are quite a few reasons for it, and I really wish people would read the whole of Cnet's document about the criteria and process for doing this because it explains quite a few of them.

First, any site that has been around a long time has content on it which is "thin" – short content, often, in the case of sites like Cnet, "news in brief" style posts which you just wouldn't write now. Google has explicitly said on numerous occasions that large amounts of thin content can damage the overall rankings of a site. To give a real-world example, when I worked on SEO for The Week about nine years ago, we culled about 10,000 pages of thin content which had almost zero pageviews, and saw all our rankings rise as a result.
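
For illustration, a rough sketch of flagging "thin" pages by visible word count, using only the Python standard library; the 200-word cutoff is invented, and a real cull would pair this with traffic data and human review:

```python
from html.parser import HTMLParser

class TextWordCounter(HTMLParser):
    """Count words in the visible text of a page, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())

THIN_THRESHOLD = 200  # invented cutoff for "thin" content

def is_thin(html):
    counter = TextWordCounter()
    counter.feed(html)
    return counter.words < THIN_THRESHOLD

print(is_thin("<html><body><p>AOL now offers three service tiers.</p></body></html>"))  # True
```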

Second, you will have a lot of content which is repetitive. News stories are often a single new piece of information padded out with a whole lot of very samey background. Over time, you can get a lot of content which is very, very similar. Again, Google doesn't like content like that. What Google likes is clarity over which of your pages should rank for a single search term. If you have a lot of pages ranking for the same term, you are much less likely to achieve the all-important top three rank for that term. It doesn't matter how long your site has been around, how big a publisher you think you are, or how much "authority" you think you have – you will fail to get high rankings if you don't have clarity over which is your main page for answering a search query.
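
To make that concrete: one way to surface repetitive pages is shingle overlap between article texts. A toy sketch with invented sample text (production pipelines use MinHash or similar to compare millions of pages):

```python
def shingles(text, k=5):
    """The set of k-word shingles in a document, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Two hypothetical stories sharing a block of boilerplate background.
doc_a = ("LK-99 update: researchers in Seoul report further levitation tests. "
         "LK-99 is a lead-apatite compound first described in July, and claims "
         "about room-temperature superconductivity remain unverified.")
doc_b = ("LK-99 update: a second lab reports partial replication. "
         "LK-99 is a lead-apatite compound first described in July, and claims "
         "about room-temperature superconductivity remain unverified.")

overlap = jaccard(shingles(doc_a), shingles(doc_b))
print(f"shingle overlap: {overlap:.0%}")  # flag pairs above a threshold you tune
```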

Third, you are always trying to deliver the best page for whatever search query you're targeting, and that can mean redirecting old content. Say you're working on a car review site and a new version of the Ford Kuga comes out. You already have a review of the previous version. So you just create a new page for the new review, right?

Wrong. This, as far as Google is concerned, would mean you have two reviews for the Ford Kuga. If someone is searching for "Ford Kuga review" which one is the "right" page to send them to? Google doesn't know for sure. If you go by age, it will send them to the review of the old, outdated model which isn't the one they can now buy – so it would be a worse experience for the user.

So as a publisher, your best option to serve users is to delete the old review and redirect it to the new one. That tells Google "you know that old page which matched 'Ford Kuga review'? This is the new version, so please consider it the authority for that search term". It's a better match for what users want than the old page, so Google says "sure, I'll pass all the authority that old page had accumulated on to the new one. Have fun, kids!".
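
Mechanically, that's just a permanent redirect: a 301 response with a Location header. A minimal stdlib sketch with an invented URL mapping (in production this lives in the web server or CDN config, not application code):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Invented mapping from an unpublished review to its replacement.
REDIRECTS = {
    "/reviews/ford-kuga-2019": "/reviews/ford-kuga-2023",
}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in REDIRECTS:
            # 301 = moved permanently: crawlers should transfer the old
            # page's accumulated signals to the new URL.
            self.send_response(301)
            self.send_header("Location", REDIRECTS[self.path])
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```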

Honestly, it's so annoying seeing people who haven't run anything bigger than a one-person blog pop up with opinions on this when people who spend their entire lives studying and testing different strategies and tactics for SEO are being completely ignored (and I don't mean you Michael, I think your post is perfectly fair).

SEO for large publishers isn't just reading runes, it's something we work at really hard and have to justify and evaluate with real world data of the impact of the changes we make. And it evolves all the time – strategies that have worked for years can be thrown out overnight if Google's algorithm changes, which it does all the time (the number of reports into the impact of algo changes that I've written is crazy...)

@Ian Thank you for taking the time to write all that.
