Thursday, December 17, 2020

Pruning GitHub’s Code Search Index

GitHub (Hacker News):

Starting today, GitHub Code Search will only index repositories that have had recent activity within the last year. Recent activity for a repository means that it has had a commit or has shown up in a search result. If the repository does not have any activity for an entire year, the repository will be purged from the Code Search index.

That seems much less useful. I would rather have a comprehensive search, even if it’s slower.
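To make the announced rule concrete, here is a rough sketch of the retention check as I read it from the quote above. This is not GitHub's code: the `RepoActivity` type, its field names, the `is_indexed` function, and the choice of 365 days for "a year" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "a year" is treated as a flat 365 days.
ACTIVITY_WINDOW = timedelta(days=365)


@dataclass
class RepoActivity:
    """Hypothetical record of the two activity signals named in the announcement."""
    last_commit: Optional[datetime]      # most recent commit, if any
    last_search_hit: Optional[datetime]  # most recent appearance in a search result, if any


def is_indexed(repo: RepoActivity, now: datetime) -> bool:
    """Return True if either activity signal falls within the last year."""
    cutoff = now - ACTIVITY_WINDOW
    timestamps = (repo.last_commit, repo.last_search_hit)
    return any(t is not None and t >= cutoff for t in timestamps)
```

Under this reading, a repository whose last commit was in 2015 stays in the index as long as it keeps appearing in search results, while one with neither signal in the past year is dropped.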

7 Comments

I wonder if this is cost related? I can't think why else?! Surely it is possible to index everything.

When searching for usage of sometimes obscure API calls, this seems like a great way to filter out the *many* results that seem to represent school coding projects. Such result sets are bloated with essentially duplicates of the same project, and being coursework, don't always represent the best code/design quality.

A year seems like an awfully short window.

@Gregory At least for me, I’m usually searching for something obscure enough that the problem is too few results, not too many.

Michael,

Yeah, I run into that a lot, too. But I personally can't recall many situations like that where I found quality results in a repository that hadn't been touched in years. YMMV.

Gregory: Are you not a Mac developer? Code "in a repository that hadn't been touched in years" is my bread and butter! That's, like, every (example of every) API created B.S. (Before Swift).

@Chris

A year seems like an awfully short window.

I’m really late to the post, but 100% on the nose on this one. A year? Five maybe, eight sounds reasonable.

A year seems like a pretty aggressive, rabbit-out-of-a-hat, management-style pick that does not seem to be based on user input.

That said, all a repository has to do is show up in a search in the previous year. One year is still brief, but at least libraries that thousands of companies inadvertently set as cornerstones of their SaaSes will still likely appear. 🙄
