Pruning GitHub’s Code Search Index
Starting today, GitHub Code Search will only index repositories that have had recent activity within the last year. Recent activity for a repository means that it has had a commit or has shown up in a search result. If the repository does not have any activity for an entire year, the repository will be purged from the Code Search index.
That seems much less useful. I would rather have a comprehensive search, even if it’s slower.
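To make the quoted rule concrete, here is a minimal sketch of how I read it: a repository stays in the index if its most recent commit or its most recent appearance in a search result falls within the last year. The function name, the parameters, and the 365-day window are my own assumptions for illustration; GitHub has not published how it actually implements this.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "a year" is treated as 365 days; the announcement does not say.
RETENTION_WINDOW = timedelta(days=365)


def stays_indexed(
    last_commit: Optional[datetime],
    last_search_hit: Optional[datetime],
    now: datetime,
) -> bool:
    """Hypothetical reading of the rule: either kind of activity counts."""
    # Collect whichever activity timestamps are known for the repository.
    events = [t for t in (last_commit, last_search_hit) if t is not None]
    if not events:
        # No recorded activity at all, so nothing keeps the repository indexed.
        return False
    # The repository survives if its most recent activity is inside the window.
    return now - max(events) <= RETENTION_WINDOW
```

Under this reading, a repository with no commits for years still survives as long as it showed up in someone's search during the past year, while one that nobody has touched or found gets purged.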
7 Comments
I wonder if this is cost related? I can't think why else?! Surely it is possible to index everything.
When searching for usage of sometimes obscure API calls, this seems like a great way to filter out the *many* results that seem to represent school coding projects. Such result sets are bloated with essentially duplicates of the same project, and being coursework, don't always represent the best code/design quality.
@Gregory At least for me, I’m usually searching for something obscure enough that the problem is too few results, not too many.
Michael,
Yeah, I run into that a lot, too. But I personally can't recall many situations like that where I found quality results in a repository that hadn't been touched in years. YMMV.
Gregory: Are you not a Mac developer? Code "in a repository that hadn't been touched in years" is my bread and butter! That's, like, every (example of every) API created B.S. (Before Swift).
@Chris
A year seems like an awfully short window.
I’m really late to the post, but 100% on the nose on this one. A year? Five maybe, eight sounds reasonable.
A year seems like a pretty aggressive, rabbit-out-of-a-hat, management-style pick that does not seem to be based on user input.
That said, all you have to do is show up in a search the previous year. One year is still brief, but at least libraries that thousands of companies inadvertently set as cornerstones of their SaaSes will still likely appear. 🙄