Monday, February 6, 2023

PodSearch Reborn

David Smith:

Back in 2017 I had created a site which took the audio of some of my favorite podcasts and tried to make them searchable by passing them through an automated speech-to-text engine.

[…]

Thankfully, since then OpenAI has released Whisper, a powerful speech-to-text engine that I can run right on my Mac and results in transcripts that are shockingly good. They aren’t quite at the level of a human transcriber, but they get darn close in many instances, getting close to the level where you could use them to grab a pull quote with only a little bit of tidying up to do.

Update (2023-02-14): Jason Snell:

While not perfect, Whisper was staggeringly better than the 2017 transcript and really, much better than any other AI-driven transcription I’d tried recently. It got the punctuation. It got proper names. And it didn’t turn “Thanks for listening to The Incomparable, I’ve been your host Jason Snell” into “Goodnight everybody for listening to be uncomfortable, I’ve been your Hostess and smell.”

Fortunately, a fellow named Georgi Gerganov made a C++-native port of Whisper that is easy to install and run on macOS and is optimized for Apple silicon. I downloaded and installed Gerganov’s version, downloaded the medium English model, and discovered that it could transcribe a podcast at rates up to 2x!

This was great, but the last thing I needed was to have to remember all the arcane command-line commands required to get the files in the right place. So instead, I wrote The Transcriptor, a Shortcut that lets me control-click on audio files and turn them into transcripts in a format of my choice.
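For readers curious what those command-line steps might look like, here is a minimal Python sketch. It is not Snell's Shortcut, whose internals are not described here. It assumes a locally built whisper.cpp whose example binary is at `./main` (the name used by early 2023 builds; newer builds may name it differently), a medium English model at `models/ggml-medium.en.bin`, and `ffmpeg` on the PATH to produce the 16 kHz mono WAV input that whisper.cpp expects. The function names `build_commands` and `transcribe` are hypothetical.

```python
import subprocess
from pathlib import Path

# Output-format flags as documented in the whisper.cpp README.
OUTPUT_FLAGS = {"txt": "-otxt", "srt": "-osrt", "vtt": "-ovtt"}


def build_commands(audio, whisper_bin="./main",
                   model="models/ggml-medium.en.bin", fmt="txt"):
    """Return (ffmpeg_cmd, whisper_cmd) as argument lists.

    whisper_bin and model are assumed paths; adjust them to your install.
    """
    if fmt not in OUTPUT_FLAGS:
        raise ValueError(f"unsupported format: {fmt!r}")
    audio = Path(audio)
    # Write the converted audio next to the source file.
    wav = audio.with_name(audio.stem + ".16k.wav")
    # Convert to 16 kHz, mono, 16-bit PCM WAV.
    ffmpeg_cmd = ["ffmpeg", "-y", "-i", str(audio),
                  "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]
    # Run the model on the converted file and request the chosen format.
    whisper_cmd = [whisper_bin, "-m", model, "-f", str(wav), OUTPUT_FLAGS[fmt]]
    return ffmpeg_cmd, whisper_cmd


def transcribe(audio, **kwargs):
    """Run both steps in order, raising if either command fails."""
    for cmd in build_commands(audio, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling `transcribe("episode.mp3", fmt="srt")` would convert the file and then ask whisper.cpp for a subtitle-style transcript. Keeping command construction separate from execution makes it easy to inspect the arguments before anything runs.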

1 Comment

Whisper is quite good. Just for ATP, I have a similar transcript search site also based on Whisper. It directly links to Overcast with timestamps and has permalinks:

https://marcoshuerta.com/dash/atp_search/?sort=3&page_num=1&search=%E2%80%9CIguana+flag%E2%80%9D&range_begin=1&range_end=520
