OpenAudioSearch (OAS) – keywords and research

We recently reported on the features of the Cultural Broadcasting Archive (CBA). One of the CBA's upcoming features will be the transcription of podcasts. This is where Open Audio Search comes into play.
The first step is audio transcription by a speech recognition engine, which converts the sound files into machine-readable text. The resulting transcripts of the broadcasts are then made available in a search engine, which users can query through a web interface. Each search result is time-stamped, so the matching passage can be listened to directly. This should be especially helpful for research!
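A time-stamped result means each hit points to a position in the audio, not just to an episode. The following minimal sketch assumes each transcript is stored as a list of segments with start times; the `Segment` class and the `search` and `format_timestamp` functions are hypothetical illustrations, not part of OAS, and a real deployment would query the search engine instead of scanning a list.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-stamped piece of a transcript (hypothetical structure)."""
    episode: str
    start: float  # seconds from the beginning of the audio file
    text: str


def search(segments, query):
    """Return (episode, start) for every segment containing all query words."""
    words = re.findall(r"\w+", query.lower())
    hits = []
    for seg in segments:
        # Compare whole words, so "art" does not match inside "start".
        tokens = set(re.findall(r"\w+", seg.text.lower()))
        if words and all(w in tokens for w in words):
            hits.append((seg.episode, seg.start))
    return hits


def format_timestamp(seconds):
    """Format seconds as H:MM:SS so a player can jump to the hit."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


segments = [
    Segment("ep1", 0.0, "Welcome to the show."),
    Segment("ep1", 754.2, "Today we talk about community radio."),
    Segment("ep2", 30.0, "Radio archives and podcasts"),
]
for episode, start in search(segments, "radio"):
    print(episode, format_timestamp(start))
# ep1 0:12:34
# ep2 0:00:30
```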
If you are looking for a podcast or article for a term paper, for research, or simply out of interest, automatic categorization and keywording can take you directly to the right audio. Another advantage: the engine can also subscribe to RSS feeds, so content from several sources can be processed at the same time. But how does OAS work?
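Subscribing to an RSS feed essentially means reading its items and picking out the audio files to process. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming standard RSS 2.0 `<item>` and `<enclosure>` elements and that the feed text has already been downloaded; `audio_enclosures` and `collect` are hypothetical helper names, not OAS functions.

```python
import xml.etree.ElementTree as ET


def audio_enclosures(rss_xml):
    """Return (title, url) for each RSS <item> whose enclosure is audio."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Skip items without media and items whose media is not audio (e.g. video).
        if enclosure is None or not enclosure.get("type", "").startswith("audio/"):
            continue
        results.append((item.findtext("title", default=""), enclosure.get("url")))
    return results


def collect(feeds):
    """Merge audio items from several feeds given as {source name: feed XML}."""
    return [
        (source, title, url)
        for source, rss_xml in feeds.items()
        for title, url in audio_enclosures(rss_xml)
    ]
```

In practice each feed would first be fetched, for example with `urllib.request.urlopen(url).read()`, whose bytes `ET.fromstring` also accepts.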

  1. Loading audio & metadata
  2. Preprocessing audio data
  3. Transcribing audio files
  4. Extracting information from the transcripts
  5. Indexing transcripts & metadata in the search engine
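The five steps can be chained roughly as follows. This is a simplified sketch, not the OAS implementation: the speech recognition engine is passed in as a plain callable, preprocessing is a no-op placeholder, keywording is naive term frequency with a small hypothetical stop-word list, and the search engine is replaced by an in-memory inverted index. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system needs a proper list per language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "about"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def preprocess(audio):
    """Step 2 placeholder (no-op); a real system might convert to mono here."""
    return audio


def extract_keywords(transcript, n=3):
    """Step 4: naive keywording, the n most frequent non-stop-words."""
    counts = Counter(w for w in tokenize(transcript) if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def run_pipeline(episodes, transcribe):
    """Run steps 1-5 over {episode_id: (audio, metadata)}.

    Returns an inverted index (term -> set of episode ids) and keywords per episode.
    """
    index = defaultdict(set)
    keywords = {}
    for ep_id, (audio, metadata) in episodes.items():  # step 1: loaded input
        transcript = transcribe(preprocess(audio))  # steps 2 and 3
        keywords[ep_id] = extract_keywords(transcript)  # step 4
        # Step 5: index words from both the transcript and the metadata title.
        for term in tokenize(transcript) + tokenize(metadata.get("title", "")):
            index[term].add(ep_id)
    return index, keywords
```

Passing `transcribe` as a parameter keeps the pipeline independent of any particular speech recognition engine, so the engine can be swapped without touching the other steps.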

You can already try it out on the demo site, which is currently fed with the RSS feeds of the latest contributions from FRN and CBA. Editors at free radio stations are invited to test it intensively and to consider which other features might be useful, both for internal radio use and for a wider audience. Feedback is welcome; technical issues can be reported in the GitHub repository.

Open Audio Search is supported by Prototype Fund and netidee grants.