Implementing a Local Audio Transcription Service with OpenAI Whisper and Docker



As an IT professional, I find that working out efficient solutions to unusual project requirements is part of the job. Recently, I was tasked with setting up a service that could transcribe audio into text locally, without relying on an internet connection. The service needed to run on a Linux system, in a contained environment, for reliability and security.

Solution Exploration

After some research, I identified OpenAI Whisper as a potential solution. I tested Whisper on my laptop to ensure its effectiveness, and it performed exceptionally well. However, for deployment on the final system, I decided to use Docker to streamline the setup and ensure ease of use and portability.

Leveraging Docker for Seamless Deployment

Docker is a powerful tool for creating, deploying, and managing containerized applications. By containerizing OpenAI Whisper, we can encapsulate the application and its dependencies, making it easier to deploy and maintain. During my search for existing Docker solutions, I discovered Ahmet Öner's ready-to-use Docker container that perfectly matched my requirements.

Docker Setup Documentation

Below is the detailed Docker documentation I created to set up the audio transcription service on a Linux server:

Credits

Docker Webservice by Ahmet Öner

Setup

Docker

  • Check whether Docker is installed: docker -v
  • If it is not installed, install it: sudo snap install docker
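The two steps above can be combined so that the install hint only appears when Docker is actually missing. This is a minimal sketch, assuming a snap-based system as in the bullets above; have_cmd is a hypothetical helper name, not a Docker command.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  docker -v                      # print the installed Docker version
else
  echo "Docker not found; install it with: sudo snap install docker"
fi
```

The script only prints the install command instead of running it, so nothing is installed without a deliberate step.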

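Pull the Docker image

  • CPU
    sudo docker pull onerahmet/openai-whisper-asr-webservice:latest

  • GPU
    sudo docker pull onerahmet/openai-whisper-asr-webservice:latest-gpu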

Start the service

There are GPU and CPU versions.

  • See the available model list and replace base (the ASR_MODEL value) in the docker run command with the model you prefer to use.

  • CPU
    sudo docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest

  • GPU
    sudo docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
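Since the CPU and GPU commands differ only in the --gpus flag and the image tag, and the model is a single environment variable, the choice can be wrapped in a small function. This is a sketch under assumptions: whisper_run_cmd is a hypothetical helper name, and small is used only as an example of a model name from the model list.

```shell
# whisper_run_cmd is a hypothetical helper, not part of Docker or the webservice.
# It prints the docker run command for a variant ("cpu" or "gpu") and a model name.
whisper_run_cmd() {
  local variant="${1:-cpu}"   # "cpu" or "gpu"
  local model="${2:-base}"    # value passed as ASR_MODEL
  local image="onerahmet/openai-whisper-asr-webservice"
  case "$variant" in
    cpu) echo "sudo docker run -d -p 9000:9000 -e ASR_MODEL=${model} -e ASR_ENGINE=openai_whisper ${image}:latest" ;;
    gpu) echo "sudo docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=${model} -e ASR_ENGINE=openai_whisper ${image}:latest-gpu" ;;
    *)   echo "unknown variant: ${variant}" >&2; return 1 ;;
  esac
}

# Print the CPU command with the example model "small" (nothing is started here).
whisper_run_cmd cpu small
```

The function only prints the command, so it can be reviewed before being pasted into a terminal.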

Use it

  • Open http://localhost:9000/
  • Expand the ASR drop-down
  • Click the "Try it out" button in the top right corner
  • Pick the language
  • Select an audio file
  • Click "Execute"
    • Transcribing a conversation a couple of minutes long usually takes 1-2 minutes.
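The steps above use the web page, but the same request could also be sent from the command line, which is handy for scripting. This is a sketch under assumptions that should be checked against what the ASR section of the web page shows: that the service accepts a POST to /asr with a multipart field named audio_file, and query parameters task, language, and output. asr_url and transcribe are hypothetical helper names, and meeting.mp3 is a placeholder file.

```shell
# asr_url and transcribe are hypothetical helper names for this sketch.
# Assumed API: POST /asr with a multipart field "audio_file" and the
# query parameters task, language, and output. Verify these in the web UI.
asr_url() {
  local host="${1:-localhost:9000}"
  local language="${2:-en}"
  echo "http://${host}/asr?task=transcribe&language=${language}&output=json"
}

transcribe() {
  local audio="$1"               # path to the audio file to upload
  local language="${2:-en}"      # language code of the recording
  curl -sS -X POST -F "audio_file=@${audio}" "$(asr_url localhost:9000 "$language")"
}

# Example (requires the service to be running):
# transcribe meeting.mp3 en
```

Building the URL in its own function keeps the host and language easy to change without touching the curl call.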

Docker commands

  • sudo docker ps: list running containers
  • sudo docker images: list available images
  • sudo docker pause container_name/ID: pause all processes within one or more containers
  • sudo docker unpause container_name/ID: unpause all processes within one or more containers
  • sudo docker start container_name/ID: start one or more stopped containers
  • sudo docker stop container_name/ID: stop one or more running containers
  • sudo docker restart container_name/ID: restart one or more containers
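The container_name/ID placeholder in these commands has to be looked up first. One way to do that, sketched here under assumptions, is to filter docker ps by the image the container was started from. DOCKER, IMAGE, and whisper_container_id are hypothetical names for this sketch; for the GPU variant the tag would be latest-gpu.

```shell
# DOCKER, IMAGE, and whisper_container_id are hypothetical names for this sketch.
DOCKER="${DOCKER:-sudo docker}"
IMAGE="onerahmet/openai-whisper-asr-webservice:latest"

# Print the IDs of running containers that were created from IMAGE.
whisper_container_id() {
  $DOCKER ps --quiet --filter "ancestor=${IMAGE}"
}

# Example: restart the transcription service without copying the ID by hand.
# $DOCKER restart "$(whisper_container_id)"
```

docker ps lists only running containers by default, so a stopped container will not show up here unless the -a flag is added.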
