Your own free Text-To-Speech engine in 2 minutes

Imagine you want to create YouTube videos, but you're not too fond of your own voice. In that case you can use a so-called TTS (Text-To-Speech) service, which converts your written text into spoken words. Many of these services are offered on the internet, at varying prices. There is a cheaper and safer option, though: nowadays you can simply run a text-to-speech service on your own computer, and you really don't have to compromise on quality. In this post, I'll explain how you can set up your own free TTS service in no time on your local computer or on a server online. The service supports English and many other languages.

What do you need to run the free TTS service?

To run your free TTS service you need the following:

  • A computer - To run the service. You don't need special hardware like a GPU (graphics card).
  • A browser - To use the text-to-speech service and produce audio that you can play and download.
  • Docker software - To run the free TTS software. Docker Desktop is free for personal use and can be downloaded from the Docker website.

Steps

  1. Install the Docker Desktop Client software on your computer. This software enables you to run so-called containers on your computer. A container is an isolated piece of software (such as the TTS service we are about to install) that can run independently of all other software on your computer. You can easily remove containers without affecting other programs on your computer.
  2. Create a directory on your local computer from which you want to start the TTS service. In Windows, you can do this via Windows File Explorer. On Linux or macOS, execute the following command in the Terminal:
mkdir ~/Documents/my-tts
  3. Create a file named docker-compose.yml in this new directory.
  4. Add the following data to this docker-compose.yml file:
 1version: "3"
 2services:
 3  mimic3:
 4    image: mycroftai/mimic3:latest
 5    container_name: mimic3
 6    ports:
 7      - 59125:59125
 8    volumes:
 9      - ./data:/home/mimic3/.local/share/mycroft/mimic3
10    restart: always
  5. We will run this docker-compose.yml file in the next step. It contains the instructions that tell Docker how to start the free TTS service. The line numbers below refer to the listing above:
  • Line 3: The unique name of the service within this docker-compose file.
  • Line 4: Contains the so-called image, which is fetched from the internet and contains the TTS software package Mimic 3. This software comes from the company Mycroft AI and is open source (AGPL-3.0 according to its repository at the time of writing). You can use it for various (commercial and non-commercial) purposes, but check the license of the voice you pick, since voices are licensed separately.
  • Line 5: Is the name by which the container will be recognized when running in Docker.
  • Line 7: Contains the port number on which the web server for the TTS service will be available. You can adjust the number on the left to ensure that the service is available on a different port. Do not change the right side!
  • Line 9: Ensures that anything stored within the container in the directory: /home/mimic3/.local/share/mycroft/mimic3 will now be stored in the directory ./data. If you lose your container, the data will still be on your local computer.
  • Line 10: Ensures that the container is restarted automatically whenever it stops, for example after a crash or when Docker itself restarts.
  6. It's time to start the container and play with text-to-speech. Execute the following command in the directory that contains the docker-compose.yml file to start the service:
docker-compose up -d
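
The web server inside the container can take a few seconds before it starts answering. If you want to check from a script whether the service is reachable, the following is a minimal sketch. The function names `service_url`, `service_is_up` and `wait_for_service` are made up for this example, and the sketch assumes that the start page answers a plain HTTP GET with status 200.

```python
import http.client
import time
import urllib.request


def service_url(host_port: int = 59125) -> str:
    """Build the local address of the TTS service.

    host_port is the LEFT number of the ports mapping in docker-compose.yml.
    """
    return f"http://127.0.0.1:{host_port}"


def service_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the web server answers a GET request with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or a broken reply.
        return False


def wait_for_service(url: str, attempts: int = 30, delay: float = 2.0,
                     probe=service_is_up) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds in between."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_service(service_url())` after `docker-compose up -d` returns `True` as soon as the start page responds. If you changed the left port number to, say, 8080, pass it along: `service_url(8080)`.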

Converting Text to Speech

Now that the free TTS service has been started in this container, you can go to this URL in your browser: http://127.0.0.1:59125. The web interface of the TTS service will appear.

This is your own free TTS service. Choose the desired language, and with it the desired voice. The first time you choose a voice, converting text to speech may take a little longer, because the voice has to be downloaded first. After that, it will be faster. You can now experiment as much as you like.
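
You are not limited to the browser. The Mimic 3 web server also exposes an HTTP endpoint, so you can generate audio files from a script. The following is a minimal sketch, assuming the `/api/tts` endpoint with its `text` and `voice` parameters as described in the Mimic 3 documentation and a response body in WAV format. The helper names `build_tts_url` and `save_speech` are made up for this example, and the voice key in the usage note is only an illustration.

```python
import urllib.parse
import urllib.request

# Address of the local service; adjust the port if you changed the
# left number of the ports mapping in docker-compose.yml.
BASE_URL = "http://127.0.0.1:59125"


def build_tts_url(text, voice=None, base_url=BASE_URL):
    """Build the request URL for the (assumed) /api/tts endpoint."""
    params = {"text": text}
    if voice:
        # Assumption: voices are selected with a key such as "en_UK/apope_low".
        params["voice"] = voice
    return f"{base_url}/api/tts?{urllib.parse.urlencode(params)}"


def save_speech(text, wav_path, voice=None, base_url=BASE_URL):
    """Request speech for `text`, write it to `wav_path`, return the byte count."""
    with urllib.request.urlopen(build_tts_url(text, voice, base_url)) as response:
        audio = response.read()
    with open(wav_path, "wb") as wav_file:
        wav_file.write(audio)
    return len(audio)
```

With the container running, `save_speech("Hello from my own TTS service.", "hello.wav")` should leave a playable `hello.wav` in the current directory. To pick a specific voice, pass its key, for example `voice="en_UK/apope_low"`.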

To stop the container

If you're done converting text to speech, you can shut down the container to save system resources. Execute the following command in the directory where you placed the docker-compose.yml file:

docker-compose down

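If you want the text-to-speech service to always be available, you don't need to do anything. Because of the restart: always setting, the container starts again whenever Docker starts. As long as Docker Desktop is set to launch when you log in, the service starts when you boot your computer and stops when you shut it down.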

Have fun converting text to speech!
