Mimic 1 is a fast, lightweight text-to-speech (TTS) engine developed by Mycroft AI and VocaliD.
Mimic 1 is low-latency and has a small resource footprint. Its range of high-quality voices also sets it apart from other open source text-to-speech projects. Mimic 1 is used as the voice of Mycroft, and its small resource footprint also makes it an attractive choice for other embedded systems.
Mimic 1 currently works on Linux, Android and Windows, and other platforms may be supported in the future. We also anticipate adding more languages to enable many people to access realistic voices for the first time.
Mimic 1 is a powerful TTS tool, but it can also help solve other important problems. That's why Mycroft AI has partnered with VocaliD to help Dr. Rupal Patel and her team bring realistic TTS voices to people with speech disorders. VocaliD's technology creates customized voices that better represent the people who use them. To use these voices, VocaliD's clients need a fast, lightweight, cross-platform engine. That's where Mimic 1 comes in! VocaliD's clients can use Mimic 1 as the engine that empowers them to speak with their own unique voice.
NOTE: If you are installing a Mycroft build for Linux or Picroft, Mimic 1 will be installed as part of the installation dependencies - you don't need to build it separately. Follow the instructions below if you want to build Mimic as a standalone component.
Currently, Mimic 1 runs on Linux (ARM & Intel architectures), Mac OSX, and Windows.
In order to build Mimic 1, you will need the following:
- PCRE and ICU libraries and headers
- An audio engine - for Linux we recommend ALSA, and for Mac OSX and Windows we recommend PortAudio
On Ubuntu or Debian Linux
$ sudo apt-get install gcc make pkg-config automake libtool libicu-dev libpcre2-dev libasound2-dev
On Fedora Linux
$ sudo dnf install gcc make pkgconfig automake libtool libicu-devel alsa-lib-devel
On Arch Linux
$ sudo pacman -S --needed gcc make pkg-config automake libtool icu alsa-lib
On Mac OSX
First, install Brew:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
$ brew install pkg-config automake libtool portaudio icu4c
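The per-platform commands above can be collected into one helper. This is a minimal sketch, not part of Mimic 1: the function names mimic_deps_cmd and detect_pkg_manager are hypothetical, and the package lists simply follow the instructions above. It only prints the command for you to review and run.

```shell
# Print the dependency install command for a given package manager.
# Package lists follow the per-platform instructions above.
mimic_deps_cmd() {
    case "$1" in
        apt-get) echo "sudo apt-get install gcc make pkg-config automake libtool libicu-dev libpcre2-dev libasound2-dev" ;;
        dnf)     echo "sudo dnf install gcc make pkgconfig automake libtool libicu-devel alsa-lib-devel" ;;
        pacman)  echo "sudo pacman -S --needed gcc make pkg-config automake libtool icu alsa-lib" ;;
        brew)    echo "brew install pkg-config automake libtool portaudio icu4c" ;;
        *)       return 1 ;;   # unknown package manager
    esac
}

# Print the first supported package manager found on PATH.
detect_pkg_manager() {
    for pm in apt-get dnf pacman brew; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}
```

For example, `pm=$(detect_pkg_manager) && mimic_deps_cmd "$pm"` prints the matching install command without running it.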
Cross-compiling from Linux
This requires additional packages to be installed.
On Ubuntu 16.04 (xenial):
sudo apt-get install gcc make pkg-config automake libtool libicu-dev libpcre2-dev wine binutils-mingw-w64-i686 mingw-w64-i686-dev gcc-mingw-w64-i686 g++-mingw-w64-i686
On Ubuntu 14.04 (trusty):
sudo apt-get install gcc make pkg-config automake libtool libicu-dev libpcre2-dev mingw32 mingw32-runtime wine
Next, run the Windows build script:
Test that the build executed correctly. The directory into which Mimic was installed will contain a
wine ./mimic.exe -t "hello world"
To distribute the compiled Mimic 1 executable, add everything in the install/winbuild/bin directory to a .zip file. Copy it to your Windows machine via the cloud, a USB drive, etc.
Building on Windows natively
NOTE: The build process is much slower on Windows, and we strongly recommend cross-compiling from Linux.
For building Mimic 1 on Windows natively, audio device libraries and audio libraries are optional, as Mimic 1 can write its output to a waveform file. Some of the source files are very large, and some C compilers will have difficulty building them. We recommend
NOTE: Visual C++ 6.0 is known to fail on the large diphone database files.
First, clone the repository:
$ git clone https://github.com/MycroftAI/mimic.git
Navigate to the Mimic directory:
$ cd mimic
Generate the build scripts:
Configure the build scripts:
$ ./configure --prefix="/usr/local"
Build from source:
Validate the build:
$ make check
Install the compiled code:
$ sudo make install
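The build steps above can be chained so that a failure stops the sequence. The following is a sketch under stated assumptions: it assumes that the build scripts are generated by an autogen.sh script in the source tree and that plain make builds from source, since neither command is shown above. The helper names run_step and build_mimic and the DRY_RUN variable are hypothetical.

```shell
# Run one build step, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build and install Mimic 1; run from the top of the source tree.
# $1 is the install prefix (defaults to /usr/local, as in the configure step).
build_mimic() {
    prefix="${1:-/usr/local}"
    # Assumption: autogen.sh is the script that generates the build scripts.
    run_step ./autogen.sh || return 1
    run_step ./configure --prefix="$prefix" || return 1
    # Assumption: the default make target builds from source.
    run_step make || return 1
    run_step make check || return 1
    run_step sudo make install
}
```

Running `DRY_RUN=1 build_mimic` first prints the five steps without executing them, which is a cheap way to check the prefix before installing.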
By default, Mimic 1 will play the text using the selected audio device. Alternatively, Mimic 1 can write the audio to a wave file in RIFF format (.wav).
To read text to an audio device, use this command:
$ ./mimic -t TEXT
$ ./mimic -t "Hello. Doctor. Name. Continue. Yesterday. Tomorrow."
To read text, and have Mimic output to an audio file, use this command:
$ ./mimic -t TEXT -o WAVEFILE
$ ./mimic -t "Hello. Doctor. Name. Continue. Yesterday. Tomorrow." -o hello.wav
To read text from a file, and have Mimic output to an audio device, use this command:
$ ./mimic -f TEXTFILE
$ ./mimic -f doc/alice
To read text from a file, and have Mimic output to an audio file, use this command:
$ ./mimic -f TEXTFILE -o WAVEFILE
$ ./mimic -f doc/alice -o hello.wav
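The file-to-file form lends itself to batch conversion. As an illustrative sketch, the helper below converts every .txt file in a directory using the -f and -o options shown above. The function names and the MIMIC variable (the path to the binary, defaulting to ./mimic) are hypothetical, not part of Mimic 1.

```shell
# Map a text file path to its output name: notes/a.txt -> notes/a.wav
wav_name_for() {
    printf '%s.wav\n' "${1%.txt}"
}

# Convert every .txt file in directory $1 with "mimic -f TEXTFILE -o WAVEFILE".
speak_dir_to_wav() {
    dir="$1"
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue   # skip when the glob matched nothing
        "${MIMIC:-./mimic}" -f "$txt" -o "$(wav_name_for "$txt")" || return 1
    done
}
```

For example, `speak_dir_to_wav doc` would write one .wav file next to each .txt file in doc.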
To list the available internal voices, use this command:
$ ./mimic -lv
To use an internal voice, use this command:
$ ./mimic -t TEXT -voice VOICE
$ ./mimic -t "Hello" -voice slt
To use an external voice file, use this command:
$ ./mimic -t TEXT -voice VOICEFILE
$ ./mimic -t "Hello" -voice voices/cmu_us_slt.flitevox
To use an external voice via a URL, use this command:
$ ./mimic -t TEXT -voice VOICEURL
$ ./mimic -t "Hello" -voice http://www.festvox.org/flite/packed/flite-2.0/voices/cmu_us_ksp.flitevox
Mimic 1 offers several different voices. They use different speech modelling techniques (diphone, clustergen and hts, for example). Voices differ a lot in size, in how human they sound and in how easy they are to understand.
Diphone Voices are less computationally expensive and quite intelligible, but they sound very robotic.
./mimic -t "Hello world" -voice kal16
Clustergen Voices sound more natural and are easy to understand, but this comes at the expense of larger file size and higher computational requirements.
./mimic -t "Hello world" -voice slt
./mimic -t "Hello world" -voice ap
hts Voices sound more robotic than clustergen voices, but have a much smaller file size.
./mimic -t "Hello world" -voice slt_hts
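To compare the voice types by ear, it can help to render the same sentence with each voice. The sketch below loops over the four voices used in the examples above and combines the -t, -voice and -o options. The VOICES list, the function name and the MIMIC variable are hypothetical helpers; the voices actually available depend on your build.

```shell
# Voices used in the examples above; adjust to what "mimic -lv" reports.
VOICES="kal16 slt ap slt_hts"

# Render text $1 once per voice into directory $2 as VOICE.wav.
sample_all_voices() {
    text="$1"
    outdir="$2"
    for v in $VOICES; do
        "${MIMIC:-./mimic}" -t "$text" -voice "$v" -o "$outdir/$v.wav" || return 1
    done
}
```

For example, `mkdir -p samples && sample_all_voices "Hello world" samples` would leave one wave file per voice in samples.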
Voices can be compiled (built-in) into Mimic 1 or loaded from a .flitevox file. The only exception is hts voices, which combine a compiled function with a voice data file, .htsvoice. Mimic 1 will look for the .htsvoice file when the hts voice is loaded, looking in the current working directory, the voices subdirectory and the $prefix/share/mimic/voices directory if it exists.
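When an hts voice fails to load, it can be useful to check which of these locations actually holds the data file. The sketch below mimics the lookup from the shell. It is not Mimic 1's own code: the function name is hypothetical, the search order is assumed to match the order listed above, and the voices subdirectory is assumed to be relative to the current working directory.

```shell
# Print the first location that contains the voice data file $1.
# $2 is the install prefix (assumed default: /usr/local).
find_htsvoice() {
    name="$1"
    prefix="${2:-/usr/local}"
    for dir in . ./voices "$prefix/share/mimic/voices"; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1   # not found in any of the three locations
}
```

For example, `find_htsvoice NAME.htsvoice`, with NAME replaced by the data file you are looking for, prints the matching path or returns a non-zero status.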
Voice names are identified as loadable files if the name includes a "/" (slash) otherwise they are treated as internal compiled-in voices.
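The slash rule can be expressed as a one-line check, which is handy in wrapper scripts. This is a small sketch of the rule as stated above, with a hypothetical function name. It does not call Mimic 1.

```shell
# Classify a -voice argument: a name containing "/" is a loadable
# file (or URL); anything else is an internal compiled-in voice.
voice_kind() {
    case "$1" in
        */*) echo "file" ;;
        *)   echo "internal" ;;
    esac
}
```

One consequence of the rule is that a voice file in the current directory presumably has to be written with a path prefix, such as ./cmu_us_slt.flitevox, to be treated as a file.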
Voices accept additional debug options, specified as --setf feature=value on the command line. Wrong values can prevent Mimic 1 from working.
Here are some examples:
To use simple concatenation of diphones without prosodic modification:
./mimic --sets join_type=simple_join doc/intro.txt
To print sentences as they are said:
./mimic -pw doc/alice
To make Mimic speak more slowly:
./mimic --setf duration_stretch=1.5 doc/alice
To make Mimic speak more quickly:
./mimic --setf duration_stretch=0.8 doc/alice
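In the two examples above, values above 1 slow the speech down and values below 1 speed it up. Assuming duration_stretch simply multiplies the duration of the speech (an inference from these examples, not a documented guarantee), a desired speed factor converts to a stretch value as 1 / speed. The helper name below is hypothetical.

```shell
# Convert a speed factor (2 = twice as fast) into a duration_stretch value.
# Assumes duration_stretch scales speech duration linearly.
stretch_for_speed() {
    awk -v s="$1" 'BEGIN { if (s <= 0) exit 1; printf "%.2f\n", 1 / s }'
}
```

For example, `./mimic --setf duration_stretch="$(stretch_for_speed 1.25)" doc/alice` passes duration_stretch=0.80, matching the faster example above.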
To make Mimic speak with a higher pitch:
./mimic --setf int_f0_target_mean=145 doc/alice
To print Mimic help information: