Speech-To-Text
Speech-To-Text (STT) is the process of converting audio of spoken words into strings of text. Mycroft supports a range of Speech-To-Text engines.
Many users want to use a specific STT engine rather than the default. Like most of Mycroft's technology stack, this too can be customized.
For a voice assistant like Mycroft, speech recognition must be performed very quickly and with a high degree of accuracy. For this reason, Mycroft by default uses Google's STT engine.
To provide an additional layer of privacy for our users, we proxy all STT requests through Mycroft's servers. This prevents Google's service from profiling Mycroft users or connecting voice recordings to their identities. Only the voice recording is sent to Google; no other identifying information is included in the request. Therefore, Google's STT service cannot tell whether one person is making thousands of requests or thousands of people are each making a small number of requests.
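The idea behind the proxy can be illustrated with a minimal Python sketch. This is not Mycroft's server code: the function `build_upstream_request` and the field names (`audio`, `lang`, `device_id`, `user_email`) are hypothetical, chosen only to show a request being reduced to the recording before it is forwarded.

```python
# Hypothetical sketch of an anonymizing proxy step; not Mycroft's actual server code.
# Assumption: the upstream engine only needs the recording and its language.
ALLOWED_FIELDS = {"audio", "lang"}


def build_upstream_request(device_request: dict) -> dict:
    """Return a copy of the request that keeps only non-identifying fields."""
    return {
        key: value
        for key, value in device_request.items()
        if key in ALLOWED_FIELDS
    }


if __name__ == "__main__":
    incoming = {
        "audio": b"\x00\x01",            # raw recording bytes
        "lang": "en-US",
        "device_id": "abc-123",          # identifying, dropped by the proxy
        "user_email": "someone@example.com",  # identifying, dropped by the proxy
    }
    print(sorted(build_upstream_request(incoming)))
```

Because every forwarded request looks the same apart from the audio, the upstream service has nothing with which to group requests by user.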
By supporting Mozilla's DeepSpeech project, we aim to provide a competitive open source alternative. The accuracy of DeepSpeech is not yet sufficient to provide a quality experience for Mycroft users. However, we will switch to DeepSpeech by default as soon as it reaches an acceptable level of accuracy.
The following are some of the available STT options. Each provides details on how to get set up and how to configure Mycroft.
Mycroft has been supporting Mozilla's efforts to build DeepSpeech, an open Speech-to-Text technology. It is a fully open source STT engine, based on Baidu’s Deep Speech architecture and implemented with Google’s TensorFlow framework. Being open source means that if you have the hardware, it can be run within your own network providing additional privacy and control for you and your family.
You can test DeepSpeech using their pre-trained model by following the instructions on the .
To set up a DeepSpeech server that Mycroft can use, try the . Once you have this up and running, we can configure Mycroft to use this server.
We can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
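As a rough illustration of what adding to the existing configuration values means, the Python sketch below merges an `stt` section into an existing configuration dictionary without discarding other settings. The key names (`module`, `deepspeech_server`, `uri`), the server address, and the pre-existing values are assumptions made for illustration; check them against your own DeepSpeech server and Mycroft installation.

```python
import json


def merge_config(existing: dict, addition: dict) -> dict:
    """Recursively merge `addition` into a copy of `existing`."""
    merged = dict(existing)
    for key, value in addition.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Assumed fragment: the key names and URI are illustrative, not confirmed here.
DEEPSPEECH_STT = {
    "stt": {
        "module": "deepspeech_server",
        "deepspeech_server": {"uri": "http://localhost:8080/stt"},
    }
}

if __name__ == "__main__":
    # Hypothetical existing user configuration.
    existing = {"lang": "en-us", "stt": {"module": "mycroft"}}
    print(json.dumps(merge_config(existing, DEEPSPEECH_STT), indent=2))
```

The merge replaces only the values named in the fragment, so unrelated settings such as the language stay as they were.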
Kaldi can be run on a Linux cluster or an individual machine, making it another option for those wanting local network speech-to-text.
To our existing configuration values we will add the following:
To our existing configuration values we will add the following:
The credential token will be provided to you by GoVivace.
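A minimal sketch of how the token might be placed into the configuration, written in Python. The key layout (`govivace`, `uri`, `credential.token`), the environment variable name `GOVIVACE_TOKEN`, and the placeholder URI are all assumptions for illustration; use the values GoVivace gives you.

```python
import json
import os


def govivace_stt_fragment(token: str, uri: str) -> dict:
    """Build an stt fragment for GoVivace (key names are assumed, not confirmed)."""
    if not token:
        raise ValueError("a GoVivace credential token is required")
    return {
        "stt": {
            "module": "govivace",
            "govivace": {"uri": uri, "credential": {"token": token}},
        }
    }


if __name__ == "__main__":
    # Reading the token from the environment keeps it out of shell history.
    token = os.environ.get("GOVIVACE_TOKEN", "xxxxx")
    # Placeholder URI; replace it with the endpoint supplied by GoVivace.
    fragment = govivace_stt_fragment(token, "https://govivace.example/stt")
    print(json.dumps(fragment, indent=2))
```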
To obtain the required credential JSON data, you must create a Google API Console project. To do this:
Set up authentication:
From the Service account drop-down list, select New service account.
Enter a name into the Service account name field.
Don't select a value from the Role drop-down list. No role is required to access this service.
Click Create. A note appears, warning that this service account has no role.
Click Create without role. A JSON file that contains your key downloads to your computer.
To our existing configuration values we will add the following:
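One way to avoid copying the downloaded key by hand is to load the JSON file and embed its contents programmatically. The Python sketch below assumes a key layout of `google_cloud` with `lang` and `credential.json`, and a hypothetical file name `service-account.json`; verify both before relying on it.

```python
import json
from pathlib import Path


def google_cloud_stt_fragment(key_file: Path, lang: str = "en-us") -> dict:
    """Embed the downloaded service account key in an stt fragment.

    The key names used here are assumptions made for illustration.
    """
    credential_json = json.loads(key_file.read_text(encoding="utf-8"))
    return {
        "stt": {
            "module": "google_cloud",
            "google_cloud": {
                "lang": lang,
                "credential": {"json": credential_json},
            },
        }
    }


if __name__ == "__main__":
    key_path = Path("service-account.json")  # hypothetical download location
    if key_path.exists():
        print(json.dumps(google_cloud_stt_fragment(key_path), indent=2))
    else:
        print(f"Key file not found: {key_path}")
```

Parsing the file with `json.loads` also catches a truncated or corrupted download before it reaches the configuration.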
A streaming STT interface for the Google Cloud Speech-To-Text API.
Install google-cloud-speech in the Mycroft virtual environment using:
To our existing configuration values we will add the following:
STT provided by Houndify.
Create a New Client from your dashboard
Give your client a name and select a platform.
Enable the "Speech To Text Only" domain for your Client.
Get the Client ID and Client Key from your Client Information panel.
To our existing configuration values we will add the following:
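A minimal Python sketch of such an addition, using the Client ID and Client Key gathered in the steps above. The key names (`houndify`, `credential`, `client_id`, `client_key`) are assumptions for illustration and the credential values are placeholders.

```python
import json


def houndify_stt_fragment(client_id: str, client_key: str) -> dict:
    """Build an stt fragment for Houndify (key names are assumed)."""
    supplied = (("client_id", client_id), ("client_key", client_key))
    missing = [name for name, value in supplied if not value]
    if missing:
        raise ValueError("missing Houndify credentials: " + ", ".join(missing))
    return {
        "stt": {
            "module": "houndify",
            "houndify": {
                "credential": {
                    "client_id": client_id,
                    "client_key": client_key,
                }
            },
        }
    }


if __name__ == "__main__":
    # Placeholder values; substitute those from your Client Information panel.
    fragment = houndify_stt_fragment("YOUR_CLIENT_ID", "YOUR_CLIENT_KEY")
    print(json.dumps(fragment, indent=2))
```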
IBM Cloud - Watson Speech to Text is a cloud-based deep-learning speech-to-text service offered on top of the IBM Watson platform.
Create a New Resource from your dashboard
Select "Speech to Text" as the product
Retrieve the API Key and URL from the Services section of your dashboard
To our existing configuration values we will add the following:
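A minimal Python sketch of such an addition, combining the API Key and URL retrieved above. The key names (`ibm`, `url`, `credential.token`) are assumptions for illustration, and the URL shown is a placeholder rather than a real IBM endpoint.

```python
import json
from urllib.parse import urlparse


def ibm_stt_fragment(api_key: str, url: str) -> dict:
    """Build an stt fragment for IBM Watson (key names are assumed)."""
    parsed = urlparse(url)
    # A quick sanity check that a full https service URL was pasted.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("expected a full https service URL")
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "stt": {
            "module": "ibm",
            "ibm": {"url": url, "credential": {"token": api_key}},
        }
    }


if __name__ == "__main__":
    # Placeholder values; copy the real ones from your dashboard.
    fragment = ibm_stt_fragment("YOUR_API_KEY", "https://stt.example.com/instances/demo")
    print(json.dumps(fragment, indent=2))
```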
STT provided by Microsoft Azure Speech Services, formerly known as Bing STT.
To our existing configuration values we will add the following:
A natural language platform owned by Facebook.
To our existing configuration values we will add the following:
Yandex is one of the largest cloud platforms in Russia.
Create your first "folder" in the cloud.
To our existing configuration values we will add the following:
If you are interested in the continued development of the DeepSpeech STT engine, please join our the .
Kaldi is described as a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It is intended for use by speech recognition researchers.
First be sure to read the .
The latest installation instructions can be found on the .
We can edit the mycroft.conf file by running:
GoVivace is a for-profit company that has a .
The software is available in both 32 and 64-bit versions for Linux, Windows, and Mac platforms. A minimum of 4GB of RAM and a 2.0GHz processor is recommended. A is also available.
See for more details.
We can edit the mycroft.conf file by running:
The standard .
A with active billing is required. Please carefully consider the .
Select or create a GCP project in the
Make sure that billing is enabled for your project -
Enable the Cloud Speech-to-Text API from your
Go to the in the GCP Console
Remember to activate the API in the
We can edit the mycroft.conf file by running:
A with active billing is required. Please carefully consider the .
Then we can edit the mycroft.conf file by running:
Create a , then:
We can edit the mycroft.conf file by running:
Create an account at , then:
We can edit the mycroft.conf file by running:
Create a and get a server access token.
We can edit the mycroft.conf file by running:
Create an account at then create a new app to get your server access token. See the for further details.
We can edit the mycroft.conf file by running:
Create a , then:
- you can activate a free period in the console.
for your Mycroft instance with role editor.
for your service account.
See the for further details.
We can edit the mycroft.conf file by running: