Speech-To-Text
Speech-To-Text (STT) is the process of converting audio of spoken words into strings of text. Mycroft supports a range of Speech-To-Text engines.
Many users want to use a specific STT engine rather than the default. Like most of Mycroft's technology stack, this too can be customized.
Default Engine
For a voice assistant like Mycroft, speech recognition must be performed very quickly and with a high degree of accuracy. For this reason, Mycroft by default uses Google's STT engine.
In order to provide an additional layer of privacy for our users, we proxy all STT requests through Mycroft's servers. This prevents Google's service from profiling Mycroft users or connecting voice recordings to their identities. Only the voice recording is sent to Google; no other identifying information is included in the request. Google's STT service therefore cannot tell whether one person is making thousands of requests or thousands of people are each making a small number of requests.
By supporting Mozilla's DeepSpeech project, we are aiming to provide a competitive open source alternative. The accuracy of DeepSpeech is not yet sufficient to provide a quality experience for Mycroft users. However, we will be switching to DeepSpeech by default as soon as we have achieved an acceptable level of accuracy.
The following are some of the available STT options. Each section explains how to get set up and how to configure Mycroft.
Mozilla DeepSpeech
Mycroft has been supporting Mozilla's efforts to build DeepSpeech, an open Speech-to-Text technology. It is a fully open source STT engine, based on Baidu’s Deep Speech architecture and implemented with Google’s TensorFlow framework. Being open source means that, if you have the hardware, it can be run within your own network, providing additional privacy and control for you and your family.
Server Setup
You can test DeepSpeech using their pre-trained model by following the instructions on the DeepSpeech GitHub repository.
To set up a DeepSpeech server that Mycroft can use, try the deepspeech-server project on PyPI. Once you have this up and running, we can configure Mycroft to use this server.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
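As a rough sketch, the addition might look like the fragment below. It assumes the user configuration has been opened for editing (commonly done with mycroft-config edit user, though this should be checked against your installation) and that a deepspeech-server instance is listening on localhost port 8080. The URI is a placeholder, and the "deepspeech_server" module and "uri" key names are assumptions to verify against your Mycroft version. The outer braces stand for the top level of mycroft.conf; only the "stt" entry is added.

```json
{
  "stt": {
    "module": "deepspeech_server",
    "deepspeech_server": {
      "uri": "http://localhost:8080/stt"
    }
  }
}
```

The "module" value selects which engine Mycroft loads, and the block of the same name carries that engine's settings.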
Community Support
If you are interested in the continued development of the DeepSpeech STT engine, please join the DeepSpeech channel on Mycroft Chat.
Kaldi
Kaldi is described as a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It is intended for use by speech recognition researchers.
Kaldi can be run on a Linux cluster or an individual machine, making it another option for those wanting local network speech-to-text.
Server Setup
First be sure to read the system requirements in the Kaldi documentation.
The latest installation instructions can be found on the Kaldi GitHub repository.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
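A minimal sketch of that addition, assuming a Kaldi server that exposes an HTTP recognition endpoint on the local machine. The URI below is a placeholder for wherever your server listens, and the "kaldi" module and "uri" key names are assumptions to confirm against your Mycroft version.

```json
{
  "stt": {
    "module": "kaldi",
    "kaldi": {
      "uri": "http://localhost:8080/client/dynamic/recognize"
    }
  }
}
```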
GoVivace
GoVivace is a for-profit company that has a proprietary STT engine.
Account Setup
The software is available in both 32-bit and 64-bit versions for Linux, Windows, and Mac platforms. A minimum of 4GB of RAM and a 2.0GHz processor is recommended. A UniMRCP server plugin is also available.
See their website for more details.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
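The fragment below is a sketch only. The endpoint and token values are placeholders, and the "govivace" module name and the "uri" and "credential" keys are assumptions to check against your Mycroft version and the details GoVivace supplies.

```json
{
  "stt": {
    "module": "govivace",
    "govivace": {
      "uri": "https://YOUR_GOVIVACE_ENDPOINT",
      "credential": {
        "token": "YOUR_GOVIVACE_TOKEN"
      }
    }
  }
}
```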
The credential token will be provided to you by GoVivace.
Google Cloud
The standard Google Cloud Speech-To-Text API.
Account Setup
A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.
To obtain the required credential JSON data, you must create a Google API Console project. To do this:
Select or create a GCP project in the Cloud Resource Manager
Make sure that billing is enabled for your project - see documentation
Enable the Cloud Speech-to-Text API from your GCP Console
Set up authentication:
Go to the Create service account key page in the GCP Console
From the Service account drop-down list, select New service account.
Enter a name into the Service account name field.
Don't select a value from the Role drop-down list. No role is required to access this service.
Click Create. A note appears, warning that this service account has no role.
Click Create without role. A JSON file that contains your key downloads to your computer.
Remember to activate the API in the GCP Console
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
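As a hedged sketch, the addition could take the shape below. The object under "json" stands in for the full contents of the service account key file downloaded during account setup; only a few of its usual fields are shown, all with placeholder values. The "google_cloud" module name and the "lang" and "credential" keys are assumptions to verify against your Mycroft version.

```json
{
  "stt": {
    "module": "google_cloud",
    "google_cloud": {
      "lang": "en-us",
      "credential": {
        "json": {
          "type": "service_account",
          "project_id": "YOUR_PROJECT_ID",
          "private_key_id": "YOUR_PRIVATE_KEY_ID",
          "private_key": "YOUR_PRIVATE_KEY",
          "client_email": "YOUR_SERVICE_ACCOUNT_EMAIL"
        }
      }
    }
  }
}
```

In practice, pasting the whole key file in place of the placeholder object avoids leaving out a field the client library needs.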
Google Cloud Streaming STT
A streaming STT interface for the Google Cloud Speech-To-Text API.
Account Setup
A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.
Mycroft Configuration
Install google-cloud-speech in the Mycroft Virtual environment using:
Then, using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
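A sketch of the streaming variant, assuming the google-cloud-speech package is already installed in the Mycroft virtual environment (for example with mycroft-pip install google-cloud-speech, if that helper is present in your installation). The "google_cloud_streaming" module name and key layout are assumptions to confirm against your Mycroft version, and the values under "json" are placeholders for the contents of your service account key file.

```json
{
  "stt": {
    "module": "google_cloud_streaming",
    "google_cloud_streaming": {
      "credential": {
        "json": {
          "type": "service_account",
          "project_id": "YOUR_PROJECT_ID",
          "private_key": "YOUR_PRIVATE_KEY",
          "client_email": "YOUR_SERVICE_ACCOUNT_EMAIL"
        }
      }
    }
  }
}
```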
Houndify
STT provided by Houndify.
Account Setup
Create a Houndify account, then:
Create a New Client from your dashboard
Give your client a name and select a platform.
Enable the "Speech To Text Only" domain for your Client.
Get the Client ID and Client Key from your Client Information panel.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
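A minimal sketch of the fragment, using the Client ID and Client Key from the account setup steps. Both values below are placeholders, and the "houndify" module name and the "client_id" and "client_key" key names are assumptions to check against your Mycroft version.

```json
{
  "stt": {
    "module": "houndify",
    "houndify": {
      "credential": {
        "client_id": "YOUR_CLIENT_ID",
        "client_key": "YOUR_CLIENT_KEY"
      }
    }
  }
}
```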
IBM Cloud
IBM Cloud - Watson Speech to Text is a cloud-based deep-learning speech-to-text service offered on top of the IBM Watson platform.
Account Setup
Create an account at IBM.com/cloud, then:
Create a New Resource from your dashboard
Select "Speech to Text" as the product
Retrieve the API Key and URL from the Services section of your dashboard
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
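One possible shape for this addition, using the API Key and URL retrieved during account setup. The values are placeholders, and the "ibm" module name and the "token" and "url" key names are assumptions to verify against your Mycroft version.

```json
{
  "stt": {
    "module": "ibm",
    "ibm": {
      "credential": {
        "token": "YOUR_API_KEY"
      },
      "url": "URL_FROM_SERVICE"
    }
  }
}
```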
Microsoft Azure
STT provided by the Microsoft Azure Speech Services. Formerly known as Bing STT.
Account Setup
Create a Microsoft Azure account and get a server access token.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
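A hedged sketch of the fragment follows. The module is assumed to still be named "bing", reflecting the service's former name; confirm this and the "credential" layout against your Mycroft version. The token value is a placeholder.

```json
{
  "stt": {
    "module": "bing",
    "bing": {
      "credential": {
        "token": "YOUR_AZURE_TOKEN"
      }
    }
  }
}
```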
Wit.ai
A natural language platform owned by Facebook.
Account Setup
Create an account at Wit.ai, then create a new app to get your server access token. See the Wit.ai documentation for further details.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
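A sketch of the addition, with the server access token from your Wit.ai app as a placeholder. The "wit" module name and the "credential" layout are assumptions to check against your Mycroft version.

```json
{
  "stt": {
    "module": "wit",
    "wit": {
      "credential": {
        "token": "YOUR_SERVER_ACCESS_TOKEN"
      }
    }
  }
}
```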
Yandex SpeechKit STT
Yandex is one of the largest cloud platforms in Russia.
Account Setup
Create a Yandex Cloud account, then:
Create a billing account - you can activate a free period in the console.
Create your first "folder" in the cloud.
Create a service account for your Mycroft instance with the editor role.
Create an API key for your service account.
See the Yandex Identity and Access Management documentation for further details.
Mycroft Configuration
Using the Configuration Manager we can edit the mycroft.conf file by running:
To our existing configuration values we will add the following:
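A sketch of the fragment, using the API key created for the service account. The key value and language code are placeholders, and the "yandex" module name and the "lang" and "api_key" key names are assumptions to verify against your Mycroft version.

```json
{
  "stt": {
    "module": "yandex",
    "yandex": {
      "lang": "en-US",
      "credential": {
        "api_key": "YOUR_API_KEY"
      }
    }
  }
}
```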