Speech-To-Text
Speech-To-Text (STT) is the process of converting audio of spoken words into strings of text. Mycroft supports a range of Speech-To-Text engines.
Many users want to use a specific STT engine rather than the default. Like most of Mycroft's technology stack, this too can be customized.
For a voice assistant like Mycroft, speech recognition must be performed very quickly and with a high degree of accuracy. For this reason, Mycroft by default uses Google's STT engine.
In order to provide an additional layer of privacy for our users, we proxy all STT requests through Mycroft's servers. This prevents Google's service from profiling Mycroft users or connecting voice recordings to their identities. Only the voice recording is sent to Google; no other identifying information is included in the request. Therefore, Google's STT service does not know whether an individual person is making thousands of requests or thousands of people are making a small number of requests each.
By supporting Mozilla's DeepSpeech project we are aiming to provide a competitive open source alternative. The accuracy of DeepSpeech is not yet sufficient to provide a quality experience for Mycroft users. However, we will switch to DeepSpeech by default as soon as we have achieved an acceptable level of accuracy.
The following are some of the available STT options. Each provides details on how to get set up and how to configure Mycroft.
Mycroft has been supporting Mozilla's efforts to build DeepSpeech, an open Speech-to-Text technology. It is a fully open source STT engine, based on Baidu’s Deep Speech architecture and implemented with Google’s TensorFlow framework. Being open source means that, if you have the hardware, it can be run within your own network, providing additional privacy and control for you and your family.
You can test DeepSpeech using their pre-trained model by following the instructions on the DeepSpeech GitHub repository.
To set up a DeepSpeech server that Mycroft can use, try the deepspeech-server project on PyPI. Once you have this up and running, we can configure Mycroft to use this server.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"deepspeech_server": {
"uri": "http://localhost:8080/stt"
},
"module": "deepspeech_server"
}
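Before pointing Mycroft at the server, it can help to confirm that the endpoint responds. The following is a minimal sketch, assuming the server accepts a raw WAV body sent by HTTP POST to the configured uri and returns the transcript in the response body; check the deepspeech-server documentation for the version you installed. The helper names make_silent_wav, build_stt_request and transcribe are hypothetical and not part of Mycroft or deepspeech-server.

```python
import io
import urllib.request
import wave


def make_silent_wav(seconds=1, rate=16000):
    """Return an in-memory 16-bit mono WAV file containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


def build_stt_request(uri, wav_bytes):
    """Build (but do not send) a POST request carrying the WAV data."""
    return urllib.request.Request(
        uri,
        data=wav_bytes,
        method="POST",
        headers={"Content-Type": "audio/wav"},
    )


def transcribe(uri, wav_bytes, timeout=30):
    """Send the audio to the server and return the response body as text."""
    request = build_stt_request(uri, wav_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server listening at the address from the configuration above, calling transcribe("http://localhost:8080/stt", make_silent_wav()) should return without raising an error. Replace the silent clip with a real recording to judge transcription quality.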
If you are interested in the continued development of the DeepSpeech STT engine, please join the DeepSpeech channel on Mycroft Chat.
Kaldi is described as a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It is intended for use by speech recognition researchers.
Kaldi can be run on a Linux cluster or an individual machine, making it another option for those wanting local network speech-to-text.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"kaldi": {
"uri": "http://localhost:8080/client/dynamic/recognize"
},
"module": "kaldi"
}
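Adding a block to existing configuration values amounts to merging it with what is already there. The sketch below illustrates that effect with a recursive merge in Python. It is not how mycroft-config works internally, and the existing configuration shown is a hypothetical example. It assumes, as the examples on this page suggest, that settings for several engines can sit side by side under stt while module selects the one in use.

```python
import copy
import json


def merge_config(base, patch):
    """Recursively merge patch into a copy of base; values from patch win."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical existing user configuration that already uses DeepSpeech.
existing = {
    "lang": "en-us",
    "stt": {
        "deepspeech_server": {"uri": "http://localhost:8080/stt"},
        "module": "deepspeech_server",
    },
}

# The Kaldi block shown above.
kaldi_patch = {
    "stt": {
        "kaldi": {"uri": "http://localhost:8080/client/dynamic/recognize"},
        "module": "kaldi",
    }
}

updated = merge_config(existing, kaldi_patch)
print(json.dumps(updated, indent=2))
```

After the merge, the DeepSpeech settings are still present, so switching back only requires changing the module value.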
The GoVivace software is available in both 32-bit and 64-bit versions for Linux, Windows, and Mac platforms. A minimum of 4GB of RAM and a 2.0GHz processor is recommended. A UniMRCP server plugin is also available.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"govivace": {
"uri": "https://services.govivace.com:49149/telephony",
"credential": {
"token": "xxxxx"
}
},
"module": "govivace"
}
The credential token will be provided to you by GoVivace.
A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.
To obtain the required credential JSON data, you must create a Google API Console project. To do this:
- Set up authentication:
  - From the Service account drop-down list, select New service account.
  - Enter a name into the Service account name field.
  - Don't select a value from the Role drop-down list. No role is required to access this service.
  - Click Create. A note appears, warning that this service account has no role.
  - Click Create without role. A JSON file that contains your key downloads to your computer.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"google_cloud": {
"lang": "hi-in",
"credential": {
"json": {
"type": "service_account",
"project_id": "xxxxxxxxxx",
"private_key_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxxxxxxxxxxx\n-----END PRIVATE KEY-----\n",
"client_email": "[email protected]",
"client_id": "xxxxxxxxxxxxxxxxxxxxx",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxxxxx.iam.gserviceaccount.com"
}
}
},
"module": "google_cloud"
},
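Copying the key by hand is error-prone, mainly because the line breaks inside private_key must be written as \n escapes in JSON. As a minimal sketch, the Python below parses the text of a downloaded key file and embeds it in the block shown above. The function name build_google_cloud_stt and its default language are hypothetical, and the sample key contains fake values only.

```python
import json


def build_google_cloud_stt(key_json_text, lang="en-US"):
    """Return the stt block with the parsed service account key embedded."""
    key = json.loads(key_json_text)
    if key.get("type") != "service_account":
        raise ValueError("expected a service account key file")
    return {
        "stt": {
            "google_cloud": {
                "lang": lang,
                "credential": {"json": key},
            },
            "module": "google_cloud",
        }
    }


# Stand-in for the contents of the downloaded key file (fake values).
sample_key_text = json.dumps({
    "type": "service_account",
    "project_id": "xxxxxxxxxx",
    "private_key": "-----BEGIN PRIVATE KEY-----\nxxxx\n-----END PRIVATE KEY-----\n",
})

block = build_google_cloud_stt(sample_key_text, lang="hi-in")

# json.dumps writes the line breaks in private_key as \n escapes.
print(json.dumps(block, indent=2))
```

In practice, the text would come from reading the downloaded file, and the printed stt entry could then be pasted into the user configuration.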
A streaming STT interface for the Google Cloud Speech-To-Text API.
A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.
Install google-cloud-speech in the Mycroft virtual environment using:
mycroft-pip install google-cloud-speech
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"google_cloud_streaming": {
"credential": {
"json": # Paste Google API key JSON here
}
},
"module": "google_cloud_streaming"
}
STT provided by Houndify.
- Create a New Client from your dashboard
- Give your client a name and select a platform.
- Enable the "Speech To Text Only" domain for your Client.
- Get the Client ID and Client Key from your Client Information panel.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"houndify": {
"credential": {
"client_id": "xxxxx",
"client_key": "xxxxx"
}
},
"module": "houndify"
}
IBM Cloud - Watson Speech to Text is a cloud-based deep-learning speech-to-text service offered on top of the IBM Watson platform.
- Create a New Resource from your dashboard
- Select "Speech to Text" as the product
- Retrieve the API Key and URL from the Services section of your dashboard
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"ibm": {
"credential": {
"token": "YOUR_API_KEY"
},
"url": "YOUR_URL"
},
"module": "ibm"
}
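Values such as YOUR_API_KEY and xxxxx in these examples are placeholders that must be replaced with real credentials. The following sketch walks a configuration and reports any string values that still look like placeholders. The pattern is an assumption based on the placeholders used on this page, and find_placeholders is a hypothetical helper, not a Mycroft function.

```python
import re

# Assumption: placeholders look like the ones on this page,
# that is, runs of "x" or values of the form "YOUR_...".
PLACEHOLDER = re.compile(r"^(x+|YOUR_\w+)$")


def find_placeholders(config, path=""):
    """Return dotted paths of string values that still look like placeholders."""
    found = []
    for key, value in config.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            found.extend(find_placeholders(value, here))
        elif isinstance(value, str) and PLACEHOLDER.match(value):
            found.append(here)
    return found


ibm_block = {
    "stt": {
        "ibm": {
            "credential": {"token": "YOUR_API_KEY"},
            "url": "YOUR_URL",
        },
        "module": "ibm",
    }
}

print(find_placeholders(ibm_block))
# prints ['stt.ibm.credential.token', 'stt.ibm.url']
```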
STT provided by the Microsoft Azure Speech Services. Formerly known as Bing STT.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"bing": {
"credential": {
"token": "xxxxx"
}
},
"module": "bing"
}
A natural language platform owned by Facebook.
Create an account at Wit.ai, then create a new app to get your server access token. See the Wit.ai documentation for further details.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"wit": {
"credential": {
"token": "xxxxx"
}
},
"module": "wit"
}
Yandex is one of the largest cloud platforms in Russia.
- Create your first "folder" in the cloud.
mycroft-config edit user
To our existing configuration values we will add the following:
"stt": {
"yandex": {
"lang": "en-US",
"credential": {
"api_key": "YOUR_API_KEY"
}
},
"module": "yandex"
}
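Every example on this page follows the same shape: the module value names the engine, and the engine's settings sit under a key of the same name. A typo in either place is easy to miss, so a small consistency check can be useful. The sketch below is a hypothetical helper that only encodes the pattern seen in these examples; it does not reflect how Mycroft validates its configuration.

```python
def check_stt_block(config):
    """Return a list of problems found in the "stt" section (empty if none)."""
    stt = config.get("stt")
    if not isinstance(stt, dict):
        return ['missing "stt" section']
    problems = []
    module = stt.get("module")
    if not isinstance(module, str):
        problems.append('"stt.module" is missing or not a string')
    elif not isinstance(stt.get(module), dict):
        problems.append(f'no settings found under "stt.{module}"')
    return problems


yandex_block = {
    "stt": {
        "yandex": {
            "lang": "en-US",
            "credential": {"api_key": "YOUR_API_KEY"},
        },
        "module": "yandex",
    }
}

print(check_stt_block(yandex_block))
# prints []
```

A block whose module value is misspelled, for example "yandx", would produce a message pointing at the missing settings key.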