Speech-To-Text

Speech-To-Text (STT) is the process of converting audio of spoken words into strings of text. Mycroft supports a range of Speech-To-Text engines.

Last updated 4 years ago

Many users want to use a specific STT engine rather than the default. Like most of Mycroft's technology stack, this too can be customized.

Default Engine

For a voice assistant like Mycroft, speech recognition must be performed very quickly and with a high degree of accuracy. For this reason, Mycroft by default uses Google's STT engine.

To provide an additional layer of privacy for our users, we proxy all STT requests through Mycroft's servers. This prevents Google's service from profiling Mycroft users or connecting voice recordings to their identities. Only the voice recording is sent to Google; no other identifying information is included in the request. Google's STT service therefore cannot tell whether one person is making thousands of requests or thousands of people are each making a small number of requests.

By supporting Mozilla's DeepSpeech project we are aiming to provide a competitive open source alternative. The accuracy of DeepSpeech is not yet sufficient to provide a quality experience for Mycroft users. However, we will be switching to DeepSpeech by default as soon as we have achieved an acceptable level of accuracy.

The following are some of the available STT options. Each section explains how to get set up and how to configure Mycroft.

Mozilla DeepSpeech

Mycroft has been supporting Mozilla's efforts to build DeepSpeech, an open Speech-to-Text technology. It is a fully open source STT engine, based on Baidu’s Deep Speech architecture and implemented with Google’s TensorFlow framework. Being open source means that if you have the hardware, it can be run within your own network providing additional privacy and control for you and your family.

Server Setup

You can test DeepSpeech using their pre-trained model by following the instructions on the DeepSpeech GitHub repository.

To set up a DeepSpeech server that Mycroft can use, try the deepspeech-server project on PyPI. Once you have this up and running, we can configure Mycroft to use this server.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running:

mycroft-config edit user

To our existing configuration values we will add the following:

  "stt": {
    "deepspeech_server": {
      "uri": "http://localhost:8080/stt"
    },
    "module": "deepspeech_server"
  }
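The fragment above is added inside the top-level object of mycroft.conf, next to whatever keys are already there. As a rough illustration of that merge (not Mycroft's own configuration loader), the Python sketch below combines the DeepSpeech fragment with a hypothetical existing configuration. The `merge_config` helper and the contents of `existing` are assumptions made for this example.

```python
import json


def merge_config(base, fragment):
    """Recursively merge `fragment` into a copy of `base`; fragment values win."""
    merged = dict(base)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical existing user configuration (illustrative values only).
existing = {"lang": "en-us", "stt": {"module": "mycroft"}}

# The DeepSpeech fragment from this section.
deepspeech = {
    "stt": {
        "deepspeech_server": {"uri": "http://localhost:8080/stt"},
        "module": "deepspeech_server",
    }
}

merged = merge_config(existing, deepspeech)
print(json.dumps(merged, indent=2))
```

The printed result keeps the unrelated `lang` key and contains a single `stt` object whose `module` now points at `deepspeech_server`.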

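Community Support

If you are interested in the continued development of the DeepSpeech STT engine, please join the DeepSpeech channel on Mycroft Chat.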

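Kaldi

Kaldi is described as a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It is intended for use by speech recognition researchers.

Kaldi can be run on a Linux cluster or an individual machine, making it another option for those wanting local network speech-to-text.

Server Setup

First be sure to read the system requirements in the Kaldi documentation.

The latest installation instructions can be found on the Kaldi GitHub repository.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running: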

mycroft-config edit user

To our existing configuration values we will add the following:

  "stt": {
    "kaldi": {
      "uri": "http://localhost:8080/client/dynamic/recognize"
    },
    "module": "kaldi"
  }
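In every example on this page, the value of "module" matches the name of a settings block in the same "stt" section. A mismatch between the two is an easy typo to make. The Python sketch below checks for that pattern and, where a "uri" is present, that it parses as an HTTP(S) URL. The `check_stt_section` helper is hypothetical and encodes only the convention visible in these examples, not validation that Mycroft itself performs.

```python
from urllib.parse import urlparse


def check_stt_section(stt):
    """Return a list of problems found in an "stt" config section."""
    problems = []
    module = stt.get("module")
    if not module:
        problems.append('missing "module"')
        return problems
    settings = stt.get(module)
    if not isinstance(settings, dict):
        problems.append(f'no settings block named "{module}"')
        return problems
    uri = settings.get("uri")
    if uri is not None:
        parsed = urlparse(uri)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f'"uri" does not look like an HTTP(S) URL: {uri}')
    return problems


# The Kaldi fragment from this section.
kaldi = {
    "kaldi": {"uri": "http://localhost:8080/client/dynamic/recognize"},
    "module": "kaldi",
}
print(check_stt_section(kaldi))  # prints []

# The same fragment with a misspelled module name.
typo = {"kaldi": kaldi["kaldi"], "module": "kadli"}
print(check_stt_section(typo))  # prints ['no settings block named "kadli"']
```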

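GoVivace

GoVivace is a for-profit company that has a proprietary STT engine.

The software is available in both 32 and 64-bit versions for Linux, Windows, and Mac platforms. A minimum of 4GB of RAM and a 2.0GHz processor is recommended. A UniMRCP server plugin is also available.

Account Setup

See their website for more details.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running: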

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "govivace": {
    "uri": "https://services.govivace.com:49149/telephony",
    "credential": {
      "token": "xxxxx"
    }
  },
  "module": "govivace"
}

The credential token will be provided to you by GoVivace.

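Google Cloud

The standard Google Cloud Speech-To-Text API.

Account Setup

A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.

To obtain the required credential JSON data, you must create a Google API Console project. To do this:

  • Select or create a GCP project in the Cloud Resource Manager.

  • Make sure that billing is enabled for your project (see the Google Cloud documentation).

  • Enable the Cloud Speech-to-Text API from your GCP Console.

  • Set up authentication:

    • Go to the Create service account key page in the GCP Console.

    • From the Service account drop-down list, select New service account.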

    • Enter a name into the Service account name field.

    • Don't select a value from the Role drop-down list. No role is required to access this service.

    • Click Create. A note appears, warning that this service account has no role.

    • Click Create without role. A JSON file that contains your key downloads to your computer.

Remember to activate the API in the GCP Console.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running:

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "google_cloud": {
    "lang": "hi-in",
    "credential": {
      "json": {
        "type": "service_account",
        "project_id": "xxxxxxxxxx",
        "private_key_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxxxxxxxxxxx\n-----END PRIVATE KEY-----\n",
        "client_email": "xxxx@xxxx.iam.gserviceaccount.com",
        "client_id": "xxxxxxxxxxxxxxxxxxxxx",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxxxxx.iam.gserviceaccount.com"
      }
    }
  },
  "module": "google_cloud"
}

Google Cloud Streaming STT

A streaming STT interface for the Google Cloud Speech-To-Text API.

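Account Setup

A Google Cloud account with active billing is required. Please carefully consider the financial cost of using this service.

Mycroft Configuration

Install google-cloud-speech in the Mycroft virtual environment using:

mycroft-pip install google-cloud-speech

Then, using the Configuration Manager we can edit the mycroft.conf file by running:

mycroft-config edit user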

To our existing configuration values we will add the following:

"stt": {
  "google_cloud_streaming": {
    "credential": {
      "json": # Paste Google API key JSON here
    }
  },
  "module": "google_cloud_streaming"
}
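The placeholder in the fragment above stands for the contents of the JSON key file downloaded from Google. Pasting it by hand makes it easy to damage the escaped newlines in the private key, so it can help to build the fragment programmatically. The Python sketch below is one way to do that. The `build_streaming_stt` helper and the shortened sample key are assumptions made for this example; a real key file contains more fields.

```python
import json


def build_streaming_stt(key_json_text):
    """Build the "stt" fragment from the text of a service account key file."""
    key = json.loads(key_json_text)
    return {
        "stt": {
            "google_cloud_streaming": {"credential": {"json": key}},
            "module": "google_cloud_streaming",
        }
    }


# Stand-in for the downloaded key file (placeholder values only).
sample_key_text = json.dumps({
    "type": "service_account",
    "project_id": "xxxxxxxxxx",
    "private_key": "-----BEGIN PRIVATE KEY-----\nxxxx\n-----END PRIVATE KEY-----\n",
})

fragment = build_streaming_stt(sample_key_text)
print(json.dumps(fragment, indent=2))
```

With a real key, `sample_key_text` would be replaced by the text read from the downloaded file. `json.dumps` writes the line breaks inside the private key as `\n` escape sequences, which is the form the configuration file needs.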

Houndify

STT provided by Houndify.

Account Setup

Create a Houndify account, then:

  • Create a New Client from your dashboard

    • Give your client a name and select a platform.

  • Enable the "Speech To Text Only" domain for your Client.

  • Get the Client ID and Client Key from your Client Information panel.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running:

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "houndify": {
    "credential": {
      "client_id": "xxxxx",
      "client_key": "xxxxx"
    }
  },
  "module": "houndify"
}

IBM Cloud

IBM Cloud - Watson Speech to Text is a cloud-based deep-learning speech-to-text service offered on top of the IBM Watson platform.

Account Setup

Create an account at IBM.com/cloud, then:

  • Create a New Resource from your dashboard

    • Select "Speech to Text" as the product

  • Retrieve the API Key and URL from the Services section of your dashboard

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running:

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "ibm": {
    "credential": {
      "token": "YOUR_API_KEY"
    },
    "url": "YOUR_URL"
  },
  "module": "ibm"
}

Microsoft Azure

STT provided by the Microsoft Azure Speech Services. Formerly known as Bing STT.

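Account Setup

Create a Microsoft Azure account and get a server access token.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running: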

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "bing": {
    "credential": {
      "token": "xxxxx"
    }
  },
  "module": "bing"
}

Wit.ai

A natural language platform owned by Facebook.

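Account Setup

Create an account at Wit.ai, then create a new app to get your server access token. See the Wit.ai documentation for further details.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running: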

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "wit": {
    "credential": {
      "token": "xxxxx"
    }
  },
  "module": "wit"
}

Yandex SpeechKit STT

Yandex is one of the largest cloud platforms in Russia.

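Account Setup

Create a Yandex Cloud account, then:

  • Create a first "folder" in the cloud.

  • Create a billing account. You can activate a free period in the console.

  • Create a service account for your Mycroft instance with the role editor.

  • Create an API key for your service account.

See the Yandex Identity and Access Management documentation for further details.

Mycroft Configuration

Using the Configuration Manager we can edit the mycroft.conf file by running: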

mycroft-config edit user

To our existing configuration values we will add the following:

"stt": {
  "yandex": {
    "lang": "en-US",
    "credential": {
      "api_key": "YOUR_API_KEY"
    }
  },
  "module": "yandex"
}
