A broad overview of the technology that makes up Mycroft AI.
Our vision for Mycroft
Mycroft is modular. Some components can be easily 'swapped out' for others:
- Wake Word detection
- Speech to Text (STT)
- Intent parser
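As a rough illustration of what "swappable" means here, the sketch below wires two interchangeable intent parsers into the same assistant object. All class and method names (`SpeechToText`, `IntentParser`, `EchoSTT`, `FirstWordParser`, `LastWordParser`, `Assistant`) are hypothetical and are not taken from the Mycroft codebase; the point is only that components sharing an interface can be exchanged without touching the rest of the pipeline.

```python
from typing import Protocol


class SpeechToText(Protocol):
    """Hypothetical interface: anything that turns audio into text."""

    def transcribe(self, audio: bytes) -> str: ...


class IntentParser(Protocol):
    """Hypothetical interface: anything that turns text into an intent name."""

    def parse(self, utterance: str) -> str: ...


class EchoSTT:
    """Stand-in STT engine that pretends the audio bytes are UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FirstWordParser:
    """Toy parser: treats the first word as the intent."""

    def parse(self, utterance: str) -> str:
        words = utterance.split()
        return words[0].lower() if words else "unknown"


class LastWordParser:
    """Toy parser: treats the last word as the intent."""

    def parse(self, utterance: str) -> str:
        words = utterance.split()
        return words[-1].lower() if words else "unknown"


class Assistant:
    """Depends only on the two interfaces, not on concrete engines."""

    def __init__(self, stt: SpeechToText, parser: IntentParser) -> None:
        self.stt = stt
        self.parser = parser

    def handle(self, audio: bytes) -> str:
        return self.parser.parse(self.stt.transcribe(audio))


audio = b"tell me about the weather"
print(Assistant(EchoSTT(), FirstWordParser()).handle(audio))
print(Assistant(EchoSTT(), LastWordParser()).handle(audio))
```

Swapping the parser changes the result while `Assistant` itself stays untouched.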
There are two technologies that Mycroft.AI currently uses for Wake Word detection:
- PocketSphinx: Because PocketSphinx is trained on English speech, your Wake Word currently needs to be an English word or phrase, like "Hi there Mickey" or "Hey Mike". Wake Words in other languages, such as Spanish, French or German, won't work as well.
- Precise: Unlike PocketSphinx, which is based on Speech to Text technology, Precise is a neural network that is trained on audio data. It doesn't matter what words you want to use for your Wake Word; instead, you train it on sounds. The downside is that Precise needs to be trained on your chosen Wake Word. Precise is the default Wake Word Listener for the "Hey Mycroft" wake word; PocketSphinx provides a fallback if Precise is unavailable.
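The "default engine with a fallback" behaviour described above can be sketched as a simple ordered search over engine loaders. This is a minimal illustration, not Mycroft's actual loading code: `EngineUnavailable`, `load_first_available`, and the two loader functions are hypothetical names invented for the example, and the engines themselves are represented by placeholder objects.

```python
from typing import Callable, List, Tuple


class EngineUnavailable(Exception):
    """Raised by a loader when its engine cannot be started (hypothetical)."""


def load_first_available(
    loaders: List[Tuple[str, Callable[[], object]]]
) -> Tuple[str, object]:
    """Try each loader in order and return the first engine that loads."""
    for name, loader in loaders:
        try:
            return name, loader()
        except EngineUnavailable:
            continue  # fall through to the next candidate
    raise EngineUnavailable("no wake word engine could be loaded")


def broken_precise_loader() -> object:
    # Simulates the preferred engine being unavailable on this machine.
    raise EngineUnavailable("model file missing")


def pocketsphinx_loader() -> object:
    # Placeholder object standing in for a working fallback engine.
    return object()


name, _engine = load_first_available(
    [("precise", broken_precise_loader), ("pocketsphinx", pocketsphinx_loader)]
)
print(name)
```

Listing the preferred engine first keeps the preference order in one place, so adding a third candidate would not require changing the selection logic.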
Speech to Text (STT) software takes spoken words and turns them into text phrases that can then be acted on.
An intent parser is software which identifies what the user's intent is based on their speech. An intent parser usually takes the output of a Speech to Text (STT) engine as an input.
For example, Julie speaks the following to Mycroft:
Hey Mycroft, tell me about the weather
Julie's intent is to find out about the weather (probably in her current location).
An intent parser can then match the intent with a suitable Skill to handle the intent.
- Padatious: Padatious is a neural-network-based intent parser. Padatious is currently under active development by Mycroft and is available under an open source license. It is likely that some Mycroft platforms will switch to using Padatious in the future instead of Adapt.
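To make the idea of matching an utterance to a Skill concrete, here is a deliberately simple keyword-counting sketch. It is a toy, and it is not how Adapt or Padatious work internally; the keyword table, the `parse_intent` and `handle` functions, and the canned responses are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical keyword table: intent name -> words that suggest it.
INTENT_KEYWORDS: Dict[str, FrozenSet[str]] = {
    "weather": frozenset({"weather", "forecast", "rain"}),
    "time": frozenset({"time", "clock"}),
}

# Hypothetical Skill handlers, keyed by intent name.
SKILLS: Dict[str, Callable[[], str]] = {
    "weather": lambda: "Looking up the weather.",
    "time": lambda: "Checking the time.",
}


def tokenize(utterance: str) -> FrozenSet[str]:
    """Lowercase the utterance and split it into a set of words."""
    return frozenset(re.findall(r"[a-z']+", utterance.lower()))


def parse_intent(utterance: str) -> Optional[str]:
    """Return the intent whose keywords overlap most with the utterance."""
    words = tokenize(utterance)
    best, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


def handle(utterance: str) -> str:
    """Route the utterance to the matching Skill, if any."""
    intent = parse_intent(utterance)
    if intent is None:
        return "Sorry, I didn't understand that."
    return SKILLS[intent]()


print(handle("Hey Mycroft, tell me about the weather"))
```

A real intent parser also has to extract details such as a location or date from the utterance, which this sketch leaves out.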
Text to Speech (TTS) software takes written text, such as text files on a computer, and uses a voice to speak the text. Text to Speech can have different voices, depending on the TTS engine used.
In your home.mycroft.ai account, you can select from several voices. Even more TTS engines are available, but they require manual configuration.
The Mycroft middleware has two components:
- Mycroft Home and Mycroft API: this is the platform where data on Users and Devices is held. This platform provides abstraction services, such as storing API keys that are used to access third-party services to provide Skill functionality. The code for this platform is available under an AGPL 3.0 open source license.
Mycroft Skills are like 'add-ons' or 'plugins' that provide additional functionality. Skills can be developed by Mycroft Developers, or by Community Developers, and vary in their functionality and maturity.
Mycroft is designed to run on many different platforms. Each dedicated platform is called a device. These include:
- Mark 1 - our first reference hardware device using a dedicated software image.
- Mark 2 - our latest reference hardware device using a dedicated software image.
- Picroft - any Raspberry Pi 3 or 4 that is running the Picroft software image.
The enclosure refers to the specific code that is required for that device. It might define unique functionality such as the eyes on the Mark 1, or a specific way of interacting with the hardware, such as controlling the volume at a hardware level via I2C.
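One way to picture device-specific enclosure code is as subclasses of a shared base class, where each device overrides only what is unique to it. The sketch below is hypothetical throughout: the `Enclosure`, `Mark1LikeEnclosure`, and `GenericEnclosure` classes, their methods, and the 0 to 10 volume range are invented for illustration, and the code records commands in a list rather than talking to real hardware.

```python
from abc import ABC, abstractmethod
from typing import List


class Enclosure(ABC):
    """Hypothetical base class for device-specific code."""

    def __init__(self) -> None:
        # Commands are recorded here instead of being sent to hardware.
        self.log: List[str] = []

    @abstractmethod
    def set_volume(self, level: int) -> None:
        """Each device decides how volume is actually applied."""

    def show_listening(self) -> None:
        """Default: devices without a display do nothing."""


class Mark1LikeEnclosure(Enclosure):
    """Device with eyes and hardware volume control (illustrative only)."""

    def set_volume(self, level: int) -> None:
        level = max(0, min(10, level))  # clamp to an assumed 0..10 range
        self.log.append(f"hardware volume -> {level}")

    def show_listening(self) -> None:
        self.log.append("eyes: listening animation")


class GenericEnclosure(Enclosure):
    """Device with no special hardware (illustrative only)."""

    def set_volume(self, level: int) -> None:
        self.log.append(f"software volume -> {level}")


mark1 = Mark1LikeEnclosure()
mark1.show_listening()
mark1.set_volume(15)
print(mark1.log)

generic = GenericEnclosure()
generic.show_listening()
generic.set_volume(5)
print(generic.log)
```

The rest of the system can call `set_volume` or `show_listening` without knowing which device it is running on.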