# Grokotron

Mycroft AI’s primary mission has always been to create a true privacy-respecting voice assistant. One that is truly a personal assistant rather than a household spying device. A device that does what you want it to do rather than what the mega-corporation that sold it to you wants it to do.

One of the greatest challenges for us to achieve this has been the lack of a fast, accurate, flexible Speech to Text (STT) engine that can run locally. While the product is still in early days of development, we believe we finally have an answer to this problem. We call it Grokotron.

Grokotron provides limited-domain automatic speech recognition on low-resource hardware like the Raspberry Pi 4 in the Mark II. It does this extremely quickly and, of course, completely offline. Grokotron’s impressive accuracy and performance come from its hybrid nature: built on top of the popular open source Kaldi Speech Recognition Toolkit, it combines an acoustic model with a grammar of expected expressions that constrains its transcriptions.

You can [read more about this experimental release on our blog](https://mycroft.ai/blog/grokotron-stt-on-the-edge/).

### Download

A proof-of-concept image for the Mark II is available for testing. It is based on the Dinkum Sandbox image. By default, it will not connect to the internet or pair with the Mycroft backend, and speech recognition is limited to the phrases defined in the [sentences.ini](https://github.com/MycroftAI/mark-ii-sandbox/blob/grokotron/files/opt/grokotron/sentences.ini) file.

{% embed url="<https://myc200.fra1.digitaloceanspaces.com/pub/releases/sandbox/other/20221118-mark2_grokotron.img.gz>" %}
20221118-mark2\_grokotron.img.gz\
SHA256: abb6cbfbee2b992c61a2322de47b2b404c6d0723e0ace332f1189ba7a11f2ae9
{% endembed %}
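Before flashing the image, it is worth checking the download against the SHA256 checksum above. The following is a minimal sketch using Python's standard `hashlib`, assuming the checksum applies to the compressed `.img.gz` file as listed; the helper names `sha256_of` and `verify_image` are illustrative only. The file is read in chunks so the whole image never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum published next to the download link above (assumed to be for the .img.gz).
EXPECTED_SHA256 = "abb6cbfbee2b992c61a2322de47b2b404c6d0723e0ace332f1189ba7a11f2ae9"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the image is never loaded fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_image(Path("20221118-mark2_grokotron.img.gz"))` should return `True` for an intact download saved under its original name. The `sha256sum` command-line tool performs the same check.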

### Defining Grammar

The grammar is defined in a simple markup language that is easy to write and extend. So while Grokotron can only recognize expressions covered by the grammar, that range can be quite large and can practically be extended to cover nearly anything a voice assistant needs.

Currently, the primary grammar is defined in [`/opt/grokotron/sentences.ini`](https://github.com/MycroftAI/mark-ii-sandbox/blob/grokotron/files/opt/grokotron/sentences.ini) using a variation of the [voice2json template language](https://voice2json.org/sentences.html), which itself is a simplified form of the [JSpeech Grammar Format (JSGF)](https://www.w3.org/TR/jsgf/).

#### Sections

If you look at the default sentences.ini file, you will see that it is broken up into sections by Skill name. Apart from scoping rules (see Rules below), these sections are not currently used for anything; they simply make the file more readable for humans.

#### Optional Words

Within a sentence, you can specify optional word(s) by surrounding them `[with brackets]`.

The template:

```ini
an example sentence [template]
```

represents two possible sentences, one with the optional word and one without:

1. `an example sentence template`
2. `an example sentence`
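To make the expansion concrete, here is a toy Python expander for non-nested `[optional]` groups. It only illustrates the semantics described above and is not Grokotron's actual parser; the function name `expand_optional` is hypothetical.

```python
import itertools
import re

# Matches one bracketed optional group such as "[template]" (no nesting supported).
OPTIONAL = re.compile(r"\[([^\[\]]+)\]")


def expand_optional(template: str) -> list[str]:
    """Expand every [optional] group into 'present' and 'absent' variants."""
    # With one capture group, re.split alternates: literal, group, literal, ...
    parts = OPTIONAL.split(template)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            choices.append([part, ""])  # optional group: keep it or drop it
        else:
            choices.append([part])      # literal text: always present
    # Joining then re-splitting collapses the double spaces left by dropped groups.
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


print(expand_optional("an example sentence [template]"))
# ['an example sentence template', 'an example sentence']
```

Each additional optional group doubles the number of sentences, since every group is independently kept or dropped.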

Note that if a sentence begins with an optional word, the opening square bracket must be escaped with a backslash so that it is not interpreted as a section heading. For example:

```ini
[SomeSkill]
\[no longer] a problem sentence
```
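The reason for the escape can be illustrated with a toy line classifier. It assumes, as described above, that any line beginning with `[...]` is read as a section heading; the function `classify_line` is hypothetical and is not how Grokotron's parser is actually implemented.

```python
import re

# Assumption: a line starting with "[...]" is taken as a section heading,
# and anything after the closing bracket is ignored.
SECTION = re.compile(r"\[([^\]]+)\]")


def classify_line(line: str) -> tuple[str, str]:
    """Toy classifier returning ('section', name) or ('sentence', text)."""
    stripped = line.strip()
    if stripped.startswith("\\["):
        # Escaped bracket: drop the backslash so "[no longer]" stays an optional group.
        return ("sentence", stripped[1:])
    match = SECTION.match(stripped)
    if match:
        return ("section", match.group(1))
    return ("sentence", stripped)


print(classify_line("[SomeSkill]"))                       # ('section', 'SomeSkill')
print(classify_line("[no longer] a problem sentence"))    # ('section', 'no longer')
print(classify_line(r"\[no longer] a problem sentence"))  # ('sentence', '[no longer] a problem sentence')
```

Without the backslash, the second line would silently become a section named `no longer` and the sentence would be lost.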

#### Alternatives

Where you have a set of options, exactly one of which must be present, use parentheses `()` with the options separated by a pipe `|`.

The template:

```ini
set the light to (red | green | blue)
```

will represent:

1. `set the light to red`
2. `set the light to green`
3. `set the light to blue`
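A toy Python expander for non-nested `(a | b | c)` groups shows how each group turns into one sentence per option. It is a sketch of the semantics only; `expand_alternatives` is a hypothetical helper, not part of Grokotron.

```python
import itertools
import re

# Matches one parenthesised group of alternatives such as "(red | green | blue)".
ALTERNATIVES = re.compile(r"\(([^()]+)\)")


def expand_alternatives(template: str) -> list[str]:
    """Expand every (a | b | c) group into one sentence per alternative."""
    parts = ALTERNATIVES.split(template)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Inside the parentheses: split on the pipe and trim each option.
            choices.append([opt.strip() for opt in part.split("|")])
        else:
            choices.append([part])
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


print(expand_alternatives("set the light to (red | green | blue)"))
# ['set the light to red', 'set the light to green', 'set the light to blue']
```

With more than one group the counts multiply: `expand_alternatives("(turn | switch) (on | off) the light")` yields four sentences.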

#### Optional Alternatives

You can also include alternatives within square brackets to define optional alternatives.

So the following sentence template:

```ini
An example sentence [with some | that has] optional words
```

represents 3 different sentences:

1. `An example sentence with some optional words`
2. `An example sentence that has optional words`
3. `An example sentence optional words`
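Put another way, `[a | b]` behaves like `(a | b)` with an extra empty choice. The sketch below, a hypothetical `expand` helper covering only non-nested groups, handles both forms and reproduces the three sentences listed above. It illustrates the semantics rather than Grokotron's real parser.

```python
import itertools
import re

# One non-nested group: either "(a | b)" (required) or "[a | b]" (optional).
GROUP = re.compile(r"\(([^()\[\]]+)\)|\[([^()\[\]]+)\]")


def expand(template: str) -> list[str]:
    """Expand non-nested (required) and [optional] alternative groups."""
    choices = []
    pos = 0
    for match in GROUP.finditer(template):
        choices.append([template[pos:match.start()]])  # literal text before the group
        required, optional = match.group(1), match.group(2)
        body = required if required is not None else optional
        options = [opt.strip() for opt in body.split("|")]
        if optional is not None:
            options.append("")  # square brackets add an "omit the group" choice
        choices.append(options)
        pos = match.end()
    choices.append([template[pos:]])  # trailing literal text
    # Joining then re-splitting collapses the extra spaces left by empty choices.
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


for sentence in expand("An example sentence [with some | that has] optional words"):
    print(sentence)
```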

#### Number ranges

Where a range of numbers is needed, it can be defined using two consecutive periods, as in `(0..100)`.

For example the sentence template:

```ini
set the volume to (0..100) percent
```

would represent 101 sentence variations, one for each number from 0 to 100. Each of the following sentences would be included, along with everything in between:

* `set the volume to 0 percent`
* `set the volume to 1 percent`
* `set the volume to 36 percent`
* `set the volume to 100 percent`
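The range expansion can be sketched in a few lines of Python. This toy `expand_range` helper (hypothetical, handling a single range per template) treats both ends as inclusive, which is why `(0..100)` yields 101 sentences. It keeps the numbers as digits and does not attempt any conversion to spoken words.

```python
import re

# A numeric range such as "(0..100)"; both ends are inclusive.
RANGE = re.compile(r"\((\d+)\.\.(\d+)\)")


def expand_range(template: str) -> list[str]:
    """Expand a single (start..end) range into one sentence per integer."""
    match = RANGE.search(template)
    if match is None:
        return [template]
    start, end = int(match.group(1)), int(match.group(2))
    before, after = template[: match.start()], template[match.end():]
    # end + 1 because the template's upper bound is included.
    return [f"{before}{n}{after}" for n in range(start, end + 1)]


sentences = expand_range("set the volume to (0..100) percent")
print(len(sentences))  # 101
print(sentences[36])   # set the volume to 36 percent
```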

#### Rules

Rules allow you to reuse common phrases, alternatives, etc. A rule is defined alongside your sentences using:

```ini
rule_name = ...
```

and is referenced elsewhere as `<rule_name>`.

The template above with colors could be rewritten as:

```ini
colors = (red | green | blue)
set the light to <colors>
```

which will represent the same 3 sentences as above. Importantly, you can **share rules** across intents by prefixing the rule’s name with the intent name followed by a dot:

```ini
[light]
colors = (red | green | blue)
set the light to <colors>

[background]
set the background to <light.colors>
```

The second section (`background`) references the `colors` rule from the `light` section.
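Conceptually, a rule reference is replaced by the rule's body before the sentence is expanded. Below is a minimal sketch of that substitution, assuming rules are stored in a dictionary keyed by `section.rule_name`; the `resolve` helper and the dictionary layout are hypothetical, not Grokotron's internals.

```python
import re

# A rule reference such as "<colors>" or "<light.colors>".
RULE_REF = re.compile(r"<([\w.]+)>")


def resolve(text: str, section: str, rules: dict[str, str]) -> str:
    """Inline rule references; `rules` is keyed by 'section.rule_name'."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unqualified names refer to a rule in the current section.
        key = name if "." in name else f"{section}.{name}"
        # Recurse so rules that reference other rules are inlined too
        # (no cycle detection: a self-referencing rule would recurse forever).
        return resolve(rules[key], key.split(".")[0], rules)

    return RULE_REF.sub(substitute, text)


rules = {"light.colors": "(red | green | blue)"}
print(resolve("set the light to <colors>", "light", rules))
# set the light to (red | green | blue)
print(resolve("set the background to <light.colors>", "background", rules))
# set the background to (red | green | blue)
```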

Rules may also reference each other, for example:

```ini
seconds = ((1){seconds} second | (2..59){seconds} seconds)
minutes = ((1){minutes} minute | (2..59){minutes} minutes)
hours = ((1){hours} hour | (2..59){hours} hours)
time = (<seconds> | <minutes> [[and] <seconds>] | <hours> [[and] <minutes>] [[and] <seconds>])
```
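Rules that build on each other describe a very large number of sentences compactly. As a worked example, the following Python counts the sentences the `time` rule above represents: a required group contributes its number of options, an optional group contributes its options plus one for leaving it out, and the `{...}` tags do not change the count. This is plain arithmetic on the template, not a call into Grokotron.

```python
# Worked count for the `time` rule; plain arithmetic, no Grokotron code involved.


def optional_and(count: int) -> int:
    """Choices for "[[and] <x>]": omitted, "<x>", or "and <x>"."""
    return 1 + 2 * count


# "(1) ... second" plus "(2..59) ... seconds" gives 1 + 58 = 59 options each.
seconds = 1 + len(range(2, 60))
minutes = 1 + len(range(2, 60))
hours = 1 + len(range(2, 60))

time_total = (
    seconds                                                  # <seconds>
    + minutes * optional_and(seconds)                        # <minutes> [[and] <seconds>]
    + hours * optional_and(minutes) * optional_and(seconds)  # <hours> [[and] <minutes>] [[and] <seconds>]
)
print(time_total)  # 842579
```

So these four lines of template stand for 842,579 distinct sentences, far more than could reasonably be listed by hand.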

#### Slots

Where many alternatives are required, entity slots can be defined using a `$` prefix.

The sentence template:

```ini
change wallpaper to ($wallpapers)
```

will look for a file at `/opt/grokotron/slots/wallpapers`. That file lists all of the available options, one per line, such as:

```
default
river
sea
earth
moon
nebula
city
blue
green
orange
```

As we have used parentheses in the sentence template, each new line in the `wallpapers` file will be a required term. If we instead used `[square brackets]` that slot would be optional.
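A minimal sketch of how a slot file could be read and substituted into a template, assuming one value per line with blank lines ignored. `load_slot` and `expand_slot` are hypothetical helpers for illustration; only the `/opt/grokotron/slots` location comes from the text above.

```python
from pathlib import Path

# Slot directory described above; override it for local testing.
SLOTS_DIR = Path("/opt/grokotron/slots")


def load_slot(name: str, slots_dir: Path = SLOTS_DIR) -> list[str]:
    """Read one slot file (assumed format: one value per line, blank lines ignored)."""
    lines = (slots_dir / name).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def expand_slot(template: str, name: str, values: list[str], optional: bool = False) -> list[str]:
    """Replace ($name) or [$name] with each value; an optional slot may also be empty."""
    marker = f"[${name}]" if optional else f"(${name})"
    choices = (values + [""]) if optional else values
    # Re-splitting collapses the extra space left when an optional slot is empty.
    return [" ".join(template.replace(marker, value).split()) for value in choices]


wallpapers = ["default", "river", "sea"]  # first entries of the example slot file
print(expand_slot("change wallpaper to ($wallpapers)", "wallpapers", wallpapers))
# ['change wallpaper to default', 'change wallpaper to river', 'change wallpaper to sea']
```

On the device, `load_slot("wallpapers")` would supply the full list instead of the hard-coded sample.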

In the default `sentences.ini` file you might notice that slots are often followed by a set of `{curly braces}`. These define how the content of a slot can be referenced by other parts of the system, similar to entities in Adapt and Padatious. However, they are not yet used, as the template currently only defines the possible grammar for speech recognition, not intent definitions.

### Retraining

After modifying the Grokotron grammar or any slot files, the model must be retrained by running `/opt/grokotron/train.py`.

{% hint style="info" %}
If running this as the default `pi` user, you will first need to change the ownership of the output directory. For example:

```
sudo chown -R pi:pi /opt/grokotron/output
```

{% endhint %}
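If you are unsure whether the grammar has changed since the last training run, a small check like the one below can help. It is a sketch under stated assumptions: that training writes its results into `/opt/grokotron/output`, and that comparing file modification times is a good enough signal. `needs_retraining` is a hypothetical helper, not part of Grokotron.

```python
from pathlib import Path

GROKOTRON = Path("/opt/grokotron")


def newest_mtime(paths: list[Path]) -> float:
    """Latest modification time among the given regular files (0.0 if none)."""
    return max((p.stat().st_mtime for p in paths if p.is_file()), default=0.0)


def needs_retraining(root: Path = GROKOTRON) -> bool:
    """True if sentences.ini or a slot file is newer than every training output file.

    Assumption: train.py writes (or rewrites) files under root / "output".
    """
    inputs = [root / "sentences.ini", *(root / "slots").glob("*")]
    outputs = list((root / "output").rglob("*"))
    return newest_mtime(inputs) > newest_mtime(outputs)
```

If `needs_retraining()` returns `True`, run `/opt/grokotron/train.py` as described above.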

### Setting the location, time and date

Currently, the Grokotron proof-of-concept image intentionally does not communicate with our backend server. This means no pairing is required and the device can run completely offline. However, it also means that the device's location settings, including the time zone used for the time and date, default to Kansas City, Missouri.

These can be updated in the `mycroft.conf` files on the device. If you have an existing Mark II setup, the quickest way to get these values is to copy the `location` block from the remote configuration stored at `~/.config/mycroft/mycroft.remote.conf`.

For example:

```json
{
  "location": {
    "city": {
      "code": "Lawrence",
      "name": "Lawrence",
      "state": {
        "code": "KS",
        "name": "Kansas",
        "country": {
          "code": "US",
          "name": "United States"
        }
      }
    },
    "coordinate": {
      "latitude": 38.971669,
      "longitude": -95.23525
    },
    "timezone": {
      "code": "America/Chicago",
      "name": "Central Standard Time",
      "dstOffset": 3600000,
      "offset": -21600000
    }
  }
}
```
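Copying the block by hand works fine, but it can also be scripted. The sketch below, a hypothetical `copy_location` helper, reads the `location` block from a copy of `mycroft.remote.conf` and writes it into a `mycroft.conf`, keeping any other settings already there. It assumes both files are plain JSON (a file containing comments would make `json.loads` fail) and that `~/.config/mycroft/mycroft.conf` is the file you want to edit; pass different paths if your setup differs.

```python
import json
from pathlib import Path

# Remote config from an existing, paired Mark II (copy it over first if needed).
REMOTE_CONF = Path.home() / ".config/mycroft/mycroft.remote.conf"
# Assumption: the user-level config is the file to edit on the Grokotron device.
USER_CONF = Path.home() / ".config/mycroft/mycroft.conf"


def copy_location(remote: Path = REMOTE_CONF, target: Path = USER_CONF) -> dict:
    """Copy the "location" block from the remote config into the target config."""
    location = json.loads(remote.read_text(encoding="utf-8"))["location"]
    # Keep existing settings in the target; only "location" is replaced.
    config = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
    config["location"] = location
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```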

Once the configuration is in place, restart the Mycroft Dinkum services so that the change takes effect system-wide:

```shell
sudo systemctl restart mycroft-dinkum.target
```
