I don’t typically write a blog post like this, but I’ve had an infuriating day messing around writing an Alexa skill, and wanted to explain the issues I’ve faced for anyone else trying to do something similar.
First though, an important note: because of the problem described below, I haven’t submitted the skill to Amazon, so only I can use it at the moment. Don’t bother trying to enable it for your Echo.
I wanted to write an Alexa Skill (that’s Alexa-speak for an app) to help other radio amateurs with a very common problem: given a dipole antenna of a certain length, which you know works efficiently on a certain frequency, how much do you need to cut or add to make it efficient at another (target) frequency? I therefore wanted to be able to say something like this:
Alexa, ask Dipole Calculator how much I need to cut off my dipole to move it from 14.1 to 14.3 MHz
There’s a simple formula to deal with this, which we don’t need to go into here, but the critical point is that moving resonance by a few hundred kHz (that’s to say 0.x MHz) is very common. If you wanted to change frequency by anything larger than that, you’d probably start rebuilding the antenna.
Translating this for Alexa
For those who haven’t dealt with Alexa programming before, you need to know that Alexa commands for custom skills follow a strictly defined format:
- ‘Alexa’ is the wake-word, to wake the Amazon Echo up
- ‘ask Dipole Calculator’ is the Invocation, which tells Alexa which skill to use to interpret the rest of the command
- the rest of the command is the Utterance, which is the only bit that’s defined in the skill itself
(For the purposes of completeness, Utterances are linked to an Intent through a file called the Intent Schema, and Intents are effectively functions or methods, which is how Alexa gets stuff done.)
Utterances can contain Slots, which is Alexa’s terminology for a variable. In my sentence above, ‘14.1’ and ‘14.3’ are the slots within the utterance.
Which is where we get to the problem…
Slots are effectively very strongly typed. Presumably this helps the speech recognition by narrowing the number of possibilities for what you are saying. Thus, for each slot in your utterance, you need to define its type. Built-in examples include AMAZON.NUMBER (an unsigned integer), AMAZON.DATE (a date), AMAZON.DURATION (self-explanatory), etc. You can also define custom enumerations. Note, though, that there’s no equivalent of a float or a decimal. This means that I can’t just have two slots (a current and a target frequency): 14.1 wouldn’t be a valid AMAZON.NUMBER, and so it would throw an exception. I obviously can’t realistically create an enumeration of all possible values either.
There used to be an AMAZON.LITERAL type which would return an untyped value, but that’s now deprecated. It’s available in the US English locale for just the next couple of weeks, and those of us in UK English can’t use it at all. Annoyingly, this means that if the speech recognition can’t interpret the value in the slot, there seems to be no way to find out what the Echo ‘heard’. All you get is a ‘?’, representing the fact that the value couldn’t be matched to something valid for that slot type.
A possible solution?
Looking for a way around this, I thought that a good approach might be to create four slots:
[Current integer] point [Current decimal] to [Target integer] point [Target decimal]
In my example above, these four slots would have the values 14, 1, 14 and 3 respectively, all of which are valid AMAZON.NUMBERs.
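As a rough sketch of how this four-slot idea might be declared, the intent schema and a matching sample utterance could look something like the following. The intent name `RetuneDipoleIntent` and the four slot names are hypothetical, picked purely for illustration. The schema is written as a JavaScript object here so it can be printed as the JSON that would go into the skill configuration.

```javascript
// Hypothetical intent schema for the four-slot approach.
// The intent name and slot names are illustrative assumptions,
// not the names used in the real skill.
const intentSchema = {
  intents: [
    {
      intent: "RetuneDipoleIntent",
      slots: [
        { name: "CurrentInteger", type: "AMAZON.NUMBER" },
        { name: "CurrentDecimal", type: "AMAZON.NUMBER" },
        { name: "TargetInteger", type: "AMAZON.NUMBER" },
        { name: "TargetDecimal", type: "AMAZON.NUMBER" },
      ],
    },
  ],
};

// A sample utterance line: the intent name, then the spoken phrase
// with each slot written in braces where its value is expected.
const sampleUtterance =
  "RetuneDipoleIntent how much I need to cut off my dipole to move it from " +
  "{CurrentInteger} point {CurrentDecimal} to " +
  "{TargetInteger} point {TargetDecimal} megahertz";

// Print the schema as JSON, followed by the utterance line.
console.log(JSON.stringify(intentSchema, null, 2));
console.log(sampleUtterance);
```

Every slot is an AMAZON.NUMBER, and the spoken word ‘point’ sits between the slots as ordinary text rather than being part of any slot.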
We’ll come back to this case later, but for now let’s try an easier case, where there’s just one frequency involved: I’m going to calculate the antenna length for a single frequency.
I put this into my Sample Utterances file in AWS Lambda:
And set the types in the Intent Schema:
And sure enough it appears to work in the text simulator:
When you issue the same command in real life, however, the number after the decimal point is ‘lost’:
To check what’s going on, I added a console.log() to the function where I run parseFloat(int + "." + decimal). Checking the logs in AWS CloudWatch reveals that the FrequencyDecimal value wasn’t considered valid for AMAZON.NUMBER, and so has come back as the infamous ‘?’.
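Here is a minimal sketch of that stitching step, with a guard for the ‘?’ case. It assumes the slots arrive in the usual request shape (an object keyed by slot name, each entry carrying a `value` string). The helper names and the `FrequencyInteger` slot name are my own assumptions; only `FrequencyDecimal` comes from the logs above.

```javascript
// Sketch: rebuild a decimal frequency from two AMAZON.NUMBER slots.
// "slots" is assumed to look like event.request.intent.slots.
// Helper names and the FrequencyInteger slot name are hypothetical.

// Return the raw string value of a slot, or undefined if it is absent.
function slotValue(slots, name) {
  const slot = slots ? slots[name] : undefined;
  return slot ? slot.value : undefined;
}

// True only for strings made up entirely of digits, so "?" is rejected.
function isDigits(value) {
  return typeof value === "string" && /^[0-9]+$/.test(value);
}

// Join the integer and decimal parts, or return null if either is unusable.
function frequencyFromSlots(slots, intName, decName) {
  const intPart = slotValue(slots, intName);
  const decPart = slotValue(slots, decName);
  // Log the raw values so CloudWatch shows exactly what was received.
  console.log("Raw slot values:", JSON.stringify({ intPart, decPart }));
  if (!isDigits(intPart) || !isDigits(decPart)) {
    return null;
  }
  return parseFloat(intPart + "." + decPart);
}

const heard = {
  FrequencyInteger: { name: "FrequencyInteger", value: "14" },
  FrequencyDecimal: { name: "FrequencyDecimal", value: "1" },
};
const misheard = {
  FrequencyInteger: { name: "FrequencyInteger", value: "14" },
  FrequencyDecimal: { name: "FrequencyDecimal", value: "?" },
};

console.log(frequencyFromSlots(heard, "FrequencyInteger", "FrequencyDecimal")); // 14.1
console.log(frequencyFromSlots(misheard, "FrequencyInteger", "FrequencyDecimal")); // null
```

The guard matters because parseFloat() stops at the first character it can’t parse: parseFloat("14.?") quietly returns 14 rather than failing, which would be consistent with the decimal appearing to be ‘lost’. Returning null instead gives the skill a chance to ask for the frequency again.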
Let’s return to the previous, more complex example.
In this case, the decimal place for the first frequency is interpreted correctly, making it evident that my approach should work. But the decimal from the second frequency has again gone missing:
It’s not only me facing this problem, either. There are a number of posts on Stack Overflow and on the Amazon Developer Forum (such as this one) that reach no conclusion, other than advice to use literals, which isn’t an option anymore.
- You can’t have a decimal value type for a slot
- You can work around this limitation by creating two slots, one for the integer and one for the decimal, and then use parseFloat() to stitch them together
- But this workaround for some reason won’t work in real life for the last decimal slot in the utterance, even though it appears to be fine in the simulator