Components
MyRobotLab, an open-source robotics framework. I am targeting its speech recognition stack, which currently lacks a wakeword engine or any activation interface.
URL: myrobotlab.org
I am planning to add a wakeword engine interface, a Snowboy wakeword engine service, and the Activator and ActivationListener interfaces.
Snowboy wakeword engine found here: https://github.com/Kitt-AI/snowboy
Proposal
In MyRobotLab, we do not yet have a way to detect whether somebody is speaking to the robot or to someone/something else. The framework assumes that everything it hears is directed at it, leading to awkward scenarios where you are talking to a buddy and your nearby robot responds to you. If we look at other assistants, like Google Assistant and Amazon Alexa, we can see that they don't have this problem because they use what are called wakeword (or hotword) engines to detect when someone is speaking to them.
In fact, the Java sample of the Alexa SDK uses an open-source wakeword engine called Snowboy. Snowboy is, based on my tests, incredibly accurate with the correct model (it is recommended to donate one's voice online to help train the models). Each model looks for a specific "hotword" in a stream of audio data, the hotword being whatever the model was trained on. For example, Snowboy can detect when you say "Computer" or when you say "Alexa" by using two separate models, one for each hotword.
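To give a feel for what driving Snowboy looks like, here is a minimal detection loop modeled on the Java demo in the Snowboy repository. The resource/model paths, buffer sizes, and sensitivity value are illustrative, and the native bindings must be built for your platform first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;
import ai.kitt.snowboy.SnowboyDetect;

public class SnowboyDemo {
    static {
        // The SWIG-generated Java bindings wrap a native library.
        System.loadLibrary("snowboy-detect-java");
    }

    public static void main(String[] args) throws Exception {
        // Snowboy expects 16 kHz, 16-bit, mono, little-endian PCM.
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(
                new DataLine.Info(TargetDataLine.class, format));
        line.open(format);
        line.start();

        // Resource and model files ship with the Snowboy repo.
        SnowboyDetect detector = new SnowboyDetect(
                "resources/common.res", "resources/models/snowboy.umdl");
        detector.SetSensitivity("0.5");
        detector.SetAudioGain(1);

        byte[] raw = new byte[3200];      // roughly 100 ms of audio
        short[] samples = new short[1600];
        while (true) {
            line.read(raw, 0, raw.length);
            ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN)
                      .asShortBuffer().get(samples);
            // A result > 0 is the index of the model whose hotword fired.
            int result = detector.RunDetection(samples, samples.length);
            if (result > 0) {
                System.out.println("Hotword detected: " + result);
            }
        }
    }
}
```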
My proposal is that we integrate Snowboy into MyRobotLab as a service, along with several new interfaces to allow connections between other services. These interfaces would be Activator and ActivationListener, and all speech recognition interfaces should derive from ActivationListener so that a hotword service like Snowboy can activate and deactivate a speech recognizer (see the sketches below).

The Snowboy service itself requires a separate thread because it constantly polls the model with new audio data looking for a match, so the poll period should be configurable to allow fine tuning. The service should also implement the ActivationListener interface itself, so that a secondary control can override Snowboy, such as a mute-microphone button like the ones on the Amazon Echo and Google Home. That control should propagate a deactivation command down to the attached services, but not an activation command: muting Snowboy automatically stops any listening a speech recognition service might be doing, while unmuting it won't automatically restart listening.
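A minimal sketch of what the two proposed interfaces might look like; the interface names come from this proposal, but the exact method names and signatures are my assumption.

```java
// Proposed interfaces; method names/signatures are illustrative guesses.
public interface ActivationListener {
    void onActivation();   // e.g. a speech recognizer starts listening
    void onDeactivation(); // e.g. a speech recognizer stops listening
}

public interface Activator {
    void attachActivationListener(ActivationListener listener);
    void detachActivationListener(ActivationListener listener);
}
```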
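And a rough skeleton of how the service's polling thread, configurable poll period, and mute-override behavior could fit together. The class and method names are hypothetical, and the actual MyRobotLab service plumbing and Snowboy calls are stubbed out for brevity.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical skeleton of the proposed Snowboy service.
public class SnowboyService implements Activator, ActivationListener, Runnable {
    private final List<ActivationListener> listeners = new CopyOnWriteArrayList<>();
    private volatile long pollPeriodMs = 100; // configurable for fine tuning
    private volatile boolean muted = false;

    public void setPollPeriodMs(long ms) { pollPeriodMs = ms; }

    @Override
    public void run() {
        // Dedicated thread: keep feeding the model new audio data.
        while (!Thread.currentThread().isInterrupted()) {
            if (!muted && hotwordDetected()) {
                for (ActivationListener l : listeners) {
                    l.onActivation(); // wake the attached recognizers
                }
            }
            try {
                Thread.sleep(pollPeriodMs);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    // A mute button (or any other Activator) can deactivate Snowboy...
    @Override
    public void onDeactivation() {
        muted = true;
        // ...and the deactivation propagates to attached services.
        for (ActivationListener l : listeners) {
            l.onDeactivation();
        }
    }

    // ...but reactivation deliberately does NOT propagate: attached
    // recognizers stay quiet until the next hotword match.
    @Override
    public void onActivation() {
        muted = false;
    }

    private boolean hotwordDetected() {
        // Placeholder: buffer microphone audio and call RunDetection(...).
        return false;
    }

    @Override
    public void attachActivationListener(ActivationListener l) {
        listeners.add(l);
    }

    @Override
    public void detachActivationListener(ActivationListener l) {
        listeners.remove(l);
    }
}
```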
Benefits
MyRobotLab really needs a wakeword engine and an activation interface. Currently, InMoov robots and others powered by MyRobotLab are unable to differentiate between commands spoken to them and simple side conversation. With this enhancement, robots gain the ability to react only to commands actually directed at them. Other services may also eventually support Activator and ActivationListener, such as the FaceRecognizer OpenCV filter; that would allow a robot to start listening when someone looks at it, giving the interaction a more natural feel. I believe it is in the best interests of MRL users to prioritize this for the Nixie release.