Monday, August 7, 2017

Bridging Google Assistant and Belphanior

Having acquired a Google Home not too long ago, I decided to delve into building a bridge between it and Belphanior, my home automation framework. While there are a few sharp edges, I was surprised to discover it wasn't as challenging as I expected! The following is an overview of the process and some things I ran into.


Architecture Overview

Google Assistant endpoints (Home, Allo, and the Assistant on Android phones) can be extended with additional functionality via the Actions on Google framework. This is a set of technologies that basically let you define a conversational flow and an endpoint that will receive and act on information from the ongoing conversation.

I based my bridge very heavily on the "Google Facts" demo that serves as the getting started process for Actions on Google.

The technologies I used include 
  1. Actions on Google, which manages the action that handles requests from Google Assistant.
  2. api.ai, which hosts the conversational agent. Assistant interacts with the user via rules in this agent, and the interactions become requests to an HTTP endpoint I specify.
  3. Cloud Functions, a Google Cloud service that runs a bit of script in response to an HTTP request (like a smaller version of App Engine). My agent sends requests to this endpoint, which interprets them and forwards them to my home automation service via Pub/Sub.
  4. Cloud Pub/Sub, another Google Cloud service that supports posting messages to be picked up asynchronously. I used this so that the home automation service (which runs on a Raspberry Pi in my house) can receive requests without needing to expose a world-visible endpoint that could be vulnerable to hacking.
Wiring up these tools took some time, but once completed I had a full end-to-end solution for voice and text control of the house.

The Process

Setting up the action

Actions on Google are basically small sets of rules for extending the things Google Assistant knows how to do. They're triggered via a command phrase such as "Okay, Google, tell <your project's name> <command sent to the project>", for example, "Okay, Google, please tell Belphanior 'Lights on.'" 

Starting a new project is simple enough: you create one from the Actions console and follow the provided flow to set it up. In my case, I ended up changing the name to "The house ghost," because (a) 'Belphanior' is a bit hard to pronounce and (b) Google generally forbids one- or two-word names for assistant apps.

App configuration, including the name for assistant purposes.


A bit about this process: Google very much designs this flow to get you to a published assistant app that could be used by anyone, but for my purposes I only intend to allow my own house to be controlled (I didn't even add multiple account support). Getting as far as testing the action was enough.


Setting up the agent

Actions allow a couple of options for setting up the conversation flow, but the easiest to use was api.ai. It provides a convenient interface for setting up a conversational agent (for multiple systems, including Actions on Google and Slack). The api.ai framework is pretty rich, including neat capabilities such as training conversation components with multiple variants (e.g. "Let me buy X", "I want to buy X", "How can I buy X", etc.) and creating flows from one intent to another.

For my purposes, however, the functionality I want is very simple: I want api.ai to forward the message directly to my cloud function. So I only need the default intent, and I indicate that it should handle the request ("fulfillment") via the webhook I will specify.

"Use webhook" is basically the interesting part here.

Once the default fallback intent is set up, I go to the "Fulfillment" tab and specify where the agent should send requests. The webhook is just a world-visible URL; I specify my Cloud Functions endpoint here. Note: what's missing from this screenshot is that I also specified an HTTP Basic Auth username and password. Those are sent to the Cloud endpoint in the "Authorization" header as the Base64 encoding of "username:password" (no newline). It's wise to do at least this much, since a Cloud Functions HTTP endpoint will otherwise accept any request that reaches it.

Don't forget to set up basic auth
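That header value is just standard HTTP Basic Auth, so it's easy to reproduce and check on the receiving end. Here's a minimal Go sketch of how the value is built and verified; the credentials are made up for illustration:

package main

import (
    "encoding/base64"
    "fmt"
    "net/http"
)

// buildBasicAuth returns the value api.ai will send in the
// Authorization header for the configured username and password.
func buildBasicAuth(user, pass string) string {
    return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+pass))
}

// checkAuth is the matching check on the webhook side.
func checkAuth(r *http.Request, user, pass string) bool {
    return r.Header.Get("Authorization") == buildBasicAuth(user, pass)
}

func main() {
    fmt.Println(buildBasicAuth("ghost", "s3cret")) // Basic Z2hvc3Q6czNjcmV0
}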

The only remaining step in api.ai is to enable the integration with Actions on Google, which is under the "Integrations" tab. Just hit the toggle button to enable the integration, then click "Settings" to get the detail dialog. At this point, I can click "UPDATE" to publish the configuration to Actions on Google, and then "TEST" to try controlling my app in the Actions on Google simulator.




I can attempt to send commands such as "please tell the house ghost Lights On", but the result will be that nothing happens, because the Cloud Function webhook doesn't exist yet.

Setting up the Cloud Function handler

Now that the agent is set up, we can set up our Cloud Function handler to catch its requests. I have an existing Google Cloud project that I use for various things; since I don't use any Cloud Functions yet, I just added one there named "speech-list-options".

When you create a Cloud Function, you have the option to configure what triggers it (HTTP in this case) and provide the code it executes. The code for my function is here; the essential steps are:
  1. Verify auth header
  2. Initialize an API.AI app from the 'actions-on-google' library
  3. Grab the raw speech input to the agent
  4. Bundle up the raw speech as a command published to the 'belphanior-commands' topic (the publishing itself goes through the @google-cloud/pubsub library)
The libraries are pulled in by the rules specified in the package.json tab of the Cloud Functions editor:
{
  "name": "speech-execute",
  "version": "0.0.1",
  "dependencies": {
    "actions-on-google": "^1.0.0",
    "@google-cloud/pubsub": "~0.10.0"
  }
}
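The deployed function itself is Node.js (hence the package.json above), but the flow is simple enough that it's worth sketching. Here's the same logic written as a Go HTTP handler instead; the project ID, the expected auth value, the "Okay." reply, and the assumption that api.ai's v1 webhook payload carries the raw speech in result.resolvedQuery are all mine, so treat this as an illustration rather than the deployed code:

package ghost

import (
    "encoding/json"
    "net/http"
    "os"

    "cloud.google.com/go/pubsub"
)

// apiAIRequest models just the field we need from the agent's
// webhook payload: the raw text the user spoke.
type apiAIRequest struct {
    Result struct {
        ResolvedQuery string `json:"resolvedQuery"`
    } `json:"result"`
}

func ListOptions(w http.ResponseWriter, r *http.Request) {
    // 1. Verify the Basic Auth header configured in api.ai.
    if r.Header.Get("Authorization") != os.Getenv("EXPECTED_AUTH") {
        http.Error(w, "unauthorized", http.StatusUnauthorized)
        return
    }

    // 2./3. Grab the raw speech input from the agent's request.
    var req apiAIRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "bad request", http.StatusBadRequest)
        return
    }

    // 4. Bundle the raw speech up as a message on the
    // belphanior-commands topic.
    ctx := r.Context()
    client, err := pubsub.NewClient(ctx, "my-project-id") // placeholder project
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer client.Close()
    result := client.Topic("belphanior-commands").Publish(ctx,
        &pubsub.Message{Data: []byte(req.Result.ResolvedQuery)})
    if _, err := result.Get(ctx); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    // Tell the agent what the assistant should say back.
    json.NewEncoder(w).Encode(map[string]string{
        "speech":      "Okay.",
        "displayText": "Okay.",
    })
}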
After that, the only remaining configuration is to specify a bucket to serve as "staging" for Cloud Functions and the entrypoint function that should be run when the HTTP trigger is hit (listOptions).

Setting up the Cloud Function

At this point, every command sent is published to the 'belphanior-commands' topic in my project. Using the Cloud Console and the "Pub/Sub" tab, I then created the topic and a pull subscription (projects/<project name>/subscriptions/belphanior-butler).
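I clicked through the console for this, but the same setup can be scripted with the Go client library if you prefer. A rough equivalent (the project ID is a placeholder, and the credentials in use need permission to administer Pub/Sub):

package main

import (
    "context"
    "log"

    "cloud.google.com/go/pubsub"
)

func main() {
    ctx := context.Background()
    client, err := pubsub.NewClient(ctx, "my-project-id") // placeholder
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Create the topic the Cloud Function publishes to, plus a pull
    // subscription for the Raspberry Pi to read from.
    topic, err := client.CreateTopic(ctx, "belphanior-commands")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := client.CreateSubscription(ctx, "belphanior-butler",
        pubsub.SubscriptionConfig{Topic: topic}); err != nil {
        log.Fatal(err)
    }
}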

Receiving the commands

Receipt of the commands is done by a small service running on the same Raspberry Pi that hosts Belphanior's home automation core (which holds an App Engine default service account credential). The overall flow of the service (sketched after the list) is:
  1. Establish connection to the Belphanior butler
  2. Establish connection to Pub/Sub in the project on the subscription previously set up
  3. Poll for new messages 
    1. New message received: treat the message body as a Belphanior command and send it to the butler.
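Here's a rough sketch of that Pi-side service, minus the Belphanior-specific details. It assumes the Go Pub/Sub client library and a local butler endpoint (the URL and path are placeholders), and the client library's Receive loop stands in for the hand-rolled polling described above:

package main

import (
    "bytes"
    "context"
    "log"
    "net/http"

    "cloud.google.com/go/pubsub"
)

func main() {
    ctx := context.Background()
    client, err := pubsub.NewClient(ctx, "my-project-id") // placeholder
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    sub := client.Subscription("belphanior-butler")
    // Receive blocks, handing each message to the callback as it arrives.
    err = sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
        // Treat the message body as a Belphanior command and hand it
        // to the butler running on this same Raspberry Pi.
        resp, err := http.Post("http://localhost:8080/command", // placeholder endpoint
            "text/plain", bytes.NewReader(msg.Data))
        if err != nil {
            log.Printf("butler unreachable: %v", err)
            msg.Nack() // redeliver later
            return
        }
        resp.Body.Close()
        msg.Ack()
    })
    if err != nil {
        log.Fatal(err)
    }
}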
All the pieces are now in place. The test version of the house ghost runs under my Google account, so it can be accessed via the simulator, my smartphone, or a Google Home registered to my account.

Issues

Overall, this process works; I've only encountered a few issues.
  • It appears that running an app in test mode "wears off" eventually, and the test has to be restarted. It'd be convenient if there were a way to build an app intended to be used by only one user, but I don't see a mechanism for that.
  • Because of the need to use Pub/Sub as a pull target (so my Raspberry Pi doesn't itself need a world-visible IP address tied to a domain name), all communication from the Assistant to Belphanior is one-way. To improve this, I'd either have to run Belphanior itself in the cloud or have a protocol for shipping some of the house's state to the cloud (possibly also via Pub/Sub) so that the Cloud Function could answer questions about the state of the house from locally-cached data. At this time, it's not a need I have, but if I ever need to ask questions like "Are the lights on?" it's a problem I'd have to solve.
  • After a period of inactivity (a few minutes), the next Pub/Sub message can take over thirty seconds to get picked up by my Raspberry Pi's polling. I don't know precisely what the issue is, but it seems that Pub/Sub may not be optimized for rapid delivery of sparse, low-volume messages. I've heard rumors that it's possible to force a "buffer flush" by increasing the number of messages sent at once (padding the "payload" message with a half-dozen "no-op" messages before and after it), but I haven't yet experimented with that option.
Once the pipeline is set up, I'm very impressed with the reliability of Home as a voice control solution; it's extremely good at dissecting my intended message without operating off of a dictionary of possible inputs I provide. Voice recognition tech keeps getting better!

Tuesday, October 25, 2016

Dipping a toe into Z-Wave: OpenZWave in Go with the Ninja Blocks wrapper library

Having installed OpenZWave and successfully set up my UZB key as a controller, the next part of the process was to bridge the OpenZWave library to my Belphanior home automation core. Belphanior communicates with its servants via an HTTP-based RPC protocol; the easiest route seemed to be to set up a web server that handles requests from Belphanior and translates them into Z-Wave commands.

OpenZWave is a C++ library; while I can work in C++, for web servers and the like I prefer a less detailed abstraction. Fortunately for me, an Australian firm called Ninja Blocks put together an excellent Go-language wrapper available on their github repo, which was exactly the tool for the job.

Here, I document what I liked about go-openzwave and my experience getting it working with Belphanior.

Sunday, September 18, 2016

Dipping a toe into Z-Wave with the UZB controller hardware

Quite some time ago, I laid down some things I'm looking for in a home automation solution. I've recently come to the conclusion that Z-Wave is probably mature enough to begin experimenting with. It's a closed protocol owned by a single company, but the company has been around long enough and has affiliated with enough consumer partners that it's unlikely to vanish tomorrow. It may not be secure (it looks like there are security features in the protocol, but I don't know yet whether all devices support them), but it should support the debugging and status features I'm looking for that I was never able to get working with INSTEON. To that end, I purchased a UZB stick from Z-Wave.Me and a GE Dimmable Lamp Module and started experimenting. This will be an ongoing trip; we'll see where we end up.

Tuesday, August 23, 2016

Understanding USB Serial on Arduino Uno (There Is No Magic)

(source: http://gordonscruton.blogspot.com/2011/06/no-magic-please-part-1-learning.html)


A professor I had was fond of saying that the first thing to learn about computers is that there is no magic. From time to time, I am reminded of that truth. So let me tell you a story.

I've been playing around with a lighting project using some Supernight LED strips (unlike NeoPixels, these light strips allow you to control the hue but don't allow for individual LED clusters to be controlled; the entire strip takes on one color). After having some fun with them using the packaged-in controller, I decided to wire up my own based on some previous work and a tutorial from Make on how to control these lights (they're basically four-channel: one channel is +12v, and the other three can accept a PWM signal for color intensity). I used an Arduino UNO as the driver for the light control circuit. Everything was going well: I did a breadboard test, wired it all up, and put it in a nice project enclosure box.

To control the lights, I'm running a web service on a Raspberry Pi that communicates via USB serial to the Arduino (in theory, the Pi could send the PWM signals directly, but I'm squeamish about wiring my harder-to-replace Pi to a 12-volt signal). I wrote a small state machine to accept some serial signals on the Arduino, and a small Go service to take web requests and translate them into opening the serial port, sending the data, and closing the port.

When it works, it works great!
Trouble is, it didn't work. I could see my server sending the signals, and it believed they were sent correctly, but the Arduino wouldn't respond. I could even go into the Serial Monitor in the Arduino IDE and send the characters individually; they'd trigger the lights, but when I sent them from my server, the lights would go out. I couldn't debug by printing messages back over the serial connection, because I was using it to control the board (PROTIP: always add a couple LEDs to your enclosure; you never know when you're going to need the ability to just get a simple status signal from your Arduino without using serial).

The breakthrough came from adding a bit of code to the setup() function that caused the lights to quickly pulse red-green-blue to indicate setup was complete. I quickly learned that every time I tried to send a signal to change the light color from the Pi, the lights were pulsing---the Arduino was doing a full reboot. Why?

I had assumed that the Arduino was (at the hardware level) talking directly to the USB, and the Serial library I used emulated a serial connection in software. In reality, that's not the case at all. The secret is described clearly (well, clearly enough ;) ) in the wiring diagram for the UNO itself.

(Source)


The main processor is an ATmega328P, but there's an entire second processor on the board---an ATmega16U2 that sits between the main processor and the USB hardware. That processor has (normally) non-modifiable firmware that translates the USB serial protocol to a traditional 5v serial logic protocol. It also has a useful feature, sitting right in the middle of the diagram:

"There's yer problem."
It's a bit hard to tell in all the visual noise, but that's pin 13 on the USB controller wired to RESET on the main processor. So when does the USB controller reset the main processor? When a USB connection is initiated, the controller sends a DTR signal on pin 13. This signal reboots the processor. Every time my web service opened the serial device in Linux, it was causing the USB controller to trigger a DTR---the subsequently-sent bytes were then either being consumed or dropped on the floor. Fixing this was as simple as re-coding my service to open the device once and hold onto it for the entire life of the process. No sweat!
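For the curious, here's a sketch of what the fixed service looks like: the serial device is opened once at startup and the handle is reused for every request, so the DTR-triggered reset only happens once. The device path, baud rate, endpoint, and the tarm/serial library are assumptions on my part; the real service's details differ.

package main

import (
    "log"
    "net/http"
    "sync"

    "github.com/tarm/serial"
)

var (
    port *serial.Port
    mu   sync.Mutex // serialize writes from concurrent HTTP requests
)

func main() {
    var err error
    port, err = serial.OpenPort(&serial.Config{Name: "/dev/ttyACM0", Baud: 9600})
    if err != nil {
        log.Fatal(err)
    }
    // The Arduino reboots once here; give it a moment before the first write.

    http.HandleFunc("/color", func(w http.ResponseWriter, r *http.Request) {
        cmd := r.URL.Query().Get("cmd") // e.g. a single command character
        mu.Lock()
        defer mu.Unlock()
        if _, err := port.Write([]byte(cmd)); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
    })
    log.Fatal(http.ListenAndServe(":8000", nil))
}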

So this still leaves the question: why is DTR wired to RESET? Here's the magic I was missing: the Arduino UNO uses the USB connection for two things: sending and receiving serial communications, and programming the Arduino itself. How does it know whether the incoming signal is new program logic or serial commands for the running program to consume? It turns out that the IDE reprograms the Arduino by starting a serial connection (causing DTR to pull the RESET pin). The first thing the Arduino's main processor does on reset is run firmware that listens for a few milliseconds for a sequence of bytes indicating new code incoming (these bytes are documented in the STK500 communication protocol v1)---if it receives those bytes, it writes what follows to flash memory; otherwise it gives up and begins running userland code. The Arduino Serial Monitor triggers the reset too, but doesn't send the new-code bytes. Before this feature was added, reprogramming the Arduino required the developer to time resetting the board and starting the code upload fairly precisely, which was annoying. Incidentally, if you own an Arduino and don't want this feature enabled, there are a couple of convenient ways to disable it so that your processor won't reboot every time a serial connection starts.

So there you go: no magic, just a pair of processors with some interwoven activation logic that allows the same wire to serve as both communications and as the channel to download new program versions. Which, when it all operates smoothly, is pretty magic!


Sunday, May 1, 2016

Playing with light: "Hologram" (Pepper's Ghost) projector


There's a neat trick you can do with angled plastic to give the impression of an image floating in space. While it's technically a "Pepper's Ghost" illusion, it commonly goes by the name "Pyramid Hologram" online. For our friends at Iguanatron, we put together a top-down projector implementation with a 20" computer monitor.

So here's how it went down.

Sunday, January 3, 2016

GoBlockly Interpreter: An Interpreter for Blockly in Go

I've just published GoBlockly: an interpreter, written in Go, for the output of the Blockly visual editor. This is something I wrote as part of a larger project (reimplementing the Belphanior Butler in Go), but it was a reasonable-sized chunk to break off and polish. It passes Blockly's own unit tests and allows for the addition of custom blocks.

Example output of a server running GoBlockly library
If you program in Go and want to play around with Blockly as a UI for simple code authoring, feel free to try it out! It doesn't come batteries-included (i.e. you'll need to set up your own Blockly web client and send the code back and forth to the Go server), but writing a web server in Go is easy enough that this should be straightforward.
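By way of illustration, the server half can be as small as a single handler that accepts the program from the Blockly web client. The route and payload shape here are placeholders, and the actual call into the GoBlockly interpreter is left as a comment (check the repo for its real API):

package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/run", func(w http.ResponseWriter, r *http.Request) {
        program, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        // Hand `program` (the Blockly-generated code) to the GoBlockly
        // interpreter here and write its result back to the client.
        _ = program
        w.WriteHeader(http.StatusNoContent)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}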

Have fun! :)

Monday, July 13, 2015

Cecilia's Drawing Toy: Updated for Little Hands

You learn a lot about proper software design by actually watching users. This is a well-known fact, but sometimes a reminder is helpful.

Cecilia was in town for several weeks, so I got a chance to watch her playing with her drawing toy. I noticed that she preferred to hold the phone tightly with one thumb, which really threw off the motion tracking.

A few changes later, I added support for multi-touch operation. If little hands need to grab the screen in two places, that's not a problem!

It's also good for finger-painting. :)


As always, the updated version is available on the Play Store if you're not in the mood to mess around with compiling your own.