Google Assistant endpoints (Home, Allo, and the Assistant on Android phones) can be extended with additional functionality via the Actions on Google framework: a set of technologies that lets you define a conversational flow and an endpoint that receives and acts on information from the ongoing conversation.
I based my bridge very heavily on the "Google Facts" demo that serves as the getting started process for Actions on Google.
The technologies I used include:
- Actions on Google, which manages the action that handles requests from Google Assistant.
- api.ai, which hosts the conversational agent. Assistant interacts with the user via rules in this agent, and the interactions become requests to an HTTP endpoint I specify.
- Cloud Functions, a Google Cloud service that runs a bit of script in response to an HTTP request (like a smaller version of App Engine). My agent sends requests to this endpoint, which is interpreted and sent to my home automation service via Pub/Sub.
- Cloud Pub/Sub, another Google Cloud service that supports posting messages to be picked up asynchronously. I used this so that the home automation service (which runs on a Raspberry Pi in my house) can receive requests without needing to expose a world-visible endpoint that could be vulnerable to hacking.
Wiring up these tools took some time, but once completed I had a full end-to-end solution for voice and text control of the house.
Setting up the action
Actions on Google are small sets of rules for extending the things Google Assistant knows how to do. They're triggered via a command phrase such as "Okay, Google, tell <your project's name> <command sent to the project>", for example, "Okay, Google, please tell Belphanior 'Lights on.'"
Starting with a new project is simple enough: you create one from the Actions console and follow the flow provided to set up the project. In my case, I ended up changing the name to "The house ghost," because (a) 'Belphanior' is a bit hard to pronounce and (b) Google generally forbids one- or two-word names for assistant apps.
|App configuration, including the name for assistant purposes.|
One note about this process: Google very much designs this flow to get you to a published assistant app that could be used by anyone, but for my purposes I only intend to allow my own house to be controlled (I didn't even add multiple-account support). Getting as far as testing the action was enough.
Setting up the agent
Actions allow for a couple of options in setting up the conversation flow, but the easiest one to use was api.ai. It provides a convenient interface for setting up a conversational agent (for multiple systems, including Actions on Google and Slack). The api.ai framework is quite rich, including neat capabilities such as training conversation components with multiple variants (e.g. "Let me buy X", "I want to buy X", "How can I buy X") and creating flows from one intent to another.
For my purposes, however, the functionality I want is very simple: I want api.ai to forward the message directly to my cloud function. So I only need the default intent, and I indicate that it should handle the request ("fulfillment") via the webhook I will specify.
|"Use webhook" is basically the interesting part here.|
Once the default fallback intent is set up, I go to the "Fulfillment" tab and specify where the agent should send requests. The webhook is just a world-visible URL; I specify my Cloud Functions endpoint here. Note: what's missing from this screenshot is that I also specified an HTTP Basic Auth username and password. Those are sent to the Cloud endpoint in the "Authorization" header as the Base64 encoding of "username:password" (no newline). Doing at least this much is wise, since a Cloud Functions HTTP endpoint will otherwise accept any request that reaches it.
|Don't forget to set up basic auth|
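For reference, the header api.ai sends can be reproduced in a couple of lines of Node; the credentials below are placeholders, not my real ones:

```javascript
// Build the value api.ai puts in the "Authorization" header:
// "Basic " + base64("username:password"), with no trailing newline.
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(username + ':' + password).toString('base64');
}

// The Cloud Function side can then compare the incoming header against
// the expected value before doing anything else.
function checkAuth(req, username, password) {
  return req.headers.authorization === basicAuthHeader(username, password);
}

console.log(basicAuthHeader('aladdin', 'opensesame'));
// → "Basic YWxhZGRpbjpvcGVuc2VzYW1l"
```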
The only remaining step in api.ai is to enable the integration to Actions on Google, which is under the "Integrations" tab. Just hit the toggle button to enable the integration, then click "Settings" to get the detail dialog. At this point, I can click "UPDATE" to publish the configuration to Actions on Google, and then "TEST" to use the Actions on Google simulator to try controlling my app.
I can attempt to send commands such as "please tell the house ghost Lights On", but the result will be that nothing happens, because the Cloud Function webhook doesn't exist yet.
Setting up the Cloud Function handler
Now that the agent is set up, we can set up our Cloud Function handler to catch the requests. I have an existing Google Cloud project that I use for various things; since I didn't have any Cloud Functions in it yet, I just added one there named "speech-list-options".
When you create a Cloud Function, you have the option to configure what triggers it (HTTP in this case) and provide code that it executes. The code for my function is here; essential steps:
- Verify auth header
- Initialize an API.AI app from the 'actions-on-google' library
- Grab the raw speech input to the agent
- Bundle up the raw speech as a command published to the 'belphanior-commands' topic (via the @google-cloud/pubsub client library)
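The steps above can be sketched roughly as follows. This is a simplified stand-in rather than the deployed code: it parses the api.ai v1 webhook body directly instead of going through the 'actions-on-google' helper, the credentials are placeholders, and `publish` is a hypothetical wrapper around the @google-cloud/pubsub client, injected so the logic can be exercised without cloud access:

```javascript
// Placeholder credentials; the real values match what api.ai is configured to send.
const EXPECTED_AUTH = 'Basic ' + Buffer.from('user:pass').toString('base64');

// makeHandler takes a publish(topic, message) function and returns the
// HTTP handler a Cloud Function would export as its entrypoint.
function makeHandler(publish) {
  return function listOptions(req, res) {
    // 1. Verify the HTTP Basic Auth header.
    if (req.headers.authorization !== EXPECTED_AUTH) {
      res.status(401).send('Unauthorized');
      return;
    }
    // 2. Grab the raw speech input from the api.ai request body.
    const speech = req.body.result.resolvedQuery;
    // 3. Publish it as a command on the 'belphanior-commands' topic.
    publish('belphanior-commands', speech);
    // 4. Reply so Assistant confirms and the conversation ends.
    res.json({ speech: 'Okay.', displayText: 'Okay.' });
  };
}
```

Injecting `publish` also makes the auth-and-forward logic trivial to unit test with a stubbed request and response.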
The libraries are pulled in by the rules specified in the package.json tab of the Cloud Functions editor:
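A minimal package.json for this looks something like the following; the dependency names come straight from the function's requires, though the version pins here are illustrative rather than exactly what I used:

```json
{
  "name": "speech-list-options",
  "version": "0.0.1",
  "dependencies": {
    "actions-on-google": "^1.0.0",
    "@google-cloud/pubsub": "^0.13.0"
  }
}
```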
After that, the only remaining configuration is to specify a bucket to serve as "staging" for Cloud Functions and the entrypoint function that should be run when the HTTP trigger is hit (listOptions).
|Setting up the Cloud Function|
At this point, every command sent is published to the 'belphanior-commands' topic in my project. Using the Cloud Console and the "Pub/Sub" tab, I then created the topic and a pull subscription (projects/<project name>/subscriptions/belphanior-butler).
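The same topic and subscription can also be created from the gcloud CLI instead of the console; something like the following should be equivalent (run against your own project, and hedged as a sketch rather than exactly what I did):

```shell
gcloud pubsub topics create belphanior-commands
gcloud pubsub subscriptions create belphanior-butler --topic=belphanior-commands
```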
Receiving the commands
Receipt of the commands is done by a small service running on the same Raspberry Pi that hosts Belphanior's home automation core (which holds an App Engine default service account credential). The overall flow of the service is:
- Establish connection to the Belphanior butler
- Establish connection to Pub/Sub in the project on the subscription previously set up
- Poll for new messages
- When a new message is received, treat the message body as a Belphanior command and send it to the butler.
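The polling steps above can be sketched like this. The names and signatures are hypothetical: `pullMessages` stands in for a Pub/Sub pull on the 'belphanior-butler' subscription, and `sendToButler` for the local call to Belphanior's butler, both injected so the flow can be tested without a network:

```javascript
// One iteration of the Raspberry Pi service's polling loop.
async function pollOnce(pullMessages, sendToButler) {
  // Pull any messages waiting on the subscription...
  const messages = await pullMessages();
  for (const message of messages) {
    // ...treat each message body as a Belphanior command...
    await sendToButler(message.data.toString());
    // ...and ack it so Pub/Sub doesn't redeliver it.
    await message.ack();
  }
  return messages.length;
}
```

The real service just runs this in a loop; acking only after the butler accepts the command means a crash mid-delivery results in redelivery rather than a lost command.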
All the pieces are now in place. The test version of house ghost runs under my Google account, so it can be accessed via the simulator, my smartphone, or a Google Home registered to my account.
Overall, this process works; I've only encountered a couple of issues.
- It appears that running an app in test mode "wears off" eventually, and the test has to be restarted. It would be convenient if there were a way to build an app intended for only a single user, but I don't see a mechanism for that.
- Because of the need to use Pub/Sub to serve as a pull-request target (so my Raspberry Pi doesn't itself need a world-viewable IP address tied to a domain name), all communication from the Assistant to Belphanior is one-way. To improve this, I'd either have to run Belphanior itself in the cloud or have a protocol for shipping some of the house's state to the cloud (possibly also via Pub/Sub) so that the Cloud Function could answer questions about the state of the house from locally-cached data. At this time, it's not a need I have, but if I ever need to ask questions like "Are the lights on?" it's a problem I'd have to solve.
- After a period of inactivity (a few minutes), the next Pub/Sub message can take over thirty seconds to get picked up by my Raspberry Pi's polling. I don't know precisely what the issue is, but it seems that Pub/Sub may not be optimized for rapid delivery of sparse, low-volume messages. I've heard rumors that it's possible to force a "buffer flush" by increasing the number of messages sent at one time (padding the "payload" message with a half-dozen "no-op" messages before and after it), but I haven't experimented with that option yet.
With the pipeline set up, I'm very impressed with the reliability of Home as a voice control solution; it's extremely good at picking out my intended message without operating off of a dictionary of possible inputs I provide. Voice recognition tech keeps getting better!