GateKey - Keys to the kingdom
I live in a security complex (though people in South Africa tend to just call them “complexes”) with a keypad at the front gate. When visitors arrive, they dial our unit number and press #. The gate then calls my cellphone, I can talk to whoever’s at the gate and, when I’m convinced that they’re friendly, I dial 9 to open the gate.
On one hand, that’s great because I don’t need to walk all the way to the gate every time someone comes to visit. On the other hand, only one number can be configured per household, so if I’m out and my wife has visitors or a delivery, I have to keep an eye on my phone so I don’t leave them stranded in the cold tundra of Cape Town. A few weeks ago, I saw a comment on Hacker News from someone who had set up a system to work with keycodes instead. Having visitors type codes to open the gate sounded like a nice improvement, so I thought I’d give it a go myself.
tl;dr give code
What do
This project had a few nice bite-sized problems to solve:
- a way to interface with the system and provision keys
- a way to accept codes from visitors and open the gate
- a dynamic DNS record, since I’d be running the server from home
- a canary to notify me if the system goes down
Beyond that, it’s a fairly straightforward CRUD application, so I used it as an opportunity to play with Docker and GitHub Actions, which I don’t get to do in my day-to-day.
Key management
Telegram bots are great for creating simple interfaces to backend services, especially for someone like myself with very little in the way of frontend dev skills. One aspect that I really like is how simple it is to add buttons to the UI so that users don’t have to remember and type commands.
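To give a feel for how little is involved, here is a minimal Python sketch of the parameters for a Bot API sendMessage call that shows a button menu. It is an illustration rather than GateKey's actual code: the function name and the /addKey command are made up for the example, and the request itself (an HTTP POST to the Bot API) is left out.

```python
import json


def build_menu_message(chat_id: int, text: str, commands: list[str]) -> dict:
    """Build the parameters for a sendMessage call that shows a button menu."""
    # One button per row; a KeyboardButton only needs a "text" field.
    keyboard = [[{"text": command}] for command in commands]
    reply_markup = {
        "keyboard": keyboard,
        "resize_keyboard": True,  # shrink the buttons to fit their labels
    }
    return {
        "chat_id": chat_id,
        "text": text,
        # The Bot API expects reply_markup as a JSON-serialized object.
        "reply_markup": json.dumps(reply_markup),
    }
```

When a user taps one of these buttons, Telegram sends the button's text to the bot as an ordinary message, so the bot can treat a tap on "/addUser" exactly like a typed command.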
If you haven’t tried the “From BotFather to Hello World” tutorial, I’d highly recommend giving it a go. You’ll have a Telegram bot up and running in minutes.
Provisioning keys
One of the more important pieces of functionality in the “key management” half of the service is the part where you get keys to manage. I iterated through several different types of keys, but settled on keys that are valid for 24 hours from the time of provisioning and are all one-use. “One-use” here actually means that they can be used any number of times within 5 minutes of their first use; the gate at the complex isn’t perfect and occasionally won’t open, so visitors need to be able to try again with the same key.
Instead of being deleted once they’re used, keys are tombstoned and hard deleted after 30 days by an asynchronous sweeper. Tombstoning the keys means that I can ensure that no key is re-used within 30 days, just in case some ne’er-do-well thinks to come back and try the same key when I don’t intend them to.
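The key lifecycle described above can be sketched in a few lines of Python. This is a simplified in-memory model, not the service's real storage code: the class and function names are hypothetical, and it assumes the 30-day retention is counted from the moment a key stops working.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

KEY_VALIDITY = timedelta(hours=24)        # valid for 24 hours from provisioning
RETRY_WINDOW = timedelta(minutes=5)       # reusable for 5 minutes after first use
TOMBSTONE_RETENTION = timedelta(days=30)  # hard delete 30 days later (assumed start point)


@dataclass
class Key:
    code: str
    provisioned_at: datetime
    first_used_at: Optional[datetime] = None

    def try_use(self, now: datetime) -> bool:
        """Return True if the key opens the gate at `now`, recording first use."""
        if self.first_used_at is not None:
            # Already used: only retries inside the 5-minute window succeed.
            return now <= self.first_used_at + RETRY_WINDOW
        if now > self.provisioned_at + KEY_VALIDITY:
            return False
        self.first_used_at = now
        return True


def expired_at(key: Key) -> datetime:
    """The moment the key stops opening the gate (assumed tombstone time)."""
    if key.first_used_at is not None:
        return key.first_used_at + RETRY_WINDOW
    return key.provisioned_at + KEY_VALIDITY


def sweep(keys: list[Key], now: datetime) -> list[Key]:
    """Keep live keys and recent tombstones; drop tombstones older than 30 days."""
    return [k for k in keys if now < expired_at(k) + TOMBSTONE_RETENTION]


def code_is_free(code: str, keys: list[Key]) -> bool:
    """A new key may not reuse a code still held by a live or tombstoned key."""
    return all(k.code != code for k in keys)
```

Because tombstoned keys stay in the list until the sweeper drops them, checking code_is_free before provisioning is enough to stop a code being handed out twice inside the retention period.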
Authorization
Telegram bots aren’t private. Anyone who comes across the LeonardHomeBot can send it messages; you can go ahead and try here. That’s an issue because I don’t really want just anyone with a Telegram account to be able to create access keys for my complex. As such, I created a user registration flow and authorization model that let me manage users and control their permissions.
Luckily, authentication is dealt with by Telegram; by the time I get a message, the information about the sender comes with it for free.
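For illustration, this is roughly what reading the sender out of an incoming update looks like in Python. The "message", "from" and "id" fields are part of Telegram's update format; the function names and the set of registered IDs are hypothetical stand-ins for however the service stores its users.

```python
from typing import Optional


def sender_id(update: dict) -> Optional[int]:
    """Return the Telegram user ID of whoever sent the message, if present."""
    # "from" is optional in Telegram's Message object, so fall back to None.
    sender = update.get("message", {}).get("from")
    return sender["id"] if sender else None


def is_registered(update: dict, registered_ids: set[int]) -> bool:
    """True only when the update comes from a user the system already knows."""
    user_id = sender_id(update)
    return user_id is not None and user_id in registered_ids
```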
Clicking the /addUser button creates a one-use user registration token that’s valid for 24 hours and generates a deep link that, when opened, will automatically start a conversation with the bot and pass it the token. When the server receives the token, it checks that it’s valid and creates a user profile for the caller.
If someone clicks the /addUser button too many times or a token recipient doesn’t feel like signing up, leaving the tokens lying around would create a security risk so, as with keys, there’s an asynchronous sweeper that hard deletes any expired tokens.
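The token flow can be sketched as follows. Telegram deep links take the form https://t.me/<bot username>?start=<parameter>, and opening one sends the bot a "/start <parameter>" message. Everything else here is an assumption made for the example: the function names, the in-memory dict standing in for the database, and the token length.

```python
import secrets
from datetime import datetime, timedelta

TOKEN_VALIDITY = timedelta(hours=24)


def new_registration_link(bot_username: str, tokens: dict[str, datetime], now: datetime) -> str:
    """Create a token, remember when it expires, and return a deep link carrying it."""
    # 24 random bytes encode to 32 URL-safe characters (letters, digits, "-", "_").
    token = secrets.token_urlsafe(24)
    tokens[token] = now + TOKEN_VALIDITY
    return f"https://t.me/{bot_username}?start={token}"


def redeem(message_text: str, tokens: dict[str, datetime], now: datetime) -> bool:
    """Handle the '/start <token>' message sent when the deep link is opened."""
    parts = message_text.split(maxsplit=1)
    if len(parts) != 2 or parts[0] != "/start":
        return False
    expires_at = tokens.pop(parts[1], None)  # pop makes the token one-use
    return expires_at is not None and now <= expires_at


def sweep_expired(tokens: dict[str, datetime], now: datetime) -> None:
    """Hard delete every token whose expiry has passed."""
    for token in [t for t, expiry in tokens.items() if expiry < now]:
        del tokens[token]
```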
The RBAC AuthZ system defines a set of permissions and permission “bundles” (effectively roles). Every user is currently an admin (and has every permission), but if I choose to expand the system later, I’ve got a framework within which to do so.
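A permissions-and-bundles model like this can be very small. The sketch below shows the general shape only: the permission names, the "resident" bundle and the is_allowed helper are all invented for the example and are not taken from the GateKey code.

```python
from enum import Enum, auto


class Permission(Enum):
    CREATE_KEY = auto()
    ADD_USER = auto()


# A "bundle" is a named set of permissions, i.e. a role.
BUNDLES: dict[str, frozenset[Permission]] = {
    "admin": frozenset(Permission),                  # every permission
    "resident": frozenset({Permission.CREATE_KEY}),  # hypothetical future role
}


def is_allowed(user_bundles: list[str], needed: Permission) -> bool:
    """True if any of the user's bundles grants the needed permission."""
    return any(needed in BUNDLES.get(name, frozenset()) for name in user_bundles)
```

With every user assigned the "admin" bundle the check always passes, but adding a narrower role later only means adding an entry to the table.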
Conversations
When users request a key, there’s a bit of back-and-forth that needs to happen:
- User requests a key
- System asks who the key is for
- User types the recipient’s name
Messages in Telegram can reference each other (if a user uses the “reply” functionality), but those references don’t contain any of the content of the previous messages. I set up the ConversationHandler to deal with this issue. It caches information about ongoing conversations so that it can link the messages together.
If I were to expand the system, I’d look at building a state machine, where each message in a conversation is a state through which the conversation can transition. For now, however, the simple conversation system works well.
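As a rough picture of the caching idea, the sketch below remembers which chats have been asked "who is the key for?" so that the next message from that chat is read as the recipient's name. It is a toy stand-in, not the real ConversationHandler: the class name, the /addKey command and the reply strings are assumptions, and no key is actually created.

```python
class PendingConversations:
    """Remember which chats are part-way through requesting a key."""

    def __init__(self) -> None:
        self._awaiting_recipient: set[int] = set()

    def handle(self, chat_id: int, text: str) -> str:
        """Return the bot's reply to a message, using cached conversation state."""
        if text == "/addKey":
            # Step 1: the user asked for a key, so ask who it is for.
            self._awaiting_recipient.add(chat_id)
            return "Who is the key for?"
        if chat_id in self._awaiting_recipient:
            # Step 2: this message is the answer to the question above.
            self._awaiting_recipient.discard(chat_id)
            return f"Key created for {text}"
        return "Sorry, I didn't understand that."
```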
Open sesame
With keys provisionable, I got to work setting up the system to handle phone calls. Twilio is great for this kind of thing, though it’s decidedly slower to get going than with Telegram. The regulatory approval to buy a telephone number took several days.
Vocal training
Twilio’s voice API operates via simple webhooks; you tell Twilio what address your server’s listening on and, when a phone call arrives on the number, it makes HTTP requests against your endpoint. You then respond in Twilio’s markup language called TwiML, which contains verbs for all the things you could need to do with a programmatic voice API:
- The Say verb reads out the text to the person on the other end of the line
- The Dial verb dials numbers
- The Reject verb rejects calls
- etc.
If you haven’t used it before, take a look at ngrok. It’s great for setting up quick webhooks (with TLS!) while testing when you don’t want to faff with port forwarding and DNS records.
One of the verbs, Gather, lets you gather digits from the caller when they type them in on their keypad. Phones use dual-tone multi-frequency signaling to send numbers, but Twilio’s voice API takes care of it for you and you just grab the digits from the HTTP request that hits your webhook. That’s how I get the keycodes from visitors.
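Here is a small Python sketch of the kind of TwiML a webhook might return to collect a keycode. Gather and Say are real TwiML verbs and action, method and numDigits are real Gather attributes; the URL path, the six-digit length and the prompt wording are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET


def gather_keycode_twiml(action_url: str, num_digits: int) -> str:
    """Build TwiML that prompts the caller and collects keypad digits."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "action": action_url,           # where Twilio sends the collected digits
        "method": "POST",
        "numDigits": str(num_digits),   # stop collecting after this many digits
    })
    say = ET.SubElement(gather, "Say")  # nested Say is read out while gathering
    say.text = "Please enter your key code."
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(gather_keycode_twiml("/voice/keycode", 6))
```

Once the caller has typed the digits, Twilio makes a request to the action URL with them in a parameter named Digits.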
Once my voice controller has the digits, it’s a fairly straightforward process to check those against the DB and send a Play verb to play the DTMF tone for the number 9 to open the gate (I actually send four 9s just in case the gate happens to be hard of hearing at the time).
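That step might look something like the sketch below. The Digits parameter and the Play verb's digits attribute are part of Twilio's API; the handler name, the key_is_valid callback (standing in for the DB lookup) and the behaviour on an invalid code are assumptions, since the post doesn't say what happens when a code is wrong.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def handle_digits(form: dict[str, str], key_is_valid: Callable[[str], bool]) -> str:
    """Turn the digits Twilio posted into a TwiML response."""
    digits = form.get("Digits", "")
    response = ET.Element("Response")
    if key_is_valid(digits):
        # Play four DTMF 9s down the line so the gate hears at least one.
        ET.SubElement(response, "Play", {"digits": "9999"})
    else:
        # Assumed behaviour for a bad code: say so and end the call.
        ET.SubElement(response, "Say").text = "Sorry, that code is not valid."
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")
```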
Fallback
I have enough professional software development experience to know that my code will break at some point and the webhook will be unreachable for whatever reason. As such, it was important to me to have a reliable fallback. Luckily, the smart folks over at Twilio thought of that!
You can choose one of a few fallbacks. The one I opted for is a “TwiML bin”, where you specify static TwiML that should be executed in the case of a webhook failure.
In this case, my fallback causes Twilio to redirect the call to my cellphone, so in the absolute worst case, the visitor just thinks that it’s taken a little longer for the call to connect:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Dial>{my cellphone number}</Dial>
</Response>
I can barely contain my excitement
I needed to deploy the GateKey service somehow. I settled on GitHub Actions that are triggered when I push changes to the package’s mainline branch (no need for releases if I’m the only one making changes or using it), which:
- use Gradle to build the service
- build it into a Docker image
- upload it to Docker Hub
- SSH to my server
- pull down and start up the new image
D’ing my DNS
Because I’m hosting from home and don’t pay for a static IP address, mine can change randomly. I could have used a provider like DuckDNS (who I would recommend if you’re looking for a quick and easy Dynamic DNS solution), but I thought it would be more fun to do it myself.
A while before starting this project, I’d purchased a domain through Cloudflare, just in case I happened to need one for something. Luckily I did because Cloudflare’s got a handy set of APIs for managing DNS records, which made it easy to write a little Python script that:
- makes a request to ipify.org every 5 seconds to get my IP address
- caches the result
- updates the Cloudflare DNS record if my IP address has changed
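The three steps above can be sketched like this. It is a minimal outline under stated assumptions rather than the script itself: the function names are made up, the zone ID, record ID and API token would come from your own Cloudflare account, and error handling is limited to skipping a failed cycle.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

IPIFY_URL = "https://api.ipify.org"  # responds with the caller's public IP as plain text


def fetch_public_ip() -> str:
    """Ask ipify for the public IP address of this connection."""
    with urllib.request.urlopen(IPIFY_URL, timeout=10) as resp:
        return resp.read().decode().strip()


def update_cloudflare_record(ip: str, zone_id: str, record_id: str,
                             name: str, api_token: str) -> None:
    """Point an A record at `ip` through Cloudflare's DNS records endpoint."""
    url = f"https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}"
    # A ttl of 1 means "automatic" in Cloudflare's API.
    body = json.dumps({"type": "A", "name": name, "content": ip, "ttl": 1}).encode()
    request = urllib.request.Request(url, data=body, method="PUT", headers={
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(request, timeout=10):
        pass  # a non-2xx status raises urllib.error.HTTPError


def sync_once(cached_ip: Optional[str], get_ip: Callable[[], str],
              update: Callable[[str], None]) -> str:
    """Fetch the current IP and call `update` only if it differs from the cache."""
    current = get_ip()
    if current != cached_ip:
        update(current)
    return current


def run_forever(update: Callable[[str], None], interval_seconds: float = 5.0) -> None:
    """Poll every few seconds, keeping the last known IP in memory."""
    cached: Optional[str] = None
    while True:
        try:
            cached = sync_once(cached, fetch_public_ip, update)
        except OSError:
            pass  # network hiccup: keep the old cache and try again next cycle
        time.sleep(interval_seconds)
```

Keeping the comparison in sync_once, separate from the network calls, means the "only update on change" logic can be tested without touching ipify or Cloudflare.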
Once that was all done, I set up a GitHub action that triggers when I push changes, SSHes to my server, and configures the script to run with systemd.
Because I don’t get paged enough
Given that we were going to rely on this system working for visitors to be able to get to our house, I wanted to be confident that it was working, even with a fallback in place. The way I decided to do that was to run a canary on AWS Lambda that polls a health check endpoint on my server every 30 minutes to make sure that it’s still alive.
This script was even more straightforward than the DNS one. Because I’m running it in a Lambda, I don’t want it to be complex and long-running, because that’ll cost me money. Instead, it runs for 1 second every 30 minutes and stays comfortably within the AWS free tier. The canary sends emails from its own Gmail account; the details of how to do that are in the canary’s README.
There’s no complicated deployment story here. I opened the AWS console, copy/pasted the contents of the script into a new Lambda, and set up an EventBridge rule to run it every 30 minutes.
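A canary along these lines fits in a single short handler. The sketch below is a guess at the general shape, not the canary's actual source: the environment variable names are invented, and it assumes the Gmail account signs in over SMTP with an app password (the canary's README has the real setup).

```python
import os
import smtplib
import urllib.request
from email.message import EmailMessage


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if the health check endpoint answers with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, timeouts and HTTP error statuses
        return False


def build_alert(sender: str, recipient: str, url: str) -> EmailMessage:
    """Compose the email that is sent when the health check fails."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "GateKey health check failed"
    message.set_content(f"No healthy response from {url}.")
    return message


def lambda_handler(event, context):
    """Entry point invoked by the scheduled rule."""
    url = os.environ["HEALTH_URL"]  # hypothetical variable names throughout
    if is_healthy(url):
        return {"healthy": True}
    sender = os.environ["GMAIL_ADDRESS"]
    alert = build_alert(sender, os.environ["ALERT_RECIPIENT"], url)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(sender, os.environ["GMAIL_APP_PASSWORD"])
        smtp.send_message(alert)
    return {"healthy": False}
```

The short timeout keeps each run brief, which is what keeps the bill at zero. An EventBridge schedule expression of rate(30 minutes) gives the every-30-minutes cadence.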
Closing the gate on the project
That’s it! The system does what it needs to do, so next time you visit, you’ll get a nice robot lady answering at the gate.
An addendum
Added on 2023/10/23
Intercoms, like the Mircom one at our complex, may have their keypads disabled during calls. Without an active keypad, no DTMF tones are sent to the recipient of the call, which prevents a system like this one from working at all. Luckily, our complex manager is friendly and was happy with me enabling it. I’d recommend checking on the active keypad configuration of your intercom (or the willingness of your complex/building manager to let you play with it) before embarking on a project like this one. I found our intercom model’s manual online, so you can likely do the same for whichever model you’re working with.