
How we updated a drone while flying - and how you can too!

Dockercon 2016 took place in Seattle just a short while ago, and we were really excited (and slightly nervous) to be invited to the main stage for the conference's closing keynote, the Cool Hacks section. We demoed a live update of the software running on a drone while it was in the air, mid-flight. Here's the video of what happened there, check it out!

In this blogpost I hope to untangle what happened in the demo, and show how you can use tools and techniques from this "cool hack" with your own drone - or any other device.

Intermission: drone hardware

For the demo we had a drone from Dronesmith, and their co-founder & CTO Greg Friesmuth came to help us fly it (we are pretty good at software updates, while he's a lot better at not crashing into the audience; each has their own specialties).

Our drone at Dockercon 2016

The controller was a Dronesmith Luci board, which combines a flight management unit (FMU) and an Intel Edison board for higher level applications. The two communicate with each other over the MAVLink protocol. The resin application (the thermal camera control, gyroscope data streaming, web interface, etc.) was running on the Edison.

The drone frame was a QK2, also by Dronesmith, which was designed for prototyping drone applications. Other drone frames should work as well, though they might need some customization (such as this different prototype below).

Dronesmith Luci board

The thermal camera, an important part of the demo, was a PureThermal1 module.

We list some information on how to buy these parts in the last section of this article!

Update magic

The drone works as a resin.io device: the Intel Edison on the controller board runs resinOS, which hosts a device supervisor and Docker for the "useful payload", the user application. The user application is built into a Docker container from the source pushed with git to the resin.io servers. The supervisor on the device keeps checking the servers for any newer version of the application. If it finds one, it downloads it and replaces the running container.

The core aspect of the demo was the very quick hand-over between the old and new version of the software running on the device. The resin.io tools already permitted reliable and safe updates before, which would have worked as follows in the case of a hypothetical drone:

"Old-school" update process:

  1. Block application updates with an update lock
  2. The supervisor checks for, and downloads (but does not install) the latest software update that has been pushed by the user
  3. The application can use the supervisor API to query whether an update is pending
  4. If an update is pending, find a safe place to land (maybe go back to base?), unlock updates, install and start the new version of the software
  5. Go back to the mission, and resume from step 1
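The lock-then-land routine above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the resin.io implementation: `update_lock` and `fly_mission` are hypothetical helpers, the lock is modelled as an exclusively created file whose path is passed in (use whatever path the supervisor on your device expects), and `update_pending`, `fly_leg` and `land` are placeholder callables supplied by the caller.

```python
import contextlib
import os


@contextlib.contextmanager
def update_lock(lock_path):
    """Hold a lock file for the duration of the with-block (hypothetical helper)."""
    # O_EXCL makes creation fail if another process already holds the lock
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        yield lock_path
    finally:
        # Removing the file is what "unlock updates" means in this sketch
        os.remove(lock_path)


def fly_mission(lock_path, update_pending, fly_leg, land):
    """Fly mission legs under the lock until an update is pending, then land.

    All three callables are placeholders: update_pending() returns True once
    an update has been downloaded, fly_leg() flies one segment of the
    mission, and land() brings the drone down somewhere safe.
    """
    legs = 0
    with update_lock(lock_path):
        while not update_pending():
            fly_leg()
            legs += 1
        # Land first, and only then leave the with-block to release the lock
        land()
    return legs
```

In a real application `update_pending` would ask the supervisor API, as described in step 3, and the supervisor would replace the container once the lock is gone.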

There's nothing really wrong with this routine, but it can take a while, and it's not always desirable to interrupt the mission to receive an update. The application switchover itself, that is, killing the old application container and bringing up the new one, can take up to a few seconds, not to mention the time for the round trip and landing.

At resin.io we are working to reduce the barriers to updating embedded device software, because we're convinced that a world with less outdated software is a better one. To that end, we have introduced update strategies, as mentioned in the demo, to implement a "hand-over" between the old and the new version of the software, facilitated by the supervisor.

Handover update strategy

"Space age" update process:

  1. Supervisor checks for, and downloads a software update
  2. The new application container is started, and as its first order of business, it signals to the old container that it's time to go
  3. The old container wraps things up by releasing and transferring its resources to the new container, and requests to be retired
  4. The new container picks things up, and continues the mission

While preparing for this drone demo we measured a handover time of 50-200ms, depending on the hardware setup, or more scientifically "half a blink of an eye". This is a short enough time that even if the rotors were turned off for that long, the drone would not fall out of the sky (we've tested it in a safe environment :)). We also haven't focused on minimizing this time yet, and suspect that it can be shrunk even further.
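As a rough sanity check of that claim, here is a back-of-the-envelope calculation of how far a hovering drone would drop if it produced no lift at all for the whole handover. It is a sketch under simplifying assumptions that are not from our measurements: pure free fall from a hover, no drag, and no leftover thrust from rotors that are still spinning down. `free_fall_drop` is a hypothetical helper name.

```python
G = 9.81  # gravitational acceleration in m/s^2


def free_fall_drop(seconds, g=G):
    """Distance in metres fallen from rest after `seconds` of free fall."""
    return 0.5 * g * seconds ** 2


for milliseconds in (50, 200):
    drop_cm = free_fall_drop(milliseconds / 1000.0) * 100
    print("{} ms without lift -> about {:.1f} cm".format(milliseconds, drop_cm))
```

Under these assumptions the drop is about 1.2 cm for a 50ms handover and about 19.6 cm for 200ms, which the flight controller can recover from at any reasonable altitude.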

Diving into the software

The software used in the demo, with all its demo/work-in-progress quality, is available on GitHub. Since the data communication is over MAVLink, it could be adapted to any other drone, UAV, or ground vehicle that uses that protocol.

The application uses an Alpine Linux base image, as it results in considerably smaller images than, for example, Debian, the other, more commonly used base image on resin.

The onboard tasks are handled by two parts: a Python script, which forwards messages from the FMU and handles the application handover, and a Node.js server, which serves the web interface that we access over the Public Device URL of the drone. Of the two, the Python script (mavlink_forwardpubnub.py) is the more interesting and unique part of this demo. There is no fixed form for how the application containers should handle the transition during the hand-over update strategy, as the requirements depend very strongly on the use case at hand. Thus, at this moment, every user needs to roll their own. This script contains at least some examples of how it can be done using tools available in Linux. In this case:

  1. The first instance of the script creates a UNIX socket, starts its main task loop, and tries to write to the socket every time it goes around the loop
  2. The second, new instance tries to connect to the same socket, and tries to read from it
  3. Since the first instance can send data now, it realizes that its time is up, so it cleans up, signals to the supervisor that the application container is ready to retire, and exits
  4. The second instance recreates the socket, sets up its environment, and kicks off the main task loop again (until a third instance comes around, and so on)

To make things clearer, here's a simplified version of the handover script, with the non-update-relevant, drone-related code stripped out.

"""
Python `hand-over` strategy example
"""
import os
import sys
import socket

SOCKFILE = "/tmp/handover.sock"

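# Check for this script already running by connecting
# to its socket. recv() blocks until the old instance
# accepts the connection and closes it.
client = socket.socket(socket.AF_UNIX,
                       socket.SOCK_STREAM)
try:
    client.connect(SOCKFILE)
    client.recv(1)
except socket.error:
    # No previous instance is listening,
    # so there is nothing to hand over
    pass
finally:
    client.close()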

# Remove the socket file if it already exists, since we are
# now the only instance running after the previous check
if os.path.exists(SOCKFILE):
    os.remove(SOCKFILE)

# Recreate the socket, non-blocking so that accept()
# returns immediately when no new instance is waiting
server = socket.socket(socket.AF_UNIX,
                       socket.SOCK_STREAM)
server.bind(SOCKFILE)
server.listen(5)
server.setblocking(False)

def heavylifting():
    """ Main task to run within this application
    """
    i = 0
    while True:
        print("Doing some work, count to {}".format(i))
        i += 1

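        # Handover check
        try:
            # Accept a connection from a new instance, if one
            # is waiting; closing it unblocks the new
            # instance's recv() and tells it we are leaving
            conn = server.accept()[0]
            conn.close()
            break
        except socket.error:
            # No new instance is trying to connect,
            # continue looping
            pass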

# Run the main task in a loop
heavylifting()

###
# Release resources for handover:
# * Close file handles
# * Close serial port
# * ...
#
# In the real application finally create this file
# to signal being ready to retire:
# open('/data/resin-kill-me', 'w+').close()
###

# Exit
# Note: sys.exit() does not stop non-daemon threads;
# this simplified example does not start any
sys.exit(0)

You can try it by saving it as a file, say handover.py, and running one instance in a terminal (it should start counting numbers), then running another instance in another terminal (the first instance should exit and the new one should start counting). Repeat as needed.

Going live with the IR camera

This is of course just one example of a handover; there could be more complex and more interesting solutions. For example, one could send open file descriptors over to the new process (e.g. the serial port connection), thus saving time and not losing any data. This is left as an exercise for the reader.
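As a starting point for that exercise, here is a minimal sketch of passing an open file descriptor over a UNIX socket using `SCM_RIGHTS` ancillary data, following the pattern from the Python `socket` documentation. It is not part of the demo code. The names `send_fds`, `recv_fds` and `demo` are our own, a pipe stands in for the serial port, and both ends live in one process for brevity; in a real handover the two containers would need a socket path that both of them can reach, which is an assumption about your setup.

```python
import array
import os
import socket


def send_fds(sock, msg, fds):
    """Send msg plus a list of open file descriptors over a UNIX socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([msg], ancillary)


def recv_fds(sock, msglen, maxfds):
    """Receive a message and up to maxfds file descriptors."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        msglen, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    return msg, list(fds)


def demo():
    """Pass the read end of a pipe from an 'old' to a 'new' endpoint."""
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()  # stand-in for e.g. the serial port
    try:
        send_fds(old_end, b"serial", [read_fd])
        label, received = recv_fds(new_end, 16, 1)
        os.close(read_fd)  # the old side lets go of its own copy
        os.write(write_fd, b"still open")
        # The received descriptor still refers to the same open pipe
        data = os.read(received[0], 32)
        os.close(received[0])
        return label, data
    finally:
        os.close(write_fd)
        old_end.close()
        new_end.close()
```

Because the kernel duplicates the descriptor for the receiver, the old instance can close its copy right away and the resource is never actually closed in between.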

Onwards and upwards

To read more about the available update strategies, check out our documentation, where we will also add some good basic patterns for hand-over, like the one shown above. If you have any suggestions on how you'd do it, leave them in the comments, or talk to us on Gitter!

If you'd like to build your own drone like the one we've used, you can check out the Groupgets pages for the drone frame, the Luci board, and the thermal camera. It's all pretty new technology, so unfortunately not all pieces might be available for order all the time!

All of this information above also applies to all our other supported devices, so you can choose to build your next drone army (or other resin-powered application) on top of Raspberry Pi, Beaglebone Black/Green, Parallella, or many other controller boards.

In closing, for me it is quite mind-boggling to think of the amount of complex architecture needed to pull this "cool hack" off, which is in the end not just a hack, but a production-grade system. After all, live demoing to a crowd of 4000 people plus the web audience is a pretty mission-critical environment!
