The ability to remotely manage IoT fleets and deploy updates over the air (OTA) is essential for security and business model expansion. But managing a large fleet with millions of devices presents many challenges. For example, a large device fleet may run on various architectures. Also, some devices in the fleet may connect to different backends, making updating particularly complex.
Senior Engineer Anibal Portero gave a talk at the Open Source Summit held in Austin earlier this year. Anibal spoke on how to leverage modern open protocols and containers so that operators can update IoT fleets that run on multiple architectures with any OTA cloud provider.
What operators need in an IoT fleet deployment system
IoT fleets are not uniform and can consist of devices running on many different Linux architectures with a wide range of resource requirements. When faced with managing and deploying to such a diverse fleet of embedded systems, at a minimum, an IoT deployment and management system needs to do the following:
- Fully remote updates – The minimum requirement is a system that automates deployments and can apply updates over the air to IoT fleets whenever required.
- Reproducible states – Zero downtime for the fleet is a must. Therefore, operators must be able to recreate device states, either by generating a given device state or by accessing one through a version control system.
- Version history with the ability to roll back or forward – Operators must be able to automatically roll back and return the device to a good state. If a system cannot accomplish this, they risk bricking devices, which can require an expensive visit from a technician. In addition, freely rolling changes back and forward without worrying about bricking devices allows for more experimentation and innovation when deploying new services and features.
- Ability to send meaningful feedback – A sound system must generate meaningful feedback on the deployment state so that the system can respond appropriately.
- Independence from OTA providers – Lastly, most product makers and customer service providers want to avoid vendor lock-in, so that they can move from one provider to another or even take advantage of multiple providers if needed.
High-level system components for OTA providers
At a high level, the system consists of three main components: application, update manager, and cloud client.
Application level
The application layer is the software on the embedded device that requires updating and maintenance.
Update manager
The update manager is software on the device side that manages how the updates get installed. Ideally, the update manager is also aware of what version is running at any moment to manage update transactions and provide status feedback to the cloud client.
Cloud client
The cloud client is software on the device that communicates with the system in the cloud. The primary purpose of the cloud client is to:
- Check for new updates.
- Download new updates for the update manager.
- Provide and get feedback from the update manager and then send it back to the cloud.
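The cloud client's three responsibilities can be sketched as a simple polling loop. This is a minimal illustration with stubbed-out cloud and update-manager calls; the function names and the shape of the update record are hypothetical, not part of any provider's API:

```python
import json

# Hypothetical stand-ins for the real cloud API and update manager.
def check_for_update(current_version):
    """Ask the cloud whether a newer state exists (stubbed)."""
    latest = {"version": 2, "state": {"app": "registry/app:2.0"}}
    return latest if latest["version"] > current_version else None

def install_update(update):
    """Hand the downloaded state to the update manager (stubbed)."""
    return {"version": update["version"], "status": "DONE"}

def report_status(status):
    """Send the update manager's feedback back to the cloud (stubbed)."""
    return json.dumps(status)

def cloud_client_cycle(current_version):
    """One iteration of the check -> download/install -> feedback loop."""
    update = check_for_update(current_version)
    if update is None:
        return report_status({"version": current_version, "status": "UP-TO-DATE"})
    result = install_update(update)
    return report_status(result)

print(cloud_client_cycle(1))  # a newer version exists, so it gets installed
print(cloud_client_cycle(2))  # already current
```

The key point is that the cloud client only mediates: it never installs anything itself, but shuttles states and feedback between the cloud and the update manager.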
Available OTA IoT management solutions
According to Anibal, the solutions available today fall into three categories: full-stack cloud-device, agent, or agent-less. The following table compares the solution categories and the pros and cons of each approach.
OTA Solution | Description | Pros | Cons
---|---|---|---
Full-stack cloud-device | The cloud provider offers a complete end-to-end solution with a flashable image that provides both the update manager and the cloud client functionality. | Out-of-the-box flashable image with a fully working lifecycle. | Device is tightly coupled to the cloud provider. Limited choice of OS.
Agent | The cloud provider offers an installable agent, essentially a cloud client with some update manager functionality. | Can integrate with any embedded device and any Linux distro. | Requires integration work. Updates are limited to the application level.
Agent-less | The server communicates with the device over SSH, so scripts sent from the cloud side play the role of the update manager, while the SSH server acts as the cloud client. | The only dependency on the device is the SSH server. | There is no feedback to the cloud, which can lead to scalability and reproducibility issues.
How can you decouple the cloud from the device?
All of the solutions described above pose challenges when you have multiple backends in one fleet or across different fleets. Each solution is dependent on a proprietary implementation that limits use with another independent cloud provider. Because of these dependencies, updating a fleet with devices that integrate with different backends is cumbersome at best and may not be possible.
One way to solve the dependency issue is by having a wholly independent and reproducible state definition. We can turn the application layer into a set of containers to accomplish this. This set of application containers can be referenced from a single state JSON as a set of binary artifacts and configuration files stored in a source control system and reproduced as immutable containers whenever needed.
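As an illustration, a single state JSON might reference the containers and configuration files that make up one immutable device state. The field names and values below are hypothetical, not Pantavisor's actual state format:

```python
import json

# Illustrative state JSON; field names and hashes are hypothetical,
# not the real Pantavisor state format.
state_json = """
{
  "#spec": "example-state@1",
  "containers": {
    "cloud-client": {"image": "registry.example.com/ph-client:1.4.2"},
    "app":          {"image": "registry.example.com/sensor-app:2.0.0"}
  },
  "config": {
    "app/app.conf": "sha256:9f2c0000"
  }
}
"""

state = json.loads(state_json)
# Reproducing the device state means pulling exactly these immutable artifacts.
for name, container in state["containers"].items():
    print(name, "->", container["image"])
```

Because the state is a plain document pointing at content-addressed artifacts, it can live in version control and be reproduced bit-for-bit on any device, independent of the cloud backend.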
Pantavisor container orchestrator is the update manager
Returning to our nomenclature, Anibal says that a container orchestrator (like Pantavisor) running on the device represents the update manager and is in charge of installing device state versions. Since device states are containers, the orchestrator installs and runs the appropriate containers on the device.
It is also possible to containerize the cloud client, which removes device-side dependencies and integration work. The cloud client itself is unchanged; it simply runs inside a container. It still needs to communicate with the orchestrator, however, so we need an open and universal update protocol.
“Following the Unix philosophy that teaches us to do one thing and do it well, we decomposed device-side software to get the most straightforward container orchestrator we call Pantavisor.
You can add more functionality to this basic-yet-versatile setup in containers with privileges that can control the orchestrator using an open protocol, as we did with our cloud client container.”
– Anibal Portero, Senior Engineer, Pantacor
Control socket
We created a control socket to solve the communication problem between the cloud client container and the orchestrator. The container orchestrator runs a small server communicating with the cloud client on a UNIX socket over the HTTP protocol.
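To illustrate the mechanism (this is not Pantavisor's actual control API), here is a self-contained sketch of a small HTTP server listening on a UNIX socket, standing in for the orchestrator's control server, and a client issuing a request against it, standing in for the cloud client container. The socket path and endpoint are hypothetical:

```python
import http.client
import json
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler

class ControlHandler(BaseHTTPRequestHandler):
    """Toy control server: echoes the requested endpoint as JSON."""
    def do_GET(self):
        body = json.dumps({"endpoint": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep demo output quiet
        pass

class UnixHTTPServer(socketserver.UnixStreamServer):
    # BaseHTTPRequestHandler expects client_address[0]; fake it for AF_UNIX.
    def get_request(self):
        request, _ = super().get_request()
        return request, ("local", 0)

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects over a UNIX socket instead of TCP."""
    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

sock_path = os.path.join(tempfile.mkdtemp(), "pv-ctrl.sock")  # hypothetical path
server = UnixHTTPServer(sock_path, ControlHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = UnixHTTPConnection(sock_path)
conn.request("GET", "/steps")
response = json.loads(conn.getresponse().read())
server.shutdown()
print(response)  # {'endpoint': '/steps'}
```

Because the socket is local to the device, no network port is exposed: only containers that are granted access to the socket file can talk to the orchestrator.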
Update workflow
To install and run a new update, the cloud client performs a minimal sequence of steps using the endpoints that the orchestrator exposes.
Container orchestrator (Pantavisor) endpoints
The container orchestrator also provides the following endpoints to the cloud client container:
- Steps – Installs new state JSONs onto the device
- Objects – Installs new artifacts onto the device
- Commands – Tells the orchestrator to run a given update
- Metadata – Retrieves information about installation and execution that is sent to the cloud
For more information, see: https://docs.pantahub.com/pantavisor-commands/.
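Putting the endpoints together, the cloud client's update flow might look like the following sketch. The `ctrl` transport is a stub, and the exact paths, payloads, and ordering are assumptions based on the endpoint list above, not the documented protocol:

```python
# Stub transport standing in for HTTP requests over the control socket.
class FakeCtrlSocket:
    def __init__(self):
        self.calls = []

    def put(self, path, payload=None):
        self.calls.append(("PUT", path))
        return {"status": "OK"}

    def post(self, path, payload=None):
        self.calls.append(("POST", path))
        return {"status": "OK"}

    def get(self, path):
        self.calls.append(("GET", path))
        return {"progress": "DONE"}

def install_and_run(ctrl, state, artifacts):
    """Hypothetical flow: upload artifacts, install the state, run it,
    then read back metadata to report to the cloud."""
    for sha, blob in artifacts.items():
        ctrl.put(f"/objects/{sha}", blob)        # 1. upload binary artifacts
    ctrl.put(f"/steps/{state['rev']}", state)    # 2. install the new state JSON
    ctrl.post("/commands", {"op": "run", "payload": state["rev"]})  # 3. run it
    return ctrl.get("/metadata")                 # 4. collect feedback for the cloud

ctrl = FakeCtrlSocket()
feedback = install_and_run(ctrl, {"rev": "3"}, {"9f2c": b"artifact-bytes"})
print(feedback)  # {'progress': 'DONE'}
```

Any cloud client that can drive this small set of endpoints can deliver updates, which is what decouples the device from any particular OTA backend.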
Example use cases and tutorials
Azure Device Update Agent Client:
- Implements a content handler so the Agent can communicate with Pantavisor
- Supports Pantavisor updates through Azure IoT Hub
See:
- Hands-on tutorial: Over-the-air device updates with Pantavisor and Azure IoT Hub
- https://gitlab.com/pantacor/pv-platforms/ph-client
Pantacor Hub Client:
- Directly uses our cloud REST API
- Can be used as an example on how to connect Pantavisor and other clouds that offer a REST API for OTA updates
See: https://gitlab.com/pantacor/pv-platforms/ph-client
View the talk in its entirety
View the talk here (note that a portion of the video was unfortunately missed during the OSS Linux Foundation summit recording):
If you have any questions on this talk or anything else, please reach out to us at https://community.pantavisor.io. Our engineers are happy to answer any questions you may have.