What is MQTT?
MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol. It is used primarily by IoT devices because it needs little power and very little bandwidth.
MQTT doesn't use direct addressing. Instead, it uses a subject line called a 'Topic'. Sending a message is called 'Publishing', and a message is delivered to every client that is 'Subscribed' to its 'Topic'. This architecture is referred to as pub/sub (publish/subscribe).
This can be seen in the picture below, where an edge device (a temperature sensor) publishes data to the topic 'Temp'. The message data is then sent to a user's smartphone, which has subscribed to that topic. MQTT is defined as a client/server protocol, so in practice messages travel through a Broker rather than Peer-to-Peer, and the Broker brings a number of benefits, as will be explained.
Because sending goes through a broker and addressing is by Topic, One-To-Many communication is possible without the sending client needing to know or specify the receiving nodes' addresses. All the edge device needs to do is Publish to the broker. This keeps the MQTT protocol very lightweight and well suited to low-power, low-bandwidth devices such as IoT sensors.
Similarly, you can have many different devices all sending data to one centralized place, such as a database for logging, using a Many-To-One architecture.
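To make the One-To-Many and Many-To-One patterns concrete, here is a minimal sketch in Python. It is a toy in-memory model, not a real MQTT client or broker: the `ToyBroker` class, the inbox lists and the 'Battery' topic are all made up for illustration, and there is no networking involved.

```python
class ToyBroker:
    """In-memory stand-in for an MQTT broker (illustration only)."""

    def __init__(self):
        # topic name -> list of callbacks registered by subscribers
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # The publisher names only the topic; the broker does the fan-out.
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)


broker = ToyBroker()

# One-To-Many: two subscribers on the same topic.
phone_inbox, dashboard_inbox = [], []
broker.subscribe("Temp", lambda topic, payload: phone_inbox.append(payload))
broker.subscribe("Temp", lambda topic, payload: dashboard_inbox.append(payload))
broker.publish("Temp", "38")  # a single publish reaches both inboxes

# Many-To-One: one logger subscribed to several topics.
log = []
for topic in ("Temp", "Battery"):
    broker.subscribe(topic, lambda t, payload: log.append((t, payload)))
broker.publish("Temp", "39")
broker.publish("Battery", "87%")
```

Notice that the publishing side never refers to a subscriber. Adding a third subscriber would not change a single line of the sensor's code.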
Why an MQTT Broker reduces attack surface:
A typical SCADA system may look like this picture, with many open ports on every single device to send and receive data to and from many different endpoints and management controls. Each extra open port increases the attack surface. Additionally, every single endpoint needs to know the IP address, domain, and routing information of every other device in order to communicate with it. This means that one compromised device can be used to pivot and attack the rest of the network.
By using an MQTT Broker (such as EMQ) we centralize all communications. There are many benefits that arise from using a Broker:
- The MQTT broker knows all connected devices on the network.
- Endpoint devices are not aware of each other, so if one is compromised it is much harder for the infection to spread.
- Fewer ports need to be opened.
- Authentication is handled centrally by the broker as opposed to having each endpoint manage authentication and hoping that they are doing it securely.
- Managing and updating is much easier. We don't have to worry that an update to one device will break some other device somewhere else on the network.
- Scalability. New devices can be added or removed at will without the need to write complicated communication paths.
The Broker enforces Security using:
- TLS Encryption
- Username/Password protected communication
- Optional certificate-based authentication
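As a rough sketch of the client side of the TLS and certificate items, the snippet below builds a TLS configuration using only Python's standard `ssl` module. The `build_tls_context` helper and its parameter names are my own invention for illustration, and the file paths you would pass in depend entirely on your setup. The username and password would be supplied separately through whichever MQTT client library you use.

```python
import ssl

# Port 8883 is the IANA-registered port for MQTT over TLS.
MQTT_TLS_PORT = 8883


def build_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Return an SSLContext that verifies the broker's certificate.

    ca_file: optional CA bundle for a private broker certificate
             (the system trust store is used when omitted).
    client_cert / client_key: optional files for certificate-based
             client authentication.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert and client_key:
        # Only needed when the broker asks clients for a certificate.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


context = build_tls_context()
```

The default context already checks the broker's hostname and certificate chain, which is the behaviour you want unless you have a specific reason to change it.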
MQTT clients can be configured to send data when a value has changed, such as temperature, battery power, or GPS coordinates. They can also send data every X number of minutes or hours. When a new device joins the network and subscribes to a specific 'Topic' (e.g. 'Temperature'), the Broker sends it the previous value of that topic (e.g. 38 degrees), provided that value was published as a 'retained' message. The new device can then calibrate against that value and send changes when they occur. The broker doesn't save every value that has been sent, only the most recent retained value per topic, so that it can inform new devices of it. If you want to save all data, you need to set up a database as a client that is subscribed to the 'Topic'.
Brokers do, however, save what is called Persistent Session information. This includes which device is subscribed to which Topics, so that a device that goes offline does not need to Subscribe to all of its Topics again. The Broker keeps this information for when the device comes online again.
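Here is a small sketch of that "latest value only" behaviour. Again, this is a toy in-memory model with a made-up `ToyRetainingBroker` class, not a real broker. It assumes the publisher sets MQTT's retain flag, which is what asks a broker to keep the last value for a topic.

```python
class ToyRetainingBroker:
    """Toy model of retained messages (illustration only)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks
        self._retained = {}     # topic -> most recent retained payload

    def publish(self, topic, payload, retain=False):
        if retain:
            # Overwrites the previous value: only the latest one is kept.
            self._retained[topic] = payload
        for callback in self._subscribers.get(topic, []):
            callback(payload)

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)
        # A new subscriber immediately receives the retained value, if any.
        if topic in self._retained:
            callback(self._retained[topic])


broker = ToyRetainingBroker()

# A "database" client that is subscribed from the start sees every value.
history = []
broker.subscribe("Temperature", history.append)

broker.publish("Temperature", "37", retain=True)
broker.publish("Temperature", "38", retain=True)

# A device that joins later only gets the most recent value.
late_joiner = []
broker.subscribe("Temperature", late_joiner.append)
```

After this runs, `history` holds both readings while `late_joiner` holds only the last one, which is why a separate logging client is needed for a full record.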
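The session idea can be sketched the same way. This toy model (the `ToySessionBroker` class and the client ID 'sensor-1' are hypothetical) keeps a set of subscriptions per client ID and only wipes it when the client asks for a clean session, which mirrors the 'clean session' flag in MQTT 3.1.1.

```python
class ToySessionBroker:
    """Toy model of persistent sessions (illustration only)."""

    def __init__(self):
        self._sessions = {}   # client_id -> set of subscribed topics
        self._online = set()  # client_ids that are currently connected

    def connect(self, client_id, clean_session=False):
        # A clean session throws away anything stored for this client.
        if clean_session or client_id not in self._sessions:
            self._sessions[client_id] = set()
        self._online.add(client_id)
        return set(self._sessions[client_id])

    def subscribe(self, client_id, topic):
        self._sessions[client_id].add(topic)

    def disconnect(self, client_id):
        # The client goes offline, but its subscriptions are kept.
        self._online.discard(client_id)


broker = ToySessionBroker()
broker.connect("sensor-1")
broker.subscribe("sensor-1", "Temperature")
broker.disconnect("sensor-1")

# Reconnecting with the same client ID restores the subscriptions.
restored = broker.connect("sensor-1")
broker.disconnect("sensor-1")

# Asking for a clean session starts from scratch.
fresh = broker.connect("sensor-1", clean_session=True)
```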
Birth, Death and Last Will and Testament (LWT)
Three kinds of status message let the rest of the network keep track of a device: Birth, Death and LWT. Only the LWT is handed to the Broker in advance. The device registers it when it connects, and the Broker stores it. Birth and Death are conventions in which the device publishes ordinary messages itself.
- Birth: when a new device connects to the network, it publishes a Birth message to let the other devices know about it.
- Death: when a device ends its session deliberately, a Death notice is sent out so that other devices know this device is no longer connected.
- LWT: when a device suddenly stops communicating (e.g. power or network failure), the Broker publishes the stored LWT so that other devices know not to rely on this device.
Additionally, the LWT notice can be used to activate backup systems or simply to notify a manager that a device has failed.
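The difference between a clean goodbye and a sudden failure can be sketched as follows. This is a toy in-memory model, not a real broker: the `ToyWillBroker` class, the `status/...` topic names and the 'online', 'offline' and 'lost' payloads are all assumptions made for the example.

```python
class ToyWillBroker:
    """Toy model of Last Will and Testament handling (illustration only)."""

    def __init__(self):
        self.published = []  # log of (topic, payload) in publish order
        self._wills = {}     # client_id -> (topic, payload)

    def connect(self, client_id, will_topic, will_payload):
        # The will is registered at connect time and stored by the broker.
        self._wills[client_id] = (will_topic, will_payload)

    def publish(self, topic, payload):
        self.published.append((topic, payload))

    def disconnect(self, client_id):
        # Graceful disconnect: the stored will is discarded, not sent.
        self._wills.pop(client_id, None)

    def connection_lost(self, client_id):
        # Unexpected loss: the broker publishes the will for the client.
        will = self._wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)


broker = ToyWillBroker()

# pump-1 says hello and goodbye itself (Birth and Death).
broker.connect("pump-1", "status/pump-1", "lost")
broker.publish("status/pump-1", "online")   # Birth
broker.publish("status/pump-1", "offline")  # Death
broker.disconnect("pump-1")

# pump-2 loses power, so the broker speaks for it (LWT).
broker.connect("pump-2", "status/pump-2", "lost")
broker.publish("status/pump-2", "online")   # Birth
broker.connection_lost("pump-2")
```

Anything subscribed to `status/pump-2` would see 'lost' without pump-2 having sent it, which is what lets a backup system or an alert react to the failure.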
MQTT and Apache Kafka
What is Kafka? From the official kafka.apache.org:
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Kafka is used to collect data from many sources and send it to various analytics or processing systems. As an example, Netflix uses Kafka to display real-time video recommendations based on current and past viewings. Uber uses Kafka to compute ride pricing in real time and to increase fares during peak times based on current traffic.
The Kafka architecture is similar to the MQTT architecture as we see in the next pictures.
This picture shows many Source and Target systems connecting without using Kafka. Notice how connecting the 4 Source systems to the 4 Target systems requires writing 16 integrations! Also, each integration requires using the correct protocol (TCP, HTTP, REST, FTP) and the correct data format (Binary, CSV, JSON).
By using Kafka, we not only simplify the architecture, as we see in the picture, but also make writing and adding integrations much easier.
However, IoT systems often use both MQTT and Kafka. The reason is that these two technologies accomplish two different yet complementary goals. MQTT focuses on the management of low-power, low-bandwidth edge devices. It makes collecting data from, and managing, many devices as seamless as possible. However, in order to take that data and pass it on to higher-performance applications such as a web site or analytics software, we need to use Kafka.
To be clear, MQTT and Kafka work in tandem. The data is passed from the hundreds of IoT devices to the MQTT Broker, which then passes it to the Kafka Broker to be passed on to whatever system needs it. In the picture above, the MQTT Broker is one of the Source systems and Graphing Software is one of the Target systems. To take the Uber example, there may be hundreds of MQTT traffic sensors all around the city that sense the real-time traffic. This data is sent to an MQTT Broker, which sends it to Kafka, which sends it to the algorithm that computes the pricing.
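The arithmetic behind that simplification is easy to check. In the sketch below (the function names are mine), connecting every Source directly to every Target needs one integration per pair, while going through a central hub needs one per system.

```python
def point_to_point_integrations(sources, targets):
    """Every source is wired directly to every target."""
    return sources * targets


def hub_integrations(sources, targets):
    """Every system is wired once, to the central hub."""
    return sources + targets


direct = point_to_point_integrations(4, 4)  # the 16 integrations above
via_hub = hub_integrations(4, 4)            # 8 with a hub in the middle

# The gap widens quickly as systems are added.
direct_large = point_to_point_integrations(10, 10)  # 100
via_hub_large = hub_integrations(10, 10)            # 20
```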
In summary, the IoT stack is frequently composed of:
- Edge device
- MQTT Broker
- Apache Kafka
- Higher powered applications
I hope this helps explain the basics of the MQTT protocol and how IoT devices communicate.
Cheers till next time!