Skip to main content
Version: Next

Monitoring

With thin-edge.io device monitoring, you can collect metrics from your device and forward these device metrics to IoT platforms in the cloud.

Using these metrics, you can monitor the health of devices and can proactively initiate actions in case the device seems to malfunction. Additionally, the metrics can be used to help the customer troubleshoot when problems with the device are reported.

thin-edge.io uses the open source component collectd to collect the metrics from the device. thin-edge.io translates the collectd metrics from their native format to the thin-edge.io JSON format and then into the cloud-vendor specific format.

device monitoring with collectd

Install​

Device monitoring is not enabled by default, however it can be enabled using a community package, tedge-collectd-setup, which will install collectd and configure some sensible defaults including monitoring of cpu, memory and disk metrics.

sudo apt-get install tedge-collectd-setup
note

The default collectd settings, /etc/collectd/collectd.conf, use conservative interval times, e.g. 10 mins to 1 hour depending on the metric. This is done so that the metrics don't consume unnecessary IoT resources both on the device and in the cloud. If you want to push the metrics more frequently then you will have to adjust the Interval settings either globally or on the individual plugins. Make sure you restart the collectd service after making any changes to the configuration.

Background​

The following sections provide information about further customizing the collectd settings and give some background about how the collectd messages are processed by the tedge-mapper-collectd service.

collectd configuration​

You can further customize the default collectd configuration by editing the following file:

/etc/collectd/collectd.conf

Details about collectd plugins and their configuration can be viewed directly from the collectd documentation.

However keep in mind the following points when editing the file:

  1. MQTT must be enabled.
    • thin-edge.io expects the collectd metrics to be published on the local MQTT bus. Hence, you must enable the MQTT write plugin of collectd.

    • The MQTT plugin is available on most distribution of collectd, but this is not the case on MacOS using homebrew. If you are missing the MQTT plugin, please recompile collectd to include the MQTT plugin. See https://github.com/collectd/collectd for details.

    • Here is a config snippet to configure the MQTT write plugin:

       LoadPlugin mqtt

      <Plugin mqtt>
      <Publish "tedge">
      Host "localhost"
      Port 1883
      ClientId "tedge-collectd"
      </Publish>
      </Plugin>
  2. RRDTool and CSV might be disabled
    • The risk with these plugins is to run out of disk space on a small device.

    • With thin-edge.io the metrics collected by collectd are forwarded to the cloud, hence it makes sense to disable Local storage.

    • For that, simply comment out these two plugins:

      #LoadPlugin rrdtool
      #LoadPlugin csv
  3. Cherry-pick the collected metrics
    • Collectd can collect a lot of detailed metrics, and it doesn't always make sense to forward all these data to the cloud.

    • Here is a config snippet that uses the match_regex plugin to select the metrics of interest, filtering out every metric emitted by the memory plugin other than the used metric":

      PreCacheChain "PreCache"

      LoadPlugin match_regex

      <Chain "PreCache">
      <Rule "memory_free_only">
      <Match "regex">
      Plugin "memory"
      </Match>
      <Match "regex">
      TypeInstance "used"
      Invert true
      </Match>
      Target "stop"
      </Rule>
      </Chain>

tedge-mapper-collectd​

The tedge-mapper-collectd service subscribes to the collectd/# topics to read the monitoring metrics published by collectd and emits the translated measurements in thin-edge.io JSON format to the measurements topic.

The metrics collected by collectd are emitted to subtopics named after the collectd plugin and the metric name. You can inspect the collectd messages using the following commands:

tedge mqtt sub 'collectd/#'
Output
[collectd/raspberrypi/cpu/percent-active] 1623076679.154:0.50125313283208
[collectd/raspberrypi/memory/percent-used] 1623076679.159:1.10760866126707
[collectd/raspberrypi/cpu/percent-active] 1623076680.154:0
[collectd/raspberrypi/df-root/percent_bytes-used] 1623076680.158:71.3109359741211
[collectd/raspberrypi/memory/percent-used] 1623076680.159:1.10760866126707

The tedge-mapper-collectd translates these collectd metrics into the thin-edge.io JSON format, grouping the measurements emitted by each plugin:

tedge mqtt sub 'te/+/+/+/+/m/+'
Output
[te/device/main///m/] {"time":"2021-06-07T15:38:59.154895598+01:00","cpu":{"percent-active":0.50251256281407},"memory":{"percent-used":1.11893578135189}}
[te/device/main///m/] {"time":"2021-06-07T15:39:00.154967388+01:00","cpu":{"percent-active":0},"df-root":{"percent_bytes-used":71.3110656738281},"memory":{"percent-used":1.12107875001658}}

From there, if the device is actually connected to a cloud platform like Cumulocity, these monitoring metrics will be forwarded to the cloud.

tedge mqtt sub 'c8y/#'
Output
[c8y/measurement/measurements/create] {"type": "ThinEdgeMeasurement","time":"2021-06-07T15:40:30.155037451+01:00","cpu":{"percent-active": {"value": 0.753768844221106}},"memory":{"percent-used": {"value": 1.16587699972141}},"df-root":{"percent_bytes-used": {"value": 71.3117904663086}}}
[c8y/measurement/measurements/create] {"type": "ThinEdgeMeasurement","time":"2021-06-07T15:40:31.154898577+01:00","cpu":{"percent-active": {"value": 0.5}},"memory":{"percent-used": {"value": 1.16608109197519}}}

Troubleshooting​

For troubleshooting tips, check out the device monitoring section.