Data Filters for IoT devices

Tl;dr

I wake up the fitness tracker once every 30 s. Each time, it can receive 60 measurements from the accelerometer, and I don’t have enough memory to log them all.


In this blog post I explore which data filters best represent a set of measurements, depending on what we are looking for.

For example, the mean and the variance summarize the data in a minimal size: instead of sending 30 data items, you send only 2 (the mean and the variance).
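
To make the mean/variance idea concrete, here is a minimal sketch in C that updates both values one sample at a time, so no sample buffer is needed. It uses Welford's online method, which is my choice for illustration and not something the tracker firmware is known to use. The names `running_stats_t`, `stats_reset`, `stats_push` and `stats_variance` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running statistics over a stream of samples (Welford's online method).
 * Hypothetical helper type: 12 bytes of state, no sample buffer. */
typedef struct {
    uint32_t count; /* number of samples seen so far */
    float mean;     /* running mean */
    float m2;       /* running sum of squared deviations from the mean */
} running_stats_t;

void stats_reset(running_stats_t *s)
{
    assert(s != NULL);
    s->count = 0;
    s->mean = 0.0f;
    s->m2 = 0.0f;
}

/* Fold one new sample into the running mean and squared-deviation sum. */
void stats_push(running_stats_t *s, float x)
{
    assert(s != NULL);
    s->count++;
    float delta = x - s->mean;
    s->mean += delta / (float)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Population variance of the samples seen so far (0 if there are none). */
float stats_variance(const running_stats_t *s)
{
    assert(s != NULL);
    return (s->count > 0u) ? s->m2 / (float)s->count : 0.0f;
}
```

Pushing the samples 2, 4, 4, 4, 5, 5, 7, 9 gives a mean of about 5 and a population variance of about 4. Only `mean` and the variance need to be sent at each wake-up, after which `stats_reset` starts a new window.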

So which properties of the data can we keep?

It depends on our main query:

  • If we want to know whether a certain item exists, we use a Bloom filter.
  • If we are looking for outliers, we keep some of them.
  • If we are searching for patterns, we can:
    • Take the derivative of the data to find when it increases and when it decreases.
    • Cluster the data using a similarity function, and report the characteristics of the clusters.
    • Compress the data by removing redundancy.
    • Represent the data with a model, such as a Markov chain.
  • Other summaries include: min, max, mean, variance, median, MAD (median absolute deviation), modified z-score, etc.
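
As an illustration of the first item in the list above, here is a minimal Bloom filter sketch in C. The filter size (256 bits), the number of hashes (3) and the use of a seeded FNV-1a hash are all assumptions chosen to keep the example small. The names `bloom_t`, `bloom_init`, `bloom_add` and `bloom_maybe_contains` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   256u /* filter size in bits (assumed; tune to the RAM budget) */
#define BLOOM_HASHES 3u   /* number of hash functions (assumed) */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8u]; /* 32 bytes of state */
} bloom_t;

/* 32-bit FNV-1a hash; the seed is mixed into the offset basis so that
 * one routine can act as several different hash functions. */
static uint32_t fnv1a(const uint8_t *data, size_t len, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

void bloom_init(bloom_t *b)
{
    assert(b != NULL);
    memset(b->bits, 0, sizeof b->bits);
}

/* Set one bit per hash function for this item. */
void bloom_add(bloom_t *b, const void *item, size_t len)
{
    for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
        uint32_t bit = fnv1a(item, len, k * 0x9E3779B9u) % BLOOM_BITS;
        b->bits[bit / 8u] |= (uint8_t)(1u << (bit % 8u));
    }
}

/* false: the item was definitely never added.
 * true:  the item was probably added (false positives are possible). */
bool bloom_maybe_contains(const bloom_t *b, const void *item, size_t len)
{
    for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
        uint32_t bit = fnv1a(item, len, k * 0x9E3779B9u) % BLOOM_BITS;
        if ((b->bits[bit / 8u] & (1u << (bit % 8u))) == 0u) {
            return false;
        }
    }
    return true;
}
```

The trade-off is the usual one for Bloom filters: a fixed 32 bytes answers "have I seen this value?" with no false negatives, at the cost of occasional false positives that become more frequent as more items are added.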

My favorites are the modified z-score for outlier detection, the mean, and features (I will write about that later).
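
For reference, the modified z-score of a sample is commonly defined as 0.6745 × (x − median) / MAD, and a sample is often treated as an outlier when the absolute score exceeds 3.5. Below is a minimal sketch in C under those conventions. The window limit `MAX_WINDOW`, the function name `mod_zscore_outliers` and the choice to flag nothing when the MAD is zero are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WINDOW 64 /* largest window handled (assumed; sized for a small stack) */

static float absf(float v) { return v < 0.0f ? -v : v; }

static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a;
    float fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Median of n values. Sorts buf in place, so pass a scratch copy. */
static float median_inplace(float *buf, size_t n)
{
    qsort(buf, n, sizeof buf[0], cmp_float);
    return (n % 2u) ? buf[n / 2u] : 0.5f * (buf[n / 2u - 1u] + buf[n / 2u]);
}

/* Sets flags[i] to 1 when sample i is an outlier, 0 otherwise, and returns
 * the number of outliers. If the MAD is zero, nothing is flagged. */
size_t mod_zscore_outliers(const float *x, size_t n, float threshold,
                           unsigned char *flags)
{
    assert(n > 0u && n <= MAX_WINDOW);
    float tmp[MAX_WINDOW];

    memcpy(tmp, x, n * sizeof x[0]);
    float med = median_inplace(tmp, n);

    /* MAD: median of the absolute deviations from the median. */
    for (size_t i = 0; i < n; i++) {
        tmp[i] = absf(x[i] - med);
    }
    float mad = median_inplace(tmp, n);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        float score = (mad > 0.0f) ? 0.6745f * (x[i] - med) / mad : 0.0f;
        flags[i] = (unsigned char)(absf(score) > threshold);
        count += flags[i];
    }
    return count;
}
```

With the window 10, 11, 10, 12, 11, 10, 50 and a threshold of 3.5, the median is 11 and the MAD is 1, so only the last sample (score of about 26.3) is flagged. Because it relies on medians, the score is far less distorted by the outlier itself than a mean-based z-score would be.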

The utility

  1. Compress the data. In IoT, the host (main unit) can get a lot of data from the sensors (e.g. accelerometer, GPS, …), but it has little memory (16 MB on average), so you have to be selective about what you store:

    • Using the accelerometer data, I wanted to know which side of the fitness tracker was facing up. The raw data has 3 axes of two bytes each, so I reduced it to 3 bits, enough to represent the 6 faces. Moreover, if the face does not change, nothing is reported.
  2. Privacy by design. Applying filters at the very beginning means you collect only what you need, and you don’t expose user data to adversaries.

  3. Real-time decisions. Filters give you simple data formats on which you can make decisions in the IoT device itself. Here are some examples:

    • When the accelerometer detects motion, activate the GPS. This exists as a hardware feature called WoM (Wake on Motion), implemented inside the accelerometer itself using the DMP (Digital Motion Processor); its aim is to offload the host processor (consuming only 68 µA instead of 10 mA).

    • When the GPS location stays in the same place for a long time, deactivate the GPS.
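
Going back to the face-up example in item 1, here is a minimal sketch in C of how 6 bytes of raw accelerometer data can be reduced to a 3-bit face code that is reported only when it changes. It assumes the face pointing up is the axis with the largest absolute reading; which axis and sign correspond to which physical side depends on how the sensor is mounted. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The six faces fit in 3 bits; 7 is used as a "nothing reported yet" marker. */
typedef enum {
    FACE_X_POS = 0, FACE_X_NEG = 1,
    FACE_Y_POS = 2, FACE_Y_NEG = 3,
    FACE_Z_POS = 4, FACE_Z_NEG = 5,
    FACE_UNKNOWN = 7
} face_t;

/* Widen to 32 bits first so that -32768 has a valid absolute value. */
static int32_t abs32(int32_t v) { return v < 0 ? -v : v; }

/* Pick the axis with the largest magnitude (gravity dominates at rest)
 * and use its sign to choose between the two opposite faces. */
face_t face_from_accel(int16_t x, int16_t y, int16_t z)
{
    int32_t ax = abs32(x), ay = abs32(y), az = abs32(z);
    if (ax >= ay && ax >= az) {
        return (x >= 0) ? FACE_X_POS : FACE_X_NEG;
    }
    if (ay >= az) {
        return (y >= 0) ? FACE_Y_POS : FACE_Y_NEG;
    }
    return (z >= 0) ? FACE_Z_POS : FACE_Z_NEG;
}

/* Report-on-change filter: remembers the last face that was reported. */
typedef struct {
    face_t last;
} face_filter_t;

void face_filter_init(face_filter_t *f)
{
    assert(f != NULL);
    f->last = FACE_UNKNOWN;
}

/* Returns true and writes the 3-bit code to *code only when the face
 * differs from the previously reported one. */
bool face_filter_update(face_filter_t *f, int16_t x, int16_t y, int16_t z,
                        uint8_t *code)
{
    face_t now = face_from_accel(x, y, z);
    if (now == f->last) {
        return false;
    }
    f->last = now;
    *code = (uint8_t)now;
    return true;
}
```

A real device would probably also need some hysteresis or debouncing so that a tracker held at an angle does not flip between two faces; that is left out here to keep the sketch short.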

Finally, here is my wish: all of this would be simpler with the ReactiveX library, which has pre-built data reducers called Observable Operators (e.g. Debounce, Reduce, Distinct). I would love to find a simple C implementation that works with Amazon FreeRTOS. It would simplify the design of all the operations (Observe => Map => Filter => Combine) and give us a standardized way to solve these problems and communicate about them.
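
To give an idea of what such a chain could look like in plain C, here is a rough sketch of a Map => Filter => DistinctUntilChanged pipeline built from function pointers. This is not the ReactiveX API and it does not use any FreeRTOS calls; every type and function name below is hypothetical and only meant to show the shape of the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An operator receives a value, may write a transformed value to *out,
 * and returns true if the value should continue downstream. */
typedef bool (*op_fn)(void *state, int32_t in, int32_t *out);

typedef struct {
    op_fn fn;    /* the operator */
    void *state; /* operator-specific parameters or memory */
} op_t;

/* Map: multiply by a factor stored in state (an int32_t). */
bool op_map_scale(void *state, int32_t in, int32_t *out)
{
    const int32_t *factor = state;
    *out = in * (*factor);
    return true;
}

/* Filter: pass only values at or above a minimum stored in state. */
bool op_filter_min(void *state, int32_t in, int32_t *out)
{
    const int32_t *min = state;
    *out = in;
    return in >= *min;
}

/* DistinctUntilChanged: drop a value equal to the previous one passed. */
typedef struct {
    bool has_last;
    int32_t last;
} distinct_state_t;

bool op_distinct_until_changed(void *state, int32_t in, int32_t *out)
{
    distinct_state_t *d = state;
    if (d->has_last && d->last == in) {
        return false;
    }
    d->has_last = true;
    d->last = in;
    *out = in;
    return true;
}

/* Push one sample through the chain. Returns true and writes *result
 * only if every operator let the value through. */
bool pipeline_push(const op_t *ops, size_t n_ops, int32_t sample,
                   int32_t *result)
{
    assert(ops != NULL && result != NULL);
    int32_t v = sample;
    for (size_t i = 0; i < n_ops; i++) {
        int32_t next = 0;
        if (!ops[i].fn(ops[i].state, v, &next)) {
            return false;
        }
        v = next;
    }
    *result = v;
    return true;
}
```

With a scale factor of 2 and a minimum of 10, the input sequence 3, 5, 5, 8 produces only two outputs, 10 and 16: the first sample is filtered out and the repeated 5 is suppressed. On an RTOS, `pipeline_push` could be called from the task that reads the sensor, with the result forwarded to whatever queue or logger the application uses.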