Tl;dr
I wake up the fitness tracker once every 30 s. Each time, it can receive 60 measurements from the Accelerometer sensor, and I don't have enough memory to log them all.
In this blog post I will explore which data filters best represent a population, depending on what we are looking for.
For example, the mean and the variance give you an idea of how the data is distributed at a minimal data size: instead of sending 30 data items, you send only 2 (the mean and the variance).
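To make the mean and variance idea concrete, here is a minimal sketch in C. It assumes Welford's online algorithm, which is my choice for illustration and not something the tracker firmware is known to use; the names `running_stats_t`, `stats_push` and `stats_variance` are hypothetical. The point is that each sample is folded into three numbers and never stored.

```c
#include <stdint.h>

/* Running summary of a window of samples (Welford's online algorithm).
   Only three numbers stay in memory, whatever the window size. */
typedef struct {
    uint32_t count;  /* samples seen so far */
    float mean;      /* running mean */
    float m2;        /* running sum of squared deviations from the mean */
} running_stats_t;

static void stats_reset(running_stats_t *s) {
    s->count = 0;
    s->mean = 0.0f;
    s->m2 = 0.0f;
}

/* Fold one new sample into the summary without storing the sample. */
static void stats_push(running_stats_t *s, float x) {
    float delta = x - s->mean;
    s->count++;
    s->mean += delta / (float)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Population variance of the samples pushed so far (0 when empty). */
static float stats_variance(const running_stats_t *s) {
    return (s->count > 0) ? s->m2 / (float)s->count : 0.0f;
}
```

At each wake-up you would call `stats_reset`, push every measurement, then send only `mean` and `stats_variance(&s)`.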
So what are the properties we can keep in this data?
It depends on our main query:
- If we want to know whether a certain data item exists, we use a Bloom Filter.
- If we are looking for outliers, we keep some of them.
- If we are searching for patterns, we can:
  - Derive the data to find when it increases and when it decreases.
  - Cluster the data using a Similarity function, and report on the characteristics of the clusters.
  - Compress the data by removing redundant data.
  - Represent the data using a Model, like a Markov Chain.
- Other data may include: min, max, mean, variance, median, MAD, modified z-score, etc.
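As an illustration of the first case, here is a small Bloom filter sketch in C. Everything in it is an assumption made for the example: the 256-bit size, the three hash functions, the FNV-1a style hashing, and the names `bloom_t`, `bloom_add` and `bloom_maybe_contains`. A Bloom filter can answer "definitely not seen" or "possibly seen", never "definitely seen".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   256u  /* assumed filter size: 32 bytes of RAM */
#define BLOOM_HASHES 3u    /* assumed number of hash functions */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

static void bloom_init(bloom_t *b) {
    memset(b->bits, 0, sizeof b->bits);
}

/* FNV-1a style hash of a 16-bit value, varied by a seed, reduced to a bit index. */
static uint32_t bloom_hash(uint16_t value, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    h = (h ^ (value & 0xFFu)) * 16777619u;
    h = (h ^ (uint32_t)(value >> 8)) * 16777619u;
    h ^= h >> 16;  /* fold the high bits into the low ones */
    return h % BLOOM_BITS;
}

/* Record a value by setting one bit per hash function. */
static void bloom_add(bloom_t *b, uint16_t value) {
    for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
        uint32_t bit = bloom_hash(value, k * 0x9E3779B9u);
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* false: the value was never added. true: it was probably added. */
static bool bloom_maybe_contains(const bloom_t *b, uint16_t value) {
    for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
        uint32_t bit = bloom_hash(value, k * 0x9E3779B9u);
        if ((b->bits[bit / 8] & (1u << (bit % 8))) == 0) return false;
    }
    return true;
}
```

The trade-off is memory against false positives: the more values you add to a fixed-size filter, the more often `bloom_maybe_contains` returns true for a value that was never added.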
My favorite ones are: the modified z-score for outlier detection, the mean, and features (I will write about that later).
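For reference, the modified z-score of a sample is commonly defined as 0.6745 * (x - median) / MAD, and samples whose absolute score exceeds 3.5 are usually treated as outliers. The sketch below is one possible C version under my own assumptions: a window of at most 60 samples, a stack scratch buffer, and hypothetical names such as `find_outliers`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define WINDOW_MAX 60  /* assumed upper bound on samples per wake-up */

static float abs_f(float v) { return v < 0.0f ? -v : v; }

static int cmp_float(const void *a, const void *b) {
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Median of n values; sorts the given buffer in place. */
static float median_inplace(float *v, size_t n) {
    qsort(v, n, sizeof v[0], cmp_float);
    return (n % 2) ? v[n / 2] : 0.5f * (v[n / 2 - 1] + v[n / 2]);
}

/* Writes the indices of the outliers into out_idx and returns how many
   were found. A sample is flagged when its absolute modified z-score,
   |0.6745 * (x - median) / MAD|, is above the threshold. */
static size_t find_outliers(const float *x, size_t n, float threshold,
                            size_t *out_idx) {
    float scratch[WINDOW_MAX];
    if (n == 0 || n > WINDOW_MAX) return 0;

    memcpy(scratch, x, n * sizeof x[0]);
    float med = median_inplace(scratch, n);

    /* MAD: median of the absolute deviations from the median. */
    for (size_t i = 0; i < n; i++) scratch[i] = abs_f(x[i] - med);
    float mad = median_inplace(scratch, n);
    if (mad == 0.0f) return 0;  /* score undefined: most samples sit on the median */

    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        float score = 0.6745f * (x[i] - med) / mad;
        if (abs_f(score) > threshold) out_idx[found++] = i;
    }
    return found;
}
```

Because it relies on the median and the MAD instead of the mean and the standard deviation, a single extreme sample barely moves the reference values it is compared against.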
The utility
Compress the data. In IoT, the host (Main Unit) can receive a lot of data from the sensors (e.g. Accelerometer, GPS, …), but it has little memory (16 MB on average), so you have to optimize what you store:
- Using the Accelerometer data, I wanted to know which side of the fitness tracker was facing up. The raw data has 3 axes of two bytes each, so I ended up with a simple value of 3 bits to represent the 6 faces. Moreover, if the face doesn't change, nothing is reported.
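The face-up reduction can be sketched as follows. This is a guess at one way to do it, not the code that ran on the tracker: it assumes the face is given by the axis carrying most of gravity, and the mapping between axis sign and physical side depends on how the sensor is mounted. The names `face_t`, `face_from_accel` and `face_changed` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Six faces fit in 3 bits; the value 7 is used here as "nothing reported yet". */
typedef enum {
    FACE_X_POS = 0, FACE_X_NEG = 1,
    FACE_Y_POS = 2, FACE_Y_NEG = 3,
    FACE_Z_POS = 4, FACE_Z_NEG = 5,
    FACE_UNKNOWN = 7
} face_t;

static int32_t abs_i32(int32_t v) { return v < 0 ? -v : v; }

/* Reduce one raw sample (3 axes of 2 bytes) to the dominant axis and its sign. */
static face_t face_from_accel(int16_t x, int16_t y, int16_t z) {
    int32_t ax = abs_i32(x), ay = abs_i32(y), az = abs_i32(z);
    if (ax >= ay && ax >= az) return x >= 0 ? FACE_X_POS : FACE_X_NEG;
    if (ay >= az) return y >= 0 ? FACE_Y_POS : FACE_Y_NEG;
    return z >= 0 ? FACE_Z_POS : FACE_Z_NEG;
}

/* Returns true only when the face differs from the last reported one,
   so an unchanged orientation produces no report at all. */
static bool face_changed(face_t *last, face_t current) {
    if (*last == current) return false;
    *last = current;
    return true;
}
```

Six bytes per sample become three bits, and the change check removes repeated reports while the device stays on the same side.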
Privacy by design. Using filters at the very beginning means you collect only what you need, and you don't expose the user's data to adversaries.
Real-time decisions. Filters can give you simple data formats upon which you can make decisions on the IoT device itself. Here are some examples:
- When the Accelerometer detects motion, activate the GPS. This can be found as a hardware feature called WoM (Wake on Motion), implemented inside the Accelerometer itself using the DMP (Digital Motion Processor). Its aim is to offload the host processor (consuming only 68 µA instead of 10 mA).
- When the GPS locations stay in the same place for a long time, deactivate the GPS.
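The two rules above can be combined into a small state machine. The sketch below is illustrative only: the radius, the number of fixes, the coordinate units (1e-7 degrees) and the names `gps_gate_t`, `gate_on_motion` and `gate_on_fix` are all assumptions, and a real driver would power the GPS on or off wherever `gps_on` changes.

```c
#include <stdbool.h>
#include <stdint.h>

#define STILL_RADIUS_E7   2000  /* assumed: roughly 20 m of latitude, in 1e-7 degree units */
#define STILL_FIXES_LIMIT 10    /* assumed: fixes in the same place before switching off */

typedef struct {
    bool gps_on;
    int32_t anchor_lat, anchor_lon;  /* position the later fixes are compared to */
    uint16_t still_fixes;            /* consecutive fixes seen near the anchor */
} gps_gate_t;

static void gate_init(gps_gate_t *g) {
    g->gps_on = false;
    g->anchor_lat = 0;
    g->anchor_lon = 0;
    g->still_fixes = 0;
}

/* Accelerometer motion interrupt: start tracking again. */
static void gate_on_motion(gps_gate_t *g) {
    g->gps_on = true;
    g->still_fixes = 0;
}

static bool within_radius(int32_t a, int32_t b) {
    int64_t d = (int64_t)a - (int64_t)b;
    return d >= -STILL_RADIUS_E7 && d <= STILL_RADIUS_E7;
}

/* New GPS fix: count fixes that stay near the anchor and switch the GPS
   off once the limit is reached; any fix outside the box moves the anchor. */
static void gate_on_fix(gps_gate_t *g, int32_t lat, int32_t lon) {
    if (!g->gps_on) return;
    if (g->still_fixes > 0 &&
        within_radius(lat, g->anchor_lat) && within_radius(lon, g->anchor_lon)) {
        if (++g->still_fixes >= STILL_FIXES_LIMIT) g->gps_on = false;
    } else {
        g->anchor_lat = lat;
        g->anchor_lon = lon;
        g->still_fixes = 1;
    }
}
```

A box test on latitude and longitude is used instead of a true distance to avoid floating-point math on the device; it is coarser, but good enough to decide that the device has stopped moving.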
Finally, all of this would be simpler with the ReactiveX library, which has pre-built data reducers called Observable Operators (e.g. Debounce, Reduce, Distinct).
My only wish is to find a simple C implementation that works with Amazon FreeRTOS. It would simplify the design of all the operations (Observe => Map => Filter => Combine) and give us a normalized way to solve these problems and communicate about them.
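To give an idea of what such a pipeline could look like in plain C, here is a hand-rolled sketch. It is not the ReactiveX API and not tied to FreeRTOS: the `pipeline_t` type, `pipeline_push` and the two example callbacks are hypothetical, and the scale of 16384 counts per g is an assumption about the sensor range. It chains a map step, a filter step, and a step that drops a value equal to the previously emitted one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*map_fn)(int32_t);
typedef bool (*filter_fn)(int32_t);

typedef struct {
    map_fn map;        /* transforms each value (NULL to skip) */
    filter_fn filter;  /* keeps values for which it returns true (NULL to skip) */
    bool distinct;     /* drop a value equal to the previously emitted one */
    bool has_last;
    int32_t last;
} pipeline_t;

/* Push one observed value through map -> filter -> distinct.
   Returns true and writes *out only when a value is emitted. */
static bool pipeline_push(pipeline_t *p, int32_t in, int32_t *out) {
    int32_t v = p->map ? p->map(in) : in;
    if (p->filter && !p->filter(v)) return false;
    if (p->distinct && p->has_last && p->last == v) return false;
    p->last = v;
    p->has_last = true;
    *out = v;
    return true;
}

/* Example map: raw counts to milli-g, assuming 16384 counts per g. */
static int32_t raw_to_milli_g(int32_t raw) {
    return raw * 1000 / 16384;
}

/* Example filter: ignore readings within 100 milli-g of zero. */
static bool is_significant(int32_t milli_g) {
    return milli_g > 100 || milli_g < -100;
}
```

A pipeline would be set up with `pipeline_t p = { raw_to_milli_g, is_significant, true, false, 0 };` and fed one sample at a time; only the emitted values need to be stored or transmitted.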