Enhancing the Accuracy of Trackpac's Location Resolution Engine using Outlier Detection
Accurate asset tracking is essential for businesses that need to monitor their valuable resources. To improve the reliability of our location resolution system, we have implemented a mechanism that detects outliers and removes them before a location is resolved.
In this post, we walk you through how we detect and remove outliers and how doing so improves the accuracy of our location engine.
Understanding Outlier Detection
Outliers are data points that deviate significantly from a dataset's normal distribution or expected patterns.
Identifying outliers is crucial as they may indicate errors, anomalies, or unusual events in the tracked data. Outlier detection allows us to identify these exceptional data points and take appropriate actions to ensure data accuracy and reliability.
This is especially important on the Helium Network because hotspot hosts control where their hotspots' locations are asserted. Sometimes a location is asserted incorrectly for non-malicious reasons, for example after a recent transfer of ownership when the location has not yet been updated. Other times it is a sign of gaming (faking results to earn better rewards).
To implement outlier detection, we use z-scores. A z-score measures how many standard deviations a data point lies from the mean of a distribution. By calculating z-scores for the latitude and longitude of each hotspot that hears our sensor, we can identify outliers based on how far they deviate from the other hotspots in the area.
This means that if a hotspot's asserted location is 100 km away from the other hotspots that heard the same uplink, its location most likely needs to be updated, and its data should not be used.
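To make the idea concrete, the minimal Python sketch below computes z-scores for the latitudes of ten hotspots. The coordinates are hypothetical, and the function uses the population standard deviation from Python's `statistics` module; whether Trackpac uses the population or the sample form is not stated, so treat that choice as an assumption.

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Return (x - mean) / std for each value, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing deviates from the mean.
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical latitudes (degrees) of ten hotspots that heard one uplink.
# Nine sit within a few hundred metres of each other; the last is asserted
# one degree (roughly 111 km) further north.
latitudes = [-27.470, -27.471, -27.469, -27.472, -27.468,
             -27.470, -27.471, -27.469, -27.470, -26.470]

for lat, z in zip(latitudes, z_scores(latitudes)):
    print(f"{lat:8.3f}  z = {z:+.2f}")
```

The distant hotspot scores almost exactly +3, while the clustered ones sit near -0.33. This also illustrates something worth keeping in mind when choosing a threshold: with n values and the population standard deviation, no z-score can exceed sqrt(n - 1) in magnitude, which is 3 for ten hotspots. A fixed threshold of 3 would therefore never flag anything in an uplink heard by ten or fewer hotspots.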
1. Extract the Latitude and Longitude of Hotspots: We begin by extracting the latitude and longitude of each hotspot that reported the uplink.
2. Calculate Mean and Standard Deviation: Using the extracted data, we calculate the mean and standard deviation of the latitudes and, separately, of the longitudes. These statistics serve as reference points for identifying outliers.
3. Calculate Z-Scores: With the mean and standard deviation in hand, we compute the z-score of each latitude and longitude value using the formula (x - mean) / std, where x is the value, mean is the mean of the distribution, and std is its standard deviation.
4. Set a Threshold: We define a threshold that determines which data points are considered outliers. Any data point whose z-score exceeds the threshold in absolute value is considered an outlier, since a hotspot can deviate in either direction (north or south, east or west).
5. Identify Outliers: By comparing the z-scores against the threshold, we identify the indices of the data points that fall into the outlier category. These indices indicate the specific data points that deviate significantly from the expected values.
6. Remove Outliers: Once the outlier indices are identified, we remove the corresponding data points from the dataset before resolving the location. This prevents outlying values from skewing the location estimate.
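The six steps above can be sketched end to end in Python. This is a minimal illustration, not Trackpac's production code: the `HotspotReport` record, the function names, the default threshold of 2.0, and the rule that a hotspot is flagged when either its latitude or its longitude z-score exceeds the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class HotspotReport:
    """Hypothetical record of one hotspot that heard an uplink."""
    hotspot_id: str
    lat: float
    lon: float


def z_scores(values: list[float]) -> list[float]:
    """Steps 2-3: z-score of each value, using the mean and population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


def outlier_indices(reports: list[HotspotReport], threshold: float = 2.0) -> list[int]:
    """Steps 1-5: return the indices of reports whose |z| exceeds the threshold."""
    if len(reports) < 3:
        # Assumption: too few hotspots to compare against each other.
        # This also avoids calling mean() on an empty list.
        return []
    # Step 1: extract latitudes and longitudes, then score each coordinate.
    z_lat = z_scores([r.lat for r in reports])
    z_lon = z_scores([r.lon for r in reports])
    # Steps 4-5: flag a report if either coordinate deviates beyond the threshold.
    return [
        i
        for i, (zl, zo) in enumerate(zip(z_lat, z_lon))
        if abs(zl) > threshold or abs(zo) > threshold
    ]


def remove_outliers(reports: list[HotspotReport], threshold: float = 2.0) -> list[HotspotReport]:
    """Step 6: drop flagged reports before the location is resolved."""
    flagged = set(outlier_indices(reports, threshold))
    return [r for i, r in enumerate(reports) if i not in flagged]


reports = [
    HotspotReport("hs-a", -27.470, 153.020),
    HotspotReport("hs-b", -27.471, 153.021),
    HotspotReport("hs-c", -27.469, 153.019),
    HotspotReport("hs-d", -27.470, 153.020),
    HotspotReport("hs-e", -27.470, 153.021),
    HotspotReport("hs-far", -26.470, 153.020),  # asserted ~111 km north
]
kept = remove_outliers(reports)
print([r.hotspot_id for r in kept])
```

Here only `hs-far` is dropped. The threshold of 2.0 is illustrative: with six hotspots no z-score can exceed about 2.24, so a much larger threshold would never fire on an uplink this small. Because z-scores are measured relative to the group's spread, a very tight cluster can also make a hotspot only about a hundred metres away look anomalous, which is one reason the threshold benefits from tuning against real uplinks.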
Implementing outlier detection and removal in the Trackpac location resolution engine has significantly improved the accuracy and reliability of its location estimates. By calculating z-scores, identifying outlier indices, and removing outliers from the dataset, we ensure that each location estimate is based on reliable, representative data points.
We will continue fine-tuning our outlier detection and removal mechanism, considering domain knowledge, specific use cases, and evolving data patterns. This ongoing refinement aims to provide our users with a robust and dependable asset-tracking solution that meets their diverse tracking needs.
We're also working on publishing a list of outlier hotspots. This list could help improve anti-gaming and Proof-of-Coverage (PoC) reward systems on the Helium Network by automatically handling these invalid placements or disabling the affected hotspots when they cause issues. For now, once Trackpac identifies a hotspot as an outlier, it no longer uses that hotspot's data.
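The policy of ignoring a hotspot once it has been identified as an outlier could be sketched as a small registry that filters later uplinks and can be exported as a list. Everything here, including the `OutlierRegistry` class and its methods, is hypothetical and keeps state in memory only; a real service would likely need persistent storage.

```python
class OutlierRegistry:
    """Hypothetical in-memory set of hotspot IDs flagged as outliers."""

    def __init__(self) -> None:
        self._flagged: set[str] = set()

    def flag(self, hotspot_ids: list[str]) -> None:
        """Record hotspots identified as outliers while resolving an uplink."""
        self._flagged.update(hotspot_ids)

    def exclude_flagged(self, hotspot_ids: list[str]) -> list[str]:
        """Drop previously flagged hotspots before any further processing."""
        return [h for h in hotspot_ids if h not in self._flagged]

    def published_list(self) -> list[str]:
        """Sorted list of flagged hotspots, e.g. for sharing publicly."""
        return sorted(self._flagged)


registry = OutlierRegistry()
registry.flag(["hs-far"])                  # flagged while resolving one uplink
later_uplink = ["hs-a", "hs-far", "hs-b"]  # hotspots that heard a later uplink
print(registry.exclude_flagged(later_uplink))
print(registry.published_list())
```

Storing the IDs in a set keeps each membership check constant-time on average, so filtering stays cheap as the list grows.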