The government is testing an interesting use of predictive analysis. As described in the article “Homeland Security Tests Crime Prediction Tech,” the Department of Homeland Security is testing a system that tries to predict when a person is likely to commit a crime before it happens.
As this example shows, predictive analysis is an emerging trend in the business world and is no longer used primarily by statisticians and economists. Previously, this type of statistical analysis was used heavily by economists to predict market behavior, such as the price of oil. Predictive analysis aims to estimate the likelihood of an event occurring given a combination of other events. In mathematical terms, it is the science of statistically estimating a value for a variable given a set of other correlated variables. To perform predictive analysis one must master two parts: the mathematical model that captures the relationship between the variables, and the database that stores the data itself.
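To make "estimating a value for a variable given other correlated variables" concrete, here is a minimal sketch of the simplest such model, a least-squares line through two variables. The function names (`fit_line`, `predict`) and the numbers are made up for illustration; they are not taken from the DHS project or from any real market data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate y for a new x using a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: a demand index and the matching price.
demand = [1.0, 2.0, 3.0, 4.0]
price = [52.0, 54.0, 56.0, 58.0]

model = fit_line(demand, price)
estimate = predict(model, 5.0)  # estimated price at an unseen demand level
```

Real predictive models use many more variables and far noisier data, but the two parts named above are both visible here: the model is the fitted line, and the stored data is the pair of lists.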
The primary challenges in building the mathematical model are the accuracy, usefulness, and meaningfulness of the data variables. The DHS project that the article describes sounds, at first glance, very sci-fi. But to a predictive analyst, it sounds challenging. As DHS builds the model, it will likely encounter several challenges. First, it will have to set parameters on all of the data variables involved. For instance, it will need to define what counts as an eye blink and what does not, which sequences of blinks count as a pattern and which do not, whether eye blinking correlates with heart rate, and so on.
Another challenge is how much data DHS will be able to collect, store, sort, and analyze. Hypothetically, let’s assume eye blinking behavior is captured by three variables, for instance frequency, speed, and type. Now let’s assume there are 100 blinking frequencies, 100 blinking speeds, and 10 blinking types; then there exist 100 × 100 × 10 = 100,000 distinct blinking behaviors. Each of those 100,000 combinations needs to be cross-referenced with the crime data to identify which are associated with criminal activity. That only covers the eye blinking. When multiplied by the data points for heart rate, movement, and temperature, we could easily be talking about terabytes times terabytes of data (that is, on the order of yottabytes). And this is just the data needed to start accurate behavior identification!
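The counting argument can be checked with a few lines of Python. This is only a sketch: the blink figures are the hypothetical ones from the paragraph above, and the assumption that heart rate, movement, and temperature each contribute as many distinct states as blinking is mine, added purely to show how quickly the product grows.

```python
from math import prod

# Hypothetical number of distinct values per blink variable (from the text).
blink_levels = {"frequency": 100, "speed": 100, "type": 10}

# Every combination of frequency, speed, and type is a distinct behavior.
blink_combinations = prod(blink_levels.values())


def combined_states(*signal_counts):
    """Number of joint states when every signal is combined with every other."""
    return prod(signal_counts)


# Assumption for illustration only: heart rate, movement, and temperature
# each have as many distinct states as blinking does.
all_signals = combined_states(
    blink_combinations,  # eye blinking
    blink_combinations,  # heart rate (assumed)
    blink_combinations,  # movement (assumed)
    blink_combinations,  # temperature (assumed)
)

print(f"{blink_combinations:,} blink behaviors")
print(f"{all_signals:.0e} combined behavioral states")
```

The point is the multiplication: each additional signal multiplies the number of states rather than adding to it, which is why the storage estimate grows so fast.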
So DHS has to build an accurate model by classifying yottabytes of data points, and then develop an acquisition system that gathers terabytes of data in real time and runs them through this model. That isn’t an easy task, given that no system today has the capacity and performance to store that much data. It will be interesting to see what comes of DHS’s project. While it seems futuristic (think Minority Report), we are in a new era of big data processing. As technology improves, maybe we will be able to identify the meaning of human behavior down to the blink!