Flawed Reality: Your Data Set
Flawed Reality with Tyrone Grandison
Artificial intelligence (AI) is only as effective as the data that feeds it. When presented with a problem, AI solutions rely on past data to produce insight.
Building an AI system correctly starts with asking the right question. Without knowledge of, and focus on, the core question that needs to be answered, any deployed AI solution will produce results that are ineffective, if not downright destructive.
At the most basic level, the collectors of data are humans, and every human has a set of biases that guide every aspect of their world, every day. Human bias and prejudice directly influence and shape every dataset today. These datasets are imbued with the conscious and unconscious beliefs and dispositions of the humans who create the data or the data ingestion systems. Thus, a lot of the data in the world is flawed.
If this flawed data is fed into an algorithm whose results guide decisions for the future, the loop of biased AI decision-making is unleashed upon the world once again, and the vicious cycle continues.
In order to protect our future from incorrect assumptions, bad decisions, or discriminatory practices, we must interrogate the data that we create and use.
Today’s data is flawed, which means we’re living in a flawed reality. Future AI solutions will be riddled with discrimination if we aren’t honest about our current reality. While no dataset will ever be perfect, we must recognize the flaws in the process of collecting data in order to fix them. Additionally, each AI practitioner is also human and brings their own biases to the application of their knowledge and skills to AI development. The people who interpret the results of AI solutions bring their own belief systems and positions as well. The entire pipeline is prone to the inclusion of flaws at many points, which means it is imperative that the process chain be examined holistically.
It is critical to deeply understand the current state of everything that touches the data and the data ecosystem in order to spot undesirable discrepancies. Failure to do so will create a feedback loop of harmful assumptions baked into an AI architecture that then recycles those assumptions into the applications it was created for, essentially resulting in a perpetual cycle of artificial implicit bias. Some of these assumptions are racist, ableist, or sexist, and if they aren’t spotted, these assumptions feed into an AI landscape that will most likely have a devastating impact on a global scale.
For example, when examining police data for traffic stops, the data cannot be taken at face value. Some of the discrepancies may be innocuous, like the varied verbiage and semantics associated with particular police incidents across stations. Each department may interpret the definition of a traffic stop slightly differently. Traffic stops are ultimately a human endeavor, meaning the biases in these human interactions become the default as soon as the data is used in AI solutions, for everything from predictive policing to traffic optimization.
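The point about varied verbiage across stations can be made concrete with a minimal sketch. Everything in it is hypothetical and for illustration only: the department names, the recorded labels, and the shared vocabulary are made-up assumptions, not real police data or an established standard. The idea is simply to list how each department labels an incident and to flag any label that reviewers have not yet agreed how to interpret, before the data goes anywhere near a model.

```python
from collections import defaultdict

# Hypothetical records: (department, incident label exactly as recorded).
RECORDS = [
    ("dept_a", "traffic stop"),
    ("dept_a", "Traffic Stop"),
    ("dept_b", "vehicle stop"),
    ("dept_b", "traffic stop"),
    ("dept_c", "investigative stop"),
]

# Hypothetical shared vocabulary agreed on by human reviewers:
# normalized raw label -> canonical label.
CANONICAL = {
    "traffic stop": "traffic_stop",
    "vehicle stop": "traffic_stop",
}


def audit_labels(records, canonical):
    """Group labels by department and flag those with no agreed mapping."""
    mapped = defaultdict(set)
    unmapped = defaultdict(set)
    for dept, raw in records:
        key = raw.strip().lower()  # ignore case and stray whitespace
        if key in canonical:
            mapped[dept].add(canonical[key])
        else:
            # Keep the raw label so a person can review it in context.
            unmapped[dept].add(raw)
    return dict(mapped), dict(unmapped)


mapped, unmapped = audit_labels(RECORDS, CANONICAL)
print("Mapped labels:", mapped)
print("Needs human review:", unmapped)
```

A check like this only surfaces the innocuous, semantic discrepancies. It says nothing about the biases in the human interactions behind each record, which still have to be examined by people who understand how the data was collected.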
A Better Reality
The key component to solving this problem is intentional and meaningful diversity up and down the value chain. Ideally, diversity starts with the people who are generating the data, which means they must be aware of their biases and actively seek to remove their impact. There must be diversity in the team that is building the tools. There must be diversity in the team that evaluates the data. Blind spots must be identified and corrected, and it takes a diverse team of individuals to fill in the gaps and right the ship. This process takes time and may involve a lot of trial and error.
The importance of recognizing our flawed reality and taking steps to correct it cannot be overstated. If we are to create a future that leverages artificial intelligence, we must understand that the data we feed the future now is literally a matter of life and death.