AI as Judgment Machine?
Over the past year or so, we have seen AI adopted at an increasing rate in interesting use cases across the industry. With the remarkable power of large language models added to the AI engineer's toolbox, more and more applications blur the boundary between work done by a human and work done by a machine. As adoption reaches deeper layers of society and these applications grow more sophisticated, it is becoming harder, especially for non-expert users, to decipher what the machine is actually doing.
This lack of clarity, coupled with rushed and careless user experiences designed and built by engineers and product people, could result in unintended consequences across the ever-increasing areas of human-machine interaction. Imagine a not-so-techy grandma interpreting the next-word-prediction output of ChatGPT as medical judgment.
In the realm of AI, a crucial line exists between prediction, the ability to estimate the probability of future outcomes based on data, and judgment, the capacity to assess and evaluate situations with understanding and values. While AI excels at prediction, offering next-word suggestions or analyzing medical scans, it currently lacks the human ability to judge.
Unlike human judgments, AI's predictions stem from complex algorithms analyzing vast amounts of data, not from genuine reasoning and understanding. It can churn through information and identify patterns, but it cannot reason about cause and effect, consider alternative scenarios, or adapt to unforeseen situations. This is due to the absence of a true world model, a comprehensive internal representation of the world we inhabit. Without one, AI cannot grasp the nuances of context, emotions, or social cues, all crucial for making sound judgments. Furthermore, AI's training data often reflects human biases, leading to decisions devoid of ethical, moral, or even safety considerations.
“But AI judges what I like to watch on Netflix!”
It is easy to think that recommender systems, like the ones Netflix and TikTok use, generative co-pilots, like ChatGPT or Gemini, and self-driving cars are making decisions. In reality, they are statistical machines predicting the most likely thing to happen next. Because of the specifics of these use cases, a significant amount of effort goes into making the most likely prediction so good that it is easy to forget it is just one of several predicted outputs, each with a certain probability of resembling an average human judgment.
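To make that concrete, here is a minimal sketch, with invented candidates and scores, of what "picking the answer" usually amounts to: turning raw model scores into probabilities and surfacing the top candidate, while the alternatives quietly remain.

```python
import numpy as np

# Toy scores a model might assign to candidate continuations of a prompt.
# The candidates and scores are made up for illustration; a real model
# produces a distribution over its entire vocabulary or action space.
candidates = ["see a doctor", "rest and hydrate", "take aspirin", "ignore it"]
scores = np.array([2.1, 1.8, 0.9, -0.5])

# Softmax turns raw scores into probabilities.
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# The "answer" the user sees is simply the highest-probability candidate;
# the runners-up never disappear, they just aren't shown.
for cand, p in sorted(zip(candidates, probs), key=lambda x: -x[1]):
    print(f"{p:.2f}  {cand}")
```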
In the case of a recommender system, it seems that the AI is judging your taste and recommending something you'll enjoy. In reality, the algorithm analyzes your past viewing history, demographics, and similar users' preferences to predict how long you will spend consuming a piece of content. It considers your past choices, but it doesn't understand your actual enjoyment or your deeper reasons for watching specific movies.
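A toy illustration of that mechanic, assuming a simple similar-users approach and made-up watch-time numbers (real systems are far more elaborate, but the output is still a prediction, not a verdict on your taste):

```python
import numpy as np

# Toy watch-time matrix (hours): rows are users, columns are titles.
# np.nan marks titles a user has not watched. All numbers are invented.
watch_time = np.array([
    [2.0, 0.5, np.nan, 3.0],
    [1.8, 0.4, 1.0,    np.nan],
    [0.2, 2.5, 0.1,    0.3],
])

def predict_watch_time(user, title):
    """Predict how long `user` will watch `title` from similar users' behavior."""
    preds, weights = [], []
    for other in range(len(watch_time)):
        if other == user or np.isnan(watch_time[other, title]):
            continue
        # Compare the two users on titles both have watched.
        both = ~np.isnan(watch_time[user]) & ~np.isnan(watch_time[other])
        if both.sum() == 0:
            continue
        a, b = watch_time[user, both], watch_time[other, both]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        preds.append(watch_time[other, title])
        weights.append(sim)
    # Weighted average of similar users' watch times: a prediction of behavior,
    # not an assessment of whether this user will actually enjoy the title.
    return np.average(preds, weights=weights) if preds else np.nan

print(predict_watch_time(user=0, title=2))
```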
It might seem that a generative chatbot understands your problem and offers personalized solutions. In fact, it follows pre-programmed decision trees and predicts the next words in the conversation. It doesn't truly understand your intent or the nuances of your problem.
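Here is a deliberately tiny sketch of next-word prediction using an invented bigram table; real chat models are vastly larger and more capable, but each step is the same kind of move: pick a likely continuation, not a considered answer.

```python
# A toy bigram "language model": for each word, the counts of words that
# followed it in some hypothetical training text. Everything here is invented
# for illustration; real chat models learn distributions over huge vocabularies.
bigram_counts = {
    "i":    {"have": 8, "feel": 5},
    "have": {"a": 9, "no": 2},
    "a":    {"headache": 6, "question": 4},
    "feel": {"tired": 7, "great": 3},
}

def next_word(word):
    """Return the most likely next word and its probability under the toy model."""
    counts = bigram_counts.get(word, {})
    if not counts:
        return None, 0.0
    total = sum(counts.values())
    best = max(counts, key=counts.get)
    return best, counts[best] / total

# Greedily extend a prompt: each step just picks the highest-probability
# continuation -- there is no understanding of the user's actual problem.
word, sentence = "i", ["i"]
for _ in range(3):
    word, p = next_word(word)
    if word is None:
        break
    sentence.append(word)
    print(f"picked '{word}' with probability {p:.2f}")
print(" ".join(sentence))
```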
And a self-driving car might seem to "decide" to change lanes, "choose" to yield to a pedestrian, or "select" the right exit on the highway. In reality, the car uses its sensors (cameras, radar, LiDAR) to gather data about its environment, including other vehicles, pedestrians, and road markings, and feeds that data into complex algorithms that analyze the situation and predict possible future outcomes based on past experiences and training data. It uses all of this to predict the most likely action a human operator would have taken.
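A rough sketch of that pipeline, with invented sensor features and stand-in model weights (nothing here reflects a real driving stack), just to show how "deciding" reduces to scoring candidate actions and taking the most probable one:

```python
import numpy as np

# Invented sensor-derived features: [gap_to_car_ahead_m, pedestrian_nearby,
# exit_in_500m]. The weights stand in for a trained model's parameters.
features = np.array([8.0, 1.0, 0.0])          # car close ahead, pedestrian nearby
actions = ["keep_lane", "change_lane", "yield_to_pedestrian", "take_exit"]
weights = np.array([
    [ 0.10, -2.0, -1.0],   # keep_lane
    [-0.05, -3.0, -1.0],   # change_lane
    [-0.10,  4.0, -1.0],   # yield_to_pedestrian
    [-0.05, -2.0,  5.0],   # take_exit
])

# Scores -> probabilities via softmax; the car then executes the action a
# human operator would most likely have taken in similar recorded situations.
scores = weights @ features
probs = np.exp(scores - scores.max())
probs /= probs.sum()

for action, p in sorted(zip(actions, probs), key=lambda x: -x[1]):
    print(f"{p:.2f}  {action}")
```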
In each of these cases, interpreting the most probable output as a judgment, without carefully assessing the assumptions, biases, and limitations present in the training data, could lead to unintended outcomes.
What now?
The number of areas where we have enough data, sophisticated algorithms, and human oversight to let the most likely output pass as judgment is definitely increasing. However, doing so blindly, without the necessary ethical, responsible, and safety measures in place, could result in reputational risk, miscalculated investments, or opportunity cost.
In the meantime, the safest ground is to treat AI as a prediction machine, frame use cases as predictive or prescriptive analysis, and build the algorithms into user experiences that provide checks and balances, including human oversight, for robust and reliable performance.
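One minimal sketch of such a check, assuming a hypothetical predict() helper and an arbitrary confidence threshold: act automatically only when the model is confident, and escalate to a human otherwise.

```python
# A minimal sketch of "checks and balances" around a prediction: act on the
# model's output only when its confidence clears a threshold, otherwise route
# the case to a human reviewer. The threshold and the predict() stub are
# assumptions for illustration, not a prescription.
CONFIDENCE_THRESHOLD = 0.90

def predict(case):
    """Stand-in for any predictive model: returns (label, confidence)."""
    return "approve", 0.72   # toy output

def handle(case):
    label, confidence = predict(case)
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto-{label} (confidence {confidence:.2f})"
    # Low confidence: keep the human in the loop instead of passing a
    # prediction off as a judgment.
    return f"escalate to human reviewer (model suggests {label}, {confidence:.2f})"

print(handle({"id": 42}))
```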
Well, putting the nerd talk aside: if your workflow contains complex decisions, build well-designed, well-behaved co-pilots, not tools that try to do your work for you.