Kathleen Martin
Guest
This article was contributed by Frederik Bussler, consultant and analyst.
AI fuels modern life, from the way we commute to how we order online and how we find a date or a job. Counting Facebook and Google users alone, billions of people use AI-powered applications every day. And that is only the tip of the iceberg of AI's potential.
OpenAI, which recently made headlines again for offering general availability to its models, uses labeled data to "improve language model behavior," or to make its AI fairer and less biased. This is an important example, as OpenAI's models were long criticized for producing toxic and racist output.
Many of the AI applications we use day-to-day require a particular dataset to function well. To create these datasets, we need to label data for AI.
Why does AI need data labeling?
The term artificial intelligence is somewhat of a misnomer. AI is not actually intelligent. It takes in data and uses algorithms to make predictions based on that data. This process requires a large amount of labeled data.
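To make the point concrete, here is a toy sketch (not from the article) of what "learning from labeled data" means in practice: a naive word-count classifier that can only make predictions because humans first attached labels to the examples. The dataset and function names are hypothetical.

```python
# Toy supervised learning: the model's only "knowledge" comes from
# human-supplied (text, label) pairs.
from collections import Counter

# Hypothetical labeled dataset: raw data paired with human labels.
labeled_data = [
    ("great product works well", "positive"),
    ("terrible service never again", "negative"),
    ("works great love it", "positive"),
    ("awful terrible experience", "negative"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        for word in text.split():
            counts.setdefault(word, Counter())[label] += 1
    return counts

def predict(counts, text):
    """Each known word votes for the label it co-occurred with most."""
    votes = Counter()
    for word in text.split():
        if word in counts:
            votes[counts[word].most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else "unknown"

model = train(labeled_data)
print(predict(model, "works great"))      # → positive
print(predict(model, "terrible awful"))   # → negative
```

Strip the labels out of `labeled_data` and the "model" can learn nothing at all, which is the sense in which AI is not intelligent on its own.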
This is particularly the case when it comes to challenging domains like healthcare, content moderation, or autonomous vehicles. In many instances, human judgment is still required to ensure the models are accurate.
Consider the example of sarcasm in social media content moderation. A Facebook post might read, "Gosh, you're so smart!" That could be sarcastic in a way a machine would miss. More perniciously, a language model trained on biased data can be sexist, racist, or otherwise toxic. For instance, the GPT-3 model once associated Muslims and Islam with terrorism, until labeled data was used to improve the model's behavior.
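The sarcasm problem above can be sketched in a few lines. This is an illustrative toy, not real moderation data: a rule that keys on positive-sounding words mislabels the sarcastic post, which is exactly the gap human annotators close when they label training data.

```python
# Hypothetical moderation examples: surface sentiment cues miss sarcasm,
# so human annotators supply the ground-truth labels.
posts = [
    {"text": "Gosh, you're so smart!", "human_label": "sarcastic"},
    {"text": "Thanks, that really helped me out.", "human_label": "sincere"},
]

POSITIVE_CUES = {"smart", "thanks", "helped"}  # assumed cue list for the demo

def naive_label(text):
    """Call anything with a positive cue word 'sincere'."""
    words = {w.strip("!,.'").lower() for w in text.split()}
    return "sincere" if words & POSITIVE_CUES else "sarcastic"

for post in posts:
    guess = naive_label(post["text"])
    agree = "agrees" if guess == post["human_label"] else "DISAGREES"
    print(f"naive={guess}, human={post['human_label']} -> {agree}")
```

The cue-based rule labels both posts "sincere," disagreeing with the human label on the sarcastic one; a model trained on those human labels can learn the distinction the rule cannot.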
Provided that human bias is also kept in check, "supervised models allow for more control over bias in data selection," as a 2018 TechCrunch article put it, and OpenAI's newer models are a clear example of using labeled data to control bias. Controlling bias with data labeling is vitally important: low-quality AI models have even landed companies in court, as in the case of a firm that attempted to use AI as a screen reader, only to later agree to a settlement when the model didn't work as advertised.
The importance of high-quality AI models is making its way into regulatory frameworks as well. For example, the European Commission’s regulatory framework proposal on artificial intelligence would subject some AI systems to “high quality of the datasets feeding the system to minimize risks and discriminatory outcomes.”
Continue reading: https://venturebeat.com/2021/12/07/data-labeling-will-fuel-the-ai-revolution/