Data labelling is a key process in machine learning. It provides the annotated examples that machine learning models are trained on, and so accelerates the development of artificial intelligence. Data annotation is frequently outsourced to data labelling firms, which annotate images, video, audio and text. Besides offering outsourced annotation services, data labelling companies have also partnered with other organisations to enable research and innovation in data annotation and AI. This article presents the top five data labelling projects of 2021.
Scale AI and Oxford University’s Reddit Data Set
Scale AI, a data annotation platform, has collaborated with Oxford University to build a comprehensive dataset on online debates and discourse. Natural language processing is still maturing, and NLP models often struggle to understand the context of online exchanges. For example, without targeted training data, they tend to mishandle slang, sarcasm, context-specific jokes, and other varied forms of online interaction.
Scale AI and Oxford University created a dataset, ‘Debagreement’, containing comment-reply interactions across five subreddits: Democrats, Republicans, Black Lives Matter, Brexit, and Climate. Each comment-reply interaction is annotated with an “agree,” “disagree,” “neutral,” or “unsure” label by at least three raters, so that ML models trained on the data can learn to detect the stance of Redditors in online discourse. The collaborative project has been viewed as a first step in training socially aware language models.
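To make the annotation scheme concrete, here is a minimal Python sketch of how several raters' labels for one comment-reply pair could be reduced to a single label. The majority-vote rule, the function name `majority_label`, and the choice to return `None` on a tie are assumptions made for illustration; the article does not say how the dataset's authors actually aggregated the ratings.

```python
from collections import Counter

# The four labels named in the article.
LABELS = {"agree", "disagree", "neutral", "unsure"}


def majority_label(ratings, min_raters=3):
    """Return the most common label among the ratings, or None on a tie.

    Hypothetical helper: simple majority voting is an assumption,
    not the documented aggregation method of the dataset.
    """
    if len(ratings) < min_raters:
        raise ValueError(f"expected at least {min_raters} ratings, got {len(ratings)}")
    unknown = set(ratings) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    # most_common() sorts labels by descending count.
    ranked = Counter(ratings).most_common()
    top_label, top_count = ranked[0]
    # If the runner-up has the same count, there is no clear majority.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


if __name__ == "__main__":
    print(majority_label(["agree", "agree", "disagree"]))
    print(majority_label(["agree", "disagree", "neutral"]))
```

Returning `None` on a tie keeps ambiguous pairs visible, so they can be re-rated or excluded rather than silently assigned a label.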
Continue reading: https://analyticsindiamag.com/5-data-labelling-projects-that-impacted-the-ai-industry-the-most/