Brianna White

Jul 30, 2019
Since data is at the heart of AI, it should come as no surprise that AI and ML systems need enough good-quality data to “learn”. In general, a large volume of good-quality data is needed to properly train an AI or ML system, especially with supervised learning approaches. The exact amount varies depending on which pattern of AI you’re implementing, the algorithm you’re using, and other factors such as whether you rely on in-house or third-party data. For example, neural nets need a lot of data to be trained, while decision trees or Bayesian classifiers can produce high-quality results with much less.
So you might think more is better, right? Well, think again. Organizations with lots of data, even exabytes, are realizing that more data is not the solution to their problems that they expected it to be. Indeed: more data, more problems. The more data you have, the more data you need to clean and prepare, label and manage, and secure and protect, and the more bias you need to mitigate. Small projects can rapidly turn into very large ones when you start multiplying the amount of data. In fact, too much data often kills projects.
Clearly, the missing step between identifying a business problem and getting the data squared away to solve it is determining which data you need and how much of it you really need. You need enough, but not too much. People often call this “Goldilocks data”: not too much, not too little, but just right. Unfortunately, organizations far too often jump into AI projects without first understanding their data. Questions they need to answer include where the data is, how much of it they already have, what condition it is in, which features of the data are most important, whether to use internal or external data, what data access challenges exist, whether existing data needs to be augmented, and other crucial factors. Without answers to these questions, AI projects can quickly die, drowning in a sea of data.
Getting a better understanding of data
To understand just how much data you need, you first need to understand how and where data fits into the structure of AI projects. One visual way of understanding the increasing levels of value we get from data is the “DIKUW pyramid” (sometimes also referred to as the “DIKW pyramid”), which shows how a foundation of data helps build greater value with Information, Knowledge, Understanding, and Wisdom.
Continue reading: https://www.forbes.com/sites/cognitiveworld/2022/08/20/are-you-making-these-deadly-mistakes-with-your-ai-projects/?sh=5d4c9b256b54