Synthetic data — the generation of artificial images to train AI and computer vision — will be key to building out a future metaverse.
Why it matters: AI has long been trained on images — including human faces — captured from the real world, but doing so can create serious privacy concerns.
- Using synthetic data instead can help sidestep that issue, though it brings new worries about accuracy and authenticity.
Between the lines: Privacy concerns will increasingly push companies away from capturing real faces and other images to train AI and toward synthetically generated data.
- Tel Aviv-based synthetic data company Datagen performs high-quality digital scans and motion capture of real people and objects, then uses AI to generate realistic but not real versions.
- Gartner predicted recently that by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.
- Early computer vision systems were often trained on datasets taken from the internet that were disproportionately white and male, which meant they were less accurate in recognizing faces from other races and genders.
- With synthetic data, "you can incorporate the real distributions of the real world, so there's no bias among age, gender and more," says Gil Elbaz, co-founder and CTO of Datagen. (A simple sketch of that idea follows this list.)
- Many of the same tools used to generate synthetic faces for AI training could also be used to create convincing deepfakes, though Elbaz notes technical tools like smart contracts could be used to separate synthetics from fakes.
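To illustrate the distribution point Elbaz makes, here is a minimal sketch, not Datagen's actual pipeline, of how a synthetic-data generator could enforce demographic balance by sampling generation parameters from explicit target distributions rather than inheriting the skew of a scraped dataset. The target shares and the `generation_spec` helper are illustrative assumptions, not anything from the article.

```python
"""Hedged sketch: sampling synthetic-face generation parameters so the
resulting dataset matches chosen demographic targets. All values and
function names below are hypothetical placeholders."""
import random

# Hypothetical target shares for the generated dataset.
TARGET_GENDER = {"female": 0.5, "male": 0.5}
TARGET_AGE = {"18-29": 0.25, "30-44": 0.25, "45-59": 0.25, "60+": 0.25}


def sample_from(dist: dict) -> str:
    """Draw one category label according to its target share."""
    labels, weights = zip(*dist.items())
    return random.choices(labels, weights=weights, k=1)[0]


def generation_spec() -> dict:
    """Parameters that would be handed to a face generator (stubbed out here)."""
    return {
        "gender": sample_from(TARGET_GENDER),
        "age_band": sample_from(TARGET_AGE),
    }


if __name__ == "__main__":
    specs = [generation_spec() for _ in range(10_000)]
    # Verify the synthetic batch roughly matches the intended distribution.
    female_share = sum(s["gender"] == "female" for s in specs) / len(specs)
    print(f"female share ≈ {female_share:.2%}")
```

The design choice is simply that the dataset's demographics become an explicit input you control, rather than an accident of whatever images were scraped, which is the contrast the article draws with early web-sourced training sets.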