In Data-Centric AI, It’s All Eyes On The Data
Thursday, May 15, 2025


By Rahul Singhal, Forbes Councils Member, Forbes Technology Council
Council Post: Expertise from Forbes Councils members, operated under license. Opinions expressed are those of the author.
Jul 14, 2022, 07:15am EDT

Rahul is the Chief Product and Marketing Officer for Innodata, a global data engineering company powering next-generation AI applications.

Technology evolves at warp speed, and AI is at the forefront of this evolution. For AI practices to keep pace with technology, data scientists and ML engineers need to be flexible and adaptive. As the AI playing field transitions from model-centricity to data-centricity, data scientists and ML engineers must make a parallel shift.

This article will explore the history leading up to this shift and its implications for data science as a practice.

The Ongoing Evolution Of Computation

Data-centricity is a natural next step in the evolution of computation and AI. In traditional programming, raw data and a rule-based program are fed into a computer. The computer runs the program on the data and produces a result. When creating an ML model, however, that process gets inverted. The data and the results are fed into a neural network. The neural network then generates a model to connect said data with said result.

Computer scientists’ roles have followed a similar progression.

• Before 2012, a domain expert would translate raw data into rules, and a data engineer would use those rules to write a program.
• From about 2012 to 2018, when model-centric AI predominated, a domain expert would tag or annotate the data, and a data engineer would write a neural net.
• From 2018 forward, in data-centric AI, while a domain expert still tags and annotates the data, the engineer now writes data and error analysis programs to refine and improve the data itself.
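The inversion described in this section, from a hand-written rule to a rule learned from data and results, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the temperature readings, the 30.0 threshold, and the function names are hypothetical, and a one-parameter threshold learner stands in for the neural network the article refers to. The last step holds the learning code fixed and changes a single label, which hints at why data quality matters.

```python
# Traditional programming: data + a hand-written rule -> result.
def rule_based_is_hot(temp_c: float) -> bool:
    # Hypothetical rule supplied by a domain expert.
    return temp_c >= 30.0


# Machine learning: data + known results -> a learned rule (the "model").
def learn_threshold(temps: list[float], labels: list[bool]) -> float:
    """Return the cut-off that misclassifies the fewest labeled examples."""
    def errors(threshold: float) -> int:
        return sum((t >= threshold) != y for t, y in zip(temps, labels))

    # Try every observed value as a candidate cut-off; ties go to the lowest.
    return min(sorted(set(temps)), key=errors)


# Hypothetical labeled data prepared by a domain expert.
temps = [12.0, 18.5, 24.0, 29.0, 31.0, 35.5]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(temps, labels)


def learned_is_hot(temp_c: float) -> bool:
    return temp_c >= threshold


# Same code, one mislabeled example (24.0 marked as hot): the model changes.
noisy_labels = [False, False, True, True, True, True]
noisy_threshold = learn_threshold(temps, noisy_labels)

print("learned threshold:", threshold)
print("threshold with one label error:", noisy_threshold)
```

In this toy setup the learning code never changes, yet a single label error moves the learned cut-off, which is the kind of effect data and error analysis programs are meant to catch.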

Data-Centric AI Takes Center Stage

The data-centric AI movement was spearheaded by Andrew Ng, a Stanford professor, co-founder of Google Brain and Coursera, former chief scientist at Baidu and founder of Landing AI. Ng asserts that since AI models have mostly been figured out and high-quality models are now widely available, data scientists no longer need to prioritize writing algorithms or creating models. The success of their work will now hinge on systematically engineering the data, e.g., reducing errors, making tags consistent, choosing the best samples, ensuring diversity and optimizing data collection and augmentation.

According to Ng in a recent IEEE interview, “The dominant paradigm over the last decade was to download the data set while you focus on improving the code.” Whereas, now, “the code—the neural network architecture—is basically a solved problem. So for many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data.”

Ng contends that ML models can be trained on surprisingly small datasets when the data is of very high quality. “In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn.”

When done well, data-centric AI yields higher accuracy, lower costs (by reducing the need for SMEs and very large datasets) and expanded access to AI for non-tech industries, such as manufacturing and healthcare.

And ML Engineers Take New Roles

What does this mean for data scientists and ML engineers? A radical shift in the nature of their work. Instead of writing algorithms, developing ML models and creating neural networks, teams must now devote their resources to getting the best possible data and using it to its fullest potential. This includes data-optimization functions such as:

• Enforcing consistent and learnable annotations.
• Analyzing blind spots, bias, inconsistencies and noisy labels.
• Augmenting data where it falls short in volume or diversity.
• Finding ways to collect data more efficiently.
• Developing methods to select the best data samples at any given time.

There are a variety of tools and techniques that ML engineers can employ to drive data quality. Some examples include:

• Consistency: Create detailed annotation guidelines and use double-pass workflows with arbitration.
• Learnability: Use taxonomy builders and task-specific workbenches.
• Errors and bias: Analyze label and feature distribution, and review collisions and frequent errors.
• Data augmentation: Add noise, substitute data, use generative techniques to expand currently available data, create or acquire synthetic data and use programmatic annotations.
• Data collection: Run online feedback loops and use ML-centric UI/UX to thoughtfully involve users in data collection.
• Data selection: Select the best examples to annotate in real time, using active learning to perform pool-based sampling and stream-based selective sampling.
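Three of the techniques listed above can be sketched in a few lines of Python: a double-pass annotation workflow with arbitration, a label distribution check, and pool-based sampling for active learning. This is only an illustrative sketch under stated assumptions. The item IDs, labels, model probabilities, and the `arbitrate` callback are all hypothetical, and entropy-based uncertainty sampling is just one common way to do pool-based selection; the article does not prescribe a specific method.

```python
import math
from collections import Counter


def double_pass(first_pass, second_pass, arbitrate):
    """Merge two independent annotation passes; disagreements go to an arbitrator."""
    final, disputed = {}, []
    for item_id, label_a in first_pass.items():
        label_b = second_pass[item_id]
        if label_a == label_b:
            final[item_id] = label_a
        else:
            disputed.append(item_id)
            final[item_id] = arbitrate(item_id, label_a, label_b)
    return final, disputed


def label_distribution(labels):
    """Share of each label, useful for spotting class imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pool_based_selection(pool_probs, k):
    """Uncertainty sampling: pick the k unlabeled items the model is least sure about."""
    ranked = sorted(pool_probs, key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]


# Hypothetical annotations from two independent annotators.
first = {"doc1": "spam", "doc2": "ham", "doc3": "spam"}
second = {"doc1": "spam", "doc2": "spam", "doc3": "spam"}

# Hypothetical arbitrator: stands in for a senior reviewer's decision.
final, disputed = double_pass(first, second, lambda item_id, a, b: "ham")
distribution = label_distribution(final.values())

# Hypothetical model probabilities for an unlabeled pool.
pool = {"doc4": [0.98, 0.02], "doc5": [0.55, 0.45], "doc6": [0.80, 0.20]}
to_annotate = pool_based_selection(pool, k=2)

print("disputed items:", disputed)
print("label distribution:", distribution)
print("annotate next:", to_annotate)
```

The disputed list doubles as a signal for where annotation guidelines may be unclear, and the selection step sends annotators to the examples a model finds most ambiguous rather than to a random sample.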

With these tools and techniques, ML engineers can both streamline and deepen their data curation and preparation process while vastly improving data quality and model accuracy.

Tapping The Potential Of Data-Centric AI

AI technology never stands still, and businesses need to evolve and grow with it. With the shift from model-centric to data-centric AI, data quality is front and center. To capitalize on this powerful strategy, data scientists and ML engineers, rather than working directly on ML models, must redirect their efforts to creating impeccable datasets across industries and use cases. The sooner they build a toolkit to do this, the better equipped they will be to power highly accurate models and highly effective AI across the board.



From: forbes
URL: https://www.forbes.com/sites/forbestechcouncil/2022/07/14/in-data-centric-ai-its-all-eyes-on-the-data/
