Why Companies shouldn’t invest in AI/ML yet

Samathur Li Kin-kan, a well-known Hong Kong-based real estate investor, made the news for something he would rather have avoided: his much-publicized investment in an AI engine, deployed on a supercomputer with some of the fanciest technical specs, failed miserably. “K1”, the AI-based supercomputer, managed part of his fortune with the aim of growing his funds. Instead, the Artificial Intelligence (AI) engine regularly lost up to US$20 million a day, according to a Bloomberg story. There are many such examples of AI failing badly, especially in the last 2-3 years. Organizations are racing to implement AI and to see a return on their investments. How many of them succeed? It is striking how many POCs get built, and how rarely those POCs see the light of day as full-fledged products.

Let’s dig deeper and understand what goes into these glamorous Artificial Intelligence (AI) based systems. Take an example: an AI engine needs to be deployed for defect detection in the packaging industry. The task is to ensure that the Machine Learning based system identifies all the defective packages, of which there are roughly 200 in every 5,000 products. How accurate the system becomes depends heavily on the training data set that a Data Engineer typically feeds into the Machine Learning algorithm to teach it what to look for. Put plainly, how well an AI system is trained is one of the most critical factors in how accurate and performant it will be. Training data is the key ingredient of any AI system. Data Engineers who work with training data sets understand that these are the examples that teach a Machine Learning algorithm which patterns to look for, so that the AI system can take real-life actions. In this example, the algorithm needs to be trained to recognize how a well-packaged product differs from one that is not up to the mark, i.e., a defective package, where defects can include packaging flaws, misplaced labels, color variation, design issues, disoriented patterns, etc. The Machine Learning algorithm then uses all of these examples to figure out which features are important in distinguishing the classes, for example defective versus non-defective products. Based on this training, the algorithm can then look for similar features in future data to classify it.

After the data engineers find an efficient way to collect this data and train the AI system with it, the data scientist can work with subject matter experts to create the neural network needed to solve the problem.
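
To make the numbers in this example concrete, here is a minimal Python sketch of why 200 defects in 5,000 packages makes plain accuracy a misleading yardstick. The helper names and the "never flag anything" baseline are illustrative assumptions, not part of any particular system.

```python
def accuracy(y_true, y_pred):
    """Fraction of packages whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly defective packages that the model flagged."""
    actual = sum(1 for t in y_true if t == positive)
    found = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return found / actual if actual else 0.0


# 5,000 packages, 200 of them defective (1 = defective, 0 = fine).
y_true = [1] * 200 + [0] * 4800

# Hypothetical baseline that never flags anything as defective.
always_fine = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_fine)  # 4800 / 5000
baseline_recall = recall(y_true, always_fine)      # 0 / 200
```

This baseline is right on 96% of packages while catching none of the defects, which is why a defect detector should be judged on how many defective packages it actually finds, not on accuracy alone.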

The data engineer’s role does not stop at collecting photos of defective and non-defective packages. They may find that the factory is also collecting additional data, manually or from existing devices, hardware, sensors, etc., which can be combined with the existing training data set to help predict why and how defects arise and whether the factory can optimize its current workflow to reduce them.

Hence, training data forms the backbone of every AI and ML system; without it, it is not possible to train a machine that learns from humans and predicts for humans. Typically, when we see AI engines failing, it is for one of these two reasons:
– The AI engine is sub-optimally or wrongly trained.
– There is not enough training data, and the organization has failed to put in place the infrastructure to build data pipelines that can feed the AI systems and enhance their performance.

Data engineers and data engineering automation tools are the most significant pillars that make any AI system succeed or fail. Typically, Data Engineers help organizations create robust production data pipelines that feed machine learning models the growing amounts of disparate data they require. The role of Data Engineers and data engineering automation tools is to apply best practices in data engineering and management in support of machine learning. Their core focus is collecting, cleansing, transforming, and governing “new” and big data for analysis and training. Enterprises and startups may have used traditional AI systems based on human logic and models, but now they must adopt automated, usable and classified data discovery practices for the enormous amount of data flowing through their business transactions.
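
As a rough illustration of the cleansing and transforming steps just described, the following Python sketch turns a handful of messy records into training examples. The record fields (`package_id`, `label`, `weight_g`), the sample values, and the function names are all hypothetical, chosen only to fit the packaging example; a real pipeline would have many more rules.

```python
# Hypothetical raw records as they might arrive from the factory floor.
RAW_RECORDS = [
    {"package_id": "A1", "label": "defective", "weight_g": "512.0"},
    {"package_id": "A2", "label": " OK ", "weight_g": "498.5"},
    {"package_id": "A3", "label": None, "weight_g": "501.2"},  # missing label
    {"package_id": "A4", "label": "ok", "weight_g": "n/a"},    # unparsable reading
    {"package_id": "A2", "label": "ok", "weight_g": "498.5"},  # duplicate id
]

VALID_LABELS = {"ok", "defective"}


def cleanse(records):
    """Drop unusable or duplicate records and normalize the rest."""
    seen_ids = set()
    for rec in records:
        label = rec.get("label")
        if label is None:
            continue  # cannot train on an unlabeled example
        label = label.strip().lower()
        if label not in VALID_LABELS:
            continue
        try:
            weight = float(rec["weight_g"])
        except (TypeError, ValueError):
            continue  # sensor reading is not a number
        if rec["package_id"] in seen_ids:
            continue  # keep only the first record per package
        seen_ids.add(rec["package_id"])
        yield {"package_id": rec["package_id"], "label": label, "weight_g": weight}


def transform(records):
    """Turn clean records into (features, target) pairs; 1 means defective."""
    for rec in records:
        yield [rec["weight_g"]], 1 if rec["label"] == "defective" else 0


training_set = list(transform(cleanse(RAW_RECORDS)))
```

Of the five raw records, only two survive as training examples here. That loss of usable data is exactly what a production pipeline needs to measure and report.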

Best practices lead to better results. For any Artificial Intelligence-based system, more is better: having more, and more diversely classified, training data brings more accurate results. Training data sources can be internal or external to the organization. Without data engineering, there is no data. Without data, there is no machine learning and no AI. Data science needs data upon which to apply algorithms. More data means better predictions. Conversely, low-quality data leads to low-quality machine learning results. Hence, seeking out tools that can ensure standardization and accuracy is always of key importance.

Data is the new fuel for companies in the 21st Century. If you’re not making moves based on the data you have, you’re missing out. Data practitioners use data to solve business problems, disrupt existing business models and innovate new products. We need to bring in automation systems to establish data lakes and real-time data streams. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions. We need data engineering tools and a team to build the pipelines that fill up these lakes.
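
The "store as-is, structure later" idea can be sketched in a few lines of Python. This is only a toy stand-in for a data lake, using a local temporary folder and JSON Lines files; the function names, source names, dates, and record fields are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root, source, day, records):
    """Append records unchanged to <lake_root>/<source>/<day>.jsonl."""
    target = Path(lake_root) / source / f"{day}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return target


def read_raw(lake_root, source):
    """Yield every stored record for one source, oldest file first."""
    for path in sorted((Path(lake_root) / source).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    # Two sources with different shapes land side by side, no shared schema.
    land_raw(tmp, "camera", "2024-01-01", [{"package_id": "A1", "image": "a1.png"}])
    land_raw(tmp, "scale", "2024-01-01", [{"package_id": "A1", "weight_g": 512.0}])
    camera_events = list(read_raw(tmp, "camera"))
    scale_events = list(read_raw(tmp, "scale"))
```

Nothing is validated or reshaped on the way in; structure is imposed only when a downstream job reads the files, which is the trade-off a data lake makes.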

Elon Musk says we should be worried about AI, while Bill Gates says it will make life easier. Either way, with no data engineering, you have no AI systems.

In the world of Artificial Intelligence, a lack of data, and of the ability to manage, classify and leverage that data, becomes the bottleneck that keeps AI systems from solving real-world problems with high accuracy and precision. As data scientists like to say: data engineers are the building blocks that enable all the components of the AI ecosystem to work together. They accomplish this by creating and maintaining efficient databases, building data pipelines, monitoring and managing all the data systems (scalability, security, etc.), and implementing data scientists’ output in a reliable and scalable manner.

If you don’t want your company to be left behind, make sure you’re paying attention to your data engineering now so that you can move on to advanced analytics and AI Systems before it’s too late.