Why Data Collected Matters Most For AI

Artificial intelligence and its associated technologies couldn't have arrived at a better time, as we face some of the most difficult and complex problems, ones that we can't possibly solve on our own. Artificial intelligence (AI), in simple terms, means intelligence demonstrated by machines that can mimic the cognitive functions of humans.

Machine learning (ML) is a subset of artificial intelligence that involves using algorithms and statistical models to allow computers to 'learn' from data, provide deep insights and improve based on past experiences, similar to how we humans operate. Even though the concepts of AI and ML have existed since the 1950s, it is only in recent times that they have gained prominence.

There are many reasons for this, but one of the most important is data. Data is the lifeblood of AI and ML, and it is only in recent times, owing to the popularity of the internet, that the necessary volumes and types of data have existed.

Several industries today gain a competitive advantage by integrating machine learning systems into their operations. Such systems are used extensively in agriculture, banking, marketing, search engines, healthcare, speech recognition and a plethora of other areas. The analysis and predictions they provide are invaluable, and the whole process starts with data.

Relevant Data - Building A Strong Foundation

Since data is so important, it deserves proper attention right from the very first stage: data collection. This requires a well-designed data infrastructure. An ML algorithm is only as good as the data we feed into it, so it is very important to ensure that the data being collected is of good quality. Bad data can lead to insights that are not actionable and results that are misleading, and it wastes valuable time and resources.

To understand what makes good data, we can take the agriculture sector as an example. The goal of agriculture is to maximise productivity while minimising waste. To achieve this, data needs to be collected on as many relevant variables as possible. This can include data on soil types, conditions and fertility; weather data such as temperature, humidity, wind speed and rainfall; seed quality and variety; yield; types of crop protection chemicals; types of diseases; and a whole host of others.

All of this data needs to be highly accurate and stored in a common format that can be imported into a single system, so that relevant models can be built on it. For example, disease detection is essential for minimising crop loss. Here, hundreds of thousands of photos of diseased plants serve as the right data: they train the ML model to recognise the type of disease and its severity through pattern recognition, so that pesticides can be applied in a timely and targeted way, which further reduces the resources required.
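
As a rough illustration of what "accurate and in a common format" can mean in practice, here is a minimal Python sketch. The field names (`field_id`, `temp_f`, `temp_c`, `humidity_pct`, `rainfall_mm`) and the plausibility ranges are assumptions made up for this example, not part of any real agricultural dataset or standard.

```python
from dataclasses import dataclass


@dataclass
class FieldReading:
    """Hypothetical common schema for one weather reading from a field."""
    field_id: str
    temperature_c: float
    humidity_pct: float
    rainfall_mm: float


def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0


def normalise(raw: dict) -> FieldReading:
    """Convert one raw record into the common schema.

    Raises KeyError if a required field is missing and ValueError if a
    value is not numeric or falls outside the assumed plausible range.
    """
    # Some sources may report Fahrenheit; store everything in Celsius.
    if "temp_f" in raw:
        temp_c = fahrenheit_to_celsius(float(raw["temp_f"]))
    else:
        temp_c = float(raw["temp_c"])
    humidity = float(raw["humidity_pct"])
    rainfall = float(raw["rainfall_mm"])

    # Assumed plausibility limits, chosen only for illustration.
    if not -60.0 <= temp_c <= 60.0:
        raise ValueError(f"implausible temperature: {temp_c}")
    if not 0.0 <= humidity <= 100.0:
        raise ValueError(f"implausible humidity: {humidity}")
    if rainfall < 0.0:
        raise ValueError(f"negative rainfall: {rainfall}")

    return FieldReading(str(raw["field_id"]), round(temp_c, 2), humidity, rainfall)


def clean(records: list) -> tuple:
    """Split raw records into normalised readings and rejected ones."""
    good, rejected = [], []
    for raw in records:
        try:
            good.append(normalise(raw))
        except (KeyError, ValueError) as exc:
            rejected.append((raw, str(exc)))
    return good, rejected
```

Keeping the rejected records alongside the reason for rejection, rather than silently dropping them, makes it easier to trace quality problems back to the source that produced them.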

A recent HR fiasco at Amazon provides one more example of the importance of the right data. In late 2018, it was reported that Amazon had been using machine learning to screen the resumes of job applicants. The ML system shortlisted mostly male candidates because it was trained on historical data on technical positions at the company, which had been held mostly by men. The ML system replicated that pattern in its results, leading to a bias against female candidates.
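
One simple precaution suggested by this example is to look at how groups are represented in the training data before any model is trained. The sketch below is only an illustration: the record layout, the `gender` key and the 30% threshold are assumptions made up here, and it is not a description of how Amazon's system worked or was audited.

```python
from collections import Counter


def group_shares(records: list, key: str) -> dict:
    """Return each group's share of the records, as a fraction of the total."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(records: list, key: str, min_share: float = 0.3) -> list:
    """List groups whose share falls below min_share (an assumed threshold)."""
    shares = group_shares(records, key)
    return sorted(group for group, share in shares.items() if share < min_share)


# Made-up training set: 8 records for one group, 2 for the other.
training_data = [{"gender": "male"}] * 8 + [{"gender": "female"}] * 2
flagged = underrepresented(training_data, "gender")
```

A check like this does not remove bias by itself, but it can flag a skewed dataset early, while there is still time to collect more representative data.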

Therefore, it is important to come up with a proper data strategy that ensures smooth, efficient and accurate data collection. This in turn helps machine learning algorithms do what they do best: analyse data, learn from it and provide ever better results to help achieve the goals of a person or organisation.


