While everyone has an opinion on the field of Artificial Intelligence (AI), there are still some objective points that one should follow when thinking about building an AI solution.
In this post, I will give the reader some tips and tricks on what to consider when starting an AI-oriented project.
1. Go simple
It might sound obvious to say that you should always start from the bottom of the problem and let the solution grow in complexity only as the problem does. However, most of the time people tend to think that AI is the solution to all our problems, and that it is so magical that it must also be easy to build.
The idea behind going simple is this: if we have a business problem that we would like to solve using tools such as Machine Learning (ML), it is easier to start with simple ideas and simple models. These give us coherent results and a good understanding of what we want to do, and from there we can start advancing and building bigger solutions. It is good to start by drafting some hypotheses tied to business objectives and trying to answer them with the data, so you will have a better picture of what you actually want to build and the capabilities you need for it.
2. It is a slow process
Building AI solutions is not an easy task. I like to encourage people to think about machine learning as a set of tools that are interchangeable across fields and that are used to explain patterns in all kinds of situations. For that reason, I strongly believe that a solid understanding of the field or business you are working in is crucial, and a necessary amount of time will be spent on research before digging into building any type of solution.
People also tend to think that building AI is a fast process, since there are a lot of open source libraries out there that help us when building a model. Those libraries still have to be vetted, and that takes time: we should be aware of malicious packages, especially if we are building a corporate solution.
3. High-quality data is important
It is quite common to hear people (mainly those not that familiar with the field of AI) say that any data is fine to build your models on. This is completely wrong! Make sure that your data corresponds to the business case you are working with. High-quality data is important and difficult to obtain, so don’t blindly trust any source; make sure that your data is reliable.
4. Getting more data will be one of your tasks
It is sometimes funny to hear complaints about the amount of time it takes to obtain data. Yes! It is true! Getting data is difficult, and you might find yourself spending tons of time looking for it, making sure it is of good quality, or even building your own dataset for the business purpose. This is not a bad thing. It is a laborious task that no one wants to do, but it is sometimes necessary.
5. You might need to transform the data
Transforming the data is definitely the most creative process of the data science pipeline. It is sometimes more an art than a skill. Data comes in many different forms, and it is our job to figure out the form that is easiest for the model to interpret. As an example, audio data such as voice recordings is normally processed and transformed (Fourier transform) to generate the spectrogram, a visual representation of how the spectrum of frequencies of the audio signal varies with time. This format is usually easier for the model to interpret than the raw time sequence.
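To make the audio example concrete, here is a minimal sketch of turning a signal into a spectrogram using only the Python standard library. The function name, the frame size, the hop length and the test tone are all assumptions chosen for illustration; a real project would more likely rely on a numerical library than on this slow, naive Fourier transform.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # A Hann window softens the frame edges before transforming.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive discrete Fourier transform, non-negative frequencies only.
        for k in range(frame_size // 2 + 1):
            total = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz for 0.1 seconds.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]

spec = spectrogram(tone, frame_size=FRAME_SIZE)
# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(f"{len(spec)} frames, strongest frequency in first frame: {peak_hz} Hz")
```

Each row of the result is one slice of time and each column one frequency band, which is the image-like layout a model can work with.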
6. Split to validate
Make sure that you properly split your data and hold out validation and test sets, so you can see how well the model performs on new data. Accuracy is a common headline metric for evaluating a classification model, but it can be misleading on its own, for example when the classes are imbalanced. Other metrics, such as the confusion matrix or the area under the curve, help describe the complete performance of the model.
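As a rough illustration of splitting and of why accuracy should be read alongside other metrics, here is a small standard-library sketch. The helper names, the 25% test fraction and the toy labels are assumptions made up for this example, not a recommended setup.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def recall(counts):
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


# Splitting: 100 hypothetical row ids, a quarter held out for testing.
train, test = train_test_split(range(100))

# Metrics: hypothetical imbalanced labels (95 negatives, 5 positives)
# scored against a "model" that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
counts = confusion_counts(y_true, y_pred)
print(counts, "accuracy:", accuracy(counts), "recall:", recall(counts))
```

The always-negative model scores high on accuracy while finding none of the positive cases, which only the confusion counts and recall reveal.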
7. Train different machine learning models
To better assess a model, it is good to have the performance of other models for comparison. Of course, this requires time.
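One simple way to get that comparison is to score every candidate on the same held-out data, including a trivial baseline. The sketch below does this with a majority-class baseline and a 1-nearest-neighbour classifier on synthetic data. The data generator, the model choices and the 150/50 split are assumptions for illustration only.

```python
import random


def make_data(n=200, seed=0):
    """Hypothetical toy data: the label is 1 when x + y > 1."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in points]


def majority_baseline(train):
    """Always predict the most frequent training label."""
    labels = [label for _, label in train]
    winner = max(set(labels), key=labels.count)
    return lambda point: winner


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_neighbour(train):
    """Predict the label of the closest training point (1-NN)."""
    def predict(point):
        closest = min(train, key=lambda row: squared_distance(row[0], point))
        return closest[1]
    return predict


def accuracy(model, test):
    return sum(model(p) == label for p, label in test) / len(test)


data = make_data()
train, test = data[:150], data[150:]

# Every model is scored on the same held-out rows.
scores = {
    "majority baseline": accuracy(majority_baseline(train), test),
    "1-nearest neighbour": accuracy(nearest_neighbour(train), test),
}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
```

If a more complex model cannot clearly beat the baseline, that is usually a sign to revisit the data or the problem framing before investing in tuning.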
8. Keep your code clean
Data Science projects can end up being very messy. It is important to have a good and structured environment where the different functions, models, notebooks, etc., are kept together and clean as the project grows.
9. The model should go to production
If you are starting to implement a small idea that will likely become more advanced, always keep in mind that if you intend to generate any type of value out of it, you should be prepared to adapt quickly and to start thinking about scalability and easy ways of deployment.
10. Monitor the model
Monitoring the model is something that a company will tend to hand over to its IT services. It should also concern us, especially if we are building real-time solutions, even though it will take some time to get good feedback out of it. We need to monitor the model to assess its performance once it is in use.
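A very small version of such monitoring is to track accuracy over the most recent predictions once their true outcomes are known. The class below is a hypothetical sketch; the window size and alert threshold are arbitrary assumptions, and a production setup would also need logging, storage and alerting around it.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window=100, alert_below=0.8):
        # Each entry is True when the prediction matched the actual outcome.
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # no feedback received yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        current = self.accuracy()
        return current is not None and current < self.alert_below


# Hypothetical stream of (prediction, actual) pairs arriving over time.
monitor = RollingAccuracyMonitor(window=5, alert_below=0.8)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (1, 1), (0, 1), (0, 1)]:
    monitor.record(prediction, actual)

print("rolling accuracy:", monitor.accuracy())
print("needs attention:", monitor.needs_attention())
```

Because the window only keeps the latest outcomes, a model that was fine at launch but degrades later will still trigger the check.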
11. From local to global explainability
Transparency is a key factor when building AI solutions, as we want our model to generate tremendous business value without being misused. Global explainability is about understanding how a model behaves overall, which is what lets simpler, interpretable models be used with confidence. Local explainability is about explaining individual predictions, which is what makes complex models more interpretable.
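To illustrate the difference, the sketch below takes "global" to mean a view of the whole model and "local" to mean a view of one prediction, using a linear scoring model where both are easy to compute. The feature names, weights and applicant values are invented for this example, and ranking by absolute weight assumes the features are on comparable scales.

```python
# Hypothetical linear scoring model: score = BIAS + sum(weight * value).
WEIGHTS = {"income": 0.5, "debt": -0.8, "age": 0.1}
BIAS = 0.2


def predict(features):
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def global_importance():
    """Global view: rank features by absolute weight across the whole model."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


def local_contributions(features):
    """Local view: how much each feature moved this one prediction."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


# One hypothetical applicant with already-scaled feature values.
applicant = {"income": 2.0, "debt": 0.5, "age": 3.0}
contributions = local_contributions(applicant)
top_local = max(contributions, key=lambda name: abs(contributions[name]))

print("global ranking:", global_importance())
print("local contributions:", contributions)
print("feature that mattered most for this applicant:", top_local)
```

Here the model as a whole weighs debt most heavily, yet for this particular applicant income contributes the most, which is the kind of gap the two views are meant to expose.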
12. Model impact
Last but not least, make sure to assess the impact and risks of the model in other contexts, such as social life and human rights! Be aware!
Am I missing something?