Machine learning is about learning from data
The first thing to know about machine learning is that the popular media often calls it Artificial Intelligence and makes it sound as if computers have become superintelligent and self-aware. In reality, machine learning, or AI as it is often called, is about learning patterns from data, and we don't need to fear being colonized by robots anytime soon.
That said, with the right data to train algorithms on, numerous problems can be solved and automated, from stock trading to detecting diseases to self-driving cars.
Further, there is a lot of excitement about fancy machine learning algorithms, but the fundamental building block of machine learning is, and will remain, data rather than the algorithm itself: an algorithm can only be as good as the data it is trained on. Machine learning discovers patterns that are present in the training data, and if it is fed poor data, it will not work well on new data; in other words, it will fail to generalize.
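To make the point about poor training data concrete, here is a minimal, self-contained Python sketch. Everything in it is hypothetical: the toy rule (label 1 when x >= 50), the helper names `true_label`, `predict`, `accuracy` and `fit_threshold`, and the systematic labelling error that marks every x from 50 to 69 as 0. A one-parameter threshold "model" is fitted once on clean labels and once on flawed labels, and both are then scored on held-out points neither has seen.

```python
def true_label(x):
    """Hypothetical ground truth: label is 1 when x >= 50, else 0."""
    return 1 if x >= 50 else 0


def predict(threshold, x):
    """A one-parameter model: predict 1 when x reaches the threshold."""
    return 1 if x >= threshold else 0


def accuracy(threshold, samples):
    """Fraction of (x, y) pairs the threshold model gets right."""
    return sum(predict(threshold, x) == y for x, y in samples) / len(samples)


def fit_threshold(samples):
    """'Train' by trying each training x as a threshold and keeping the best."""
    candidates = sorted({x for x, _ in samples})
    return max(candidates, key=lambda t: accuracy(t, samples))


train_xs = range(0, 100, 2)  # even numbers are used for training
test = [(x, true_label(x)) for x in range(1, 100, 2)]  # odd numbers are unseen "new data"

clean_train = [(x, true_label(x)) for x in train_xs]
# Poor data: a systematic labelling error marks every x in 50..69 as 0.
poor_train = [(x, 0 if 50 <= x < 70 else true_label(x)) for x in train_xs]

t_clean = fit_threshold(clean_train)
t_poor = fit_threshold(poor_train)

print(t_clean, accuracy(t_clean, test))  # prints: 50 1.0
print(t_poor, accuracy(t_poor, test))    # prints: 70 0.8
```

Both fitted thresholds reach perfect accuracy on their own training data. Only the held-out score reveals that the model trained on poor labels learned the wrong boundary (70 instead of 50) and misclassifies the unseen points between 51 and 69.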
The most time-consuming task in machine learning is usually handling data
Most of the hype around machine learning is about tuning models, but in reality, preparing the data is the truly tedious task. First, the data has to be fetched. Sometimes it comes ready-made and well structured, but in most cases it is a big mess that has to be pulled from complicated APIs, extracted from a spiderweb of systems, or even collected from the ground up. Once the data is finally fetched, it most likely has to be transformed and cleaned: joining tables, filling in missing values, removing outliers, doing feature extraction to derive new features, and much more. This process can take days, weeks or months, and sometimes the biggest frustration is simply untangling how to extract the data from the various systems and approaches involved.
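The cleaning steps listed above can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not a recipe: the two tables (`customers` and `orders`), their fields, the median fill for missing ages and the fixed `MAX_PLAUSIBLE_AMOUNT` cap used to drop outliers are all hypothetical choices made for the example. It fills a missing value, removes an implausible order, joins orders onto customers, and extracts two new features per customer (order count and total spend).

```python
from statistics import median

# Hypothetical raw tables, as they might arrive from two different systems.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 29},
    {"id": 4, "age": 41},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 95.0},
    {"customer_id": 3, "amount": 9999.0},  # implausible value, treated as an outlier
    {"customer_id": 4, "amount": 60.0},
]

MAX_PLAUSIBLE_AMOUNT = 1000.0  # hypothetical cap chosen for illustration


def fill_missing_ages(rows):
    """Replace missing ages with the median of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = median(known)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]


def drop_outliers(rows, cap=MAX_PLAUSIBLE_AMOUNT):
    """Keep only orders whose amount is at or below the cap."""
    return [r for r in rows if r["amount"] <= cap]


def build_features(customers, orders):
    """Left-join orders onto customers and derive per-customer features."""
    features = {
        c["id"]: {"age": c["age"], "n_orders": 0, "total_spend": 0.0}
        for c in customers
    }
    for o in orders:
        row = features.get(o["customer_id"])
        if row is not None:  # ignore orders with no matching customer
            row["n_orders"] += 1
            row["total_spend"] += o["amount"]
    return features


features = build_features(fill_missing_ages(customers), drop_outliers(orders))
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

In practice these steps are often done with a dataframe library, and outlier rules are derived from the data or from domain knowledge rather than a hard-coded cap; the sketch only shows the shape of the work.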
Deep Learning isn’t the potion that will fix the world – just an ingredient
Deep learning is receiving a lot of attention these days, and it is rightly credited with numerous advances across a range of problems. Deep learning especially shines when we have access to huge amounts of data, as it is able to find complex patterns even in messy data, essentially automating some of the work otherwise done through feature engineering and data transformation. This applies especially to computer vision problems, where a network can learn useful image features directly from raw pixels instead of relying on hand-crafted ones.
Even so, deep learning isn't magic that works out of the box on any problem. The first prerequisite is enough (decent) data, and even with plenty of data, significant time has to be invested in harvesting and transforming it, not to mention the time it takes to tune a deep neural network. Despite its fancy name and the problems it solves, it won't rise to become some self-aware superintelligence ready to take over the world.