The data-driven renaissance of artificial intelligence

Artificial intelligence, or more specifically machine learning, has been around for decades. The widely popular Support Vector Machine (SVM), for example, was invented in 1963 by Vladimir N. Vapnik and Alexey Ya. Chervonenkis. Neural networks, which are enjoying a renaissance today, originated back in the 1940s, and even deep learning has been around since about the year 2000, with the expression itself even older. So aside from the fundamental algorithms, what has changed in the recent decade that has made machine learning and artificial intelligence really take off? The answer doesn’t lie in the algorithms themselves, but in the surrounding advances in computing and data.

Cheaper storage and better access to data
Data brings with it two main drivers for evolving machine learning: we store more of it, and we generate more of it. With the rise of the internet and advances in data storage, we now have access to large datasets and cheap storage. A one terabyte hard drive, for example, can be bought for £40, roughly £0.04 per gigabyte, while back in the 1980s the average price for one gigabyte was $437,500 (≈£329,442). Storing terabytes of data is therefore easier and cheaper today than in prior decades, which means we have more food for our algorithms to chew through.
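As a quick sanity check, the price comparison above can be worked through directly (using the figures quoted in the text and a decimal terabyte of 1,000 gigabytes):

```python
# Storage-cost comparison using the figures quoted above.
price_per_tb_today = 40.0       # GBP for a 1 TB hard drive
gb_per_tb = 1000                # decimal terabyte
price_per_gb_today = price_per_tb_today / gb_per_tb

price_per_gb_1980s = 329_442.0  # GBP (converted from $437,500)

ratio = price_per_gb_1980s / price_per_gb_today
print(f"Today: £{price_per_gb_today:.2f}/GB")
print(f"Storage is roughly {ratio:,.0f}x cheaper per gigabyte than in the 1980s")
```

That works out to storage being on the order of eight million times cheaper per gigabyte than the 1980s figure.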

Further, not only is data cheaper to store, but we also generate more of it through our digital footprint, and there are more publicly available datasets from sites such as Kaggle, PhysioNet, ImageNet and other projects and libraries. The internet also opens the door to collaborations and community-driven data projects where people create open source datasets. This harnesses the power of many people not just to upload data, but also to help label training data.

The combination of cheap storage and easy sharing makes it easier than ever to find training data for algorithms. For example, I recently used a dataset of biosignals downloaded from PhysioNet to train a model that detects fluctuations in heart rate variability. That dataset saved me from the tedious task of collecting my own data, a constraint that would otherwise have kept me from doing the project.
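The text doesn't say which heart rate variability features the model used, but one standard time-domain HRV metric is RMSSD (the root mean square of successive differences between heartbeats). As a minimal illustration, with made-up RR intervals rather than real PhysioNet data:

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences (RMSSD), a common
    time-domain heart rate variability metric."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals in milliseconds (illustrative, not real data)
rr = [812, 790, 805, 821, 799, 810]
print(f"RMSSD: {rmssd(rr):.1f} ms")
```

Lower RMSSD values roughly correspond to lower beat-to-beat variability, which is why metrics like this are a common starting point when working with heart-rate datasets.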

Access to more processing power
Processing power is now so cheap and accessible that chewing through large amounts of data is a piece of cake compared to just a few years back. For decades, computers were huge machines with low processing power, nothing like the small beasts we have today. The model mentioned in the previous paragraph, for example, was trained on the CPU of a laptop, a portable computer! Within about an hour, my i5 processor tested tons of different algorithms through the genetic machine learning library TPOT. Even 15 years ago, a machine that could run through this would have been so expensive that my master's dissertation would not have been possible. And that is before even mentioning the power of GPUs used to train, for example, deep neural networks.
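TPOT automates this kind of search over many candidate pipelines with a genetic algorithm. A minimal hand-rolled sketch of the same idea, using scikit-learn directly rather than TPOT itself (and the bundled iris dataset in place of the biosignal data), is simply to cross-validate a handful of candidate classifiers and keep the best:

```python
# Illustrative sketch of "try many algorithms, keep the best" -
# the search that TPOT automates (and extends with genetic search).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
}

# 5-fold cross-validated accuracy for each candidate
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

On a modern CPU this whole comparison runs in seconds, which is exactly the point: the expensive part of model selection is now brute compute, and brute compute is cheap.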

Accessibility for more people
In addition to the internet and hardware, we also have the cloud and related technologies for running data centers. With services such as AWS and Azure, it's easy to get your hands on vast amounts of computing power at relatively low cost, allowing us to run compute-intensive machine learning algorithms. Further, technologies such as Hadoop make it possible to analyse amounts of data that a single machine can't handle alone: by linking multiple computers together, the processing power of our machines is extended beyond their individual capabilities.
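Hadoop popularised the MapReduce model behind this: a map step that runs independently on each chunk of data, and a reduce step that merges the partial results. A toy single-machine sketch of word counting in that style (the chunks here would be distributed across machines in a real cluster):

```python
# Toy MapReduce-style word count on one machine.
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    # map: count words within one chunk (independent per chunk,
    # so each could run on a different machine)
    return Counter(chunk.split())

def reduce_counts(a, b):
    # reduce: merge two partial counts into one
    a.update(b)
    return a

chunks = ["big data big models", "big compute", "data data"]
partials = [map_chunk(c) for c in chunks]   # parallelisable step
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common(2))
```

Because the map step never looks outside its own chunk, adding machines adds throughput, which is how linked commodity computers handle datasets no single machine could.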

What's next?
The big question is how the capabilities of machine learning and data will evolve over the next decades; only time, and people with access to the secret labs, can tell. What we can say for sure is that artificial intelligence won't take over in the imminent future.