Machine Learning Introduction
Machine Learning (ML) is the term most commonly associated with Artificial Intelligence (AI), and it has a very wide range of applications in business. From sales prediction and bank fraud detection to the optimisation and automatic planning of processes, ML can tackle a variety of issues in all branches of a company.
Broadly speaking, it can be defined as a set of statistical and computational tools and algorithms that automate the construction of a prediction function from a set of data (the 'training data'). Two types of problems can be covered:
Regression problems: predict a value or build a mathematical model (sales price forecasts, optimisation of operators based on the resources currently at hand, etc.);
Classification problems: determine the groups to which the items of a dataset belong (identification of objects in a picture, market and client segmentation, etc.).
Three types of methods can be distinguished:
- Supervised Learning;
- Unsupervised Learning;
- Reinforcement Learning.
Machine Learning Supervised Learning
What is it, an overview:
- Prediction of the driving time for a given itinerary:
By considering the weather forecast, the time of day and the chosen itinerary, a driving time can be estimated by correlating these elements. With a supervised learning approach, an algorithm is fed a set of training data containing the driving times recorded for different times of day, weather conditions and itineraries. The resulting model can then predict the driving time for new routes according to the parameters provided.
- Robotic sorting on a production line:
A robot equipped with miniature visual sensors is assigned to sort different types of components on a production line. Once a day, an operator checks the robot's work, validating the successes and pointing out the errors. The robot then uses the data linked to its mistakes to improve, by adjusting the criteria used in the selection procedure.
- Intelligent trading system:
Intelligent trading systems offer real-time analysis of structured data (databases, spreadsheets) and unstructured data (social networks, news media) in order to make trading decisions. Predictions are made using data from past transactions as well as the investor's short-term and long-term objectives, in order to recommend the best investment strategy. The artificial intelligence learns from previous transactions to reproduce the right investment decisions made under similar conditions. There are now investment funds in which all transactions are carried out by AIs.
How it works
The purpose of supervised learning is to produce a prediction function: a mathematical model that predicts a value (the output) from the attributes of a given data set (the input). The learning phase involves feeding the algorithm a labelled set of data, for which the output is known, so that the algorithm can learn the links between the characteristics of the data and the output.
For example, to predict the evolution of real estate prices, the output value is a forecast of real estate prices depending on the time of year, the interest rates and the geographical zone.
The programmer labels the training data set (the price of a house according to the input values: the time of year, the interest rates, etc.).
The algorithm learns the inferred function from the training data set. In other words, it learns the correlation between the price of a house and the input values.
Once the accuracy of the model has been verified on a test data set, distinct from the training data, it can be used.
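The three steps above (label, learn, verify) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data points are invented, a single input (the interest rate) stands in for all the input values, and a least-squares line plays the role of the prediction function. The names `labelled`, `fit_line` and `mean_abs_error` are hypothetical.

```python
# Hypothetical labelled data: (interest rate in %, house price in thousands).
# The numbers are invented for illustration only.
labelled = [
    (1.0, 291.0), (1.5, 284.0), (2.0, 281.0), (2.5, 274.0),
    (3.0, 270.0), (3.5, 266.0), (4.0, 259.0), (4.5, 256.0),
]


def fit_line(points):
    """Learn price = slope * rate + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute gap between predicted and known prices."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


# Step 1: the labelled data is split into a training set and a test set.
train, test = labelled[::2], labelled[1::2]
# Step 2: the algorithm learns the prediction function from the training set.
model = fit_line(train)
# Step 3: the model is checked on data it has never seen.
error = mean_abs_error(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test error={error:.2f}")
```

If the test error is small enough for the intended use, the model can be applied to new inputs; otherwise more data or a richer model is needed.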
When to use it & major models:
The main limitation of these learning models lies in the fact that the algorithms depend on the expert's labelling of the training data. On the other hand, supervised learning is interpretable: the links between the inputs and the outputs can easily be understood by the operator. Nonetheless, the operator needs strong expertise and a deep knowledge of the problem to verify the accuracy of the data set used during the learning phase.
You will find below different examples of supervised learning methods:
Linear regressions are used to link the evolution of a parameter we want to optimise to that of parameters we can easily measure or control. This method makes it possible, for example, to predict the best budget allocation between the different channels (TV, radio, newspapers, social networks, etc.) of a communication campaign using the results of previous campaigns.
Decision trees are visual representations that can be interpreted very easily: an output value is assigned to each data input using elementary decision rules. They can be used to automate a phase of a recruitment process or to build a process for suggesting apartments on a real estate agency website.
Example of a decision tree for apartment suggestions. 'Bedroom Hall Kitchen' (BHK) is a term used to classify apartments according to their size. In this case, criteria regarding the client's situation are used to suggest the most suitable apartment as efficiently as possible. The advantage of this type of diagram lies in its ease of interpretation and in the speed of execution of the method.
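To make the idea of elementary decision rules concrete, here is a small hand-written tree for apartment suggestions. It is a sketch only: the features (`household_size`, `monthly_budget`), the thresholds and the resulting suggestions are all hypothetical and are not taken from the figure described above. In practice such a tree would be learned from labelled data rather than written by hand.

```python
# A tiny decision tree stored as nested dictionaries.
# Each internal node asks: is client[feature] <= threshold ?
# All features and thresholds below are hypothetical.
TREE = {
    "question": ("household_size", 2),
    "yes": {
        "question": ("monthly_budget", 1000),
        "yes": "1 BHK",
        "no": "2 BHK",
    },
    "no": {
        "question": ("monthly_budget", 1500),
        "yes": "2 BHK",
        "no": "3 BHK",
    },
}


def suggest(tree, client):
    """Follow one elementary rule per level until a leaf (a suggestion) is reached."""
    node = tree
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if client[feature] <= threshold else node["no"]
    return node


print(suggest(TREE, {"household_size": 4, "monthly_budget": 2000}))
```

Each suggestion can be explained by reading the path taken through the tree, which is why this family of models is considered easy to interpret.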
Machine Learning Unsupervised Learning
What is it, an overview:
Unsupervised learning is often associated with the term clustering. Indeed, clustering is one of its most widely used applications in companies, particularly for customer segmentation. This process is key to building a sales strategy adapted to the customer and to anticipating their behaviour. Thus, by analysing easily accessible indicators such as the date of the last purchase, the average time between purchases, or whether products were bought on sale, it is possible to create "customer groups" that are then targeted by different commercial measures in order to reach them all effectively.
A well-known concrete example:
At first, the customer segmentation of multinational e-commerce companies, such as Amazon, was based on a number of simple elements:
- What a customer has already bought in the past;
- The items they have in their virtual cart;
- The items they have rated and liked;
- What items other clients were interested in or bought.
This type of mathematical sorting, known as "item-to-item collaborative filtering", is used to heavily customise the customer's browsing experience and to encourage them to return to the website.
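The principle of item-to-item collaborative filtering can be illustrated with a short sketch. It is not the algorithm used by any particular company: the purchase history is invented, only past purchases are considered (not carts or ratings), and cosine similarity between the sets of buyers is one possible choice of similarity among others.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "alice": {"tent", "sleeping bag", "stove"},
    "bob": {"tent", "sleeping bag"},
    "carol": {"tent", "stove", "novel"},
    "dave": {"novel", "bookmark"},
}


def buyers_of(item):
    """Customers who bought the given item."""
    return {customer for customer, items in purchases.items() if item in items}


def similarity(item_a, item_b):
    """Cosine similarity between two items, based on who bought them."""
    a, b = buyers_of(item_a), buyers_of(item_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(customer, top_n=2):
    """Rank items the customer does not own by similarity to the items they own."""
    owned = purchases[customer]
    catalogue = set().union(*purchases.values())
    scores = {
        candidate: sum(similarity(candidate, item) for item in owned)
        for candidate in catalogue - owned
    }
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("bob"))
```

Because the similarity is computed between items rather than between customers, it can be prepared in advance and reused for every visitor.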
- Identification of incorrect documents:
One of the major objectives of AI is to remove the need for human intervention in simple, repetitive and lengthy tasks. Automatic document analysis underpins the pre-processing of documents by algorithms capable of identifying errors in data, proofreading documents before publication or extracting information from documents in a particular format, such as financial reports. Such processing saves time, improves reliability and significantly reduces costs.
In this case, unsupervised learning is particularly relevant because identifying errors in documents is a very time-consuming and laborious task. When an algorithm is provided with a large number of documents of the same type and with the means to check the consistency of the information, it can learn to identify the types of errors and apply this reasoning to future documents.
How it works:
Unlike supervised learning, unsupervised learning is used when the algorithm must work from unlabelled data: the "output" is not known in advance.
The most common unsupervised learning method is clustering, which is used to perform exploratory data analysis in order to detect hidden patterns or define subgroups in the data. Clusters (data groups) are built using a similarity measure, an index that assigns a similarity score to items of data by combining their attributes. For example, in the case of customer profiles on the web, customers will have a high similarity score if they have purchased the same products and/or belong to the same sociodemographic category.
The algorithm receives an unlabelled set of data (for example, a set of data describing customers' "itineraries" on a website);
The algorithm produces a model from the data set: rules linking purchase behaviour to the itinerary on the website;
The algorithm identifies subgroups of data representing similar customer behaviour, i.e. types of itinerary on the website.
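As an illustration of these steps, the sketch below groups unlabelled customer data with k-means, a common clustering algorithm that the text does not name explicitly. The assumptions are stated plainly: each customer is described by two invented measurements (pages viewed per visit, purchases per month), similarity is measured by Euclidean distance, and the starting centroids are chosen by hand instead of at random so that the result is reproducible.

```python
def distance_sq(p, q):
    """Squared Euclidean distance: the smaller it is, the more similar the points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Centre of a group of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance_sq(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical, unlabelled data: (pages viewed per visit, purchases per month).
customers = [(2, 1), (3, 1), (2, 2), (20, 8), (22, 9), (21, 7)]
centroids, clusters = k_means(customers, centroids=[(2, 1), (21, 7)])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No label is ever provided: the two groups emerge only from the similarity between customers.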
When to use it & major models:
On paper, unsupervised learning methods seem like the most suitable solution to most ML problems, as they do not need a labelled set of data. Yet these methods can only be used in very specific contexts, such as:
• Clustering makes it possible to separate a data set into underlying subgroups with similar characteristics. For example, clustering enables an online purchasing platform to separate its customers into logical groups that reflect their consumption habits. This is how advertising is targeted to better suit a customer who spends a lot of time comparing offers or a customer who has an impulsive profile;
• Outlier detection: in many situations it is essential to detect outliers, or abnormal values, as accurately as possible. A bank can use machine learning techniques to combine information such as the customer's ID, the customer's consumption habits, the geographic location of an internet purchase and the order time, in order to detect whether a banking transaction is fraudulent and block it if necessary: this is the principle of fraud prevention;
• Dimension reduction: when a database contains many measurements, it is important to keep only the significant ones, which is the purpose of dimension reduction. This is how an advertising company obtains a measure of the importance of variables such as gender, age, education level, salary or size, in order to target specific populations and increase the effectiveness of its strategy. Indeed, it is very likely that the correlation between salary and education level is strong, making it unnecessary to consider both measures.
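Two of the uses listed above can be sketched with elementary statistics. The example is deliberately simplistic and rests on assumptions: the figures are invented, outliers are flagged with a z-score rule (distance from the mean measured in standard deviations), and redundancy between two variables is judged with the Pearson correlation coefficient. Real fraud detection or dimension reduction would combine many more variables.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def outliers(values, threshold=2.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    m, s = mean(values), std_dev(values)
    return [v for v in values if s > 0 and abs(v - m) / s > threshold]


def correlation(xs, ys):
    """Pearson correlation coefficient between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * std_dev(xs) * std_dev(ys))


# Hypothetical transaction amounts for one customer.
amounts = [25, 30, 28, 32, 27, 29, 31, 950]
print("suspicious amounts:", outliers(amounts))

# Hypothetical survey data: years of education and salary (in thousands).
education_years = [10, 12, 14, 16, 18]
salary = [22, 27, 31, 38, 41]
r = correlation(education_years, salary)
# A strongly correlated pair carries largely the same information.
print("keep only one of the two variables:", abs(r) > 0.9)
```

The thresholds (2.0 and 0.9) are arbitrary choices for this sketch and would need to be tuned to the data at hand.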
Machine Learning Reinforcement Learning
What is it, an overview:
• Portfolio management and investment strategies
Reinforcement learning mainly draws on heuristic theories. Financial applications are therefore possible, especially in portfolio management, such as an investment robot that improves its strategy by repeating investment scenarios:
The robot makes a transaction in a financial portfolio;
It receives a reward if this action brings it closer to maximising the total reward available, for example the highest total return of the portfolio. The robot then assigns a score to the current transaction and to those that preceded the reward. If the reward is positive, the scores of the transactions leading to it are increased, so that the robot tends to reproduce such a set of actions;
The algorithm optimises its choice of actions by correcting itself over time. Thus, when in operation, the robot determines in every state the transaction to perform by using the history of previous similar situations. It then chooses the transaction that is most favourable to it, i.e. the one with the highest score, the scores being calculated according to its investment strategy and the rewards obtained in similar situations.
How it works:
Reinforcement learning refers to goal-oriented algorithms that learn how to achieve a complex objective, expressed as a score. The score is a mathematical index defined so as to reflect the relevance of the actions carried out. For example, to teach an algorithm to play chess, it is essential to define a scoring function that decides, for each situation, whether or not it is favourable to the player.
The advantage of this type of method is that different situations can be compared quickly by evaluating a single index: the score. The scoring function depends on the problem, and its complexity determines the effectiveness of the method.
Algorithms using reinforcement learning can start from a blank slate and, under good conditions, achieve outstanding performance. These algorithms are penalised when they make wrong decisions and rewarded when they make right ones, hence the name "reinforcement".
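The reward and score mechanism can be illustrated with tabular Q-learning, one common reinforcement learning algorithm (the text does not name a specific one). Everything in this sketch is a hypothetical toy: the environment is a corridor of five positions, the agent is rewarded for reaching the last position and slightly penalised for every other move, and the values of `alpha`, `gamma` and `epsilon` are arbitrary.

```python
import random

N_STATES = 5              # positions 0..4 in a corridor
GOAL = N_STATES - 1       # reaching position 4 ends the episode
ACTIONS = ("left", "right")


def step(state, action):
    """Hypothetical environment: +1 for reaching the goal, -0.1 for any other move."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else -0.1
    return nxt, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Start from a blank sheet (all scores at zero) and learn by trial and error."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # safety cap on the length of an episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # best score so far
            nxt, reward = step(state, action)
            # Move the score towards the reward plus the best score reachable next.
            target = reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if state == GOAL:
                break
    return q


scores = train()
policy = [max(ACTIONS, key=lambda a: scores[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the action with the highest score in each position is the one that leads towards the reward, even though the agent was never told the rule explicitly.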
When to use it & major applications:
To use these methods, it is necessary to obtain a rating and evaluation at each step of the learning process.
• In game theory and multi-agent interaction, reinforcement learning has been widely used to enable software to play games. A recent example is Google's DeepMind, whose programs beat the top-ranked Go player in the world and, later, the top-rated chess engine Stockfish.
• Robotics: robots have often relied on reinforcement learning to perform better in their work environment. Reinforcement learning has the advantage of being a scalable solution for robots that may have to deal with unknown or constantly changing environments. For example, robots that use sensors to analyse their 3D environment, such as mechanical arms like the Smart Tissue Autonomous Robot, can assist surgeons in operations where human precision is insufficient.
• Vehicle navigation: the more information a vehicle receives, the better it navigates on the road. Indeed, autonomous driving is only possible if the environment is correctly perceived. However, it is impossible to simulate all the situations that a vehicle may encounter on the road. It is in this context that reinforcement learning appears indispensable: the use of multiple sensors provides a large amount of data to analyse. Reinforcement learning algorithms identify the different actors in the environment: other vehicles, living beings or traffic signs. They can then correlate the current situation with situations previously encountered to make the necessary choice during navigation. Experts estimate that it will take about ten billion kilometres of algorithm training to ensure the safety required to put these vehicles into service.
Machine Learning Deploy algorithms in business
Five major constraints:
Deployment: moving to full scale in a distributed environment;
Robustness: supporting real-world, inconsistent and incomplete data;
Transparency: automatically detecting a deterioration in the application's performance as the learning process progresses;
Adequacy to the available skills: not developing an AI solution that requires too much expertise to implement and optimise;
Proportionality: the time and money invested in an ML algorithm or its optimisation must be proportional to the gain obtained.
Deep Learning Introduction
Whether it is Siri, Cortana or Facebook's facial recognition software, deep learning is an integral part of the most advanced AI applications. It is a subcategory of Machine Learning that mainly uses artificial neural networks: algorithms inspired by the human brain and able to process large amounts of data.
Thus, in the same way that we learn from experience, deep learning algorithms perform a task repeatedly, adjusting it a little each time to improve the result. The term "deep learning" refers to the several (deep) layers of neural networks that allow learning. Many problems that seem to require thinking can be approached with deep learning methods.
Loosely modelled on the human brain, a neural network consists of thousands or even millions of simple processing nodes that are highly interconnected. Most of today's neural networks are organised into layers of nodes in which the data moves in one direction only. An individual node can be connected to several nodes in the layer beneath it, from which it receives data, and to several nodes in the layer above it, to which it sends data.
Deep Learning Example
Image classification: a neural network takes an image as input and, using successive layers, extracts increasingly refined features from the image in order to assign it to a class. For example, if a neural network tries to identify the nature of a vehicle, a first layer might detect the number of wheels, a second the size of the vehicle, a third the brand of the vehicle, and so on until the network is able to accurately identify the vehicle in the image.
Example of a convolutional neural network: the network takes many vehicle images as an input to learn the characteristics corresponding to each class of the initial dataset. Once the network is trained, it is able to identify new vehicles as they will have characteristics related or similar to those previously encountered.
Deep Learning: The training phase
To each of its incoming connections, a node assigns a number called a weight. When the network is active, the node receives a number on each of its input connections and multiplies it by the weight associated with that connection. It then adds up the resulting products, giving a single number.
If that number exceeds a threshold value, the node "fires", which in current neural networks generally means sending the number calculated by the node along all its outgoing connections. Otherwise, the node does not transmit anything to the next layer.
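The computation performed by a single node, as described above, fits in a few lines. This is a minimal sketch: the function name `node_output` and the numbers used are hypothetical, and returning 0.0 is simply one way of representing a node that transmits nothing.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs, passed on only if it exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    # The node "fires" by sending the computed number onwards;
    # 0.0 stands for "nothing transmitted".
    return total if total > threshold else 0.0


# Two incoming connections with hypothetical values and weights.
print(node_output([1.0, 2.0], [0.5, 0.25], threshold=0.8))  # sum is 1.0: the node fires
print(node_output([1.0, 2.0], [0.5, 0.25], threshold=1.5))  # sum is 1.0: below threshold
```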
During the training phase of a neural network, all the weights and thresholds are initially set to random values. The training data is passed from layer to layer, being multiplied and added up as explained above, until it reaches the output, radically transformed. During training, the weights and thresholds are continuously adjusted until training data with the same labels consistently give similar results.
Example of a deep neural network predicting a flower species from its features (petal length and width, etc.):
The inputs are measurements taken on different flowers, and the outputs correspond to the species to which each flower belongs. In the middle are the "hidden" layers of the network, which correspond to the different levels of analysis. Training the network means determining the coefficients that optimise the success of the algorithm, by feeding it examples for which we know the answer. The algorithm can then find the answer for any new example, without human intervention.
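The idea of adjusting coefficients from examples with known answers can be shown on the simplest possible case. The sketch below is a strong simplification of the network described above: it trains a single node with no hidden layer, using the classic perceptron rule, on invented petal measurements for two unnamed species. The data, the species labels and the learning rate are all hypothetical.

```python
import random

# Hypothetical petal measurements (length, width in cm) with a known species (0 or 1).
samples = [
    ((1.4, 0.2), 0), ((1.3, 0.2), 0), ((1.5, 0.3), 0), ((1.6, 0.2), 0),
    ((4.5, 1.5), 1), ((4.7, 1.4), 1), ((4.0, 1.3), 1), ((5.0, 1.7), 1),
]


def predict(weights, bias, features):
    """A single node: weighted sum plus bias, then a yes/no decision."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train(samples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Start from random coefficients and nudge them after every wrong answer."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train(samples)
print("predicted species for a new flower:", predict(weights, bias, (4.2, 1.3)))
```

A deep network follows the same principle, but adjusts the coefficients of many nodes spread over several layers at once.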
Deep Learning: Structure of a neural network
There are several types of artificial neural networks. These types of networks are built based on mathematical operations and a set of parameters necessary to determine the output. The chosen structure (number of layers, number of nodes and links between them) will be different depending on the problem being addressed and the amount of data available.
Recurrent neural networks are suited to situations where the data points are not independent. This is the case with speech recognition, for example: predicting the next word is easier if you know the beginning of the sentence. Knowing the words 'how' and 'will', the network assigns a high probability to the word 'you', which simplifies the detection of the next word.
Convolutional neural networks are structured in successive layers, each layer extracting increasingly precise features from the input data. They are particularly suitable for unstructured data such as images. They can also be used to assist medical diagnosis by analysing the output of imaging devices, or to identify failures on a production line using control cameras.
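The basic operation behind a convolutional layer can be sketched as follows. This is an illustration under assumptions, not a full network: a tiny hand-made "image" and a hand-chosen filter are used, whereas a real convolutional network learns its filters during training. As in most deep learning libraries, the filter is slid over the image without being flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-chosen filter that responds where brightness increases from left to right.
vertical_edge = [[-1, 1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The resulting feature map is non-zero only where the dark and bright regions meet: the filter has extracted a simple feature (a vertical edge) that later layers can combine into more refined ones.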
Big Data: Giving value to the data
The term 'Big Data' refers to technologies that allow companies to quickly analyse a very large volume of data and obtain an overall view. By combining storage integration, predictive analytics and applications, Big Data makes it possible to interpret data quickly, effectively and to a high standard.
Although the term 'Big Data' is relatively new, the gathering and storage of large amounts of information for analysis purposes is centuries old. The concept gained momentum in the early 2000s when the industry analyst Doug Laney articulated the now common definition of big data as the 'three Vs':
Volume. Organisations collect data from a variety of sources, including business transactions, social networks and information from sensors or machine data. In the past, storage would have been a problem, but new technologies such as Hadoop now make it possible.
Velocity. Data is flowing at an unprecedented rate and must be processed at the right moment. RFID tags, sensors and smart meters make it necessary to process data streams in near-real time.
Variety. Data is available in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio files, stock market data and financial transactions.
Big Data: one field, many areas of expertise
Big Data is ultimately the crossover between all the disciplines of data science: data collection, analysis and visualisation, implementation of Machine Learning or Deep Learning models, and optimisation of computing architectures. The expertise involved is as extensive as it is diverse:
- Artificial Intelligence (AI Experts);
- Analysis and data visualisation (Data analyst);
- Data mining and real-time processing (Data miner);
- High Performance Computing, HPC (HPC expert).
For example, to develop intelligent trading applications, the combination of the different fields of expertise that Big Data represents is essential to guarantee the desired efficiency. Collecting financial data in real time and analysing it with AI algorithms on parallelised computing architectures, in order to provide summary dashboards presenting the results, requires a diverse panel of experts at each step of the value chain.