Global Introduction


One of the challenges facing organisations globally is the incorporation of Artificial Intelligence in a way that provides people with the opportunity to succeed in the face of changing workplace conditions.

MACHINE LEARNING AND DEEP LEARNING

  • There is a range of approaches to AI (specifically MACHINE LEARNING), including supervised, unsupervised, and reinforcement learning.
  • Using these approaches, machines build models with learning algorithms in order to make predictions. These techniques are crucial to unlocking AI’s potential value. Recent advances in DEEP LEARNING are proving particularly useful for organisations, as they often provide more accurate classifications and predictions than traditional supervised learning techniques.

 

LFZ Partners brings together experts who can help implement these machine learning and deep learning approaches.

 

THE IMPORTANCE OF DATA

In order for a machine learning algorithm to learn, it requires data to process; and the more it has, the better it is able to learn. There are different ways to get machines to learn, depending on the type of task you are trying to accomplish, and the amount and type of data that you have available. It is therefore necessary to identify not only the right amount of data, but also the right type of data.

  • Data quality

There are two aspects that determine the power of data: quantity (how much data you have) and quality (how accurately the data represent what you seek to observe).

Quality relates to how much error the data contain. The data could be biased or inaccurate, or could contain measurement errors that make them an inaccurate reflection of the true values.

  • Types of Data

We can distinguish “structured” and “unstructured” data.

  •  Structured data

The term “structured data” refers to data that has a predefined format and length, and is typically stored in a database. Structured data is expressed in clearly defined variables that fit neatly into rows and columns.

Structured data is consequently characterised as being well organised and easily processed by a machine. It typically works best with a learning approach known as supervised learning. Supervised learning refers to a machine learning algorithm learning to map an input to an output by using labelled examples.

  • Unstructured data

Unstructured data refers to data that “does not have a pre-defined data model” with “no identifiable structure”, which includes images, free text, and audio.

Unstructured data is characterised as disorganised and unsystematic, and works best with a machine learning approach known as unsupervised learning. Unsupervised learning refers to an algorithm learning to organise data into groups (or clusters) according to similarity (or proximity) on one or more of the data attributes. These dimensions can be defined automatically by the algorithm or specified by a user.

In unsupervised learning, the machine does not know what to look for, so it starts grouping the data into clusters based on similarities. On its own, unsupervised learning finds a way to distinguish trends by searching for similarities in the input data in order to produce an output.

Machine Learning Introduction


Machine Learning (ML) is the term most widely associated with Artificial Intelligence (AI), and it has the widest range of applications in business. From sales prediction to bank fraud detection, through the optimisation and automatic planning of processes, ML can tackle a variety of issues in all branches of a company.

Broadly, it can be defined as a set of statistical and computational tools and algorithms that automate the construction of a prediction function from a set of data (the 'training data').

Machine learning is programmed to learn from data or maximise performance based on past experience, in order to perform a task more efficiently without explicit instruction. Machine learning is therefore a key component of AI. As a field of study, machine learning sits at the interface of statistics and computer science, and seeks to enable machines to autonomously process and learn from data. To achieve this, a machine uses algorithms that process and learn from data in a meaningful way. As the algorithms absorb data, they are able to produce predictive models that can be continuously refined with the input of additional data.

Machine learning algorithms identify patterns and regularities in data sets by building a mathematical model. In order to identify patterns, machine learning requires a lot of example data and sufficient computing power to run the learning methods on that much data, bootstrapping the necessary algorithms from data (Refer to the chapter on data). However, to understand how these approaches learn from data, it is important to build an understanding of the types of data sets available and the repercussions they may have for the choice of learning approach.

Two types of issues can be covered:

  • Regression problems: predict a numerical value or build a mathematical model (sales price forecasts, optimisation of operators based on the resources currently at hand, etc.);
  • Classification problems: assign data to categories or subgroups (identification of objects in a picture, market and client segmentation, etc.).

Three types of methods are to be distinguished:

  • Supervised Learning;
  • Unsupervised Learning;
  • Reinforcement Learning.

Machine Learning Supervised Learning


What is it? An overview:

  • Prediction of the driving time for a given itinerary:

The driving time can be estimated from the weather forecast, the time of day and the chosen itinerary by correlating these elements. With a supervised learning approach, an algorithm is fed a set of training data containing driving times recorded for different hours of the day, weather conditions and itineraries. The resulting model then makes it possible to predict the driving time for new routes according to the parameters provided.

  • Robotic sorting on a production line: 

A robot equipped with miniature visual sensors is assigned to sort different types of components on a production line. Once a day, an operator checks the robot’s work, validating the successes and pointing out the errors. The robot then uses the data linked to its mistakes to improve, by adjusting the criteria used in the selection procedure.

  • Intelligent trading system 

Intelligent trading systems offer real-time analysis of structured data (databases, spreadsheets) and unstructured data (social networks, media news) in order to make trading decisions. Predictions are made using data from past transactions as well as the investor's short and long-term objectives to recommend the best investment strategy. The artificial intelligence learns from previous transactions in order to reproduce the right investment decisions made under similar conditions. There are now investment funds where all transactions are carried out by AIs.

How does it work?

The purpose of the supervised learning method is to produce a prediction function: a mathematical model that predicts a value (the output) from the attributes of a given data set (the input).

The learning phase involves feeding the algorithm a labelled set of data, where the output is known, in order for the algorithm to learn the links between the characteristics of the data and the output.

In this approach, the machine is provided with two data sets: a training set and a test set.

  • In a training set, the inputs and corresponding outputs (i.e. labelled examples) are given to the program.
  • In a test set, only the inputs (i.e. unlabelled examples) are given to the program, and the answers the algorithm produces are then compared to the correct outputs.

A training set contains labelled examples, which the computer program uses to determine patterns in the input and matching output, in order for it to be able to identify unlabeled examples from a test set with the highest possible accuracy. The purpose of this learning approach is for the machine to develop a rule that allows it to categorize the new examples from the test set by processing the examples it was given in the training set.
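The split between a training set and a test set can be sketched in a few lines of Python. This is only an illustration: the component measurements are made up, and the nearest-neighbour rule is one simple choice of learning method among many, not a method prescribed here.

```python
import math

def nearest_neighbour_predict(training_set, features):
    """Return the label of the training example closest to `features`."""
    closest = min(training_set, key=lambda example: math.dist(example[0], features))
    return closest[1]

def accuracy(training_set, test_set):
    """Share of test examples whose predicted label matches the known label."""
    correct = sum(
        1
        for features, label in test_set
        if nearest_neighbour_predict(training_set, features) == label
    )
    return correct / len(test_set)

# Hypothetical labelled examples: (weight in g, diameter in mm) -> component type.
training_set = [
    ((10.0, 5.0), "screw"), ((11.0, 5.5), "screw"),
    ((50.0, 20.0), "gear"), ((55.0, 22.0), "gear"),
]
# The test labels are hidden from the predictor and used only for scoring.
test_set = [((10.5, 5.2), "screw"), ((52.0, 21.0), "gear")]

print(accuracy(training_set, test_set))
```

The key point is that the test examples never take part in learning, so the accuracy reflects how the rule behaves on data it has not seen.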

For example, to predict the real estate price evolution, the output value is a forecast of real estate prices depending on the time of the year, the interest rates and the geographical zone.

  1. The programmer labels the training set of data (the price of a house according to input values: the time of the year, the interest rates, etc.).

  2. The algorithm learns the inferred function using the training data set. In other words, it learns the correlation between the price of a house and the input values.

  3. Once the accuracy of the model has been verified with a test data set, different from the training data, it can be used.
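The three steps can be sketched with a one-variable least-squares fit. The figures below are invented and deliberately perfectly linear so that the result is easy to follow; a real price model would use several inputs and noisy data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: made-up labelled data, interest rate (%) -> house price (thousands).
train_rates = [1.0, 2.0, 3.0, 4.0]
train_prices = [280.0, 260.0, 240.0, 220.0]

# Step 2: learn the function from the labelled training set.
slope, intercept = fit_line(train_rates, train_prices)

# Step 3: check the model on a test example that was not used for training.
test_rate, test_price = 5.0, 200.0
predicted = slope * test_rate + intercept
print(f"predicted {predicted:.1f}, actual {test_price:.1f}")
```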

When to use it? & major models

The major limitation of these learning models is that the algorithms depend on the expert’s labelling of the training data. On the other hand, supervised learning is comprehensible: the links between the inputs and the outputs can often be understood by the operator. Nonetheless, the operator needs strong expertise and a deep knowledge of the issue to verify the accuracy of the data set used during the learning phase.

You will find below different examples of supervised learning methods:

  • Linear regressions are used to link the evolution of a parameter we are looking to optimise to that of parameters we can easily measure or control. This method makes it possible to predict the best budget allocation between the different channels (TV, radio, newspapers, social networks, etc.) of a communication campaign using the results of previous campaigns.

  • Decision trees are visual representations that can be very easily interpreted: an output value is assigned to each data input using elementary decision rules. This can be applied to automate a phase of a recruitment process or to develop a process for suggesting apartments for a real estate agency website.

 

Example of a decision tree for apartment suggestions. ‘Bedroom Hall Kitchen’ (BHK) is a term used to classify apartments according to their size. In this case, criteria regarding the client’s situation are used to suggest the most suitable apartment in the most effective way. The advantage of this type of diagram lies in its ease of interpretation as well as the speed of execution of the method.
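Because a decision tree is a chain of elementary rules, it can be written directly as nested conditions. The sketch below is purely illustrative: the criteria (household size, monthly budget) and the thresholds are hypothetical and are not taken from the figure.

```python
def suggest_apartment(household_size, monthly_budget):
    """Walk a small, hand-written decision tree and return a BHK category."""
    if household_size <= 2:
        # Small households: the budget decides between 1 and 2 BHK.
        if monthly_budget < 1000:
            return "1 BHK"
        return "2 BHK"
    # Larger households: the budget decides between 2 and 3 BHK.
    if monthly_budget < 1500:
        return "2 BHK"
    return "3 BHK"

print(suggest_apartment(household_size=1, monthly_budget=800))
print(suggest_apartment(household_size=4, monthly_budget=2000))
```

In practice the rules and thresholds would be learned from labelled examples rather than written by hand, but the resulting model has this same readable form.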

 

 

Machine Learning Unsupervised Learning


What is it? An overview:

Unsupervised learning is often associated with the term clustering. Indeed, clustering is one of its most widely used applications in companies, particularly for customer segmentation. This process is key to building a sales strategy adapted to customers and anticipating their behaviour. By analysing easily accessible indicators such as the date of the last purchase, the average time between purchases, and whether or not products were bought on sale, it is possible to create "customer groups" that are then targeted by different commercial measures in order to reach them all effectively.

A well-known concrete example:

At first, the customer segmentation of multinational e-commerce companies, such as Amazon, was based on a number of simple elements:

  • What a customer has already bought in the past;
  • The items in the customer's virtual cart;
  • The items the customer has rated and liked;
  • What items other customers were interested in or bought.

This type of mathematical sorting, known as "item-to-item collaborative filtering", is used to highly customise the customer's browsing experience and encourage return visits to the website.

  • Identification of incorrect documents:

One of the major objectives of AI is to remove the need for human intervention in simple, repetitive and lengthy tasks. Automatic document analysis relies on the pre-processing of documents by algorithms capable of identifying errors in data, proofreading documents before they are published, or extracting information from documents in a particular format, such as financial reports. Such processing saves time, improves reliability and significantly reduces costs.

In this case, unsupervised learning is particularly relevant, as identifying errors in documents is a very time-consuming and laborious task. Given a large number of documents of the same type and a means of checking the consistency of the information, an algorithm can learn to identify the types of errors and apply this reasoning to future documents.

How does it work?

Unlike supervised learning, unsupervised learning is needed when the algorithm must operate from unlabelled data: the "output" is not known in advance.
The most common unsupervised learning method is clustering, which is used to perform exploratory data analysis to detect hidden patterns or define subgroups in the data. Clusters (data groups) are designed using a similarity measure which is an index that assigns a similarity score between the different subgroups of data through the combination of their attributes. For example, in the case of customer profiles on the web, customers will have a high similarity score if they have purchased the same products and/or belong to the same sociodemographic category.
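As an illustration of a similarity measure, the sketch below scores two customers by the overlap between the sets of products they have purchased (the Jaccard index). This is one possible measure among many, chosen here for simplicity; the customers and products are invented.

```python
def jaccard_similarity(items_a, items_b):
    """Shared items divided by all distinct items; 0.0 if both sets are empty."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical purchase histories.
alice = {"book", "lamp", "pen"}
bob = {"book", "pen", "mug"}
carol = {"tent"}

print(jaccard_similarity(alice, bob))    # two shared items out of four distinct ones
print(jaccard_similarity(alice, carol))  # nothing in common
```

Customers with a high score would tend to end up in the same cluster; sociodemographic attributes could be folded into the score in the same way.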

The algorithm receives an unlabelled set of data (for example, a set of data describing customers' "itineraries" on a website):

  1. The algorithm produces a model from the data set: rules linking purchase behaviour to the itinerary on the website;

  2. The algorithm identifies subgroups of data representing similar customer behaviour, i.e. types of itinerary on the website.
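A minimal sketch of how subgroups can be identified is the k-means procedure below. It is only one clustering algorithm among several, and everything in the example is an assumption made for illustration: the two numeric attributes per customer, the data values, and the choice of starting centroids.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid  # keep an empty cluster's centroid in place
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (pages viewed per visit, purchases per month).
customers = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]

# Start from two of the data points; real implementations usually pick these at random.
centroids, clusters = k_means(customers, [customers[0], customers[3]])
print(centroids)
print([len(cluster) for cluster in clusters])
```

No labels are supplied at any point: the two groups emerge only from the distances between customers.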

When to use it? & major models:

On paper, unsupervised learning methods seem like the most suitable solution to most ML problems, as they do not need a labelled set of data. Yet these methods can only be used in specific contexts, such as:

• Clustering makes it possible to separate a data set into underlying subgroups with similar characteristics. Clustering, for example, enables an online purchasing platform to separate its customers into logical groups that reflect their consumption habits. This is how advertising is targeted to better correspond to a customer who spends a lot of time comparing offers or to a customer who has an impulsive profile;

• Outlier detection: in many situations it is essential to detect outliers or abnormal values as accurately as possible. A bank can use machine learning techniques to combine information such as the customer's ID, the customer's consumption habits, the geographic location of an internet purchase and the order time, to detect whether a banking transaction is fraudulent, and block it if necessary: this is the principle of fraud prevention;

• Dimension reduction: when a database contains many measurements, it is important to keep only the significant ones - this is the purpose of dimension reduction. This is how an advertising company obtains a measure of the importance of variables such as gender, age, education level, salary and size, to target specific populations and increase the effectiveness of its strategy. Indeed, it is very likely that the correlation between salary and education level is strong, making it unnecessary to consider both measures.
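The salary and education example can be sketched as a simple correlation filter: a variable is dropped when it is strongly correlated with one already kept. This is a basic form of feature selection rather than a full dimension reduction technique such as principal component analysis, and the data and the 0.9 threshold are assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def drop_redundant(columns, threshold=0.9):
    """Keep a column only if it is not strongly correlated with a kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical measurements for five people.
columns = {
    "education_years": [10, 12, 14, 16, 18],
    "salary_thousands": [20, 25, 31, 36, 40],
    "age": [45, 23, 37, 52, 30],
}

print(drop_redundant(columns))
```

Here salary rises almost in step with education, so only one of the two is kept, while age carries independent information and is retained.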

 

Machine Learning Reinforcement Learning


What is it? An overview:

• Portfolio management and investment strategies

Reinforcement learning is largely rooted in heuristic theories. Financial applications are therefore possible, especially in portfolio management, for example an investment robot that improves its strategy by repeating investment scenarios:

  1. The robot makes a transaction in a financial portfolio;

  2. It receives a reward if this action brings it closer to maximising the total reward available - for example, the highest total return of the portfolio. The robot then assigns a score to the current transaction and to those that preceded the reward. If the reward is positive, the scores of the transactions leading to it are increased, so that the robot tends to reproduce such a set of actions;

  3. The algorithm optimises its choice of actions by correcting itself over time. When in action, the robot determines, in each state, the transaction to perform by using the history of previous similar situations. It then chooses the most favourable transaction, i.e. the one with the highest score, the scores being calculated according to its investment strategy and the rewards obtained in similar situations.
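The scoring loop described in these steps can be sketched as follows. This is a simplified, Monte Carlo-style update written for illustration only; the state names, actions, learning rate and discount factor are all assumptions, and it is not presented as how any particular investment robot works.

```python
def update_scores(scores, episode, reward, learning_rate=0.1, discount=0.9):
    """Move the score of each (state, action) that led to the reward towards it.
    Earlier transactions receive a discounted share of the credit."""
    credit = reward
    for state, action in reversed(episode):
        key = (state, action)
        old = scores.get(key, 0.0)
        scores[key] = old + learning_rate * (credit - old)
        credit *= discount
    return scores

def best_action(scores, state, actions):
    """Choose the action with the highest score in this state (0.0 if unseen)."""
    return max(actions, key=lambda action: scores.get((state, action), 0.0))

# Hypothetical scenario: two transactions followed by a positive reward.
scores = {}
episode = [("market_up", "buy"), ("market_flat", "hold")]
update_scores(scores, episode, reward=1.0)

print(scores)
print(best_action(scores, "market_up", ["buy", "sell", "hold"]))
```

After a positive reward the transactions that preceded it score higher, so the robot tends to repeat them when it meets the same situation again.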
     

How does it work ?

With reinforcement learning, a machine is not told how to process the data, or which actions to take, but instead learns by examining the outcomes that follow each behavior. Thus, this learning approach requires the machine to explore despite the uncertainty that it faces.

For a machine to obtain a positive outcome, it needs to give preference to past behavior that produced a positive outcome. However, for the machine to determine what behavior will produce a positive outcome, it needs to try behavior that it has not tried in the past. In this way, reinforcement learning is the closest of the three learning approaches to human learning.
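This tension between repeating what has worked and trying something new is often handled with an "epsilon-greedy" rule, sketched below. The rule is a common textbook choice rather than something specified here, and the three actions with fixed rewards are a deliberately simple, made-up environment.

```python
import random

def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=0):
    """With probability epsilon try a random action (explore);
    otherwise repeat the action with the best estimate so far (exploit)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (rewards[action] - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: each action always pays the same reward.
estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = estimates.index(max(estimates))
print(best, estimates)
```

Without the exploration step the machine would stay with the first action that paid anything at all and never discover the better ones.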

Reinforcement learning refers to objective-based algorithms that learn how to achieve a complex objective by maximising a score. The score is a mathematical index defined in such a way as to reflect the relevance of the actions carried out. For example, in order to teach an algorithm to play chess, it is essential to determine a scoring function for each situation to decide whether or not it is favorable to the player.

  • Advantages of this method

The advantage of this type of method is being able to quickly compare different situations by evaluating a single index: the score. The scoring function depends on the problem, and its complexity determines the effectiveness of the method.

Algorithms using reinforcement learning can start from a blank sheet and, under good conditions, achieve outstanding performance. These algorithms are penalized when they make wrong decisions and rewarded when they make right ones, hence the name "reinforcement".

Reinforcement learning offers great promise in various settings, especially when the following three conditions hold: the context or environment is complex; training data is not available; and continuous learning is possible.

In situations where these three aspects are true, reinforcement learning can play out its main advantage, which is that training data can be generated as learning occurs. The training data is generated as the algorithm tries and either succeeds or fails in its task. Each success and failure then becomes a datapoint and helps the algorithm improve. The computer learns by doing and through experience, just as humans do.

Another key advantage of this approach is the innovative solution strategies that can emerge. Because the algorithm does not receive any instructions, it is able to come up with its own innovative strategy to solve the problem.

  • Disadvantages of this method

The one disadvantage of reinforcement learning is that a clear reward function needs to be stated, because the algorithm works towards a single objective. Therefore, this learning approach works especially well in situations with large complex problems, uncertain chaotic environments, and where continuous learning is involved. It works less optimally in realistic settings where several variables define a successful outcome, or the reward function is not so clearly stated.

 

When to use it? & major applications:

 

To use these methods, it is necessary to obtain a rating and evaluation at each step of the learning process.

In game theory and multi-agent interaction, reinforcement learning has been widely used to enable software to play games. A well-known example is Google's DeepMind, whose AlphaGo program beat the top-ranked Go player in the world and whose AlphaZero program later beat the leading chess program Stockfish.

Robotics - robots have often relied on reinforcement learning to perform better in the work environment. Reinforcement learning has the advantage of being a scalable solution for robots that may have to deal with unknown or constantly changing environments. For example, robots that use sensors to analyze their 3D environment, such as mechanical arms like the Smart Tissue Autonomous Robot, can assist surgeons in operations where human precision is insufficient.

Vehicle navigation - the more information a vehicle receives, the better it navigates on the road. Indeed, autonomous driving is only possible if the environment is correctly perceived. However, it is impossible to simulate all the situations that a vehicle may encounter on the road. It is in this context that reinforcement learning appears indispensable: the use of multiple sensors provides a large amount of data to analyze. Reinforcement learning algorithms identify the different actors in the environment: other vehicles, living beings or traffic signs. They can then correlate the current situation with situations previously encountered to make the necessary choice during navigation. Experts estimate that it will take about ten billion kilometers of algorithm training to ensure the safety required to put these vehicles into service.

Machine Learning Deploy algorithms in business


Five major constraints:

Deployment: moving to full scale in a distributed environment;

Robustness: supporting real-world, inconsistent and incomplete data;

Transparency: automatically detecting a deterioration in the application's performance as the learning process progresses;

Adequacy to the available skills: not developing AI solutions that require too much expertise to implement and optimise;

Proportionality: the time and money invested in an ML algorithm or its optimisation must be proportional to the gain obtained.

Deep Learning Introduction



 

Deep learning is an integral part of the most advanced AI applications. It is a subcategory of Machine Learning that mainly uses artificial neural networks - algorithms inspired by the human brain and able to process large amounts of data.

Loosely modelled on the human brain, a neural network consists of thousands or even millions of simple processing nodes that are highly interconnected.

Thus, in the same way that we learn from experience, deep learning algorithms perform a task repeatedly, each time adjusting it a little to improve the result. The term "deep learning" refers to the several (deep) layers of neural networks that allow learning. Deep learning methods are often applied to problems that would otherwise seem to require human-like perception or reasoning.

Artificial neural networks, also commonly referred to as neural nets, are at the heart of the recent developments in deep learning. Deep learning is an approach to machine learning where artificial neurons are connected in networks to generate an output, based on weights associated with the inputs.

An Artificial neural network tries to replicate the way in which neurons (i.e. nerve cells) in the brain receive inputs, process the inputs, and then produce an output (i.e. the activation of a synapse).

Artificial neural networks are one of the most effective forms of learning systems and are commonly used to solve problems associated with image recognition, speech recognition, and natural language processing.

Deep Learning Example


Image classification : A neural network takes an image as an input and using successive layers, extracts increasingly refined features in the image to assign an image to a class. For example, if a neural network tries to identify the nature of a vehicle, the first layer can first detect the number of wheels, then a second the size of the vehicle, a third the brand of the vehicle, and so on until it is able to accurately identify the vehicle in the image.

Example of a convolutional neural network: the network takes many vehicle images as an input to learn the characteristics corresponding to each class of the initial dataset. Once the network is trained, it is able to identify new vehicles as they will have characteristics related or similar to those previously encountered.

Deep Learning: The training phase


Each node assigns a number, called a weight, to each of its incoming connections. When the network is active, the node receives a numerical value on each of its input connections and multiplies it by the weight associated with that input. It then adds up the resulting products, giving a single number.

If the number exceeds the threshold value, the node "fires", which in current neural networks generally means sending the number calculated by the node along all its outgoing connections. Otherwise, the node does not transmit anything to the next layer.
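The behaviour of a single node can be written in a few lines. In this sketch the input values, weights and thresholds are arbitrary, and "transmitting nothing" is represented by 0.0, which is an assumption made for simplicity.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs, passed on only if it exceeds the threshold."""
    total = sum(value * weight for value, weight in zip(inputs, weights))
    if total > threshold:
        return total  # the node passes its number to the next layer
    return 0.0        # below the threshold: nothing is transmitted

inputs = [1.0, 2.0]
weights = [0.5, 0.25]

print(node_output(inputs, weights, threshold=0.8))  # weighted sum 1.0 exceeds 0.8
print(node_output(inputs, weights, threshold=1.5))  # weighted sum 1.0 does not exceed 1.5
```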

During the training phase of a neural network, all the initial weights and thresholds are set to random values. The training data is transmitted from layer to layer, being multiplied and added up as explained above, until they reach the output, radically transformed. During this training, weights and thresholds are continuously adjusted until the training data with the same labels give similar results.

Example of a deep neural network on the prediction of a flower species according to its features (length and width of petal etc...):

Inputs are measurements on different flowers, and outputs correspond to the species to which the flower belongs. In the middle, we find the "hidden" layers of the network that correspond to the different levels of analysis. Training the network means determining the coefficients to be chosen to optimize the success of the algorithm by feeding it with examples for which we know the answer. Then, the algorithm can find the answer for any new example, without any human intervention.

Deep Learning: Structure of a neural network


There are several types of artificial neural networks. These types of networks are built based on mathematical operations and a set of parameters necessary to determine the output. The chosen structure (number of layers, number of nodes and links between them) will be different depending on the problem being addressed and the amount of data available.

Neural Networks description

A neural network graph usually depicts the network “on its side”, where processing goes from left to right and learning goes from right to left.

The input layer of an artificial neural network (ANN) represents the input data that is fed into the network. The middle layers of an ANN are referred to as the hidden layers, which simply means that they are not classified as input or output. The output layer of the ANN represents the result of the network, and the way it behaves depends on the activity of the preceding hidden layers.

An ANN can have one or many hidden layers, depending on the complexity of the problem it is designed to solve. When an ANN has multiple hidden layers, it is commonly referred to as a deep neural network. It is a relatively simple task to program learning when the input is directly connected to the output.

However, when there are hidden layers of which the nodes are not pre-programmed (i.e. they are not task-specific), learning becomes more difficult, because the ANN must be able to decide which neurons in the hidden layers to activate to achieve the desired output.

Various types of ANN architecture exist, including unsupervised pretrained networks, convolutional neural networks, recurrent neural networks, and recursive neural networks. The architecture is usually chosen depending on the type of problem to be solved and the type of data that is available to do so.

Functionality of a neural Network:

ANNs consist of neurons, or nodes, connected to each other by links, which serve as the drivers to propagate the activation of each input into the appropriate output. It is these links that are awarded a specific weight, determining the strength of the connection between each output and input. An ANN uses forward and backpropagation to function.

In order to fully comprehend the functionality of ANNs, specifically when referring to backpropagation, it is also essential to understand the role that weights and errors play in the process. Firstly, when the ANN is processing data from the input layers, neurons in one layer are activated by the neurons in previous layer of the ANN. The activation of the neuron that takes place depends on the weights allocated to the connections between these neurons (i.e. the higher the weight allocated to a particular connection, the greater the contribution of that neuron activated to ultimately produce an accurate output). To change the output, the weights of the connections between neurons must also be adjusted, which is a highly complicated process. Although this is a difficult process, algorithms have been created to do this. The process whereby the weights of the connections between neurons are adjusted is known as “training” or “learning”.

An error is defined as the difference between the actual result and the expected result. The value and sign of the error are used to calculate the correction to the weight that has led to this error.

  • Forward propagation

Forward propagation entails signals moving from the input on the left to the output on the right. Every neuron in one layer receives its input from all the upstream neurons and delivers its output to the next layer of neurons to the right, depending on the allocated weights of the connections or links between them. The signal from the input to the output only moves in one direction, and there is no feedback between neurons in the same layer of the ANN.

In short, forward propagation is the way a neural network produces an output based on the inputs it is given.
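A forward pass can be sketched as a loop over layers, each layer computing weighted sums of the previous layer's outputs. The sigmoid activation, the bias terms and all the weight values below are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of all upstream outputs plus a bias."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + bias)
        for row, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the signal through the layers in one direction, input to output."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]),  # hidden layer: one weight row per neuron
    ([[0.5, -0.5]], [0.0]),                   # output layer
]

print(forward([1.0, 0.5], network))
```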

  • Backpropagation

Backpropagation is the procedure that repeatedly adjusts the weights of the connections in the network so as to minimise a measure of the difference between the actual output vector of the net and the desired output vector.

In other words, backpropagation in ANNs functions by adjusting or correcting the weights between neurons in order to minimise the error in the output of the ANN, and is the process by which an ANN learns.

The aim of backpropagation is to reduce the ANN’s error by learning from the training data. Backpropagation uses the supervised learning method of machine learning and is the most commonly used method of training a network to produce the desired output from a given input. The connections between neurons are initially assigned random weights, and the algorithm then uses the training data to adjust the weights so as to reduce the error as far as possible.

Backpropagation functions by propagating the error from the output layer to the hidden layers, using the output on the right side of the ANN to identify the error in the hidden layers on the left. By including backpropagation in the process, an iterative loop is created, where the signal flows in both directions, improving with each iteration. As a result of this bidirectional movement of data, ANNs that include backpropagation have greater potential to learn and to produce accurate output from input.
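The loop of forward pass, error measurement and weight correction can be sketched on a very small network. The choices below are assumptions made to keep the example short: sigmoid activations, a squared-error measure, no bias terms, fixed rather than random starting weights, and a single training example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: 2 inputs -> 2 hidden neurons -> 1 output neuron."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, output

def loss(x, target, w_hidden, w_out):
    """Half the squared difference between actual and desired output."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2

def backprop_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One weight correction: the error flows from the output back to the hidden layer."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (value and sign of the error, scaled by the slope).
    delta_out = (output - target) * output * (1.0 - output)
    # Share of that error attributed to each hidden neuron, via the current output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - learning_rate * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

# Hypothetical training example and starting weights.
x, target = [1.0, 0.5], 1.0
w_hidden, w_out = [[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5]

initial_loss = loss(x, target, w_hidden, w_out)
for _ in range(50):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
final_loss = loss(x, target, w_hidden, w_out)
print(initial_loss, final_loss)
```

Each iteration nudges every weight in the direction that reduces the error, which is why the loss printed after training is lower than the one printed before.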

Different types of neural network:

  1. Recurrent neural networks are adapted to situations where the data are not independent. This is the case with speech recognition - for example: predicting the next word is easier if you know the beginning of the sentence. Knowing the words 'how' and 'will', the network assigns a high probability to the word 'you' to simplify the detection of the next word.
  2. Convolutional neural networks are structured in successive layers - each layer allowing the extraction of increasingly precise features concerning the input data. They are particularly suitable for unstructured data, such as images for example. Moreover, they can also be used to assist in medical diagnosis by analyzing the results of imaging devices or to identify failures in a production line by using control cameras.
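The feature extraction performed by one convolutional layer can be sketched as a small filter sliding over an image. The image and the hand-written edge filter below are invented for illustration; in a trained network the filter values are learned. As in most deep learning libraries, the filter is applied without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products
    at each position (no padding, stride of 1)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# A filter that responds where brightness increases from left to right.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The resulting feature map is non-zero only in the column where the dark and bright halves meet, which is the kind of localised feature that later layers combine into more precise ones.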

Conclusion

ANNs, with their ability to identify patterns in data to make predictions that drive decisions, are enabling various innovations, including autonomous vehicles, translation, fraud detection, image recognition, and speech recognition. As is evident from the discussion in this set of notes, the complexity of the ANN is dependent on the type of problem the network aims to solve. The more complex the problem, the more hidden layers will be needed to successfully process the input into an accurate output.

Big Data: Giving value to the data




The term ' Big Data ' refers to technologies that allow companies to quickly analyse a very large volume of data and obtain a synoptic view. By combining storage integration, predictive analytics and applications, Big Data saves time interpreting data in an effective and qualitative way.

Although the term 'Big Data' is relatively new, the gathering and storage of large amounts of information for analysis purposes is centuries old. The concept gained momentum in the early 2000s when the industry analyst Doug Laney articulated the now common definition of big data as the 'three Vs':

Volume. Organisations collect data from a variety of sources, including business transactions, social networks and information from sensors or machine data. In the past, storage would have been a problem, but new technologies such as Hadoop now make it possible.

Velocity. Data is flowing at an unprecedented rate and must be processed at the right moment. RFID tags, sensors and smart meters make it necessary to process data streams in near-real time.

Variety. Data are available in all types of formats - from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio files, stock market data and financial transactions.

Big Data: one field, many areas of expertise


Big Data is ultimately the crossover between all disciplines of data science: data collection, analysis and visualisation, the implementation of Machine or Deep Learning models, and the optimisation of computing architectures. The expertise is as extensive as it is diverse:

  • Artificial Intelligence (AI expert);
  • Analysis and data visualisation (Data analyst);
  • Data mining and real-time processing (Data miner);
  • High Performance Computing, HPC (HPC expert).

For example, to develop intelligent trading applications, the combination of the different fields of expertise that Big Data represents is essential to guarantee the desired efficiency. Collecting financial data in real time and analysing them using AI algorithms on parallelised computing architectures, in order to provide summary dashboards presenting the results, requires a diverse panel of experts at each step of the value chain.