Supervised machine learning describes the recognition of relationships in data sets. In contrast to unsupervised machine learning, both the input and the output are already available in the data set. The algorithms learn (“train”) the relationship between input (features) and output (e.g., labels) from this data.
The algorithm trained in this way can be applied to new input data to predict a result based on the learned relationships. Accordingly, these algorithms serve to carry out regressions or to classify data; they are particularly often used for forecasting.
A particular challenge with this form of technology is generating a sufficiently large data set with both inputs and outputs. In many cases, this still involves manual work.
Many of the AI use cases discussed today fall under supervised learning – often combined with (complex) artificial neural networks (deep learning). A property of these algorithms is that training them with large amounts of data is time-consuming, but in many cases a trained model can compute an output for new input much faster than classical methods.
Among other things, this enables significant progress in the near-real-time simulation of physical effects, e.g., of fluids (such as non-Newtonian fluids like honey, or the interaction of water with solids), light rays, or other material properties. Accordingly, the potential of this application area is also very large in the energy industry.
Selected supervised machine learning algorithms are presented below:
k-Nearest Neighbors (KNN)
KNN is a simple algorithm that can be implemented quickly to assign data points to existing clusters/groups. A distance (e.g., Euclidean) between the new data point and all data points in the training data set is calculated, and the k nearest neighbors (those with the smallest distance) are selected.
The labels of these k nearest neighbors serve as the basis for the prediction: in classifications, their mode (majority label) is used; in regressions, their mean.
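To illustrate the procedure, here is a minimal NumPy sketch (the toy data and the helper name knn_predict are illustrative assumptions, not from the source):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest neighbors (smallest distances)
    nearest = np.argsort(distances)[:k]
    # Classification: return the mode (majority label) of the neighbors;
    # a regression would return np.mean(y_train[nearest]) instead
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two clusters labeled 0 and 1 (illustrative values only)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [4.0, 4.0], [4.2, 3.9], [3.8, 4.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([3.9, 4.2])))  # -> 1
```

Note that every prediction scans the full training set; this is the scalability weakness listed below.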
Strengths:
- Easy and quick to implement
- No training phase is necessary (lazy learner) because it is a non-parametric method (no functional relationship between input and output is learned)
- Few hyperparameters and no complex model construction necessary
- Can be used for both regressions and classifications
Weaknesses:
- Low scalability with large amounts of data
- Low scalability with many dimensions
Decision Trees
Decision trees are ordered, directed graphs used to represent decision rules. In the training phase, the algorithm forms decision rules from the features of a data set. This is achieved by splitting the features into subsets using “questions” (mathematically, simple (in)equalities) that aim to separate the labels into the respective subsets as cleanly as possible. The order of the questions is determined by the information gain, which is high when, for example, the entropy or variance of the resulting subsets is as low as possible, i.e., when they are as homogeneous as possible.
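As a minimal sketch (assuming scikit-learn and its bundled Iris data set; the hyperparameters shown are illustrative), the learned “questions” can be inspected directly:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# criterion="entropy" selects the splits ("questions" on the features)
# that maximize information gain, i.e., minimize subset entropy
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(data.data, data.target)
# Print the learned decision rules as a readable tree of inequalities
print(export_text(tree, feature_names=list(data.feature_names)))
```

The printed output makes the interpretability strength listed below tangible: each rule is a simple inequality on one feature.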
Strengths:
- Easy to understand, interpret and visualize
- Works with numeric and categorical features
- Relatively low preprocessing effort (missing data are not a problem, for example)
- No scaling/normalization is necessary
Weaknesses:
- Prone to rapid overfitting
- The greedy training procedure does not always find the global optimum.
- Small changes in the input can cause significant changes in the decision tree.
- Long computing times for complex data sets
- In the basic form described here, only suitable for classifications
Artificial Neural Networks (ANN)
Artificial neural networks aim to replicate the neural structure of biological life. For this purpose, neurons are arranged in layers. The layers are connected to one another, transporting information between the neurons from the input layer via the hidden layers to the output layer. The connections between the neurons carry so-called “weights” and “biases,” which are applied to the information passing through them. A neuron only passes information on to the downstream layer if the incoming information exceeds a certain threshold value (determined by an activation function). During training, these “weights” and “biases” are adjusted iteratively so that the desired output is generated. Of all the methods presented here, ANNs have the highest complexity, but they can therefore also cope with very complex challenges.
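A minimal sketch of such a network in PyTorch (the toy XOR data, layer sizes, and training settings are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Two input features -> one hidden layer -> one output neuron
model = nn.Sequential(
    nn.Linear(2, 8),   # weights and biases of the first connection layer
    nn.ReLU(),         # activation function acting as the "threshold"
    nn.Linear(8, 1),
    nn.Sigmoid(),      # squash the output to a probability in (0, 1)
)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR labels

optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

# Training: iteratively adjust weights and biases so the desired
# output is reproduced
for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round())  # should approximate the XOR labels
```

Frameworks such as Keras offer comparable high-level building blocks, which is the low entry barrier noted below.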
Strengths:
- Can also learn highly complex relationships between input and output
- A trained network can be used in almost real-time, even on devices with low computing capacity.
- Copes with incomplete information
- Low barrier to entry thanks to frameworks (such as Keras or PyTorch)
- Great global research efforts lead to rapid improvements
Weaknesses:
- Network structure and hyperparameter tuning are very complex and require a lot of “trial and error” in addition to experience
- Low traceability of the learned functional connection between input and output (black box)
- Strong hardware dependency
The advantage of these algorithms is that the quality of the learning results can be checked against the already-known outputs. The disadvantage is that neural networks in particular are a “black box,” which makes traceability difficult. In principle, however, supervised machine learning algorithms are very versatile, and neural networks in particular have great potential in many application areas.