Normalization

Neural networks only train well if the input data is reasonably distributed. In most cases, a simple standardization (subtracting the mean and dividing by the standard deviation) is enough. However, data encountered in power grids is quite atypical and often displays multimodal distributions.

Moreover, we wish to have a normalization process that does not alter the permutation-equivariance of the data. For more details about some properties of our data, please refer to Data Formalism.

Since our data is composed of multiple instances of various classes (buses, generators, loads, lines, etc.), we want a separate normalizing mapping for each class. Even within a given class, there are multiple features that may be expressed in different units. For instance, the active power of generators is usually expressed in MW, while their voltage setpoint is usually expressed in p.u. Since these quantities are expressed in different units, it would make no sense to use the same normalizing mapping for all features.

To sum up, we need to build one normalizing function per feature of each class. As a consequence, the active power values of all generators are normalized using the exact same mapping.
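As an illustration, here is a minimal sketch of this structure. The class and feature names, the fit_normalizer helper, and the standardization used inside it are placeholders for illustration only, not the library's actual implementation.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical pooled values of one feature of one class, across a whole dataset.
gen_p_mw = rng.uniform(0.0, 300.0, size=1000)   # generator active power, MW
gen_vm_pu = rng.normal(1.0, 0.02, size=1000)    # generator voltage setpoints, p.u.

def fit_normalizer(values):
    # Placeholder fit: any mapping built from the pooled values would do here.
    mean, std = values.mean(), values.std()
    return lambda v: (v - mean) / std

# One normalizing function per (class, feature) pair, shared by all instances.
functions = {"gen": {"p_mw": fit_normalizer(gen_p_mw),
                     "vm_pu": fit_normalizer(gen_vm_pu)}}

# Every generator's active power goes through the exact same mapping.
p_norm = functions["gen"]["p_mw"](gen_p_mw)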

Note

One may argue that we could instead use a different normalizing function for each instance of a given class, the rationale being that two different generators may produce powers of very different orders of magnitude. Thus, using a separate normalizing function for each instance may also seem reasonable.

Doing so, however, would break the permutation equivariance of the data. With a simple fully connected architecture, this may not have much of an impact. But with a permutation-equivariant architecture (such as a Graph Neural Network), it would introduce detrimental noise, which could prevent the network from learning anything meaningful.

Fitting normalizing functions

Let us consider a single feature of a single class of objects (e.g. the active power of loads, expressed in MW). If we look at the distribution of values across all objects of this class and across all power grid instances (e.g. the active power of all loads of all power grids in a given dataset), we may observe an atypical, multimodal distribution, as illustrated in the figure below. In this case, standardization is not enough to make the data suitable for our neural network. We are looking for another way of mapping this odd distribution to a more appropriate one.

_images/distribution.png

Cumulative Distribution Function

Fortunately, the CDF (Cumulative Distribution Function) provides, by definition, an efficient way of converting our data to a uniform distribution over the interval [0, 1]. Moreover, for computational reasons, we may only consider a subset of the empirical distribution (see the n_samples parameter).

_images/cdf.png
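To make the idea concrete, here is a minimal numpy sketch (the bimodal sample is made up for illustration): mapping each value through the empirical CDF produces values that are approximately uniform on [0, 1].

import numpy as np

rng = np.random.default_rng(0)
# A bimodal sample, standing in for e.g. the active power of all loads.
x = np.concatenate([rng.normal(10, 2, 5000), rng.normal(80, 5, 5000)])

x_sorted = np.sort(x)
# Empirical CDF: fraction of fitted samples that are <= each value.
u = np.searchsorted(x_sorted, x, side="right") / x.size

# u is approximately uniform on [0, 1].
print(u.min(), u.max(), np.histogram(u, bins=5, range=(0, 1))[0])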

Approximating the CDF

The empirical CDF has one major drawback: it is made of discrete increments. To solve this issue, we build a piecewise linear approximation of this function. To do so, we introduce a parameter n_breakpoints which defines the number of breakpoints we want in our normalizing function. We split the interval [0, 1] into n_breakpoints equal chunks and look at the corresponding quantiles (displayed as red dots in the figure below). We then use the linear interpolation provided by scipy.

_images/approximation.png
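As a rough sketch of this procedure (not the library's exact implementation), the breakpoints can be obtained with numpy quantiles and interpolated linearly with scipy:

import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(10, 2, 5000), rng.normal(80, 5, 5000)])

n_breakpoints = 11
# Probabilities 0, 0.1, ..., 1 and the corresponding empirical quantiles.
probs = np.linspace(0.0, 1.0, n_breakpoints)
quantiles = np.quantile(x, probs)

# Piecewise linear approximation of the CDF: quantile -> probability.
cdf_approx = interp1d(quantiles, probs, kind="linear")

print(cdf_approx(np.median(x)))  # close to 0.5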

Merging equal quantiles

In practice, it is possible to obtain several equal quantiles. As a result, the obtained interpolation is no longer continuous. In such a case, we simply merge the equal quantiles by taking the mean of the corresponding probabilities. For instance, in the figure below, the 20% and 40% quantiles are merged into a single 30% quantile. The interpolation is then continuous.

_images/conflicts.png
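A possible way to perform this merge is sketched below, with a hypothetical merge_equal_quantiles helper and made-up breakpoint values: duplicated quantile values are collapsed into a single breakpoint whose probability is the mean of the merged probabilities.

import numpy as np

def merge_equal_quantiles(quantiles, probs):
    # Collapse duplicated quantile values into one breakpoint whose
    # probability is the mean of the merged probabilities.
    unique_q = np.unique(quantiles)
    merged_p = np.array([probs[quantiles == q].mean() for q in unique_q])
    return unique_q, merged_p

# Example: the 20% and 40% quantiles are both 0.0 (e.g. many loads at zero power).
quantiles = np.array([-3.0, 0.0, 0.0, 5.0, 12.0, 20.0])
probs = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

q, p = merge_equal_quantiles(quantiles, probs)
print(q)  # [-3.  0.  5. 12. 20.]
print(p)  # [0.  0.3 0.6 0.8 1. ]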

Out of distribution extrapolation

Since we only have access to a partial empirical distribution, it is very likely that some values in the train and/or test sets will fall outside the range of observed values. If we used the interpolation as is, those values would all be mapped to either 0 or 1 (depending on whether they fall below or above the observed range). This would prevent the neural network from distinguishing between out-of-range values.

Thus, we extrapolate by extending the first and last slopes. The rationale behind this choice is the following: larger (resp. smaller) values should have an order of magnitude similar to the maximum (resp. minimum) value used to fit the normalizing function. Since we want a continuous and non-constant function, extending the last (resp. first) non-zero slope maps out-of-range values close to 1 (resp. 0) while keeping them distinct, regardless of the order of magnitude of the data. These extensions are illustrated in the figure below.

_images/extrapolation.png
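With scipy, this kind of behaviour can be obtained by letting interp1d extrapolate linearly beyond the fitted range, as in the sketch below (the breakpoint values are illustrative, taken from the merging example above):

import numpy as np
from scipy.interpolate import interp1d

# Merged breakpoints from the previous step (illustrative values).
quantiles = np.array([-3.0, 0.0, 5.0, 12.0, 20.0])
probs = np.array([0.0, 0.3, 0.6, 0.8, 1.0])

# 'extrapolate' extends the first and last slopes beyond the observed range,
# so out-of-range values remain distinguishable instead of saturating at 0 or 1.
cdf_approx = interp1d(quantiles, probs, kind="linear", fill_value="extrapolate")

print(cdf_approx([25.0, 40.0]))  # both above 1, but still distinct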

Usage

A normalizer can be built from a dataset:

import ml4ps as mp

# data_dir is the path to the dataset used to fit the normalizing functions
normalizer = mp.Normalizer(data_dir=data_dir, backend_name='pandapower')

Once built, it can normalize feature data. We recommend passing it directly to a Dataset, so that your pipeline directly returns normalized data.

x_norm = normalizer(x)

A normalizer can be saved into a .pkl file.

normalizer.save('my_normalizer.pkl')

It can then be loaded from that .pkl file.

normalizer = mp.Normalizer(filename='my_normalizer.pkl')

Contents

class ml4ps.normalization.Normalizer(filename=None, **kwargs)[source]

Normalizes power grid features while respecting the permutation equivariance of the data.

functions

Dict of dict of single-feature normalizing functions. Upper-level keys correspond to objects (e.g. ‘load’), lower-level keys correspond to features (e.g. ‘p_mw’), and each value is a normalizing function. Normalizing functions take scalar inputs and return scalar outputs.

Type:

dict of dict of ml4ps.normalization.NormalizationFunction

__call__(x)[source]

Normalizes input data by applying the corresponding normalizing functions.

Note If one feature and/or one object present in the input has no corresponding normalization function, then it is returned as is.

__init__(filename=None, **kwargs)[source]

Initializes a Normalizer.

Parameters:
  • filename (str, optional) – Path to a normalizer that should be loaded. If not specified, a new normalizer is created based on the other arguments.

  • backend (ml4ps.backend.interface.Backend) – Backend to use to extract features. Changing the backend will affect the object and feature names.

  • data_dir (str) – Path to the dataset that will serve to fit the normalizing functions.

  • n_samples (int, optional) – Number of samples to import from the dataset to fit the normalizing functions. Fitting the normalizing functions on a small subset of the dataset is faster and usually provides a relevant normalization.

  • shuffle (bool, optional) – If true, samples used to fit the normalizing functions are drawn randomly from the dataset. If false, only the first samples in alphabetical order are used.

  • n_breakpoints (int, optional) – Number of breakpoints that the piecewise linear functions should have. Note that if multiple data quantiles are equal, the actual number of breakpoints will be lower.

  • features (dict of list of str) – Dict of list of feature names. Keys correspond to objects (e.g. ‘load’), and values are lists of features that should be normalized (e.g. [‘p_mw’, ‘q_mvar’]).

load(filename)[source]

Loads a normalizer.

save(filename)[source]

Saves a normalizer.

class ml4ps.normalization.NormalizationFunction(x, n_breakpoints)[source]

Normalization function that applies an approximation of the Cumulative Distribution Function.

interp_func

Piecewise linear function that will serve to normalize data.

__call__(x)[source]

Normalizes input by applying an approximation of the CDF of values provided at initialization.

__init__(x, n_breakpoints)[source]

Initializes a normalization function.

Note In the case where all provided values are equal, no interpolation is possible. Instead, the normalization function will simply subtract this unique value from its input.

Note The piecewise linear approximation of the Cumulative Distribution Function is extended for larger (resp. smaller) values by extending the last (resp. first) slope.

Parameters:
  • x (dict of dict of np.array) – Batch of input data which will serve to fit a piecewise linear approximation of the Cumulative Distribution Function.

  • n_breakpoints (int) – Number of breakpoints that should be present in the piecewise linear approximation of the Cumulative Distribution Function.