Original software publication | Volume 15, 100465, March 2023


# Anomaly Detection, Classification and Identification Tool (ADCIT)

Open Access | Published: January 13, 2023

## Highlights

• ADCIT is used for detection, classification and identification of anomalies in power system state estimation.
• Outputs of state estimators are used as inputs for machine learning algorithms.
• ADCIT does not require retraining of the machine learning algorithm in the presence of network topology changes.
• ADCIT can help the power system operator design proper countermeasures when an anomaly occurs.

## Abstract

The Anomaly Detection, Classification and Identification Tool (ADCIT) is open-source Matlab and Python code for detecting, classifying and identifying anomalies in power system state estimation. Outputs of weighted least squares (WLS) and extended Kalman filter (EKF) state estimators, developed in Matlab, are used as inputs for machine learning algorithms developed in Python. The ADCIT can address hard anomaly cases. For example, it can detect and classify the case when load changes abruptly at multiple nodes simultaneously, or when a false data injection attack targets multiple states at the same time. Additionally, the ADCIT does not require retraining of the machine learning algorithm when the network topology changes. Applying the ADCIT within a power grid energy management system can help the system operator design proper countermeasures when an anomaly occurs.

## Keywords

| Code metadata | Value |
|---|---|
| Current code version | v1 |
| Permanent link to code/repository used for this code version | https://github.com/SoftwareImpacts/SIMPAC-2022-297 |
| Permanent link to Reproducible Capsule | https://codeocean.com/capsule/8988211/tree/v1 |
| Legal Code License | MIT |
| Code versioning system used | none |
| Software code languages, tools, and services used | Matlab, Python |
| Compilation requirements, operating environments & dependencies | Matlab: Matpower; Python: libraries such as Pandas, NumPy, Scikit-Learn |
| If available Link to developer documentation/manual | |
| Support email for questions | Matlab: [email protected]; Python: [email protected] |

## 1. Introduction

Power system state estimation (SE) plays an important role in energy management systems. Its task is to provide accurate estimates of voltage magnitudes and phase angles for all nodes in the system [Gomez-Exposito et al., 2018]. SE can be affected by many types of anomalies, among which bad data (BD), sudden load changes (SLC) and false data injection attacks (FDIA) are common. For the system operator to take proper countermeasures, anomalies must be reliably detected, classified and identified [Asefi et al., 2022]. To this end, the Anomaly Detection, Classification and Identification Tool (ADCIT) has been developed. In the first ADCIT stage, anomalies are detected using the Matlab source code. In the second ADCIT stage, anomalies are classified and their origin is identified in Python.
Detection of BD is usually done by the $\chi^2$-test [Abur and Exposito, 2004]. However, it is difficult to detect either SLC or FDIA by applying the $\chi^2$-test within the weighted least squares (WLS) estimator. The ADCIT combines the estimated states of the WLS and extended Kalman filter (EKF) estimators to set up an anomaly detection index (ADI) capable of detecting both SLC and FDIA. However, the ADI cannot discriminate between SLC and FDIA. To classify SLC and FDIA correctly (i.e., to discriminate between them), various supervised machine learning (ML) algorithms have been implemented. Moreover, two cases are considered for the first time: load changing abruptly at multiple nodes simultaneously (named “multi-bus SLC”), and FDIA targeting multiple states at the same time (named “multi-state FDIA”). The ADCIT is capable of correctly discriminating between multi-bus SLC and multi-state FDIA. Furthermore, the features used for training the ML algorithm(s) are associated only with the network nodes. This increases the robustness of the algorithms by eliminating the need to retrain the ML algorithm when the network topology changes. Several ML algorithms are available inside the ADCIT, and the user defines which of them are applied. Finally, the ADCIT can successfully analyze different types of anomalies.

The ADCIT algorithm has been implemented in Matlab and Python. Fig. 1 shows the general scheme of the ADCIT algorithm. The details of each part of the code are presented below.

### 2.1 Matlab: Data preparation and detection

To provide labeled data for training the ML algorithms, power system simulations are conducted in the Matlab environment. First, raw measurements are generated using the procedure described below. Next, the raw measurements are processed by two types of state estimators, namely WLS and EKF, to obtain the estimated (and, in the case of EKF, predicted) electrical quantities. The IEEE 14-bus test system [Christie, 1993] has been selected as the benchmark.
MATPOWER, an open-source Matlab extension for solving steady-state power system optimization problems, has been used to run consecutive optimal power flows (OPFs) over time [Zimmerman and Murillo-Sanchez, Matpower]. Given the load at each consumption node, the OPF provides nodal voltage magnitudes, active/reactive power flows in branches and active/reactive power injections at generator nodes. These values are used as the true measurement values. A noise term with a zero-mean Gaussian distribution and a standard deviation of 0.01 is added to the true measurements to obtain the raw measurements.
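The noise model above can be sketched in Python (ADCIT performs this step in Matlab). This is a minimal illustration rather than the tool's code. The function name `add_measurement_noise` and the sample true values are hypothetical. As the text states, a single standard deviation of 0.01 is applied to every measurement.

```python
import numpy as np


def add_measurement_noise(z_true, sigma=0.01, seed=None):
    """Return raw measurements: true values plus zero-mean Gaussian noise.

    z_true : array of true measurement values (e.g. OPF results), any shape.
    sigma  : noise standard deviation (0.01, as stated in the text).
    """
    rng = np.random.default_rng(seed)
    z_true = np.asarray(z_true, dtype=float)
    return z_true + rng.normal(loc=0.0, scale=sigma, size=z_true.shape)


# Hypothetical true values for one snapshot (voltage magnitudes in p.u.).
z_true = np.array([1.06, 1.045, 1.01, 1.019, 1.02])
z_raw = add_measurement_noise(z_true, seed=0)
```

Repeating the call at every OPF time step with a fresh generator state gives a time series of raw measurements.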
To simulate a BD case, the corresponding raw measurements are corrupted with a random error that does not follow the predefined Gaussian distribution. To simulate an SLC, a pre-specified amount of load is curtailed at the desired time instant during the execution of the consecutive OPFs. To simulate an FDIA, the raw measurements are modified according to the attack vector. To simulate multi-bus SLC or multi-state FDIA, the user can change the parameter `SLC_bus` or `FDIA_state` from a scalar value to a vector in the main m-file. For instance, `SLC_bus = [5 10 12]` means that the SLC occurs at nodes 5, 10 and 12 simultaneously.
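A minimal Python sketch of the three anomaly types follows. It mirrors the scalar-or-vector convention of `SLC_bus` and `FDIA_state`, but everything else is an assumption made for illustration. This includes the function names, the 1-based bus and state numbering, and the gross-error size used for BD. It also includes the construction of the FDIA attack vector as $a = Hc$ under a linearized measurement model with Jacobian $H$. That construction is a common textbook choice; the text only says the measurements are modified according to the attack vector.

```python
import numpy as np


def inject_bad_data(z, idx, gross_error=0.2):
    """Corrupt selected measurements with an error far outside the noise band.

    gross_error=0.2 (20 standard deviations of the 0.01 noise) is an assumed value.
    """
    z_bd = np.array(z, dtype=float)
    z_bd[np.atleast_1d(idx)] += gross_error
    return z_bd


def apply_slc(loads, slc_bus, amount):
    """Curtail `amount` of load at one bus or at several buses at once.

    slc_bus may be a scalar (single-bus SLC) or a sequence such as [5, 10, 12]
    (multi-bus SLC); bus numbers are assumed 1-based, like the node numbers in the text.
    """
    new_loads = np.array(loads, dtype=float)
    buses = np.atleast_1d(slc_bus) - 1  # convert to 0-based indices
    new_loads[buses] -= amount
    return new_loads


def apply_fdia(z, H, fdia_state, magnitude):
    """Add the attack vector a = H @ c, with c nonzero only at targeted states.

    Assumes a linear(ized) measurement model with Jacobian H (m x n);
    fdia_state is a 1-based scalar or sequence of targeted state indices.
    """
    c = np.zeros(H.shape[1])
    c[np.atleast_1d(fdia_state) - 1] = magnitude
    return np.asarray(z, dtype=float) + H @ c


# Multi-bus SLC at buses 5, 10 and 12 of a 14-bus system (hypothetical loads in MW).
loads = np.full(14, 10.0)
print(apply_slc(loads, [5, 10, 12], amount=3.0)[[4, 9, 11]])  # [7. 7. 7.]
```

Treating scalars and vectors uniformly through `np.atleast_1d` is what lets the same code path serve single-bus and multi-bus scenarios.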
WLS- and EKF-based state estimations are carried out under normal operating conditions (i.e., quasi-steady state) and under abnormal operating conditions (BD, SLC or FDIA). Apart from the estimated states, other outputs are also obtained, such as predicted states, normalized residuals and normalized innovations. To detect BD, the measurement residuals obtained via WLS state estimation are used to carry out the $\chi^2$-test. If there is no BD, the algorithm checks for SLC or FDIA using the ADI [Asefi et al., 2022]. If the ADI value is equal to or higher than a specific threshold, an anomaly is detected; otherwise, the system is considered to be in normal operation mode.
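The two-step detection logic described above can be sketched as follows. To keep the sketch self-contained, it makes several assumptions:

- It uses a linear measurement model $z = Hx + e$, a DC-style simplification. ADCIT itself uses AC WLS and EKF estimators in Matlab.
- It treats the ADI values as already computed. Their formula is given in [Asefi et al., 2022] and is not reproduced here.
- It uses hypothetical function names and a confidence level $p = 0.99$ chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2


def wls_estimate(H, z, sigma):
    """Linear WLS: x_hat = (H^T W H)^-1 H^T W z, with W = diag(1 / sigma^2).

    Returns the state estimate and the measurement residuals r = z - H x_hat.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), z.shape)
    W = np.diag(1.0 / sigma**2)
    G = H.T @ W @ H                        # gain matrix
    x_hat = np.linalg.solve(G, H.T @ W @ z)
    return x_hat, z - H @ x_hat


def chi2_bad_data_test(r, sigma, n_states, p=0.99):
    """Return (J, threshold, bad_data), where bad_data means J >= chi2_{(m-n),p}."""
    J = float(np.sum((np.asarray(r) / sigma) ** 2))
    threshold = float(chi2.ppf(p, df=len(r) - n_states))
    return J, threshold, J >= threshold


def detect(H, z, sigma, adi, gamma, p=0.99):
    """Detection cascade: chi2-test for BD first, then max(ADI) >= gamma."""
    _, r = wls_estimate(H, z, sigma)
    _, _, bad_data = chi2_bad_data_test(r, sigma, H.shape[1], p)
    if bad_data:
        return "BD"
    if np.max(adi) >= gamma:
        return "SLC or FDIA"  # deciding which of the two is left to the ML stage
    return "normal"


# Hypothetical noise-free system with 10 measurements and 4 states.
rng = np.random.default_rng(0)
H = rng.normal(size=(10, 4))
z = H @ rng.normal(size=4)
print(detect(H, z, np.full(10, 0.01), adi=[0.1, 0.2], gamma=1.0))  # normal
```

The order matters. BD is ruled out first, because a gross measurement error would also disturb the estimates from which the ADI is built.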
Note that the instants at which an anomaly occurs and vanishes can be specified within the code. It is also possible to change the test system or the network topology; however, this requires further modification of the parameter settings in the MATPOWER source file and in several m-files.

### 2.2 Python: Classification and identification

The Python environment is used for input data pre-processing and for applying the ML algorithms for anomaly classification and identification. For the sake of comparison, four supervised ML algorithms are applied [Asefi et al., 2022]: Random Forest (RF), Extreme Gradient Boosting (XGB), Logistic Regression (LR) and K-Nearest Neighbors (KNN).
As mentioned above, the goal is to eliminate the need to retrain the ML algorithms when the network topology changes. To achieve this, features associated with power lines are excluded and only features associated with nodes are used. These nodal features are: (a) nodal measurements and normalized measurement innovations of voltage magnitudes and active/reactive power injections; (b) estimates and predictions of voltage magnitudes, phase angles and active/reactive power injections.
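To illustrate why node-only features avoid retraining, the sketch below assembles one feature row per time step from per-node quantities only. The quantity names and the dictionary layout are hypothetical. The point is that the number of columns depends on the number of nodes and not on the branch list, so switching a line in or out leaves the feature vector's length unchanged. This idealized layout is not meant to reproduce the exact feature count used in ADCIT.

```python
import numpy as np

# Per-node quantities named in the text (hypothetical identifiers).
NODE_QUANTITIES = [
    "V_meas", "P_meas", "Q_meas",                # nodal measurements
    "V_innov", "P_innov", "Q_innov",             # normalized measurement innovations
    "V_est", "theta_est", "P_est", "Q_est",      # state estimator estimates
    "V_pred", "theta_pred", "P_pred", "Q_pred",  # EKF predictions
]


def node_feature_matrix(data):
    """Stack per-node arrays of shape (T, n_nodes) into a (T, n_features) matrix.

    Branch quantities (line flows) are deliberately not used, so the number of
    columns is len(NODE_QUANTITIES) * n_nodes regardless of network topology.
    """
    return np.hstack([np.asarray(data[name], dtype=float) for name in NODE_QUANTITIES])


# Hypothetical data: 5 time steps for a 14-node system.
T, n_nodes = 5, 14
data = {name: np.zeros((T, n_nodes)) for name in NODE_QUANTITIES}
X = node_feature_matrix(data)
print(X.shape)  # (5, 196)
```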
The maximum relevance – minimum redundancy (MRMR) algorithm has been applied for feature selection [Zhao et al., 2019]. A parallelization function has been included in the MRMR script so that tasks run in parallel on multiple CPU cores. Accordingly, the most relevant features can be found quickly and without additional computational complexity.
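The greedy idea behind MRMR can be sketched in a few lines. The sketch uses one quotient-type variant: relevance is the ANOVA F-statistic of each feature with respect to the class label, and redundancy is the mean absolute Pearson correlation with the features already selected. Which MRMR variant the ADCIT script uses is not stated here. The parallelization mentioned above is omitted for brevity, and the function name `mrmr_fcq` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def mrmr_fcq(X, y, k):
    """Greedy MRMR selection of k feature indices (F-statistic / correlation quotient).

    Relevance: ANOVA F-statistic of each feature w.r.t. the class label.
    Redundancy: mean absolute Pearson correlation with already-selected features.
    """
    X = np.asarray(X, dtype=float)
    relevance, _ = f_classif(X, y)
    relevance = np.nan_to_num(relevance)                    # constant features -> 0
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected = [int(np.argmax(relevance))]                  # start with the most relevant
    candidates = set(range(X.shape[1])) - set(selected)
    while len(selected) < k and candidates:
        cand = np.array(sorted(candidates))
        redundancy = corr[np.ix_(cand, selected)].mean(axis=1)
        score = relevance[cand] / np.maximum(redundancy, 1e-6)
        best = int(cand[np.argmax(score)])
        selected.append(best)
        candidates.remove(best)
    return selected


# Synthetic example: column 1 duplicates column 0; column 2 is informative but distinct.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)
x0 = y + 0.3 * rng.normal(size=2000)
X = np.column_stack([x0, x0.copy(), y + 0.3 * rng.normal(size=2000), rng.normal(size=2000)])
print(mrmr_fcq(X, y, k=2))
```

In the synthetic example, the exact duplicate is as relevant as the original column, yet its redundancy of 1.0 keeps it out of the first two picks. This is the behavior that lets MRMR shrink a feature set without losing information.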
To execute the ML algorithms, standard models from the scikit-learn library have been applied [Buitinck et al., 2013]. All models are trained with tuned hyperparameters. The hyperparameters are tuned using sequential optimization, with gradient boosting as a surrogate probability model of the objective function [Friedman]; the scikit-optimize library is used for this purpose.
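The four-model comparison described above can be approximated with scikit-learn, as sketched below. The sketch makes three assumptions:

- Synthetic data from `make_classification` replaces the ADCIT features, with labels 0 and 1 merely playing the roles of SLC and FDIA.
- Hyperparameters are fixed rather than tuned.
- scikit-learn's `GradientBoostingClassifier` stands in for XGB, which comes from a separate library.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_models(X, y, seed=0):
    """Fit each classifier; return {name: (test accuracy, training time in s)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Stand-in for XGB: scikit-learn's own gradient boosting classifier.
        "GB": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_tr, y_tr)                  # only fitting is timed
        elapsed = time.perf_counter() - start
        results[name] = (accuracy_score(y_te, model.predict(X_te)), elapsed)
    return results


# Synthetic two-class data standing in for SLC (label 0) vs FDIA (label 1) samples.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
for name, (acc, t) in compare_models(X, y).items():
    print(f"{name}: accuracy={acc:.3f}, training time={t:.2f} s")
```

To follow the text more closely, the fixed hyperparameters would be replaced by a search. scikit-optimize provides `gbrt_minimize`, which uses gradient-boosted regression trees as the surrogate model, but how the ADCIT script wires this up is not shown here.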

## 3. Illustrative example

An example of the results obtained by the ADCIT is shown in Fig. 2. If $J_{BDD} \geq \chi^2_{(m-n),p}$ holds, then there is a high probability that BD exists. Here, $J_{BDD}$ stands for the objective function of the $\chi^2$-test, and $\chi^2_{(m-n),p}$ is the value from the $\chi^2$ distribution table for probability $p$ and $(m-n)$ degrees of freedom. The quantities $m$ and $n$ are the number of observed measurements and the number of estimated states, respectively.
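For reference, the $\chi^2$-test objective in its standard textbook form [Abur and Exposito, 2004] is the weighted sum of squared WLS residuals. The following symbols are introduced here:

- $z_i$ is measurement $i$.
- $h_i(\cdot)$ is its measurement function.
- $\hat{x}$ is the WLS estimate.
- $\sigma_i$ is the measurement error standard deviation.

The form assumes independent Gaussian measurement errors:

$$
J_{BDD} = \sum_{i=1}^{m} \frac{\bigl(z_i - h_i(\hat{x})\bigr)^2}{\sigma_i^2},
\qquad
\text{BD is suspected if } J_{BDD} \geq \chi^2_{(m-n),p}.
$$

In the absence of BD, $J_{BDD}$ approximately follows a $\chi^2$ distribution with $m-n$ degrees of freedom. This is why the tabulated value $\chi^2_{(m-n),p}$ serves as the detection threshold.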
If $\max\{ADI_i\} \geq \gamma$, the presence of SLC or FDIA is detected. Here, $\max\{ADI_i\}$ stands for the maximum ADI value, and $\gamma$ is the detection threshold, which has to be selected so as to clearly discriminate between normal operation and anomalies [Asefi et al., 2022].
Fig. 2 shows the detection indices when the system is affected by an FDIA. The FDIA is designed to increase the voltage magnitude at bus $14$ by 0.1 p.u. It starts at $t=350$ and persists until the end of the simulation.
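The rule $\max\{ADI_i\} \geq \gamma$ applied over a simulation can be sketched as follows. The ADI series here is synthetic: random values well below the threshold until $t = 350$, then a jump at node 14, loosely mimicking the scenario of Fig. 2. It is not ADCIT output, and both the function name and $\gamma = 1$ are hypothetical.

```python
import numpy as np


def first_detection(adi, gamma):
    """Return the first time index t with max_i ADI_i(t) >= gamma, or None.

    adi   : array of shape (T, n_nodes) holding the ADI value of each node i
            at each time step t (computed elsewhere).
    gamma : detection threshold.
    """
    alarms = np.max(np.asarray(adi), axis=1) >= gamma
    hits = np.flatnonzero(alarms)
    return int(hits[0]) if hits.size else None


# Synthetic ADI series (not ADCIT output): values in [0, 0.5) until t = 350,
# then a jump at node 14 (column index 13).
rng = np.random.default_rng(0)
adi = rng.uniform(0.0, 0.5, size=(500, 14))
adi[350:, 13] += 5.0
print(first_detection(adi, gamma=1.0))  # 350
```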
The maximum ADI clearly detects the presence of the anomaly, whereas the anomaly bypasses the $\chi^2$-test. However, the ADI cannot classify the anomaly according to its type. Therefore, ML algorithms have been used to classify the anomalies. Table 1 summarizes the performance of the ML algorithms in discriminating between SLC and FDIA, in terms of classification accuracy and training time [Asefi et al., 2022]. Some of the features might be redundant or less relevant for training the ML algorithms, so MRMR has been applied to select the most relevant ones. In this example, MRMR selects $70$ features, compared with $214$ features without MRMR (WO MRMR). This reduces the training time of all four ML algorithms. The accuracy of the LR and KNN algorithms decreases (from 98.55% to 87.94% for LR and from 99.74% to 97.53% for KNN), whereas the accuracy of the RF and XGB algorithms remains the same.
Table 1. Single-bus/single-state SLC/FDIA classification.

| ML algorithm | Classification accuracy WO MRMR (%) | Training time WO MRMR (s) | Accuracy using MRMR (%) | Training time using MRMR (s) |
|---|---|---|---|---|
| LR | 98.55 | 577.14 | 87.94 | 136.29 |
| KNN | 99.74 | 42.26 | 97.53 | 38.35 |
| RF | 100 | 946.76 | 100 | 561.29 |
| XGB | 100 | 858.98 | 100 | 324.35 |
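The effect of MRMR on training time can be quantified directly from Table 1 with a few lines of arithmetic. The snippet only restates the published numbers and does not rerun any model.

```python
# Training times in seconds, copied from Table 1.
times_wo_mrmr = {"LR": 577.14, "KNN": 42.26, "RF": 946.76, "XGB": 858.98}
times_mrmr = {"LR": 136.29, "KNN": 38.35, "RF": 561.29, "XGB": 324.35}

# Speed-up factor = training time without MRMR / training time with MRMR.
speedup = {name: times_wo_mrmr[name] / times_mrmr[name] for name in times_wo_mrmr}

# Share of the original 214 features that MRMR keeps.
feature_fraction = 70 / 214

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x faster with MRMR")
print(f"Features kept by MRMR: {feature_fraction:.1%}")
```

Rounded, the speed-ups are about 4.2x (LR), 1.1x (KNN), 1.7x (RF) and 2.6x (XGB), with MRMR keeping roughly a third (32.7%) of the features.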

## 4. Software impacts

Accurate anomaly detection, classification and identification are of great importance for power system state estimation. The impacts of the ADCIT are twofold. Firstly, it can be applied as an educational tool. It allows researchers to observe the adverse effects of anomalies on the state estimates. It also lets them analyze how the ADCIT enables anomaly detection, classification and identification in order to avoid these effects [Abur, Power education toolbox (P.E.T): An interactive software package for state estimation]. Researchers can also modify or extend the ADCIT to implement their own ideas, so the ADCIT can serve as a platform for future research.
Secondly, the tool is suitable for industrial implementation. Without requiring any additional hardware, the ADCIT can be integrated into the energy management systems in power system control rooms [Gomez-Exposito et al., 2018]. The ADCIT enables better situational awareness in the presence of anomalies, which are typical of system emergencies [Asefi et al., 2022; Yohanandhan et al., Cyber-physical power system (CPPS): A review on modeling, simulation, and analysis with cyber security applications]. In addition, by specifying the anomaly type (i.e., anomaly classification), the ADCIT assists the system operator in making proper decisions.
Future directions for the development of the ADCIT include creating a graphical user interface, considering other types of anomalies such as network parameter errors, and designing suitable countermeasures against anomalies.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Acknowledgments

Work of S. Asefi, M. Mitrovic, E. Gryazina and V. Terzija was supported by Skoltech and The Ministry of Education and Science of the Russian Federation, Grant Agreement No 075-10-2021-067, Grant identification code 000000S707521QJX0002. Work of D. Ćetenović and V. Levi was supported by the Engineering and Physical Sciences Research Council (EPSRC) of UK (Grant No. EP/S00078X/2 and Grant No. EP/T021969/1), and the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No. 451-03-68/2022-14/200132 with University of Kragujevac - Faculty of Technical Sciences Čačak).

## References

• A. Gomez-Exposito, A.J. Conejo, C. Canizares, Electric Energy Systems: Analysis and Operation, CRC Press, 2018.
• S. Asefi, M. Mitrovic, D. Ćetenović, V. Levi, E. Gryazina, V. Terzija, Power system anomaly detection and classification utilizing WLS-EKF state estimation and machine learning, 2022 (arXiv preprint arXiv:2209.12629).
• A. Abur, A.G. Exposito, Power System State Estimation: Theory and Implementation, CRC Press, 2004.
• R. Christie, Power Systems Test Case Archive. 14 Bus Power Flow Test Case, University of Washington, Department of Electrical Engineering, 1993, [Online] Available at https://labs.ece.uw.edu/pstca/pf14/pg_tca14bus.htm.
• R.D. Zimmerman, C.E. Murillo-Sanchez, Matpower, URL https://matpower.org.
• Z. Zhao, R. Anand, M. Wang, Maximum relevance and minimum redundancy feature selection methods for a marketing machine learning platform, in: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2019, pp. 442–452.
• L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, G. Varoquaux, API design for machine learning software: experiences from the scikit-learn project, in: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108–122.
• J.H. Friedman.