## Highlights

- Use of percolation theory as a fractal measure for feature extraction.
- Local and global percolation features are obtained from RGB images.
- Evaluation of the number of clusters, percolation occurrence and cluster size.
- Relevant results obtained by classifying histology images using the obtained features.

## Abstract

Percolation is a physical phenomenon that describes the transport of fluids through porous media. We present a new approach for extracting local and global percolation features from RGB images. The proposed method builds on multiscale and multidimensional approaches previously used to extract other fractal measures from histology images. Our method obtains three percolation properties: number of clusters, occurrence of percolation, and size of the largest cluster. This approach aims to provide an alternative to fractal dimension and lacunarity as ways of evaluating fractal properties in an image.

## Keywords

Code metadata

Description | Value |
---|---|
Current code version | v1 |
Permanent link to code/repository used for this code version | https://github.com/SoftwareImpacts/SIMPAC-2022-101 |
Permanent link to Reproducible Capsule | https://codeocean.com/capsule/2522280/tree/v1 |
Legal Code License | MIT license |
Code versioning system used | git |
Software code languages, tools, and services used | MATLAB |
Compilation requirements, operating environments & dependencies | MATLAB |
If available Link to developer documentation/manual | Not provided |
Support email for questions | [email protected] |

## 1. Introduction

Percolation is a physical phenomenon that consists of the transport of fluids in porous media. Several natural systems exhibit this type of behaviour, such as water flowing through coffee powder or gases through rocks. When the fluid is able to cross the whole system from one end to the other, percolation is said to occur. These concepts can also be adapted to image analysis.

In natural systems, pores are distributed randomly. Based on this principle, studies have found that each system has a value $p$, which corresponds to the probability that a given site is a pore [1]. In addition to indicating the ratio between pores and non-pores in a system, $p$ is associated with an important property known as the percolation threshold: once $p$ exceeds a certain value, percolation is taken to occur. These concepts can be adapted to images, which avoids the effort of searching for clusters that indicate percolation. Each type of system has a different percolation threshold; for square matrices, which also applies to images, the threshold is 0.59275 [2].

Percolation-based features have therefore been increasingly used to quantify images, since they allow percolating structures and clusters to be analysed, complementing observations made with other fractal measures such as Fractal Dimension (FD) and Lacunarity (LAC). Percolation has been applied to quantify vascular [3], cardiac [4], and bone [5] images. However, these applications have been limited to binary or greyscale images. Therefore, we propose an approach based on percolation theory for extracting features from colour images. This approach uses multiscale and multidimensional observations to evaluate the following properties: average number of clusters; occurrence of percolation; and coverage ratio of the largest cluster. The method was applied to different histology image datasets and the obtained features were given as input to classifiers such as Random Forest and Multilayer Perceptron, which provided accuracy values ranging from 87% to 98% [6].

## 2. Fractal features for colour images

Fractal features can be obtained from colour images by first applying the *gliding-box* algorithm [7]. This algorithm consists of positioning a box of size $L\times L$ in the upper left corner of the image, where $L$ is the side of the box in pixels. The box moves from left to right and from top to bottom, passing through all pixels. After gliding through the entire image, the box is repositioned at the starting point and the value of $L$ is incremented by $2$ units. The value of $L$ must always be odd, since the next steps of the method require a central pixel. In an image of size $H\times W$, the total number $T$ of boxes ${\beta}_{i}$ as a function of $L$ is given by Eq. (1):

$$T\left(L\right)=(H-L+1)\times (W-L+1),\quad L\le \min(H,W).$$

(1)
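As a quick check of Eq. (1), the following Python sketch counts the box positions for a given scale and lists the odd scales to be visited. It is an illustration only and not part of the published MATLAB code; the function names and the starting scale of 3 are assumptions made for this example.

```python
def box_count(height, width, side):
    """Number of L x L box positions in an H x W image, as in Eq. (1)."""
    if side > min(height, width):
        raise ValueError("L must not exceed min(H, W)")
    return (height - side + 1) * (width - side + 1)


def odd_scales(max_side, min_side=3):
    """Odd box sides from min_side to max_side inclusive.

    min_side=3 is an assumption of this sketch; the text only states
    that L is odd and grows by 2 at each step.
    """
    if min_side % 2 == 0:
        raise ValueError("the first scale must be odd")
    return list(range(min_side, max_side + 1, 2))


print(odd_scales(9))         # [3, 5, 7, 9]
print(box_count(10, 12, 3))  # 80
```

Because the count grows with the image area for every scale, the number of scales visited has a direct effect on running time.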

Each time the box ${\beta}_{i}$ is shifted, a colour similarity analysis is performed for each pixel of ${\beta}_{i}$. The central pixel is fixed and assigned to a vector ${f}_{c}=\{{r}_{c},{g}_{c},{b}_{c}\}$, where ${r}_{c}$, ${g}_{c}$ and ${b}_{c}$ are the intensities of the colour channels of an RGB image. Every other pixel of the box is assigned to a vector ${f}_{i}=\{{r}_{i},{g}_{i},{b}_{i}\}$ and compared with the central pixel by calculating the Minkowski distance, which determines which pixels belong to the RGB hyperspace centred on the central pixel of the box ${\beta}_{i}$. The Minkowski distance $\Delta $, used here in its infinite-order (Chebyshev) form, is given by Eq. (2):

$$\Delta =\max_{k\in \{r,g,b\}}\left|{f}_{i}\left(k\right)-{f}_{c}\left(k\right)\right|.$$

(2)

If the value of $\Delta $ corresponding to the distance between ${f}_{i}$ and ${f}_{c}$ is less than or equal to the scale $L$, then ${f}_{i}$ is labelled $1$, otherwise ${f}_{i}$ is labelled $0$.
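The labelling rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: a box is represented as a nested list of `(r, g, b)` tuples, and the names `chebyshev_distance` and `label_box` are hypothetical.

```python
def chebyshev_distance(pixel_a, pixel_b):
    """Largest absolute channel difference between two RGB pixels, Eq. (2)."""
    return max(abs(a - b) for a, b in zip(pixel_a, pixel_b))


def label_box(box):
    """Label each pixel of a square L x L box as 1 or 0.

    `box` is a list of L rows, each a list of L (r, g, b) tuples, L odd.
    A pixel is labelled 1 when its distance to the central pixel is <= L.
    """
    side = len(box)
    if side % 2 == 0 or any(len(row) != side for row in box):
        raise ValueError("box must be square with an odd side")
    centre = box[side // 2][side // 2]
    return [[1 if chebyshev_distance(px, centre) <= side else 0 for px in row]
            for row in box]


example_box = [
    [(100, 100, 100), (102, 99, 103), (104, 100, 100)],
    [(100, 100, 100), (100, 100, 100), (90, 100, 100)],
    [(97, 103, 100), (100, 96, 100), (100, 100, 100)],
]
print(label_box(example_box))  # [[1, 1, 0], [1, 1, 0], [1, 0, 1]]
```

With $L=3$, the pixel `(102, 99, 103)` is at distance 3 from the centre and is labelled 1, whereas `(104, 100, 100)` is at distance 4 and is labelled 0.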

After counting the pixels that satisfy $\Delta \le L$ for different values of $L$, different operations can be applied to the labelled boxes to obtain measures such as local FD and LAC.

## 3. Our approach: Percolation Features

The method is based on Ivanovici and Richard’s models for obtaining fractal measures from colour images [7, 8]. This approach follows the steps of the *gliding-box* algorithm and the calculation of the *Minkowski* distance between a central pixel and the other pixels of a box for colour similarity analysis. Once all pixels in a box are labelled, the procedure to evaluate percolation properties is initiated. We extract two types of features: local and global.

### 3.1 Local features

In order to evaluate percolation properties, we consider the pixels that satisfied the *Minkowski* distance criterion as pores. Then, the MATLAB function *bwlabel* is applied to each box to label all clusters. For each scale $L$, three different features are extracted: average number of clusters per box, percolation occurrence rate, and average area of the largest cluster. The average number of clusters $C\left(L\right)$ is given by Eq. (3), which consists of the sum of the number of clusters ${c}_{i}$ in each box ${\beta}_{i}$ divided by the total number of boxes $T\left(L\right)$.

$$C\left(L\right)=\frac{\sum _{i=1}^{T\left(L\right)}{c}_{i}}{T\left(L\right)}.$$

(3)
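To make the cluster count concrete, the sketch below labels connected pores in a 0/1 box with a breadth-first search and averages the counts as in Eq. (3). It is written in plain Python for illustration and is not the published code. The 8-connected neighbourhood mirrors the documented default of MATLAB's `bwlabel`, but whether the software keeps that default is an assumption; the function names are hypothetical.

```python
from collections import deque


def cluster_sizes(grid, connectivity=8):
    """Sizes of the connected clusters of 1-valued cells in a 0/1 grid."""
    if connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    elif connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one cluster, starting at (r, c).
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            sizes.append(size)
    return sizes


def average_cluster_count(boxes, connectivity=8):
    """C(L) of Eq. (3): mean number of clusters over all boxes of one scale."""
    return sum(len(cluster_sizes(b, connectivity)) for b in boxes) / len(boxes)


boxes = [
    [[1, 0, 1], [0, 0, 0], [1, 0, 1]],  # four isolated pores
    [[1, 1, 0], [1, 1, 0], [1, 0, 1]],  # one cluster under 8-connectivity
]
print(average_cluster_count(boxes))     # 2.5
print(average_cluster_count(boxes, 4))  # 3.0
```

The second call shows how the choice of neighbourhood changes the result: with 4-connectivity the bottom-right pore of the second box becomes a cluster of its own.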

Eq. (5) shows how the percolation occurrence rate $Q\left(L\right)$ is calculated. It corresponds to the fraction of boxes ${\beta}_{i}$ in which the proportion of pixels labelled as pores, ${\Omega}_{i}/{L}^{2}$, reaches the percolation threshold, as indicated for each box by Eq. (4).

$${q}_{i}=\begin{cases}1, & \frac{{\Omega}_{i}}{{L}^{2}}\geq 0.59275,\\ 0, & \frac{{\Omega}_{i}}{{L}^{2}}<0.59275.\end{cases}$$

(4)

$$Q\left(L\right)=\frac{\sum _{i=1}^{T\left(L\right)}{q}_{i}}{T\left(L\right)}.$$

(5)

Eq. (6) corresponds to the calculation of the average coverage area of the largest cluster $M\left(L\right)$, where ${M}_{i}$ is the number of pixels in the largest cluster of box ${\beta}_{i}$. It is given by the sum of the occupancy rates of the largest clusters in each box, $\frac{{M}_{i}}{{L}^{2}}$, divided by the total number of boxes $T\left(L\right)$.

$$M\left(L\right)=\frac{\sum _{i=1}^{T\left(L\right)}\frac{{M}_{i}}{{L}^{2}}}{T\left(L\right)}.$$

(6)
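Eqs. (4) to (6) reduce to simple averages once the pore count ${\Omega}_{i}$ and the largest cluster size ${M}_{i}$ of every box are known. The Python sketch below assumes those per-box values have already been computed; the function names are illustrative and do not come from the published software.

```python
PERCOLATION_THRESHOLD = 0.59275  # threshold quoted in the text


def percolation_rate(pore_counts, side):
    """Q(L) of Eq. (5): fraction of boxes whose pore ratio reaches the threshold."""
    area = side * side
    percolating = sum(1 for omega in pore_counts
                      if omega / area >= PERCOLATION_THRESHOLD)
    return percolating / len(pore_counts)


def largest_cluster_coverage(largest_sizes, side):
    """M(L) of Eq. (6): mean fraction of a box covered by its largest cluster."""
    area = side * side
    return sum(m / area for m in largest_sizes) / len(largest_sizes)


# Four boxes of side L = 3 (area 9).
pore_counts = [4, 6, 9, 5]    # Omega_i for each box
largest_sizes = [1, 6, 9, 3]  # M_i for each box

print(percolation_rate(pore_counts, 3))            # 0.5
print(largest_cluster_coverage(largest_sizes, 3))  # 19/36, about 0.528
```

In this example only the boxes with 6 and 9 pores reach the ratio of 0.59275, so half of the boxes count as percolating.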

### 3.2 Global features

Global percolation values can be obtained by generating a curve from the values $\log L\times \log \Phi $, wherein $\Phi $ corresponds to one of the three local percolation features ($C\left(L\right)$, $Q\left(L\right)$ or $M\left(L\right)$). Using the method proposed in [8], global values can be obtained by applying the following metrics to such a curve: area under the curve ($A\left(\Phi \right)$), skewness ($S\left(\Phi \right)$), area ratio (${A}_{R}\left(\Phi \right)$), maximum point ($Max\left(\Phi \right)$) and scale of the maximum point ($\sigma \left(\Phi \right)$).

The area under the curve can be obtained by applying the trapezoidal method, as given by Eq. (7), where $a$ and $b$ are the minimum and maximum values of $L$ respectively, $\Phi $ is the percolation function and $N$ is the number of observation scales:

$$A\left(\Phi \right)=\frac{b-a}{2N}\sum _{n=a}^{b-1}(f\left({\Phi}_{n}\right)+f\left({\Phi}_{n+1}\right)).$$

(7)
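For illustration, the trapezoidal rule can be written for arbitrary sample positions as below. This is the general form of the rule rather than a transcription of Eq. (7), which assumes evenly spaced scales; whether the horizontal axis holds $L$ or $\log L$ is left to the caller, and the function name is an assumption of this sketch.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points with matching lengths")
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


scales = [3, 5, 7, 9]
values = [1.0, 2.0, 2.0, 1.0]
print(trapezoid_area(scales, values))  # 10.0
```

Each pair of neighbouring samples contributes one trapezoid, here with areas 3, 4 and 3.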

The second global percolation feature obtained is the skewness, which is a measure of asymmetry. For a percolation curve for which $N$ local values were calculated, the skewness is obtained according to Eq. (8), wherein ${\Phi}_{i}$ is the $i$th value of the sample and $\overline{\Phi}$ is its average value.

$$S\left(\Phi \right)=\frac{\frac{1}{N}\sum _{i=a}^{b}{({\Phi}_{i}-\overline{\Phi})}^{3}}{\sqrt[2]{{\left[\frac{1}{N-1}\sum _{i=a}^{b}{({\Phi}_{i}-\overline{\Phi})}^{2}\right]}^{3}}}.$$

(8)
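Eq. (8) can be followed term by term in a short Python function. The sketch keeps the normalisation of the equation, with $1/N$ in the numerator and $1/(N-1)$ inside the root; the function name is hypothetical, and the input must contain at least two values that are not all equal.

```python
import math


def skewness(values):
    """Skewness as written in Eq. (8)."""
    n = len(values)
    mean = sum(values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return third_moment / math.sqrt(variance ** 3)


print(skewness([1, 2, 3, 4, 5]))  # 0.0
print(skewness([1, 1, 1, 5]))     # 0.75
```

A symmetric sample gives zero, while the single large value in the second sample pulls the result towards positive skewness.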

Another feature that can be calculated is the ratio between the areas under the right and left halves of the curve, as shown in Eq. (9).

$${A}_{R}\left(\Phi \right)=\frac{{A}_{(\frac{b}{2}+1,b)}}{{A}_{(a,\frac{b}{2})}}.$$

(9)

The values of $Max\left(\Phi \right)$ and $\sigma \left(\Phi \right)$ can also be obtained, which gives a total of five global percolation features.
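The remaining three metrics can be sketched as follows. Splitting the samples at the middle index is an assumption of this example about how the two halves in Eq. (9) are chosen, and the function names are illustrative; the sketch is not the authors' implementation.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def area_ratio(xs, ys):
    """Area under the right half of the samples divided by the left half."""
    mid = len(xs) // 2
    left = trapezoid_area(xs[:mid], ys[:mid])
    right = trapezoid_area(xs[mid:], ys[mid:])
    return right / left  # undefined when the left area is zero


def maximum_and_scale(scales, ys):
    """Largest curve value and the scale at which it first occurs."""
    index = max(range(len(ys)), key=ys.__getitem__)
    return ys[index], scales[index]


scales = [3, 5, 7, 9, 11, 13]
values = [1, 2, 4, 3, 3, 3]
print(area_ratio(scales, values))         # 12 / 9, about 1.333
print(maximum_and_scale(scales, values))  # (4, 7)
```

Applying the five metrics to each of the three local curves yields the 15 global features mentioned in the software description.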

## 4. Software description

The code is written in MATLAB R2019a and is available on the Code Ocean platform. The software consists of three functions and one script. The functions are described in Table 1. The script *main.m* extracts local and global percolation features from an RGB image given as input, according to the parameter *maxL*. Although not mandatory, the value of *maxL* should be between 41 and 65 to generate a good sample of local values without increasing the running cost more than necessary. This script can be used by pathologists or computer vision researchers to evaluate percolation properties of medical images, or to use the obtained features in a classification system for diagnosis support.

Table 1. Description of software functions.

Function | Input | Output | Description |
---|---|---|---|
“percolation.m” | RGB image, $maxL$ parameter | Local and global percolation features | Extracts percolation features from the input image |
“getLocalFeatures.m” | Input image in double format and array of $L$ values | Three arrays containing the local features | Calculates the features by applying the process described in Section 3 |
“getGlobalFeatures.m” | Three arrays for the $C$, $Q$ and $M$ functions | Struct containing the 15 global features | Applies the procedures described in Section 3.2 |

## 5. Impact

Computer vision methods should be able to represent an object-level analysis of the features present in histological images, based on the criteria used by pathologists such as shape or size of cell nuclei. To quantify such information present in the tissues, computational algorithms must be able to extract features from histological images that explore their morphology, texture, topology and colouration. Fractal measures such as FD and LAC have often been used in this context for the past decade.

The concept of percolation can be explored to define fractal features of images. The information obtained with percolation describes properties related to the presence, dimension and quantity of clusters in the images. These properties are related to features such as shape, size and quantity of objects, which are relevant information for experts in histopathology. When percolation features were given as input to the classification of non-Hodgkin lymphomas, breast tumours, colorectal tumours and liver tissue datasets, accuracy values of 94.2%, 86.2%, 97.6% and 99.6% were obtained, respectively [9, 10]. These results represent an improvement of up to 17% when compared to other fractal measures [11]. We believe this new approach can be used as a complementary measure to FD and LAC in order to improve the performance of computer-aided diagnosis (CAD) systems.

## 6. Limitations & future work

In this work, we have evaluated the proposed technique by applying it to RGB images. Although the use of our method in greyscale is possible with some minor adaptations, other colour models would require major changes to the Minkowski distance calculation. Moreover, a study could be made regarding the ideal value for the input parameter *maxL*, since a greater value would generate more local features but significantly increase the computing cost. The application of this technique could also be evaluated on non-medical images.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001; the National Council for Scientific and Technological Development (CNPq), Brazil (Grants #311404/2021-9 and #313643/2021-0); and the State of Minas Gerais Research Foundation - FAPEMIG, Brazil (Grants #APQ-00578-18 and #APQ-01129-21).

## References

1. Simple cubic random-site percolation thresholds for neighborhoods containing fourth-nearest neighbors. *Phys. Rev. E.* 2015; 91: 043301
2. Multiscale percolation properties of a fractal pore network. *Geoderma.* 2010; 160: 105-110
3. Characterization of tumor angiogenesis using fractal measures. in: 2013 19th International Conference on Control Systems and Computer Science. IEEE, 2013: 345-349
4. Aprendizado de máquina simbólico e técnicas fractais para caracterizar rejeição em biópsia miocárdica. in: V Latin American Congress on Biomedical Engineering, CLAIB 2011, May 16-21, 2011, Habana, Cuba. Springer, 2013: 272-275
5. Characterization of trabecular architecture in femur bone radiographs using succolarity. in: 2013 39th Annual Northeast Bioengineering Conference. IEEE, 2013: 225-226
6. Candelero David, Roberto Guilherme Freire, Do Nascimento Marcelo Zanchetta, Rozendo Guilherme Botazzo, Neves Leandro Alves. Selection of CNN, haralick and fractal features based on evolutionary algorithms for classification of histological images. in: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM. IEEE, 2020: 2709-2716
7. Fractal dimension and lacunarity of psoriatic lesions-a colour approach. *Medicine.* 2009; 6: 7
8. Psoriasis image analysis using color lacunarity. in: Optimization of Electrical and Electronic Equipment (OPTIM), 2012 13th International Conference on. IEEE, 2012: 1401-1406
9. Classification of breast and colorectal tumors based on percolation of color normalized images. *Comput. Graph.* 2019; 84: 134-143
10. Fractal neural network: A new ensemble of fractal geometry and convolutional neural networks for the classification of histology images. *Expert Syst. Appl.* 2021; 166: 114103
11. Features based on the percolation theory for quantification of non-hodgkin lymphomas. *Comput. Biol. Med.* 2017; 91: 135-147

## Article info

### Publication history

Published online: July 25, 2022

Accepted: July 16, 2022

Received in revised form: July 12, 2022

Received: June 19, 2022

### Footnotes

The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.

### Copyright

© 2022 The Author(s). Published by Elsevier B.V.

### User license

Creative Commons Attribution (CC BY 4.0)
