## Abstract

Quantifying the connectivity of material microstructures is important for a wide range of applications from filters to biomaterials. Currently, the most used measure of connectivity is the Euler number, which is a topological invariant. Topology alone, however, is not sufficient for most practical purposes. In this study, we use our recently introduced connectivity measure, called the contour tree connectivity (CTC), to study microstructures for flow analysis. CTC is a new structural connectivity measure that is based on contour trees and algebraic graph theory. To test CTC, we generated a dataset composed of 120 samples and six different types of artificial microstructures. We compared CTC against the Euler parameter (EP), the parameter for connected pairs, the nominal opening dimension (*d*_{nom}) and the permeabilities estimated using direct pore scale modelling. The results show that *d*_{nom} is highly correlated with permeability (*R*^{2} = 0.91), but cannot separate the structural differences. The groups are best classified with feature combinations that include CTC. CTC provides new information with a different connectivity interpretation that can be used to analyse and design materials with complex microstructures.

## 1. Introduction

Several engineered and natural processes such as filtering, cell seeding into biomaterials and groundwater flow are governed by transport through porous media. Most relevant physical processes of flow take place at the micro or pore scale. This detailed morphological information can be obtained using high-resolution three-dimensional imaging devices such as microcomputed tomography (μ-CT) [1]. The capability of the imagers to reveal pore scale details has made three-dimensional microstructure imaging very attractive. As a result of the rapidly expanding number of devices, there is a growing demand for effective image processing techniques to process and analyse binary images [2].

Among the characterizations of porous medium, understanding the role of connectivity has become a very active research topic during the past decade. A paradigm shift towards studying connectivity has been pointed out earlier in the fields of hydrogeology, surface hydrology, geomorphology, landscape ecology and pore scale or soil physics [3]. Connectivity is an important property to study because it complements the information from local spatial variances and higher-order correlations that are traditionally used to model heterogeneity. One problem associated with connectivity is the lack of a unique mathematical definition. Being an intuitive notion, the quantification of connectivity is based on how it is interpreted. For example, it is common to refer to static, dynamic, local and global measures of connectivity [3]. Among the static methods, notable approaches include the Euler number [4], percolation theory [5] and connectivity function [6]. Dynamic connectivity metrics involve the physical processes of flow or transport. They are estimated experimentally or with modelling such as effective hydraulic conductivity [7].

Static and scalar measures of connectivity can be calculated from the binary images of microstructures. Static measures are not only faster, but also cheaper to obtain compared with dynamic connectivity measures. Additionally, scalar values are easier to understand and use. However, static and scalar connectivity measures are very limited in the literature. To the best of our knowledge, the proportion of connected pairs (*Γ*) from percolation theory and the Euler number (*χ*) are the only scalar measures described in the literature.

High-resolution μ-CT images of many microstructures have sharp boundaries, which makes it straightforward to segment permeable (pores) and impervious (material) regions using simple techniques such as thresholding [2]. The proportion of connected pairs is expressed as and shows the ratio of the pairs of cells that are connected to all the pairs of permeable cells. Here, *n* is the total number of permeable cells, *N* is the number of connected permeable regions and *n _{i}* is the number of permeable cells in the

*i*

^{th}region. One common case in which

*Γ*does not provide information is when there is a medium with a single fully connected permeable region, for which

*Γ*= 1.

The most widely accepted and commonly used connectivity measure for binary images is the Euler number that is described with the Euler–Poincaré formula as the alternating sum of Betti numbers, *β _{k}*,

*χ*= ∑

*(−1)*

_{k}*. For two-dimensional images,*

^{k}β_{k}*χ*is simply the number of connected components (

*β*

_{0}) minus the number of holes (

*β*

_{1}). For three-dimensional, it is the number of connected components (

*β*

_{0}), minus the number of tunnels (

*β*

_{1}), plus the number of voids (

*β*

_{2}). Although the Euler number is an elegant topological invariant, topology alone is not sufficient to describe connectivity in most cases. For example, although the Euler number includes information regarding the number of connected components, information about the proximity of the connected components is not provided. In addition, the Euler number can give information about holes, but not cavities (holes are topologically not connected to the outside, cavities are).

To overcome the limitations of the available approaches, we use contour trees and develop an alternative static and scalar connectivity measure called the contour tree connectivity (CTC) [8]. Contour trees are topological abstractions that follow the evolution of level sets as they appear, join, split or disappear by means of graphs. They have mostly been used to simplify volumes and to visualize high dimensional scalar fields [9–14].

An application of contour trees for analysing microstructures, or binary images in general, is given in our earlier work [15]. In reference [15], a graph representation of microstructures is proposed that takes into account both the geometrical properties of foreground object(s) and how the object(s) is/are placed in the background. This representation of images transforms complex three-dimensional microstructures into simple graphs. In a follow-up study, we developed CTC that calculates the connectivity of binary images by calculating the connectivity of their contour tree representations [8]. The first application of CTC is given for the analysis of trabecular bone microstructure [16]. In reference [16], it is shown that CTC is the third best morphometric parameter to predict ultimate bone strength among the 10 most commonly used measures.

In this study, we assess and compare the capability of various geometrical characteristics including CTC in relating structural properties with the permeability of microstructures. To achieve this goal, we generate 120 samples that represent six different types of microstructures with equal porosities using Matlab. We correlate the nominal opening dimension (*d*_{nom}), CTC, the EP, the parameter for connected pairs (PCPs), and the permeabilities estimated using direct pore scale modelling. We show that the connectivity interpretation and information provided by CTC is different from the other measures and can be used to quantify connectivity in order to identify the differences and changes in microstructures.

## 2. Description of materials

We prepared random porous structures that resemble segmented microstructures obtained from a *µ*-CT device with an isotropic voxel spacing of 7.8125 µm. All samples were designed to have physical dimensions of 1 × 1 × 1 mm which correspond to 128 × 128 × 128 voxels in image domain. In order to quantify connectivity independently of porosity, all samples were designed to have approximately 85% porosity. Porosity was calculated as the ratio of the number of permeable voxels to the number of all voxels.

We generated six groups of data in which each group contains 20 samples. Three of the groups contain samples made of spheres and the rest of the three contain samples of fibres. We abbreviate the sphere samples as S1, S2, S3 and the fibre samples as F1, F2, F3. S1 and F1 have 45 big disconnected objects, S2 and F2 have 100 medium-sized disconnected objects and S3 and F3 have 300 small disconnected objects.

For S1, S2 and S3, the diameters of the spheres vary between 15–26, 9–20 and 3–14 voxels, respectively. All fibres in F1, F2 and F3 are square prisms with a varying ratio of width to height between 10 and 20. The widths of fibres in F1, F2 and F3 vary between 6–8, 4–6 and 2–4 voxels, respectively. Figure 1 shows sample images from each group.

We designed and generated the images using in-house developed Matlab software. Starting with an empty image, our code iteratively places a sphere or a fibre of desired dimensions in a random fashion wherever there is no collision. The iteration stops when the necessary number of objects are placed in the image.

## 3. Characterization of microstructures using image processing

We calculated four geometrical characteristics of microstructures. These were (i) nominal opening dimension, (ii) CTC, (iii) EP and (iv) PCPs.

### 3.1. Nominal opening dimension

*d*_{nom} is a morphological feature which is also used as an estimate for the characteristic length scale [17]. *d*_{nom} is computed based on the opening operator which eliminates features smaller than the structure element. By iteratively using the opening operator with increasing sizes of spheres as structure elements, the opening size distribution is obtained [4].

Let denote the opening operator with a spherical structure element with diameter *d*. Then, the cumulative opening size distribution of image *I* is a function of *d* that is defined as
3.1where xor denotes the *exclusive or* operator. The opening size distribution is *o*(*d*) = *O’*(*d*). Using *o*(*d*), *d*_{nom} is calculated by the following expected value
3.2

### 3.2. Contour tree connectivity

The main idea of CTC is to compute the structural connectivity of a binary image by computing the connectivity of its contour tree representation. Figure 2 shows an example of a contour tree representation on a small two-dimensional image, and figure 3 shows the flowchart for extracting CTC.

The connectivity of a contour tree is calculated using algebraic graph theory. The formal expression for CTC is given as follows [8]
3.3where is the supplemented contour tree (SCT). *λ*_{1} is the algebraic connectivity of . and are the global max and min of Euclidean distance transform (EDT) image and is the rounding operator.

SCT is a concept similar to the augmented contour tree [9]. However, unlike augmented contour tree, the levels of the vertices in SCT do not necessarily exist in the EDT image. Because the denominator of equation (3.3) is the algebraic connectivity of the path from global max to global min of the EDT image, it sets the upper bound for any *λ*_{1} with the same global extrema. Therefore, 0 < CTC ≤ 1.

The formal definition of SCT and other technical details to extract CTC were explained in reference [8]. The only difference compared with reference [8] is that in this work, the pruning step is fully automated by setting the level threshold to 0.1 and volume threshold to 64. The complete flowchart used to extract CTC is shown in figure 3.

### 3.3. Euler parameter

The Euler number (*χ*) is the most commonly used connectivity feature in the literature. *χ* is calculated using graph counts as follows [18]
3.4where *v* is the number of vertices, *e* is the number of edges, *f* is the number of faces and *s* is the number of solids calculated in three-dimensional with six-connectivity.

Note that, for each group of test samples, permeable regions have a constant Euler number. For example, permeable regions of all the test samples in group S1 and F1 have *χ* = 46, because there is a single connected permeable region with 45 holes. In order to include information regarding the local variations in structures, the Euler number is generally calculated on excursion sets obtained by thresholding scalar fields [4,19,20]. The scalar fields for our test samples are obtained using EDT. A serious of thresholded images along with corresponding *χ*-values are shown in figure 4 for a sample from group S1.

By repeating the experiments for 400 threshold levels that are linearly separated between the minimum and maximum values of each scalar field, functions for *χ* were obtained. An example is shown in figure 5*a* that is applied on the sample shown in figure 4. As the EP, we used the following simple expression where *N* = 400.

### 3.4. Parameter for connected pairs

The proportion of connected pairs (*Γ*) is a commonly used feature to characterize structural connectivity as explained in §1. However, for our test images, because all permeable regions are made of a single connected component, *Γ* = 1 for all samples. In order to include the effects of local variability, we apply the same approach done in the computation of EP. Similar to *χ*, this iterative way of computing *Γ* was also done earlier, for example recently in reference [3]. We calculate *Γ* using the same excursion sets and thresholds used for the EP, as shown in the example of figure 4. The resulting *Γ* profile is shown in figure 5*b*. As the parameter of connected pairs, we used the following expression where *N* = 400.

We calculate *Γ* values using a short Matlab script which is based on the regionprops command. This command calculates the number of permeable cells in each connected component when called with the area parameter.

## 4. Estimation of permeability using direct pore scale modelling

Direct pore scale modelling or pore-level numerical simulation (DPLS) has become a popular method to estimate permeability during the past few decades owing to its speed and low cost compared with conducting experiments [21]. The common approach used in DPLS is based on the set-up shown in figure 6 and Darcy's law, which states that
4.1here ∇ is the *del* operator, *p* is the pressure (N m^{−2}), *μ* is the dynamic viscosity (kg m^{−1} s^{−1}) of the fluid, *K* is the permeability (m^{2}) of the porous media and *u*_{D} is the Darcian velocity (m s^{−1}) of the fluid.

The main goal of the numerical simulation is to calculate the pressure distribution inside the porous medium for a given velocity. Therefore, the model requires an input *u*_{D}, in addition to the fluid parameters such as *μ*. The value for *u*_{D} is determined based on an assumption on the Reynold's number (*Re*). Because equation (4.1) is valid only for low fluid velocities, turbulence is neglected and Reynold's number is assumed to be less than 2. Followed by this assumption, *u*_{D} can be estimated from the modified expression for the Reynolds number:
4.2where *ρ* is the density of the fluid (kg m^{−3}), *ε* is the porosity and *d*_{nom} is the nominal opening dimension (*m*) of the medium.

Lastly, by combining equations (4.1) and (4.2), *K* can be expressed as
4.3here the only unknown is the pressure gradient, which can be estimated by a Navier–Stokes solver. The governing equations are
4.4where **v** is the velocity vector. Equation (4.4) couples velocity and pressure and can be solved by the semi-implicit method for pressure linked equation algorithm (SIMPLE), which is a popular approach applied to obtain a steady-state solution for incompressible fluid flow [22].

As a Newtonian, incompressible fluid, we use water in the models. Specifically, we use the fluid properties at 20°C with a density of *ρ* = 998.2 kg m^{−}^{3} and a dynamic viscosity of *μ* = 0.001 kg m^{−1} s^{−1} [23]. In the models, water is flowed through a rectangular pipe of dimensions 1 × 1 × 3 mm. This set-up is shown in figure 4. The meshing of the geometry is done using the enGrid software [24]. As boundary conditions, 0 velocity on the sides of the tube and 0 pressure on the outlet is used. For the walls of the porous medium, no slip condition is assumed (figure 6). Turbulence is neglected and a laminar flow with *Re* = 0.12 is modelled. The initial condition for *u*_{D} at the inlet is calculated using equation (4.2). Time step for the simulation is determined by setting the Courant number to 0.1. OpenFoam and SIMPLE algorithms are used for the solution of the pressure distribution.

## 5. Results

Here, starting with example contour trees and CTC values, all the values calculated for the parameters are given. In the last part of the results, we check the correlation between the parameters and perform a classification test to identify which of the connectivity features are useful to discriminate microstructures.

### 5.1. Contour trees and contour tree connectivity values

Figure 7 shows example contour trees and the corresponding CTC values for each of the six microstructure types shown in figure 1.

Samples with a fewer objects (S1 and F1) have fewer vertices in their contour trees owing to a smaller number of local maxima and minima in their EDT image (not shown). With the increase in the number of objects, the number of local maxima and minima also increase. As a result, samples with more objects contain more vertices in their contour trees (S3 and F3). Increasing the number of pendant vertices decreases algebraic connectivity. Therefore, in general, increasing the number of objects in a random manner decreases CTC.

### 5.2. Permeability, nominal opening dimension, contour tree connectivity, Euler parameter, parameter for connected pair

Figures 8 and 9 show the results of the FE analysis. Figure 8*a* shows an example of the velocity distribution through the material, and the resulting pressure distribution is shown in figure 8*b*. A negative pressure gradient between 0.5 and 1.5 mm is shown in figure 8*c* where the material is placed. This trend is observed for all the samples. By using equation (4.3) and the obtained pressure gradients, permeabilities are estimated. Example values are shown in figure 8*c*.

The calculated permeability, *d*_{nom}, CTC, EP and PCP values are plotted for each test group in figure 9. Table 1 shows the means and variances of the parameters for each group.

From table 1, it is observed that among the test groups, S1 has the highest mean permeability and is followed by S2 and F1. The lowest mean permeability value belongs to group F3. F2 and S3 also have low permeability values. A similar trend is observed with *d*_{nom} and EP (inversely). However, CTC and PCP do not show this trend.

### 5.3. Correlation, clustering analysis and classification results

Figure 10 shows scatter plots between permeability, *d*_{nom} and CTC, EP, PCP. Table 2 shows the values for the coefficients of determination (*R*^{2}). The first column shows the *R*^{2}-values calculated using all the test samples, and the rest of the columns show values corresponding to each group separately.

From figure 10*a–d* and table 2, it is observed that *d*_{nom} has the highest linear correlation with permeability (*R*^{2} = 0.91). CTC, EP and PCP follows *d*_{nom} with *R*^{2}-values of 0.78, 0.7 and 0.45, respectively.

In order to quantify how well the groups are separated in figure 10, we calculated the distances between the groups using the Bhattacharyya distance. The Bhattacharyya distance measures the overlap of two statistical populations as explained in appendix A. A larger Bhattacharyya distance indicates better separation of groups. Figure 11 shows the logarithm of the distances between groups. It is observed that although *d*_{nom} is the best predictor for permeability, *d*_{nom}–permeability combination is not able to separate S2 from F1 and S3 from F2. These groups are also not discriminated well with EP–permeability and PCP–permeability combinations. CTC–permeability combination gives the largest distance between the similar groups. Using *d*_{nom} instead of permeability as the second feature decreases almost all the distances between groups. However, CTC–*d*_{nom} combination still provides a good contrast between groups. EP–*d*_{nom} combination is better at discriminating S3 and F2 compared with EP–permeability, however, S2 and F1 are still not separated well. PCP–*d*_{nom} combination does not provide useful information to separate different groups.

Lastly, we performed a classification test using the parameter combinations shown in figure 10. We used a supervised classification approach based on linear discriminant analysis (LDA). Half of the samples from each group were used to train a classifier, and the rest of the samples were used for the tests. We repeated the experiments 100 times with different training and test groups. Figure 12 shows the average results for the classification tests and accuracies using each parameter combination. Rows in figure 12 show the true classes and columns show the predicted ones.

## 6. Discussions

The main goal of our study is to introduce CTC as an alternative quantitative measure for connectivity that can be used to relate structural properties with permeability. By restricting our designs to certain numbers of non-overlapping objects and porosity, we aimed to establish an intuition towards understanding CTC. In general, the use of complicated structures as in references [25,26] indeed creates a more realistic set-up. With such an arrangement however, it is difficult to explain why and how the value of the CTC varies within and among groups which is the goal of our work. CTC is mainly designed to complement the missing geometrical information of the Euler number. By using simpler structural designs, we successfully introduced local variations within groups that alter the permeability while keeping the Euler number (*χ*) and the proportion of connected pairs (*Γ*) the same and the porosity almost constant.

Structural properties, especially porosity and connectivity, have been previously reported to be well correlated with permeability [27]. Because we aimed to study the effect of connectivity on flow that is independent of porosity, we designed six different types of microstructures with the same porosities (approx. 85%). To the best of our knowledge, our work is the first in which connectivity is analysed for different microstructures with similar porosities. We aimed to work with a statistically meaningful amount of data, and therefore we used 20 different samples for each group, 120 samples in total.

An AMD Phenom II X4 955 processor with 12 GB system memory was used to test the samples in this study. The average computation times for the nominal opening dimension (*d*_{nom}), CTC, the EP and the PCPs were approximately 27 min, approximately 6 min, approximately 21 s and approximately 39 s per sample, respectively. For CTC, the computation speed can be increased significantly and can be done in under a minute using optimized C++ code. Finite-element (FE) modelling, on the other hand, comprises several steps, some of which have to be done manually. The total time spent for the preparations, running and post processing took approximately 6 h per sample for the FE analysis, which is significantly longer than the time used by any image processing method.

The Euler number has been the most commonly used connectivity measure in the literature and has been reported to be closely linked to flow through porous media [19,28]. Despite being an elegant topological invariant, the Euler number has two major limitations when quantifying connectivity: (i) the distances between the connected components do not affect *χ*, and (ii) the holes in the structures change *χ* but cavities do not. Owing to these limitations, *χ* is constant for all the samples in each test group, and there is no variability within the groups. Similarly, there is no variance in the proportion of connected pairs, because for all the test samples there is only one connected component for the permeable region and *Γ* = 1. It is worth mentioning that the results from percolation theory and the Euler number were earlier found to yield similar information [29].

Because the Euler number and the proportion of connected pairs fail to explain the variation in permeability as static and scalar parameters, we computed *χ* and *Γ* iteratively on excursion sets. The resultant curves shown in figure 5 for *χ* and *Γ* were used earlier in references [4,19,20]. However, CTC and *d*_{nom} are scalars, and therefore in order to make comparisons, we used the average values of *χ* and *Γ* as the EP and the PCPs, respectively. Because we could not find any study that shows how the sign of *χ* relates to flow and permeability, we used the absolute value for computing the EP. Another approach could have been to use the excursion sets and the curves obtained for the impervious region together with the permeable region as proposed in reference [3]. Although it is an important research question whether or not it is possible to extract better measures from these curves, this question is beyond the scope of this study.

The value of CTC varies owing to several factors that affect geometry and topology such as the number of connected components, their proximity to each other and the cavities or holes they have. The example contour trees shown in figure 7 demonstrate that, in general, compact graphs give larger CTC values and details decrease CTC. Although the information CTC provides is important in quantifying structural properties, we are also interested in correlating this information with permeability.

Table 2 and the scatter plots show the correlation between parameters. *d*_{nom} is the most correlated parameter with permeability (*R*^{2} = 0.91). The scatter plots in figure 10*e*–*g* show the relationship between *d*_{nom}, CTC, the EP and the PCP. This demonstrates the case where costly FE modelling or experiments are not possible, and permeability values are not known. Because *d*_{nom} is highly correlated with permeability, we investigated its relation with other parameters. The relationships between CTC, the EP and the PCP are not shown because they yielded very little information about flow and microstructures.

In order to visually explain how each parameter separates different groups, we plotted the Bhattacharyya distances in figure 11. It is observed that although *d*_{nom} is the most correlated parameter with permeability, it does not provide as useful information as CTC in terms of separating structural differences. For example, S3 and F2 have similar permeability and *d*_{nom} but with different structures. On the other hand, CTC is both correlated with permeability (*R*^{2} = 0.78) and can separate differences in structures. Both the EP and the PCP have lower correlations with permeability. Additionally, they both fail to separate the structural differences within groups.

We used the Bhattacharyya distance, because it is commonly used as a class separability measure for feature selection [30–32]. In order to further verify that CTC information is useful in characterizing microstructures, we performed a different experiment that was not based on the Bhattacharyya distance. In this second experiment, a classification test was carried out using LDA that is a standard classification method [33]. The results of the classification tests show that structural properties are best separated using the CTC–permeability combination with an accuracy of 96.1%. Although this is not significantly larger than the EP–permeability combination (94.2% accuracy), the gap between CTC–*d*_{nom} and EP–*d*_{nom} becomes 9.4% which is the case where costly experimentation or modelling results are not available. It is worth mentioning that the results gained by using the Bhattacharyya distance and classification tests were consistent, and both showed that the feature combinations that use the EP are not as good as the ones that use CTC to separate S2 from F1 and S3 from F2.

## 7. Conclusion

CTC is a new structural measure that quantifies connectivity with a single real number that is between 0 and 1. In this work, we characterized different microstructures with CTC and showed that the connectivity information provided by CTC is different from the commonly used measures based on the Euler number and the proportion of connected pairs.

The results of our analysis show that although mean opening dimension (*d*_{nom}) is highly correlated with permeability, it is not sufficient in discriminating the structural differences. The differences between groups are best explained with a feature combination that includes CTC. The Euler number-based parameter (EP) performed close to CTC where permeability information was available. Whereas in the case where permeability is not known, CTC performed significantly better compared with the EP and the PCPs.

Compared with experimental methods to estimate permeability or FE modelling, image processing approaches can be applied fully automatically and in very short times with little cost. With this work, we have shown that CTC can be used as a structural connectivity measure with some advantages over other available approaches. With its new connectivity interpretation, CTC is a promising connectivity measure that can be used to characterize microstructures, analyse materials and aid the design of new structures.

## Funding statement

This work is supported by the Jenny and Antti Wihuri Foundation.

## Acknowledgements

This work is conducted in the Department of Electronics and Communications Engineering, Tampere University of Technology, Tampere, Finland.

## Appendix A. The Bhattacharyya distance

The Bhattacharyya distance calculates the similarity of classes and has been used as a measure for feature selection in pattern recognition to separate classes from each other [32]. For two multivariate normal distributions, the Bhattacharyya distance is defined as follows [34]
A1where *μ_{i}* and

**∑**

*are the means and covariances of the*

_{i}*i*

^{th}distribution and . The normality of the parameters was verified using the Lilliefors test [35].

- Received November 11, 2013.
- Accepted March 5, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.