## Abstract

We present a computational framework for model inversion based on multi-fidelity information fusion and Bayesian optimization. The proposed methodology targets the accurate construction of response surfaces in parameter space, and the efficient pursuit to identify global optima while keeping the number of expensive function evaluations at a minimum. We train families of correlated surrogates on available data using Gaussian processes and auto-regressive stochastic schemes, and exploit the resulting predictive posterior distributions within a Bayesian optimization setting. This enables a smart adaptive sampling procedure that uses the predictive posterior variance to balance the exploration versus exploitation trade-off, and is a key enabler for practical computations under limited budgets. The effectiveness of the proposed framework is tested on three parameter estimation problems. The first two involve the calibration of outflow boundary conditions of blood flow simulations in arterial bifurcations using multi-fidelity realizations of one- and three-dimensional models, whereas the last one aims to identify the forcing term that generated a particular solution to an elliptic partial differential equation.

## 1. Introduction

Inverse problems are ubiquitous in science. Being inherently ill-posed, they require solution paths that often challenge the limits of our understanding, as reflected by our modelling and computing capabilities. Unlike forward problems, in inverse problems we have to numerically solve the principal equations (i.e. the forward problem) multiple times, often hundreds of times. The complexity in repeatedly solving the forward problem is further amplified in the presence of nonlinearity, high-dimensional parameter spaces and massive datasets; all common features in realistic physical and biological systems. The natural setting for model inversion finds itself within the principles of Bayesian statistics, which provides a formal ground for parameter estimation, i.e. the process of passing from prior belief to a posterior predictive distribution in view of data. Here, we leverage recent advances in machine learning algorithms for high-dimensional input spaces and big data [1] to design a framework for parameter estimation in blood flow models of the human circulation. The proposed paradigm aims at seamlessly using all available information sources, e.g. experimental measurements, computer simulations, empirical laws, etc., through a general multi-fidelity information fusion methodology in which surrogate models are trained on available data, enabling one to explore the interplay between all such sources in a systematic manner.

In general, we model the response of a system as a function of *d* input parameters . The goal of model inversion is to identify the parametric configuration in **x** that matches a target response . This translates into solving the following optimization problem:
1.1in some suitable norm. In practice, **x** is often a high-dimensional vector and *g* is a complex, nonlinear and expensive to compute map that represents the system's evolving dynamics. These factors render the solution of the optimization problem very challenging and motivate the use of surrogate models as a remedy for obtaining inexpensive samples of *g* at unobserved locations. To this end, a surrogate model acts as an intermediate agent that is trained on available realizations of *g*, and then is able to perform accurate predictions for the response at a new set of inputs. Ever since the seminal work of Sacks *et al.* [2], the use of surrogates in the design and analysis of computer experiments has undergone great growth, establishing a data-driven mindset for design, optimization and, recently, uncertainty quantification problems [1,3]. Of particular importance to our work is the approach of Kennedy & O'Hagan [4], who introduced the use of stochastic auto-regressive maps for building surrogates from a multitude of information sources of variable fidelity. We reckon that this framework enables the meaningful integration of seemingly disparate methodologies and provides a universal platform in which experiments, simulations and expert opinion can coexist in unison. The main challenges here arise from scaling this surrogate-based approach to constructing response surfaces in high-dimensional input spaces and performing tractable machine learning on massive datasets. Here, we highlight the recent work in [1] that aims to address these challenges and enable the construction of multi-fidelity surrogates for realistic high-dimensional cases.

*In silico* modelling of blood flow in the human vasculature has received great attention over the last 20 years, resulting in the development of computational tools that helped elucidate key biophysical aspects but also aim to provide a customized, patient-specific tool for prediction, intervention and treatment. Despite great growth in computing power and algorithmic sophistication, the applicability of such models is limited to truncated arterial domains, as the complexity introduced from considering circulation in the entire arterial network remains intractable [5]. Addressing this complexity often leads to the introduction of a series of assumptions, parameters and simplified models. Consequently, the physiological relevance of our computations directly relies on the calibration of such parameters and models, which, due to the inherent difficulty of them being determined in the clinical setting, remains very empirical, as the resulting haemodynamic problem could admit an infinite number of solutions [6]. To this end, recent efforts for developing model inversion techniques in haemodynamics have been proposed in [7–11]. A common theme among these efforts is the utilization of reduced order models that can be sampled extensively and with low computational cost, returning a response that enables a computationally tractable solution to the optimization problem of equation (1.1). Among possible reduced order model candidates, the most widely used are nonlinear one-dimensional (1D) fluid–structure interaction (FSI) models, linearized 1D-FSI models and zero-dimensional (0D) lumped parameter models [5].

The goal of this work is to incorporate elements of statistical learning towards building a surrogate-based framework for solving inverse problems in haemodynamics, and beyond. Motivated by recent progress in scalable supervised learning algorithms [1], we propose an information fusion framework that can explore cross-correlations between variable fidelity blood flow models (e.g. measurements versus three-dimensional (3D)-FSI, versus 1D-FSI, versus 0D models, etc.), allowing for the efficient construction of high-dimensional response surfaces that guide the pursuit for a solution to the optimization problem of equation (1.1), while keeping the number of expensive function evaluations at a minimum. Moreover, we aim to demonstrate that this framework is robust with respect to model misspecification, resulting for example from inaccurate low-fidelity models or noisy measurements. Although our main focus here is on parameter estimation for physiologically correct blood flow simulations, the implications of the proposed methodology are far reaching, and practically applicable to a wide class of inverse problems.

In what follows we provide an outline of this paper. In §2.1, we present an overview of Gaussian process (GP) regression that is the basic building block of the proposed multi-fidelity information fusion algorithms. In §2.2, we elaborate on the implementation of efficient multi-source inference schemes using the recursive inference ideas of Le Gratiet & Garnier [12]. In §2.3, we outline how the parameters defining each surrogate model can be calibrated to the data using maximum-likelihood estimation (MLE). In §2.4, we introduce the efficient global optimization (EGO) algorithm, and highlight how maximizing the expected improvement of the surrogate predictor can lead to efficient sampling strategies for global optimization. The framework is put to the test in §3, where we present results for benchmark problems involving the calibration of outflow boundary conditions in blood flow simulations, as well as parameter estimation in elliptic partial differential equations.

## 2. Material and methods

The basic building block of the proposed framework is GP regression [13] and auto-regressive stochastic schemes [4]. This toolset enables the construction of a flexible platform for multi-fidelity information fusion that can explore cross-correlations between a collection of surrogate models, and perform predictive inference for quantities of interest that is robust with respect to model misspecification. The main advantage of GPs versus other surrogate models, such as neural networks, radial basis functions or polynomial chaos, lies in its Bayesian non-parametric nature. In particular, it results in a flexible prior over functions, and, more importantly, in a fully probabilistic workflow that returns robust posterior variance estimates that are key to Bayesian optimization. Although there have been attempts in the literature to apply Bayesian optimization using Bayesian neural networks (mainly motivated by alleviating the unfavourable cubic scaling of GPs with data, see [14]), GPs provide several favourable properties, such as analytical tractability, robust variance estimates and the natural extension to the multi-fidelity setting, that currently give it the edge over competing approaches to Bayesian optimization. In what follows we provide an overview of the mathematical framework that is used throughout this work. For a more detailed exposition to the subject, the reader is referred to [1,3,4,12,13].

### 2.1. Gaussian process regression

The main idea here is to model *N* scattered observations *y* of a quantity of interest as a realization of a GP, , . The observations could be deterministic or stochastic in nature and may also be corrupted by modelling errors or measurement noise , which is thereby assumed to be a zero-mean Gaussian random variable, i.e. . Therefore, we have the following observation model:
2.1

The GP prior distribution on *f* is completely characterized by its mean and covariance , where is a vector of hyper-parameters. Typically, the choice of the prior reflects our belief on the structure, regularity and other intrinsic properties of the quantity of interest. Our primary goal here is to compute the conditional posterior distribution of and explore its predictive capacity at estimating *y* for unobserved sets of **x**. This is achieved by calibrating the parametrization of the Gaussian prior, i.e. and , by maximizing the marginal likelihood of the model. Once has been trained on the observed data *y* (see §2.3), its inferred mean , variance , noise variance and kernel hyper-parameters are known and can be used to evaluate the predictions , as well as to quantify the prediction variance (see [15] for a derivation), i.e.
2.2and
2.3where is the correlation matrix of , is a vector containing the correlation between the prediction and the *N* training points, and is a vector of ones. Note that for the predictor exactly interpolates the training data *y*, returning zero variance at these locations. This is a linear regression scheme that is also known in the literature as Wiener–Kolmogorov filtering (time-series modelling [16]), kriging (geostatistics [17]), and best linear unbiased estimator (statistics [18]).

### 2.2. Multi-fidelity modelling via recursive Gaussian processes

Here, we provide a brief overview of the recursive formulations recently put forth by Le Gratiet & Garnier [12]—a more efficient version of the well-known auto-regressive inference scheme proposed by Kennedy & O'Hagan [4] in the context of predicting the output from a complex computer code when fast approximations are available. To this end, suppose that we have *s* levels of variable fidelity information sources producing outputs , at locations , sorted by increasing order of fidelity and modelled by GPs , . Then, the auto-regressive scheme of Kennedy & O'Hagan [4] reads as
2.4where is a Gaussian field independent of and distributed as . Also, is a scaling factor that quantifies the correlation between . In the Bayesian setting, is treated as a random field with an assigned prior distribution that is later calibrated to the data. Here, to simplify the presentation we assume that is a deterministic scalar, independent of , and learned from the data through MLE (see §2.3). In general, one can obtain a more expressive scheme by introducing a spatial dependence on by using a representation in terms of a basis, i.e. , where the *M* unknown modal amplitudes *w* are augmented to the list of parameters that need to be estimated through MLE. For cases that exhibit a smooth distribution of the cross-correlation scaling, only a small number of modal components should suffice for capturing the spatial variability in , thus keeping the computational cost of optimization to tractable levels.

The key idea put forth by Le Gratiet & Garnier [12] is to replace the Gaussian field in equation (2.4) with a Gaussian field that is conditioned on all known models up to level , while assuming that the corresponding experimental design sets , have a nested structure, i.e. . This essentially allows one to decouple the *s*-level auto-regressive co-kriging problem to *s* independent GP estimation problems that can be efficiently computed and are guaranteed to return a predictive mean and variance that are identical to the coupled Kennedy and O'Hagan scheme [4]. To underline the advantages of this approach, note that the scheme of Kennedy and O'Hagan requires inversion of covariance matrices of size , where is the number of observed training points at level *t*. In contrast, the recursive co-kriging approach involves the inversion of *s* covariance matrices of size , .

Once has been trained on the observed data (see §2.3), the optimal set of hyper-parameters is known and can be used to evaluate the predictions , as well as to quantify the prediction variance at all points in (see [12] for a derivation),
2.5and
2.6where is the correlation matrix of , is a vector containing the correlation between the prediction and the training points, and is a vector of ones. Note that for the above scheme reduces to standard GP regression as presented in equations (2.2) and (2.3). Also, is the auto-correlation kernel that quantifies spatial correlations at level *t*.

### 2.3. Maximum-likelihood estimation

Estimating the hyper-parameters requires learning the optimal set of from all known observations at each inference level *t*. In what follows we will confine the presentation to MLE procedures for the sake of clarity. However, in the general Bayesian setting all hyper-parameters are assigned with prior distributions, and inference is performed via more costly marginalization techniques, typically using Markov chain Monte Carlo (MCMC) integration [13,19,20]. Indeed, a fully Bayesian approach to hyper-parameter estimation has been documented to be more robust (hence advantageous) in the context of Bayesian optimization [21]. Although great progress has been made in training GPs using sampling-based techniques, in this work we employed MLE mainly due to its straightforward formulation and reduced computational complexity versus MCMC approaches, as well as to simplify the introduction of the proposed concepts to a wider audience that can readily implement the methods.

Parameter estimation via MLE at each inference level *t* is achieved by minimizing the negative log-likelihood of the model evidence,
2.7where we have highlighted the dependence of the correlation matrix on the hyper-parameters . Setting the derivatives of this expression to zero with respect to and , we can express the optimal values of and as functions of the correlation matrix ,
2.8and
2.9where and Finally, the optimal can be estimated by minimizing the concentrated restricted log-likelihood
2.10

The computational cost of calibrating model hyper-parameters through MLE is dominated by the inversion of correlation matrices at each iteration of the minimization procedure in equation (2.10). The inversion is typically performed using the Cholesky decomposition that scales as , leading to a severe bottleneck in the presence of moderately big datasets. Alternatively, one may employ iterative methods for solving the linear system and reduce the cubic cost (especially if a good pre-conditioner can be found), but in this case another method for computing the determinant in equation (2.10) should be employed. In general, alleviating the cubing scaling is a pivotal point in the success and practical applicability of GPs, and several studies in the literature have been devoted to this [1,22–25]. This limitation is further pronounced in high-dimensional problems where abundance of data is often required for performing meaningful inference. Moreover, cases where the noise variance is negligible and/or the observed data points are tightly clustered in space may introduce ill-conditioning that could jeopardize the feasibility of the inversion as well as pollute the numerical solution with errors. Also, if an anisotropic correlation kernel is assumed, then the vector of correlation lengths is *d*-dimensional, leading to an increasingly complex optimization problem (see equation (2.10)) as the dimensionality of the input variables increases. These shortcomings render the learning process intractable for high dimensions and large datasets, and suggest seeking alternative routes to parameter estimation. A possible solution path was recently proposed in [1], where the authors have employed a data-driven additive decomposition of the covariance kernel in conjunction with frequency-domain inference algorithms that bypass the need to invert large covariance matrices.

### 2.4. Bayesian optimization

Our primary goal here is to use the surrogate models generated by the recursive multi-fidelity formulation towards identifying the global optimum of the optimization problem defined in equation (1.1) by replacing the unknown and expensive to evaluate function *g* with its corresponding surrogate predictor . The probabilistic structure of the surrogate predictors enables an efficient solution path to this optimization problem by guiding a sampling strategy that balances the trade-off between exploration and exploitation, i.e. the global search to reduce uncertainty versus the local search in regions where the global optimum is likely to reside. One of the most widely used sampling strategies in Bayesian optimization that adopts this mindset is the EGO algorithm proposed by Jones *et al.* [26]. Given a predictive GP posterior, the EGO algorithm selects points in the input space that maximize the *expected improvement* of the predictor. To this end, let be the global minimum of the observed response at the *t*th inference level. Consequently, the improvement of the Gaussian predictor upon is defined as . Then, the infill criterion suggested by the EGO algorithm implies sampling at locations in the input space that maximize the expected improvement
2.11where and are the standard normal cumulative distribution and density function, respectively, and is the square root of the predictor variance at level *t* (see equation (2.6)). Consequently, the value of is large if either the value predicted by is smaller than or there is a large amount of uncertainty in the predictor , hence is large. Here we must underline that, although the expected improvement acquisition function is widely used in the literature, it may have some important pitfalls, mainly due to its vulnerability to noise in the observations, especially in the heteroscedastic case. The design of robust and efficient data acquisition techniques is an active area of research and several interesting studies exist in the literature. The work of Picheny *et al.* [27] addresses some of these issues, including parallel batch sampling and robustness to noise using a quantile-based expected improvement criterion. A different class of robust methods has branched out of the entropy search algorithm put forth in [28]. These approaches are good candidates in cases where the simple expected improvement acquisition fails, and can be implemented with minor overhead to the existing computational cost.

Once a recursive cycle has been completed and the final predictor and variance at level *s* are known, the expected improvement can be readily computed by equation (2.11). Then, the EGO algorithm suggests to re-train the surrogates by augmenting the design sets with a set of points that correspond to locations where the expected improvement is maximized. This procedure iterates until a stopping criterion is met. Owing to the potentially high computational cost of generating training data from sampling *g* in equation (1.1), it is common to use the maximum number of function evaluations as the stopping criterion or the convergence rate of the objective function. Another termination criterion stems from setting a target value for the expected improvement, allowing the next cycle to be carried out only if the expected improvement is above the imposed threshold. Although the theoretical analysis of such probabilistic algorithms is challenging, theoretical guarantees of convergence to the global optimum have been studied in the works of [29,30].

### 2.5. Workflow and computational cost

The EGO algorithm can be summarized in the following implementation steps:

*Step 1*: Given samples of the true response , at all fidelity levels we employ the recursive scheme outlined in §2.2 to construct a family of predictive surrogates . This step scales cubically with the number of training points using MLE. The number of required training points is generally problem dependent as it is related to the complexity of the underlying response surface we try to approximate. A rule of thumb is to start with 6*d*points at the lowest fidelity level, where*d*is the dimension of the input space, and halve this number for each subsequent higher fidelity level. This initialization is usually performed using Latin hypercube sampling and has been proven robust in our experiment so far. A reason for this is that in every iteration of the Bayesian optimization loop the acquisition function will choose a new informative sampling location. Therefore, even if the initial sampling plan is inadequate, new informative points will be added in the first couple of iterations and the issue will be resolved.*Step 2*: For each surrogate, we compute the set of input configurations that maximize the expected improvement, i.e. we solve the optimization problem 2.12This step involves probing the trained surrogate to perform predictions at new locations, each of which can be obtained in parallel and with linear cost.*Step 3*: We evaluate the response , at the new points suggested by maximizing the expected improvement at each fidelity level. Therefore, this step scales according to the computation cost of computing the given models .*Step 4*: We train again the multi-fidelity surrogates using the augmented design sets and corresponding data to obtain new predictive schemes that resolve in more detail regions of the response surface where the minimum is more likely to occur. Similarly, training the surrogates scales cubically with the data, while predictions of the trained surrogate are obtained in linear cost.*Step 5*: Perform step 2 and check if the desired termination criterion is met. This could entail checking whether a maximum number of function evaluations is reached, whether the minimum of the objective function has converged, or whether the maximum expected improvement is less than a threshold value. If the chosen criterion is not satisfied, we repeat steps 3–5.

### 2.6. Haemodynamics

There exists a wide variety of modelling approaches to study the multi-physics phenomena that characterize blood flow in the human circulation. At the continuum scale, our understanding is mainly based on clinical imaging and measurements, *in vitro* experiments, and numerical simulations. Each of these approaches complements one another as they are able to reveal different aspects of the phenomena under study at different levels of accuracy. Without loss of generality, in what follows we focus our attention on 1D and 3D formulations of blood flow in compliant and rigid arteries, respectively. In particular, we consider three different layers of fidelity as represented by (i) 3D Navier–Stokes models in rigid domains, (ii) nonlinear 1D FSI models, and (iii) linearized 1D-FSI models.

#### 2.6.1. Three-dimensional models in rigid arterial domains

We consider 3D unsteady incompressible flow in a rigid domain *Ω*, described by the Navier–Stokes equations subject to the incompressibility constraint
2.13where is the velocity vector, *p* is the pressure, *t* is time, is the kinematic viscosity of the fluid and are the external body forces. The system is discretized in space using the spectral/*hp* element method (SEM) [31], according to which the computational domain *Ω* is decomposed into a set of polymorphic non-overlapping elements . Within each element, the solution is approximated as a linear combination of hierarchical, mixed-order, semi-orthogonal Jacobi polynomial expansions. This hierarchical structure consists of separated vertex (linear term) , edge , face and interior (or bubble) modes . According to this decomposition, the polynomial representation of a field at any point is given by the linear combination of the basis functions multiplied by the corresponding modal amplitudes:

The numerical solution to the above system is based on decoupling the velocity and pressure by applying a high-order time-splitting scheme [32]. We solve for an intermediate velocity field in physical space (equation (2.14)) before a Galerkin projection is applied to obtain the weak formulation for the pressure and velocity variables (equations (2.15) and (2.16)), which are computed by solving a Poisson and a Helmholtz problem, respectively. At time step *n* and sub-iteration step *k*, we solve equations (2.15) and (2.16) for and , respectively, using previous time-step solutions , where denotes the order of the stiffly stable time integration scheme used,
2.14
2.15
2.16where we define
2.17

Also, , are the coefficients of a backward differentiation formula, are the coefficients of the stiffly stable time integration scheme, , and and are the mass and stiffness matrices, respectively. Moreover, and are the unknown modal amplitudes of the velocity and pressure variables, respectively, while variables with a *D* superscript are considered to be known. In particular, the last term in equations (2.15) and (2.16) is due to lifting a known solution for imposing Dirichlet boundary conditions, and the corresponding operators and need to be constructed only for these boundaries [31]. The system is closed with a Neumann boundary condition for the velocity at the outlets, and a consistent Neumann pressure boundary condition at the boundaries with a prescribed Dirichlet condition for the velocity field [32]:
2.18where is defined as
2.19

At the first sub-iteration step , the solution from previous sub-iterations is not available and are extrapolated in time as follows: 2.20and 2.21

After computing and we can accelerate the convergence of the iterative procedure by applying the Aitken under-relaxation scheme to the computed velocity field: 2.22where is computed using the classical Aitken rule: 2.23

Finally, convergence is checked by requiring and to be less than a pre-specified tolerance.

For a complete analysis of this scheme, the reader is referred to [32,33]. Results in the context of arterial blood flow simulations can be found in [34], while parallel performance has been documented in [35].

#### 2.6.2. Nonlinear one-dimensional models in compliant arterial domains

We consider viscous incompressible 1D flow in a compliant tube. The flow dynamics is governed by a nonlinear hyperbolic system of partial differential equations that can be directly derived from the Navier–Stokes equations under the assumptions of axial symmetry, dominance of the axial velocity component, radial displacements of the arterial wall and constant internal pressure on each cross section [5]. Although such averaging introduces approximations that neglect some of the underlying physics, this reduced model has been widely used and validated in the literature, returning reasonable predictions for pulse wave propagation problems [36]. Here, we would like to highlight that such simple models can become extremely useful when employed as low-fidelity models within the proposed multi-fidelity framework, because (despite their inaccuracies) they can capture the right trends at very low computational cost (figure 1).

The conservation of mass and momentum can be formulated in space–time variables as (see [37] for a detailed derivation) 2.24

where *x* is the axial coordinate across the vessel's length, *t* is time, is the cross-sectional area of the lumen, is average axial fluid velocity, is the mass flux, is the internal pressure averaged over the tube's cross section and is a friction parameter that depends on the velocity profile chosen [37]. Here, we use an axisymmetric fluid velocity profile that satisfies the no-slip condition,
2.25with being the lumen radius, ζ a constant and *r* the radial coordinate. Following [5], gives a good fit to experimental blood flow data and returns the parabolic flow profile. Moreover, and the friction parameter can be expressed as [37], with being the blood viscosity that is a function of the lumen radius and the blood haematocrit, based on the findings of Pries *et al.* [38].

System (2.24) can be solved for after introducing a constitutive law for the arterial wall that relates the cross-sectional area *A* to pressure *p*. For simplicity, in this work, we have adopted a purely elastic wall model, leading to a pressure–area relation of the form [39]
2.26where is the external pressure on the arterial wall, *E* is the Young modulus of the wall, *h* is the wall thickness and *v* is the Poisson ratio. More complex constitutive laws accounting for viscoelastic creep and stress relaxation phenomena can be derived based on the theory of quasi-linear viscoelasticity [40], and the reader is referred to [39] for implementation details in the context of 1D blood flow solvers.

For the numerical solution of system (2.24), we have adopted the discontinuous Galerkin scheme first put forth by Sherwin *et al*. [37]. Spatial discretization consists of dividing each arterial domain *Ω* into Nel elemental non-overlapping regions. A continuous solution within each element is retrieved as a linear combination of orthogonal Legendre polynomials, while discontinuities may appear across elemental interfaces. Global continuity is restored by flux upwinding across elemental interfaces, bifurcations and junctions, with conservation of mass, continuity of Riemann invariants and continuity of the total pressure being ensured by the solution of a Riemann problem [37]. Finally, time integration is performed explicitly using a second-order accurate Adams–Bashforth scheme [37].

#### 2.6.3. Linear one-dimensional models in compliant arterial domains

The analysis can be further simplified if one adopts a linear formulation [37,41]. This can be achieved by expressing the system of equations in equation (2.24) in terms of the variables, with , and linearizing them about the reference state , with and held constant along *x*. This yields
2.27
2.28
2.29where and are the perturbation variables for area, pressure and volume flux, respectively, i.e. , and
2.30are the viscous resistance to flow, blood inertia and wall compliance, respectively, per unit length of vessel [41]. For a purely elastic arterial wall behaviour, it follows that
2.31

Then, the corresponding linear forward and backward Riemann invariant can analytically be expressed as 2.32where is the characteristic impedance of the vessel.

## 3. Results

Here, we aim to demonstrate the effectiveness of the proposed framework through three benchmark problems. In the first two cases, consider the important aspect of calibrating outflow boundary conditions of blood flow simulations in truncated arterial domains. In the first demonstration, we employ simple nonlinear and linearized 1D-FSI models in a symmetric arterial bifurcation, aiming to provide a pedagogical example that highlights the main aspects of this work. In the second case, we target a more realistic scenario in which we seek to calibrate a 3D Navier–Stokes solver by leveraging cheap evaluations of 1D models in an asymmetric Y-shaped bifurcation topology. Finally, we conclude with a higher dimensional problem that involves inferring the forcing term that gave rise to a given solution of an elliptic partial differential equation.

### 3.1. Calibration of a nonlinear one-dimensional solver

We consider a benchmark problem for calibrating the outflow parameters in a Y-shaped symmetric bifurcation with one inlet and two outlets. The geometry resembles the characteristic size and properties of an idealized carotid bifurcation, yet is kept symmetric to enhance clarity in our presentation. Figure 2 presents a schematic of the problem set-up. In particular, the effect of the neglected downstream vasculature is taken into account through three-element Windkessel models, while the inflow is driven by a physiologically correct flow-rate wave.

Our goal is to calibrate the outflow resistance parameters to obtain a physiologically correct pressure wave at the inlet. To enable a meaningful visualization of the solution process, we confine ourselves to a two-dimensional input space defined by variability in the total resistances imposed at the two outlets, , . The total resistance is defined as , where corresponds to the characteristic impedance of each terminal vessel (see equation (2.32)). Moreover, the capacitance parameters are empirically set following the findings of Grinberg & Karniadakis [42] as , . To this end, our objective is to identify a pair of that matches a physiologically correct systolic pressure of at the inlet. Assuming no prior knowledge on possible sensible combinations in the input , we assume a large input space that spans four orders of magnitude: measured in (Pa s m^{–3}), and consider the following optimization problem:
3.1

First, we employ a nonlinear 1D-FSI model in order to construct a detailed representation of the response surface of the relative error in systolic pressure, i.e. , that will be later used as a reference solution to assess the accuracy and convergence of the proposed multi-fidelity model inversion techniques. Here, the choice of studying haemodynamics using 1D models is motivated by their ability to accurately reflect the interplay between the parametrization of the outflow and the systolic pressure at the inlet, yet at a very low computational cost [37,43]. To this end, figure 3 shows the resulting response surface obtained by probing an accurate nonlinear 1D-FSI solver on 10 000 uniformly spaced samples of the input variables. Since we have considered a symmetric geometry, the resulting response surface exhibits symmetry with respect to the diagonal plane, and can be visibly subdivided using the path of its local extrema into four different subregions. The first part is confined in a flat square region near the origin, suggesting that the solution is relatively insensitive to the choice of inputs, as any combination in the inputs within this region results in a reasonably small error in the systolic pressure. The second and third parts correspond to the symmetric rectangular regions defined by further increasing one of the two resistance parameters. There, the response starts to exhibit sensitivity to the increasing input parameter, leading to a noticeable error in the systolic pressure. Lastly, in the fourth region, the solution exhibits strong sensitivity on the inputs, leading to an explosive growth of the systolic pressure deviation as the resistance parameters take larger values. Interestingly, a closer look at the path of local extrema reveals a subtle undershoot in the response surface along the interface of the aforementioned subregions. Although any input combination within the first subregion results in a very small systolic pressure error, the global solution of the minimization problem in equation (3.1) resides right at the interface of the four subregions (figure 3). Note that this topology of the global response is likely to pose serious challenges to any gradient-based optimization strategy, leading to a potentially very large number of function evaluations until convergence, especially in the absence of a very accurate initial guess.

Now, we turn our attention to solving the optimization problem introduced by equation (3.1) using the proposed surrogate-based framework. To this end, in order to build multi-fidelity surrogates for the systolic pressure we have considered probing two models: high-fidelity solutions are obtained through the aforementioned nonlinear 1D-FSI solver, while the lower fidelity response is measured through a linearized 1D-FSI model. Here, we emphasize that the linearized model has been purposely derived around a biased reference state, returning erroneous predictions that deviate from the correct solution up to 30%. This is to demonstrate that the proposed methodology is robust with respect to misspecification in the lower fidelity observations. It is important to stress here that, despite its inaccurate predictions, the low-fidelity model can still be highly informative as long as it is statistically correlated to the high-fidelity model.

Initially, we perform 20 linearized low-fidelity simulations, supplemented by five high-fidelity realizations, all randomly sampled within the bounds that define the space of inputs. Then, based on the computed response , we train a two-level surrogate using the recursive scheme presented in §2.2 with a stationary Matérn 5/2 kernel function [13]. The Matérn family of kernels provides a flexible parametrization that has been commonly employed in the literature mainly due to its simplicity and the ability to model a wide spectrum of responses ranging from rough Ornstein–Uhlenbeck paths to infinitely differentiable functions [13]. However, one of its limitations stems from its stationary behaviour (i.e. the Matérn auto-correlation between two spatial locations is translation invariant). Accounting for non-stationary effects has been found to be particularly important in the context of Bayesian optimization, and here we address this by employing the *warping* transformation put forth by Snoek *et al.* [44], leading to kernels of the form , where *h* denotes a nonlinear warping function. From our experience, this step is more crucial than the initial choice of kernel and appears to be quite robust for either the Matérn family or any other popular alternative, such as the radial basis function, spherical and polynomial kernels [13].

The resulting response surface of the error in the systolic pressure at the inlet is presented in comparison with the reference solution in figure 4. Once the surrogate predictor and variance are available, we can compute the spatial distribution of the corresponding expected improvement using equation (2.11). This highlights regions of high expected improvement, as illustrated in figure 4, thus suggesting the optimal sampling locations for the next iteration of the EGO algorithm (see §2.4). Note that the maximum expected improvement in this zeroth iteration is already attained very close to the global minimum of the reference solution (figure 3).

The next step involves performing an additional set of low- and high-fidelity simulations at the suggested locations of high expected improvement (see §2.4). Specifically, in each iteration, we augment the training data with 20 low-fidelity and five high-fidelity observations placed in the neighbourhood of local maximizers of the expected improvement. Then, the recursive multi-fidelity predictor is re-trained to absorb this new information, and this procedure is repeated until a termination criterion is met. Here, we have chosen to exit the iteration loop once the minimum of the predicted response surface (i.e. the relative error between the computed inlet systolic pressure and a target value) is less than .

Figure 5 presents the resulting response surface of the error in the inlet systolic pressure after four iterations of the EGO algorithm. The predictor is trained on 100 low-fidelity and 25 high-fidelity realizations that clearly target resolving the region of the response surface where the global minimum is likely to occur. In figure 5, we also depict the spatial distribution of the expected improvement after the new data have been absorbed. By comparing figures 4 and 5, we observe that within four iterations of the EGO algorithm the expected improvement has been reduced by a factor greater than 5, effectively narrowing down the search for the global minimum in a small region of the response surface. In fact, four iterations of the EGO algorithm were sufficient to meet the termination criterion, and, therefore, identify the set of outflow resistance parameters that returns an inlet systolic pressure matching the target value of . Here, we underline the robustness and efficiency of the proposed scheme, as convergence was achieved using only 25 samples of the high-fidelity nonlinear 1D-FSI model, supplemented by 100 inaccurate low-fidelity samples of the linearized 1D-FSI solver. In the absence of low-fidelity observations, the same level of accuracy was achieved with nine EGO iterations and 50 high-fidelity simulations—a twofold increase in computational cost.

In table 1, we summarize the optimal predicted configuration for the total terminal resistances, along with the relative -error in the corresponding inlet pressure waveform compared with the optimal reference configuration (see in figure 3), over one cardiac cycle, and for every iteration of the EGO algorithm. The convergence of the inlet pressure waveform to the reference solution is demonstrated in figure 6. As the EGO iterations pursue a match with the target inlet systolic pressure , the pressure waveform gradually converges to the reference solution after each iteration of the optimizer.

### 3.2. Calibration of a three-dimensional Navier–Stokes solver

This benchmark is designed to highlight an important aspect of the proposed framework, namely the significant efficiency gains one can obtain by employing low-fidelity models that exhibit correlation to the target high-fidelity model. In particular, we will examine two scenarios; the first involving low- and intermediate-fidelity models that are only weakly correlated to the high-fidelity model, while in the second case we perform an educated adjustment to obtain an intermediate-fidelity model that exhibits strong correlation with the high-fidelity solver. In both cases, we adopt a similar problem set-up to the one above, namely the calibration of outflow resistance parameters of a high-fidelity 3D Navier–Stokes blood flow solver in an asymmetric Y-shaped bifurcation targeting a systolic inlet pressure of (figure 7). Here, this is done by employing a 3D spectral/hp element solver (high fidelity), a nonlinear 1D-FSI solver (intermediate fidelity) and a linearized 1D-FSI solver (low fidelity). The rigid 3D computational domain is discretized by 45 441 tetrahedral spectral elements of polynomial order 4, and the simulation of one cardiac cycle takes approximately 6 h on 16 cores of an Intel Xeon X56502.67 GHz. The nonlinear and linearized 1D-FSI solvers represent each arterial segment using one discontinuous Galerkin element of polynomial order 4, and the resolution of one cardiac cycle takes 1 min, and 15 s, respectively, on a single core of an Intel Xeon X56502.67 GHz. It is therefore evident that significant computational gains can be achieved if we can calibrate the high-fidelity solver mainly relying on sampling the low- and intermediate-fidelity models.

#### 3.2.1. Weakly correlated models

The aforementioned set-up leads to a discrepancy in the corresponding arterial wall models where a rigid wall is assumed in the 3D case in contrast to the linear elastic wall response in the 1D-FSI cases. Although all models provide a good agreement in predicting the flow-rate wave and pressure drop between the inlet and the outlets, this modelling discrepancy is primarily manifested in the predicted point-wise pressure distribution. As depicted in figure 8, the inlet pressure prediction of the 3D rigid model is up to 62% higher than the 1D-FSI models, a fact attributed to the inherently different nature of wave propagation between the two cases. This inconsistency is directly reflected in the inferred scaling factor appearing in the recursive scheme of §2.2 that quantifies how correlated our quantity of interest is (systolic inlet pressure) between the different levels of fidelity. In particular, starting from an initial nested Latin hypercube experimental design in containing 20 low-fidelity, 10 intermediate-fidelity and two high-fidelity samples, we obtain a correlation coefficient of 0.91 between the low- and intermediate-fidelity levels, and a correlation of 0.12 between the intermediate- and high-fidelity solvers. These values confirm the strong correlation between levels 1 and 2 in our recursive inference scheme (linearized and nonlinear 1D-FSI models), while also revealing the very weak correlation between levels 2 and 3 (nonlinear 1D-FSI and the rigid 3D model). Essentially, this suggests that in this case one cannot expect the low- and intermediate-fidelity models to be highly informative in the construction of the high-fidelity response surface for .

With our goal being the identification of the optimal configuration of such that the relative error between the predicted and target inlet systolic pressure is minimized, we may enter the EGO loop with the aforementioned initial nested design plan and iterate until convergence is achieved. Setting a stopping criterion of on the expected improvement of the multi-fidelity predictor resulted in six EGO iterations, where in each iteration we added 10 low-fidelity, five intermediate-fidelity and one high-fidelity observation at the locations suggested by the highest local maxima of the expected improvement at each fidelity level. Therefore, a total of 80 low-fidelity, 60 intermediate-fidelity and eight high-fidelity simulations were performed in order to achieve a relative error less than . The resulting response surface and corresponding expected improvement after the 6th iteration are presented in figure 9. Here, we must note that, despite the low cross-correlation between the high and lower fidelity levels, the multi-fidelity approach was still able to pay some dividends as, in order to achieve the same result by only sampling the high-fidelity solver, the EGO required 10 iterations.

#### 3.2.2. Strongly correlated models

In order to demonstrate the full merits of adopting a multi-fidelity approach, we can enhance the correlation between the rigid 3D model and the nonlinear 1D-FSI model by artificially stiffening the arterial wall response of the 1D model. This is done by appropriately scaling the spring constant *β* in the 1D pressure–area relation (see equation (2.26)):
3.2

Practically, we can find an appropriate scaling factor such that the 1D solver accurately reproduces the results predicted by the full 3D model. However, if the chosen value for is not close to the ‘best-fit’ value , the 1D solver under-predicts the pressure wave for and over-predicts it for . The identification of the appropriate scaling factor is case dependent. However, we can explicitly derive a formula for based on the physiological properties of the network. To this end, consider a single vessel in an arterial network and let be the total peripheral resistance downstream, and the vessel's characteristic impedance: 3.3

Then, we must have that , and, therefore, we can scale *β* and stiffen the 1D model, respecting the upper bound:
3.4

The total resistance for each segment can be computed by the properties of the network downstream and the Windkessel resistance parameters imposed at the outlets. This effectively enables the nonlinear 1D-FSI solver to reproduce the pressure wave propagation predicted by the full 3D simulation (figure 8).

In order to estimate the optimal parametrization of the outflow boundary conditions, we enter the EGO loop using an initial nested Latin hypercube sampling plan containing 70 low-fidelity, 15 intermediate-fidelity and two high-fidelity observations. This initialization confirms a relatively weak correlation of 0.3 between the elastic low-fidelity and the stiffened intermediate-fidelity 1D-FSI solvers, but confirms a strong correlation of 0.95 between the stiffened 1D-FSI solver and the high-fidelity 3D Navier–Stokes model. As a consequence, the EGO algorithm converges to the desired accuracy level after only one iteration, as the 20 samples of the intermediate-fidelity solver have been very informative in the construction of the high-fidelity predictive distribution. In figure 10, we demonstrate the resulting response surface for the relative error in systolic inlet pressure after one iteration of the EGO loop, along with the resulting map of the expected improvement criterion. Leveraging the accurate predictions of the stiffened nonlinear 1D-FSI solver we were able to achieve a 4× speed-up in calibrating the outflow parametrization compared with the case where the calibration was carried out by only evaluating the high-fidelity 3D solver.

### 3.3. Helmholtz equation in 10 dimensions

Consider a Helmholtz boundary value problem in *x* [0,1]
3.5and
3.6with forcing term weights . Now, assume that we are given a reference solution that corresponds to weights generated by a spectrum with a random draw of parameters , and (figure 11). Our goal now is to identify the exact values of that generated the solution , assuming no knowledge of the underlying spectrum. This translates to solving the following optimization problem:
3.7

Here we consider three levels of fidelity corresponding to different discretization methods for solving equation (3.5). At the lowest fidelity level, we have chosen a fast centred finite difference scheme on a uniform grid containing 15 points. The medium-fidelity solver is a spectral collocation scheme using trigonometric interpolation on a uniform grid of 30 collocation points. This results to a more accurate scheme but is computationally more expensive as it involves the solution of a linear system. Finally, the highest fidelity data are generated by solving equation (3.5) exactly using a Galerkin projection on a basis consisting of 10 sine modes. Our goal now is to construct a 10-dimensional multi-fidelity surrogate for mostly using realizations of the lower fidelity solvers supplemented by a few high-fidelity data, and solve equation (3.7) using Bayesian optimization.

Our workflow proceeds as follows. First, we create the training sets for a three-level multi-fidelity surrogate by generating weights using random sampling, and compute the corresponding solution to equation (3.5) using the three different discretization methods outlined above. In particular, we use a nested experimental design consisting of 6*K* points at the lower fidelity level, 3*K* at the intermediate-fidelity level and *K* points at the highest fidelity level. This collection of input/output pairs provides the initialization of the EGO algorithm that aims to identify the combination of weights that minimizes the objective function of equation (3.7). Then, at each EGO iteration new sampling points are appended to the training set of each fidelity level according to the expected improvement acquisition function.

Figure 12*a* shows the solution corresponding to the optimal combination of weights identified by the multi-fidelity EGO algorithm after 100 iterations versus the reference target solution . As we can see in figure 12*b*, the algorithm is able to reduce the objective function down to only after 25 iterations and identify a sensible configuration of weights using only a total of 35 evaluations of the highest fidelity model.

## 4. Discussion

We have presented a probabilistic, data-driven framework for inverse problems in haemodynamics. The framework is based on multi-fidelity information fusion and Bayesian optimization algorithms, allowing for the accurate construction of response surfaces through the meaningful integration of variable sources of information. Leveraging the properties of the proposed probabilistic inference schemes, we use the EGO algorithm [26] to perform model inversion using only a few samples of expensive high-fidelity model evaluations, supplemented with a number of cheap and potentially inaccurate low-fidelity observations. This approach enables a computationally tractable global optimization framework that is able to efficiently ‘zoom in’ to the response surface in search of the global minimum, while exhibiting robustness to misspecification in the lower fidelity models.

All benchmark cases presented were deterministic inverse problems. Our main motivation was to provide a clear and intuitive presentation of the proposed framework, underlining the conditions under which a multi-fidelity approach can result in substantial computational gains. Such gains are expected to be more profound as the dimensionality of the parameter space is increased, as in such cases a very large number of samples will be required to perform model inversion. An application of this methodology to stochastic inverse problems requires two changes in the proposed methodology. First, the quantities to be estimated are no longer crisp numbers and our goal would be to estimate a proper parametrization of their distribution. For instance, if some properties are treated as independent Gaussian random variables, then the estimation problem would target the computation of their mean and variance, while if they were treated as correlated one should also account for their covariance. In general, a proper parametrization of uncertain parameters can be accommodated at the expense of increasing the dimensionality of the optimization problem. A second change incurred by stochasticity is related to the objective function itself. While in the deterministic case we consider minimizing a quadratic loss function, in the stochastic case the minimization objective would involve minimizing the distance to the mean and/or higher order moments, or, more generally, a moment matching criterion such as the Kullback–Leibler divergence [13] between the distribution of observed outputs and a target distribution. Accounting for variability in the model parameters is indeed a problem of great interest and we plan to address this in future work.

Summarizing, the proposed framework provides a principled workflow that allows one to limit evaluations of expensive high-fidelity solvers to a minimum, and extensively use cheap low-fidelity models to explore such vast parameter spaces. Extending and applying this methodology to realistic high-dimensional inverse problems requires the challenges of constructing response surfaces in high dimensions and potentially in the presence of massive datasets to be addressed. These challenges define an area of active research, and we refer the reader to the recent ideas put forth in [1] for a tractable solution plan.

## Authors' contributions

P.P. developed and implemented the methods, performed the numerical experiments, and drafted the manuscript; G.E.K. supervised this study and revised the manuscript. Both authors gave final approval for publication.

## Competing interests

We report no competing interests linked to this study.

## Funding

We gratefully acknowledge support from DARPA grant HR0011-14-1-0060, AFOSR grant no. FA9550-12-1-0463, NIH grant no. 1U01HL116323-01 and the resources at the Argonne Leadership Computing Facility (ALCF), through the DOE INCITE programme.

- Received December 24, 2015.
- Accepted April 21, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.