## Abstract

More than 44% of building energy consumption in the USA is used for space heating and cooling, and this accounts for 20% of national CO_{2} emissions. This prompts the need to identify among the 130 million households in the USA those with the greatest energy-saving potential and the associated costs of the path to reach that goal. Whereas current solutions address this problem by analysing each building in detail, we herein reduce the dimensionality of the problem by simplifying the calculations of energy losses in buildings. We present a novel inference method that can be used via a ranking algorithm that allows us to estimate the potential energy saving for heating purposes. To that end, we only need consumption from records of gas bills integrated with a building's footprint. The method entails a statistical screening of the intricate interplay between weather, infrastructural and residents' choice variables to determine building gas consumption and potential savings at a city scale. We derive a general statistical pattern of consumption in an urban settlement, reducing it to a set of the most influential buildings' parameters that operate locally. By way of example, the implications are explored using records of a set of (*N* = 6200) buildings in Cambridge, MA, USA, which indicate that retrofitting only 16% of buildings entails a 40% reduction in gas consumption of the whole building stock. We find that the inferred heat loss rate of buildings exhibits a power-law data distribution akin to Zipf's law, which provides a means to map an optimum path for gas savings per retrofit at a city scale. These findings have implications for improving the thermal efficiency of cities' building stock, as outlined by current policy efforts seeking to reduce home heating and cooling energy consumption and lower associated greenhouse gas emissions.

## 1. Introduction

In 2012, the aggregate home energy expenditure of the 130 million dwellings in the USA [1] reached 10 Quads (a Quad is approx. 2.9 × 10^{11} kWh) [2]. This inordinate energy use stems from a diverse set of end-use activities, which includes space heating, ventilation and air conditioning (HVAC), water heating, cooking and lighting among others [3,4]. Across different climate zones, HVAC usage constitutes 59% of the northeast, 45% of the south, 59% of the midwest and 43% of the west USA building energy consumption [5,6]. This translates to a national average of 11 000 kWh spent on space conditioning per household [2], a notable portion of which is wasted due to inefficiencies [7,8]. With over 81% of the US population concentrated in urban areas [9], the state and federal governments have embraced important initiatives to reduce this waste and the associated carbon footprints of cities by providing stimulus funds to adopt energy efficiency programmes [10,11]. These programmes, however, operate with limited resources in terms of tax rebates and technical assistance, and therefore can only support a selected number of buildings per year. These limitations call for fast and accurate methods to inform smart, citywide weatherproofing plans that pinpoint buildings with the greatest saving potential to minimize associated carbon emissions.

From an analytical perspective, there are two major challenges to construct such methods. One challenge is to develop reliable methodologies and toolsets to estimate energy consumption in buildings. There has been a vast body of literature on this subject since the 1970s that focuses on combining fundamental laws of thermodynamics with convective, conductive and radiative heat transfer to provide a consistent framework for buildings' energy modelling. Readers are referred to Swan & Ugursal [12], Kavgic *et al*. [13] and Zhao & Magoules [14], and references therein, for a comprehensive review of various complex dynamic modelling techniques. Another important challenge is concerned with developing a quantitative framework for assessment of energy-saving potentials that supports informed decisions on energy-saving policies relevant to retrofitting at an urban scale. Such a framework requires a scalable and sufficiently representative static description of a building's energy, i.e. gas and electricity, consumption that avoids redundant details while carrying enough physical information at a building level to inform weatherproofing options. This paper is motivated by the latter challenge and particularly focuses on gas consumption for space heating purposes in cold climates. Current approaches to evaluate retrofits use rating or audit tools to help energy-saving investments. Bardhan *et al*. [15] present an updated review of current practices and methods. The level of complexity and the required information vary significantly from one method to the other. The general results to inform policy are presented in terms of scores, characterizing the relative efficiency of a house in the region, or the recommendation of actions and the estimate of their potential savings. Current tools are often best suited for building-wise assessment. Upscaling the results to the community level relies on considering a ‘typical’ or ‘average’ house as a building block. 
For instance, on the Home Energy Saver (HES) website designed by the Lawrence Berkeley National Laboratory, Berkeley, CA (http://homeenergysaver.lbl.gov/consumer/), cities are analysed at the zip code level within a sizeable resident population from each urban area. These models are used to estimate the annual energy consumption of standardized houses in cities, to provide upgrade recommendations and to perform cost–benefit analyses of each specific retrofit. Although these tools show promise in helping energy-saving investment by filling the informational gap to a great extent, there are limitations in their accuracy and scalability when used to inform retrofittability at a city scale. These limitations deter their application as an effective and robust decision support methodology at the urban scale. We herein propose a method to simplify the estimates and to relate gas consumption from relevant information at a building level to the size of national sustainability goals. We perform an analysis of variance (ANOVA) by combining data from gas bills, buildings' footprints and physical simulations to avoid statistical bias introduced by ‘typical houses'. We use this information to construct a simple yet efficient physics-based description of heating energy demand based on a model reduction scheme that encloses the most relevant parameters for the observed consumption. The model provides a means to identify buildings with the greatest potential for improvement, and quantifies the aggregated gas savings.

Gas demand patterns of residential, commercial and industrial sectors are driven by human activities, public perception and decisions, and physical constraints. Prevailing urban energy consumption models use micro-simulations to predict future usage by emulating the behaviour of urban dwellers (agents) and converting their decisions to respective demands [16,17]. Thus, a building's gas usage becomes a function of the choice of technology (e.g. insulation conditions, or efficiency of the heating system), its utilization (e.g. choice of the internal temperature set point) and additional extrinsic variables such as weather and neighbourhood patterns [18]. Retrofitting a building's thermal efficiency primarily targets the choice of technology, namely insulation to enhance the thermal resistance of its envelope (walls, windows, doors, roof and floors) and to reduce losses due to heat conduction, *Q*_{cond}, and infiltration, *Q*_{inf}. Thus, in order to quantify the potential gas savings of the retrofit of actual dwellings at the city scale, we study the diverse set of variables affecting consumption to derive a general statistical pattern of consumption in the studied urban settlement. Our goal is to identify, based on data analysis, the most influential set of physical parameters that operate locally [19,20]. We focus our solution on gas consumption for heating purposes and not on other means of consumption.

## 2. Results and discussion

The first step in this methodology is to understand how sensitive the gas consumption of a city's building stock is to changes in external conditions, here the outdoor temperature. To this end, we combine buildings' footprints and associated gas consumptions with weather records for the same geographical location and time periods. We use actual monthly gas meter readings, *E*, recorded by a utility company per parcel (single- or multiple-family housing unit; this was the highest data resolution available) in kilowatt-hours (kWh) across the entire city. Herein, we use a 3-year-long record (2007–2009) collected for billing purposes, capturing the consumption pattern of almost 6200 individual residential buildings in Cambridge, MA, USA. After matching this record with the buildings' footprints from a geographic information system (GIS) dataset, we anonymized sources by removing addresses, in accordance with our non-disclosure agreement (NDA) terms. We match these data with the mean monthly temperature calculated by averaging the hourly temperature records from the closest weather station, which is located at Logan International Airport [21]. While the heat island phenomenon is certainly important for a proper estimate of outdoor temperature in urban areas, we neglect its effect as it is outside the scope of the present study. Our data assimilation indicates that the gas consumption exhibits a characteristic piecewise linear form with respect to the outdoor temperature (figure 1*a*), separated by a cut-off temperature, *T*_{0}. The gas consumption increases linearly as the outdoor temperature falls below this cut-off, and it does not vary significantly at higher temperatures; the latter regime thus defines a temperature-insensitive baseline gas consumption (*E*_{0}), which is most likely due to hot water production.
We identify this cut-off temperature as the temperature below which consumers turn on their homes' heating systems to maintain indoor spaces at a desired comfort temperature, *T*_{comf}. In other words, the outdoor cut-off temperature is representative of individual choices of set point temperature. The probability density function of *T*_{0} for the analysed building sample (figure 1*b*) sheds some light on these choices, in the form of three major peaks at 13°C, 15°C and 17°C (with 1°C s.d.), elucidating the core of the distribution (93% of the overall distribution, while the remaining 7% have medians at 4.7°C, 8.5°C and 23°C). These empirical findings give useful information to account for the distribution of the comfort level in buildings' energy simulations at the city scale.

Below the cut-off temperature, the gas consumption of each building increases at a constant rate, suggesting a linear relation between the heating energy, in excess of the baseline gas consumption (*E*_{0}), and the outside temperature, in the form:

$$E - E_{0} = \frac{S\,t\,\left(T_{0} - T_{\mathrm{out}}\right)}{R_{\mathrm{eff}}} \qquad (2.1)$$

where *S* is the building's envelope surface area and *E* − *E*_{0} is the heating energy necessary to maintain an inside temperature of *T*_{comf} when the outside temperature *T*_{out} is below *T*_{0} during the time interval of exposure, *t*, which corresponds to the number of hours between two consecutive energy readings (*E*) by the utility company. Moreover, the linearity between the temperature difference *T*_{0} − *T*_{out} and the heating energy defines a linear coefficient, *R*_{eff} (in m^{2}K W^{–1}), that can be viewed as an effective thermal resistance representative of the thermal efficiency of a building. Unlike the effective heat loss rate in the degree-day approaches [22–24], we expect *R*_{eff} to depend only on the physical attributes of a building's envelope, namely its heat transport, radiation and infiltration properties. For the sample of 6200 homes in Cambridge, MA, USA, the effective thermal resistance obtained by a linear fit of the energy readings according to equation (2.1) is found to follow a lognormal distribution (figure 1*c*). This lognormal distribution stems from the multiplicative random processes influencing the effective thermal resistance. The fact that this distribution is uncorrelated with *T*_{0} (figure 1*d*) is potentially related to the so-called rebound [25,26] or takeback effect [27], i.e. occupants' tendency to forgo the benefit of a high *R*_{eff} by increasing the indoor temperature. This lack of correlation may also stem from energy costs being included in dwellings' monthly rents, which directly affects the tenants' incentive to adopt conservative behaviour. There are many other facets to occupants' choice, especially with regard to gas consumption.
To infer these aspects along with HVAC performance metrics and their dependence on external conditions (temperature, humidity, etc.), we require high-resolution hourly consumption data that are currently only available at the building level [28–31].

At this stage, to identify individual buildings with the highest retrofitting potential from billing information alone, it suffices to link *R*_{eff} with the physical parameters that affect buildings' gas consumption in their specific environment. To establish this link, we resort to buildings' energy consumption modelling (see Material and methods section). We create a probabilistic gas consumption model of a block of nine buildings that interact with each other via shadowing and thermal interactions when they are physically in contact. This approximation neglects the shadowing effect of far-field tall buildings. We subsequently propagate the uncertainty in a set of 82 input parameters that affect buildings' energy consumption using the Energyplus package [32] (figure 2*a*; see the electronic supplementary material, table S1, for the entire list of uncertain input parameters, their distribution type and ranges of variations). Here, we employ global sensitivity analysis (GSA) to shed light on the relative importance of individual factors affecting heating energy consumption [33–37] (see Material and methods section).

Note that, for the purpose of this approach, we fixed the efficiency of the simplified heating system, *η*_{H}, in our Energyplus simulations. This parameter is a critical one, as it scales the resulting gas consumption of a building. We fix it because our goal here is to simplify the role of the physical parameters in the resulting consumption for heating purposes. In practice, one would need to scale the resulting expression, which is based on physical parameters, by introducing the exact value of *η*_{H} of the building under consideration.

After performing GSA on Energyplus simulation results, we find that only seven building parameters are important in determining heating energy consumption at the monthly level (figure 2*b*). These seven variables are the building's volume and envelope surface area (*V* and *S*), the number of neighbours sharing a wall with the building (*N*_{c}), the effective thermal resistance of the building envelope (*R*_{env}), the air infiltration rate (*I*_{env}), the average temperature set point (*T*_{set}) and the window type indicating the number of glazing layers (*W*_{typ}). In future work, one could potentially combine *N*_{c} and *S* into a single variable such as exposed surface area. As a key result, the complexity of the problem is now reduced to its most influential variables; however, this form still presents an opportunity for further simplification.

Based on the sensitivity analysis presented in figure 2*b*, performed via systematic Energyplus simulations exploring the whole parameter space within its relevant ranges, we note that, while the largest contribution to the variance of gas consumption (*E*) is accounted for through the variability in building size, the Spearman rank correlation coefficient (SRCC) in gas consumption per surface area (*E/S*) is mainly the result of the interplay between individual activity and envelope properties (*T*_{set}, *I*_{env}, *R*_{env}). Therefore, *E/S* is more informative about the thermal efficiency of the building envelope *per se*. Also, most of the contribution to the variance of *E*^{summer}*/S* is attributable to the consumer's set point temperature, *T*_{set}. This is likely to be the result of the fluctuating temperatures of the summers in northeastern USA. In other words, the average monthly temperature is close to *T*_{comf}, making the temperature difference between the inside and outside sensitive to individual choices and not the building's thermal efficiency. However, the average monthly temperature in cold seasons drops significantly and steadily below *T*_{comf}, which, as shown in figure 2*b*, results in gas consumption (*E*^{winter}) that can be expressed in terms of building variables (*I*_{env}, *R*_{env}) and different choices per building (*T*_{set}). When inspecting the ANOVA results of *R*_{eff}, we notice that *R*_{eff} has the same characteristics as *E*^{winter}*/S* with the exception of being completely independent of residents' choices (*T*_{set}). This reflects the preferences of households to maintain *T*_{set} at the monthly level regardless of *T*_{out}. As a consequence, the gradient of energy consumption as a function of *T*_{out} is independent of *T*_{set} and captures solely the building efficiency properties.
Thus, consistent with the findings from the energy data analysis for Cambridge, MA, USA, we confirm from ANOVA and buildings' energy simulations that *R*_{eff} is the most convenient norm of choice for capturing the physical response of buildings, reducing the consumption to only physical variables in buildings.

Our task is then reduced to quantitatively describing *R*_{eff} as a function of the physical properties of each building and to quantifying the impact of weatherproofing at the city scale. This is achieved here by simplifying the problem through a dimensional analysis of the physical quantities that possibly affect *R*_{eff}, namely the effective thermal resistance of the building envelope (*R*_{env}), the air infiltration rate (*I*_{env}), the volumetric heat capacity of air (*ρc*_{p}), emphasizing that the heat exchange is performed through air, the building's characteristic dimension expressed by the volume-to-surface area ratio (*V*/*S*) and the efficiency of the HVAC system (*η*_{H}). This analysis allows us to further reduce the dimension of the problem to a three-parameter relation between the dimensionless thermal resistance, the ratio of infiltration (*Q*_{inf}) to conduction losses (*Q*_{cond}) and the thermal efficiency of the HVAC system (see Material and methods and the electronic supplementary material, section V, for detail on the dimensional analysis):

$$\frac{R_{\mathrm{eff}}}{R_{\mathrm{env}}} = F\left(\frac{Q_{\mathrm{inf}}}{Q_{\mathrm{cond}}},\ \eta_{H}\right) \qquad (2.2)$$

This implies that a simple functional form of *F* is sufficient to describe the physical response of the system without the need to run Energyplus simulations repeatedly. To determine this functional relation, a full factorial design in the (*R*_{env}, *I*_{env}, *V/S*) space is performed by means of Energyplus simulations (figure 3*a*). By enforcing the law of conservation of energy, the response of the simulations can be written as (see the electronic supplementary material, section V):

(2.3)

where *A*_{1} and *A*_{2} are the degrees of freedom in the model. While the functional form of the dimensionless relation in equation (2.3) remains unaltered irrespective of climate, *A*_{1} and *A*_{2} are strongly dependent on the location and climate under consideration. In practice, these parameters need to be calibrated with results of physical simulations in the urban settlement and climate under consideration (for instance, for the case of detached buildings with double-glazed windows, *A*_{1} = 0.49 and *A*_{2} = 0.30; figure 3*a*). The dimensionless form in equation (2.3) provides insights into a building's thermal efficiency from the perspective of a simplified complex system. Unlike the effective heat loss rate in the PRISM approach [38], *R*_{eff} and its dimensionless functional form not only account for weather normalization but also provide a quantitative framework to compare dwellings of different sizes. Nonetheless, the individual physical properties of buildings cannot be uniquely identified using the dimensionless model in equation (2.3). For instance, as shown in figure 3*b*, higher thermal efficiency at the building level can be equally achieved by increasing the thermal resistance of the envelope or by decreasing the air infiltration rate. That is, all weatherproofing solutions are located on an iso-performance line [39] in the (*R*_{env}, *I*_{env}) space at a fixed *η*_{H}, where any point corresponds to a unique value of *R*_{eff}. This finding was further tested via standard machine learning methods, namely by means of multivariate adaptive regression splines (MARS) [40,41], which are well suited for capturing response surfaces of multi-parametric problems [42,43].
These results demonstrate that, given the nature of the gas consumption response to the parameter space, the response of the system as a surrogate function is always solvable.

In general, equation (2.3) is a powerful tool for simplifying decisions and first-order estimates of energy savings in the sense that, if we know *W*_{typ}, *R*_{env}, *I*_{env} and *η*_{H} of a given building prior to the retrofit, we obtain a robust estimate of the energy-saving potential for each retrofit scenario. In fact, we can obtain the most cost-effective retrofit scenario by juxtaposing this reduced order model with collected gas consumption records and buildings' characteristics measured via on-site home inspection. Although we have effectively reduced the number of influential parameters, *W*_{typ}, *R*_{env}, *I*_{env} and *η*_{H} are not available at the urban level for the majority of buildings. Therefore, we shift our focus to the collective response of these variables, which is manifested in *R*_{eff}. Here, we start at the building level by considering the gas consumption of an arbitrary building and its pertinent linear fit (figure 3*c*). The effective thermal resistance obtained prior to retrofit is situated on an iso-performance line (black solid line in the inset of figure 3*c*), which captures the current thermal performance of the building in terms of envelope thermal resistance (*R*_{env}), infiltration rate (*I*_{env}) and HVAC efficiency (*η*_{H}) according to equation (2.3). Any weatherproofing option, such as increasing insulation (*R*_{env}), reducing the air infiltration rate (*I*_{env}), improving the efficiency of the heating system (*η*_{H}) or installing multiple-paned windows (*W*_{typ}), while retaining the preferred behavioural choices (neglecting the rebound effect [25–27] by assuming the same value of *T*_{0} in figure 3*c*), would entail an increase of *R*_{eff} to higher iso-performance levels (smaller slope of energy consumption in figure 3*c*), and thus, in light of equation (2.1), an energy saving after retrofitting, Δ*E*, which is independent of the particular choice of weatherproofing.
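The building-level saving discussed here can be illustrated with a small sketch. It assumes that the linear relation of equation (2.1) takes the form *E* − *E*_{0} = *S* *t* (*T*_{0} − *T*_{out})/*R*_{eff}, and that *S*, *t*, *T*_{0} and *T*_{out} are unchanged by the retrofit. The function names and the numerical values are hypothetical and are not taken from the Cambridge dataset.

```python
def heating_energy(surface_m2, hours, t0_c, t_out_c, r_eff):
    """Heating energy in Wh under the assumed linear model (zero above the cut-off)."""
    delta_t = max(t0_c - t_out_c, 0.0)
    return surface_m2 * hours * delta_t / r_eff


def energy_saving(surface_m2, hours, t0_c, t_out_c, r_before, r_after):
    """Absolute saving in Wh when R_eff rises from r_before to r_after."""
    before = heating_energy(surface_m2, hours, t0_c, t_out_c, r_before)
    after = heating_energy(surface_m2, hours, t0_c, t_out_c, r_after)
    return before - after


def relative_saving(r_before, r_after):
    """Fraction of heating energy saved; depends only on the ratio of R_eff values."""
    return 1.0 - r_before / r_after


# Hypothetical building: 300 m^2 envelope, one 720 h billing period,
# cut-off temperature 15 degC, mean outdoor temperature 0 degC.
saving_wh = energy_saving(300.0, 720.0, 15.0, 0.0, r_before=1.5, r_after=3.0)
print(saving_wh / 1000.0, "kWh saved;", relative_saving(1.5, 3.0), "of heating energy")
```

Under these assumptions the relative saving depends only on the ratio of the effective thermal resistances before and after the retrofit, which is one way to read the statement that the saving does not depend on which weatherproofing measure produced the change in *R*_{eff}.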

This simple form provides a straightforward means to upscale the gas-saving potential from the building to the city scale to assist with science-informed urban policy choice and implementation. That is, the challenge pertaining to city-scale strategic retrofit planning is to find the shortest path to retrofit, i.e. the path that achieves the highest savings with the smallest number of retrofitted buildings. In this regard, an important feature emerges from the ranking of the potential gas saving of buildings calculated in the same manner as presented in figure 3*a*. We find that, over a large range, the rank and magnitude of gas savings follow a power law with an exponent of 0.75, much like Zipf's law [44,45]. While the deviation of the tail from Zipf's law is attributed to buildings with insignificant gas savings, the tail of the gas-saving distribution follows a power law with an exponent of 2.2 (inset of figure 4*a*). Given the significance of a Zipf-type data distribution, it appears to us that such a ranking based on gas-saving potential will provide the shortest path for city-scale gas savings. To test our hypothesis, we compare this ranking with other selection criteria associated with urban policy choices, starting with a random retrofit of buildings at the city scale, performed upon requests of building owners, in which case the achieved gas saving scales linearly with the number of retrofits. The results of this analysis, displayed in figure 4*b*, show that an informed selection based on ranking the energy saving of buildings (rank(Δ*E*)) indeed provides the highest rate of energy saving per retrofit, followed by an informed selection based on ranking of buildings' gas consumption per surface area (rank(*E*/*S*)), building sizes (rank(*S*, *V*)) and effective thermal resistance (rank(*R*_{eff})). When targeting buildings with high priority and after on-site inspections, equation (2.3) can quantitatively predict potential gas savings for various retrofit scenarios.
By way of example, if Cambridge, MA, USA, targets a 40% overall gas consumption reduction related to heating, it would suffice, with such an informed selection process, to retrofit only 16% of the entire building stock as mapped in figure 4*c* in order to achieve this goal, in contrast to 67% of buildings with a random selection procedure to achieve the same target by neglecting the rebound effect [25–27]. That is, the proposed selection scheme based on ranking potential gas savings provides an efficient means to achieve the shortest path for substantial energy savings at the city scale.
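The comparison between a ranked and an unranked selection can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the list of potential savings is made up, the target is expressed as an absolute saving, and the function names are hypothetical.

```python
def retrofits_needed(savings, target_saving):
    """Number of buildings retrofitted, in the given order, until the
    cumulative saving reaches target_saving; None if it is never reached."""
    cumulative = 0.0
    for count, saving in enumerate(savings, start=1):
        cumulative += saving
        if cumulative >= target_saving:
            return count
    return None


def ranked(savings):
    """Order buildings from largest to smallest potential saving."""
    return sorted(savings, reverse=True)


# Hypothetical potential savings per building (arbitrary energy units).
potential_savings = [50, 5, 20, 1, 10, 4, 3, 2, 3, 2]
target = 80  # e.g. 40% of a hypothetical total consumption of 200 units

n_ranked = retrofits_needed(ranked(potential_savings), target)
n_unranked = retrofits_needed(potential_savings, target)
print("ranked selection:", n_ranked, "buildings; unranked:", n_unranked)
```

Because the ranked order front-loads the largest savings, it can never need more retrofits than any other ordering of the same buildings; how large the gap is depends on how heavy-tailed the savings distribution is.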

To conclude, we propose a method of analysis that combines data on gas consumption, climate and buildings' footprints with surrogate energy modelling. This framework reduces the complexity of the problem to a simple functional form to estimate the thermal response of buildings. Calibrated with utility data, this functional form allows us to estimate potential gas savings per building under different retrofit scenarios with minimal computational expense. When applied at the urban scale, we can make informed selections towards the reduction of the gas consumption footprint by identifying the shortest path to the desired goal. The method is portable to cities in different climates, requiring solely data that are readily available for billing and urban planning purposes. This physical approach would benefit from interaction with new advances in materials design [46,47] and policy analyses to shed light on energy price-dependent [48], cost-effectiveness [49] and properties' tenure-dependent considerations [50] in various city-scale retrofit scenarios. Hence, from a practical viewpoint, we consider mitigation of city-scale gas consumption and associated carbon emissions to be a multi-objective optimization problem characterized by a Pareto front in the space of technical, economic, legal and political aspects. Similar model reduction approaches combining large data with statistical analysis and physical simulations to gain predictive understanding of the system's response appear to us to be promising for urban energy solutions such as patterns of hourly electric demand and the adoption of alternative sources for generation of electricity. These methods have the promise to help cities to use pervasive data sources to optimize decisions that make them more environmentally and economically sustainable.

## 3. Material and methods

### 3.1. Estimation of *T*_{0} and *R*_{eff} via a piecewise linear regression

The correlation between monthly heating gas consumption and monthly average outdoor temperature exhibits a piecewise linear trend. Such a gas consumption trend, *Y*, is mathematically expressed as:
$$Y = \frac{E_{0}}{S} + \frac{t}{R_{\mathrm{eff}}}\left(T_{0} - T_{\mathrm{out}}\right) H\left(T_{0} - T_{\mathrm{out}}\right) \qquad (3.1)$$

where *H*(*x* − *x*_{0}) is the Heaviside step function with *x*_{0} being the step position. To find the best piecewise linear regression, or, in other words, the best (*E*_{0}, *T*_{0}, *R*_{eff}) triplet for a building, we minimize the *L*_{2} norm of regression error, defined as:

$$\varepsilon = \left\lVert Y - \frac{E}{S} \right\rVert_{2} \qquad (3.2)$$

where *E*/*S* is the actual heating energy consumption per surface area. We performed all regression steps for the entire dataset in an automatic fashion with no data manipulation or treatment, as such treatment would make the analysis rather subjective. The electronic supplementary material, section VI, includes a Matlab script developed for piecewise linear regression. Figure S4 in the electronic supplementary material shows the distribution of the regression coefficient of determination, *R*^{2}, indicating that the majority of buildings in Cambridge, MA, USA, follow the aforementioned piecewise linear trend.
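A piecewise linear fit of this kind can be sketched as follows. This is not the Matlab script from the supplementary material; it is a minimal Python illustration that assumes the model *Y* = baseline + *t* max(*T*_{0} − *T*_{out}, 0)/*R*_{eff}, a constant number of hours *t* per billing period, and a simple grid search over candidate cut-off temperatures. All names and the synthetic data are hypothetical.

```python
def fit_piecewise(temps_c, energy_per_area, hours, t0_candidates):
    """Return (baseline, t0, r_eff) minimising the squared error of
    y = baseline + hours * max(t0 - T, 0) / r_eff over the candidate cut-offs."""
    n = len(temps_c)
    mean_y = sum(energy_per_area) / n
    best = None
    for t0 in t0_candidates:
        # Regressor: hours of exposure times the temperature deficit below t0.
        x = [hours * max(t0 - temp, 0.0) for temp in temps_c]
        mean_x = sum(x) / n
        var_x = sum((xi - mean_x) ** 2 for xi in x)
        if var_x == 0.0:
            continue  # this cut-off leaves no variation to fit
        cov_xy = sum((xi - mean_x) * (yi - mean_y)
                     for xi, yi in zip(x, energy_per_area))
        slope = cov_xy / var_x  # ordinary least squares; slope = 1 / r_eff
        if slope <= 0.0:
            continue  # a non-positive slope has no physical meaning here
        baseline = mean_y - slope * mean_x
        sse = sum((baseline + slope * xi - yi) ** 2
                  for xi, yi in zip(x, energy_per_area))
        if best is None or sse < best[0]:
            best = (sse, baseline, t0, 1.0 / slope)
    if best is None:
        raise ValueError("no valid cut-off temperature among the candidates")
    return best[1], best[2], best[3]


# Synthetic monthly data (Wh per m^2) generated from known parameters.
temps = [-2.0, 1.0, 5.0, 9.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 10.0]
hours = 730.0
readings = [2000.0 + hours * max(15.0 - temp, 0.0) / 2.0 for temp in temps]
candidates = [5.0 + 0.5 * i for i in range(41)]  # 5.0 to 25.0 degC

baseline, t0, r_eff = fit_piecewise(temps, readings, hours, candidates)
print(baseline, t0, r_eff)
```

For each candidate cut-off the problem is linear in the baseline and in 1/*R*_{eff}, so only the cut-off itself needs a search; a finer grid or a continuous optimizer could replace the 0.5°C grid used here.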

### 3.2. Buildings' energy consumption modelling

We performed building energy simulations using the standard Energyplus package [32]. For this purpose, we constructed an hourly weather file for the period of 2007–2009 using the weather measurements recorded at Logan International Airport by the National Oceanic and Atmospheric Administration [21] (see the electronic supplementary material, section I, for details). We subsequently performed a local sensitivity analysis to uncover the extent of building interactions via the shadowing effect (see the electronic supplementary material, section III and figure S2). This analysis shows that only the first eight neighbours affect a building's heating energy consumption. Hence, we performed further simulations by only considering a block of nine buildings (see the electronic supplementary material, figure S3) that interact through the shadowing effect by considering the Sun's path in the sky dome. The hourly energy consumption predictions are extremely fine compared with actual energy measurements, i.e. monthly energy bills. This suggests that energy simulation and predictions should be relevant to average monthly trends rather than spontaneous temporal variations. Therefore, we considered an average occupancy level and constant indoor temperature in our simulations and aggregated hourly energy predictions to monthly values. Afterwards, we constructed a probabilistic model of buildings' energy consumption by considering several uncertain parameters (see the electronic supplementary material, section IV and table S1). In particular, to be representative of a city's texture, the distances between buildings in the simulations are taken from the distribution of inter-building distances calculated from analysis of the GIS dataset (see the electronic supplementary material, section II and figure S1).

### 3.3. Complexity reduction via global sensitivity analysis

To reduce the parameter space, we performed a sensitivity analysis using Monte Carlo sampling to propagate the uncertainty from all parameters containing all possible building sizes and specifications (figure 2*a*) into energy consumption space. This Monte Carlo sampling provides a probabilistic mapping necessary to infer the contribution of each uncertain variable to the variance of energy consumption using ANOVA. In particular, we employed the SRCC to characterize the sensitivity of energy consumption norms with respect to all uncertain variables (see the electronic supplementary material, section IV).
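The sampling-and-ranking idea can be illustrated with a short sketch. It covers only the SRCC step, not the ANOVA, and it replaces the Energyplus simulations with a deliberately crude toy formula; the parameter ranges, the toy model and every name below are assumptions made for illustration only.

```python
import random


def ranks(values):
    """Rank positions (0 = smallest); ties are not expected for continuous samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical uniform ranges; "dummy" has no influence on the toy model.
RANGES = {"S": (100.0, 500.0), "R_env": (0.5, 4.0), "I_env": (0.1, 1.5),
          "T_set": (18.0, 23.0), "dummy": (0.0, 1.0)}


def toy_heating_energy(sample):
    """Toy stand-in for a simulation: losses scale with surface area and with
    the indoor temperature relative to an assumed 0 degC outdoor temperature."""
    losses = 1.0 / sample["R_env"] + 0.3 * sample["I_env"]
    return sample["S"] * sample["T_set"] * losses


def monte_carlo_srcc(n_samples, seed=0):
    rng = random.Random(seed)
    samples = [{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
               for _ in range(n_samples)]
    energy = [toy_heating_energy(s) for s in samples]
    return {name: spearman([s[name] for s in samples], energy) for name in RANGES}


srcc = monte_carlo_srcc(2000)
for name, value in srcc.items():
    print(name, round(value, 2))
```

In this toy setting the surface area correlates positively and the envelope resistance negatively with consumption, while the dummy input stays near zero, which is the kind of screening used to discard uninfluential parameters.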

### 3.4. Dimensional analysis and response surface modelling

From a dimensional perspective, the effective thermal resistance of an envelope with dominant conduction and infiltration heat transfer mechanisms can be written as a function of the envelope thermal resistance (*R*_{env}), the air infiltration rate (*I*_{env}), the volumetric heat capacity of air, the volume-to-surface area ratio (*V*/*S*) and the efficiency of the HVAC system (*η*_{H}). The dimensional analysis reduces this functional form to a simple relation between a smaller number of dimensionless parameters. The rank of the exponent matrix, the matrix formed by the exponents of the variables' dimensions, is 4 (see the electronic supplementary material, section V). Thus, according to the *π*-theorem [51], there are only two independent dimensionless variables among the initial six parameters. The dimensionless relation takes the form of equation (2.2), where *Π*_{1} is the ratio of the effective thermal resistance of the system to the conduction resistance of the envelope and *Π*_{2} will be shown to be the ratio of the infiltration heat transfer to the conductive heat transfer. Dimensional analysis effectively reduces the number of variables but it does not quantify the relation between them. To this end, we used the conservation of energy law to propose a functional form of *F*. The conservation of energy for the control volume (volume inside an envelope), which exchanges heat with surrounding media through conduction and infiltration, can be written as

$$Q_{\mathrm{tot}} = Q_{\mathrm{cond}} + Q_{\mathrm{inf}} \qquad (3.3)$$

where *Q*_{tot} is the total heat loss through the envelope; expressing *Q*_{tot} through *R*_{eff}, this balance can be rearranged in the form of equation (2.3). Coefficients *A*_{1} and *A*_{2} can be identified via either simulation or experiment. Here, we used Energyplus software to numerically estimate these coefficients. We have performed a full factorial simulation varying *R*_{env}, *I*_{env} and *V/S* at a fixed efficiency of the HVAC system (see the electronic supplementary material, section V). *R*_{eff} is computed as the derivative of predicted energy consumption with respect to average monthly temperature. The results are plotted in the *Π*_{1}−*Π*_{2} space and *A*_{1} and *A*_{2} are derived by fitting equation (2.3) to the results via the least-squares approach.
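As a quick consistency check on the second dimensionless group, the ratio of infiltration to conduction losses can be written out under two common textbook assumptions that are not spelled out in this section: conduction losses *Q*_{cond} = *S* Δ*T*/*R*_{env} and infiltration losses *Q*_{inf} = *ρc*_{p} *I*_{env} *V* Δ*T*, with *I*_{env} read as an air-change rate per unit time, *ρc*_{p} the volumetric heat capacity of air and Δ*T* the indoor-outdoor temperature difference. The prefactors used by the authors may differ.

```latex
\begin{align*}
\Pi_{2} = \frac{Q_{\mathrm{inf}}}{Q_{\mathrm{cond}}}
  &= \frac{\rho c_{p}\, I_{\mathrm{env}}\, V\, \Delta T}{S\, \Delta T / R_{\mathrm{env}}}
   = \rho c_{p}\, I_{\mathrm{env}}\, R_{\mathrm{env}}\, \frac{V}{S}, \\
[\Pi_{2}] &= \frac{\mathrm{J}}{\mathrm{m^{3}\,K}} \cdot \frac{1}{\mathrm{s}}
   \cdot \frac{\mathrm{m^{2}\,K}}{\mathrm{W}} \cdot \mathrm{m}
   = \frac{\mathrm{J/s}}{\mathrm{W}} = 1 .
\end{align*}
```

The temperature difference cancels, so under these assumptions the group depends only on building properties and is dimensionless, as a ratio of two heat loss rates should be.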

## Authors' contributions

M.C.G., F.-J.U., R.J.-M.P., M.J.A.Q. and J.F. designed the project. M.J.A.Q., J.M.S. and J.T. performed the energy, GIS and weather data assimilation. M.J.A.Q. performed the Energyplus simulations. M.J.A.Q. and A.N. performed the sensitivity analysis. M.J.A.Q., A.N., M.C.G. and F.-J.U. performed the dimensional analysis, designed the reduced order model and interpreted the potential energy savings based on the surrogate model. All authors contributed to writing the manuscript.

## Funding

Partial financial support through the Concrete Sustainability Hub at MIT, with sponsorship provided by the Portland Cement Association and the Ready Mixed Concrete Research & Education Foundation, is acknowledged. R.J.-M.P. and F.-J.U. wish to acknowledge the support of the ICoME2 Labex (ANR-11-LABX-0053) and the A*MIDEX projects (ANR-11-IDEX-0001-02), co-funded by the French programme ‘Investissements d'Avenir’ managed by the ANR, the French National Research Agency.

## Competing interests

We declare we have no competing interests.

## Acknowledgements

M.J.A.Q. acknowledges discussions with S. Do and K. Goldstein. M.C.G. acknowledges the support of the MIT-Accenture alliance and Center for Complex Engineering Systems (CCES) at KACST, and J.T. acknowledges an NSF graduate studies fellowship. M.J.A.Q. acknowledges partial funding from the Henry Samueli School of Engineering, University of California Irvine.

- Received November 7, 2015.
- Accepted March 21, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.