## Abstract

Human observers readily make judgements about the degree of order in planar arrangements of points (point patterns). Here, based on pairwise ranking of 20 point patterns by degree of order, we have been able to show that judgements of order are highly consistent across individuals and the dimension of order has an interval scale structure spanning roughly 10 just-notable-differences (jnd) between disorder and order. We describe a geometric algorithm that estimates order to an accuracy of half a jnd by quantifying the variability of the size and shape of spaces between points. The algorithm is 70% more accurate than the best available measures. By anchoring the output of the algorithm so that Poisson point processes score on average 0, perfect lattices score 10 and unit steps correspond closely to jnds, we construct an absolute interval scale of order. We demonstrate its utility in biology by using this scale to quantify order during the development of the pattern of bristles on the dorsal thorax of the fruit fly.

## 1. Introduction

Spatial patterns with regularity are found in living, natural non-living and man-made systems. Their presence can be informative about mechanisms of formation (e.g. Liesegang rings), properties (e.g. glasses) and functions (e.g. sensory receptors). Despite the diversity of domains, because patterns are an abstraction of structure, generic methods for their analysis are possible. There are many types of spatial patterns (e.g. grey-level textures, binary-valued images, polygonal meshes), but here we consider only planar arrangements of points (point patterns). Although only a fraction of pattern space, point patterns are widely applicable.

Point patterns can be analysed geometrically and examined by eye. Vision strongly engages with point patterns, evoking a complex bouquet of dependent qualities which seem to vary along several dimensions (e.g. density, anisotropy, complexity, clustering, etc.) not all of which correspond to agreed-upon geometric measures. This paper is concerned with one such unmodelled dimension: the degree of order/disorder which we will refer to simply as *order*.

As in colorimetry, we often need to provide an agreed measure of a perceptible quality. As perceptual scales may exhibit individual differences, this agreement needs to be with the consensus. A common approach to eliminating the need for an observer is to devise a physical scale that is in good agreement with a perceptual one. In some cases, a perceptual scale may then be abandoned and replaced with this objective surrogate. For example, the Mohs [1] and Vickers [2] scales for material hardness are used without reference to felt hardness.

Finding objective proxies for the extremes of perceptual dimensions is usually easier than for the middle range. For order, the extremes are mathematically characterized: at the perfectly ordered end, the group-theoretical analysis of spatial isometries results in a definitive characterization of exact order [3]; at the disordered end, statistical concepts of randomness allow total disorder to be characterized as arising from Poisson point processes [4]. By contrast, there are only piecemeal mathematical theories concerning the middle range of order [5–9]. Yet, perceptually, the middle range seems to have well-defined structure: some patterns appear more ordered than others, suggesting an ordinal scale structure; further, some differences in order appear larger than other differences [10], suggesting an interval scale structure. So our hypothesis is that *the consensus of subjective order has the structure of an interval scale*. In addition to testing this, the further aim of this paper is to define an objective interval scale closely aligned with this consensus. Importantly, such a scale would provide a wide range of natural scientists with a tool by which to quantify order within images, in order to study the evolution of order/disorder in natural systems and in systems responding to experimental perturbation.

Order is particularly relevant to biology and biomedicine, because living systems tend to be well-, but not perfectly, ordered, while the generation of disorder is associated with ageing and disease [11,12]. In biological systems, order is observed at all organizational scales. For example, at a fine scale, order is critical to the regulation of molecular machines [13], which often include juxtaposed modules of different symmetries [14]. At an intermediate scale, the orderliness of the cellular cytoskeleton can vary depending on its function [15], and in neoplasia has been seen to become less ordered [16]. At a coarse scale, the formation of orderly skin patterns in zebrafish arises from a self-organizing system of cell–cell interactions [17].

Within biology, order is especially pertinent in development where disorder is continually being generated and corrected. Moreover, changes from disordered to ordered states are frequently seen [18–21] and can be altered by mutation and perturbation of environmental conditions, such as temperature and nutrition [22]. Post-development, the final degree of order impacts on the effective functioning of the organ [20], and the fitness of the organism, via sexual attractiveness [23] and mating success [24]. Not all developmental processes aspire to exact order. For example, in the mammalian eye, spacings of parafoveal receptors are less than perfectly regular which prevents Moiré-like aliasing [25]. Post-development, order is maintained in healthy tissues by homeostatic processes that combine the disordering effects of cell proliferation with active processes to restore the resting level of intermediate order [26]—processes that fail in diseases such as cancer [11].

A specific example, which illustrates the complexities of order in a biological system, is provided by the development of pattern in the notum (dorsal thorax) of *Drosophila melanogaster* (the fruit fly) [27]. By 24 h after pupa formation (APF), the notum exhibits a well-spaced array of precursor mechanosensory bristle cells (microchaetes), despite being relatively disordered during the proliferative phase 12 h earlier [28]. This rapid and pronounced change in order makes the tissue convenient for laboratory study, which can be performed using live imaging techniques that do not interfere with the developmental process.

When notum development is imaged, two types of order become apparent (figure 1). First, cells across the tissue, through cycles of division, shape change and neighbour exchange, form an approximate hexagonal lattice. Second, a subset of these cells change their state (as measured by levels of proneural gene expression) to form a self-organizing [28,29] patterned array which gives rise to the well-spaced mechanosensory bristle organs. In this paper, we will be concerned only with the second of these aspects, the formation of the bristle pattern.

Viewing movies of notum development (figure 1), it is clear that the level of order increases in the period from 12 to 24 h APF. While convenient, such subjective assessments of order are imprecise, difficult to quantify, and likely to be affected by the history of previous observations [30]. Additionally, irrelevant factors (e.g. element density, size, shape, orientation or brightness) may be difficult for observers to ignore, confounding the analysis. As an alternative to subjective assessment, geometric algorithms have been proposed that aim to quantify order, for example, the variance of distances from each point to its nearest neighbour [18]. These approaches will be misleading unless they accurately correspond with subjective assessment across the full range of levels of order. Moreover, if a geometric algorithm is to replace subjective assessment, its agreement with interval scale structure, not just ordinal, needs to be demonstrated. We would be entitled to call a geometric assessment of order that has been validated in this way a *measurement* of order giving values on a well-defined *interval scale*. Using such a measure we would, for example, be able to quantify the changing order of the developing fly notum and to plot its detailed timecourse using the interval scale as a meaningful ordinate axis.

In §2, we analyse experimental results to assess whether the perception of point patterns varies along a perceptual interval scale (*p*-scale). Our results confirm the reality of the perceptual dimension and establish that it is quite well extended in terms of its number of distinguishable levels. In §3, we establish an interval scale of order, as assessed by a novel geometric algorithm (*g*-scale), which accurately corresponds with the *p*-scale. We demonstrate that previously proposed quantifications correspond less accurately. In §4, by scaling the *g*-scale, so that absolute disorder maps to 0 and perfect order to 10, we define a convenient absolute scale (*a*-scale) for the measurement of order. In §5, we demonstrate application of the *a*-scale by plotting the timecourse of developing order in the bristle precursor cells of the fly notum. The code in MATLAB for the *a*-scale algorithm is provided in the electronic supplementary material.

## 2. Existence of a perceptual scale of order

### 2.1. Methods

#### 2.1.1. Psychophysical experiments

We used pairwise ranking of point patterns to collect data on subjective order. Since the space of possible point patterns is multiply infinite and contains many different perceptual types, we had to make difficult decisions about which patterns should be ranked. Since we have no means to sample uniformly from the space of patterns, we carried out a thorough analysis of a particular core region of pattern space within a broader region. Using this strategy, we aimed to look for lawful behaviour within the core region and possible deviations within the broader region.

Point patterns were created as follows: (i) a base lattice (triangular, rectangular or hexagonal) was chosen, (ii) the lattice points were jittered using Gaussian positional noise, (iii) the pattern was affinely distorted by stretching it a random amount along a random orientation, (iv) a fraction of randomly chosen points was removed, or a number of randomly positioned points was added, (v) a nonlinear positional warp, implemented as a bicubic transformation of Cartesian coordinates, was applied, (vi) a random centre for a circular window was chosen, and its radius was chosen so that 180 points were visible within it. The parameters of this pattern generation process were the (i) base lattice, (ii) jitter magnitude, (iii) orientation and magnitude of the stretch, (iv) fraction of points removed or added and (v) five parameters for the nonlinear warp.
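The generation steps can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the code used in the study: only the triangular base lattice is handled, only point deletion (not addition) is shown for step (iv), the nonlinear warp of step (v) is omitted, and the function name `make_pattern`, its parameter names and its default values are all hypothetical.

```python
import math
import random


def make_pattern(jitter=0.1, stretch=1.2, angle=0.3, drop_fraction=0.1,
                 n_visible=180, seed=0):
    """Simplified generator covering steps (i)-(iv) and (vi); step (v) is omitted."""
    rng = random.Random(seed)
    points = []
    half = 30  # the lattice extends well beyond the final window
    for row in range(-half, half + 1):
        for col in range(-half, half + 1):
            # (i) triangular base lattice with unit spacing
            x = col + 0.5 * (row % 2)
            y = row * math.sqrt(3.0) / 2.0
            # (ii) Gaussian positional jitter
            x += rng.gauss(0.0, jitter)
            y += rng.gauss(0.0, jitter)
            points.append((x, y))
    # (iii) affine stretch by the factor `stretch` along the direction `angle`
    c, s = math.cos(angle), math.sin(angle)
    stretched = []
    for x, y in points:
        along = (c * x + s * y) * stretch   # component along the stretch axis
        across = -s * x + c * y             # component perpendicular to it
        stretched.append((c * along - s * across, s * along + c * across))
    # (iv) delete a random fraction of the points
    kept = [p for p in stretched if rng.random() >= drop_fraction]
    # (vi) random window centre; radius chosen so that n_visible points lie inside
    cx, cy = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
    kept.sort(key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    window = kept[:n_visible]
    radius = math.hypot(window[-1][0] - cx, window[-1][1] - cy)
    return window, (cx, cy), radius
```

Calling `make_pattern()` returns the 180 visible points together with the window centre and radius; varying `jitter` and `drop_fraction` moves the pattern between the ordered and disordered ends of the range.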

The aims of the experiment were to determine the reality of a *p*-scale for order, and to assess candidate *g*-scales. These aims place different demands on the experimental design. For testing for the *p*-scale, it is preferable to present a small number of patterns in all possible pairs. For testing *g*-scales, it is preferable to present as many patterns as possible. We reconciled these two considerations (core versus broader region, all pairs versus many patterns) by an experimental design using a set A of 20 patterns from a core region of pattern space presented in all possible (190 = 20 × 19/2) pairs, and a set B of 240 patterns from a broader region presented in 120 fixed pairings.

To construct set A, a pool of 120 patterns was generated using a variety of parameter settings, avoiding those that were multiply extreme. From the pool, a subset of 20 was chosen with the criteria that they should: be roughly uniformly spaced in order all the way from highly ordered to fully disordered; and be diverse in terms of the base lattice, the amounts of perturbation, deletion/addition of points and the degree of warp. The patterns of set A are contained in figures 2, 4 and 8 where they are numbered according to their estimated *p*-scale values (see §2.2.2). To construct set B, 120 pairs of patterns were generated using random parameter settings.

Patterns were displayed in pairs on a 40 cm diagonal LCD screen at a distance of 50 cm under comfortable internal illumination. Each pattern was rendered using solid black dots of 1 mm diameter on a white circular disc of radius *r* = 6.2 cm. Pairs were on a grey background. Subjects first viewed patterns from set A (each of 20 × 19/2 = 190 pairs was presented twice) followed by each of the set B pattern pairs viewed once. Trials within each block were performed in random order; in each trial, the patterns were randomly oriented and allocated to left or right. Subjects were given written instructions to use the keyboard to indicate the pattern that ‘appeared more ordered’ to them, and to proceed at their own pace. All subjects took 15–20 min to complete their 500 trials. Presentation of stimuli and recording of responses were controlled using the MATLAB Psych toolbox [31]. Twenty subjects (14 male), with normal or corrected-to-normal vision and at least undergraduate level education, took part.

### 2.2. Results

#### 2.2.1. Variability of the data

We computed three measures of response variability: the *intra*, *inter* and *maximum* agreement rates. As set B trials were not repeated, no *intra* rate was computed for them. Rates are shown in table 1.

The *intra* agreement rate is the probability that a random subject would choose the same pattern both times when faced twice with the same random trial. The rate of 85% for set A demonstrates that subjects do not always respond the same to a pair of patterns. This is not surprising since with 20 patterns in set A we would expect some to have similar levels of order.

The *inter* agreement rate is the probability that two subjects will agree on which pattern of a pair is more ordered. These rates are within 1% for the two sets. On set A, the *inter* rate is only 2% less than the *intra* which shows that there is very little variation between subjects over-and-above their personal variability. There is, therefore, a strong consensus in the perception of order.

The *maximum* agreement rate is the probability that on a random trial a random subject will agree with an optimal ranking of the patterns, where the optimal ranking is that which maximizes this rate. For both sets of patterns, this rate is roughly 5% more than the *inter* rate. We include this rate as it will provide a context to the performance scores for geometric measures in §3.2.1.
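The *intra* and *inter* rates can be computed directly from the raw choices, as in the following minimal Python sketch. The data layout is an assumption made for illustration: `responses[s][t]` holds the two binary choices made by subject `s` on the two presentations of pair `t`, coded relative to a fixed ordering of the pair. The *maximum* rate additionally requires a search over rankings and is not shown.

```python
from itertools import combinations


def intra_rate(responses):
    """Fraction of (subject, pair) cases in which both presentations got the same choice."""
    same = total = 0
    for subject in responses:
        for first, second in subject:
            same += (first == second)
            total += 1
    return same / total


def inter_rate(responses):
    """Fraction of agreeing choices over all pairs of different subjects on the same pair."""
    agree = total = 0
    for subject_a, subject_b in combinations(responses, 2):
        for choices_a, choices_b in zip(subject_a, subject_b):
            for choice_a in choices_a:
                for choice_b in choices_b:
                    agree += (choice_a == choice_b)
                    total += 1
    return agree / total


# Toy data (hypothetical): 2 subjects, 2 pattern pairs, 2 presentations each.
toy = [
    [(1, 1), (0, 1)],
    [(1, 1), (0, 0)],
]
```

On the toy data both rates are 0.75: three of the four repeated trials were answered consistently, and six of the eight cross-subject comparisons agree.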

#### 2.2.2. Validating the *p*-scale

Linear models to account for paired comparisons of stimuli relative to some indicated perceptual dimension originate with Thurstone [32]. In linear models, the perceptual dimension is modelled as an interval scale, within which each stimulus (*X*) is assumed to have a true attribute value (*μ*_{X}). Each separate perception of a stimulus has its own attribute value (*n*_{X}) from the same scale, which is a noisy realization of the true value. When a subject compares two stimuli (*X* and *Y*) they report which of the two noisy realizations is larger, which can vary if the trial is repeated. If the noise is assumed Gumbel-distributed, stationary and uncorrelated (Bradley–Terry model) then there will be a logistic preference function, *P*(Δ*μ*) = 1/(1 + e^{−Δ*μ*/*k*_{l}}), which maps the signed difference Δ*μ* = *μ*_{X} − *μ*_{Y} between the true values of *X* and *Y* to the probability that *X* will be preferred to *Y* [33,34]. The positive constant *k*_{l} controls the units of the scale of attribute values. We choose *k*_{l} = 0.91 so that *P*(1) = 0.75, i.e. so that a unit distance on the scale corresponds to a conventional just-notable-difference (jnd) [35]. During psychophysical experiments, stimulus independent errors can occur (lapses). Unless modelled, these can bias estimates of attribute values [36,37]. Lapsing at a rate *λ* is incorporated in the model by using a rescaled probability function *λ* + (1 − 2*λ*)*P*(Δ*μ*).
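As a numerical check on the choice of scale constant, the preference function can be written out in a few lines of Python. The sketch assumes the logistic is parametrized as 1/(1 + exp(−Δ*μ*/*k*_{l})), which is the form under which the quoted value 0.91 (close to 1/ln 3) yields a preference probability of 0.75 at a separation of one unit; the function names are hypothetical.

```python
import math

K_L = 0.91  # scale constant quoted in the text; approximately 1 / ln(3)


def preference(delta_mu, k_l=K_L):
    """Logistic probability that X is preferred to Y given mu_X - mu_Y = delta_mu."""
    return 1.0 / (1.0 + math.exp(-delta_mu / k_l))


def preference_with_lapse(delta_mu, lapse, k_l=K_L):
    """Rescaled preference: lapse + (1 - 2 * lapse) * P(delta_mu)."""
    return lapse + (1.0 - 2.0 * lapse) * preference(delta_mu, k_l)
```

With a lapse rate *λ* the predicted preference is confined to the interval [*λ*, 1 − *λ*], so that even very large differences in order do not predict perfectly consistent responses.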

We estimated the parameters of the model (the true value *μ*_{X} for each stimulus, and the lapse rate *λ*) by maximum-likelihood (ML) fitting to the psychophysical data using gradient descent with multiple random starts to check for stability. The goodness-of-fit (GoF) of the ML model was assessed by comparing its empirical deviance to the distribution of deviances that result by generation of random datasets from the ML model [36]. Deviance of a dataset (original or simulated) is the difference between the log-likelihood of the dataset given the ML model, and the log-likelihood of the dataset given a saturated model, which in this case specifies a separate ML probability for each trial.
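The core of such a fit can be sketched in Python as plain gradient ascent on the log-likelihood. This is a reduced illustration rather than the procedure used in the study: the lapse rate is fixed at zero, a single start at the origin replaces the multiple random starts, the deviance computation is not included, and the logistic is assumed to take the form 1/(1 + exp(−Δ*μ*/0.91)). The function names and the win-count layout are hypothetical.

```python
import math

K_L = 0.91  # scale constant quoted in the text


def preference(delta_mu):
    return 1.0 / (1.0 + math.exp(-delta_mu / K_L))


def fit_scale(wins, steps=2000):
    """ML scale values from wins[i][j] = number of trials in which i was preferred to j."""
    n = len(wins)
    total = sum(sum(row) for row in wins)
    rate = 1.0 / total  # conservative step size for this concave log-likelihood
    mu = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j:
                    p = preference(mu[i] - mu[j])
                    # derivative of the log-likelihood with respect to mu[i]
                    grad[i] += (wins[i][j] * (1.0 - p) - wins[j][i] * p) / K_L
        mu = [m + rate * g for m, g in zip(mu, grad)]
        mean = sum(mu) / n
        mu = [m - mean for m in mu]  # the origin of an interval scale is arbitrary
    return mu


# Toy counts (hypothetical): three stimuli spaced about 1 jnd apart, 1000 trials per pair.
toy_wins = [
    [0, 250, 100],
    [750, 0, 250],
    [900, 750, 0],
]
```

For the toy counts the recovered values are close to −1, 0 and 1, as expected from preference rates of 75% between neighbours and 90% between the extremes.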

For set A, the empirical deviance of the ML model is 211.7 and the 95% interval of the random datasets is [144.6, 214.9], hence the model is accepted. Therefore, our hypothesis that the consensus of perceptually based order has the structure of an interval scale is supported. We refer to this scale as the *p*-scale and will treat the ML estimates of the true values of our stimuli as values on this *p*-scale. In figure 3, we compare the experimental preference rates for trials using set A to the predicted rates based on the *p*-scale values of the stimuli. Visually, the fit is good which is consistent with the results of the GoF analysis. The *p*-scale values of set A are shown in figure 4. They cover a range of 9.86 jnds. The estimated lapse rate was 0.0008.

To estimate the uncertainty of the fitted *p*-scale values, we used a parametric bootstrap method [37,38]. The method generates sets of synthetic experimental data from the fitted model to each of which a new model is fitted. We thus obtain a set of *p*-scale estimates for each stimulus (*p*-scale values are aligned by subtracting the mean of each set), the standard deviation of which provides an estimate of uncertainty. Across the stimuli of set A, the standard deviations ranged from 0.10 to 0.30 jnds, with 0.14 jnds being the (RMS) average. The 95% confidence interval for the size of the range of *p*-scale values in set A was [9.17, 10.72] jnds. The 95% confidence interval for the lapse rate was [10^{−8}, 0.0013].
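The bootstrap loop itself is simple, as the following Python sketch shows. It is an illustration under assumptions: lapses are ignored, the logistic is taken as 1/(1 + exp(−Δ*μ*/0.91)), the fitting routine is passed in as a hypothetical callable `fit` that maps a win-count matrix to scale values, and the function names are invented for the example.

```python
import math
import random
import statistics

K_L = 0.91  # scale constant quoted in the text


def simulate_wins(mu, trials_per_pair, rng):
    """Draw synthetic win counts from the fitted model for every pair of stimuli."""
    n = len(mu)
    wins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 / (1.0 + math.exp(-(mu[i] - mu[j]) / K_L))
            for _ in range(trials_per_pair):
                if rng.random() < p:
                    wins[i][j] += 1
                else:
                    wins[j][i] += 1
    return wins


def bootstrap_sd(mu_hat, trials_per_pair, fit, n_boot=200, seed=0):
    """Standard deviation of refitted, mean-aligned scale values for each stimulus."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        estimate = fit(simulate_wins(mu_hat, trials_per_pair, rng))
        mean = sum(estimate) / len(estimate)
        samples.append([e - mean for e in estimate])  # align by subtracting the mean
    return [statistics.stdev(column) for column in zip(*samples)]
```

In the experiment described here each set A pair was judged 40 times (20 subjects, two presentations), so `trials_per_pair=40` would be the natural setting.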

## 3. Evaluating geometrical measures

### 3.1. Methods

#### 3.1.1. Existing geometrical measures

Methods for quantification of order can be based directly on the point locations, or on a geometric construction derived from them. Common constructions used to analyse biological patterns are the Voronoi diagram [39] and the related Delaunay triangulation [40] (figure 5). The Voronoi diagram is the partitioning of the plane into convex polygons (cells) such that each contains all locations closer to one point of the pattern than to any other point. The Delaunay triangulation of a point pattern is the unique triangulation such that no triangle contains any points of the pattern within its circumcircle. The Voronoi diagram and Delaunay triangulation are related: Delaunay triangle circumcentres are coincident with Voronoi side junctions.
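The empty-circumcircle definition translates directly into a brute-force construction, sketched below in Python. It is intended only to make the definition concrete for small patterns: it tests every triple of points, assumes the points are in general position (no four on a common circle), and would be far too slow for routine use, where a dedicated computational geometry library would be preferred. The function names are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius_squared), or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2 = a[0] ** 2 + a[1] ** 2
    b2 = b[0] ** 2 + b[1] ** 2
    c2 = c[0] ** 2 + c[1] ** 2
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return ux, uy, (a[0] - ux) ** 2 + (a[1] - uy) ** 2


def delaunay_triangles(points, eps=1e-9):
    """Index triples whose circumcircle contains no other point of the pattern."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((p[0] - ux) ** 2 + (p[1] - uy) ** 2 >= r2 - eps
               for m, p in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles
```

The circumcentres returned by `circumcircle` for the accepted triangles are the junctions of the Voronoi diagram, which is how the two constructions are linked.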

Measures have been proposed that depend on distances, areas, shapes and topology. Distance-based measures use either the nearest neighbour or a set of neighbours (as defined by a triangulation or other structure). Examples include: global mean nearest-neighbour distance [41], global variance of nearest-neighbour distances [18,42], mean of local variance of nearest-neighbour distances [43–45] and other measures of the dispersion of nearest-neighbour distances [45]. Other distance-based approaches use all, rather than nearest-neighbour, distances such as Ripley's L-function [46] and autocorrelation analysis [43,47,48]. Area-based approaches include: variance of polygon area [49], area disorder [12] and other measures of area distribution [47]. Shape-based approaches use the distributions of polygon elongation [19], polygon conformity [43,44] and Voronoi cell angles [46]. Topological measures analyse the variation in the number of neighbours points have [26,50].
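The simplest of these measures, the global mean and variance of nearest-neighbour distances, can be computed as in the following Python sketch. It is a direct quadratic-time illustration; the function names are hypothetical, and any normalization for point density used in the cited studies is not reproduced here.

```python
import math
import statistics


def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def nn_mean_and_variance(points):
    """Global mean and (population) variance of nearest-neighbour distances."""
    distances = nearest_neighbour_distances(points)
    return statistics.fmean(distances), statistics.pvariance(distances)
```

For a perfect lattice every nearest-neighbour distance is identical, so the variance is zero; jittering the points raises it.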

Based on pilot studies, we have chosen for assessment six representative methods from the above list: mean nearest-neighbour distance, variance of the number of neighbours, variance of Voronoi cell area, Voronoi cell area disorder, variance of nearest-neighbour distance and maximum autocorrelation. The autocorrelation method has a single tunable parameter (width of the point spread function); the other methods have none.

#### 3.1.2. Novel measures

We also assess four novel measures of order that we have designed. Two concern the local symmetry of the Voronoi diagram, and two concern the variability of Delaunay triangles. We report the tunable parameters for each method.

*Pairwise local symmetry.* This measure is the mean of a local symmetry score computed separately for each pair of adjacent Voronoi cells. The pair score measures their symmetry by transforming one cell so that it lies roughly on top of the other, and then measuring their area overlap (intersection divided by union). The transformation can be by reflection in the shared Voronoi side, or by 180° rotation about the Voronoi side midpoint; whichever transformation results in the greater overlap is used. We allow overlap scores to be transformed by a power law, and then compute their weighted mean with weights which are power law transformed areas of the involved cells. The exponents of both power laws are tuneable.

*Centroidal symmetry.* While the previous measure assesses local symmetry based on pairs of adjacent Voronoi cells, this measure assesses order on the basis of each cell alone. The symmetry of a cell is measured by the distance between the point of the pattern (which the Voronoi cell surrounds) and the centroid of the cell, made dimensionless by dividing by the square root of the cell area. We allowed the same two tuneable power laws as for pairwise local symmetry.

*Delaunay sides entropy.* Previous measures that use Delaunay side lengths pool all lengths together and assess their mean or variation. Our novel measure analyses three disjoint sets of Delaunay side lengths: one for the shortest side in each Delaunay triangle, another for the longest and the third for the intermediate length sides. We assess the variability of each set of lengths using an entropy measure based on a kernel-density estimation of the distribution using a Gaussian kernel as illustrated in figure 6. The method has one tuneable parameter, the width of the Gaussian kernel.
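The idea can be sketched in Python as follows. The exact entropy estimator, any normalization of lengths for point density, and the way the three entropies are combined into a single score are not specified above, so the choices here are assumptions: entropy is estimated by averaging the negative log of a Gaussian kernel-density estimate at the sample points, lengths are used unnormalized, and the three entropies are returned separately. The function names are hypothetical.

```python
import math


def sorted_sides(triangle):
    """Side lengths of a triangle ((x, y), (x, y), (x, y)), shortest first."""
    a, b, c = triangle
    return sorted((math.dist(a, b), math.dist(b, c), math.dist(c, a)))


def kde_entropy(values, width):
    """Entropy estimate: mean of -log(Gaussian KDE density) at the sample points."""
    n = len(values)
    norm = 1.0 / (n * width * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in values:
        density = norm * sum(math.exp(-0.5 * ((x - y) / width) ** 2) for y in values)
        total -= math.log(density)
    return total / n


def sides_entropies(triangles, width):
    """Entropies of the shortest, intermediate and longest side-length sets."""
    sides = [sorted_sides(t) for t in triangles]
    return [kde_entropy([s[k] for s in sides], width) for k in range(3)]
```

When all triangles are congruent each set of lengths collapses to a single value and the estimate takes its minimum, log(width × √(2π)); any spread in the lengths increases it.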

*Delaunay triangles entropy.* Methods based on nearest-neighbour distances assess the size of the spaces between the points of a pattern using a single univariate histogram. By contrast, the Delaunay sides entropy (DSE) measure described above uses three univariate histograms, which gives it some sensitivity to the shape of the inter-point spaces as well as their size. However, information on shape and size is entangled and present across all three histograms. In this final measure, we separate these two aspects of the inter-point spaces: rather than three univariate length histograms we form one univariate size histogram (triangle areas) and one bivariate shape histogram (triangle shapes as represented by the lengths of the two shorter sides divided by the longest). As with DSE, we use kernel-density estimation to produce smooth histograms from which stable entropy estimates are computed. The process is illustrated in figure 7. The method has two tuneable parameters: the widths of the two smoothing Gaussian kernels.
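The size and shape descriptors, and a bivariate entropy estimate for the shape distribution, can be sketched in Python as below. As with the sides-based sketch, this is an illustration under assumptions rather than the published algorithm: the entropy estimator (mean negative log of an isotropic Gaussian kernel-density estimate), the absence of density normalization for the areas, and the function names are all choices made for the example, and the combination of the size and shape entropies into one value is not shown.

```python
import math


def triangle_size_and_shape(triangle):
    """Area of a triangle and its shape as (shortest / longest, middle / longest)."""
    a, b, c = triangle
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    short, middle, longest = sorted(
        (math.dist(a, b), math.dist(b, c), math.dist(c, a)))
    return area, (short / longest, middle / longest)


def kde_entropy_2d(samples, width):
    """Entropy estimate from an isotropic bivariate Gaussian KDE at the sample points."""
    n = len(samples)
    norm = 1.0 / (n * 2.0 * math.pi * width ** 2)
    total = 0.0
    for x0, y0 in samples:
        density = norm * sum(
            math.exp(-0.5 * ((x0 - x1) ** 2 + (y0 - y1) ** 2) / width ** 2)
            for x1, y1 in samples)
        total -= math.log(density)
    return total / n
```

Both shape coordinates lie in (0, 1], with (1, 1) corresponding to an equilateral triangle, so a fixed kernel width is meaningful for the shape histogram regardless of point density.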

### 3.2. Results

#### 3.2.1. Assessing candidate *g*-scales

A geometric measure can agree with the ordinal structure of the *p*-scale while differing in the interval structure. In such a case, if the raw outputs of the measure are transformed by an appropriate monotonic function the resulting outputs will have perfect interval structure. For each of the measures presented in §§3.1.1 and 3.1.2, we will report the ordinal accuracy of their raw outputs and the interval accuracy of their optimally transformed outputs.

To find the optimal transformation for each measure, we used a parametrized representation of a smooth monotonic function, implemented as a constrained polynomial. Any internal parameters of a measure can also be tuned for optimal performance. We will refer to the combination of a measure and transformation as an algorithm. The parameters of each algorithm were optimized by maximizing the likelihood of the experimental data, computed using the logistic plus lapsing model. Each optimized algorithm defines a candidate *g*-scale.

Leave-one-out cross-validation was used to make unbiased estimates of the accuracy with which each *g*-scale agrees with the *p*-scale. To do this, we set aside one of the 20 patterns of set A and optimized the algorithm's parameters using the subset of experimental data involving the remaining 19. Next, we identified the constant offset needed to best align the 19 *g*-scale values from the optimized algorithm with the 19 corresponding *p*-scale values. Then we computed the *g*-scale value of the left-out pattern using the optimized algorithm, applied the constant offset, and compared the result to the *p*-scale value of the left-out pattern. We repeated this, leaving out in turn each of the 20 patterns, and computed the RMS *g*-/*p*-scale differences as an overall accuracy for that *g*-scale.
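The structure of this cross-validation can be sketched in Python. The optimization of the algorithm is abstracted into a hypothetical callable `fit_and_predict(train, left_out)` that returns the *g*-scale values of the training patterns and of the left-out pattern, and the best constant offset is assumed to be the least-squares one, i.e. the mean *p*-minus-*g* difference over the training patterns.

```python
import math


def loo_rms(p_values, fit_and_predict):
    """Leave-one-out RMS difference between offset-aligned g-values and p-values."""
    n = len(p_values)
    squared_errors = []
    for left_out in range(n):
        train = [i for i in range(n) if i != left_out]
        g_train, g_left_out = fit_and_predict(train, left_out)
        # least-squares constant offset aligning the training g-values with p-values
        offset = sum(p_values[i] - g for i, g in zip(train, g_train)) / len(train)
        squared_errors.append((g_left_out + offset - p_values[left_out]) ** 2)
    return math.sqrt(sum(squared_errors) / n)


# Toy usage (hypothetical): a measure whose raw output already has interval structure.
raw = [0.0, 1.0, 2.0, 3.0]
toy_predict = lambda train, left_out: ([raw[i] for i in train], raw[left_out])
```

When the *p*-scale values differ from the raw outputs only by a constant, as for `p_values = [5.0, 6.0, 7.0, 8.0]` here, the cross-validated RMS difference is zero.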

The performance of the candidate *g*-scales is shown in table 2. The ‘number of parameters’ column gives *M* + *N*, where *M* is the number of internal parameters, and *N* the number of parameters of the monotonic transformation. In all cases, we report *N* = 2 since it gave consistently better performance than *N* = 1, while *N* = 3 gave marginal improvement for some methods and worse cross-validated performance for others. The ‘ranking performance’ columns show the fraction of psychophysical trials that agree with the ranking of the patterns according to each candidate *g*-scale. These rates cannot exceed the maximum agreement rates in table 1. The accuracy column gives the cross-validated RMS difference between the *g*- and *p*-scale values.
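Ranking performance, as defined here, is a simple count, which the following Python sketch makes explicit. The trial representation, a list of `(chosen, rejected)` stimulus indices, and the function name are assumptions made for illustration.

```python
def ranking_agreement(trials, g_values):
    """Fraction of trials in which the chosen pattern has the higher g-scale value."""
    agree = sum(1 for chosen, rejected in trials if g_values[chosen] > g_values[rejected])
    return agree / len(trials)


# Toy usage (hypothetical): three patterns and four trials, one of which disagrees.
toy_g = [0.0, 1.0, 2.0]
toy_trials = [(2, 0), (1, 0), (0, 1), (2, 1)]
```

Because subjects sometimes disagree with each other and with themselves, no ranking can agree with every trial, which is why the maximum agreement rate acts as a ceiling.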

In comparing the performance of different geometric measures, our ultimate aim was to discover one with excellent interval accuracy for order. Such a measure would necessarily have high ranking accuracy. We therefore first assessed measures by their ranking accuracy. Then, we compared the interval accuracy of those that performed well. Ranking accuracy can be assessed on both sets but our primary aim is for good accuracy across set A, while the role of set B is to allow the limits of measures to be assessed. Interval accuracy can be assessed on set A but not on set B.

Table 2 shows that previous methods perform roughly similarly, with the exception of autocorrelation, which performs better. Autocorrelation achieves a ranking accuracy within 2% of the ceiling on set A and within 5% on set B, and an interval accuracy of 0.85 jnds. All our novel methods show equal or better ranking accuracy than previous methods. Among the novel methods, the best is Delaunay triangles entropy (DTE), which achieves perfect ranking performance on set A and comes within 3% of the ceiling on set B, with an interval accuracy of approximately half a jnd. In the remainder of this paper, we will focus on the DTE *g*-scale, which we will refer to as *the g*-scale.

The cross-validated residuals between the *g*- and *p*-scales have a small mean (0.03 jnds); they pass Kolmogorov–Smirnov and Jarque–Bera normality tests (*p* > 0.05) and their correlation with the *p*-scale values is not significantly different from zero (*p* > 0.05). We conclude that, assuming that the patterns belong in the same class as set A, *g*-scale values estimate *p*-scale values with an unbiased, unstructured random error with a standard deviation of half a jnd. A performance analysis of the DTE measure for set B is included in the electronic supplementary material.

There is a wide range of physically inspired measures of order in the literature [4–9] that could directly suggest or inspire candidate measures for our study. Some are not straightforward to adapt to our case owing to the relatively small number of elements in the patterns in comparison to physical systems, or to finite size effects, etc. In this paper, a thorough analysis is limited to measures that appear frequently in biological studies (table 2). Some additional measures based on the Hopkins statistic, the radial distribution function and the bond order parameters Q4 and Q6 were, however, tested; their ranking performance was lower than that of the novel measures we present.

## 4. An absolute interval scale for order

Since the interval structure of the *g*-scale is preserved by linear rescaling, we can scale it so that certain significant patterns are anchored to memorable values. We will call the result an *absolute interval scale of order* (*a*-scale).

For the lower anchor, we considered patterns arising from two-dimensional spatial Poisson processes, which are random systems of points such that the number of points within any region is Poisson-distributed, and the counts for disjoint regions are independent. Finite point patterns that arise from such a Poisson process were found to vary modestly in their *g*-scale value. In particular, for patterns of 180 points (as in our stimuli) the standard deviation of the *g*-scale value was 0.29 jnds. We established the *a*-scale so that on average Poisson patterns of 180 points have a value of zero.
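A Poisson pattern with a fixed number of points in a circular window can be generated by sampling positions independently and uniformly over the disc, since a Poisson process conditioned on its point count is uniform in this sense. The following Python sketch shows one way to do this; the function name and defaults are hypothetical.

```python
import math
import random


def poisson_pattern(n_points=180, radius=1.0, seed=0):
    """n_points independent positions, uniformly distributed over a disc."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        # the square root makes the density uniform in area rather than in radius
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```

Averaging a measure over many such patterns (with different seeds) gives the kind of mean value that is used for the lower anchor.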

For the upper anchor, we consider Bravais lattice patterns which appear as regular arrays of parallelograms (with special cases of squares, diamonds, etc.). For any Bravais lattice, all the Delaunay triangles are congruent so the *g*-scale value attains its maximum possible. We established the *a*-scale so that Bravais lattice patterns have a value of 10.

Since the difference in *g*-scale value between Poisson patterns and Bravais lattices is 10.4 jnds, we estimate that 1 unit on the *a*-scale (*a*-unit) is equal to 1.04 ± 0.07 jnds. Thus, for practical purposes: one *a*-unit ≃ 1 jnd. Since the *g*-scale agreed with the *p*-scale to an RMS accuracy of 0.49 jnds, taking into account the re-scaling of the *a*-scale to the anchor points, the precision of the *a*-scale is 0.47 *a*-units (=0.49/1.04). In figure 8, the *a*-scale values of a diverse set of patterns are shown.
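The anchoring is a linear map, and the arithmetic quoted above can be reproduced in a few lines of Python. The function name and argument names are hypothetical; the numerical values are those given in the text.

```python
def to_a_scale(g_value, g_poisson_mean, g_lattice):
    """Linear rescaling that sends the mean Poisson g-value to 0 and lattices to 10."""
    return 10.0 * (g_value - g_poisson_mean) / (g_lattice - g_poisson_mean)


G_RANGE_JNDS = 10.4                       # Poisson-to-lattice difference on the g-scale
JNDS_PER_A_UNIT = G_RANGE_JNDS / 10.0     # 1.04 jnds per a-unit
PRECISION_A_UNITS = 0.49 / JNDS_PER_A_UNIT  # RMS accuracy of 0.49 jnds in a-units
```

Because the map is linear it preserves the interval structure of the *g*-scale: equal differences in *g*-scale value remain equal differences on the *a*-scale.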

## 5. A biological example

A major goal of the approach was to develop a scale of order that would be of widespread utility in the study of biological patterning during development and disease. To illustrate this, we have applied the *a*-scale to the study of the dorsal thorax of the developing fruit fly. The notum of *D. melanogaster* is an ideal system for this type of analysis since it is a simple and experimentally tractable model system that has been used for nearly 100 years to study developmental patterning, its genetics and its evolution. The pace of its development is such that observation of cellular dynamics and gene expression changes can be achieved in real time using live imaging techniques which do not interfere with the developmental process. In the fully developed fly, the notum is covered by a population of short mechanosensory bristle cells (microchaetes) arranged in a fairly regular array; something that is likely to contribute to organismal fitness. This pattern arises during the course of development, so that at early stages (12 h APF) the notum is found to be in a state of relative disorder, and lacks an established bristle pattern. A process of lateral inhibition, based on Delta–Notch signalling, is thought to drive the system towards a patterned final state, in which each Delta-expressing bristle precursor cell is surrounded by cells with active Notch signalling [18]. The activation of intracellular Notch signalling by the membrane-tethered Delta protein requires cell–cell contact which is accomplished by the action of dynamic, basal actin-based filopodial protrusions, which help to control the observed regular spacing of the bristle cells [18]. Moreover, it has been proposed [28] that the final well-ordered state is reached through a process of gradual pattern refinement which relies on the stochastic character of the filopodia contacts (*structured noise*). During the final stages of this refinement process, the fate of each cell, i.e. whether it will remain a normal epithelial cell, or will give rise to a bristle, is determined.

### 5.1. Methods

Imaging was by confocal laser scanning microscopy. Neuralized-Gal4, UAS-MoesinGFP was used as a marker for the Delta expression of bristle precursor cells, and ubiquitously expressed E-Cadherin-GFP was used to visualize apical cell–cell junctions. For live imaging, a window was cut in the pupal case; consequently, the earliest time that could be imaged was 12 h APF. From that time, images were acquired roughly every 30 min for 10 h and assembled into a movie. Each movie frame appeared as a mosaic of epithelial cells with bright walls. Roughly 95% of these cells express no fluorescent protein, causing them to appear dark, whereas the remaining 5% exhibit varying degrees of lightness due to differing levels of proneural gene expression. During the movie, the cellular mosaic changes continuously: new cells appear through division, cells that start to express markers of bristle fate become visible (becoming potential bristle precursor cells), and cells expressing the fluorescent marker can lose this expression. Although the distinction is not sharp, the fluorescently labelled cells can be segregated into dim and bright classes. Over the course of each movie, a number of cells transit between the dark, dim and bright states. Bright cells at the end of the movie are those that will become bristle cells; dark and dim cells will become ordinary epithelial cells. Therefore, the process of patterning involves the refinement of the pattern of dim and bright cells.

To generate point patterns for this analysis, we first localized cells expressing the fluorescent marker, and then classified them as dim or bright. Since brightness, contrast and blur vary within images and across the movie, and cells vary in size and shape, the localization and classification steps were performed manually rather than automatically. Localization was done with full frame viewing, while classification was done by viewing individual cells in a randomized order clipped out of the image along with their immediate contexts, so that the viewer was not influenced by the positioning of a cell within some wider pattern.
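The blinded classification step can be sketched as follows. This is a minimal illustration and not the tooling actually used: the `(frame, x, y)` detection format, the patch half-width and the choice to shuffle across all frames of a movie are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Detection = Tuple[int, int, int]  # (frame index, x, y) of a manually localized cell


def crop_box(x: int, y: int, half_width: int,
             image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a patch centred on (x, y).

    The box is clamped to the image so that cells near the border still yield
    a valid patch; right and bottom are exclusive.
    """
    left = max(0, x - half_width)
    top = max(0, y - half_width)
    right = min(image_width, x + half_width + 1)
    bottom = min(image_height, y + half_width + 1)
    return left, top, right, bottom


def presentation_order(detections: Sequence[Detection], seed: int) -> List[Detection]:
    """Shuffle detections so the viewer cannot infer a cell's place in the wider pattern."""
    order = list(detections)
    random.Random(seed).shuffle(order)  # seeded, so the session can be reproduced
    return order
```

Each patch would then be shown in isolation and labelled dim or bright, with the labels mapped back to cell positions only after classification is complete.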

The *a*-scale was used to quantify order for the biological system over time, with separate values computed for bright cells, bright plus dim and random sets of cells. Four movies of the wild-type fly were analysed.
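The random-subset baseline can be sketched as below. A random set is taken here to be a draw from the bright + dim cells with as many members as there are bright cells, averaged over repeated draws (10 per timepoint in the results). The `score` argument is a placeholder for an *a*-scale implementation, which is not reproduced in this sketch; the function name and the seeding are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def mean_random_subset_score(
    labelled_cells: Sequence[Point],
    n_bright: int,
    score: Callable[[List[Point]], float],
    n_draws: int = 10,
    seed: int = 0,
) -> float:
    """Average order score of random subsets equinumerous with the bright cells.

    labelled_cells: positions of all bright + dim cells at one timepoint.
    n_bright:       number of bright cells at that timepoint.
    score:          placeholder for the order measure (e.g. an a-scale function).
    """
    rng = random.Random(seed)
    pool = list(labelled_cells)
    scores = [score(rng.sample(pool, n_bright)) for _ in range(n_draws)]
    return mean(scores)
```

Comparing this baseline with the score of the actual bright cells indicates whether the bright cells are more ordered than would be expected from their number alone.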

### 5.2. Results

As shown in figure 9, the number of bright + dim cells (i.e. Neu:GFP expressing) increases during the movies; of these cells, the proportion which are bright increases from 50% at early stages of development (15 h APF) to 90% in the final stages (22 h APF). Figure 10 shows the timecourses of the *a*-scale values for the bright, bright + dim and random subsets for all four movies, along with linear fits. Random subsets were chosen from the set of bright + dim cells, equinumerous with the bright cells. The *a*-scale values plotted are the average for 10 random subsets at each timepoint. The quality of the linear fits was assessed using a *χ*²-test taking into account the 0.47 units precision of *a*-scale values, and the number of parameters of the fit. Eleven of the 12 fits pass the *χ*²-test at a 95% significance level; the 12th (bright cells of movie 4, figure 10, top-right) passes after the removal of one of its 17 points. In figure 10, all movies show similar timecourses of order for each class of cell.
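A goodness-of-fit check of this kind can be sketched as follows, under stated assumptions: the fit is an ordinary least-squares line, every point carries the same 0.47-unit uncertainty, and the quoted 95% level is read as the quantile of the *χ*² distribution below which a fit is accepted. The critical value uses the Wilson–Hilferty approximation so that the sketch needs only the standard library; this is a swapped-in convenience and not necessarily the procedure behind the reported results.

```python
from statistics import NormalDist, mean
from typing import Sequence, Tuple

SIGMA_A = 0.47  # precision of a-scale values, from the text


def linear_fit(t: Sequence[float], a: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line a = slope * t + intercept."""
    t_mean, a_mean = mean(t), mean(a)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (ai - a_mean) for ti, ai in zip(t, a))
    slope = sxy / sxx
    return slope, a_mean - slope * t_mean


def chi2_critical(dof: int, level: float = 0.95) -> float:
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(level)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def fit_passes(t: Sequence[float], a: Sequence[float],
               sigma: float = SIGMA_A, level: float = 0.95) -> Tuple[bool, float, int]:
    """Return (passes, chi2, dof) for a straight-line fit to one timecourse."""
    slope, intercept = linear_fit(t, a)
    chi2 = sum(((ai - (slope * ti + intercept)) / sigma) ** 2 for ti, ai in zip(t, a))
    dof = len(t) - 2  # two fitted parameters: slope and intercept
    return chi2 <= chi2_critical(dof, level), chi2, dof
```

For a 17-point timecourse there are 15 degrees of freedom, and the approximate critical value is close to 25, so a fit passes when its summed squared residuals, in units of 0.47², stay below that value.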

### 5.3. Discussion

To get a clearer view of typical behaviour, we combined the data from all four movies and computed new linear fits which are shown in figure 11. This should be viewed in conjunction with figure 12 which shows point patterns for one of the movies at indicated times.

Figure 11 shows that the population of bright + dim cells starts at an *a*-scale level of 3.0 at 15 h APF, which is a modest level of order but distinctly different from perfect disorder, and rises to 3.5 over the following 8 h. While the increase in order of only 0.5 *a*-units is small, figure 9 shows that this is achieved while increasing the number of these cells by 43%. Adding points randomly to a pattern would decrease its order, so it seems that there must be processes that cause the new cells to appear in non-random positions and/or processes that cause the bright + dim population to re-arrange to maintain their order. Observation of the movies reveals little re-arrangement so the non-random emergence of new bright + dim cells is the prevalent mechanism for maintaining order in this population even while it increases in number (figure 12, middle row).
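The claim that random additions degrade order can be illustrated with a toy simulation. The disorder measure used here, the coefficient of variation of nearest-neighbour distances, is a simple stand-in and is not the DTE method or the *a*-scale; the function names, the square lattice and the seed are likewise illustrative assumptions.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nn_distance_cv(points: Sequence[Point]) -> float:
    """Coefficient of variation of nearest-neighbour distances.

    Zero for a perfect lattice; larger values indicate less regular spacing.
    """
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return pstdev(nearest) / mean(nearest)


def add_random_points(points: Sequence[Point], fraction: float, seed: int = 0) -> List[Point]:
    """Grow a pattern by the given fraction, placing new points uniformly at random
    inside the bounding box of the existing points."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n_new = round(fraction * len(points))
    new = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
           for _ in range(n_new)]
    return list(points) + new


if __name__ == "__main__":
    lattice = [(float(i), float(j)) for i in range(10) for j in range(10)]
    grown = add_random_points(lattice, 0.43)  # 43% more points, as in figure 9
    print(nn_distance_cv(lattice), nn_distance_cv(grown))
```

On a square lattice the measure is exactly zero, and it becomes positive as soon as randomly placed points are added, which is the direction of change that the observed bright + dim population avoids.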

Turning to the bright cells, figure 11 shows that they start out at a lower level of order than the bright + dim population of which they are a subpart, but their order increases so that by 18 h APF it has reached the same level as the bright + dim cells, and by 23 h it has exceeded it by a full two *a*-units. This behaviour can be understood by comparing the timecourse to that of the random cells. This comparison shows firstly that the initial low level of order of the bright cells is compatible with them being a random subset of the combined bright + dim population (figure 12, top-left), and secondly that the increase in the fraction of bright + dim that are bright would account for only a modest increase in the order of the bright cells, much less than that observed (compare top-right and bottom-right of figure 12).

In summary, the timecourses of order are consistent with two mechanisms operating in parallel. One mechanism adds cells to the bright + dim population while maintaining its order; the other mechanism changes cells from dim to bright and vice versa so that the bright sub-population increases in proportion and level of order.

Thus, using the *a*-scale of order, we have been able to decipher a two-step process by which the pattern of bristles is refined during the generation of biological order.

## 6. Summary

Using pairwise comparisons of the order of point patterns, we have established that there is a strong consensus in the perception of order. These perceptions have the structure of an interval scale, and that scale extends over roughly 10 jnds between disorder and order.

We compared existing and novel geometric methods for quantifying order. The mismatch between the best of existing methods and perception was 0.85 jnds, whereas the novel DTE method achieved a mismatch of less than 0.5 jnds; this method works by quantifying the variability of the spaces between points as captured by the size and shape of the Delaunay triangulation.

We proposed an absolute interval scale (*a*-scale) for order based on the DTE method, but scaled so that Poisson disorder has an average value of zero, and perfect Bravais lattice order has a value of 10. Each *a*-unit of this scale is approximately 1 jnd for human perception. The RMS precision of *a*-scale values, as determined by the DTE method, is 0.47 *a*-units.

We demonstrated the use of the *a*-scale on pattern formation data from the fly. This allowed an objective quantification of the timecourses of order in different sub-parts of the pattern. The qualitative structure of these timecourses is consistent with existing hypotheses about distinct pattern refinement processes operating in parallel.

## Funding statement

CoMPLEX is an EPSRC-funded Centre for Doctoral Training. E.D.P. is supported by the Greek State Scholarship Foundation (IKY). G.L.H. is supported by a BBSRC grant, and B.B. by a Cancer Research UK Senior Research Fellowship.

- Received April 2, 2014.
- Accepted July 8, 2014.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.