Everyone's walking style is unique, and it has been shown that both humans and computers are very good at recognizing known gait patterns. It is therefore unsurprising that dynamic foot pressure patterns, which indirectly reflect the accelerations of all body parts, are also unique, and that previous studies have achieved moderate-to-high classification rates (CRs) using foot pressure variables. However, these studies are limited by small sample sizes (n < 30), moderate CRs (CR ≃ 90%), or both. Here we show, using relatively simple image processing and feature extraction, that dynamic foot pressures can be used to identify n = 104 subjects with a CR of 99.6 per cent. Our key innovation was improved and automated spatial alignment which, by itself, improved CR to over 98 per cent, a finding that pointedly emphasizes inter-subject pressure pattern uniqueness. We also found that automated dimensionality reduction invariably improved CRs. As dynamic pressure data are immediately usable, with little or no pre-processing required, and as they may be collected discreetly during uninterrupted gait using in-floor systems, foot pressure-based identification appears to have wide potential for both the security and health industries.

1. Introduction

When walking, our feet interact with the ground in a stereotypical fashion: heel strike, roll to the forefoot, then push-off with the distal forefoot and toes [1] (figure 1a). This process takes about 0.7 s when walking at normal speeds of about 1.2 m s⁻¹. Is it possible that, within these stereotypical constraints, all individuals interact with the ground uniquely?

Figure 1.

Description of plantar pressure data for a single step. (a) Pressure image time series; percentages indicate normalized time (% stance). (b) Pixel time series; dark grey, black and light grey trajectories indicate pixels whose maxima were reached in the first, second and final thirds of stance phase, respectively. (c) Pre-features for an example pixel time series (see table 3 for variable descriptions). (d) Pre-features, when computed across all pixels.

Based on the gait recognition literature, this seems plausible: individuals move their bodies and limbs in highly unique and highly repeatable patterns [2], and camera-based computer systems can be trained to recognize these patterns [3], even in adverse conditions such as poor lighting and brief exposure [4]. We would therefore expect these highly unique movement patterns to be reflected, to a certain extent, in our mechanical interaction with the ground, and that computers could be similarly trained to recognize gait patterns from floor-based sensors. Indeed, floor-based gait recognition has already been highly successful. Recent examples include use of ground reaction force (GRF) trajectories, wavelet decomposition and fuzzy set-based feature extraction to recognize individuals with classification rates (CRs) of 97 per cent [5] and 99 per cent [6].

While both camera-based and GRF-based gait recognition have been widely successful, both also have practical limitations. Camera systems must overcome environmental noise, perspective and other three-dimensional calibration problems; state-of-the-art systems do so impressively, but with only moderate accuracy (74%) [7]. Force-plate systems must be quite large (at least 0.5 m long for full foot contact during non-targeted gait), yet only one foot may contact the plate at a time. Force plates therefore cannot be positioned arbitrarily, and they cannot be used in multi-subject environments.

An alternative is plantar pressure imaging (PPI) [8]. PPI systems typically consist of an array of hundreds or thousands of pressure-sensitive sensors that are capable of characterizing plantar pressure distributions at spatial and temporal resolutions of the order of 5 mm and 100 Hz, respectively. There are a variety of PPI technologies [8], but in their final form most systems are thin, flat, relatively rigid boards that can be embedded in the floor to be flush with the walking surface. PPI systems do not suffer from environmental noise because the foot can be very easily isolated from the environment using low-pressure thresholding. Even though an individual may walk over a PPI plate at arbitrary angles, PPI systems also do not suffer from perspective problems because foot images may be spatially aligned using automated registration techniques [9,10]. Finally, high spatial and temporal resolutions mean that PPI systems can be used in multi-subject environments as all footsteps are, by nature, spatio-temporally isolated.
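The low-pressure thresholding and spatio-temporal isolation of footsteps described above can be illustrated with a minimal sketch. It assumes that the pressure data arrive as a NumPy array `I` indexed (x, y, time) in kPa, and it uses the 5 kPa threshold given for table 3. The function `isolate_footsteps` is hypothetical. It performs a plain breadth-first flood fill with 6-connectivity in x, y and time, rather than reproducing any vendor's software; a library routine such as `scipy.ndimage.label` would do the same job. In practice the toes can form regions separate from the rest of the foot, so a real system would also need to merge nearby regions, which is not shown here.

```python
import numpy as np
from collections import deque

# 6-connected neighbourhood in (x, y, time)
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def isolate_footsteps(I, eps=5.0):
    """Label spatio-temporally connected regions of I(x, y, t) with pressure > eps.

    Returns an integer label array (0 = background) and the number of regions.
    """
    mask = I > eps  # low-pressure thresholding separates the foot from the floor
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labelled region
        n += 1
        labels[seed] = n
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, t = queue.popleft()
            for dx, dy, dt in OFFSETS:
                nb = (x + dx, y + dy, t + dt)
                inside = all(0 <= v < s for v, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n
```

Two feet that load the same sensors at different times receive different labels, because they are not connected along the time axis.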

PPIs are qualitatively highly distinctive across subjects (figure 2), and PPI-based biometric identification has consequently had varying degrees of success (electronic supplementary material, appendix table A). Most of these studies report moderate accuracy (80–85%), and we are aware of only four that report accuracies greater than 90 per cent for sample sizes of at least 10 subjects: 98.6 [11], 96.0 [12], 93.1 [13] and 92.3 per cent [14]. However, the maximum number of subjects tested in these studies was 11, and in a variety of pilot tests we were unable to reproduce the best of these results [11], perhaps partly because we were unable to resolve certain ambiguities in the authors' algorithm descriptions. Only one study examined more than 11 subjects [15] (n = 30), but its accuracy (86.1%) was notably lower than that of a previous study by the same group with fewer subjects (93.1%, n = 10) [13]. To date, high accuracies on samples notably larger than n = 10 have only been achieved using complementary information such as high-resolution skin prints (99%; n = 32) [16] or three-dimensional foot sole shape (98.7%; n = 30) [17]. This information cannot readily be obtained during uninterrupted gait because of lengthy scanning durations.

Figure 2.

Maximal pressures (P100) for the first 12 subjects presently tested; averaged across five trials.

Of the purely PPI studies, it is notable that many have employed spatial normalization procedures; as the foot may adopt an arbitrary posture with respect to the PPI device, it seems logical to compensate for arbitrary postures using spatial normalization. However, most of these studies employed decorrelation (electronic supplementary material, appendix table A), or equivalently principal axis alignment [18], an approach that has been shown to yield much poorer alignment than optimization-based alignment procedures [10]. It is therefore conceivable that improved spatial alignment would yield improved biometric identification. It is also notable that previous PPI studies extracted a variety of pre-selected features from the raw data (figure 1), but none, to our knowledge, has systematically evaluated the relative effectiveness of different features. The purposes of this study were thus: (i) to explore the feasibility of PPI-based gait recognition on a larger sample of subjects (n > 100), (ii) to systematically compare a variety of spatial alignment procedures, and (iii) to systematically compare a variety of features and feature extraction procedures.

2. Methods

2.1. Data

Plantar pressure data were collected from 104 healthy individuals at the University of Münster (table 1). These data were previously used to compute a healthy ‘average’ pressure distribution [19]. Data were recorded for 1.0 s at 50 Hz using an EMED ST4 system (resolution: 5 mm; Novel GmbH, Munich, Germany). Each subject performed a total of 10 trials of self-paced walking, five for each foot, yielding a total of 1040 three-dimensional (x, y, time) images (figure 1a). ‘Follow-up’ data from 10 of these subjects were collected in separate sessions (table 1), between 1.5 and 5.0 years before, or 1.5 years after, the main data collection sessions. Prior to participation, all subjects provided informed consent according to the policies of the University of Münster.

Table 1.

Subject characteristics. Averages, with s.d. in parentheses. ‘Follow-up’ data included five females and five males; main dataset, Spring 2009; two follow-up subjects, 1.5 years later; eight follow-up subjects, 1.5–5.0 years before.

The left- and right-foot images were examined separately after finding that single-foot analyses yielded sufficiently high performance. This is justifiable, we believe, because (i) the literature shows that lower limb dominance is poorly defined [20], (ii) naturally occurring gait asymmetries tend to load left and right feet differently [21], and (iii) in post hoc analyses, we found no systematic left–right asymmetries among subjects. We may thus justifiably regard the left- and right-foot datasets as essentially independent, at least for the purposes of validating our methods on the population from which the present subjects were drawn.

2.2. Image alignment

Images were spatially padded by adding at least 1 cm of zero pressure rows/columns to the foot periphery. They were then temporally aligned so that the first (x, y) time slice corresponded to initial heel strike. Following padding, all images were contained in a 65 × 29 × 50 voxel grid (x, y, time) (94 250 voxels) of which an average of 8291 voxels (8.8%) were non-zero for any given trial; across all subjects and trials, 33 143 (35.2%) were non-zero. The raw data were quite smooth (figure 1a,b) so images were neither spatially nor temporally filtered.
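The padding and temporal alignment steps can be sketched in NumPy as follows. The sketch assumes that each trial is an (x, y, t) array in kPa and that the first frame containing any non-zero pressure corresponds to initial heel strike. The 2-pixel pad (1 cm at 5 mm resolution), the fixed length of 50 frames and the function name are assumptions, chosen to match the numbers above. Placing every trial into the common 65 × 29 × 50 grid (65 × 29 × 50 = 94 250 voxels) is not shown.

```python
import numpy as np


def pad_and_align(trial, pad=2, n_frames=50):
    """Zero-pad a (x, y, t) trial spatially and shift it so frame 0 is initial contact.

    pad=2 pixels corresponds to 1 cm at 5 mm resolution (assumed mapping).
    """
    padded = np.pad(trial, ((pad, pad), (pad, pad), (0, 0)))  # zero rows/columns around the foot
    # frames in which any pixel is loaded; the first one is taken as heel strike
    loaded = np.nonzero((padded > 0).any(axis=(0, 1)))[0]
    start = loaded[0] if loaded.size else 0
    out = np.zeros(padded.shape[:2] + (n_frames,))
    kept = padded[:, :, start:start + n_frames]
    out[:, :, :kept.shape[2]] = kept  # trailing frames stay zero if the trial is short
    return out
```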

Subsequently, three categories of spatial alignment procedures were tested (table 2). The first: ‘None’ performed no alignment, passing raw images directly to feature extraction (below). The second: ‘Decorrelation’ performed a principal axis transformation to centre the pressure-weighted foot centroid and to vertically align the foot's minor principal axis. The third: ‘Registration’ [22] used a rapid frequency-based alignment procedure [9,23] to automatically align a given image to a foot template. The goal of the algorithm was to maximize cross-correlation, first in the frequency domain to optimize horizontal and vertical foot translations, and then in the log-polar domain to optimize foot rotation. Example data (figure 3) reveal that both Decorrelation and Registration tended to improve alignment, although Registration performed qualitatively better, agreeing with previous results [10].
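Decorrelation can be sketched as follows. The sketch assumes a 2D pressure image (e.g. P100) stored as a NumPy array whose axis 0 is the long (x, 65-pixel) dimension of the grid. It moves the pressure-weighted centroid to the image centre and rotates the image so that the major principal axis lies along axis 0, which leaves the minor axis along axis 1. Treating axis 1 as ‘vertical’ is our assumption about the axis convention. Resampling uses bilinear interpolation, as stated for all image transformations in this study, and the function names are hypothetical. The sign of an eigenvector is arbitrary, so pure principal axis alignment can flip a foot by 180°.

```python
import numpy as np


def bilinear_sample(img, r, c):
    """Sample img at fractional (row, col) coordinates; zero outside the image."""
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    fr, fc = r - r0, c - c0
    out = np.zeros(r.shape)
    corners = [(r0, c0, (1 - fr) * (1 - fc)), (r0 + 1, c0, fr * (1 - fc)),
               (r0, c0 + 1, (1 - fr) * fc), (r0 + 1, c0 + 1, fr * fc)]
    for ri, ci, w in corners:
        ok = (ri >= 0) & (ri < img.shape[0]) & (ci >= 0) & (ci < img.shape[1])
        out[ok] += w[ok] * img[ri[ok], ci[ok]]
    return out


def decorrelate(img):
    """Principal axis alignment of a 2D pressure image.

    Centres the pressure-weighted centroid and rotates the major principal
    axis onto axis 0, so the minor axis lies along axis 1.
    """
    R, C = np.indices(img.shape, dtype=float)
    w = img / img.sum()                      # pressure weights
    cr, cc = (w * R).sum(), (w * C).sum()    # weighted centroid
    dr, dc = R - cr, C - cc
    cov = np.array([[(w * dr * dr).sum(), (w * dr * dc).sum()],
                    [(w * dr * dc).sum(), (w * dc * dc).sum()]])
    _, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    ca, sa = vecs[:, 1]                      # unit major axis (sign is arbitrary)
    oc = (np.array(img.shape) - 1) / 2.0     # output image centre
    u, v = R - oc[0], C - oc[1]
    # inverse mapping: moving along output axis 0 moves along the input major axis
    return bilinear_sample(img, cr + ca * u - sa * v, cc + sa * u + ca * v)
```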

Table 2.

Spatial alignment methods.

Figure 3.

Spatial alignment example, first subject. Rows (a,b,c) depict the original, decorrelated and registered images, respectively; here the registration template was RegMunCont (table 2). The thick dark outline depicts the cross-trial mean.

For Registration, seven template images were tested including: (i) the morphologically average contralateral foot from the Münster data sample (RegMunCont) [19], (ii) an average foot from a separate study, a separate laboratory and collected with a different manufacturer's equipment [10], and (iii–vii) average feet of the chronologically first five subjects from the cited study. Bilinear interpolation was used for all image transformations.
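The translational part of frequency-domain registration can be illustrated with circular cross-correlation computed through the FFT. The peak of the inverse transform of F·conj(G) gives the shift that best maps the template onto the image. This is a minimal sketch under stated assumptions, not the algorithm of [9,23]. It estimates integer shifts only and omits the log-polar rotation step, so it needs no bilinear interpolation. The zero padding described above keeps the circular (wrap-around) shift harmless. Function names are hypothetical.

```python
import numpy as np


def estimate_translation(img, template):
    """Integer (row, col) shift that maximizes circular cross-correlation.

    Returns d such that img is approximately the template shifted by d.
    """
    xc = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(template))).real
    peak = np.array(np.unravel_index(np.argmax(xc), xc.shape))
    size = np.array(xc.shape)
    wrap = peak > size // 2
    peak[wrap] -= size[wrap]  # map large circular lags to negative shifts
    return int(peak[0]), int(peak[1])


def register_translation(img, template):
    """Shift img onto the template (translation only; rotation step omitted)."""
    d = estimate_translation(img, template)
    return np.roll(img, (-d[0], -d[1]), axis=(0, 1)), d
```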

2.3. Pre-features

Desktop computing memory was inadequate to submit the three-dimensional images directly to classification routines, and classifiers generally perform more poorly with increasing dimensionality [24]. The images were therefore first reduced to 10 different two-dimensional spatial (x, y) ‘pre-features’ by extracting specific characteristics of each pixel time series (figure 1c,d and table 3). We refer to these as pre-features to distinguish them from the final features upon which classification was based; the final features were extracted automatically from the pre-features using various dimensionality reduction techniques (§2.4).

Table 3.

Pre-feature descriptions. Here, I(x,y,t) is the image time series, p denotes percentile, k indexes the ordered observations of a particular pixel's time series, pk is the percentile of the kth ranked observation, n is the number of observations and ε is a pressure threshold (manufacturer-set to ε = 5 kPa in the current dataset). In the percentile equation k is not a time index, but rather indexes sorted observations and k may be different for each pixel's time series.

Specific pre-features included those commonly used in the plantar pressure literature, foremost ‘peak pressure’ (equivalently ‘maximum pressure’ or ‘100th percentile pressure’; P100). This two-dimensional variable represents the maximum pressure experienced by each part of the foot over the course of stance. It is by far the most common variable in the plantar pressure literature and is often used to check for plantar tissue overloading [8]. Other common variables analysed included the pressure–time integral (PTI), contact duration (CD) and time-to-maximum (Tmax) [8]. The PTI represents the total loading during stance; areas of the foot with brief, high-pressure impulses may have PTI values similar to those of areas with long, low-pressure impulses. Since the precise variable(s) regulating plantar tissue breakdown are unknown, PTI, which quantifies loading in a different way, has also been commonly analysed in the literature. CD is a PTI-like variable that considers only loading duration, not magnitude, and Tmax represents yet another loading feature: loading rate (with respect to initial heel contact). The point is that PPI data are complex, and that no single two-dimensional variable can characterize the three-dimensional loading profile.

In addition to these common variables, we also tested one that is less commonly used: time-to-first contact (Tfirst) [25] and others that, to our knowledge, have not been previously reported, i.e. the 90th, 80th, 70th, 60th and 50th percentiles (P90, P80, P70, P60 and P50). The Tfirst variable reflects the speed with which one transitions to different parts of the foot and thus, like Tmax, represents a single specific temporal feature of the dynamic loading pattern. This is less common than the aforementioned variables, most likely because load magnitude is quite low at first contact. The percentile variables, we believed, were also worth testing, partly because P100 is a maximum function, and therefore may be more susceptible to sensor noise than other percentiles, and partly to check if there was a systematic effect on the ultimate results as one considers relatively higher pressures. All aforementioned pre-features were tested either individually or in pairs, by vectorizing then stacking two-dimensional images. Since the full image time series were too large for practical testing, two-dimensional feature-pairing permitted inclusion of additional dynamic characteristics.
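The pre-features can be sketched as reductions along the time axis of the (x, y, t) array. The equations of table 3 are not reproduced here, so the definitions below are assumptions:

- PTI is a rectangle-rule sum.
- Tmax and Tfirst are frame indices after heel strike multiplied by the 50 Hz frame interval.
- Tfirst is set to zero for pixels that are never loaded.
- The percentiles use NumPy's default linear interpolation, which may differ from the percentile formula in table 3.

Following the text, percentiles are taken over all recorded frames. The function names are hypothetical. Pairing two pre-features by vectorizing and stacking them doubles the dimensionality from 65 × 29 = 1885 to 3770.

```python
import numpy as np


def pre_features(I, dt=1 / 50, eps=5.0):
    """Two-dimensional pre-features from an (x, y, t) pressure array in kPa.

    dt is the frame interval in s (50 Hz); eps is the pressure threshold in kPa.
    """
    loaded = I > eps
    ever = loaded.any(axis=2)
    feats = {
        "P100": I.max(axis=2),                                    # peak pressure
        "PTI": I.sum(axis=2) * dt,                                # rectangle-rule integral, kPa s
        "CD": loaded.sum(axis=2) * dt,                            # contact duration, s
        "Tmax": I.argmax(axis=2) * dt,                            # time to peak after heel strike, s
        "Tfirst": np.where(ever, loaded.argmax(axis=2), 0) * dt,  # time of first loading, s
    }
    for p in (90, 80, 70, 60, 50):
        # percentiles over all recorded frames, zeros included
        feats[f"P{p}"] = np.percentile(I, p, axis=2)
    return feats


def paired_vector(feats, a, b):
    """Vectorize two pre-feature images and stack them into one feature vector."""
    return np.concatenate([feats[a].ravel(), feats[b].ravel()])
```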

2.4. Dimensionality reduction

The second feature extraction phase used automated dimensionality reduction to further reduce the pre-features to a dimensionality most effective for classification. Reduction algorithms included (table 4): Laplacian eigenmaps (LEs) [26], normalized spectral clustering with a symmetric Laplacian [26], kernel principal component analysis (KPCA) [27] and locally linear embedding (LLE) [28]. Following semi-factorial analysis (§3.1), we found that reduction to a dimensionality of 70 (from 1885 dimensions for single pre-features and 3770 dimensions for paired pre-features) worked well for these data. Other reduction parameters were manually tuned for the right foot using 104-fold cross-validation (CV; §2.5), and final performance was verified on the left foot dataset and also with a separate (leave-one-out) validation scheme. As a baseline comparison, we also used no dimensionality reduction, submitting pre-features directly to classification.
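LLE, the reduction method in the best-performing configuration, can be written compactly in NumPy. The sketch follows the standard recipe. It reconstructs each sample from its k nearest neighbours using weights that sum to one, then takes the bottom eigenvectors of (I − W)^T (I − W), discarding the constant one. The neighbour count, the regularization constant and the function name `lle` are illustrative assumptions; the text gives only the target dimensionality of 70. The sketch embeds a fixed set of vectors jointly, and how held-out images were mapped into the embedding during cross-validation is not described here. Library implementations such as scikit-learn's `sklearn.manifold.LocallyLinearEmbedding` provide the same algorithm.

```python
import numpy as np


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Locally linear embedding of the rows of X (samples x dimensions)."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                       # neighbours relative to sample i
        G = Z @ Z.T                                 # local Gram matrix
        G += np.eye(n_neighbors) * reg * max(np.trace(G), 1e-12)  # regularize
        w = np.linalg.solve(G, ones)
        W[i, nbrs[i]] = w / w.sum()                 # reconstruction weights sum to one
    IW = np.eye(n) - W
    _, vecs = np.linalg.eigh(IW.T @ IW)             # eigenvalues ascending
    return vecs[:, 1:n_components + 1]              # drop the constant eigenvector
```

For the setting described above, the call would be `lle(X, n_components=70)` with one vectorized pre-feature per row of `X`.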

Table 4.

Dimensionality reduction methods.

2.5. Classification

Classification of the final features was performed using nearest neighbour (1NN) classification, the simplest possible classification scheme: each test image is assigned the identity of the training image most similar to it (i.e. at minimum Euclidean distance) in reduced feature space. Although simple, 1NN was selected to emphasize the power of automated dimensionality reduction for biometric-relevant feature extraction. Classifier performance was validated using 104-fold CV and, separately, using leave-one-out CV to ensure that 104-fold CV was not biased. We also employed a stratified 5-fold CV, wherein the first image of each subject was retained for testing while the remaining four were used for training; the procedure was then repeated for the second images, third images, etc. This scheme (with a testing–training ratio of 25%) was adopted to ensure that the low testing–training ratio of 0.97 per cent in 104-fold CV was not a biasing factor.
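Nearest-neighbour classification and the stratified 5-fold scheme described above are short enough to sketch directly. The following NumPy version assumes that `X` holds one reduced feature vector per row, with parallel arrays giving each row's subject identity and trial number (0–4). All names are hypothetical. With 104 subjects × 5 images per foot, each fold tests 104 images against 416 training images, which gives the 25 per cent testing–training ratio quoted above.

```python
import numpy as np


def nn1_predict(train_X, train_y, test_X):
    """Assign each test row the label of its nearest training row (Euclidean)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[np.argmin(d2, axis=1)]


def stratified_5fold_cr(X, subject, trial):
    """Classification rate when trial k of every subject is held out in fold k."""
    correct = 0
    for k in np.unique(trial):
        test = trial == k
        pred = nn1_predict(X[~test], subject[~test], X[test])
        correct += int((pred == subject[test]).sum())
    return correct / len(subject)
```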

2.6. Algorithm evaluation

A full-factorial evaluation of all aforementioned factors (alignment, pre-features, dimensionality reduction techniques, classification algorithms) would have required a prohibitively large number of tests, so we narrowed our focus by conducting semi-factorial evaluations in an ad hoc manner. For example, if P100 was found to perform generally better than other pre-features, then we used P100 to explore different alignment procedures, and the best resulting alignment procedures were used to re-test all pre-features. While incomplete, this approach nonetheless yielded highly accurate classification.

Statistical hypothesis testing was conducted on a variety of classification-relevant metrics in an ad hoc manner as context demanded. For example, a paired-sample t-test was used to test whether the CR difference between the None and Decorrelation alignment methods differed from zero. The motivation for this particular analysis was to examine whether Decorrelation, the predominant alignment procedure in the literature (electronic supplementary material, appendix table A), is a better alignment choice than None. All aforementioned data processing was conducted in Matlab v. 7.10 (The MathWorks, Natick, MA, USA), and all figures were created using Matplotlib v. 0.99 as released with the Enthought Python Distribution v. 5.0 (Enthought Inc., Austin, TX, USA).
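The paired-sample t-test mentioned above compares two conditions measured on the same units. A minimal pure-Python sketch of the test statistic follows. The function name `paired_t` is hypothetical, and the inputs are assumed to be matched lists, such as CRs obtained with None and with Decorrelation for the same pre-feature and foot. The two-sided p-value would then be read from Student's t distribution with n − 1 degrees of freedom; for example, `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import math


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))  # sample s.d. of differences
    return mean / (sd / math.sqrt(n)), n - 1
```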

3. Results

3.1. Basic results

With no image processing at all (except for image padding), 1NN classification identified individuals with an accuracy of 90.8 per cent using the P100 pre-feature (figure 4). Decorrelation surprisingly yielded a slightly lower average CR of 90.2 per cent, while Registration markedly increased the average CR to 98.9 per cent. Dimensionality reduction also tended to improve CRs (figure 4), although to a lesser extent than registration.

Figure 4.

Classification rate (CR) for all 104 subjects using the P100 pre-feature. See tables 2 and 4 for descriptions of the alignment and dimensionality reduction methods, respectively. Dark grey, left feet; light grey, right feet.

Across both feet, the best-performing embedding dimension was 70 (figure 5). Using this dimensionality, and following a systematic, semi-factorial study of the different alignment algorithms, pre-features and dimensionality reduction schemes (tables 5 and 6), the highest CR we were able to achieve in a single foot was 99.8 per cent (519/520 correctly classified images). This was achieved on the right foot using RegMunCont alignment, the combined P100 and P80 pre-features, and LLE dimensionality reduction. For this set of parameters, the left foot CR was 99.4 per cent (517/520). Our semi-factorial analyses and manual parameter tuning were found to be unbiased: leave-one-out CV (table 6b) and validation on the left foot (table 6a) yielded practically identical results. Additionally, the low testing–training ratio of 0.97 per cent in our validation scheme was not a biasing factor, as a 5-fold CV scheme (with a testing–training ratio of 25%) yielded CRs of 99.4 per cent in both the left and right feet.

Table 5.

Semi-factorial analysis: alignment and single pre-features, left foot. Data are CRs (%). Data reduction, LLE. The alignment methods and features yielding CR > 90% are in bold.

Table 6.

Semi-factorial analysis: pre-processing and dimensionality reduction methods. Data are CRs (%). Combined pre-features: P100 and P80. The best-performing methods are in bold.

Figure 5.

CR as a function of embedding dimension (alignment, RegMunCont; pre-feature, P100; reduction, LLE). Black line, left foot; grey line, right foot.

3.2. Follow-up dataset

Using the aforementioned ‘best’ parameters, CRs for the left and right feet were 98 per cent (49/50) and 90 per cent (45/50), respectively, for the 10-subject follow-up dataset (figure 6a,b). We note, however, that one of the follow-up subjects had significantly higher right-foot heel and metatarsal pressures in the 2007 ‘follow-up’ trials than in the 2009 ‘original’ trials (figure 6c; p = 0.005, two-sample t-test on extracted regional data [8]), and this led to four out of five misclassifications for this subject's right foot. Upon questioning, this subject could not recall any orthopaedic condition that could explain the 2007–2009 metatarsal pressure difference. We also note that all five of this subject's left foot follow-up images were correctly identified. If we exclude this subject's right foot data from follow-up analyses, the CR across the nine remaining subjects would be 97.8 per cent (44/45).

Figure 6.

‘Follow-up’ test results. (a,b) Number of correctly classified images (out of five) for the left and right feet (light and dark bars, respectively) and for all 10 follow-up subjects (s01 … s10). Numbers in the white boxes indicate the number of years between collection of the follow-up and main datasets. (c) Left foot P100 images for subject 6, mean across five trials. The ‘Target’ image is from the main dataset. (Online version in colour.)

Once the classifier was trained on the 520 images from the original dataset, each follow-up image was read from disc in 2.8 ms and classified in 12.5 ms. These times were measured on a desktop computer (2.93 GHz dual-core processor, 4 GB memory) and averaged across the 100 follow-up images. Data transfer between pressure measurement systems and PCs takes longer than reading from disc (approx. 64 ms, pilot results), but a single footstep could still likely be identified within 100 ms of toe-off in a real-time implementation.

3.3. Decorrelation

Decorrelation decreased the average CR by 3.4 and 3.6 per cent for no-reduction and LLE-reduced data, respectively, across all pre-features (table 5) and both feet. After correcting for (two) multiple comparisons with a Bonferroni threshold of p = 0.025 (family-wise type I error rate: α = 0.05), paired t-tests verified the significance of this decorrelation-induced CR drop (p < 0.001 and p = 0.004 for None and LLE, respectively). This finding was supported partially by root-mean-squared error (r.m.s.e.) results for the no pre-processing, decorrelation and registration (RegMunCont) conditions of 22.4 ± 7.5, 18.0 ± 7.2 and 12.2 ± 5.7 kPa, respectively (mean ± s.d., computed with respect to the intra-subject mean foot). It was further supported by ANOVA on no alignment versus decorrelation m.s.e.; a significant SUBJECT effect was found (p < 0.001), but no significant DECORRELATION effect was found for either the entire time series (p = 0.934) or for the P100 pre-feature (p = 0.339). A marginal FOOT effect was found for the time series data (p = 0.070) but not for the P100 pre-feature (p = 0.338); since our best-performing classifier used only two-dimensional pre-features (including P100) we may conclude that decorrelation's failure to reduce intra-subject m.s.e. was similar in both feet. In agreement with the present CR results (figure 4), the present ANOVA results imply that decorrelation was not effective at reducing intra-subject variability. Therefore, choosing decorrelation over no alignment may not be statistically justified, in general, unless initial foot posture is highly variable. Indeed, over all tested parameter combinations, registration invariably out-performed decorrelation.
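The r.m.s.e. values and the Bonferroni threshold quoted above can be reproduced in outline with the following sketch. It assumes that `images` is a NumPy array of shape (n_images, x, y), for example aligned P100 images, with a parallel array of subject labels. It averages squared errors over every pixel in the grid, including zero padding. The text does not state which pixels entered the original computation, so this choice is an assumption, as are the hypothetical function names. With two comparisons and a family-wise α = 0.05, the per-test threshold is 0.05/2 = 0.025.

```python
import numpy as np


def intra_subject_rmse(images, subject):
    """RMSE of each image relative to the mean image of the same subject."""
    errs = np.empty(len(images))
    for s in np.unique(subject):
        idx = subject == s
        diff = images[idx] - images[idx].mean(axis=0)  # deviation from the subject's mean foot
        errs[idx] = np.sqrt((diff ** 2).reshape(idx.sum(), -1).mean(axis=1))
    return errs


def bonferroni_threshold(alpha=0.05, m=2):
    """Per-comparison threshold keeping the family-wise type I error rate at alpha."""
    return alpha / m
```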

3.4. Foot shape versus pressure distribution

The best alignment and reduction schemes with a binary P100 pre-feature (i.e. a binary image defined by the inequality: P100 > 0) yielded CRs of 93.7 and 96.5 per cent for the left and right feet, respectively. As compared with the continuous-pressure P100 pre-feature (figure 1d), binary features reduced the CR by only 4.2 per cent, suggesting that a large proportion of the present classification-relevant information was derivable simply from 5 mm-resolution foot shape. Nevertheless, in semi-factorial studies, we were unable to achieve binary P100 performances greater than 97 per cent, suggesting that pressure distribution information is necessary for optimal subject identification.

4. Discussion

4.1. Classification

The fact that essentially no processing (except for zero-padding) yielded CRs greater than 90 per cent across 104 subjects, together with the present best results of CR > 99%, strongly suggests that PPI data contain high-quality biometric information. This inter-subject uniqueness could only have been embodied in plantar foot shape, dynamic plantar pressure distribution, or both, as these constitute the only subject-specific information sources in PPI data. The present binary image results of CR ≃ 95% were very similar to previous binary image results of CR = 94.6% [16]. They clarified that foot shape itself constituted a substantial source of classification-relevant information in the current sample. Nevertheless, the original non-binary data pushed these CRs above 99 per cent, suggesting that pressure patterns embody additional non-trivial inter-subject uniqueness.

In agreement with reports of high day-to-day PPI reliability [29], follow-up testing was also highly successful, yielding CRs of approximately 98 per cent, despite fairly extensive delays of up to 5 years between testing sessions. Together with the presently estimated processing times of less than 100 ms per footstep, these CR results suggest that PPI-based biometric identification may be suitable for real-world security applications.

Recent successes in PPI-based classification of healthy foot types [30], pathological state [31] and PPI-based fall detection [32] indicate that the current registration-based approach may also be useful for health-related applications. We hope to explore some of these applications in future work.

4.2. Previous studies

The current CR results are, to our knowledge, higher than previous purely PPI-based identification studies (electronic supplementary material, appendix table A) except a previous five-subject study [33] (CR = 100%). The best-performing algorithm on a database of at least n = 10 subjects was Jung et al. [11]: CR = 98.6% (n = 11), but a potential drawback of this study was that two steps were obtained on a short (80 cm) platform; given average foot lengths of 25.5 cm [34] and average stride lengths of 76 cm [35], subjects would have had to adopt unnaturally short strides to achieve two complete footfalls on the measurement platform. Regardless, Jung et al.'s results imply that a larger database of subjects may be identifiable even during unnatural or constrained gait. The remaining studies examined fewer than 12 subjects (except for [15]: CR = 86.1%, n = 32) and reported moderate CRs in the range 64–94%.

The higher current CRs can only be explained, we believe, by better data quality (spatio-temporal resolution, accuracy, precision, etc.), better feature selection, or both. Some studies, for example, used PPI systems with considerably lower spatial resolution [12,36,37] (approx. 35 mm). Others used relatively low-dimensional features, such as approximately 10-dimensional region of interest pressures [38] and approximately 100-dimensional centre of pressure trajectories [11,15,33,39–41]; these contrast with the current approximately 8000-dimensional pre-features. Thus, compression of PPI data, either by sensor resolution or by lossy data reduction, likely sacrifices identification-relevant features. Automated dimensionality reduction, also used in previous investigations of biomechanical (kinematic) data [42,43], thus appears to be a more robust data compression tool.

4.3. Spatial alignment

Registration presently out-performed decorrelation over all tested parameter combinations, yielding CR improvements of the order of 10 per cent despite moderately high pre-registration CRs of 85 per cent or more. Registration's successes are somewhat unsurprising because registration's explicit goal is to minimize a dissimilarity metric which, by definition, reduces intra-subject variability. Its successes are also consistent with previous reports that a variety of registration approaches both qualitatively and quantitatively out-perform decorrelation [10].

It was more surprising that decorrelation performed worse than no spatial alignment in many cases. This can be partially explained by stereotypical foot postures adopted by subjects, particularly the angle of the foot's longitudinal axis with respect to progression direction [44]. Decorrelation removes this information because the foot becomes rotated to a ‘vertical’ posture. Registration to an arbitrary template would also remove some of this stereotypical posture information. However, registration achieves better intra-subject alignment [10], so postural information likely becomes less relevant once better alignment is achieved. Rather than registering to an arbitrary template, as in the present study, it would be interesting to test a scheme that iteratively registers a given PPI to each subject's mean database image. This was not attempted here because, with CRs already at 99.6 per cent, any further improvement would be difficult to detect.

As an aside, we note that many previous PPI-based identification studies used decorrelation for spatial alignment [11,15,40,45]. Despite its prevalence in previous papers, the current results strongly suggest that decorrelation is a poor alignment choice. While we have speculated on potential mechanisms for decorrelation's poor performance (i.e. loss of stereotypical foot posture) it would be interesting to directly test this assertion by incorporating initial posture as an additional feature in a decorrelated dataset. However, as we had no reason to expect decorrelation's poor performance prior to the present results, we leave this hypothesis for future work.

We wish to emphasize that we do not believe the current registration scheme [23] was particularly special in terms of generating higher CR. There is a plethora of registration algorithms in the literature [22], and indeed a variety of methods have been shown to yield similar results in plantar pressure data [10]. Furthermore, in post hoc analyses, we employed a completely different registration scheme [10] and achieved similarly high, albeit slightly lower, CRs of approximately 97.5 per cent. The current algorithm was selected simply because it was fast and has worked well recently. To rule out a particular registration scheme as a limitation, it would be prudent to evaluate other algorithms in future work.

4.4. Feature extraction

The best-performing single pre-features were P100, P90, P80 and PTI (table 5), and the best pre-feature combination, P100 and P80, only marginally improved ultimate CRs (table 6). This gives anecdotal credence to the extensive use of P100 and PTI in the literature [8] as information-dense parameters. To our knowledge, P90 and P80 have not been previously examined. One explanation for the success of the P100 and P80 combination is that it essentially represents a dynamic gradient, albeit a low-frequency one, and that this gradient also contains subject-specific information. However, this does not explain why P100 and P80 outperformed P100 and P90. Regardless, since the performances of the P80, P90 and P100 pre-features were all quite high, a systematic exploration of their differences would not be possible without more data.

More so than particular pre-feature selections, dimensionality reduction was found, with the exception of KPCA, to invariably improve CR (table 6), albeit to a smaller extent than registration. While the CR improvement was small, it was non-trivial, pushing the average CR beyond what was achievable with raw, spatially aligned pre-features. We may thus conclude that while certain pre-features perform very well, optimum CR can be achieved only with dimensionality reduction. In other words, the pre-features contain classification-relevant patterns that cannot be captured by a priori feature selection.
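The sketch below illustrates the general pattern of fitting a dimensionality reduction on training images only and then classifying in the reduced space. It uses plain PCA via SVD and a Euclidean 1-nearest-neighbour rule as stand-ins; the specific reduction methods and classifier behind table 6 are not restated in this section, so PCA, 1-NN and the number of retained components `k` are assumptions, and all function names are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the top-k principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T

def nn_classify(train_X, train_y, test_X):
    """Euclidean 1-nearest-neighbour labels for each test sample."""
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[np.argmin(d, axis=1)]

def classification_rate(train_X, train_y, test_X, test_y, k=None):
    """CR with optional PCA (fit on the training set only) before 1-NN."""
    if k is not None:
        mean, comps = pca_fit(train_X, k)
        train_X = pca_transform(train_X, mean, comps)
        test_X = pca_transform(test_X, mean, comps)
    pred = nn_classify(train_X, train_y, test_X)
    return float(np.mean(pred == test_y))
```

Fitting the projection on training images only keeps test images out of the model, so any CR gain reflects structure learned from training data rather than leakage.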

As an aside, we note that the present percentile pre-features (P90, P80, etc.) were computed over all time frames (figure 1c), and are therefore dependent on both the duration of supra-zero pressure and the recording duration (1 s). In post hoc analysis, we also computed percentiles over CD only, but found little qualitative effect on the ultimate results: P100 was the best-performing percentile, and CR decreased systematically with decreasing percentile (table 5).
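The dependence of percentile pre-features on contact and recording duration can be seen with a single synthetic pixel trace. The sketch below assumes that computing percentiles "over CD" means restricting them to frames with non-zero pressure; the function names are hypothetical.

```python
import numpy as np

def percentile_all_frames(trace, q):
    """Percentile over the whole recording, zero-pressure frames included."""
    return float(np.percentile(trace, q))

def percentile_contact_only(trace, q):
    """Percentile over frames with non-zero pressure only (assumed 'over CD')."""
    contact = trace[trace > 0]
    return float(np.percentile(contact, q)) if contact.size else 0.0

# Synthetic pixel: 30 loaded frames (pressures 1..30) in a 100-frame recording
trace = np.zeros(100)
trace[:30] = np.arange(1, 31)
p80_all = percentile_all_frames(trace, 80)
p80_contact = percentile_contact_only(trace, 80)
```

With NumPy's default interpolation, P80 over all frames is about 10.2, because 70 of the 100 frames are zero, whereas over contact frames only it is about 24.2; P100 is 30 either way. Lengthening the recording or shortening contact would shift the all-frames value further, while P100 stays fixed.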

We also wish to restate that we did not perform temporal normalization in the present study (aside from heel strike alignment). Temporal normalization was deliberately avoided in order to give time-related features like CD the best possible chance to reflect temporal differences amongst subjects; if subjects walked at very different speeds, for example, then CD would reflect these speed differences best without normalization. However, the fact that CD and Tmax performed relatively poorly (table 5) suggests that inter-subject temporal differences were not as important as the pressure-related differences.
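Temporal normalization typically resamples each pixel's pressure history onto a fixed number of points spanning stance (0 to 100 per cent stance). The sketch below assumes linear interpolation, 101 output points, and stance defined as the first to last frame in which any pixel is loaded; `temporally_normalize` is a hypothetical name. It shows why normalization would suppress CD: two steps that differ only in duration become identical.

```python
import numpy as np

def temporally_normalize(pressure, n_points=101):
    """Resample a (rows, cols, frames) array onto n_points spanning stance.

    Stance is assumed to run from the first to the last frame in which any
    pixel carries non-zero pressure; each pixel is linearly interpolated.
    """
    n_frames = pressure.shape[-1]
    loaded = pressure.reshape(-1, n_frames).max(axis=0) > 0
    frames = np.flatnonzero(loaded)
    first, last = frames[0], frames[-1]
    seg = pressure[..., first:last + 1]
    t_old = np.linspace(0.0, 100.0, seg.shape[-1])   # % stance of each frame
    t_new = np.linspace(0.0, 100.0, n_points)
    flat = seg.reshape(-1, seg.shape[-1])
    out = np.array([np.interp(t_new, t_old, row) for row in flat])
    return out.reshape(*pressure.shape[:-1], n_points)
```

After this step, a slow 80-frame step and a fast 60-frame step with the same pressure profile are indistinguishable, so any subject information carried by CD would be lost.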

Finally, the present pre-feature list was not exhaustive. All two-dimensional (x, y) pre-features were derived from the original three-dimensional (x, y, time) image, but additional variables, such as the spatial pressure gradient [46] and the spatio-temporal (x, time) 100th percentile [31], could have been analysed. It may be informative to investigate such variables in future work.
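Both additional variables could be derived from the same (x, y, time) array. The sketch below is only an assumption about their form: the spatial gradient is taken as the magnitude of the finite-difference gradient of the P100 image, and the spatio-temporal 100th percentile as the maximum over one spatial axis, which leaves a (position, time) image. The exact definitions in [46] and [31] may differ, and the function names and the `collapse_axis` parameter are hypothetical.

```python
import numpy as np

def spatial_gradient_magnitude(p100, pixel_size=1.0):
    """Magnitude of the finite-difference spatial gradient of a P100 image."""
    gy, gx = np.gradient(p100, pixel_size)   # derivatives along rows, columns
    return np.hypot(gx, gy)

def spatiotemporal_p100(pressure, collapse_axis=1):
    """Maximum over one spatial axis of a (rows, cols, frames) array.

    The result is a 2-D (remaining spatial axis, time) image; which axis to
    collapse depends on how the foot is oriented on the sensor (assumption).
    """
    if collapse_axis not in (0, 1):
        raise ValueError("collapse_axis must be a spatial axis (0 or 1)")
    return pressure.max(axis=collapse_axis)
```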

4.5. Limitations

A major practical limitation of the current study is that we investigated only unshod walking. It is conceivable that shod walking considerably distorts classification-relevant pressure patterns and/or that subjects are not recognizable if they wear different shoes. A second key limitation of this study is that only natural self-paced walking data were collected; PPI data are known to change with walking speed [47], fatigue [48] and a variety of other factors [8], and we note that some previous PPI-based identification studies have indeed incorporated some of these factors in experimental classification tests [11].

Walking speed, in particular, would be interesting to consider; although general foot morphology does not change with speed, and thus binary features (§3.4) should be largely unaffected, the non-trivial pressure redistributions associated with walking speed [47] would likely affect subject separability, and it would be prudent to empirically define the walking speed limits that retain separability. However, as walking speed can easily be measured using cameras, and/or estimated using whole-foot CD as a proxy, it may be possible to algorithmically compensate for walking speed variability, for example, by introducing temporal normalization or by scaling pressures in certain foot regions.

Although individuals can deliberately walk in atypical ways to avoid detection, many gait recognition applications involve desired identification, that is, situations in which an individual wants to be identified (e.g. automated airport immigration control). For other applications, it may be necessary to test the current algorithms on experimentally manipulated gait. Finally, we considered only particular testing–training ratios in our model assessment. It would be prudent to systematically explore testing–training ratios, with more images for each subject, to find the optimum number of images one should obtain when implementing a real-world plantar pressure-based identification scheme.
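A systematic exploration of testing–training ratios could follow the pattern sketched below: for each number of training images per subject, each subject's images are repeatedly split at random, a classifier is trained on the training split, and the CR is averaged over repeats. The 1-nearest-neighbour classifier, the number of repeats and the function names are illustrative assumptions, not the study's protocol.

```python
import numpy as np

def one_nn_rate(train_X, train_y, test_X, test_y):
    """CR of a Euclidean 1-nearest-neighbour classifier."""
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(train_y[np.argmin(d, axis=1)] == test_y))

def sweep_training_sizes(X, y, train_sizes, n_repeats=20, seed=0):
    """Mean CR for each number of training images per subject.

    Each n_train must be smaller than every subject's image count so that
    every subject keeps at least one test image.
    """
    rng = np.random.default_rng(seed)
    subjects = np.unique(y)
    results = {}
    for n_train in train_sizes:
        rates = []
        for _ in range(n_repeats):
            train_idx, test_idx = [], []
            for s in subjects:
                idx = rng.permutation(np.flatnonzero(y == s))
                train_idx.extend(idx[:n_train])
                test_idx.extend(idx[n_train:])
            tr, te = np.array(train_idx), np.array(test_idx)
            rates.append(one_nn_rate(X[tr], y[tr], X[te], y[te]))
        results[n_train] = float(np.mean(rates))
    return results
```

Plotting the returned mean CR against `n_train` would indicate where additional training images stop paying off for a given dataset.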

5. Conclusion

Normal self-paced unshod walking produced a high-quality plantar pressure-derived biometric, and the present identification implementation yielded a CR of 99.6 per cent in n = 104 individuals. These results were largely driven by spatial image registration and, to enable finer subject differentiation, automated dimensionality reduction. As plantar pressure data are highly individual, and as they can be easily collected and processed using commercial in-floor hardware, plantar pressure-based identification appears to have strong potential for a variety of security and health applications.


Funding for this work was provided by Special Coordination Funds from MEXT, Japan. We thank Prof. Robin Crompton and Mr Russ Savage of the University of Liverpool for their early contributions and continued support of this project.

  • Received July 3, 2011.
  • Accepted August 18, 2011.

