Abstract
Track geometry measurements are regularly collected to monitor the condition of a railway network. To detect deterioration patterns and enable predictive maintenance, sequential measurement runs must be mutually aligned, which has proven to be a serious challenge. This paper presents a novel algorithm for mutual alignment of track geometry signal data. It resolves several previously intractable alignment problems: highly segmented data with variable sample rate, spatially correlated and uncorrelated measurement errors, convergence to true locations, and consistency over time. The algorithm adjusts spatial measurement errors by splitting signals into continuous segments. Re-sampled, error-corrected signals are mutually aligned using cross correlation, and this process is repeated until the mutual alignment meets a pre-defined precision threshold. Missing measurement values are handled by imputing an interpolated offset from nearby segments, ensuring that the signals remain continuous. By using weighted average offsets over all aligned signals, the law of large numbers guarantees convergence and consistency as the number of aligned runs grows. The practical feasibility of the algorithm is demonstrated on empirical track geometry measurement data from the British railway network, owned and operated by Network Rail.
In order for railway networks to meet higher standards of service level and safety, condition monitoring and predictive maintenance are becoming increasingly important. Collection of track geometry data is an established procedure to facilitate evaluation of the functional condition and deterioration patterns of the railway track ( 1 – 3 ). Track geometry data are normally collected from sensors, either on dedicated track recording vehicles or on in-service vehicles. Sensor measurement indicators, techniques, and accuracy are well-studied and constantly improving topics. Some recent examples of advancements in the railway sector include replacement condition detection of railway points ( 4 ), degradation of railway crossings ( 5 ), image-based detection of track geometric parameters ( 6 ), fault diagnostics of railway axle bearings ( 7 ), image detection of fasteners ( 8 ), machine learning for analyzing the effect of geocell installation on track geometry quality ( 9 ), and novel filtering and smoothing algorithms for railway track surveying ( 10 ). However, predictive models usually need data from sequential condition monitoring runs to estimate deterioration patterns. As stated by Weston et al. ( 11 ): “Mutually aligning data is critical for obtaining the changes in track condition over time. Mutual alignment of data from different runs over the same track presents a serious challenge.” These challenges are present in monitoring systems based on both tachometers and global navigation satellite systems (GNSS). For tachometers, Weston et al. ( 11 ) point out that a slightly different wheel radius will cause an accumulation of distance errors. For GNSS positioning, various sources of measurement errors occur, including, for example, clock-related errors, signal propagation errors, receiver noise, signal jamming, and signal spoofing ( 12 ).
Previous studies have dealt with alignment of multiple signals in different ways. Common techniques to improve accuracy in positioning of condition measurements are data fusion between GNSS and INS (inertial navigation systems), on which there is extensive research ( 13 , 14 ). Another technique to identify the actual position of a train or measurement is map matching ( 15 , 16 ). Both of these methods can improve the position accuracy of a single run, and thus significantly simplify the process of mutually aligning several measurement runs. However, according to Al-Douri and Tretten ( 17 ), existing studies have not adequately dealt with the problem of sufficient mutual alignment from a maintenance decision-making point of view.
To further improve alignment, post-processing algorithms can be applied. In his dissertation, Faiz evaluates three different algorithms for mutually aligning track geometry data: distance alignment, fixed-window-based alignment, and parameter-based alignment. Faiz et al. ( 18 ) conclude that the last of these provides the most accurate signal data alignment. Parameter-based alignment finds the closest match for each parameter value by incrementally shifting each parameter one row at a time and calculating the minimum absolute error over time. This method works well when the signals from different runs overlap, but signal measurements from disjoint segments of a railway track will not be well aligned. As non-overlapping segments are very common in practice, an algorithm intended to work in a predictive maintenance setting needs to address this issue.
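To make this limitation concrete, the following Python sketch shows our reading of a parameter-based shift search: one run is shifted one row at a time, and the shift with the smallest mean absolute error over the overlapping rows is kept. It is not Faiz's implementation; the function name `best_shift_mae` and the `min_overlap` cutoff are hypothetical. When two runs share too few rows, no shift can be scored and the sketch returns `None`, which is the non-overlapping case that motivates the present work.

```python
import numpy as np


def best_shift_mae(reference, query, max_shift, min_overlap=10):
    """Return (shift, mae) minimizing the mean absolute error, or None.

    query[i + shift] is compared with reference[i]; shifts are in rows.
    Returns None when no shift leaves at least min_overlap overlapping rows.
    """
    reference = np.asarray(reference, dtype=float)
    query = np.asarray(query, dtype=float)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Reference indices i for which i + shift is a valid query index.
        start = max(0, -shift)
        stop = min(reference.size, query.size - shift)
        if stop - start < min_overlap:
            continue  # too little overlap to compare the two runs
        diff = reference[start:stop] - query[start + shift:stop + shift]
        mae = float(np.mean(np.abs(diff)))
        if best is None or mae < best[1]:
            best = (shift, mae)
    return best
```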
Bergquist et al. ( 19 ) conclude that pattern matching between runs, and then the use of the average location for all measurements, is a straightforward alignment algorithm in theory. However, because of post-processing, measurement errors, differences in measurement equipment, and potential loss of data in the collection process, alignment in this fashion faces significant problems. They also propose that for regular maintenance, the exact location of point defects is not as important as monitoring deterioration of an extended segment ( 19 ).
Even though the exact location of a defect is not a requirement for predictive maintenance, the mutual alignment algorithm must ensure a location estimate that is accurate enough to enable maintenance actions on the track. Al-Douri and Tretten ( 17 ) state that current research has not yet solved this issue: “Overall, data are inaccurate, there is no testing phase using realistic data, and existing models are insufficient. This has a negative impact on maintenance decisions.”
We address these concerns by proposing an algorithm for mutual alignment of sensor measurement data of track geometry parameters. Novel aspects of the algorithm are:
Alignment of highly segmented data, that is, aligning signals which do not necessarily have an overlap with other signals.
Alignment of data with variable sample rate.
Alignment of data with various types of measurement errors: spatial errors, missing locations, and missing measurements.
Alignment ensuring convergence toward the true location.
Alignment which is consistent over time, that is, data are aligned even after maintenance has occurred and changed the track geometry.
In summary, the algorithm presents a working method to align large amounts of track geometry data, in spite of measurement errors and/or missing data which are common in practice. The algorithm addresses several issues pointed out in available literature, such as accuracy and practical feasibility. To our knowledge, no previous method has resolved all of the above stated issues.
Materials and Methods
Data Material
The algorithm described in this paper is used to align real track geometry data from the British railway network, owned and operated by Network Rail. The example data was collected from Network Rail’s Engineer’s Line References (ELR) VTB3 1100, going from Earlswood mileage change to Brighton. Measurements were made between June 26, 2017 and July 1, 2019. In total, this dataset consists of 20 measurement runs, ranging from 2,047 m to 46,775 m in length.
The track geometry data is collected by Network Rail’s fleet of track recording vehicles. Measurements are pre-processed using data fusion between an onboard RTPS (real time positioning system) and map matching. The RTPS consists of D-GPS (differential global positioning system), an inertial measurement unit, and a tachometer. Spatial positions of each query run are derived and map matched to an integrated network map, the route setting (RS) files, which provide information about the route. The reference RS files may not align fully with the RTPS, which can cause inaccurate distances.
Because the layout of the measurement vehicles differs, the reference position of each track geometry system and RTPS varies between vehicles. In practice, trains running in the same orientation show a more consistent alignment when data is overlaid. However, recordings of the same section of track result in locations varying by several meters between runs from the same vehicle running in a different orientation, or from different vehicles running in different orientations.
Track geometry measurements are spatially sampled approximately every 200 mm by a tachometer, which is independent of the GPS that provides its location. Each data point receives an applied position that may or may not correspond to a 200 mm step. Recording vehicles will also have different tachometer counts, causing initial unpredictable drifts that the RTPS tries to compensate for. GPS positioning adjusts the location continuously, which means that it sometimes re-adjusts locations, causing visual overlaps in the signal measurements.
Table 1 describes nine track geometry parameters which are aligned using the proposed algorithm.
Network Rail Definitions of Track Geometry Parameters ( 20 )
Algorithm
Figure 1 provides a flow chart of the algorithm. Examples of alignment performance are presented in a case study in the Results section.

Flow chart for the alignment algorithm.
Sampling
The true signal sampled by the track measurement equipment is modeled as a continuous function of location
Repeated measurements form a time series of signals
The signal of every track geometry parameter is sampled unevenly, and with occasional spatial gaps between measurements. The spatial measurement errors
Continuous Segments
To simplify processing and notation, we split a sampled signal into segments and model continuous pieces of the full continuous signal. In each segment a zeroth order location estimate
where
Gaps
A sample may have gaps that are not explicitly marked in the sequence of observations. Such gaps can, however, be identified by verifying that the distance to the next location is much greater than expected with Inequality 5 or, more specifically, Inequality 6:
where
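A minimal Python sketch of splitting a run into continuous segments at unmarked gaps follows. Because the exact form of Inequalities 5 and 6 is not reproduced here, the rule below (a gap wherever the next location lies more than `gap_factor` times the nominal 200 mm step ahead) is an assumption, as are the names `split_at_gaps`, `expected_step`, and `gap_factor`.

```python
import numpy as np


def split_at_gaps(x, expected_step=0.2, gap_factor=5.0):
    """Split a location sequence x (metres) into continuous segments.

    A gap is declared where the distance to the next location exceeds
    gap_factor * expected_step (a hypothetical stand-in for Inequalities 5/6).
    Returns a list of (start, stop) index pairs with stop exclusive.
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return []
    steps = np.diff(x)
    # Index of the first sample after each detected gap.
    breaks = np.flatnonzero(steps > gap_factor * expected_step) + 1
    bounds = np.concatenate(([0], breaks, [x.size]))
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]
```

Backward jumps are deliberately not treated as gaps in this sketch; they correspond to the location resets handled separately.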
Location Resets
As the tachometer sequentially samples a signal y, the RTPS records the location x. The RTPS has a correction mechanism that periodically resets the location, which causes sudden jumps in the location sequence even though the track recording vehicle is steadily moving in one direction (see the detailed description in the Data Material subsection). Such jumps are characterized by an unexpected distance between two consecutive observations (Inequality 7). These sudden jumps can even make the measurement locations non-monotonic (Figure 2).
When the situation represented by Inequality 7 is detected, a fixed step size is imputed in the estimated location:
This implies that the estimated error
We construct an estimated location
Since the operation is symmetric, the average error is

Non-monotonic locations. Upper panel: Signal by location. Lower panel: Signal by index.
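The reset handling can be sketched in Python for a single continuous segment. The tolerance `tol`, which stands in for Inequality 7, and the function name `remove_resets` are hypothetical; the sketch only illustrates replacing unexpected steps with a fixed step size and recording the resulting location error.

```python
import numpy as np


def remove_resets(x, expected_step=0.2, tol=0.1):
    """Rebuild a reset-free location sequence for one continuous segment.

    Any step deviating from expected_step by more than tol is treated as a
    reset and replaced by expected_step (both values are hypothetical).
    Returns (x_hat, error, is_reset) where error = x - x_hat.
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        raise ValueError("segment must contain at least one location")
    steps = np.diff(x)
    is_reset = np.abs(steps - expected_step) > tol
    # Impute the nominal step wherever a reset was detected.
    steps = np.where(is_reset, expected_step, steps)
    # Accumulate the steps from the first recorded location.
    x_hat = np.concatenate(([x[0]], x[0] + np.cumsum(steps)))
    return x_hat, x - x_hat, is_reset
```

For the non-monotonic sequence `[0.0, 0.2, 0.4, 0.1, 0.3, 0.5]`, the backward step is flagged, the rebuilt locations increase steadily to 1.0 m, and the last three samples carry an estimated error of -0.5 m.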
Convergence to True Location
Repeated samples
Drift
Consecutive measurements in a given sample
These restrictions let us start with a coarse local alignment of the samples based on the largest drift possible (Inequality 13) and refine details (Inequality 12) with increasingly local (Inequality 11) alignment steps.
A drift that changes with frequency f and amplitude a will oscillate from
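The coarse-to-fine strategy can be sketched as a schedule of window sizes that starts from the largest drift expected and is halved until it resolves the finest drift changes of interest. Tying the first window directly to the maximum drift, and the names `window_schedule` and `min_window`, are assumptions rather than a transcription of Inequalities 11 to 13.

```python
def window_schedule(max_drift, min_window):
    """Coarse-to-fine window sizes in metres.

    Starts at max_drift (hypothetical choice) and halves the window until it
    is no larger than min_window; the last entry is the finest window used.
    """
    if max_drift <= 0 or min_window <= 0:
        raise ValueError("window sizes must be positive")
    sizes = [float(max_drift)]
    while sizes[-1] > min_window:
        sizes.append(sizes[-1] / 2)
    return sizes
```

For a largest drift of 60 m and a finest window of interest of 5 m, `window_schedule(60, 5)` returns `[60.0, 30.0, 15.0, 7.5, 3.75]`.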
Cross Correlation
The offset between two signals is found through cross correlation, a standard signal processing technique to search for similarity between signals which has been used extensively in various fields ( 22 , 23 ). Some examples from a wide variety of topics are high-speed sampling oscilloscopes ( 24 ), DNA sequence alignment ( 25 ), identifying street poles from images ( 26 ), and alignment of cosmic waves ( 27 ).
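For continuous signals f and g, the cross correlation is defined as:
$$(f \star g)(\tau) = \int_{-\infty}^{\infty} \overline{f(t)}\, g(t + \tau)\, dt$$
where $\overline{f(t)}$ denotes the complex conjugate of $f(t)$ and $\tau$ is the displacement, or lag, between the signals.
For signals f and g measured at discrete points, the cross correlation is defined as:
$$(f \star g)[n] = \sum_{m=-\infty}^{\infty} \overline{f[m]}\, g[m + n]$$
where n is the lag in samples. Track geometry signals are real-valued, so the conjugation has no effect.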
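For real-valued signals the discrete cross correlation reduces to $(f \star g)[n] = \sum_m f[m]\, g[m+n]$. The Python sketch below evaluates this sum directly over a bounded range of lags, treating samples outside either signal as zero; the lag with the largest value is the candidate offset. The function name `cross_correlation` is ours, and the result can be cross-checked against `numpy.correlate(g, f, mode="full")`, which uses the same convention.

```python
import numpy as np


def cross_correlation(f, g, max_lag):
    """Discrete cross correlation (f * g)[n] = sum_m f[m] g[m + n].

    Evaluated for real-valued f, g and lags -max_lag..max_lag; terms whose
    indices fall outside either signal are treated as zero.
    Returns (lags, values).
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.empty(lags.size)
    for k, n in enumerate(lags):
        # Indices m where both f[m] and g[m + n] exist.
        m = np.arange(max(0, -n), min(f.size, g.size - n))
        values[k] = np.dot(f[m], g[m + n])
    return lags, values
```

If g is f delayed by three samples, the maximum of `values` occurs at lag 3.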
Alignment
An iteration of alignment starts by reconstructing an estimated signal. Let
The reconstructed signal is undefined for all x where the nearest sampled observation is further away than
A discrete approximation of this function uses a sample rate
o is undefined if the maximum correlation is less than a pre-defined threshold
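The reconstruction and offset steps can be sketched in Python for a single window, under several assumptions: nearest-sample reconstruction onto a regular grid, Pearson correlation as the normalized correlation measure, and the hypothetical names `reconstruct`, `window_offset`, `max_dist`, `rate`, and `threshold`. Grid points without a nearby observation are NaN and are skipped when correlating.

```python
import numpy as np


def reconstruct(x, y, grid, max_dist):
    """Nearest-sample reconstruction of y(x) on a regular grid.

    Grid points whose nearest observation lies farther than max_dist are NaN,
    i.e. the reconstructed signal is undefined there. x must be sorted and
    contain at least two locations.
    """
    x, y, grid = (np.asarray(v, dtype=float) for v in (x, y, grid))
    idx = np.clip(np.searchsorted(x, grid), 1, x.size - 1)
    # Pick whichever neighbour (left or right) is closer to each grid point.
    nearest = np.where(grid - x[idx - 1] <= x[idx] - grid, idx - 1, idx)
    out = y[nearest]
    out[np.abs(x[nearest] - grid) > max_dist] = np.nan
    return out


def window_offset(a, b, rate, max_lag, threshold):
    """Offset in metres that best aligns window b to window a, or None.

    a and b are equal-length windows on the same grid with spacing `rate`.
    The lag maximizing the Pearson correlation between a[j] and b[j + lag]
    is converted to metres; the offset is undefined (None) if that maximum
    correlation is below `threshold`.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    max_lag = min(max_lag, a.size - 1)
    best_lag, best_r = None, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            p, q = a[:a.size - lag], b[lag:]
        else:
            p, q = a[-lag:], b[:b.size + lag]
        valid = ~(np.isnan(p) | np.isnan(q))  # skip undefined grid points
        if valid.sum() < 3 or p[valid].std() == 0 or q[valid].std() == 0:
            continue
        r = np.corrcoef(p[valid], q[valid])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < threshold:
        return None
    return best_lag * rate
```

In this convention a positive offset means that run b's recorded locations lie that many metres ahead of run a's: reconstructing the same signal recorded at locations x and x + 1.0 on a 0.2 m grid gives an offset of 1.0 m.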
Imputation
Let
with
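A sketch of the imputation step, assuming linear interpolation between the nearest windows with defined offsets and constant extrapolation at the ends of a run; the name `impute_offsets` is hypothetical.

```python
import numpy as np


def impute_offsets(centers, offsets):
    """Fill undefined window offsets (None or NaN) by interpolation.

    Undefined offsets are linearly interpolated from the nearest windows with
    defined offsets; beyond the first/last defined window the nearest defined
    value is repeated (np.interp's constant extrapolation).
    """
    centers = np.asarray(centers, dtype=float)
    o = np.array([np.nan if v is None else v for v in offsets], dtype=float)
    known = ~np.isnan(o)
    if not known.any():
        raise ValueError("no defined offsets to interpolate from")
    o[~known] = np.interp(centers[~known], centers[known], o[known])
    return o
```

The filled window offsets can then be interpolated to every sample location, for example with `np.interp(x, centers, filled)`, to obtain a continuous estimated spatial error signal.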
Refinement of Spatial Error Estimate
The refined spatial error
The refined spatial errors are estimated by iterating, halving the window size at each step, until the window is small enough to capture the fastest changes in drift that are of interest (Inequality 11).
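The averaging that underpins convergence can be sketched as follows: in every window, the weighted mean offset across all signals is subtracted, so the weighted mean spatial error in that window becomes zero. Using, for instance, the maximum cross correlation of each window as its weight is an assumption, as is the function name `center_offsets`.

```python
import numpy as np


def center_offsets(offsets, weights):
    """Remove the weighted mean offset across signals in every window.

    offsets, weights: arrays of shape (n_signals, n_windows). After
    centering, sum(weights * result, axis=0) is zero for every window.
    """
    offsets = np.asarray(offsets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * offsets, axis=0) / np.sum(weights, axis=0)
    return offsets - mean
```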
Results
Case Study
Visual alignment is how the end user perceives the performance of the algorithm. Figure 3 shows the alignment of a track geometry parameter for which multiple window sizes are needed to mutually align the signals. QueryFileDate represents the date of the track measurement in the format yyyymmdd. The measurement runs from 20170526 to 20190114 have a drift of approximately 15 m, whereas the locations of the measurements from 20190211 drift approximately 60 m from the other runs.

Example of drift where different window sizes are needed for mutual alignment. Upper panel: Signal before alignment. Lower panel: Signal after alignment.
Figure 4 shows the estimated spatial error signal for the alignment shown in Figure 3. The offsets for the different measurement runs range from 5 to 60 m. Figure 5 shows an example of algorithm performance when the measurements of a track geometry parameter behave heterogeneously, possibly because of maintenance activities. Measurements from 20170821 to 20180528 are well aligned, while another group of measurements from 20180625 to 20190701 shows a different behavior. A possibly incorrect signal, measured on 20190311, is not well aligned with either group. In the upper panel of Figure 5, which shows the measurements before alignment, it is almost impossible to distinguish the different signal behaviors.

Estimated spatial error signals corresponding to Figure 3.

Example of signal with two different aligned groups, and one erratic signal QueryFileDate 20190311. Upper panel: Signal before alignment. Lower panel: Signal after alignment.
Discussion
Five challenging and previously unresolved aspects of mutually aligning track geometry signal data were stated in the Introduction: segmented data, variable sample rate, measurement errors, convergence to true locations, and consistency. This section discusses how our algorithm addresses each one of them.
Aligning Segmented Data
Aligning signals with cross correlation requires evenly sampled signals, equal in length. As shown in the section on Materials and Methods, measurement runs can range from 2,000 up to 47,000 m in length and not all runs may overlap.
By splitting each measurement run into continuous pieces, local cross correlation can be used to calculate the offset between pairwise continuous windows of two signals. For windows where the offset is undefined, that is, segments with no continuous overlap between signals, imputed offsets interpolated from nearby windows ensure that the estimated spatial error signal is continuous.
As highlighted in previous literature, it is critical to prove that mutual alignment works on realistic, empirical railway infrastructure data ( 17 ). Because measurement runs often differ in length and there may be no overlap between runs, an algorithm for mutual alignment must address this issue to be useful in practice. Our proposed algorithm successfully aligns segmented, empirical data with very high precision.
Variable Sample Rate
As described in the section on Materials and Methods, the track geometry parameters are sampled by a tachometer at intervals of approximately 200 mm. However, the GPS does not match the locations exactly to the tachometer sample spacing. This mismatch must be adjusted to allow mutual alignment between signals.
Location resets in the RTPS causing mismatches between the track geometry parameter and locations are handled by creating an estimated location without resets. With resets removed, an accurate representation of the signal with one location per track geometry measurement point is achieved.
Measurement Errors
Our algorithm identifies spatial gaps in the sample signal and splits the signal into continuous pieces accordingly. Imputation of offsets (as described in the subsection Imputation) ensures that the signal is spatially correlated over the entire measurement run, despite the gaps.
The drift between measurement runs, that is, the spatially correlated measurement errors, is adjusted using an incremental reduction of local cross correlation windows. As seen in Figure 3, the drift varies vastly between different runs. At the same time, the precision of the alignment needs to be high enough to identify deterioration in the track geometry parameters. By iterating the algorithm using smaller cross correlation windows until sufficient granularity of the alignment is achieved (Figure 1), the proposed algorithm can mutually align signals with large drifts as well as obtain any desired level of precision.
Measurement errors in the track geometry parameters are not addressed by the algorithm. However, as can be seen in Figure 5, the alignment allows for erratic or incorrect signals to be identified. If the pre-defined correlation threshold is not met (see the section on Alignment), the offset of the estimated spatial error for that window is imputed from the nearest neighboring window. This procedure ensures that signals suffering from large measurement errors in the track geometry parameters will not affect the accuracy of the mutual alignment, and inaccurate measurements are more easily distinguished from accurate measurements.
Convergence
Through the law of large numbers, the mean of the estimated spatial errors will converge to zero, which ensures that the adjusted locations converge to the true locations. Previous literature concludes that monitoring deterioration of an exact location is not a prerequisite for effective maintenance ( 19 ). Although not crucial, convergence is a very attractive feature of the algorithm. It provides a reliable, stable estimate of the true location, reducing the need for other accuracy checks such as map matching.
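The convergence argument can be illustrated with a small simulation, assuming that the location errors of individual runs are independent with zero mean. This is an idealization, and the function name `consensus_error` and the 5 m error spread are hypothetical. Under these assumptions the standard deviation of the consensus error shrinks as one over the square root of the number of runs, which is also why a handful of runs cannot guarantee convergence.

```python
import numpy as np


def consensus_error(n_runs, error_sd=5.0, seed=0):
    """Error (m) of the mean location over n_runs simulated runs.

    Each run's location error is drawn independently from a zero-mean
    normal distribution with standard deviation error_sd (an assumption).
    """
    rng = np.random.default_rng(seed)
    errors = rng.normal(0.0, error_sd, size=n_runs)
    return float(errors.mean())
```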
Consistency
The railway track undergoes regular maintenance activities which are reflected in the measurements of the track geometry parameters. A static alignment algorithm, that is, aligning signals to a fixed reference, will inevitably cause inconsistencies when maintenance actions change the geometry of the track.
To achieve an alignment that is consistent over time, the algorithm must be able to update the alignment dynamically. When the track geometry parameters change because of maintenance, a new series of mutually aligned signals will occur, as seen in Figure 5.
Our algorithm attains consistency through two steps. First, it imposes a threshold on each local cross correlation. This ensures that when the signal of a track geometry parameter changes character because of maintenance, it is mutually aligned with other signals exhibiting a similar pattern. Second, the refinement of the estimated spatial error ensures that different groups of mutually aligned signals have an accurate location. By setting the sum of a weighted average over all windows to zero (as described in the section Refinement of Spatial Error Estimate), the estimated locations remain consistent over time despite maintenance activities taking place.
In a production setting, consistency is maintained by running the alignment algorithm each time a new measurement run is added to the database.
Limitations
The law of large numbers implies that a sufficient amount of data must be aligned to ensure convergence. With only a few measurement runs, convergence to true locations cannot be guaranteed. The algorithm does not use fixed reference points, which is an advantage in terms of flexibility. However, in certain practical applications, for example, when compiling several geospatial data sources, fixed references might be required. This can be achieved by fixing the algorithm at a specific alignment, but in doing so, the algorithm no longer converges to true locations as more data is added.
Conclusion
Condition monitoring is crucial for railway safety as well as for efficient maintenance planning. The novel algorithm for mutual alignment of track geometry data presented in this paper addresses five key issues that must be resolved to enable monitoring of deterioration over time as well as predictive maintenance. The algorithm successfully solves alignment of segmented data sampled at variable sample rates, spatially correlated and uncorrelated measurement errors, convergence to true locations, and consistent alignment over time. The limitations of the algorithm mainly concern the amount of data and the flexibility in the alignment required to ensure convergence. The advantages of the algorithm are shown in a case study aligning nine different track geometry parameters on a reference line operated by Network Rail in Britain.
Acknowledgements
The authors wish to acknowledge the support from Network Rail for releasing the algorithm to public knowledge. Special thanks to Jonathan Schofield for proofreading and insights, and to Russell Licence for providing knowledge about the positioning systems on Network Rail’s measurement trains. Three anonymous reviewers helped improve the quality of this paper.
Author Contributions
Authors have contributed as follows: conceptualization: K. Eklöf and A. Nwichi-Holdsworth; methodology: K. Eklöf and J. Eklöf; validation: J. Eklöf; formal analysis: K. Eklöf; investigation: A. Nwichi-Holdsworth; resources: A. Nwichi-Holdsworth; data curation: K. Eklöf; writing—original draft preparation: K. Eklöf; writing—review and editing: K. Eklöf; visualization: K. Eklöf.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
