A three-dimensional clustering study of gearbox system failures based on frequency-domain feature structure analysis

Abstract

In actual operation, vibration signals from gearbox systems are susceptible to load fluctuations, speed variations, and environmental noise. The spectral differences between fault states are often subtle, which blurs sample boundaries and makes structural separation and state analysis difficult. To address these issues, this paper establishes a three-dimensional (3D) feature structure analysis framework based on frequency-domain statistical characteristics to investigate the structural distribution properties of gearbox vibration data. First, frequency-domain statistical features such as dominant frequency, energy, and bandwidth are extracted to construct a feature matrix. The distribution patterns of the features in the sample space are then analyzed through two-dimensional projections to identify the key features that contribute most to the structure. Taking the correlations among features into account, features with similar trends are consolidated and fused with weights to reduce the impact of redundant information on the structural representation. On this basis, a 3D feature space is constructed to visualize the distribution of samples under different operating conditions, and structural separability is verified using the silhouette coefficient and several clustering methods. Experiments on field data and publicly available datasets covering various rotational speeds and multiple types of component failure show that samples of different failure states form distribution structures in 3D space with clear boundaries and intra-class compactness, with silhouette coefficients all exceeding 0.92. Furthermore, the extracted feature sets show a certain correspondence with fault vibration characteristics, indicating that the method can capture the structural differences between fault states under complex operating conditions. This provides a reliable feature foundation for gearbox system condition assessment, fault diagnosis, and subsequent intelligent recognition.

Keywords

Gearbox systems, unlabeled clustering, frequency-domain statistical features, 3D structural analysis, state recognition, fault diagnosis

Introduction

As a key power unit in industrial equipment, the gearbox system is widely used in transportation, aerospace, machinery manufacturing, energy equipment, and other fields, and its operating status directly affects the reliability and efficiency of the whole machine.^1,2 As equipment develops toward high power density, high speed, and long service life, the operational safety and fault diagnosis capability of gearboxes have become important for ensuring stable operation.^3,4 Under complex loads and alternating torque, gear meshing pairs, bearings, and other components are prone to wear, pitting, tooth breakage, and other early damage. The degradation process is often subtle and progressive. If not promptly identified and addressed, it may further impair transmission performance, potentially leading to system shutdowns and resulting in economic losses or safety risks. Because gearbox faults are hard to detect and propagate quickly, they are both a focus and a difficulty in the maintenance of industrial systems.^5,6

Engineering sites are typically characterized by frequent changes in rotational speed, significant load fluctuations, and highly non-stationary signals, while the collected data often lacks clear annotation, which imposes certain limitations on fault identification methods that rely on empirical features or supervised models in practical applications.⁷ Under the coupled effects of complex loads and multi-source excitation, the frequency-domain responses of different fault states often exhibit a certain degree of structural superposition. The spectral differences between multiple types of minor faults are relatively weak, and the distribution of samples in the feature space is prone to overlap.⁸ The crux of this type of problem lies not in the difficulty of feature extraction, but in the fact that statistical features in the frequency domain often exhibit strong correlations and structural coupling. If dimension reduction or clustering is performed directly, it can easily lead to feature overlap, making it difficult to accurately capture the true differences between different states. In the engineering context where the spectral differences among various gearbox system failures are not pronounced and sample labeling is limited, how to clearly reveal the structural relationships among samples while maintaining the stability of the sample space distribution has become a problem that needs to be addressed in practical applications.^9,10

Researchers have conducted extensive studies on vibration signal processing, feature extraction, and condition monitoring to address the challenge of fault condition analysis in gearbox systems under complex operating conditions.¹¹ Spectral and time-frequency analysis methods such as the fast Fourier transform (FFT),¹² envelope spectrum analysis,¹³ and wavelet packet decomposition¹⁴ are widely used for fault feature extraction. Building on this foundation, supervised models such as support vector machines, K-nearest neighbors, and random forests,^15–17 along with deep learning methods including convolutional neural networks, long short-term memory networks, and graph neural networks,^18–21 have demonstrated strong classification performance on standard datasets. However, under real-world engineering conditions, labeled samples are often limited, and distribution shifts caused by varying operating conditions can also affect the generalization ability of supervised models.²² With the increasing prevalence of complex and unknown operating conditions, reliable and generalizable diagnostics have gradually emerged as key research areas in fault diagnosis. Related research has begun to focus on out-of-distribution testing, uncertainty assessment, and cross-condition generalization to improve the stability and reliability of models in practical applications.²³ In the absence of labels, unsupervised clustering methods such as K-means, density-based spatial clustering of applications with noise (DBSCAN), and hierarchical clustering have been introduced for fault state classification.^24–26 Concurrently, nonlinear dimensionality reduction algorithms like t-distributed stochastic neighbor embedding (t-SNE)²⁷ and uniform manifold approximation and projection (UMAP)²⁸ are employed to transform high-dimensional features into low-dimensional representations.
From an engineering perspective, existing methods typically perform dimension reduction or clustering directly on high-dimensional frequency-domain features, without fully accounting for the correlations and structural relationships among these features. Because statistical features in the frequency domain are often highly correlated, directly mapping them to a low-dimensional space can easily lead to structural aliasing. At the same time, under two-dimensional (2D) representations, samples from multiple fault classes tend to exhibit local crowding and overlap in the embedding space, making it difficult to fully capture their underlying distribution patterns, to distinguish clearly between operating states, and to identify fault conditions under complex operating conditions. How to preserve a complete representation of the structural relationships among samples in a low-dimensional space, while maintaining the stability of the spatial distribution, is therefore one of the key issues determining whether effective separation can be achieved in fault state analysis under unlabeled conditions.²⁹ Compared to 2D representations, three-dimensional (3D) spaces have a greater capacity to capture structure. This can, to some extent, alleviate the crowding and overlap of samples during dimensionality reduction, allow a more complete representation of the distribution relationships among fault types in the spectral space, and reduce the cross-contamination and misclassification caused by 2D compression.
In practical applications, the 3D space not only provides additional structural degrees of freedom but also allows sample distribution patterns to be observed from different viewpoints, providing a more intuitive basis for analyzing state distributions. Representing high-dimensional frequency-domain features as 3D structures and validating the sample distribution with simple and robust clustering methods helps yield clearer structural grouping results under unlabeled conditions, thereby providing a spatial foundation for subsequent state assessment and fault diagnosis.³⁰ Based on the above analysis, this paper proposes a frequency-domain feature analysis framework for 3D structural representation to address structural aliasing in gearbox systems under complex operating conditions and the significant differences in the contributions of individual features to the sample space. Frequency-domain statistical features such as center frequency, energy, and bandwidth are extracted to construct a normalized feature matrix. The distribution patterns of the features in the sample space are then analyzed using 2D projections to identify the key features that contribute most to the structural unfolding. On this basis, and taking into account the correlations among features, features with similar trends are consolidated and weighted to reduce the impact of redundant information on the structural representation. A 3D feature space is then constructed to visualize the distribution patterns of samples in different operational states, so that relationships between samples that tend to overlap in 2D space can be represented more fully.
Clustering methods are used solely as a means of structural validation to verify the structural separability of samples in 3D space, rather than as a final classification model, thereby providing structurally meaningful criteria for the assessment of gearbox system status and fault diagnosis.

Methodological framework and structural modeling

This method is based on the frequency-domain statistical characteristics of vibration signals and focuses on feature structure analysis, 3D structural representation, and structural validation. First, the original signal is segmented using the sliding window method,³¹ and each time-domain segment is converted to the frequency domain using the FFT³² to obtain its complete spectrum. The spectrum consists of a frequency axis and an amplitude distribution, where the frequency reflects the periodic characteristics of the signal and the amplitude describes the energy magnitude of each frequency component. Typical statistical features such as center frequency, energy, and bandwidth are then extracted to form a sample feature vector. To eliminate differences in feature dimensions and scales, the feature matrix is normalized column-wise. Building on this foundation, we analyze the geometric distribution characteristics and structural relationships among the features to construct a weighted feature space for representing 3D fault structures, and verify the separability of the sample space through 3D structural visualization and clustering results. Figure 1 illustrates the overall approach of the method described in this paper.

Figure 1.

Framework for 3D fault structure analysis. 3D: three-dimensional.

Feature matrix construction

Building on the aforementioned preprocessing, we first establish a unified representation of the frequency-domain statistical characteristics of the vibration signals. For each segment of the vibration signal, we calculate typical statistical measures such as the dominant frequency, spectral energy, and bandwidth, and combine them into a d-dimensional feature vector in a fixed order. The feature matrix consisting of all the samples is denoted as X:

X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1d} \\ x_{21} & x_{22} & \dots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nd} \end{bmatrix} \in \mathbb{R}^{n \times d}

(1)

where n is the number of signal segments, d is the feature dimension, and each row corresponds to the feature vector of one sample. This matrix provides a unified description of the key statistical information of the vibration signal in the frequency domain, laying the foundation for subsequent analyses of the geometric distribution of features, shape importance metrics, and 3D structural representation.
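The construction of X can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window length, step, sampling rate, and test signal are hypothetical, bandwidth is assumed to be the power-weighted RMS spread around the spectral centroid, and column-wise normalization is assumed to be min-max scaling, since the text does not fix these definitions.

```python
import numpy as np

def segment_signal(signal, window, step):
    """Split a 1-D signal into overlapping windows (sliding-window segmentation)."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def spectral_features(segment, fs):
    """Return [dominant frequency, spectral energy, bandwidth] for one segment."""
    amplitude = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    power = amplitude ** 2
    energy = power.sum()                              # unnormalized spectral energy
    dominant = freqs[np.argmax(amplitude[1:]) + 1]    # skip the DC bin
    centroid = (freqs * power).sum() / energy
    # Assumed bandwidth definition: power-weighted RMS spread around the centroid.
    bandwidth = np.sqrt((((freqs - centroid) ** 2) * power).sum() / energy)
    return np.array([dominant, energy, bandwidth])

def build_feature_matrix(signal, fs, window, step):
    """Stack per-segment features into X (n x d) and normalize each column."""
    segments = segment_signal(signal, window, step)
    X = np.stack([spectral_features(seg, fs) for seg in segments])
    # Assumed normalization: column-wise min-max scaling to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                             # constant columns map to 0
    return (X - X.min(axis=0)) / span

# Hypothetical test signal: a 50 Hz tone with weak noise, sampled at 1 kHz.
fs = 1000.0
t = np.arange(0, 4.0, 1.0 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)
X = build_feature_matrix(signal, fs, window=1000, step=500)
print(X.shape)
```

Here d = 3 only for brevity; further statistics such as kurtosis or peak height would simply add columns.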

Inter-feature 2D projections and shape importance measures

After obtaining the normalized feature matrix $X \in \mathbb{R}^{n \times d}$, it is necessary to further analyze the geometric distribution properties of different frequency-domain features in the sample space. Different frequency-domain statistical features typically reflect different aspects of signal variation. For example, the dominant frequency mainly reflects the variation of the periodic component, the bandwidth reflects the spread of the spectral energy, and the peak height is related to the impact component. Because frequency-domain features differ in their sensitivity to signal changes, their distribution patterns across samples often differ, and this difference directly affects how the samples unfold in a multidimensional space. Traditional feature weighting methods often rely on class labels and measure feature importance based on inter-class distance, intra-class variance, or discriminative power, with the primary goal of improving classification performance. However, in the absence of labels, there are no clear category boundaries between samples, making it difficult to directly apply traditional discriminative weights based on supervised information. This paper focuses not on the classification performance of individual features, but on the extent to which different features contribute to the geometric structure of samples in a multidimensional space. To this end, the “shape importance” metric is introduced to characterize the role features play in supporting the formation of spatial structures.

To describe this geometric contribution, we consider pairwise combinations of features: any two features (x_i, x_j) are projected onto a 2D plane to form a point set Z_ij:

Z_{ij} = \{ z_{k}^{(ij)} = [x_{ki}, x_{kj}] \mid k = 1, \dots, n \}

(2)

where x_ki and x_kj denote the values of the kth sample on the ith and jth feature dimensions, respectively, $z_{k}^{(ij)} \in R^{2}$ are the coordinates of the projection of the sample in the 2D plane formed by the feature pair (i, j), and n is the total number of samples. This set of points reflects the joint distribution of the features in 2D space.

When a point cloud formed by combining a particular feature with multiple other features exhibits a large spatial extent, it indicates that this feature can increase the distance between samples in different directions, thereby fully resolving relationships between samples that would otherwise overlap easily, and thus making a greater contribution to the representation of the overall spatial structure. Therefore, the degree of geometric expansion in a 2D projection can serve as an important measure of the role of the structural features. To quantitatively characterize the shape of a 2D point set, its geometric center is first calculated:

m_{ij} = \frac{1}{n} \sum_{k = 1}^{n} z_{k}^{(ij)}

(3)

where $m_{ij} \in \mathbb{R}^{2}$ is the geometric center of the feature pair (i, j) in the 2D projection plane, computed as the average position of all projection points, and characterizes the overall location of the sample distribution for that feature pair. With this center as the reference, the Euclidean distance from each sample to the center is calculated:

\begin{matrix} r_{k}^{(ij)} = ∥ z_{k}^{(ij)} - m_{ij} ∥_{2} \end{matrix}

(4)

where $r_{k}^{(ij)}$ denotes the Euclidean distance of the kth sample from the geometric center in the projection plane of the feature pair (i, j), reflecting its degree of dispersion under that projection, and $∥ \cdot ∥_{2}$ denotes the Euclidean norm. The maximum of these distances is taken as the extension radius of this 2D projection:

R_{ij} = \max_{k} r_{k}^{(ij)}

(5)

For all feature combinations, the values R_ij form the matrix $R \in \mathbb{R}^{d \times d}$ (R is symmetric with zero diagonal), where R_ij characterizes the maximum spread of the feature pair (x_i, x_j) in the 2D space. The larger this value, the more dispersed the sample distribution under this feature combination, and the more conducive it is to the subsequent 3D structural unfolding. Furthermore, to determine the extent to which a single feature contributes to the overall spatial structure, we average R_ij over all 2D combinations in which the feature appears to define the “shape importance” metric:

S_{i} = \frac{1}{d - 1} \sum_{j \neq i} R_{ij}, \quad i = 1, \dots, d

(6)

Here, S_i represents the average expansion capability of the ith feature across different projection directions. This metric reflects the extent to which a feature contributes to the geometric structure of the sample space, rather than its ability to distinguish between categories in the traditional sense. Unlike statistical discriminant methods that rely on labels, shape importance is measured directly based on the distribution of samples, making it more suitable for structural analysis under unlabeled conditions. Furthermore, unlike manifold learning methods such as t-SNE and UMAP, which primarily preserve local neighborhood structures during the dimension reduction stage, the shape importance analysis proposed in this paper occurs during the high-dimensional feature construction stage. Its purpose is to identify key features that contribute significantly to the overall structure prior to dimension reduction, and to reduce structural overlap through subsequent feature fusion. These two methods operate at different levels—feature construction and spatial representation, respectively—and are not of the same type.
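Equations (2) to (6) translate fairly directly into code. The sketch below is an illustrative reading of those definitions rather than the authors' implementation; the function name `shape_importance` and the small example matrix are hypothetical.

```python
import numpy as np

def shape_importance(X):
    """Return the extension-radius matrix R (Eq. 5) and shape importance S (Eq. 6)."""
    n, d = X.shape
    R = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            Z = X[:, [i, j]]                    # 2D projection Z_ij (Eq. 2)
            m = Z.mean(axis=0)                  # geometric center m_ij (Eq. 3)
            r = np.linalg.norm(Z - m, axis=1)   # distances r_k^(ij) (Eq. 4)
            R[i, j] = R[j, i] = r.max()         # extension radius R_ij (Eq. 5)
    # The diagonal of R is zero, so the row sum equals the sum over j != i.
    S = R.sum(axis=1) / (d - 1)
    return R, S

# Hypothetical normalized features: columns 0 and 1 spread the samples, column 2 is flat.
X = np.array([[0.0, 0.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.5]])
R, S = shape_importance(X)
print(S)
```

In this toy case the flat third feature receives the lowest score, which matches the intent that features spreading the samples more widely rank higher. Because R_ij is a maximum distance, a single outlying sample can dominate it, so outlier handling before this step may matter in practice.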

Inter-feature correlation structure identification and coupling relationship construction

In the “Inter-feature 2D projections and shape importance measures” section, the shape importance metrics obtained from 2D projections characterize the extent to which each frequency-domain feature contributes to the geometric unfolding of the sample space. To reduce structural redundancy among features while preserving the main patterns of spectral variation that they represent, it is further necessary to identify which features exhibit consistent response patterns as the samples change, and thus whether they reflect the same or similar failure mechanisms. In practical signal processing, frequency-domain features are often not independent of each other: energy-type features often change as a whole with the degree of shock; the dominant frequency and its harmonics typically shift together with changes in rotational speed or load; and statistical measures related to impulsive components, such as bandwidth and kurtosis, may also show similar trends. This synchronous variation is not merely a mathematical correlation; rather, it reflects the shared response of different features to the same physical process. Therefore, before representing the 3D structure, it is necessary to identify sets of features that vary in tandem, so that multiple features do not redundantly describe the same spectral variation, which could otherwise affect the stable unfolding of the sample space structure. For this reason, this paper uses the linear correlation computed over all samples to characterize the synchronization between features. With the feature matrix still denoted $X \in \mathbb{R}^{n \times d}$, the degree of correlation between the ith and jth features is defined by the Pearson correlation coefficient³³ c_ij:

c_{ij} = \frac{\sum_{k = 1}^{n} (x_{ki} - \bar{x}_{i}) (x_{kj} - \bar{x}_{j})}{\sqrt{\sum_{k = 1}^{n} (x_{ki} - \bar{x}_{i})^{2}} \sqrt{\sum_{k = 1}^{n} (x_{kj} - \bar{x}_{j})^{2}}}

(7)

where x_ki and x_kj are the values of the kth sample on the ith and jth feature dimensions, and $\bar{x}_{i}$ and $\bar{x}_{j}$ are the sample means of the corresponding features. The correlation coefficient reflects the overall consistency in the variation of two variables across the entire sample. When $| c_{ij} |$ is large, the two features exhibit a strong synchronous relationship under different operating conditions and are more likely to reflect the same spectral variation pattern; when $| c_{ij} |$ is small, their response mechanisms differ significantly, making them more suitable to be retained as relatively independent structural features. It should be noted that this study uses the absolute value $| c_{ij} |$ of the correlation coefficient. The purpose is not to differentiate between positive and negative directions of change, but to capture the strength of the consistency of the features' responses to changes in operating conditions. In practical mechanical systems, features that vary in opposite directions may still be driven by the same failure mechanism, so the magnitude of the correlation is a better indicator of their physical relevance than its sign.

Based on the above results, a correlation matrix $C = [c_{ij}]$ can be constructed. A threshold θ is then set, and the entries a_ij are defined by

a_{ij} = \begin{cases} 1, & | c_{ij} | \geq \theta,\ i \neq j \\ 0, & \text{otherwise} \end{cases}

(8)

The adjacency matrix $A = [a_{ij}]$ is obtained. By analyzing the rank distribution of the matrix A, all the features showing highly synchronized changes can be automatically classified into several feature coupling clusters. Let the final number of feature clusters obtained be g. In this case, the features within each cluster primarily describe spectral behavior of the same type, while different clusters correspond to different failure mechanisms or energy transfer patterns. This feature grouping based on related structures is not merely a form of statistical dimension reduction, but rather a reorganization of spectral variation patterns. For example, when the impact component increases, related statistical measures such as energy, bandwidth, and kurtosis tend to increase simultaneously; when the frequency components shift as a whole, the features near the fundamental frequency and its harmonics also shift accordingly. Grouping these features, which exhibit consistent response patterns, into a single structural unit helps reduce information redundancy and allows different failure mechanisms to be represented relatively independently in the subsequent 3D space, thereby enhancing the stability and interpretability of the sample’s structural distribution.
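The grouping step can be illustrated as follows. The text does not spell out how the clusters are read off A, so this sketch assumes, as one plausible reading, that each connected component of the graph defined by A forms one feature coupling cluster. The function name `coupling_clusters`, the threshold value 0.9, and the synthetic features are hypothetical.

```python
import numpy as np

def coupling_clusters(X, theta):
    """Threshold |Pearson correlation| (Eqs. 7-8) and group features by connectivity."""
    C = np.corrcoef(X, rowvar=False)          # d x d Pearson correlation matrix
    A = (np.abs(C) >= theta).astype(int)      # adjacency from |c_ij| >= theta
    np.fill_diagonal(A, 0)                    # enforce the i != j condition
    d = A.shape[0]
    labels = -np.ones(d, dtype=int)
    g = 0
    for start in range(d):
        if labels[start] >= 0:
            continue
        # Depth-first search over one connected component (assumed grouping rule).
        labels[start] = g
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if labels[v] < 0:
                    labels[v] = g
                    stack.append(v)
        g += 1
    groups = [np.flatnonzero(labels == k).tolist() for k in range(g)]
    return A, groups

# Hypothetical features: columns 0-2 vary together (one with opposite sign), column 3 does not.
a = np.linspace(0.0, 1.0, 50)
X = np.column_stack([a, 2 * a + 1, -a, np.sin(6 * np.pi * a)])
A, groups = coupling_clusters(X, theta=0.9)
print(groups)
```

Because the absolute value is used, the negatively correlated third column joins the same group as the first two. Connected components are transitive, so a chain of pairwise links can merge features that are only weakly correlated with each other directly; a stricter rule would change the grouping.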

Weighted fusion mechanism under feature coupling structure

In the “Inter-feature correlation structure identification and coupling relationship construction” section, frequency-domain features exhibiting synchronous variation patterns are grouped into several feature clusters through structural identification, such that each cluster primarily corresponds to the same type of spectral variation pattern. Although the same group of features shows similar trends in different working conditions, their contribution to the geometric structure of the sample space is not entirely consistent. Certain features are better able to distinguish between samples in a 2D projection and provide stronger support for the unfolding of spatial structures, while other features change in tandem with the features within the group; their contribution to distinguishing states is relatively limited. Therefore, further weighting within the group helps preserve the primary structural orientation and reduces the interference of redundant expressions on spatial organization.

Traditional feature weighting methods typically aim to improve classification accuracy by using label information to measure the importance of features and assigning greater weights to highly discriminative features. The weighted fusion described in this paper does not rely on class labels, and its purpose is not to improve the performance of a single classifier but to preserve the ability of samples to unfold their structure in a multidimensional space. In other words, weight allocation focuses on the contribution of features to the formation of geometric structures, rather than on classification ability in the traditional sense. Accordingly, this paper uses the shape importance vector $S = [S_{1}, S_{2}, \dots, S_{d}]$ obtained in the “Inter-feature 2D projections and shape importance measures” section as the basis for weighting intra-group features. Let the gth feature group be $G_{g} = \{p_{1}, p_{2}, \dots, p_{h_{g}}\}$, where p_k denotes the column index of the kth feature of the group in the original matrix and h_g denotes the number of features contained in the group. The fusion weight of the kth feature in the group is defined as

\begin{matrix} ω_{k}^{(g)} = \frac{S_{p_{k}}}{\sum_{r = 1}^{h_{g}} S_{p_{r}}} \end{matrix}

(9)

where p_k denotes the column index, in the original feature matrix, of the kth feature in the gth feature cluster; p_r denotes the column index of the rth feature in that cluster (used to traverse all features in the cluster); and $S_{p_{k}}$ is the shape importance value of feature p_k, which measures the extent to which this feature contributes to the expansion of the sample space. The denominator is the sum of the shape importance values of all features in the group; it normalizes the weights so that they sum to one and reflect the relative structural contribution of each feature within the group. The larger the weight $ω_{k}^{(g)}$, the more significant the role of that feature in preserving the integrity of the spatial structure. After the weights are determined, all features in the group are linearly fused according to their weights, defining the new equivalent feature as

u_{i}^{(g)} = \sum_{r = 1}^{h_{g}} ω_{r}^{(g)} X_{i, p_{r}}

(10)

where $u_{i}^{(g)}$ denotes the value of the ith sample on the fusion feature g; $X_{i, p_{r}}$ denotes the value of sample i in the p_rth column of the original feature matrix; and $ω_{r}^{(g)}$ denotes the weighting coefficient of the rth feature in the group. It should be noted that the linear fusion referred to here is not simply a form of dimensionality reduction, but rather a reorganization of the primary structural directions within the group. By retaining dominant features that contribute significantly to the shape and appropriately attenuating secondary features with strong redundancy, the same spectral variation pattern can be represented in space in a more stable form.

Fusing all g feature groups in this way yields the final weighted feature matrix X^*:

\begin{matrix} X^{*} = [u^{(1)}, u^{(2)}, \dots, u^{(g)}] \in R^{n \times g} \end{matrix}

(11)

where $u^{(g)}$ denotes the fused feature vector of the gth group; g denotes the number of feature groups, that is, the effective feature dimensions after fusion; n denotes the number of samples; and X^* denotes the weighted fused feature matrix, reducing the dimension from the original d dimensions to g dimensions, while preserving the main spectral variation patterns and reducing the overlap in the expression of features.
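A small numerical sketch of Equations (9) to (11) is given below. The feature values, shape importance vector, and grouping are hypothetical and chosen only so that the arithmetic can be checked by hand; the function name `fuse_features` is likewise an assumption.

```python
import numpy as np

def fuse_features(X, S, groups):
    """Fuse each feature group into one column using shape-importance weights."""
    fused = []
    for cols in groups:
        w = S[cols] / S[cols].sum()      # intra-group weights (Eq. 9), sum to 1
        fused.append(X[:, cols] @ w)     # weighted linear fusion (Eq. 10)
    return np.column_stack(fused)        # X* with one column per group (Eq. 11)

# Hypothetical inputs: 2 samples, 3 features, two groups {0, 1} and {2}.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
S = np.array([1.0, 3.0, 2.0])
groups = [[0, 1], [2]]
X_star = fuse_features(X, S, groups)
print(X_star)
```

For the first group the weights are 0.25 and 0.75, so the first sample fuses to 0.25 × 1 + 0.75 × 2 = 1.75; the single-feature group passes through unchanged.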

From a signal processing point of view, frequency-domain features often reflect the response process of the same failure mechanism in groups. For example, when the impact component increases, related statistical measures such as energy, bandwidth, and kurtosis typically increase in tandem; when the frequency components shift as a whole, the features near the fundamental frequency and its harmonics will also exhibit a consistent shift. By identifying key features within a group that make the primary structural contributions through shape importance analysis and assigning them higher weights during the fusion process, different failure mechanisms can be represented more independently and stably in the subsequent 3D space, thereby enhancing the separability and engineering interpretability of the sample distribution structure.

Construction of 3D structural spaces and presentation of failure modes

After completing the aforementioned feature construction, shape importance analysis, and intra-group weighted fusion, a weighted feature matrix X^* with converged dimensions and reduced redundancy can be obtained. This feature space preserves the primary spectral variation patterns and their structural relationships; however, it remains difficult to directly observe the spatial distribution patterns among different samples based solely on the high-dimensional numerical form. To more intuitively analyze the structural relationships among samples from different operating states and to verify the effectiveness of the constructed feature space in mapping failure modes, this paper further conducts a 3D structural representation and visualization analysis. First, we use Euclidean distance to measure the relative differences among samples in the weighted feature space and construct a distance matrix between the samples. This matrix is used solely to describe the relative positions of samples in X^* space; it does not involve any assumptions about categories. Its purpose is to provide a unified structural foundation for subsequent spatial representations. Based on this, the weighted feature space is mapped to a 3D structural space. This paper employs the t-SNE method to project high-dimensional sample relationships into a lower-dimensional space. Its purpose is not to serve as a core classification algorithm, but rather as a tool for visualizing 3D structures, enabling the observation of how samples are organized in the high-dimensional feature space. By preserving local neighborhood relationships, samples with similar spectral variation patterns remain relatively close to one another in 3D space, while samples with significant differences are gradually separated. This allows structural relationships that are difficult to observe directly in high-dimensional space to be visualized more intuitively. 
Compared to 2D representations, 3D space has a higher capacity for structural information, which can alleviate sample crowding and overlap during dimensionality reduction to some extent, making the boundaries between different fault states clearer. In particular, for samples with minimal spectral differences or localized overlaps, 3D structural representation offers greater spatial flexibility, enabling different failure modes to form more stable spatial distributions. After obtaining the 3D coordinates, we further employed clustering methods to validate the structure of the sample distribution. It should be noted that the clustering results are not intended to serve as the final fault identification model but are used solely to verify whether the samples form stable clusters in 3D space. By comparing the consistency of sample distributions across different clustering methods, we can determine whether the constructed feature space has achieved the structural characteristics of intra-class compactness and inter-class separation, thereby validating the effectiveness of the 3D structural representation. Through this process, the sample relationships corresponding to the frequency-domain features can be more fully represented in 3D space. This spatial distribution not only reflects the role of feature fusion in improving structural organization but also provides an intuitive structural basis for subsequent fault state analysis, thereby forming a comprehensive analytical process that spans feature construction, structural unfolding, and structural validation.
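The mapping and validation steps can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the configuration used in the paper: the synthetic four-dimensional matrix stands in for X^*, and the perplexity, number of clusters, and choice of K-means as the validating method are hypothetical. The silhouette coefficient is used here as a standard score of intra-cluster compactness and inter-cluster separation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Hypothetical fused feature matrix X*: three operating states, 40 samples each, g = 4.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
X_star = np.vstack([c + 0.3 * rng.standard_normal((40, 4)) for c in centers])

# Map to a 3D structural space; t-SNE uses Euclidean distances by default.
Y = TSNE(n_components=3, perplexity=15, init="pca", random_state=0).fit_transform(X_star)

# Clustering serves only as structural validation of the 3D coordinates.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y)
score = silhouette_score(Y, labels)
print(Y.shape, round(float(score), 3))
```

Repeating the last two lines with other clustering methods and comparing the resulting labels gives the consistency check described above. Since t-SNE is stochastic and sensitive to perplexity, fixing the random seed and checking several settings helps separate genuine structure from embedding artifacts.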

Metrics for evaluating cluster structures

To further examine the organizational patterns of sample distribution in 3D space, this paper introduces structural evaluation metrics based on geometric distribution to analyze the spatial separation between different clusters and the degree of clustering within each cluster. These metrics are primarily used to help describe structural features in 3D space and to determine whether samples from different operational states form relatively stable and distinguishable spatial distributions; they are not the sole criterion for assessing the validity of the method.

Interclass spatial separation

Inter-cluster separation is used to describe the relative distances between different clusters in 3D space. When different fault states form distinct clusters in space, there is typically a noticeable geometric distance between the clusters.

(1) Cluster center computing

For the sample points $y_{1}^{(i)}, y_{2}^{(i)}, \dots, y_{n_{i}}^{(i)} \in R^{3}$ in each cluster C_i, the geometric center is computed as follows:

\begin{matrix} μ_{i} = \frac{1}{n_{i}} \sum_{k = 1}^{n_{i}} y_{k}^{(i)} \end{matrix}

(12)

where μ_i denotes the center coordinate vector of the ith cluster, n_i is the number of samples in that cluster, and $y_{k}^{(i)}$ is the coordinate vector of the kth sample point in cluster i, which is a point in 3D space.

(2) Calculation of maximum class radius

The maximum radius of a cluster is defined as the largest two-norm distance from a sample point in the cluster to the cluster center:

\begin{matrix} r_{i}^{\max} = \max_{k} ‖ y_{k}^{(i)} - μ_{i} ‖_{2} \end{matrix}

(13)

This value reflects the distance from the farthest sample in class i to the center, that is, the “maximum radius” of the cluster.

(3) Calculation of inter-cluster distance and net spacing

For any two different clusters i and j, the distance between their centers is defined as follows using two-norm:

\begin{matrix} d_{ij} = ‖ μ_{i} - μ_{j} ‖_{2} \end{matrix}

(14)

The “net separation distance” between clusters is further defined as the distance between the centers of two clusters minus the sum of their maximum radii:

\begin{matrix} Δ_{ij} = d_{ij} - (r_{i}^{\max} + r_{j}^{\max}) \end{matrix}

(15)

This metric is used to help characterize the boundaries between different clusters. When Δ_ij is large and positive, the two clusters are well separated in space and their distributions are relatively independent; when Δ_ij is small, the boundaries of the two clusters are close together, suggesting that the sample distributions are in strong proximity and have relatively weak spatial distinctiveness. A negative Δ_ij means that the spheres defined by the two maximum radii intersect.
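The calculations in equations (12) to (15) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `net_separation`, the variable names, and the toy coordinates are assumptions and do not come from the implementation used in this paper.

```python
import numpy as np

def net_separation(Y, labels):
    """Cluster centers, maximum radii, center distances, and net separation."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Eq. (12): geometric center of each cluster.
    centers = np.array([Y[labels == c].mean(axis=0) for c in ids])
    # Eq. (13): distance from the farthest member of each cluster to its center.
    r_max = np.array([
        np.linalg.norm(Y[labels == c] - centers[k], axis=1).max()
        for k, c in enumerate(ids)
    ])
    # Eq. (14): pairwise two-norm distances between cluster centers.
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # Eq. (15): center distance minus the sum of the two maximum radii.
    delta = d - (r_max[:, None] + r_max[None, :])
    return ids, centers, r_max, d, delta

# Toy 3D coordinates: two clusters of two points each along the x-axis.
Y = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
              [10.0, 0.0, 0.0], [12.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])
ids, centers, r_max, d, delta = net_separation(Y, labels)
```

In this toy case the centers are 10 apart and each maximum radius is 1, so the net separation is 8. Only the off-diagonal entries of `delta` are meaningful; the diagonal compares a cluster with itself.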

Compactness of class structures

In addition to inter-class separation, intra-class compactness is used to describe the degree to which samples within the same class cluster together in 3D space. If samples of the same type are concentrated in a specific area of space, this indicates that the fault state can form a stable structural region.

(1) Definition of distance within a class

For each cluster C_i, the Euclidean distance between the kth sample point in the cluster and the cluster center is defined as $r_{k}^{(i)}$ :

\begin{matrix} r_{k}^{(i)} = ‖ y_{k}^{(i)} - μ_{i} ‖_{2} \end{matrix}

(16)

where μ_i denotes the center coordinate of the ith cluster, and $y_{k}^{(i)}$ is the low-dimensional vector representation of the kth sample in that cluster.

(2) Kernel density estimation methods

To characterize the overall distribution of the intra-class radii, this paper uses kernel density estimation to fit the probability density of $\{r_{k}^{(i)}\}$ . The kernel density estimate takes the form:

{\hat{f}}_{i} (r) = \frac{1}{n_{i} h} \sum_{k = 1}^{n_{i}} K \left( \frac{r - r_{k}^{(i)}}{h} \right)

(17)

where K(⋅) is the Gaussian kernel function, h is the smoothing bandwidth (in this implementation, this parameter is automatically set to its default value by the “ksdensity” function; it is used solely to smooth the display of the intra-class radius distribution curve and does not contribute to the construction of the feature space or the generation of clustering results), r is the distance variable, and n_i denotes the number of samples in the ith class. This method yields a continuous distribution curve of intra-cluster distances, allowing us to observe the degree of cluster cohesion.

If the distribution of intra-class distances is concentrated and the range of variation is small, this indicates that the samples are clustered closely together in space; conversely, if the distribution of distances is more dispersed, this indicates that the spatial structure of this class of samples is relatively loose.
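For readers who wish to reproduce the density curve in equation (17) outside MATLAB, a minimal Python sketch is given below. The function name `radius_kde`, the toy radii, and the evaluation grid are assumptions for illustration. The bandwidth uses a normal-reference rule of thumb, which is also an assumption; it is not guaranteed to match the default chosen by the “ksdensity” function.

```python
import numpy as np

def radius_kde(radii, grid, h=None):
    """Gaussian kernel density estimate of intra-cluster radii, as in Eq. (17)."""
    radii = np.asarray(radii, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = radii.size
    if h is None:
        # Assumed rule-of-thumb bandwidth; ksdensity's default may differ.
        h = 1.06 * radii.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - radii[None, :]) / h        # shape (len(grid), n)
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * h)

# Toy radii r_k^(i) for one cluster and a grid wide enough to hold the tails.
radii = np.array([0.8, 1.0, 1.1, 1.3, 0.9])
grid = np.linspace(-2.0, 4.0, 2001)
density = radius_kde(radii, grid)
# Trapezoidal integral of the estimated density over the grid.
area = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))
```

A narrow, single-peaked curve corresponds to a compact cluster, while a wide or multi-peaked curve corresponds to a looser one. Because the kernel is Gaussian, the estimate assigns a small amount of mass to negative radii; this affects only the display of the curve.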

This paper constructs a 3D structural space based on the geometric structure of frequency-domain features and performs a supplementary analysis of sample distribution patterns using inter-class spatial separation and intra-class structural compactness. These metrics are primarily used to observe the distribution patterns of different failure states in 3D space and to assess structural stability, thereby providing a spatial reference for the analysis of subsequent experimental results.

Fault data validation and structural analysis

The preceding section outlined the overall workflow for 3D fault structure analysis, focusing on the construction of frequency-domain statistical features, shape importance metrics, the identification of feature correlation structures, and weighted fusion methods. Building on this foundation, to verify the effectiveness of the constructed feature structure on real-world vibration signals, this paper conducts experimental analyses using both field-measured vibration data and publicly available fault data from Southeast University.

Experimental data and feature settings

To analyze the applicability of the constructed feature structure under different signal conditions, this paper selected two representative vibration datasets as experimental subjects: the field-measured fault dataset and the Southeast University public fault dataset. The field measurement data originate from equipment operation monitoring at a wind power plant in Yunnan, China. Vibration signals were collected by sensors installed at critical locations. Because on-site equipment is constantly subjected to varying loads, fluctuating speeds, and ambient noise, the acquired signals exhibit distinct non-stationary characteristics, and spectral differences between fault states are more susceptible to changes in operating conditions. This dataset includes seven types of fault conditions affecting critical components such as the high-speed shaft, the medium-speed shaft, and the front and rear generator bearings, and reflects changes in the vibration response of the transmission system under actual operating conditions. The fault categories are listed in Table 1. It should be noted that type 1 through type 7 in Table 1 refer to the fault status codes of the raw data files; these are used solely to indicate the source of the samples and to cross-reference subsequent results and are not included in the feature weighting, 3D embedding, or clustering processes.

Table 1.

Seven categories of fault conditions in the field measurement dataset.

Number	Fault type
Type 1	Gearbox high-speed tooth bearing inner ring pitting
Type 2	Broken gearbox medium-speed shaft
Type 3	Gearbox high-speed shaft bearing mid-term failure
Type 4	Medium-term wear on generator rear bearings
Type 5	Secondary large rear bearing ball surface stripping
Type 6	Heavy wear on gearbox intermediate stage rear bearing
Type 7	Generator front bearing wear

The Southeast University bearing failure dataset is a publicly available experimental dataset that includes four operational states: healthy, rolling element failure, inner ring failure, and outer ring failure. Compared to field measurement data, the acquisition environment for this dataset is relatively controlled and subject to minimal signal interference, making it suitable for validating fault diagnosis methods. This paper combines field measurement data with publicly available experimental data to evaluate the method’s performance on complex engineering data and to verify its applicability and stability across different data sources. The Southeast University fault dataset experimental platform is shown in Figure 2.

Figure 2.

Southeast University fault dataset experimental platform.

For feature extraction, this paper employs a 12-dimensional set of frequency-domain statistical features derived from each vibration signal segment based on the previously established feature framework. These include fundamental frequency, spectral energy, bandwidth, spectral kurtosis, instantaneous frequency mean, harmonic ratio, spectral peak factor, spectral mean, spectral standard deviation, spectral skewness, spectral centroid, and spectral root mean square frequency. These characteristics describe the differences in fault signals in terms of frequency location, energy distribution, spectral shape, and impact components, providing a unified set of features for subsequent structural analysis. To minimize the impact of differing feature dimensions and numerical ranges on distance calculations, the feature matrix was standardized in the experiment to ensure comparability among features across different numerical scales.
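To make the feature extraction and standardization steps more concrete, the following Python sketch computes a small subset of frequency-domain statistics for one signal segment and applies z-score standardization to a feature matrix. The formulas are common textbook definitions and are assumptions here; the exact definitions of the 12 features used in this paper may differ. The function names, the 125 Hz test tone, and the 1000 Hz sampling rate in the example are illustrative choices.

```python
import numpy as np

def spectral_features(segment, fs):
    """Return a few frequency-domain statistics of one signal segment."""
    segment = np.asarray(segment, dtype=float)
    spectrum = np.abs(np.fft.rfft(segment))          # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(segment.size, d=1.0 / fs)
    power = spectrum ** 2
    total = power.sum()
    centroid = (freqs * power).sum() / total          # power-weighted mean frequency
    return {
        "dominant_frequency": freqs[np.argmax(spectrum)],
        "spectral_energy": total,
        "spectral_centroid": centroid,
        "rms_frequency": np.sqrt((freqs ** 2 * power).sum() / total),
        "bandwidth": np.sqrt(((freqs - centroid) ** 2 * power).sum() / total),
    }

def standardize(F):
    """Z-score each feature column so that different scales are comparable."""
    F = np.asarray(F, dtype=float)
    return (F - F.mean(axis=0)) / F.std(axis=0)

# Example: a pure 125 Hz tone, 256 samples at 1000 Hz (exactly 32 cycles).
fs = 1000.0
t = np.arange(256) / fs
features = spectral_features(np.sin(2.0 * np.pi * 125.0 * t), fs)
```

For this tone the dominant frequency and the spectral centroid both fall on 125 Hz. Note that `standardize` assumes every feature column has nonzero variance.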

Validation of field measurement datasets and structural analysis

Parameter settings and evaluation metrics

Based on the vibration data measured in the field, the experiment used the 12-dimensional frequency-domain statistical features described in the “Experimental data and feature settings” section as the initial input, and the feature matrix was standardized. The original signal has a sampling frequency of 1000 Hz. A sliding window is used to divide the signal into segments, with a window length of 256 samples and a step of 128 samples between adjacent windows, so that consecutive windows overlap by 128 samples. This yields approximately 200 sample segments for each fault state, for a total of about 1400 fault samples across the seven categories. Each window is transformed with the FFT, and frequency-domain statistical features are extracted from it for subsequent feature structure analysis and clustering validation. It should be noted that this paper does not introduce sample class labels during the clustering process; samples are grouped solely based on distance relationships in the feature space. In the experiment, the number of clusters is set to the number of known operational states in the dataset so that different methods are compared under the same number of categories; this does not imply that sample labels are used in feature construction or clustering calculations.
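The segmentation settings above (window length 256, step 128) can be sketched as follows. The function name `sliding_windows` and the synthetic input are assumptions for illustration and are not taken from the implementation used in this paper.

```python
import numpy as np

def sliding_windows(signal, window=256, step=128):
    """Split a 1-D signal into overlapping segments, one segment per row."""
    signal = np.asarray(signal, dtype=float)
    n_windows = (signal.size - window) // step + 1
    if n_windows < 1:
        raise ValueError("signal is shorter than one window")
    starts = np.arange(n_windows) * step
    return np.stack([signal[s:s + window] for s in starts])

# Synthetic record of 1024 samples; the values are just sample indices.
segments = sliding_windows(np.arange(1024), window=256, step=128)
```

With these settings a record of N samples yields floor((N - 256) / 128) + 1 segments, so the 1024-sample example gives 7 segments. As a point of reference, a record of 256 + 199 × 128 = 25,728 samples would give exactly 200 segments.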

In the feature coupling and weighting stage, the feature correlation threshold θ and the t-SNE perplexity parameter³⁴ serve as the primary tunable parameters in this paper. The threshold θ controls how strictly synchronous relationships between features are determined, which directly affects the number of feature groups formed after feature coupling; the perplexity parameter influences the scale of the local neighborhoods that t-SNE captures during low-dimensional embedding. Based on the volume of data and the results of multiple experiments, this paper conducts combination tests on these parameters within a reasonable range. The correlation threshold θ is searched over the range [0.90, 0.98] with a step size of 0.01; the perplexity parameter is varied over the range [25, 75] with a step size of 5. For each parameter combination, feature coupling, shape weighting, 3D structural representation, and clustering validation are performed, and the average contour coefficient of the corresponding results is calculated as an auxiliary evaluation metric. To minimize the impact of random initialization on the results, the experiment bound the random seed to the parameter combination and ran the clustering phase, which involves random initialization, 20 times.
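The search ranges above correspond to 9 threshold values and 11 perplexity values, that is, 99 parameter combinations. The following Python sketch enumerates this grid. The seeding scheme in `seeds_for` is a hypothetical example of how a seed could be tied to a parameter combination; the exact scheme used in this paper is not specified, and the pipeline that would consume each grid entry is not shown.

```python
from itertools import product

# Correlation thresholds 0.90, 0.91, ..., 0.98 (built from integers to avoid float drift).
thetas = [round(0.90 + 0.01 * k, 2) for k in range(9)]
# Perplexity values 25, 30, ..., 75.
perplexities = list(range(25, 80, 5))

N_REPEATS = 20  # clustering runs per parameter combination

def seeds_for(theta_index, perplexity_index, n_repeats=N_REPEATS):
    """Hypothetical seeding scheme: distinct, reproducible seeds per combination."""
    base = (theta_index * len(perplexities) + perplexity_index) * n_repeats
    return [base + r for r in range(n_repeats)]

grid = [
    {"theta": th, "perplexity": p, "seeds": seeds_for(i, j)}
    for (i, th), (j, p) in product(enumerate(thetas), enumerate(perplexities))
]
```

Deriving the seeds from the grid indices makes every run reproducible while keeping the seeds of different combinations distinct, which is one simple way to separate parameter effects from initialization effects.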

The contour coefficient (commonly called the silhouette coefficient) is used to help evaluate the degree of intra-class clustering and inter-class separation in clustering results. For sample i, it is defined as:

S (i) = \frac{b (i) - a (i)}{\max \{ a (i), b (i) \}}

(18)

In the formula, a(i) represents the average distance between sample i and the other samples in the same cluster, and b(i) represents the smallest average distance from sample i to the samples of any other cluster, that is, the average distance to the nearest neighboring cluster. The contour coefficient S(i) takes values in the range [−1, 1]; the closer the value is to 1, the closer the sample lies to the members of its own cluster and the farther it lies from other clusters. In the experiments described in this paper, the contour coefficient is primarily used as an auxiliary metric for screening parameter combinations and for comparing different feature representations; it is not the sole criterion for determining the validity of a 3D structure. Since t-SNE embedding results are primarily used for low-dimensional visualization, this paper does not interpret the geometric distance in the embedding space as a strict physical distance but rather as a basis for relative comparison among different feature representations under the same experimental conditions.
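A direct implementation of equation (18) is sketched below for small sample sets. The function name `contour_coefficients` and the toy coordinates are assumptions for illustration; the sketch uses Euclidean distances, requires at least two clusters, and assigns S(i) = 0 to samples in single-member clusters, which is a common convention rather than something stated in this paper.

```python
import numpy as np

def contour_coefficients(Y, labels):
    """Per-sample S(i) from Eq. (18), computed from Euclidean distances."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    ids = np.unique(labels)
    S = np.zeros(len(Y))
    for i in range(len(Y)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: S(i) = 0 for a single-member cluster
        # a(i): mean distance to the other members (D[i, i] = 0 adds nothing).
        a = D[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        S[i] = (b - a) / max(a, b)
    return S

# Toy example: two well-separated clusters on a line.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels = np.array([0, 0, 1, 1])
S = contour_coefficients(Y, labels)
mean_S = S.mean()
```

For the first point, a = 1 and b = (10 + 11) / 2 = 10.5, so S = 9.5 / 10.5 ≈ 0.905; for the second point, a = 1 and b = 9.5, so S = 8.5 / 9.5 ≈ 0.895. By symmetry the average over all four points is about 0.900.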

Comparative analysis of different feature representation methods

To verify the effectiveness of feature structure analysis and weighted fusion methods in improving the distribution of the sample space, this paper employs K-means clustering as a unified validation method to conduct a comparative analysis of the 3D structural results obtained under different feature representation methods. In this section, K-means is primarily used as a basic clustering tool to observe the distribution of samples in 3D space, rather than as a supervised fault detection model. By keeping the number of clustering categories and parameter settings constant, we can more directly compare the impact of different feature representations on spatial separation performance.

During the feature structure optimization process, the correlation threshold and the t-SNE perplexity parameter have a certain influence on the final 3D structure. The experiment collected representative parameter combinations with high contour coefficients and ranked them in descending order of average contour coefficient. The results are shown in Table 2.

Table 2.

Representative parameter combinations with high average contour coefficients on the field measurement dataset.

θ Value	Perplexity	Number of feature groups after coupling g	Average contour coefficient
0.94	55	8	0.935
0.92	65	8	0.932
0.94	50	8	0.928
0.93	60	8	0.927
0.94	60	8	0.922

Note. The table lists only a selection of representative results that meet the threshold criterion for the contour coefficient.

As shown in Table 2, all of the representative parameter combinations listed achieved high average contour coefficients, ranging from 0.922 to 0.935, and the differences between combinations were small. This indicates that the method described in this paper exhibits good stability with respect to the correlation threshold θ and the perplexity parameter. Although the parameter values differ, the number of effective feature groups after feature coupling remains at eight in all cases, indicating that the correlation structure among frequency-domain features is relatively stable and not merely a coincidence resulting from a specific set of parameters. Different parameter combinations primarily affect the local embedding results without altering the overall structural relationships of the features. Based on the overall contour coefficient and structural distribution, we subsequently selected θ = 0.94 and perplexity = 55 as the representative parameter combination for our method on the field measurement dataset.

To further illustrate the differences between the method described in this paper and traditional feature weighting approaches, Fisher weighting was introduced as a comparison in the experiments. It should be noted that Fisher weighting incorporates class information when calculating feature weights; therefore, in this paper, we use it solely as a baseline that incorporates class-based prior information for the purpose of comparing the differences between traditional discriminative weighting and the structural weighting fusion method proposed in this paper.
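For reference, one common form of Fisher weighting scores each feature by the ratio of its between-class scatter to its within-class scatter. The Python sketch below follows that form; it is an assumption, since the exact Fisher weighting formula used for the baseline is not given here, and the function names and toy data are illustrative.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature ratio of between-class scatter to within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        # Class labels are required here, unlike in the proposed structural weighting.
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

def fisher_weights(X, y):
    """Normalize the scores so that the weights sum to one."""
    scores = fisher_scores(X, y)
    return scores / scores.sum()

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 5.0], [1.0, 7.0], [10.0, 6.0], [11.0, 4.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_scores(X, y)
weights = fisher_weights(X, y)
```

In the toy data the first feature receives a score of 100 and the second a score of 0.25, so almost all of the weight goes to the discriminative feature. The sketch also makes explicit why this baseline depends on class information: the class labels `y` enter every term.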

Based on this, 3D structural representations were generated for the original frequency-domain features, the Fisher-weighted features, and the weighted features constructed by the method proposed in this paper; the results of their spatial distributions are shown in Figure 3. It should be noted that the labels “type1” through “type7” in the figure are merely cluster IDs generated by the clustering algorithm or visualization labels and do not directly correspond to the original fault states “type 1” through “type 7” listed in Table 1. Fault names were not used as input during the clustering process; fault category information was used solely for interpreting the results after the experiment was completed.

Figure 3.

3D distribution of field measurement data using different feature representations: (a) original frequency-domain features, (b) Fisher-weighted features, and (c) the method proposed in this paper.

As shown in Figure 3(a), when the standardized original 12-dimensional frequency-domain features are used directly, the samples exhibit a certain degree of clustering, but the overall spatial structure remains relatively loose. Type 2, type 3, and type 4 are concentrated in the upper region, with relatively close spacing between clusters and localized adhesion at the boundaries; type 1 and type 6 are located in the lower region, and although the two types of samples can be broadly distinguished, the spatial separation between them is insufficient; type 5 exhibits significant stretching in 3D space, with some samples extending into adjacent regions; the point cloud distribution for type 7 is also relatively dispersed. The average contour coefficient is only 0.688, indicating that when raw frequency-domain features are directly incorporated into the 3D representation, they can reflect some differences between faults but struggle to form a stable, clear spatial structure. The primary reason is that the 12-dimensional frequency-domain features contain not only useful information related to the fault state but also some redundant and weakly discriminative information. When all features are treated equally during distance calculations and spatial embedding, key structural directions are prone to distortion, leading to stretching, merging, and blurred boundaries in the sample point cloud. Figure 3(b) shows the 3D structure after Fisher weighting. Compared to the original features, the overall distribution improved significantly. Types 1, 2, and 4 all formed relatively distinct clusters, the extent of the sample point cloud narrowed, and the average contour coefficient increased to 0.891. This indicates that Fisher weighting can enhance features with strong discriminative power and helps mitigate the overlap present in the original feature space. 
However, as can still be seen in the figure, types 3 and 6 are relatively close in certain local areas, while types 5 and 7 are adjacent to each other in the vertical direction, and their boundaries remain unclear; at the same time, the point clouds for types 2 and 5 still exhibit some stretching, indicating that there is room for improvement in intra-class compactness. This is because Fisher weighting primarily assigns weights based on the inter-class differences and intra-class variability of individual features; while it can highlight certain discriminative features, it does not address the redundant correlations among features. When multiple features exhibit similar spectral variation patterns, redundant information may still contribute to spatial embedding, causing certain categories to retain a stretched distribution or local adhesion in 3D space. Figure 3(c) shows the 3D structural results obtained using the method described in this paper. The samples from the seven fault categories form distinct, independent clusters in 3D space, with clear separation between categories and no significant overlap. Compared with the results for the original features, the adhesion among types 2, 3, and 4 in the upper region is significantly reduced, and the stretched distribution of type 5 is effectively contracted; compared with the Fisher-weighted results, the spatial boundaries between types 3 and 6, as well as between types 5 and 7, are more distinct, and the local adhesion of certain categories has been alleviated. Although the point cloud of type 3 covers a relatively large region, it still maintains a clear spatial separation from other categories. The corresponding average contour coefficient reached 0.935, indicating that the method described in this paper achieves a better balance between inter-class separation and intra-class clustering.

The results above indicate that the improvements in the method described in this paper go beyond simply assigning different weights to features. Instead, the method first uses 2D projection to characterize the contribution of each feature to the geometric structure of the samples, then identifies sets of features that change in tandem based on these structures, and finally performs weighted fusion within each group. This process reduces the interference of redundant features on distance relationships while preserving the independent representation of key spectral patterns, such as the main frequency, energy distribution, and impact characteristics. Therefore, under complex operating conditions and in the presence of noise interference, different fault states can form a more stable spatial distribution in 3D space.

Combining the results in Figure 3 with the contour coefficient values, the average contour coefficients for the original features, Fisher-weighted features, and the method proposed in this paper are 0.688, 0.891, and 0.935, respectively, showing a step-by-step improvement. The results for the original features indicate that the direct use of multidimensional frequency-domain features is prone to being influenced by redundant and weakly correlated information; the Fisher-weighted results indicate that traditional discriminant weighting can improve the sample distribution but still does not adequately capture the structural relationships among features; the method described in this paper reconstructs frequency-domain features by considering both geometric contributions and correlation coupling structures, resulting in a clearer and more stable distribution of samples in 3D space. The structurally weighted fusion method is therefore more effective at capturing the spatial differences between fault states under complex operating conditions.

Comparative analysis of 2D and 3D structural representations

To further analyze the necessity of 3D structural representation, this paper maps the weighted features obtained using the proposed method to both 2D and 3D spaces via t-SNE under identical feature construction and parameter conditions and employs K-means clustering to provide supplementary validation of the sample distribution in the sample space. The corresponding results are shown in Figure 4, where Figure 4(a) shows the 2D structural results, Figure 4(b) shows the 3D structural results from the default view, and Figure 4(c) shows the 3D structural results from a rotated view.

Figure 4.

Comparison of 2D and 3D representations of field measurement data: (a) 2D structural results, (b) 3D structural results (default view), and (c) 3D structural results (rotated view). 2D: two-dimensional; 3D: three-dimensional.

As shown in Figure 4(a), most fault categories in the 2D structure have formed relatively distinct clusters; the spatial distributions of types 1, 2, 5, and 7 are relatively clear, and the samples within each category exhibit a certain degree of compactness. However, localized compression still occurs in the 2D plane. In particular, types 4 and 6 are located in the middle-right region of the figure; the two types of samples lie close together at their shared boundary, and the point cloud of type 6 partially overlaps with that of type 4, which makes the two categories difficult to distinguish visually; although type 3 maintains a certain distance from other categories, its point cloud is locally dispersed, and there is still room for improvement in compactness. The average contour coefficient for these results is 0.865, indicating that the 2D structure can capture the basic distribution trends of the fault samples; however, it still struggles to fully represent the spatial hierarchy among samples in regions where categories are closely clustered. Figure 4(b) shows the 3D structure obtained under the same conditions. Compared with the 2D results, the 3D space adds one dimension of representation, allowing sample clusters that were relatively close together in the 2D plane to spread out further along the third dimension. As can be seen, types 1, 2, 3, 4, 5, and 6 all form distinct clusters in 3D space. The local overlap between types 4 and 6 in the 2D plot has been significantly reduced, and the spatial distance between them is now more clearly defined. At the same time, the average contour coefficient of the 3D structure increased to 0.935, indicating improvements in both inter-class separation and intra-class clustering. It should be noted that, under the default viewing angle shown in Figure 4(b), type 7 is not clearly visible, which may give the impression that this class is missing or overlaps with other classes. 
Based on 3D spatial characteristics, it can be concluded that this phenomenon is primarily caused by front-to-back occlusion resulting from the viewing angle, rather than complete overlap of the samples in the actual 3D space. In other words, a 3D plot rendered from a fixed viewpoint is still a projection; if the structural distribution is assessed from a single viewpoint only, the actual spatial distances between certain categories may be underestimated. To further validate this, Figure 4(c) shows the 3D structure from a rotated perspective. After rotation, the cluster of type 7 samples, which was obscured in the default view, becomes visible and maintains a distinct spatial position relative to the surrounding categories. At this point, all seven classes of fault samples can be clearly observed in 3D space, with no significant overlap between the classes. In particular, the spatial relationship between type 7 and adjacent categories becomes clearer after rotation, indicating that it is not confused with other categories but is simply occluded by the clusters in front of it in the default view. 3D representations therefore allow a more comprehensive visualization of the hierarchical relationships between samples, further revealing the distribution of categories that are compressed or obscured in a 2D plane.

As can be seen from Figure 4(a) through (c), although 2D representations can illustrate the overall distribution of faulty samples, they are prone to the effects of planar compression under complex operating conditions, resulting in the boundaries of some categories becoming blurred or locally overlapping. 3D structural representation, by increasing spatial freedom, allows different fault categories to be more fully displayed in space and enables the observation of the actual distances between categories and the visual occlusion caused by projections through rotation. Therefore, the use of a 3D representation in this paper is not merely to enhance visualization but to more comprehensively illustrate the distribution of complex fault samples in the feature space.

Verification of structural stability under different clustering methods

To further validate the stability of the constructed 3D feature structure, this paper introduces hierarchical clustering and spectral clustering for comparative validation under the previously selected parameter conditions. It should be noted that the focus of this section is not on comparing the merits of different clustering algorithms, but rather on examining whether, within the same weighted feature space, samples can maintain a relatively consistent spatial separation pattern under different clustering criteria. If a 3D structure is effective only for a specific clustering algorithm, its stability is limited; if all clustering methods yield relatively clear distribution results, this further demonstrates that the feature space constructed in this paper possesses good intrinsic separability.

In the experiment, the parameter combinations determined in the “Comparative analysis of different feature representation methods” section were again used. Both hierarchical clustering and spectral clustering were performed on the 3D structural space obtained by the method described in this paper, and the results are shown in Figure 5.

Figure 5.

Results of 3D structure validation using different clustering methods based on field measurement data: (a) hierarchical clustering and (b) spectral clustering. 3D: three-dimensional.

As shown in Figure 5(a), after hierarchical clustering, the seven sample categories still form relatively distinct clusters, with clear spatial separation between them and no significant overlap. Some categories are relatively close to one another in spatial terms, but their boundaries remain fairly distinct, and there is no noticeable blurring within the point cloud. Hierarchical clustering primarily involves the gradual merging of clusters based on the distance relationships between samples, without relying on the assumption of centroid partitioning found in K-means. Therefore, these results indicate that the 3D feature space constructed in this study can still achieve good separation performance under a clustering method based on hierarchical distance relationships. Figure 5(b) shows the results of spectral clustering. It can be seen that, under spectral clustering, samples in each category still exhibit a relatively compact clustered distribution, with distinct gaps between clusters; the overall spatial pattern is largely consistent with the results of K-means and hierarchical clustering. Spectral clustering focuses more on the adjacency relationships and local connectivity structures among samples. These results indicate that the 3D feature space obtained by the method proposed in this paper not only exhibits good separability in terms of global distance distribution but also effectively preserves the differences between various fault states in terms of local neighborhood structures. It should be noted that both hierarchical clustering and spectral clustering are unsupervised classification processes; during clustering, only 3D coordinate information is used, without incorporating fault category information. The cluster labels generated by different clustering methods are used solely for display purposes; the order of these labels has no specific meaning. Therefore, the colors in the figure may not correspond exactly to the K-means results. 
This merely reflects differences in how the algorithms number their clusters; it does not affect the interpretation of the spatial structure.
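Because cluster numbering is arbitrary, two clustering results should be compared as partitions rather than label by label. The following Python sketch checks whether two labelings group the samples identically up to a renaming of the cluster IDs. The function name `same_partition` and the example labelings are hypothetical and are not outputs of the experiments in this paper.

```python
def same_partition(labels_a, labels_b):
    """True if two labelings group the samples identically, ignoring label names."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in one result must map to exactly one label in the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True

kmeans_ids = [1, 1, 2, 2, 3, 3]
hierarchical_ids = [3, 3, 1, 1, 2, 2]   # same groups, different numbering
identical_groups = same_partition(kmeans_ids, hierarchical_ids)
```

Here the two labelings differ in every position yet describe the same three groups, so the check succeeds. A labeling that merges or splits a group would fail the check.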

Overall, K-means, hierarchical clustering, and spectral clustering all yield clear clustering structures in the same weighted 3D feature space. In particular, the average silhouette coefficient was 0.935 for each of the three methods, indicating that the 3D structure does not depend on any specific clustering criterion. In other words, the clustering algorithms primarily serve a structural validation function in this paper; the fundamental reason the samples can be stably separated is that the preceding feature structure analysis and weighted fusion enhance the expression of differences in the frequency domain.
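The cross-method check described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic 3D points, not the data or code of this study: the helper names `make_demo_points` and `validate_structure`, the cluster layout, and the RBF `gamma` value are assumptions chosen for the example. The adjusted Rand index is included because it is unaffected by the order in which an algorithm numbers its clusters.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.metrics import silhouette_score, adjusted_rand_score


def make_demo_points(n_clusters=7, n_per_cluster=40, spread=0.5, seed=0):
    """Synthetic stand-in for a 3D embedding (hypothetical data, not the paper's)."""
    rng = np.random.default_rng(seed)
    # Well-separated centres: consecutive centres differ by 10 units on every axis used.
    centres = np.array(
        [[10.0 * i, 10.0 * (i % 2), 10.0 * (i % 3)] for i in range(n_clusters)]
    )
    return np.vstack(
        [c + spread * rng.standard_normal((n_per_cluster, 3)) for c in centres]
    )


def validate_structure(points, n_clusters=7, seed=0):
    """Cluster the same 3D points with three criteria and score each partition."""
    models = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "hierarchical": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        # gamma=0.05 is an illustrative kernel width, not a value from the paper.
        "spectral": SpectralClustering(
            n_clusters=n_clusters, affinity="rbf", gamma=0.05, random_state=seed
        ),
    }
    labels = {name: model.fit_predict(points) for name, model in models.items()}
    scores = {name: float(silhouette_score(points, lab)) for name, lab in labels.items()}
    return labels, scores


points = make_demo_points()
labels, scores = validate_structure(points)
# Label numbering differs between algorithms; the adjusted Rand index ignores it.
agreement = adjusted_rand_score(labels["kmeans"], labels["hierarchical"])
```

If the three silhouette scores are close and the pairwise agreement is high, the separation is a property of the point distribution rather than of one clustering criterion, which is the argument made in the text.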

3D structural analysis and explanation of failure mechanisms

To avoid relying solely on 3D scatter plots for qualitative interpretation, this paper supplements the analysis of the 3D structure, based on the K-means clustering results, with the net inter-cluster distances, the distribution of intra-cluster radii, and the physical significance of the feature groups. It should be noted that C₁–C₇ in this section refer only to the cluster IDs output by the K-means algorithm, which are used to describe the spatial distribution; they do not directly correspond to the original fault category IDs in Table 1. Figure 6 shows the net inter-cluster distances obtained from the field measurement data.

Figure 6.

Net separation distances between clusters based on field measurement data.

As shown in Figure 6, the net inter-cluster distances are all positive, indicating that the clusters in the current 3D structure remain spatially separated and do not overlap significantly. The distances between cluster pairs such as C₅–C₇, C₂–C₄, C₄–C₇, and C₆–C₇ are relatively large, indicating that these states are well separated in 3D space. In contrast, the distances between pairs such as C₄–C₆, C₁–C₇, and C₃–C₅ are relatively small: these clusters are distinct, but their boundaries lie close together, and they correspond to state combinations that are likely to produce similar frequency-domain responses. These results are largely consistent with the spatial distribution observed in the scatter plots presented earlier, indicating that the 3D structure obtained with the proposed method is not merely visually dispersed but also shows measurable separation along the boundaries between clusters. Figure 7 further presents the kernel density estimates of the distances from the samples in each cluster to the cluster center.
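A pairwise net-distance matrix of the kind discussed above can be sketched as follows. The definition used here is an assumption made for illustration and may differ from the one adopted in this paper: the net distance is taken as the centroid-to-centroid distance minus the two cluster radii, with each radius set to the 95th percentile of the member-to-centroid distances. The function names and the percentile `q` are hypothetical.

```python
import numpy as np


def cluster_radii(points, labels, q=95.0):
    """Centroid and q-th percentile radius of every cluster (assumed definition)."""
    ids = np.unique(labels)
    centres = np.array([points[labels == k].mean(axis=0) for k in ids])
    radii = np.array(
        [
            np.percentile(np.linalg.norm(points[labels == k] - c, axis=1), q)
            for k, c in zip(ids, centres)
        ]
    )
    return ids, centres, radii


def net_distances(points, labels, q=95.0):
    """Centroid distance minus both radii; positive values mean a gap remains."""
    ids, centres, radii = cluster_radii(points, labels, q)
    gap = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    net = gap - radii[:, None] - radii[None, :]
    np.fill_diagonal(net, 0.0)  # a cluster has no net distance to itself
    return ids, net


# Toy example: two clusters of radius 1 whose centres are 10 apart.
demo_points = np.array(
    [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [9.0, 0.0, 0.0], [11.0, 0.0, 0.0]]
)
demo_labels = np.array([0, 0, 1, 1])
demo_ids, demo_net = net_distances(demo_points, demo_labels)
```

Under this definition, a positive entry means the two clusters do not touch even after allowing for their spread, and small positive entries flag cluster pairs whose boundaries lie close together.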

Figure 7.

Kernel density estimates of the sample-to-center distances within each cluster based on field measurement data: (a) cluster 1, (b) cluster 2, (c) cluster 3, (d) cluster 4, (e) cluster 5, (f) cluster 6, and (g) cluster 7.

As shown in Figure 7, the radius distributions of most clusters exhibit a single main peak or a group of closely spaced peaks, indicating that samples within the same cluster generally concentrate around the central region. In particular, the peaks of the C₅ and C₇ curves occur at small radii and decline rapidly, indicating good intra-class compactness; the distributions of C₂ and C₄ are also relatively concentrated, with limited dispersion. In contrast, the radius distributions for C₁, C₃, and C₆ are broader, indicating that these clusters still spread out to some extent. This phenomenon is related to the complexity of the field measurement data: field signals are affected by fluctuations in rotational speed, changes in load, and ambient noise, so even under similar operating conditions there may be some variation in spectral amplitude, energy distribution, and impact components. Consequently, the distribution within certain categories may span a relatively wide range of radii. In terms of feature structure, under optimal parameter conditions, the proposed method consolidates the 12-dimensional frequency-domain features into eight feature groups, as shown in Table 3.
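The intra-cluster radius density discussed above can be estimated along the following lines. This is a sketch using `scipy.stats.gaussian_kde` with its default bandwidth on synthetic points; the function name `radius_density`, the grid extent, and the demo data are assumptions, and the bandwidth choice used for Figure 7 is not stated in this section.

```python
import numpy as np
from scipy.stats import gaussian_kde


def radius_density(points, labels, cluster_id, grid_size=200):
    """Kernel density estimate of member-to-centroid distances for one cluster."""
    members = points[labels == cluster_id]
    radii = np.linalg.norm(members - members.mean(axis=0), axis=1)
    # Evaluate on a grid that extends slightly past the largest radius.
    grid = np.linspace(0.0, radii.max() * 1.2, grid_size)
    density = gaussian_kde(radii)(grid)  # default (Scott) bandwidth
    return grid, density, radii


# Synthetic single cluster standing in for one K-means cluster (hypothetical data).
rng = np.random.default_rng(1)
demo_points = rng.standard_normal((200, 3))
demo_labels = np.zeros(200, dtype=int)
grid, density, radii = radius_density(demo_points, demo_labels, cluster_id=0)
peak_radius = grid[np.argmax(density)]
```

A peak at a small radius followed by a fast decay corresponds to a compact cluster, while a broad or multi-peaked curve corresponds to the wider spread described for C₁, C₃, and C₆.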

Table 3.

Composition and physical meaning of the feature groups in field measurement data.

Feature group	Composition of features	Main physical significance
G₁	Main frequency	Changes in the dominant frequency
G₂	Spectral energy + spectral mean + spectral standard deviation	Fluctuations in energy levels and amplitude
G₃	Bandwidth	Expansion of the frequency band
G₄	Spectral kurtosis + spectral peakiness + spectral skewness	Impulsive, spiking, and asymmetric distributions
G₅	Mean instantaneous frequency	Non-stationary frequency variations
G₆	Harmonic ratio	Changes in harmonic components
G₇	Spectral centroid	Shift in the spectral centroid
G₈	Mean square frequency	High-frequency energy distribution

As shown in Table 3, the merged feature groups are not random combinations but correspond clearly to the spectral response of the fault vibration signal. G₁, G₆, G₇, and G₈ primarily reflect changes in the frequency structure, including the location of the dominant frequency, the harmonic components, the spectral centroid, and the distribution of high-frequency energy. When gear meshing abnormalities, shaft-related failures, or an increase in bearing characteristic frequency components occur, the spectrum typically exhibits changes in the dominant frequency or harmonic components. Consequently, these features are not merged with others in the grouping, indicating that each serves as an independent indicator of different operating conditions. G₂ and G₃ primarily describe changes in energy amplitude and bandwidth expansion. When equipment experiences faults such as wear, spalling, or fracture, the vibration energy and frequency spectrum often change; energy and bandwidth features can therefore reflect the overall increase in response caused by the fault. G₄ is composed of spectral kurtosis, spectral peakiness, and spectral skewness and primarily corresponds to phenomena such as local impacts, peak enhancement, and skewed spectral distribution. It is consistent with the impact-type responses caused by bearing pitting, spalling, and localized damage in gears. G₅ reflects frequency response characteristics under non-stationary conditions and is more sensitive to fluctuations in rotational speed and load variations in field data.
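To make the feature names above concrete, the following sketch computes several of them from a single vibration window. The formulas are common textbook definitions based on the one-sided power spectrum and are assumptions for illustration; the exact definitions used to build the 12-dimensional feature matrix in this paper may differ. The function name `spectral_features` and the test tone are hypothetical.

```python
import numpy as np


def spectral_features(x, fs):
    """A few frequency-domain statistics of one window (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC offset so it does not dominate the spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()  # assumes the window is not constant (total > 0)
    p = power / total  # normalised power, used as a weight over frequency
    centroid = np.sum(freqs * p)
    amp = np.sqrt(power)
    mu, sigma = amp.mean(), amp.std()
    return {
        "main_frequency": float(freqs[np.argmax(power)]),
        "spectral_energy": float(total),  # unnormalised sum of power bins
        "spectral_centroid": float(centroid),
        "bandwidth": float(np.sqrt(np.sum((freqs - centroid) ** 2 * p))),
        "mean_square_frequency": float(np.sum(freqs**2 * p)),
        "spectral_kurtosis": float(np.mean((amp - mu) ** 4) / sigma**4),
        "spectral_skewness": float(np.mean((amp - mu) ** 3) / sigma**3),
    }


# Test tone placed exactly on FFT bin 100 so the spectrum has a single line.
fs, n = 1000.0, 2048
f0 = 100 * fs / n
t = np.arange(n) / fs
features = spectral_features(np.sin(2 * np.pi * f0 * t), fs)
```

For a pure tone, the main frequency and the spectral centroid coincide and the bandwidth is close to zero; broadband impacts raise the bandwidth and mean square frequency, which is the behavior the energy, bandwidth, and impact groups are meant to capture.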

As can be seen from the above results, the feature sets obtained using the method described in this paper not only reflect the statistical correlations among the features but also preserve the primary response patterns of different failure mechanisms in the frequency spectrum. The frequency structure group, energy statistics group, bandwidth expansion group, and impact characteristics group describe the differences in fault signals from different perspectives, enabling the samples to form a relatively clear distribution structure in 3D space. When different faults exhibit significant differences in their impulse components, energy levels, or frequency structures, the corresponding samples are more likely to form compact and well-separated spatial clusters; when the spectral response patterns of certain faults are similar, the distance between their clusters is also relatively small. It should be emphasized that this paper does not equate the geometric distance in the t-SNE embedding space directly with actual physical distance, nor does it rely solely on 3D scatter plots as the basis for determining failure mechanisms. This paper uses 3D structural results as an auxiliary observational tool and interprets them in conjunction with the composition of feature groups and their corresponding spectral response mechanisms. This demonstrates that the separation observed in the 3D structure is not merely a result of visualization effects but is related to physical characteristics such as energy distribution, bandwidth expansion, impulse response, and changes in frequency structure under different fault conditions, thereby enhancing the engineering interpretability of the constructed 3D structure.

Validation using the Southeast University public dataset

To further evaluate the adaptability of the method described in this paper across different data sources, we selected the publicly available fault dataset from Southeast University for additional validation. This dataset includes four states: Ball fault, Healthy, Inner fault, and Outer fault, with a sampling frequency of 1000 Hz. The experiment used a sliding window method to partition the samples, with a window length of 2048 data points, an overlap length of 1536 data points, and a step size of 512 data points between adjacent windows, resulting in a total of 336 sample groups. Compared with field measurement data, this dataset was collected under relatively stable conditions with minimal noise interference; however, it differs in terms of fault types, sample size, and spectral distribution characteristics. Therefore, it can be used to evaluate the applicability of the method described in this paper under various data conditions.
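The sliding-window partitioning described above can be sketched with NumPy as follows. The window length of 2048 points and the overlap of 1536 points (a step of 512 points) are taken from the text; the function name `segment_signal` and the demo signal are hypothetical, and the per-state record lengths that lead to the 336 samples are not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_signal(signal, window=2048, overlap=1536):
    """Split a 1D signal into overlapping windows; step = window - overlap."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window length")
    signal = np.asarray(signal)
    if signal.size < window:
        return np.empty((0, window), dtype=signal.dtype)
    # All windows with stride 1, then keep every `step`-th one.
    return sliding_window_view(signal, window)[::step].copy()


# A record of N points yields floor((N - window) / step) + 1 windows.
demo_signal = np.arange(4096)
segments = segment_signal(demo_signal)
```

With a 4096-point record, this gives (4096 − 2048) / 512 + 1 = 5 windows, each sharing 1536 points with its neighbor.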

The experiment continues to use the 12-dimensional frequency-domain statistical features constructed earlier as input, and the feature matrix is normalized. Because the sample size of this dataset is smaller than that of the field measurement data, the parameter search range is narrowed accordingly: the threshold θ is searched over [0.94, 0.96] with a step size of 0.01, and the t-SNE perplexity is searched over [15, 25] with a step size of 1. Under each parameter combination, K-means, hierarchical clustering, and spectral clustering were used for validation. It should be noted that the dimension reduction and clustering calculations use only the frequency-domain feature matrix; fault names are not included in the model calculations. The category names shown in the figure are used solely for labeling the results and explaining their engineering significance. Under optimal parameter settings, the proposed method consolidates the 12-dimensional frequency-domain features of the Southeast University dataset into nine feature groups; the specific results are shown in Table 4.
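The narrowed search range above amounts to 3 threshold values times 11 perplexity values, or 33 combinations. A minimal sketch of enumerating and scanning that grid is given below; `parameter_grid`, `best_setting`, and the placeholder scoring function are hypothetical names introduced for illustration, and in practice the score would be the silhouette coefficient obtained after grouping, fusion, t-SNE embedding, and clustering.

```python
from itertools import product


def parameter_grid(theta_range=(0.94, 0.96), theta_step=0.01, perplexity_range=(15, 25)):
    """Enumerate (theta, perplexity) pairs; theta is rounded to two decimals."""
    n_theta = int(round((theta_range[1] - theta_range[0]) / theta_step)) + 1
    thetas = [round(theta_range[0] + i * theta_step, 2) for i in range(n_theta)]
    perplexities = list(range(perplexity_range[0], perplexity_range[1] + 1))
    return list(product(thetas, perplexities))


def best_setting(grid, score_fn):
    """Return the pair with the highest score; score_fn(theta, perplexity) -> float."""
    return max(grid, key=lambda pair: score_fn(*pair))


def toy_score(theta, perplexity):
    """Placeholder score that peaks at (0.95, 20); not a result from the paper."""
    return -abs(theta - 0.95) - abs(perplexity - 20) / 100.0


grid = parameter_grid()
best = best_setting(grid, toy_score)
```

Passing the scoring function as an argument keeps the enumeration independent of the clustering method, so the same grid can be reused for K-means, hierarchical clustering, and spectral clustering.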

Table 4.

Results of feature grouping for the Southeast University dataset and their main physical meanings.

Feature group	Composition of features	Main physical significance
G₁	Main frequency	Dominant frequency component
G₂	Spectral energy	Overall vibration energy level
G₃	Bandwidth + spectral peakiness	Frequency band expansion and peak concentration
G₄	Spectral kurtosis + spectral skewness	Impulsive and asymmetric spectral distribution
G₅	Mean instantaneous frequency	Non-stationary frequency variation
G₆	Harmonic ratio	Harmonic component variation
G₇	Spectral mean	Average spectral amplitude level
G₈	Spectral standard deviation	Fluctuation of spectral amplitude
G₉	Spectral centroid + mean square frequency	Spectral center shift and high-frequency energy distribution

As shown in Table 4, whereas eight feature groups were identified in the field measurement data, the Southeast University dataset yields nine feature groups. These results demonstrate that the proposed method does not rely on a fixed, manually defined combination of features but automatically adjusts the grouping according to the correlations among frequency-domain features in each dataset. Energy-related statistical features in the field measurement data tend to be more strongly correlated, whereas some energy and amplitude statistical features in the Southeast University data remain independent, reflecting differences between the two datasets in signal fluctuation and spectral distribution characteristics. The distribution of 3D structures under different clustering methods is shown in Figure 8.

Figure 8.

Distribution of 3D structures in the Southeast University dataset under different clustering methods: (a) K-means, (b) hierarchical clustering, and (c) spectral clustering. 3D: three-dimensional.

As shown in Figure 8, the four sample categories all form distinct clusters in 3D space. The samples in the “Healthy” class are compactly distributed and clearly separated from the three fault states, and there is no significant overlap among the Ball fault, Inner fault, and Outer fault classes. Compared with the field measurement data, the clusters in the Southeast University dataset are more concentrated, which is attributed to the relatively stable collection environment and minimal variations in operating conditions. Under the optimal parameter conditions for each clustering method, the silhouette coefficients for K-means, hierarchical clustering, and spectral clustering were 0.934, 0.940, and 0.942, respectively. The results of the three methods are similar, indicating that this 3D structure maintains a stable distribution pattern under different clustering criteria.

Overall, the validation results using the Southeast University dataset indicate that the method proposed in this paper can still produce a relatively clear 3D structural distribution across different data sources, fault types, and sample sizes. The number of feature groups was adjusted from eight groups based on field measurement data to nine groups based on the Southeast University dataset, further demonstrating that the method can adapt to the frequency-domain correlation structure inherent in the data. This result validates the stability and applicability of the method proposed in this paper across different datasets.
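The way the number of feature groups follows the correlation structure of each dataset can be illustrated with the sketch below. It is an assumed, simplified stand-in for the grouping step and not the procedure of this paper: features are linked when their Pearson correlation is at least θ, groups are the connected components of that link graph, and each group is fused by an equal-weight mean of standardized features in place of the weighted fusion. The function names and the demo matrix are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def group_features(X, theta=0.95):
    """Group columns of X whose Pearson correlation is >= theta (assumed rule)."""
    corr = np.corrcoef(X, rowvar=False)
    adjacency = csr_matrix(corr >= theta)  # links only positively correlated pairs
    n_groups, group_of = connected_components(adjacency, directed=False)
    return [np.flatnonzero(group_of == g).tolist() for g in range(n_groups)]


def fuse_groups(X, groups):
    """Equal-weight fusion of standardized features (placeholder for weighted fusion)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant column
    return np.column_stack([Z[:, g].mean(axis=1) for g in groups])


# Hypothetical feature matrix: columns 0 and 1 move together, 2 and 3 are independent.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 300))
X_demo = np.column_stack([a, 2.0 * a + 0.01 * rng.standard_normal(300), b, c])
groups = group_features(X_demo, theta=0.95)
fused = fuse_groups(X_demo, groups)
```

Because the groups are derived from the data, the same threshold can give eight groups on one dataset and nine on another, depending on which features happen to be strongly correlated.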

Conclusion

To address the weak spectral differences, ambiguous sample boundaries, and difficulty of directly distinguishing fault conditions in gearbox vibration signals under complex operating conditions, this paper proposes a 3D structural analysis method based on frequency-domain statistical features for structural identification and distribution analysis of vibration data under different operating conditions. Experimental analysis of the field measurement data and the publicly available Southeast University dataset shows that different fault states form structurally stable, well-defined sample distributions in the 3D feature space. For the field measurement data, the silhouette coefficient reaches 0.935 under optimal parameter conditions, which is significantly better than the results obtained with the original frequency-domain features and the traditional Fisher weighting method; for the public dataset, the silhouette coefficient remains high across the different clustering methods, indicating that the method effectively enhances the spatial separability between operating states. Comparative analysis with K-means, hierarchical clustering, and spectral clustering shows that the shape of the sample distribution remains consistent across clustering methods, indicating that the resulting structure does not depend on a single clustering algorithm and exhibits good stability and applicability. The resulting feature groups show a certain correspondence with fault vibration characteristics such as changes in the dominant frequency, energy fluctuations, bandwidth expansion, impact spikes, harmonic components, and shifts in the spectral centroid; this indicates that the 3D structure developed in this paper not only achieves effective separation in the sample space but can also be explained in terms of changes in frequency-domain features and the mechanisms of mechanical vibration.
From an engineering perspective, analyzing the distribution of the 3D structure can help identify clusters of anomalous samples and trace potential sources of failure, thereby supporting the analysis of failure mechanisms and the identification of potential failure locations. When structures that deviate significantly from the normal state appear in 3D space, targeted inspections of key components can be conducted in conjunction with the corresponding feature changes, providing a basis for equipment inspections, adjustments to maintenance schedules, and preventive maintenance. Furthermore, the feature structure developed in this paper can be used not only for fault state analysis under complex operating conditions but also as input for subsequent intelligent diagnostic models, providing a more stable, distinguishable, and physically meaningful feature foundation for fault identification and condition prediction.

Footnotes

Author contributions

Dong Zhang: Writing–Review and Revision, Writing–Original Draft, Visualization, Methodology, Formal Analysis, Data Processing, Conceptualization. Junbing Qian: Funding acquisition, Project administration, Review and editing, Resources, Supervision. Tingting Liu: Supervision, Validation. Zexian Li: Supervision, Validation.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China (grant no. 62363018), the China General Nuclear Power Group Chuxiong Dayao Wind Power Co., Ltd (grant no. 020-XN10-B-2024-C45-P.N.99-00015), and the Discipline Development Project of the School of Civil Aviation and Aeronautics, Kunming University of Science and Technology.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data cannot be made publicly available upon publication because the cost of preparing, depositing and hosting the data would be prohibitive within the terms of this research project. The data that support the findings of this study are available upon reasonable request from the authors.
