An AI vision-based steel wire rope joint twitch detection analysis method
By using AI visual inspection methods to continuously track and finely analyze the pulling motion of wire rope joints, the problems of missed detection and misjudgment in traditional inspection methods are solved, and high-precision, low-misjudgment joint structure safety monitoring is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANXI DEDICATED MEASUREMENT CONTROL CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
Figure CN122243966A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of AI vision and image analysis technology, and more specifically, to a method for detecting and analyzing the pulling of wire rope joints based on AI vision.
Background Technology
[0002] As a key component for connecting and transmitting forces in wire ropes, the installation accuracy and positional stability of wire rope joints directly affect the safe operation of the overall equipment. During the production, installation, and use of wire ropes, joints are prone to positional displacement (i.e., pulling) due to factors such as vibration and uneven stress. If not detected in time, this may lead to joint loosening, accelerated wear of the wire rope, or even safety accidents.
[0003] As the most complex and fatigue-accumulating critical component in the entire wire rope structure, the wire rope joint is highly susceptible to localized loosening, inner wire breakage, and structural slippage under long-term alternating loads, impact loads, and frequent start-stop conditions. During operation, this manifests as transient, sudden, and small-amplitude but impactful jerking behavior. This jerking often occurs in a very short time and within a limited spatial range, and is highly superimposed on the overall periodic vibration and oscillation of the wire rope and the movement of background equipment. Traditional detection methods based on manual inspection or single-point sensor data acquisition struggle to accurately capture its occurrence, leading to missed detections, misjudgments, or delayed responses. Existing vision-based detection methods primarily focus on identifying surface wear, broken wires, and cosmetic defects in the wire rope, lacking specific identification capabilities for transient jerking occurring at the joint area under high-speed operation. They fail to systematically characterize the jerking behavior from the perspectives of motion trajectory, temporal changes, and multi-frame correlation, resulting in insufficient early detection capabilities for potential structural failures at the joint.
[0004] Therefore, there is a need for an AI visual inspection method that can continuously track, finely analyze, and reliably determine the pulling of wire rope joints in complex industrial environments and multi-wire rope parallel operation conditions, so as to achieve real-time detection of the safety status of the joint structure.
Summary of the Invention
[0005] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide an AI vision-based method for detecting and analyzing the pulling of wire rope joints to solve the problems mentioned in the background art.
[0006] To achieve the above objectives, the present invention provides the following technical solution: a method for detecting and analyzing the pulling motion of wire rope joints based on AI vision, including the following steps:
S1. Extract images containing multiple parallel-running steel wire ropes and their joint areas, and resample the images into two-dimensional unfolded images according to the longitudinal centerline direction of each steel wire rope.
S2. In the two-dimensional unfolded image of the initial frame, segment the structural feature regions of the wire rope joint area, and form strip-shaped tracking regions distributed horizontally along the image by clustering and merging.
S3. Align each frame's two-dimensional unfolded image with the preset initial template using a homography, and extract the joint key point set in each strip tracking area.
S4. Construct a trajectory sequence from the center coordinates of the historical key point sets of each strip tracking area, and process the trajectory sequence with an interacting multiple model (IMM) Kalman filter to predict the expected position of each joint in the current frame.
S5. Based on the geometric center of the joint key point set in the current frame and the predicted position, construct a bipartite graph model. Combine the appearance descriptor similarity of the key point sets to construct a probabilistic data association matrix, and solve the matching with the Hungarian algorithm to update the trajectory sequences of all joints.
S6. Based on the updated trajectory sequences, calculate the longitudinal displacement difference sequence of each joint across multiple frames and perform change point detection, marking the detected change points as candidate abnormal events.
S7. Cluster all candidate abnormal events, merge candidate abnormal events belonging to the same strip tracking area, and output the spatiotemporal image area corresponding to each merged abnormal event as the wire rope joint twitching abnormal area.
[0007] As a further aspect of the present invention, in step S1, resampling the image into a two-dimensional unfolded image specifically includes: A sequence of original images of multiple steel wire ropes arranged in parallel is acquired from a top-down perspective, and geometric distortion correction is performed on each frame of the sequence. In the distortion-free image, the longitudinal centerline of each wire rope is extracted by fitting based on the edge features and spatial arrangement direction of the wire rope. Using the longitudinal centerline of each wire rope as a reference, samples are taken at equal intervals along the direction perpendicular to the centerline, and the sampled pixel values are arranged in sampling order to generate the two-dimensional unfolded image of the wire rope.
[0008] As a further aspect of the present invention, in step S2, forming a strip-shaped tracking region distributed horizontally along the image specifically includes: Convolve the two-dimensional unfolded image of the initial frame, calculate the phase consistency (phase congruency) score of each pixel across different directions and scales, generate a phase consistency response map, and perform threshold segmentation and a morphological closing operation to obtain connected structural feature regions. Extract the location coordinates of each structural feature region in the image and the average gray value within the region to construct a feature matrix. Calculate the similarity between each pair of structural feature regions based on the feature matrix and construct a similarity matrix. Perform eigendecomposition on the similarity matrix, and select the eigenvectors corresponding to the largest few eigenvalues to construct a low-dimensional embedding space. In this space, perform K-means clustering on all structural feature regions. Merge the regions that are continuously distributed along the transverse direction of the wire rope in the clustering results to form a strip-shaped tracking region.
[0009] As a further aspect of the present invention, in step S3, extracting the joint key point set within each strip-shaped tracking area specifically includes: Feature points and their histogram of oriented gradients (HOG) descriptor vectors are extracted from the two-dimensional unfolded images of the preset initial template and the current frame, respectively. Feature points are matched based on the distance between descriptor vectors. The homography matrix is estimated iteratively from the matched point pairs using the random sample consensus (RANSAC) algorithm, and the matrix with the most inliers is selected as the registration matrix. The registration matrix is applied to perform a perspective transformation on the two-dimensional unfolded image of the current frame so that it aligns with the initial template. For each strip tracking region, a difference-of-Gaussian scale space is constructed and extreme points are detected in the aligned image to determine the location coordinates and scale parameters of the key points. A principal direction is assigned to each key point based on the distribution of image gradient directions within its neighborhood. Based on the scale parameter and the principal direction, a gradient direction histogram is calculated within the rotation-invariant neighborhood of the key point to generate its scale-invariant feature transform (SIFT) descriptor. All key points and descriptors are then combined to form the key point set.
[0010] As a further aspect of the present invention, in step S4, predicting the expected position of each joint in the current frame specifically includes: From the historical trajectory of each strip tracking area, extract the corresponding key point set center coordinate sequence as the trajectory sequence of the joint. Initialize, for each joint, a filter model set containing a uniform motion (constant-velocity) model and a uniform acceleration (constant-acceleration) model. In each filtering cycle, perform state prediction based on each model's state estimate at the previous time step, and calculate the predicted state and prediction covariance of each model. Calculate the model likelihood function based on the innovation covariance between the joint trajectory sequence point of each frame and the predicted state of each model, and update the interaction probability of each model. Weight and fuse the predicted states and prediction covariances of all models based on the updated interaction probabilities, and use the positional component of the fused state vector as the expected position.
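The likelihood-weighted fusion in step S4 can be illustrated with a minimal one-dimensional sketch. It is not the patented implementation: it assumes scalar positions, a zero-mean Gaussian likelihood for the innovation (measurement minus predicted position), and it omits the IMM mixing step and the per-model Kalman correction. The names `gaussian_likelihood` and `imm_update_and_fuse` are hypothetical.

```python
import math

def gaussian_likelihood(innovation, innovation_var):
    """Likelihood of a scalar innovation under a zero-mean Gaussian (assumed form)."""
    norm = math.sqrt(2.0 * math.pi * innovation_var)
    return math.exp(-0.5 * innovation ** 2 / innovation_var) / norm

def imm_update_and_fuse(measurement, predictions, prior_probs):
    """Update per-model probabilities and fuse the predicted positions.

    predictions: list of (predicted_position, innovation_variance), one per model.
    prior_probs: model probabilities before seeing the measurement.
    Returns (updated_probabilities, fused_position).
    """
    likelihoods = [gaussian_likelihood(measurement - pos, var)
                   for pos, var in predictions]
    weighted = [lik * p for lik, p in zip(likelihoods, prior_probs)]
    total = sum(weighted)
    probs = [w / total for w in weighted]
    fused = sum(p * pos for p, (pos, _) in zip(probs, predictions))
    return probs, fused

# Hypothetical numbers: the constant-velocity model predicts 100.0 px,
# the constant-acceleration model predicts 103.0 px, and 102.5 px is observed.
probs, fused = imm_update_and_fuse(102.5, [(100.0, 4.0), (103.0, 4.0)], [0.5, 0.5])
```

In this example the constant-acceleration model explains the observation better, so it receives the larger probability and the fused position lies closer to its prediction.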
[0011] As a further aspect of the present invention, in step S5, updating the trajectory sequence of all joints specifically includes: Calculate the geometric center of the key point set within each strip tracking region of the current frame as the observation position, and simultaneously obtain the predicted position of each joint; Based on the Mahalanobis distance between the observed and predicted locations and the cosine distance between the appearance descriptor vectors of the corresponding keypoint sets, the association cost between each observed and predicted location is calculated. Using the observed position and the predicted position as two sets of vertices in the bipartite graph, and the association cost as the edge weight connecting the vertices, a complete bipartite graph model is constructed. The bipartite graph model is transformed into a cost matrix, and the optimal matching of the cost matrix is solved using the Hungarian algorithm to obtain the pairing relationship between the observed position and the predicted position. Based on the pairing relationship, the observed position is updated to the corresponding joint trajectory sequence.
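A minimal sketch of the association in step S5 follows. It makes several assumptions that the text does not specify: positions are one-dimensional (so the Mahalanobis distance reduces to the absolute difference divided by the standard deviation), the two distances are combined by a fixed weight of 0.5, and the dictionary keys `pos`, `sigma`, and `desc` are hypothetical. The `hungarian` function is a standard potential-based O(n³) formulation of the Hungarian algorithm.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def association_cost(obs, pred, weight=0.5):
    # 1-D Mahalanobis distance plus appearance (cosine) distance; weight is assumed.
    maha = abs(obs["pos"] - pred["pos"]) / pred["sigma"]
    return weight * maha + (1.0 - weight) * cosine_distance(obs["desc"], pred["desc"])

def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.
    Returns match, where match[i] is the column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:                        # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match

observations = [{"pos": 10.2, "desc": [1.0, 0.0]}, {"pos": 50.5, "desc": [0.0, 1.0]}]
predictions = [{"pos": 50.0, "sigma": 1.0, "desc": [0.0, 1.0]},
               {"pos": 10.0, "sigma": 1.0, "desc": [1.0, 0.0]}]
cost = [[association_cost(o, p) for p in predictions] for o in observations]
match = hungarian(cost)  # match[i] = index of the prediction paired with observation i
```

Here observation 0 is paired with prediction 1 and observation 1 with prediction 0, since those pairs agree in both position and appearance.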
[0012] As a further aspect of the present invention, in step S6, calculating the longitudinal displacement difference sequence of each joint across multiple frames, performing change point detection, and marking the detected change points as candidate abnormal events specifically includes: Extract the longitudinal components of the position coordinates from each updated joint trajectory sequence to form the longitudinal coordinate sequence of the joint. Difference the longitudinal coordinate sequence between adjacent frames to obtain a longitudinal displacement difference sequence describing the displacement change between frames. Process the longitudinal displacement difference sequence with a change point detection algorithm based on the Bayesian information criterion to identify change points where the statistical characteristics of the sequence change abruptly. Associate each detected change point with its corresponding frame, joint, and the identifier of the strip tracking region to which it belongs, and mark them jointly as a candidate abnormal event.
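One simple way to realize a BIC-based test on the displacement difference sequence is sketched below. The text does not specify the statistical model, so this sketch assumes a single change in the mean of a Gaussian sequence with common variance, counts 2 parameters for the no-change model and 4 for the one-change model, and uses a hypothetical minimum segment length of 3.

```python
import math

def rss(values):
    """Residual sum of squares around the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def bic(residual_sum, n, num_params, eps=1e-12):
    # Gaussian BIC up to an additive constant; eps guards against log(0).
    return n * math.log(residual_sum / n + eps) + num_params * math.log(n)

def detect_change_point(diffs, min_seg=3):
    """Return the index k where diffs[:k] and diffs[k:] differ in mean,
    or None if the no-change model has the lower BIC."""
    n = len(diffs)
    if n < 2 * min_seg:
        return None
    best_bic, best_idx = bic(rss(diffs), n, 2), None
    for k in range(min_seg, n - min_seg + 1):
        candidate = bic(rss(diffs[:k]) + rss(diffs[k:]), n, 4)
        if candidate < best_bic:
            best_bic, best_idx = candidate, k
    return best_idx

# Hypothetical longitudinal coordinates (pixels) of one joint over 13 frames.
ys = [0.0, 1.0, 2.1, 3.0, 4.0, 5.1, 6.0, 9.0, 12.1, 15.0, 18.0, 21.1, 24.0]
diffs = [b - a for a, b in zip(ys, ys[1:])]   # adjacent-frame differences
change_idx = detect_change_point(diffs)
```

In this example the per-frame displacement jumps from about 1 px to about 3 px after the sixth difference, so the change is reported at index 6, while a sequence that only fluctuates around a constant mean yields `None`.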
[0013] As a further aspect of the present invention, in step S7, outputting the spatiotemporal image region corresponding to the merged abnormal event as the wire rope joint twitching abnormal region specifically includes: Extract the frame time information and the center coordinates of the strip tracking area to which each candidate abnormal event belongs, construct feature vectors, calculate the distance between feature vectors, and cluster all feature vectors to form several clusters. Examine the strip tracking region identifiers corresponding to the feature vectors within each cluster, remove clusters whose identifiers differ and retain clusters whose identifiers are consistent, calculate the minimum bounding rectangle of the image coordinates, and define the twitching abnormal region in conjunction with the covered time period.
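The merging logic of step S7 can be sketched as follows. The clustering algorithm and distance measure are not named in the text, so this sketch assumes single-linkage grouping with a fixed distance threshold over (time, x, y) features and an axis-aligned bounding rectangle. The event fields `t`, `x`, `y`, `strip` and the parameters `max_dist` and `time_scale` are hypothetical.

```python
import math

def cluster_events(events, max_dist, time_scale=1.0):
    """Single-linkage grouping: events within max_dist of each other share a cluster."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, a in enumerate(events):
        for j in range(i + 1, len(events)):
            b = events[j]
            d = math.sqrt((time_scale * (a["t"] - b["t"])) ** 2
                          + (a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2)
            if d <= max_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i, event in enumerate(events):
        groups.setdefault(find(i), []).append(event)
    return list(groups.values())

def abnormal_regions(events, max_dist):
    """Keep clusters with one consistent strip identifier and summarize each."""
    regions = []
    for group in cluster_events(events, max_dist):
        strips = {e["strip"] for e in group}
        if len(strips) != 1:
            continue                        # mixed strip identifiers: discard
        regions.append({
            "strip": strips.pop(),
            "t_start": min(e["t"] for e in group),
            "t_end": max(e["t"] for e in group),
            "bbox": (min(e["x"] for e in group), min(e["y"] for e in group),
                     max(e["x"] for e in group), max(e["y"] for e in group)),
        })
    return regions

events = [
    {"t": 1.0, "x": 100, "y": 20, "strip": 0},
    {"t": 1.1, "x": 102, "y": 21, "strip": 0},
    {"t": 5.0, "x": 100, "y": 80, "strip": 1},
    {"t": 5.05, "x": 101, "y": 82, "strip": 2},
]
regions = abnormal_regions(events, max_dist=5.0)
```

The first two events form a cluster with a consistent strip identifier and are reported as one region; the last two fall into a cluster with mixed identifiers and are dropped.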
[0014] The technical effects and advantages of the AI vision-based steel wire rope joint pull detection and analysis method of the present invention are as follows: This invention constructs a complete AI visual analysis process for detecting the slippage of wire rope joints, enabling continuous and stable tracking, precise motion modeling, and highly reliable anomaly identification of joint areas of multiple parallel-running wire ropes. It can effectively capture and quantitatively analyze the transient slippage behavior of joints under complex industrial field conditions.
[0015] Compared to traditional methods based on manual inspection or simple image differencing, this invention significantly reduces the impact of background interference and viewpoint changes on detection accuracy through two-dimensional unfolding modeling and strip-shaped tracking region division. Through multi-model Kalman filtering and bipartite graph data association, robust tracking of multiple joint targets under high-speed operating conditions is achieved, avoiding trajectory drift and target loss. By applying change point detection to the longitudinal displacement difference sequence of the trajectory, sudden twitching is effectively distinguished from normal periodic vibration, improving the sensitivity and accuracy of anomaly identification. Furthermore, spatiotemporal clustering merges anomalies from multiple frames and outputs structured twitching anomaly regions, reducing false alarms and improving the usability of engineering alarms. The overall solution can operate stably under complex conditions such as strong vibration, multiple interferences, and high-speed operation, with high real-time performance, reliability, and engineering adaptability. It provides a high-precision, low-misjudgment intelligent detection method for the safety monitoring of wire rope joint structures, significantly improving equipment operation safety assurance and maintenance management.
Attached Figure Description
[0016] Figure 1 is a schematic diagram of the AI vision-based wire rope joint pull detection and analysis method of the present invention.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
Example 1
[0018] As shown in Figure 1, this invention presents a method for detecting and analyzing the pulling motion of wire rope joints based on AI vision, which includes the following steps:
S1. Extract images containing multiple parallel-running steel wire ropes and their joint areas, and resample the images into two-dimensional unfolded images according to the longitudinal centerline direction of each steel wire rope.
S2. In the two-dimensional unfolded image of the initial frame, segment the structural feature regions of the wire rope joint area, and form strip-shaped tracking regions distributed horizontally along the image by clustering and merging.
S3. Align each frame's two-dimensional unfolded image with the preset initial template using a homography, and extract the joint key point set in each strip tracking area.
S4. Construct a trajectory sequence from the center coordinates of the historical key point sets of each strip tracking area, and process the trajectory sequence with an interacting multiple model (IMM) Kalman filter to predict the expected position of each joint in the current frame.
S5. Based on the geometric center of the joint key point set in the current frame and the predicted position, construct a bipartite graph model. Combine the appearance descriptor similarity of the key point sets to construct a probabilistic data association matrix, and solve the matching with the Hungarian algorithm to update the trajectory sequences of all joints.
S6. Based on the updated trajectory sequences, calculate the longitudinal displacement difference sequence of each joint across multiple frames and perform change point detection, marking the detected change points as candidate abnormal events.
S7. Cluster all candidate abnormal events, merge candidate abnormal events belonging to the same strip tracking area, and output the spatiotemporal image area corresponding to each merged abnormal event as the wire rope joint twitching abnormal area.
[0019] In step S1, the image is resampled into a two-dimensional unfolded image.
[0020] An industrial-grade area array camera was deployed above the wire rope running channel, with its optical axis perpendicular to the wire rope running plane, to acquire a top-down image sequence covering all parallel wire ropes and their joint areas. Timestamps for each frame were recorded synchronously during acquisition to ensure consistency in subsequent time-series analysis. To address on-site installation errors and lens distortion, pre-acquired standard checkerboard calibration images were used to calibrate the camera's intrinsic and distortion parameters. Based on the calibration results, radial and tangential distortion corrections were performed on each frame of the original image sequence to obtain distortion-free images. Next, based on the continuous high-contrast edge structure of the wire ropes in the distortion-free images, a deep convolutional feature extraction network targeting the wire ropes was constructed. Multi-scale convolution operations were performed on the images to extract the response features of the wire rope edge regions at different scales, and stable edge feature maps were generated through feature fusion. Furthermore, because the wire ropes are distributed parallel or nearly parallel to one another, directional consistency constraints were applied to the edge feature maps, strengthening edge responses consistent with the main direction of the wire ropes and suppressing noise responses in non-target directions. Connected component analysis and region skeleton extraction were then used to separate the continuous edge contours corresponding to each wire rope in the edge feature map. Least-squares fitting was performed on the pixels within the contours to obtain the longitudinal centerline of each wire rope in the current frame. By smoothing the centerline fitting results of multiple consecutive frames, a stable and reliable longitudinal centerline of the wire rope was obtained.
[0021] After obtaining the longitudinal centerline of each wire rope, considering the characteristics of the wire ropes being long and continuous along the running direction with limited lateral structural changes, a local orthogonal coordinate system perpendicular to the longitudinal centerline is constructed using the longitudinal centerline as the geometric reference. Within this coordinate system, sampling lines with fixed pixel intervals are set along the normal direction of the centerline, and corresponding image pixel values are extracted from each sampling line at a set interval. The sampling interval is uniformly set based on the wire rope diameter and image resolution; for example, to cover the full width of the wire rope diameter, the coverage width is set to 1.2 times the number of pixels corresponding to the wire rope diameter, thus ensuring that the main body and joint structure of the wire rope are completely sampled. The sampling step size along the longitudinal centerline direction is matched and set according to the wire rope running speed and camera frame rate to maintain spatial continuity between adjacent unfolded rows and avoid structural breaks or overlaps. After completing the two-dimensional sampling, the pixel values obtained from each sampling line are arranged sequentially according to the wire rope running direction, constructing a two-dimensional matrix with the longitudinal centerline as the unfolding axis and the vertical sampling as the unfolding width, thus forming a two-dimensional unfolded image of the wire rope. The unfolded image geometrically maps the bent steel wire rope in the original image into a regular rectangular region, making the joint area appear as a region of local structural abrupt change in the unfolded image, which significantly reduces the interference of the overall motion background on subsequent feature extraction and trajectory modeling.
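The unfolding described above can be sketched for the simplest case of a straight centerline. This is an illustrative sketch, not the patented procedure: it assumes the centerline is a line segment between two hypothetical endpoints, a one-pixel spacing along the normal, bilinear interpolation, and an image stored as a list of rows. Only the 1.2 coverage factor comes from the text.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate img (list of rows) at a fractional (x, y)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)           # clamp to the image bounds
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def unfold(img, start, end, rope_diameter_px, step=1.0, width_factor=1.2):
    """Resample a rope along its (straight) centerline into a rectangular image.

    Each output row is one sampling line along the centerline normal; rows are
    ordered along the running direction from start to end.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    tx, ty = (x1 - x0) / length, (y1 - y0) / length   # unit tangent
    nx, ny = -ty, tx                                  # unit normal
    half = 0.5 * width_factor * rope_diameter_px      # half of the coverage width
    n_rows = int(length / step) + 1
    n_cols = int(2 * half) + 1
    unfolded = []
    for i in range(n_rows):
        cx, cy = x0 + tx * i * step, y0 + ty * i * step
        unfolded.append([bilinear(img, cx + nx * (j - half), cy + ny * (j - half))
                         for j in range(n_cols)])
    return unfolded

# Synthetic 10x10 image with a bright vertical "rope" two pixels wide.
img = [[255 if x in (4, 5) else 0 for x in range(10)] for _ in range(10)]
unfolded = unfold(img, (4.5, 0.0), (4.5, 9.0), rope_diameter_px=2)
```

A curved centerline would need the tangent and normal recomputed at every sampling position; the straight-line case is used here only to keep the sketch short.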
[0022] In S2, a strip-shaped tracking region is formed that is distributed horizontally along the image.
[0023] The first frame of the two-dimensional unfolded image, acquired at the start of the wire rope operation or during the initial stable operation of the system, is selected as the initial frame image. This initial frame image contains complete structural information of multiple wire ropes in their unfolded state. For the wire ropes and their joint areas, which exhibit distinct directional and local texture abrupt changes in the unfolded image, multi-scale, multi-directional convolution kernels are used to perform convolution operations on the initial frame of the two-dimensional unfolded image to obtain the phase response features of the image at different directions and scales. During convolution, the phase change trend of the local neighborhood of each pixel is calculated simultaneously in multiple directions to obtain the phase consistency score of each pixel in structural boundaries, texture abrupt changes, and brightness abrupt changes. This effectively enhances the response intensity of the wire rope joint area and its surrounding structural contours and significantly suppresses interference from background areas and uniformly textured areas. Subsequently, based on the statistical distribution of the phase consistency score, the response image is thresholded. The threshold is determined using an adaptive setting method based on the phase response histogram of the entire image. For example, the top 10% to 15% interval of the phase consistency response value distribution is selected as the significant structural response threshold interval, thus ensuring that the main body of the wire rope and the joint area are stably preserved. After thresholding, morphological closing operations are performed on the resulting binary image. 
By connecting and smoothing small-scale fracture structures, isolated noise points are eliminated and tiny voids inside the structure are filled, ultimately obtaining a set of structural feature regions composed of multiple connected regions.
[0024] After obtaining the connected structural feature regions, for each region, the set of pixels in its two-dimensional unfolded image is traversed and statistically analyzed to calculate the coordinates of its geometric center, representing its spatial distribution in the image. Simultaneously, the grayscale values of all pixels within the region are accumulated and averaged to obtain grayscale feature parameters reflecting the overall brightness and texture intensity of the region. By combining the geometric center coordinates and average grayscale values of the structural feature regions, a unified format region feature vector is constructed, forming a feature matrix encompassing both spatial location and texture brightness information. This feature matrix reflects the spatial proximity and appearance similarity between different structural feature regions in terms of numerical distribution. Based on this, a method based on a joint metric of Euclidean distance and grayscale difference is used to calculate the distance between the feature vectors of any two structural feature regions, obtaining a similarity value reflecting their spatial proximity and appearance consistency. During the distance calculation, the spatial location component and the grayscale component are normalized to eliminate the influence of dimensional differences on the similarity calculation, ensuring a balanced contribution of spatial distribution and texture features in the similarity measurement. Subsequently, the similarity values calculated between all pairs of structural feature regions were uniformly organized to construct a complete similarity matrix. This matrix uses the structural feature region number as the row and column index, and the matrix element values represent the degree of similarity between different regions, thus depicting the spatial organization and texture similarity relationship of all structural feature regions in the wire rope unfolding diagram as a whole.
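The construction of the similarity matrix can be illustrated with a short sketch. The text does not state how the normalized distance is converted into a similarity, so the exponential mapping `exp(-d)`, the min-max normalization, and the `gray_weight` parameter are assumptions made for illustration only.

```python
import math

def normalize(column):
    """Min-max normalize a list of values to [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]

def similarity_matrix(regions, gray_weight=1.0):
    """regions: list of (center_x, center_y, mean_gray) tuples, one per region."""
    xs = normalize([r[0] for r in regions])
    ys = normalize([r[1] for r in regions])
    gs = normalize([r[2] for r in regions])
    n = len(regions)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            spatial = math.hypot(xs[i] - xs[j], ys[i] - ys[j])  # Euclidean part
            gray = abs(gs[i] - gs[j])                           # grayscale part
            sim[i][j] = sim[j][i] = math.exp(-(spatial + gray_weight * gray))
    return sim

# Two nearby, similarly bright regions and one distant, darker region.
regions = [(10.0, 5.0, 200.0), (12.0, 6.0, 198.0), (80.0, 60.0, 90.0)]
sim = similarity_matrix(regions)
```

Regions 0 and 1 receive a similarity close to 1, while region 2 is much less similar to both, which is the structure the subsequent eigendecomposition and clustering rely on.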
[0025] After obtaining the similarity matrix, eigenvalue decomposition is performed to extract its main eigenvalues and corresponding eigenvectors. By analyzing the amplitude distribution of the eigenvalues, the eigenvectors corresponding to the top few largest eigenvalues with a cumulative contribution rate reaching a preset proportion are selected to construct a low-dimensional embedding space that reflects the main distribution characteristics of the structural feature regions. This embedding space effectively reduces the dimensionality complexity of the original feature space while maintaining the relative similarity between regions. Subsequently, the coordinates of all structural feature regions in the low-dimensional embedding space are used as clustering input, and the K-means clustering algorithm is used to group them. The number of clusters is set according to the number of horizontally arranged wire ropes in the unfolded diagram. For example, when the unfolded diagram contains six parallel wire ropes, the number of clusters is set to six, so that each clustering result corresponds to the main structural region of one wire rope in spatial distribution. After initial clustering, connectivity analysis is performed on the lateral distribution of structural feature regions within each cluster in the original two-dimensional unfolded diagram. Multiple structural feature regions that are continuously arranged in the lateral direction and whose spacing is within the width of the wire rope are merged, ultimately forming a strip-shaped tracking area that extends laterally along the wire rope and continuously covers the longitudinal direction. This strip-shaped tracking area spatially corresponds stably to the unfolded area of each wire rope and its joint.
[0026] In step S3, the joint key point set is extracted in each strip tracking area.
[0027] A preset initial template is constructed. The initial template is a two-dimensional unfolded image of a frame acquired when the wire rope system was in a stable operating state, which serves as a reference image. This template contains complete and clear information about multiple wire ropes and their joint structures, and is used as a unified spatial alignment reference for all subsequent frames. After obtaining the two-dimensional unfolded image of the current frame, feature extraction based on directional gradient statistics is performed on both the initial template and the current frame image. By calculating the magnitude and direction distribution of the gray-level gradient in the local neighborhood of the image, a histogram of oriented gradients descriptor is constructed. Pixels with significant abrupt changes in gradient magnitude and a stable direction distribution are used as feature points, thereby extracting from the two images a set of feature points with stable geometric structure and good discriminability. Subsequently, pairwise distances are calculated between the feature point descriptor vectors extracted from the initial template and the current frame, and an initial matching relationship between feature points is established based on the minimum distance criterion. To prevent mismatches from degrading the subsequent registration accuracy, a ratio constraint is further introduced: for each candidate match, the best matching distance is compared with the second-best matching distance, and point pairs with low matching confidence are filtered out to obtain a set of high-confidence matching point pairs. Based on this, the random sample consensus (RANSAC) algorithm is used to iteratively process the matching point pairs. In each iteration, the minimum number of matching point pairs (four for a homography) is randomly selected to estimate the homography transformation model, a consistency check is performed on all matching points, and the number of inliers that conform to the model constraints is counted. This random sampling and verification process is repeated until the maximum set number of iterations is reached. Finally, the set of model parameters with the largest number of inliers is selected as the optimal homography matrix of the current frame relative to the initial template, thereby achieving accurate spatial registration of the unfolded wire rope image under complex motion backgrounds.
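The ratio constraint used to filter candidate matches before the homography estimation can be sketched as follows. The threshold value of 0.75 and the function name `ratio_test_matches` are assumptions, since the text gives no specific ratio, and plain Euclidean distance between descriptor vectors is assumed.

```python
import math

def ratio_test_matches(template_desc, frame_desc, max_ratio=0.75):
    """Match each template descriptor to its nearest frame descriptor and keep
    the match only if it is clearly better than the second-nearest candidate."""
    matches = []
    for i, d in enumerate(template_desc):
        dists = sorted((math.dist(d, f), j) for j, f in enumerate(frame_desc))
        if len(dists) < 2:
            continue                       # the ratio needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches

# Toy 3-D descriptors: the third template descriptor is equally close to two
# frame descriptors, so it is ambiguous and should be rejected.
template_desc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
frame_desc = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
matches = ratio_test_matches(template_desc, frame_desc)
```

Only the surviving pairs would then be passed to the random sampling and inlier counting stage described above.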
[0028] A perspective transformation is performed on the current frame's 2D unfolded image, mapping it uniformly to the same spatial coordinate system as the initial template. This eliminates spatial offsets caused by minor camera shake, changes in the overall pose of the wire rope, and optical imaging distortion, ensuring geometric consistency across consecutive frames. After spatial alignment, multi-scale image representations are independently constructed within each strip-tracking region. Gaussian smoothing operations with different scale parameters are applied to the image of this region, forming a series of smoothed images from fine to coarse. Difference operations are performed on the smoothing results between adjacent scales to obtain a Gaussian difference image sequence reflecting the differences in local texture structure response under scale changes. Within this sequence, a 3D extremum search is performed on each pixel within its spatial neighborhood and adjacent scale layers. When a pixel simultaneously satisfies the maximum or minimum response conditions in its local neighborhood within its scale layer and the scale layers above and below, that pixel is identified as a stable extremum point. In this way, a set of candidate keypoints with scale-invariant properties is detected in different scale spaces. Subsequently, fine-grained localization processing is performed on each extreme point. By fitting a quadratic surface to the grayscale distribution of surrounding pixels, its sub-pixel-level position coordinates are corrected. Simultaneously, based on the response level of the extreme point in scale space, its corresponding scale parameters are determined. Through the above processing, the final keypoint position coordinates and scale parameters with good stability in both spatial location and scale dimensions are obtained.
[0029] For each keypoint, a local neighborhood window centered on the keypoint is constructed at its corresponding scale. Gradient magnitude and direction are calculated for pixels within this neighborhood, and a direction histogram is generated based on the gradient direction distribution. By smoothing and peak detection of the direction histogram, the direction with the largest response magnitude is selected as the principal direction of the keypoint, thus assigning a stable directional reference coordinate system to the keypoint. Subsequently, constrained by the keypoint's scale parameters and principal direction, the local region is divided into blocks within its rotation-invariant neighborhood. The pixel gradient direction distribution is statistically analyzed for each sub-region, constructing multiple local gradient histograms. These sub-region histograms are then concatenated in a preset order to form a complete scale-invariant feature transformation descriptor. Amplitude normalization is introduced during the descriptor construction process to reduce the impact of illumination changes and local brightness fluctuations on feature expression, ensuring good stability of the resulting descriptor even under complex industrial lighting environments. By performing the above direction allocation and descriptor generation operations on all keypoints within the strip tracking area, a keypoint set containing the keypoint's spatial coordinates, scale parameters, principal direction information, and corresponding descriptor vectors is formed.
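The principal-direction assignment can be illustrated with a short NumPy sketch. It assumes a 36-bin histogram weighted by gradient magnitude and a simple circular 1-2-1 smoothing; these choices and the name `dominant_orientation` are illustrative and not taken from the application. Angles are measured in image coordinates (row index increasing downward).

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Return the centre (degrees) of the strongest gradient-direction bin."""
    gy, gx = np.gradient(patch.astype(float))      # d/drow, d/dcol
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bin_width = 360.0 / n_bins
    bins = (angle // bin_width).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    # Circular smoothing before peak detection.
    smooth = 0.25 * np.roll(hist, 1) + 0.5 * hist + 0.25 * np.roll(hist, -1)
    return (int(np.argmax(smooth)) + 0.5) * bin_width
```

A descriptor would then be built from sub-region histograms measured relative to this direction; that step is not shown.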
[0030] In step S4, the expected position of each joint in the current frame is predicted.
[0031] For each established keypoint trajectory set within a strip-shaped tracking area, spatial statistical processing is performed on the keypoint sets extracted from the same joint across multiple consecutive frames. The geometric center coordinates of the keypoint set in each frame are calculated, and the trajectory of the geometric center's change over time is used as the joint's original trajectory sequence. This stable and continuous spatial position change characterizes the joint's overall motion behavior. Based on the obtained trajectory sequence, a filter model set is constructed for each joint. This model set includes at least two types of basic motion models: a uniform motion model describing a smooth operating state and a uniformly accelerated motion model describing starting, braking, or impact states. The uniform motion model characterizes the smooth movement trend of the joint under stable traction by linearly extending the position changes between adjacent time points. The uniformly accelerated motion model introduces an acceleration component on top of the position and velocity state quantities to characterize the nonlinear motion changes of the wire rope under conditions such as sudden load changes and tension fluctuations. During model initialization, the initial state of the model is set according to the actual operating parameters of the wire rope system. The initial position is taken from the trajectory point of the first frame, the initial velocity is determined based on the difference in center coordinates between two adjacent frames, and the initial acceleration is set according to the historical trajectory change trend. For example, a small positive acceleration is set in the initial stage of system startup, and zero is set in the stable operation stage. Subsequently, in each filtering cycle, state prediction calculations are performed on the uniform velocity model and the uniform acceleration model respectively. 
Based on the state estimation results of the previous moment, the joint position, velocity, and acceleration state of the current moment are recursively calculated, and the corresponding prediction covariance matrix is updated synchronously to represent the dynamic changes in prediction uncertainty.
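The prediction step for the two motion models can be sketched as below. This is a simplified one-dimensional example (longitudinal coordinate only) with state `[position, velocity, acceleration]`; the process noise `q`, the frame interval `dt`, and all function names are assumptions for illustration.

```python
import numpy as np

def transition_matrices(dt):
    """State transition for the uniform and uniformly accelerated models."""
    f_cv = np.array([[1.0, dt, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0]])          # acceleration forced to zero
    f_ca = np.array([[1.0, dt, 0.5 * dt**2],
                     [0.0, 1.0, dt],
                     [0.0, 0.0, 1.0]])
    return f_cv, f_ca

def initial_state(track, dt, accel0=0.0):
    """Position from the first frame, velocity from the first two frames."""
    return np.array([track[0], (track[1] - track[0]) / dt, accel0])

def predict(x, p, f, q):
    """One recursive prediction: propagate the state and its covariance."""
    return f @ x, f @ p @ f.T + q
```

For example, with `x = [10, 2, 1]` and `dt = 1`, the uniform model predicts position 12 and the uniformly accelerated model predicts 12.5.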
[0032] After obtaining the predicted states and prediction covariances of each motion model, the residuals between the joint trajectory sequence point extracted in the current frame and the predicted state of each model are calculated. An innovation covariance is then constructed from the corresponding prediction covariance to quantify the consistency between the actual observation and each model's prediction. From the residual and its innovation covariance, the matching probability of each model to the actual joint motion state at the current moment is calculated, forming a model likelihood function that characterizes the explanatory power of the different motion models for the current motion state. Based on the model likelihood function, the interaction probabilities of each motion model within the model set are updated, giving greater weight to models that are more consistent with the current trajectory observation and correspondingly reducing the weight of models that deviate from the actual trajectory. Subsequently, based on the updated interaction probabilities, the state vectors and prediction covariances predicted by each model are weighted and fused, integrating the prediction results of the different motion models in the same state space to generate a unified fused state vector and fused covariance matrix. The fused state vector contains position, velocity, and acceleration components, where the position component reflects the optimal predicted spatial position of the joint in the current frame. Finally, the position component of the fused state vector is output as the expected position of the joint in the current frame.
[0033] In step S5, the trajectory sequence of all joints is updated.
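The likelihood, probability update, and fusion described in this paragraph can be sketched as follows. The example assumes a scalar position observation, a Gaussian likelihood, and a known measurement noise `r`; it follows the paragraph's description and omits the mixing step and per-model measurement update of a full interacting multiple model filter. All names are hypothetical.

```python
import numpy as np

H = np.array([[1.0, 0.0, 0.0]])   # assumption: only position is observed

def model_likelihood(z, x_pred, p_pred, r):
    """Gaussian likelihood of the residual under the innovation covariance."""
    residual = z - (H @ x_pred)[0]
    s = (H @ p_pred @ H.T)[0, 0] + r        # innovation covariance (scalar)
    return np.exp(-0.5 * residual**2 / s) / np.sqrt(2.0 * np.pi * s)

def fuse(z, preds, covs, prior, r):
    """Update model probabilities and fuse the per-model predictions."""
    lik = np.array([model_likelihood(z, x, p, r)
                    for x, p in zip(preds, covs)])
    mu = prior * lik
    mu = mu / mu.sum()                      # normalised model probabilities
    x_fused = sum(m * x for m, x in zip(mu, preds))
    # The spread between models is added to the fused covariance.
    p_fused = sum(m * (p + np.outer(x - x_fused, x - x_fused))
                  for m, x, p in zip(mu, preds, covs))
    return mu, x_fused, p_fused
```

The first component of `x_fused` plays the role of the expected position output by step S4.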
[0034] For each strip-shaped tracking region in the current frame, the extracted keypoint set is first subjected to spatial statistical processing of the coordinates of all keypoints in the set, and the coordinates of its geometric center are calculated as the observation position of the corresponding joint in the current frame within the strip-shaped region. The geometric center is obtained by averaging the horizontal and vertical coordinates of all keypoints in the region to reduce the interference of individual abnormal keypoints on the position estimation and make the observation position more stable and reliable. At the same time, the predicted position of each joint in the current frame is read from the aforementioned multi-model filtering prediction results to construct the spatial matching relationship between observation and prediction. For each pair of observation and prediction positions, Mahalanobis distance is calculated based on their spatial differences in the two-dimensional unfolded coordinate system to reflect the statistical closeness of their positional distribution, thereby effectively measuring the spatial consistency between different predicted trajectories and actual observations. On this basis, appearance feature constraints are further introduced. For each keypoint set corresponding to the observation position, an appearance description subset is constructed based on its internal keypoint descriptor vectors. By averaging or weighted fusion of the descriptor vectors within the set, an appearance descriptor vector representing the local appearance structure of the joint is formed. 
The appearance descriptor vector is constructed as follows: after the generated scale-invariant feature transform (SIFT) descriptors are normalized, they are summed with weights based on the spatial distribution of the keypoints within the joint structure, so that the resulting vector stably characterizes the local texture and structural morphology of the joint. Subsequently, the cosine distance between the observed appearance descriptor vector and the historical appearance descriptor vector associated with the predicted trajectory is calculated to quantify the similarity between the two at the texture and structural levels. Finally, the Mahalanobis distance and the cosine distance are normalized to a common scale and linearly combined to construct an association cost that comprehensively reflects the consistency of spatial location and appearance structure.
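A minimal sketch of the association cost follows, assuming NumPy. The normalization of the Mahalanobis distance by a gate value, the clipping to 1, and the mixing weight are assumptions made for the example, since the text does not fix them; the function names are hypothetical.

```python
import numpy as np

def fuse_descriptors(descriptors, weights):
    """Weighted sum of unit-normalised keypoint descriptors."""
    d = np.asarray(descriptors, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    fused = np.asarray(weights, dtype=float) @ d
    return fused / np.linalg.norm(fused)

def mahalanobis(obs, pred, cov):
    """Distance between observation and prediction under covariance cov."""
    d = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_cost(obs, pred, cov, desc_obs, desc_hist,
                     gate=3.0, weight=0.5):
    """Linear combination of normalised spatial and appearance distances."""
    spatial = min(mahalanobis(obs, pred, cov) / gate, 1.0)  # assumed scaling
    appearance = cosine_distance(desc_obs, desc_hist)
    return weight * spatial + (1.0 - weight) * appearance
```

With `obs = (1, 0)`, `pred = (0, 0)`, covariance `diag(4, 1)`, orthogonal descriptors, `gate = 2`, and `weight = 0.5`, the cost is `0.5 * 0.25 + 0.5 * 1 = 0.625`.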
[0035] After obtaining the association costs between all observed and predicted positions, a bipartite graph structure is constructed, with the current frame's observed position set and predicted position set as two sets of vertices respectively. The observed position vertex set represents the actual joint observation results obtained through keypoint statistics in the current frame, while the predicted position vertex set represents the joint motion estimation results obtained through multi-model filtering. For any pair of observed and predicted position vertices, a weighted edge connecting them is established based on the previously calculated association costs. The edge weight directly reflects the matching probability between the observation and prediction; a smaller weight indicates a higher degree of matching. By performing the above edge-building process on all observed and predicted positions, a bipartite graph model containing complete matching relationship information is constructed. This model structurally describes all possible observation-prediction correspondences in the current frame. To ensure the stability of the matching process, the association costs are uniformly scaled and normalized during bipartite graph construction to maintain a consistent cost distribution between different joints and avoid individual outliers having an excessive impact on the overall matching results. Meanwhile, in order to suppress obviously unreasonable matching relationships, edges whose cost exceeds the set upper limit threshold are eliminated. The threshold is set based on the maximum reasonable displacement range obtained from historical trajectory statistics. For example, twice the maximum displacement change within 30 consecutive frames is selected as the threshold to ensure that only physically reasonable matching candidate edges are retained.
[0036] After constructing the bipartite graph model, the edge weight information in the model is transformed into a two-dimensional cost matrix according to the arrangement of observed and predicted vertices. The row indices of the matrix correspond to the observed position numbers in the current frame, the column indices correspond to the predicted position numbers, and the matrix element values represent the association cost between the corresponding observation and prediction. If there is no reasonable matching relationship between an observation and a prediction, a penalty value much larger than the normal cost range is assigned to it in the cost matrix to prevent it from being selected during the optimal matching process. Subsequently, the Hungarian algorithm is applied to the cost matrix. Through multiple rounds of iterative transformations and minimum value coverage operations on the matrix rows and columns, redundant matching paths are gradually eliminated, ultimately obtaining the matching combination result with the minimum global cost. During the solution process, the Hungarian algorithm ensures that each observed position establishes a unique pairing relationship with only one predicted position, and each predicted position also corresponds to only one observed position, thus achieving a one-to-one optimal matching. Through this process, the final pairing relationship between each joint observed position and predicted trajectory in the current frame is obtained. Based on the pairing relationship, the observation position of the current frame is updated to the trajectory sequence of the corresponding connector, and the appearance descriptor history and motion state information of the connector are updated synchronously; for predicted trajectories that fail to match valid observations, their predicted state is maintained and the loss count is recorded.
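The conversion from edge weights to a cost matrix and the one-to-one matching can be illustrated as below. For brevity this sketch solves the assignment by exhaustive search over permutations, which gives the same minimum-cost matching as the Hungarian algorithm but is only practical for a handful of joints; a production system would use a real Hungarian solver. `PENALTY` and the function names are assumed for illustration.

```python
from itertools import permutations

PENALTY = 1e6   # assumed value, far above any normal association cost

def build_cost_matrix(edges, n_obs, n_pred):
    """Rows are observations, columns are predictions; gaps get PENALTY."""
    matrix = [[PENALTY] * n_pred for _ in range(n_obs)]
    for (i, j), cost in edges.items():
        matrix[i][j] = cost
    return matrix

def optimal_assignment(matrix):
    """Minimum-total-cost one-to-one matching by exhaustive search."""
    n_obs, n_pred = len(matrix), len(matrix[0])
    size = max(n_obs, n_pred)
    # Pad to a square matrix so unmatched rows or columns are allowed.
    padded = [[matrix[i][j] if i < n_obs and j < n_pred else PENALTY
               for j in range(size)] for i in range(size)]
    best = min(permutations(range(size)),
               key=lambda p: sum(padded[i][p[i]] for i in range(size)))
    # Drop pairings that only exist because of padding or the penalty value.
    return [(i, best[i]) for i in range(n_obs)
            if best[i] < n_pred and matrix[i][best[i]] < PENALTY]
```

Observations left out of the returned list correspond to unmatched detections, and predictions left out correspond to trajectories whose loss count would be incremented.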
[0037] In step S6, the longitudinal displacement difference sequence of each joint between multiple frames is calculated and change point detection is performed. The detected change points are marked as candidate abnormal events.
[0038] For each wire rope joint trajectory sequence after trajectory update, its longitudinal position component is extracted from the two-dimensional unfolded coordinates of the continuous frames. This longitudinal component changes along the wire rope running direction and can directly reflect the dynamic motion state of the joint under traction and load. By arranging the longitudinal coordinates of each frame in chronological order, a longitudinal coordinate sequence of the joint is formed, thus obtaining a one-dimensional time series describing the continuous motion trajectory of the joint. Subsequently, the difference operation between adjacent frames is performed on the longitudinal coordinate sequence, that is, the change in longitudinal coordinate between two adjacent frames is calculated to obtain a longitudinal displacement difference sequence describing the displacement change amplitude of the joint between adjacent time points. Under normal operating conditions, this difference sequence usually exhibits characteristics of small amplitude, smooth change and stable statistical properties. However, when the joint experiences transient jerking, it will exhibit abnormal patterns such as a sudden increase in displacement change amplitude, aggravated fluctuation or abrupt sign change. Based on the above characteristics, a change point detection algorithm based on the Bayesian information criterion is introduced into the longitudinal displacement difference sequence for processing. By comparing the statistical models of the time series under different segmentation assumptions, the trade-off between model complexity and fitting error before and after introducing segmentation points is evaluated, thereby automatically determining the optimal segmentation structure. 
In the specific process, corresponding segmented statistical models are established for different candidate segmentation positions of the difference sequence, and the evaluation value of each segmented model is calculated according to the Bayesian information criterion. When the introduction of a new segmentation point significantly reduces the overall evaluation value, the position is identified as a candidate change point where the statistical characteristics have abruptly changed. The change judgment criteria are comprehensively determined based on the significant changes in the mean, variance, or distribution pattern of the displacement difference sequence before and after the position. For example, when the mean change of two adjacent sequence segments exceeds twice the historical normal fluctuation range, and the variance increases significantly at the same time, a significant statistical change can be identified at that position.
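The segmentation comparison can be sketched as follows for a single change point. The example assumes each segment is modelled as Gaussian with its own mean and variance, counts two parameters for the unsplit model and five for the split model (two means, two variances, one split position), and uses a variance floor to avoid taking the logarithm of zero. These modelling choices and the function names are assumptions; multiple change points could be found by applying the same test recursively to each segment.

```python
import math

def longitudinal_differences(ys):
    """Frame-to-frame change of the longitudinal coordinate."""
    return [b - a for a, b in zip(ys, ys[1:])]

def gaussian_cost(segment, var_floor=1e-8):
    """n * ln(variance): -2 log-likelihood up to a model-independent constant."""
    n = len(segment)
    mean = sum(segment) / n
    var = max(sum((v - mean) ** 2 for v in segment) / n, var_floor)
    return n * math.log(var)

def bic_change_point(diffs, min_len=3):
    """Index where a split lowers the BIC the most, or None if no split helps."""
    n = len(diffs)
    if n < 2 * min_len:
        return None
    penalty = math.log(n)
    best_index, best_bic = None, gaussian_cost(diffs) + 2 * penalty
    for t in range(min_len, n - min_len + 1):
        bic = (gaussian_cost(diffs[:t]) + gaussian_cost(diffs[t:])
               + 5 * penalty)
        if bic < best_bic:
            best_index, best_bic = t, bic
    return best_index
```

The returned index is the first element of the new segment in the difference sequence, which is then mapped back to the original frame number.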
[0039] After completing the change point detection of the longitudinal displacement differential sequence, each identified change point undergoes spatiotemporal correlation processing to form a structured candidate abnormal event record. Specifically, based on the time index of the change point in the longitudinal displacement differential sequence, the original frame number corresponding to the change point is determined by backtracking, thus clarifying the specific time and location of the anomaly. Subsequently, based on the joint trajectory number to which the longitudinal displacement differential sequence belongs, the specific wire rope joint object where the anomaly occurred is determined, establishing a clear association between each change point and a unique joint target. Further combining the aforementioned trajectory management and strip region division results, the strip tracking region identifier corresponding to the joint in the current frame is read, thereby accurately locating the change point event within the specific wire rope channel and spatial location range. Through the above triple correlation processing, each change point is fully labeled as a candidate abnormal event entity containing time information, joint number, and strip tracking region identifier. To ensure the engineering usability of candidate anomalies, time aggregation is performed on multiple change points occurring within several consecutive frames. When the interval between adjacent change points is less than a set time threshold, they are grouped into consecutive manifestations of the same anomaly process, thus avoiding duplicate marking of the same twitching behavior. The time threshold is set based on the operating speed of the wire rope system and the sampling frame rate. For example, under a sampling condition of 30 frames per second, multiple change points occurring within five consecutive frames are considered consecutive manifestations of the same anomaly event. 
Through the above-described standardized association and marking process, a set of candidate anomalies with a clear structure and explicit semantics is ultimately formed, which can be directly used for subsequent clustering, merging, and alarm decision-making.
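The labelling and time aggregation of change points might look like the following sketch. The record fields and the default gap of five frames mirror the example in the text; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CandidateEvent:
    joint_id: int       # trajectory number of the joint
    strip_id: int       # strip tracking region identifier
    start_frame: int
    end_frame: int

def aggregate_change_points(frames, joint_id, strip_id, max_gap=5):
    """Merge change points closer than max_gap frames into one event."""
    events = []
    for frame in sorted(frames):
        if events and frame - events[-1].end_frame < max_gap:
            events[-1].end_frame = frame      # same anomaly process continues
        else:
            events.append(CandidateEvent(joint_id, strip_id, frame, frame))
    return events
```

For change points at frames 100, 102, 104, 150, 151, and 300, this yields three events spanning frames 100 to 104, 150 to 151, and 300.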
[0040] In step S7, the spatiotemporal image region corresponding to the merged abnormal event is output as the wire rope joint pulling anomaly region.
[0041] For each candidate anomaly set, the frame time information, connector number, and spatial location parameters of its corresponding strip tracking region are read sequentially. The spatial location of the strip tracking region is represented by the geometric center coordinates of the region in the two-dimensional unfolded diagram. By combining the time index of the candidate anomaly with its corresponding spatial center coordinates, a unified format anomaly feature vector is constructed, ensuring that each feature vector reflects the distribution characteristics of the anomaly in both the temporal and spatial dimensions. After constructing the feature vector set of all candidate anomalies, the spatiotemporal distance between any two feature vectors is calculated. The temporal distance is represented by the difference in frame numbers, and the spatial distance is represented by the Euclidean distance of the center coordinates in the unfolded diagram coordinate system. Both types of distances are normalized to ensure they are on a uniform numerical scale, thus avoiding the unbalanced impact of temporal and spatial dimension differences on the clustering results. Based on the normalized spatiotemporal distance, a distance-based clustering analysis method is used to cluster all feature vectors. Anomalies with close spatiotemporal distances are iteratively merged, automatically grouping candidate anomalies with continuously changing spatial locations within adjacent time periods into the same cluster. During clustering, joint constraints on the time interval and spatial distance of anomalies ensure that anomalies within the same cluster physically correspond to the continuous behavior of the same wire rope joint pulling process, while anomalies generated by different pulling processes are naturally separated into different clusters. 
Through this clustering process, several clusters are ultimately formed, each corresponding to a complete pulling anomaly process represented across multiple time frames.
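One way to realize the distance-based clustering is single-linkage merging with a distance threshold, sketched below with a small union-find structure. The text does not name a specific clustering algorithm, so this choice, the normalization scales, and the threshold `max_dist` are assumptions for illustration.

```python
import math

def cluster_events(features, time_scale, space_scale, max_dist=1.0):
    """features: list of (frame, x, y). Returns one cluster label per event."""
    n = len(features)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            fi, xi, yi = features[i]
            fj, xj, yj = features[j]
            dt = abs(fi - fj) / time_scale                   # normalised time
            ds = math.hypot(xi - xj, yi - yj) / space_scale  # normalised space
            if math.hypot(dt, ds) <= max_dist:
                parent[find(j)] = find(i)                    # merge clusters
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Events that are close in both time and space chain together into one cluster, while an event hundreds of frames later at the same location receives a separate label.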
[0042] After obtaining multiple clusters, for each cluster containing anomaly event feature vectors, the corresponding strip tracking region identifier is read one by one, and a consistency check is performed on the strip region identifiers of all feature vectors within the cluster. When anomaly events from different strip tracking regions are detected within the same cluster, the cluster is determined to be inconsistent in spatial structure, and the entire cluster is removed to avoid incorrectly merging anomaly events from different wire rope joints due to spatiotemporal clustering errors. For clusters with completely consistent strip region identifiers, the spatial coordinate set of all anomaly events within the cluster is further extracted in the two-dimensional unfolded image. The minimum and maximum values of this coordinate set are calculated in the horizontal and vertical directions, respectively, to construct a minimum bounding rectangle region that can completely cover the spatial distribution range of all anomaly events in the cluster. This minimum bounding rectangle accurately marks the spatial coverage range of this twitching anomaly process in the image coordinate system, and can intuitively reflect the specific wire rope location where the anomaly occurred and its affected area. Simultaneously, the frame time indexes of all abnormal events within the cluster are statistically analyzed to determine the start and end frames of the abnormal process, thereby obtaining the duration interval of the twitching anomaly in the time dimension. Finally, the minimum bounding rectangle region is combined with the corresponding time interval to construct a structured description of the wire rope joint twitching anomaly region. This description includes both spatial range and temporal span information, accurately depicting the complete spatiotemporal manifestation of the twitching anomaly. 
Through the above processing, the reasonable integration and precise positioning of multi-frame discrete abnormal events are achieved, outputting a twitching anomaly region with clear engineering significance and practical usability, providing direct evidence for subsequent alarm triggering, image backtracking, and operational safety assessment.
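The consistency check and the construction of the spatiotemporal region can be sketched as below. The dictionary keys and the function name are hypothetical; the logic follows the paragraph: discard clusters that mix strip identifiers, otherwise return the minimum bounding rectangle and the start and end frames.

```python
def summarize_cluster(events):
    """events: list of dicts with keys frame, x, y, strip_id.

    Returns None for a cluster that mixes strip tracking regions, otherwise
    a structured description of the anomaly region.
    """
    strips = {e["strip_id"] for e in events}
    if len(strips) != 1:
        return None                       # spatially inconsistent cluster
    xs = [e["x"] for e in events]
    ys = [e["y"] for e in events]
    frames = [e["frame"] for e in events]
    return {
        "strip_id": strips.pop(),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),   # x0, y0, x1, y1
        "frames": (min(frames), max(frames)),           # start, end
    }
```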
[0043] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium can be a solid-state drive.
[0044] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0045] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and modules described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0046] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or modules may be electrical, mechanical, or other forms.
[0047] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0048] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
[0049] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0050] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0051] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for detecting and analyzing the pulling motion of wire rope joints based on AI vision, characterized in that it includes the following steps:
S1. Extract images containing multiple parallel-running steel wire ropes and their joint areas, and resample the images into two-dimensional unfolded images according to the longitudinal centerline direction of each steel wire rope;
S2. In the two-dimensional unfolded image of the initial frame, divide the structural feature regions of the wire rope joint area, and form strip-shaped tracking regions distributed horizontally along the image by clustering and merging;
S3. Align each frame's two-dimensional unfolded image with the preset initial template using homography, and extract the joint keypoint set in each strip tracking region;
S4. Construct a trajectory sequence based on the center coordinates of the historical keypoint set of each strip tracking region, and process the trajectory sequence using an interacting multiple model (IMM) Kalman filter to predict the expected position of each joint in the current frame;
S5. Based on the geometric center and predicted position of the joint keypoint set in the current frame, construct a bipartite graph model; combine the appearance descriptor similarity of the keypoint set to construct a probabilistic data association matrix; and solve the matching using the Hungarian algorithm to update the trajectory sequence of all joints;
S6. Based on the updated trajectory sequence, calculate the longitudinal displacement difference sequence of each joint across multiple frames and perform change point detection, marking the detected change points as candidate abnormal events;
S7. Cluster all candidate abnormal events, merge candidate abnormal events belonging to the same strip tracking region, and output the spatiotemporal image region corresponding to the merged abnormal event as the wire rope joint twitching anomaly region.
2. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S1, resampling the image into a two-dimensional unfolded image specifically includes: acquiring, from a top-down perspective, a sequence of original images of multiple steel wire ropes arranged in parallel, and performing geometric distortion correction on each frame of the original image sequence; in the distortion-corrected image, extracting the longitudinal centerline of each wire rope by fitting based on the edge features and spatial arrangement direction of the wire rope; and, using the longitudinal centerline of each wire rope as a reference, sampling at equal intervals along the direction perpendicular to it and arranging the sampled pixel values in sampling order to generate a two-dimensional unfolded image of the wire rope.
3. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S2, forming a strip-shaped tracking region distributed horizontally along the image specifically includes: convolving the two-dimensional unfolded image of the initial frame, calculating the consistency score of the phase information of each pixel across different directions and scales, generating a phase consistency response map, and performing threshold segmentation and a morphological closing operation to obtain connected structural feature regions; extracting the location coordinates of each structural feature region in the image and the average gray value within the region to construct a feature matrix; calculating the similarity between the structural feature regions based on the feature matrix and constructing a similarity matrix; performing eigendecomposition on the similarity matrix, and selecting the eigenvectors corresponding to the several largest eigenvalues to construct a low-dimensional embedding space; performing K-means clustering on all structural feature regions in this space; and merging the regions in the clustering results that are continuously distributed along the transverse direction of the wire rope to form a strip-shaped tracking region.
4. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S3, extracting the joint keypoint set within each strip-shaped tracking region specifically includes: extracting feature points and corresponding histogram of oriented gradients (HOG) descriptor vectors from the two-dimensional unfolded images of the preset initial template and the current frame, respectively; performing feature point matching based on the distance between the descriptor vectors; estimating the homography matrix iteratively from the matching point pairs using the random sample consensus (RANSAC) algorithm, and selecting the matrix with the most inliers as the registration matrix; applying the registration matrix to perform a perspective transformation on the two-dimensional unfolded image of the current frame so that it is aligned with the initial template; for each strip tracking region, using the difference-of-Gaussians function to construct the scale space and detect extreme points in the aligned image, so as to determine the location coordinates and scale parameters of the keypoints; assigning a principal direction to each keypoint based on the direction distribution of the image gradients within the neighborhood of the keypoint; calculating, based on the scale parameter and the principal direction, a gradient direction histogram within the rotation-invariant neighborhood of the keypoint to generate a scale-invariant feature transform (SIFT) descriptor for the keypoint; and combining all keypoints and descriptors to form a keypoint set.
5. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S4, predicting the expected position of each joint in the current frame specifically includes: extracting, from the historical trajectory of each strip-shaped tracking region, the sequence of center coordinates of the corresponding key point set as the trajectory sequence of the joint; initializing, for each joint, a filter model set containing a constant-velocity motion model and a constant-acceleration motion model; in each filtering cycle, performing state prediction based on the state estimate of each model at the previous time step, and calculating the predicted state and prediction covariance of each model; calculating the model likelihood function based on the innovation covariance between the trajectory sequence point of the joint in each frame and the predicted state of each model, and updating the interaction probability of each model; and performing weighted fusion of the predicted states and prediction covariances of all models based on the updated interaction probabilities, with the position component of the fused state vector used as the expected position.
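The multi-model filtering of claim 5 resembles an interacting multiple model (IMM) filter. The sketch below is a one-dimensional illustration under several assumptions: the state layout `[position, velocity, acceleration]`, the noise matrices, the transition matrix `TRANS`, and the choice to fuse one-step-ahead predictions of each model are all hypothetical, and covariance fusion is omitted for brevity.

```python
import numpy as np

DT = 1.0
# Constant-velocity model written on a 3-state vector (acceleration forced to 0).
F_CV = np.array([[1, DT, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
F_CA = np.array([[1, DT, 0.5 * DT ** 2], [0, 1, DT], [0, 0, 1]], dtype=float)
MODELS = [(F_CV, np.diag([0.01, 0.01, 0.0])), (F_CA, np.diag([0.01, 0.01, 0.01]))]
H = np.array([[1.0, 0.0, 0.0]])                 # only position is observed
R = np.array([[1.0]])                           # assumed measurement noise
TRANS = np.array([[0.95, 0.05], [0.05, 0.95]])  # assumed model switching matrix

def imm_step(xs, Ps, mu, z):
    """One IMM cycle: mix, predict, update, and re-weight the models."""
    n = len(MODELS)
    c = TRANS.T @ mu                      # predicted model probabilities
    w = TRANS * mu[:, None] / c[None, :]  # mixing weights w[i, j]
    new_xs, new_Ps, like = [], [], np.zeros(n)
    for j, (F, Q) in enumerate(MODELS):
        x0 = sum(w[i, j] * xs[i] for i in range(n))
        P0 = sum(w[i, j] * (Ps[i] + np.outer(xs[i] - x0, xs[i] - x0))
                 for i in range(n))
        xp, Pp = F @ x0, F @ P0 @ F.T + Q  # predicted state and covariance
        y = np.array([z]) - H @ xp         # innovation
        S = H @ Pp @ H.T + R               # innovation covariance
        K = Pp @ H.T @ np.linalg.inv(S)
        new_xs.append(xp + K @ y)
        new_Ps.append((np.eye(3) - K @ H) @ Pp)
        like[j] = (np.exp(-0.5 * float(y @ np.linalg.inv(S) @ y))
                   / np.sqrt(2 * np.pi * S[0, 0]))
    mu_new = c * like + 1e-300             # guard against total underflow
    return new_xs, new_Ps, mu_new / mu_new.sum()

def expected_position(xs, mu):
    """Fuse each model's one-step prediction with the updated probabilities."""
    return float(sum(m * (F @ x)[0] for m, x, (F, _) in zip(mu, xs, MODELS)))

# Hypothetical trajectory: a joint moving 2 pixels per frame.
xs = [np.zeros(3), np.zeros(3)]
Ps = [np.eye(3) * 10.0, np.eye(3) * 10.0]
mu = np.array([0.5, 0.5])
for k in range(20):
    xs, Ps, mu = imm_step(xs, Ps, mu, 2.0 * k)
pred = expected_position(xs, mu)
```

After the 20 observed frames, `pred` is the fused expected position for the next frame, which for this noise-free linear trajectory should lie close to 40.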
6. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S5, updating the trajectory sequences of all joints specifically includes: calculating the geometric center of the key point set within each strip-shaped tracking region of the current frame as the observed position, and simultaneously obtaining the predicted position of each joint; calculating the association cost between each observed position and each predicted position based on the Mahalanobis distance between the two positions and the cosine distance between the appearance descriptor vectors of the corresponding key point sets; constructing a complete bipartite graph model, with the observed positions and the predicted positions as the two vertex sets and the association costs as the weights of the edges connecting the vertices; converting the bipartite graph model into a cost matrix, and solving the optimal matching of the cost matrix using the Hungarian algorithm to obtain the pairing relationship between the observed positions and the predicted positions; and, based on the pairing relationship, appending each observed position to the trajectory sequence of the corresponding joint.
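The association step of claim 6 can be sketched as a cost matrix built from Mahalanobis and cosine distances and solved with the Hungarian algorithm. This is a minimal sketch with assumptions: the weighting factor `lam`, the shared 2x2 covariance `cov`, and the sample positions and descriptors are hypothetical, and the claim does not state how the two distances are combined.

```python
import math

def mahalanobis(obs, pred, cov):
    """Mahalanobis distance in 2-D for a symmetric covariance [[a, b], [b, c]]."""
    dx, dy = obs[0] - pred[0], obs[1] - pred[1]
    (a, b), (_, c) = cov
    det = a * c - b * b
    return math.sqrt((c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

def cosine_distance(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return 1.0 - dot / norm

def hungarian(cost):
    """Minimum-cost assignment (rows <= columns); returns {row: column}."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}

# Hypothetical predictions and observations (positions and descriptors).
preds = [((10, 10), (1, 0)), ((50, 10), (0, 1)), ((90, 10), (1, 1))]
obs = [((51, 11), (0, 1)), ((89, 9), (1, 1)), ((11, 10), (1, 0))]
cov = ((4.0, 0.0), (0.0, 4.0))  # assumed prediction covariance
lam = 0.5                       # assumed weight between the two distances
cost = [[lam * mahalanobis(o, q, cov) + (1 - lam) * cosine_distance(d, e)
         for q, e in preds] for o, d in obs]
match = hungarian(cost)  # observation index -> prediction index
```

Each matched observation would then be appended to the trajectory of the joint whose prediction it was paired with. A library solver such as SciPy's `scipy.optimize.linear_sum_assignment` could replace the hand-written `hungarian` function.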
7. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S6, calculating the longitudinal displacement difference sequence of each joint across multiple frames, performing change point detection, and marking the detected change points as candidate abnormal events specifically includes: extracting the longitudinal components of the position coordinates from each updated joint trajectory sequence to form the longitudinal coordinate sequence of the joint; taking the differences between adjacent frames of the longitudinal coordinate sequence to obtain a longitudinal displacement difference sequence describing the displacement change between frames; processing the longitudinal displacement difference sequence with a change point detection algorithm based on the Bayesian information criterion to identify change points at which the statistical characteristics of the sequence change abruptly; and associating each detected change point with its corresponding frame, its joint, and the identifier of the strip-shaped tracking region to which it belongs, and marking them together as a candidate abnormal event.
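One possible reading of the change point step in claim 7 is binary segmentation with a Bayesian information criterion (BIC) test for a shift in the mean. The sketch below follows that reading. The Gaussian mean-shift model, the parameter counts in the penalty terms, `min_len`, the identifiers `joint_id` and `region_id`, and the synthetic coordinate sequence are assumptions and are not specified in the claim.

```python
import math

EPS = 1e-9  # avoids log(0) on perfectly flat segments

def _rss(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

def bic_change_points(x, min_len=3, offset=0):
    """Recursively split x wherever a two-mean model lowers the BIC."""
    n = len(x)
    if n < 2 * min_len:
        return []
    best_t = None
    # One segment: mean and variance (2 parameters).
    best_bic = n * math.log(_rss(x) / n + EPS) + 2 * math.log(n)
    for t in range(min_len, n - min_len + 1):
        rss = _rss(x[:t]) + _rss(x[t:])
        # Two segments: two means, variance, split location (4 parameters).
        bic = n * math.log(rss / n + EPS) + 4 * math.log(n)
        if bic < best_bic:
            best_t, best_bic = t, bic
    if best_t is None:
        return []
    return (bic_change_points(x[:best_t], min_len, offset)
            + [offset + best_t]
            + bic_change_points(x[best_t:], min_len, offset + best_t))

# Hypothetical longitudinal coordinates: steady for 11 frames, then slipping.
y = [100 + 0.1 * (k % 2) for k in range(11)]
y += [100 + 1.5 * (k - 10) + 0.1 * (k % 2) for k in range(11, 21)]

# Longitudinal displacement difference sequence between adjacent frames.
diffs = [b - a for a, b in zip(y, y[1:])]
change_points = bic_change_points(diffs)

# diffs[i] spans frames i and i + 1, so the event frame is index + 1.
candidates = [{"frame": cp + 1, "joint_id": 0, "region_id": "A"}
              for cp in change_points]
```

Here the difference sequence jumps from about 0 to about 1.5 pixels per frame at index 10, so a single candidate abnormal event is reported at frame 11.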
8. The method for detecting and analyzing the pulling of wire rope joints based on AI vision according to claim 1, characterized in that, in step S7, taking the spatiotemporal image region corresponding to the merged abnormal events as the abnormal pulling region of the wire rope joint specifically includes: extracting, for each candidate abnormal event, the frame time information and the center coordinates of the strip-shaped tracking region to which it belongs, constructing feature vectors, calculating the distances between the feature vectors, and clustering all feature vectors to form several clusters; and examining the strip-shaped tracking region identifiers corresponding to the feature vectors within each cluster, discarding clusters whose feature vectors have differing identifiers and retaining clusters whose identifiers are consistent, calculating for each retained cluster the minimum bounding rectangle of its image coordinates, and defining the abnormal pulling region from that rectangle together with the time period covered by the cluster.
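The merging logic of claim 8 can be sketched as follows. The claim does not name a clustering algorithm, so this sketch assumes single-linkage clustering with a distance threshold `max_dist`. The `time_scale` factor, the axis-aligned bounding rectangle, and the event dictionaries are likewise hypothetical.

```python
import math

def merge_events(events, max_dist=10.0, time_scale=1.0):
    """Cluster candidate events and keep clusters with one region identifier."""
    feats = [(e["frame"] * time_scale, e["cx"], e["cy"]) for e in events]
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-linkage clustering: link events closer than max_dist.
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if math.dist(feats[i], feats[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, e in enumerate(events):
        clusters.setdefault(find(i), []).append(e)

    regions = []
    for members in clusters.values():
        ids = {e["region_id"] for e in members}
        if len(ids) != 1:
            continue  # discard clusters that mix tracking regions
        xs = [e["cx"] for e in members]
        ys = [e["cy"] for e in members]
        frames = [e["frame"] for e in members]
        regions.append({
            "region_id": ids.pop(),
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
            "frames": (min(frames), max(frames)),
        })
    return regions

# Hypothetical candidate abnormal events.
events = [
    {"frame": 10, "cx": 100, "cy": 50, "region_id": "A"},
    {"frame": 11, "cx": 102, "cy": 51, "region_id": "A"},
    {"frame": 12, "cx": 101, "cy": 49, "region_id": "A"},
    {"frame": 40, "cx": 300, "cy": 50, "region_id": "A"},
    {"frame": 41, "cx": 302, "cy": 52, "region_id": "B"},
]
regions = merge_events(events)
```

In this example the three events in region "A" form one abnormal pulling region, while the later cluster is discarded because its events carry different region identifiers.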