A data mining system for multi-granularity analysis
By constructing a multi-granularity logical index tree and introducing residual judgment and adaptive scheduling routing mechanisms, the problem of adapting logical indexes to physical distribution is solved, achieving stable query latency and efficient data mining under extreme operating conditions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU JINSHUDUN INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing data processing architectures struggle to resolve the compatibility issues between logical index metadata and the underlying physical distribution under extreme operating conditions, leading to increased retrieval latency and impacting the stability and efficiency of real-time decision-making systems in industrial settings.
A multi-granularity logical index tree is constructed, which records feature vectors in non-leaf nodes, independent of physical distribution. Residual determination and adaptive scheduling routing mechanisms are introduced to block unnecessary logical addressing paths, directly access physical storage blocks, and reduce query latency.
Under extreme conditions, the system maintains stable query latency, avoids the risk of memory overflow caused by index bloat, and ensures efficient execution of multi-granularity analysis tasks.
Figure CN122240688A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of data mining technology, and in particular relates to a data mining system for multi-granularity analysis.
Background Technology
[0002] Currently, industrial time series data mining is a key support for improving the efficiency of ultra-large-scale continuous manufacturing and ensuring production safety. This field generally adopts a method based on multi-resolution logical views, which builds materialized views of different time granularities on the underlying database to achieve multi-granular macro-trend monitoring of massive industrial production data. With the increasing complexity of manufacturing processes, the demand for anomaly tracing in industrial sites has evolved from macro-yield analysis to micro-root cause drilling. This requires the mining system to present global deviation trends spanning several months within a short time window, as well as accurately locate millisecond-level sensor pulses. This demand for multi-granular continuous iteration has led to a mismatch between logical indexes and physical storage distribution in the existing database architecture. In order to ensure the information fidelity during the data mining process, the existing hierarchical indexing mechanism often needs to carry a large number of statistical features or feature vectors in non-leaf nodes.
[0003] However, when faced with massive amounts of data under extreme operating conditions, existing data processing architectures often address only hardware cluster expansion or basic topology optimization, and fail to resolve the compatibility issue between logical index metadata and the underlying physical distribution. Existing technical solutions that attempt to optimize prediction accuracy through software algorithms or control models also exhibit limitations. For example, Chinese invention patent application CN120873572A discloses an industrial time series prediction method and system that extracts features through multi-scale segmentation and a dual attention mechanism, and uses dynamic embedding technology to construct an adaptive graph structure that characterizes the correlations between multiple factors. Such approaches remain limited because industrial field sensors are often subject to complex electromagnetic interference that generates high-frequency white noise, or produce dense abnormal pulses during continuous equipment failure. It is generally recognized in the industry that such pulses pose a challenge to mathematical fitting indexes. When industrial data loses the convergence of mathematical fitting within block intervals, index nodes based on polynomials or orthogonal transformations either significantly increase the coefficient dimension in pursuit of fitting accuracy, or repair reconstruction errors by recording massive numbers of discrete residual compensation vectors. This leads to a non-linear increase in the byte length of logical index metadata, resulting in an inversion in which the index volume exceeds the storage volume of the underlying physical data blocks. In this scenario, the existing logical addressing path causes CPU cache misses and wasted computational resources, so that the retrieval latency of cross-granularity analysis jumps from milliseconds to minutes and real-time decision-making systems stall precisely under the extreme abnormal conditions that most require micro-level data support.
[0004] Therefore, the technical problem to be solved by this invention is how to construct a hierarchical index structure with byte capacity awareness, which can maintain high-fidelity feature records, eliminate the addressing disaster caused by logical index expansion under extreme industrial conditions, and ensure constant latency response of multi-granularity analysis tasks under any conditions.
Summary of the Invention
[0005] To address the problems mentioned in the background art, the technical solution of the present invention is as follows: A data mining system for multi-granularity analysis, the system comprising: a data input interface unit, used to receive industrial time series data that characterizes the operating status of production equipment; a hierarchical index construction unit, used to extract feature vectors from the industrial time series data and construct a multi-granularity logical index tree, wherein the feature vectors are recorded in the non-leaf nodes of the multi-granularity logical index tree so that the logical index topology is independent of the physical distribution of the industrial time series data; a residual determination unit, used to calculate the residual sequence between the industrial time series data and the reconstructed data obtained by inversion from the feature vectors, and, when the residual amplitude of a sampling point in the residual sequence exceeds the preset accuracy threshold, to write an addressing reset flag to the head of the corresponding node in the multi-granularity logical index tree; and an adaptive scheduling and routing unit, used to respond to multi-granularity analysis query requests and, when the addressing reset flag is read, to block the addressing path based on the multi-granularity logical index tree and reset the query path to the physical storage block of the industrial time series data, so as to perform block-level data copying and output the feature sequence within the corresponding time window.
[0006] Preferably, the hierarchical index construction unit is used to perform nonlinear dimensionality reduction on the industrial time series data, such that the storage volume of the multi-scale feature vectors recorded in the non-leaf nodes is lower than the volume of the original sampled data recorded at the corresponding physical storage block address. This suppresses the nonlinear expansion of the metadata of the multi-granularity logical index tree when abnormal pulse fluctuations occur in the industrial time series data, so that the average query addressing time of the system fluctuates by less than 15% during the data pulse, thereby ensuring the addressing stability of the multi-granularity query task under I/O throughput constraints.
[0007] Preferably, the residual determination unit is also used to perform multi-resolution feature reconstruction calculation in memory using multi-scale feature vectors when the residual amplitude does not exceed the preset accuracy tolerance threshold, so as to output a macro trend analysis view.
[0008] Preferably, the data input interface unit further includes a pre-filter module; the pre-filter module is used to identify random high-frequency noise in industrial time series data, and to perform noise reduction processing on industrial time series data based on the smoothing coefficient determined by the historical residual variance, so as to reduce the invalid disturbance of the residual sequence under steady-state conditions.
[0009] Preferably, the adaptive scheduling routing unit is also used to obtain the I/O execution flow status of the corresponding physical storage block address when resetting the query path, and, when it determines that the execution flow is congested, to start a cache prefetching action based on the addressing reset flag and load the original sampled data in the target time window into the storage buffer.
[0010] Preferably, the hierarchical index building unit is also used to establish a spatiotemporal correlation mapping in a multi-granularity logical index tree; wherein, the spatiotemporal correlation mapping is used to map multiple industrial time series data across physical channels to a unified logical addressing space to support cross-level industrial data traceability and analysis tasks.
[0011] Preferably, the preset accuracy tolerance threshold is dynamically set based on the sensor's rated accuracy level and the system background noise benchmark; the residual judgment unit is also used to periodically correct the preset accuracy tolerance threshold to offset the measurement drift caused by electromagnetic environment fluctuations in the industrial field.
[0012] Preferably, the residual amplitude is calculated according to the following formula: L=|S_raw-S_rec|, where L is the residual amplitude, S_raw is the value of a specific sampling point in the industrial time series data, and S_rec is the value of the corresponding sampling point in the reconstructed data.
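The residual formula above can be illustrated with a minimal Python sketch. The function names `residual_amplitudes` and `exceeds_threshold` are hypothetical and chosen only for illustration; the text does not prescribe an implementation.

```python
from typing import List, Sequence


def residual_amplitudes(s_raw: Sequence[float], s_rec: Sequence[float]) -> List[float]:
    """Return L = |S_raw - S_rec| for each pair of corresponding sampling points."""
    if len(s_raw) != len(s_rec):
        raise ValueError("raw and reconstructed sequences must have equal length")
    return [abs(raw - rec) for raw, rec in zip(s_raw, s_rec)]


def exceeds_threshold(amplitudes: Sequence[float], threshold: float) -> bool:
    """True if any sampling point's residual amplitude is strictly above the threshold."""
    return any(amplitude > threshold for amplitude in amplitudes)
```

A caller would compute the amplitudes once per time window and write the addressing reset flag only when `exceeds_threshold` returns true.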
[0013] Preferably, when the adaptive scheduling routing unit performs a reset, it locates the sampling point based on the physical offset of the distributed columnar storage architecture, thereby achieving a 10ms-level read response for feature data.
[0014] Preferably, the system also includes a global anomaly analysis module; the global anomaly analysis module is used to receive the feature sequence output by the adaptive scheduling routing unit and the macro trend analysis view output by the residual judgment unit, perform cross-granularity correlation mining of the industrial production process, and output the anomaly root cause analysis results for assisting manufacturing process optimization decisions.
[0015] Compared with existing technologies, the data mining system for multi-granularity analysis of this invention has the following advantages: 1. In multi-granularity data mining, a data organization topology with function inversion capability is established to eliminate the storage constraints of fixed-granularity views. The original time series data is converted into a vector of basis function coefficients and recorded in the non-leaf nodes of the hierarchical index tree through fractal basis function index construction units. This enables a single index structure to have the ability to continuously describe the operating state of industrial physical objects. This mechanism changes the traditional multi-level storage architecture that relies on fixed-interval sampling or materialized views. When the system receives mining requests with non-preset granularity, it can adaptively calculate the optimal truncation order and perform inverse transformation reconstruction according to the target time granularity constraint. This effectively avoids the storage space expansion caused by building massive redundant materialized views to meet multi-dimensional mining needs and ensures that low addressing latency can be maintained for analysis requests of any span.
[0016] 2. A physical balance between macro-trend characterization and micro-anomaly detection is achieved through a residual bypass compensation mechanism. The feature aggregation entropy generated by the residual bypass extraction unit and the fractal basis function index construction unit works synergistically in the tree structure. The trunk index is responsible for carrying the evolution law of industrial data, while the bypass vector accurately extracts discrete abnormal pulses and their spatiotemporal coordinates that exceed the accuracy threshold. This dual-path retention mechanism ensures that the system can still restore high-frequency physical events that have been smoothed out by the mathematical fitting model during the coarse-grained data reconstruction process. This solves the problem of lost isolated singularities caused by mathematical filtering effect in traditional downsampling mining technology, and provides feature evidence with absolute accuracy for root cause drilling analysis in industrial sites.
[0017] 3. A byte-capacity-aware index topology phase transition mechanism is introduced to avoid retrieval performance collapse under high-entropy conditions. The index topology phase transition controller built into the residual bypass extraction unit introduces a competitive evaluation logic between logical computation cost and physical direct read cost. When continuous faults occur in the industrial field, resulting in a dense cluster of abnormal pulses, the actual total number of bytes of the index node metadata is compared in real time with the number of bytes stored in the underlying physical data block. At the critical point where the logical description cost exceeds the physical storage cost, the index phase transition is forcibly triggered. The system reduces the inverse transformation instruction of the basis function to a direct read pointer pointing to the underlying physical memory address, thereby blocking the nonlinear dissipation of processing power caused by massive residual features, removing the memory overflow risk caused by the expansion of index metadata under extreme conditions, and safeguarding the deterministic boundary of query response.
Attached Figure Description
[0018] Figure 1 is the overall architecture and unit interaction diagram of the multi-granularity analysis system of the present invention; Figure 2 is a flowchart illustrating the logic by which the adaptive scheduling routing unit of the present invention resets the query path.
Detailed Implementation
[0019] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.
[0020] A data mining system for multi-granularity analysis, the system comprising: a data input interface unit, used to receive industrial time series data that characterizes the operating status of production equipment; a hierarchical index construction unit, used to extract feature vectors from the industrial time series data and construct a multi-granularity logical index tree, wherein the feature vectors are recorded in the non-leaf nodes of the multi-granularity logical index tree so that the logical index topology is independent of the physical distribution of the industrial time series data; a residual determination unit, used to calculate the residual sequence between the industrial time series data and the reconstructed data obtained by inversion from the feature vectors, and, when the residual amplitude of a sampling point in the residual sequence exceeds the preset accuracy threshold, to write an addressing reset flag to the head of the corresponding node in the multi-granularity logical index tree; and an adaptive scheduling and routing unit, used to respond to multi-granularity analysis query requests and, when the addressing reset flag is read, to block the addressing path based on the multi-granularity logical index tree and reset the query path to the physical storage block of the industrial time series data, so as to perform block-level data copying and output the feature sequence within the corresponding time window.
[0021] Preferably, the hierarchical index construction unit is used to perform nonlinear dimensionality reduction on the industrial time series data, such that the storage volume of the multi-scale feature vectors recorded in the non-leaf nodes is lower than the volume of the original sampled data recorded at the corresponding physical storage block address. This suppresses the nonlinear expansion of the metadata of the multi-granularity logical index tree when abnormal pulse fluctuations occur in the industrial time series data, so that the average query addressing time of the system fluctuates by less than 15% during the data pulse, thereby ensuring the addressing stability of the multi-granularity query task under I/O throughput constraints.
[0022] Preferably, the residual determination unit is also used to perform multi-resolution feature reconstruction calculation in memory using multi-scale feature vectors when the residual amplitude does not exceed the preset accuracy tolerance threshold, so as to output a macro trend analysis view.
[0023] Preferably, the data input interface unit further includes a pre-filter module; the pre-filter module is used to identify random high-frequency noise in industrial time series data, and to perform noise reduction processing on industrial time series data based on the smoothing coefficient determined by the historical residual variance, so as to reduce the invalid disturbance of the residual sequence under steady-state conditions.
[0024] Preferably, the adaptive scheduling routing unit is also used to obtain the I/O execution flow status of the corresponding physical storage block address when resetting the query path, and, when it determines that the execution flow is congested, to start a cache prefetching action based on the addressing reset flag and load the original sampled data in the target time window into the storage buffer.
[0025] Preferably, the hierarchical index building unit is also used to establish a spatiotemporal correlation mapping in a multi-granularity logical index tree; wherein, the spatiotemporal correlation mapping is used to map multiple industrial time series data across physical channels to a unified logical addressing space to support cross-level industrial data traceability and analysis tasks.
[0026] Preferably, the preset accuracy tolerance threshold is dynamically set based on the sensor's rated accuracy level and the system background noise benchmark; the residual judgment unit is also used to periodically correct the preset accuracy tolerance threshold to offset the measurement drift caused by electromagnetic environment fluctuations in the industrial field.
[0027] Preferably, the residual amplitude is calculated according to the following formula: L=|S_raw-S_rec|, where L is the residual amplitude, S_raw is the value of a specific sampling point in the industrial time series data, and S_rec is the value of the corresponding sampling point in the reconstructed data.
[0028] Preferably, when the adaptive scheduling routing unit performs a reset, it locates the sampling point based on the physical offset of the distributed columnar storage architecture, thereby achieving a 10ms-level read response for feature data.
[0029] Preferably, the system also includes a global anomaly analysis module; the global anomaly analysis module is used to receive the feature sequence output by the adaptive scheduling routing unit and the macro trend analysis view output by the residual judgment unit, perform cross-granularity correlation mining of the industrial production process, and output the anomaly root cause analysis results for assisting manufacturing process optimization decisions.
[0030] Example 1: Under continuous high-frequency mechanical oscillation of wafer manufacturing equipment, the industrial time series data received by the data input interface unit contains dense abnormal pulse signals within a millisecond-level time window. A conventional logical index topology records all discrete abnormal features that exceed the precision tolerance threshold in the non-leaf nodes. The total number of bytes these abnormal features occupy in the node memory buffer expands exponentially, and the metadata volume exceeds the physical storage volume of the underlying physical data block. This index storage inversion causes CPU cache misses and a risk of memory overflow, resulting in minute-level oscillations in the retrieval delay of cross-granularity analysis. In the present system, the hierarchical index construction unit extracts the feature vectors of the industrial time series data and constructs a multi-granularity logical index tree. The feature vectors are recorded in the non-leaf nodes of the multi-granularity logical index tree. The residual determination unit calculates the residual sequence between the industrial time series data and the reconstructed data obtained by inversion from the feature vectors. The residual determination unit calculates the residual amplitude using the formula L=|S_raw-S_rec|, where L is the residual amplitude, S_raw is the value of a specific sampling point in the industrial time series data, and S_rec is the value of the corresponding sampling point in the reconstructed data. The calibration procedure for the preset accuracy tolerance threshold is as follows: collect baseline time series data for 5 consecutive minutes under no-load steady-state conditions of the production equipment, calculate the mathematical variance of the residuals of adjacent sampling points in the baseline dataset, and multiply it by the sensor's rated accuracy coefficient to obtain the preset accuracy tolerance threshold. When determining the preset accuracy threshold, select 10 consecutive minutes of baseline industrial time series data under no-load steady-state conditions of the production equipment, store the original sampling sequence in the memory buffer using the data input interface unit, extract the feature vector of the baseline dataset through the hierarchical index construction unit and reconstruct it into the reconstructed sequence S_rec, calculate the residual amplitude L corresponding to each sampling point, calculate the mathematical variance σ^2 of all L in the baseline dataset, multiply the mathematical variance σ^2 by the sensor's rated accuracy coefficient K, whose value ranges from 1.2 to 1.5, and write the resulting value as the preset accuracy threshold into the parameter register of the residual judgment unit.
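The calibration steps described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `calibrate_threshold` is hypothetical, and the use of the population variance (`statistics.pvariance`) is an assumption, since the text says only "mathematical variance".

```python
from statistics import pvariance
from typing import Sequence


def calibrate_threshold(s_raw: Sequence[float], s_rec: Sequence[float], k: float) -> float:
    """Return K * sigma^2, where sigma^2 is the variance of the baseline residual amplitudes.

    s_raw: baseline samples collected under no-load steady-state conditions.
    s_rec: the same samples reconstructed from the extracted feature vector.
    k:     the sensor's rated accuracy coefficient (1.2 to 1.5 per the text).
    """
    if not 1.2 <= k <= 1.5:
        raise ValueError("rated accuracy coefficient K is expected in [1.2, 1.5]")
    if len(s_raw) != len(s_rec) or not s_raw:
        raise ValueError("baseline sequences must be non-empty and of equal length")
    amplitudes = [abs(raw - rec) for raw, rec in zip(s_raw, s_rec)]
    # Assumption: population variance over all baseline residual amplitudes.
    return k * pvariance(amplitudes)
```

The returned value is what the text describes as being written to the parameter register of the residual judgment unit.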
[0031] Each non-leaf node of the multi-granularity logical index tree contains a reserved 1-bit addressing reset flag. When the residual determination unit determines that the residual amplitude L of a sampling point exceeds the preset precision threshold, the processor calculates the physical storage space address offset from the difference between the global timestamp of the first sampling point in the current time window and the system startup timestamp, combined with the hardware sampling period parameter. This offset is added to the base physical starting address allocated by the distributed columnar storage architecture to generate an absolute physical memory pointer mapped to the starting position of the underlying physical storage block. The addressing reset flag is set to 1, and the generated absolute physical memory pointer is stored in the node data area in place of the original feature vector. When dense abnormal pulses cause the residual amplitudes of sampling points in the residual sequence to exceed the preset precision tolerance threshold, the residual determination unit continuously accumulates the total number of feature bytes actually occupied in the memory buffer of the non-leaf node and compares it numerically with the number of physical storage bytes of the corresponding underlying physical storage block in the distributed columnar storage architecture. The residual determination unit configures a 16-bit feature accumulation register at the node head. Whenever the residual amplitude of a sampling point is determined to be out of limit, the determination unit does not traverse the entire tree but directly increments the current register value by 4 (the number of bytes occupied by a single-precision floating-point number). When the register value reaches the preset physical storage block quota boundary of 4096 bytes, the system triggers the hardware comparator to output a high-level signal, which immediately activates the write instruction for the addressing reset flag. If the actual total number of feature bytes is greater than or equal to the number of physical storage bytes, the residual determination unit blocks the write path of the feature vector to the corresponding node of the multi-granularity logical index tree, erases the data in the node memory buffer, and writes the addressing reset flag and the absolute physical memory pointer mapped to the starting addressing offset of the underlying physical storage block at the head of the corresponding node.
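The byte accumulation and flag-writing behaviour described above might be modelled in software roughly as below. The class `IndexNode` and its field names are hypothetical; the 4-byte increment and the 4096-byte quota come from the text, while modelling the hardware register and comparator as plain integer fields is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FEATURE_BYTES = 4   # one single-precision float, per the text
BLOCK_QUOTA = 4096  # physical storage block quota in bytes, per the text


@dataclass
class IndexNode:
    """Software stand-in for a non-leaf node of the logical index tree."""
    features: List[float] = field(default_factory=list)  # node memory buffer
    accumulated_bytes: int = 0      # models the 16-bit feature accumulation register
    reset_flag: int = 0             # models the 1-bit addressing reset flag
    physical_pointer: Optional[int] = None

    def record_out_of_limit(self, feature: float, block_pointer: int) -> None:
        """Account for one out-of-limit sampling point; switch to direct read at the quota."""
        if self.reset_flag:
            return  # write path to this node is already blocked
        self.features.append(feature)
        self.accumulated_bytes += FEATURE_BYTES
        if self.accumulated_bytes >= BLOCK_QUOTA:
            self.features.clear()           # erase the node memory buffer
            self.reset_flag = 1             # write the addressing reset flag
            self.physical_pointer = block_pointer
```

With these constants, the flag is written on the 1024th out-of-limit sample, since 1024 × 4 bytes equals the 4096-byte quota.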
[0032] The adaptive scheduling and routing unit responds to multi-granularity analysis query requests, which carry target time window parameters. The adaptive scheduling and routing unit addresses along the multi-granularity logical index tree. On reading the addressing reset flag and the absolute physical memory pointer, the adaptive scheduling and routing unit blocks the addressing path based on the multi-granularity logical index tree and resets the query path to the physical storage block of the industrial time series data. The adaptive scheduling and routing unit uses the absolute physical memory pointer to perform block-level data copying of the underlying industrial time series data from the persistent medium of the columnar storage architecture to the processor's storage buffer, and outputs the feature sequence within the corresponding time window. This direct physical read path replaces the inverse calculation of logical features, reducing the complex mathematical inverse transformation to the underlying physical addressing overhead. The average query addressing time of the system fluctuates by less than 15% during the data pulse, ensuring the addressing stability of the multi-granularity query task under input/output throughput constraints.
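The routing decision described above can be sketched as a single branch on the flag. The function `route_query` and the two callbacks are hypothetical stand-ins: `read_block` represents the block-level copy from the columnar store, and `reconstruct` represents the inverse calculation from the feature vector.

```python
from typing import Callable, List, Optional


def route_query(
    reset_flag: int,
    physical_pointer: Optional[int],
    feature_vector: Optional[List[float]],
    read_block: Callable[[int], List[float]],
    reconstruct: Callable[[List[float]], List[float]],
) -> List[float]:
    """Return the sequence for a time window via direct read or logical reconstruction."""
    if reset_flag == 1:
        # Flag set: bypass the logical index path and read the physical block directly.
        if physical_pointer is None:
            raise ValueError("reset flag is set but no physical pointer is stored")
        return read_block(physical_pointer)
    if feature_vector is None:
        raise ValueError("no feature vector available for reconstruction")
    # Flag clear: reconstruct the sequence in memory from the feature vector.
    return reconstruct(feature_vector)
```

Passing the two paths as callbacks keeps the sketch independent of any particular storage engine or basis function.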
[0033] Example 2: When the spindle bearing of wafer manufacturing equipment operates at 20,000 revolutions per minute, transient high-frequency mechanical oscillations are induced. This experiment uses an industrial-grade electric spindle and a triaxial accelerometer to construct a physical test bench. The sampling frequency of the triaxial accelerometer is calibrated to 20kHz and the measurement accuracy is calibrated to 0.1mV. The data acquisition unit extracts the vibration time series data of the spindle throughout its entire life cycle. To reproduce the electromagnetic disturbance found in an industrial workshop, Gaussian white noise with a signal-to-noise ratio of 20dB and power frequency harmonic interference at 50Hz are actively injected into the signal transmission link. Two comparative configurations are set up in the experiment. The control group deploys only a conventional radix tree index and does not include the residual judgment and addressing reset modules. The experimental group deploys the complete data mining system claimed in this specification.
[0034] The calibration of the preset accuracy threshold is designed to balance the sensitivity of abnormal pulse capture against the data byte expansion of the multi-granularity logical index tree. When the threshold approaches the lower limit, high-frequency noise spikes cross the judgment boundary and cause index node fragmentation. When the threshold approaches the upper limit, the system risks smoothing out micro-crack oscillation characteristics. The system collects baseline time series data of the spindle running continuously for 10.5 minutes under no-load steady-state conditions. The residual judgment unit calculates the residual amplitude of adjacent sampling points according to the formula L=|S_raw-S_rec|, where L is the residual amplitude, S_raw is the value of a specific sampling point in the industrial time series data, and S_rec is the value of the corresponding sampling point in the reconstructed data. The residual judgment unit calculates the mathematical variance of the above residual amplitude in the baseline dataset and multiplies the mathematical variance by the rated accuracy coefficient of 1.2 given in the sensor specification. The resulting preset accuracy threshold is 15.4mV. This calibration procedure turns parameter determination into a quantitative calculation based on the sensor's noise floor and steady-state distribution characteristics.
[0035] As the spindle continued to operate and entered the mechanical damage stage with depth gradient characteristics, the data input interface unit continuously received industrial time-series data mixed with noise. The hierarchical index construction unit extracted feature vectors and wrote them to the non-leaf nodes of the multi-granularity logical index tree. Under the initial wear condition with a damage depth of 0.12mm, the abnormal pulse density was low. The average query addressing time was measured at 45.2ms for the control group and 42.1ms for the experimental group. When the damage depth expanded to the moderate spalling condition of 0.55mm, the dense alternating load induced a surge in high-frequency oscillations. The number of metadata bytes in the multi-granularity logical index tree of the control group expanded non-linearly, and its average query addressing time increased to 215.8ms. The residual judgment unit of the experimental group found that the residual amplitude of a large number of sampling points exceeded the preset accuracy threshold of 15.4mV. The residual judgment unit blocked the logical write path and wrote the addressing reset flag and the absolute physical memory pointer to the head of the non-leaf node. Under this condition, the average query addressing time of the experimental group converged to 46.3ms. When the damage depth crossed the physical critical point of 1.05mm and entered the pre-locking interval, the memory buffer of the non-leaf nodes in the control group experienced large-scale overflow blocking, and the average query addressing time reached 1845.6ms. The adaptive scheduling routing unit of the experimental group read the addressing reset flag and copied the underlying original time series data block from the persistent medium of the columnar storage architecture to the processor's storage buffer using the absolute physical memory pointer. Its average query addressing time remained at 51.2ms. These gradient test data, covering steady-state to extreme damage conditions, indicate that, while preserving the high-frequency abnormal characteristics of the underlying industrial time series data, the experimental group system relied on the addressing reset flag to cut off the positive correlation between the volume of the multi-granularity logical index tree and the fluctuation density of the physical data. Under multi-dimensional noise intervention and extreme abnormal pulse bursts, the cross-granularity addressing time was kept within 55ms, maintaining the constant capacity of the node memory buffer and the determinism of the system addressing time.
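For readers comparing the two groups, the ratio between control and experimental addressing times can be computed directly from the measurements reported above. The sketch below uses only those six reported values; the names `RESULTS_MS`, `speedup` and `ratios` are hypothetical.

```python
from typing import Dict, Tuple

# (control, experimental) average query addressing time in ms, keyed by damage depth.
# All six values are the measurements reported in Example 2.
RESULTS_MS: Dict[str, Tuple[float, float]] = {
    "0.12mm": (45.2, 42.1),
    "0.55mm": (215.8, 46.3),
    "1.05mm": (1845.6, 51.2),
}


def speedup(control_ms: float, experimental_ms: float) -> float:
    """Ratio of control latency to experimental latency."""
    return control_ms / experimental_ms


def ratios() -> Dict[str, float]:
    """Control-to-experimental latency ratio for each damage depth."""
    return {depth: speedup(c, e) for depth, (c, e) in RESULTS_MS.items()}
```

On these figures the ratio is roughly 1.1 at 0.12mm, 4.7 at 0.55mm, and 36 at 1.05mm.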
[0036] Example 3: The Industrial Internet of Things (IIoT) platform continuously receives industrial time-series data representing the operating status of production equipment. The hierarchical index construction unit extracts the bottom-level industrial time-series data block according to a time window containing 512 consecutive sampling points, calculates the mathematical inner product of the bottom-level industrial time-series data block and the internally fixed orthogonal polynomial basis function group, extracts the discrete coefficient values output by the hierarchical index construction unit, generates a basis function coefficient vector representing the data distribution pattern, and records the basis function coefficient vector as a feature vector in the non-leaf nodes of the multi-granularity logical index tree. The residual determination unit calculates the reconstructed data according to the formula S_rec=C⋅Φ, where S_rec is the reconstructed data, C is the basis function coefficient vector, and Φ is the orthogonal polynomial basis function discrete sampling reconstruction matrix stored in the storage unit. The residual determination unit calculates the difference between the value of a specific sampling point in the industrial time-series data and the value of the corresponding sampling point in the reconstructed data, and generates a residual sequence.
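The projection, reconstruction, and residual steps described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the text does not name the orthogonal polynomial family, so the basis here is built by Gram-Schmidt orthonormalization of monomials on the sample grid, and a 32-point window, a degree of 3, and a threshold of 2.0 are used for readability in place of the 512-point window and the unspecified threshold. All function names are hypothetical.

```python
from typing import List


def orthonormal_poly_basis(n_points: int, degree: int) -> List[List[float]]:
    """Rows of Phi: discrete polynomials made orthonormal by Gram-Schmidt (assumed basis)."""
    xs = [2.0 * i / (n_points - 1) - 1.0 for i in range(n_points)]  # grid on [-1, 1]
    basis: List[List[float]] = []
    for d in range(degree + 1):
        v = [x ** d for x in xs]
        for b in basis:  # remove components along the rows already built
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def project(block: List[float], phi: List[List[float]]) -> List[float]:
    """Coefficient vector C: inner product of the data block with each basis row."""
    return [sum(s * p for s, p in zip(block, row)) for row in phi]


def reconstruct(coeffs: List[float], phi: List[List[float]]) -> List[float]:
    """S_rec = C . Phi"""
    n = len(phi[0])
    return [sum(c * row[i] for c, row in zip(coeffs, phi)) for i in range(n)]


def residual_amplitudes(block: List[float], rec: List[float]) -> List[float]:
    """Pointwise |S_raw - S_rec|."""
    return [abs(a - b) for a, b in zip(block, rec)]


def exceedances(block: List[float], phi: List[List[float]], threshold: float) -> List[int]:
    """Indices of sampling points whose residual amplitude exceeds the threshold."""
    rec = reconstruct(project(block, phi), phi)
    return [i for i, r in enumerate(residual_amplitudes(block, rec)) if r > threshold]
```

A smooth trend that the basis can represent leaves residuals near zero, whereas a single injected pulse produces a residual that stands out at the pulse location. In the described system, that exceedance is the event that triggers the reset handling.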
[0037] When the residual amplitude of a sampling point in the residual sequence exceeds the preset accuracy threshold, the residual determination unit extracts the global timestamp attached to the first sampling point in the current time window. The residual determination unit calculates the time difference between the global timestamp and the system initialization start timestamp, divides it by the fixed sampling period parameter set by the sensor hardware, and outputs the relative time series index value. The residual determination unit multiplies the relative time series index value by the 4 bytes occupied by a single-precision floating-point value in the storage medium and outputs the address offset in the physical storage space. The residual determination unit extracts the reference physical starting address dynamically allocated for the data stream by the distributed columnar storage architecture, adds the reference physical starting address to the address offset, and generates an absolute physical memory pointer mapped to the starting position of the underlying physical storage block. The residual determination unit blocks the write path of the feature vector to the corresponding node of the multi-granularity logical index tree, erases the data in the node memory buffer, and writes the absolute physical memory pointer and the addressing reset flag bit into the header of the corresponding node.
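The pointer arithmetic in this paragraph reduces to three steps: index, offset, pointer. The sketch below assumes timestamps and the sampling period are integers in microseconds and that the time difference is an exact multiple of the period; neither unit nor function name comes from the text.

```python
BYTES_PER_SAMPLE = 4  # size of one single-precision float, as stated in the text


def relative_index(window_ts_us: int, init_ts_us: int, period_us: int) -> int:
    """Relative time series index: elapsed time divided by the sampling period."""
    return (window_ts_us - init_ts_us) // period_us


def absolute_pointer(window_ts_us: int, init_ts_us: int,
                     period_us: int, base_addr: int) -> int:
    """Base physical address plus the byte offset of the window's first sample."""
    offset = relative_index(window_ts_us, init_ts_us, period_us) * BYTES_PER_SAMPLE
    return base_addr + offset


# Hypothetical values: 100 us sampling period, window starting 51200 us after init.
ptr = absolute_pointer(window_ts_us=51_200, init_ts_us=0,
                       period_us=100, base_addr=0x1000_0000)
print(hex(ptr))  # 0x10000800 (index 512, offset 2048 bytes)
```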
[0038] The adaptive scheduling and routing unit responds to multi-granularity analysis query requests carrying target time window parameters, addresses along the multi-granularity logical index tree, and when it reads the addressing reset flag, it blocks the addressing path based on the multi-granularity logical index tree. Based on the absolute physical memory pointer, the adaptive scheduling and routing unit copies the original sampled data sequence from the persistent medium of the columnar storage architecture to the processor's storage buffer, and outputs the feature sequence within the corresponding time window. The physical address mapping operation replaces the inversion and reconstruction operation of the logical feature vector, maintaining the constant size of the multi-granularity logical index tree metadata.
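The routing decision described above can be sketched as a branch on the node header. This is a simplified model under assumptions: the `IndexNode` layout, the function names, and the use of an in-memory `bytes` object to stand in for the persistent columnar medium are all hypothetical, and samples are assumed to be stored as little-endian single-precision floats.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IndexNode:
    feature_vector: Optional[List[float]] = None  # node memory buffer
    reset_flag: bool = False                      # addressing reset flag bit
    physical_pointer: Optional[int] = None        # absolute physical memory pointer


def mark_reset(node: IndexNode, pointer: int) -> None:
    """Erase the node buffer and write the pointer and flag into the header."""
    node.feature_vector = None
    node.reset_flag = True
    node.physical_pointer = pointer


def read_window(node: IndexNode, storage: bytes, base_addr: int, n_samples: int,
                reconstruct: Callable[[List[float]], List[float]]) -> List[float]:
    """Return the window's sequence, bypassing reconstruction if the flag is set."""
    if node.reset_flag:
        start = node.physical_pointer - base_addr  # position inside the simulated medium
        raw = storage[start:start + 4 * n_samples]
        return list(struct.unpack(f"<{n_samples}f", raw))  # block-level copy
    return reconstruct(node.feature_vector)  # normal logical path
```

Because a flagged node stores only a pointer and a flag rather than a feature vector, its size does not depend on how irregular the underlying data are, which is the property the paragraph attributes to the reset mechanism.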
[0039] Example 4: In an engineering scenario where the data mining system is deployed for the first time on wafer manufacturing equipment with an unknown configuration, the hierarchical index construction unit invokes an offline calibration procedure for the orthogonal polynomial basis function set before starting real-time feature extraction. The calibration control module drives the idle physical equipment to operate and injects a swept-frequency excitation signal into the spindle. The data acquisition unit synchronously records the steady-state response time series. The basis function generation operator divides the time series into calibration data blocks aligned with a preset time window and computes the covariance matrix of each data block. The basis function generation operator calculates the eigenvalue distribution sequence of the covariance matrix and selects the set of eigenvectors whose cumulative contribution rate exceeds 0.95. The calibration control module orthogonalizes this set of eigenvectors, writes the result to the persistent medium, and outputs the orthogonal polynomial basis function set and the corresponding discrete sampling reconstruction matrix Φ.
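The selection rule based on a cumulative contribution rate greater than 0.95 can be illustrated on its own. The sketch below assumes the eigenvalues of the covariance matrix have already been computed elsewhere; it only picks which components to keep. The function name and sample eigenvalues are hypothetical.

```python
from typing import List


def select_components(eigenvalues: List[float], min_ratio: float = 0.95) -> List[int]:
    """Indices of the largest eigenvalues whose cumulative share first exceeds min_ratio."""
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    total = sum(eigenvalues)
    kept: List[int] = []
    acc = 0.0
    for i in order:
        kept.append(i)
        acc += eigenvalues[i]
        if acc / total > min_ratio:  # strictly greater, matching "greater than 0.95"
            break
    return kept


# Hypothetical spectrum: the three largest eigenvalues carry about 96% of the variance.
print(select_components([0.4, 6.0, 0.1, 3.2, 0.3]))  # [1, 3, 0]
```

The returned indices identify the components (eigenvectors of the covariance matrix) that would be retained and orthogonalized to form the basis function set.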
[0040] When switching to online data monitoring mode, the hierarchical index construction unit uses the orthogonal polynomial basis function set to calculate the inner product with the underlying data block and outputs the basis function coefficient vector C. The residual determination unit calculates the reconstructed data according to the formula S_rec=C⋅Φ, where S_rec is the reconstructed data, C is the basis function coefficient vector, and Φ is the discrete sampling reconstruction matrix. Because this basis function set internalizes the background resonance modes of the physical device, the device's normal vibration is reproduced in the reconstructed data, and the residual amplitude in the residual sequence stays within the sensor noise floor. When the device suddenly breaks and generates an abnormal pulse that falls outside this representation range, the residual amplitude of a specific sampling point exceeds the preset accuracy threshold, and the residual determination unit triggers the synthesis of the absolute physical memory pointer and the writing of the addressing reset flag bit. The reset mechanism of the logical index tree is thereby decoupled from the hardware differences of the underlying heterogeneous devices.
[0041] Example 5: When the data mining system is deployed in a continuous manufacturing physical environment with unknown periodic characteristics, the hierarchical index construction unit initiates a calibration procedure for the time window length parameter before performing feature extraction. The data acquisition unit extracts a broad-spectrum vibration time series covering the processing cycle of the production equipment. The computation center applies a fast Fourier transform algorithm to the broad-spectrum vibration time series to extract the lowest fundamental frequency component of the mechanical oscillation. The computation center calculates the total number of discrete sampling points contained within the time window according to the formula W_len=(f_s/f_min)⋅K_safe, where W_len is the time window length parameter, f_s is the hardware sampling frequency of the data acquisition unit, f_min is the lowest fundamental frequency component, and K_safe is the system-fixed oversampling tolerance coefficient. This calibration procedure anchors the boundary of the truncation window to the inherent vibration frequency of the physical equipment, so that the feature vector captures the periodicity of the underlying mechanical waveform while avoiding the computational overhead caused by redundant data truncation.
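As a worked instance of the window-length formula, the sketch below evaluates W_len for one set of values. The numbers are hypothetical and chosen only for illustration, and rounding up to a whole number of samples is an assumption, since the text does not say how a fractional result is handled.

```python
import math


def window_length(f_s_hz: float, f_min_hz: float, k_safe: float) -> int:
    """W_len = (f_s / f_min) * K_safe, rounded up to a whole number of samples."""
    if f_min_hz <= 0:
        raise ValueError("lowest fundamental frequency must be positive")
    return math.ceil(f_s_hz / f_min_hz * k_safe)


# Hypothetical values: 10.24 kHz sampling, 40 Hz lowest fundamental, K_safe = 2.
# One 40 Hz period spans 256 samples, so the window covers two full periods.
print(window_length(10_240, 40, 2.0))  # 512
```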
[0042] In a distributed columnar storage architecture that triggers a load balancing migration of backend physical data blocks, the adaptive scheduling and routing unit initiates memory address consistency verification logic before retrieving data blocks using absolute physical memory pointers. This unit sends a query command to the storage cluster's metadata registry to retrieve the latest migration epoch timestamp of the target physical data block. The unit compares this latest migration epoch timestamp with the generation timestamp of the absolute physical memory pointer recorded in the header of the corresponding non-leaf node. If the latest migration epoch timestamp is greater than the generation timestamp, the unit erases the invalid pointer address and requests an updated baseline physical starting address from the metadata registry. The unit also runs heartbeat detection logic every 1ms, reading the 32-bit memory-mapped register of the distributed storage cluster controller to obtain the latest address offset of the target data block. If the current physical address is inconsistent with the value recorded in the register, the routing unit forcibly suspends the current I/O execution flow and completes the rewriting of the absolute physical memory pointer within 2ms, ensuring that the redirection path always points to a valid starting address in the physical storage cluster. The adaptive scheduling and routing unit adds the updated baseline physical starting address to the address offset output by the preceding process, synthesizes the calibrated absolute physical memory pointer, and copies the original sampled data sequence from the persistent medium of the columnar storage architecture.
This verification flow establishes a synchronous update mechanism between the reset mapping of the multi-granularity logical index and the dynamic topology of the underlying data blocks, maintaining the physical addressing continuity of multi-granularity data mining tasks when storage nodes drift.
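The epoch comparison at the core of this verification can be sketched as follows. The record layout and function names are hypothetical, the metadata registry is reduced to two values passed in by the caller, and the 1ms heartbeat and 2ms rewrite deadline are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerRecord:
    base_addr: int     # baseline physical starting address
    offset: int        # address offset computed when the pointer was generated
    generated_at: int  # generation timestamp stored in the node header

    @property
    def absolute(self) -> int:
        return self.base_addr + self.offset


def validate_pointer(rec: PointerRecord, migration_epoch: int,
                     registry_base: int, now: int) -> PointerRecord:
    """Rebuild the pointer from the registry's base address if the block has migrated."""
    if migration_epoch > rec.generated_at:
        # Stale pointer: keep the offset, swap in the updated base address.
        return PointerRecord(registry_base, rec.offset, now)
    return rec


rec = PointerRecord(base_addr=0x1000_0000, offset=2048, generated_at=100)
fresh = validate_pointer(rec, migration_epoch=150, registry_base=0x2000_0000, now=151)
print(hex(fresh.absolute))  # 0x20000800
```

Only the base address changes after a migration in this sketch; the offset is reused because it depends on the sample's position in time, not on where the block is stored.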
[0043] The embodiments of this application have been described above with reference to the accompanying drawings. Unless otherwise specified, the embodiments and features in the embodiments of this application can be combined with each other. This application is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of this application without departing from the spirit of this application and the scope of protection of this invention, and all of these forms are within the protection scope of this application.
Claims
1. A data mining system for multi-granularity analysis, characterized in that the system includes: a data input interface unit, used to receive industrial time series data that characterizes the operating status of production equipment; a hierarchical index construction unit, used to extract feature vectors from the industrial time series data and construct a multi-granularity logical index tree, wherein the feature vectors are recorded in the non-leaf nodes of the multi-granularity logical index tree so that the logical index topology is independent of the physical distribution of the industrial time series data; a residual determination unit, used to calculate the residual sequence between the industrial time series data and reconstructed data obtained by inversion of the feature vectors, and, when the residual amplitude of a sampling point in the residual sequence exceeds the preset accuracy threshold, to write an addressing reset flag to the header of the corresponding node in the multi-granularity logical index tree; and an adaptive scheduling and routing unit, used to respond to multi-granularity analysis query requests and, when the addressing reset flag is read, to block the addressing path based on the multi-granularity logical index tree and reset the query path to the physical storage block of the industrial time series data, so as to perform block-level data copying and output the feature sequence within the corresponding time window.
2. The data mining system for multi-granularity analysis according to claim 1, wherein the hierarchical index construction unit is used to perform nonlinear dimensionality reduction on the industrial time series data; the storage volume of the multi-scale feature vectors recorded in the non-leaf nodes is lower than the volume of the original sampled data recorded at the corresponding physical storage block address, so as to suppress the nonlinear expansion of the metadata of the multi-granularity logical index tree when abnormal pulse fluctuations occur in the industrial time series data, such that the average query addressing time of the system fluctuates by less than 15% during the data pulse.
3. The data mining system for multi-granularity analysis according to claim 1, wherein the residual determination unit is also used to perform multi-resolution feature reconstruction calculation in memory using multi-scale feature vectors when the residual amplitude does not exceed the preset accuracy threshold, so as to output a macro trend analysis view.
4. The data mining system for multi-granularity analysis according to claim 1, wherein the data input interface unit also includes a pre-filter module; the pre-filter module is used to identify random high-frequency noise in the industrial time series data and perform noise reduction on the industrial time series data based on a smoothing coefficient determined by the historical residual variance, so as to reduce invalid disturbance of the residual sequence under steady-state conditions.
5. The data mining system for multi-granularity analysis according to claim 1, wherein the adaptive scheduling and routing unit is also used to obtain the I/O execution flow status of the corresponding physical storage block address when executing the query path reset; and, when it determines that the execution flow is congested, to start a cache prefetch action based on the addressing reset flag so as to load the original sampled data in the target time window into the storage buffer.
6. The data mining system for multi-granularity analysis according to claim 2, wherein the hierarchical index construction unit is also used to establish a spatiotemporal correlation mapping in the multi-granularity logical index tree; wherein the spatiotemporal correlation mapping is used to map multiple industrial time series data across physical channels to a unified logical addressing space to support cross-level industrial data traceability and analysis tasks.
7. The data mining system for multi-granularity analysis according to claim 1, wherein the preset accuracy threshold is dynamically set based on the sensor's rated accuracy level and the system background noise benchmark; the residual determination unit is also used to periodically correct the preset accuracy threshold to offset the measurement drift caused by electromagnetic environment fluctuations in the industrial field.
8. The data mining system for multi-granularity analysis according to claim 1, wherein the residual amplitude is calculated according to the following formula: L=|S_raw-S_rec|, where L is the residual amplitude, S_raw is the value of a specific sampling point in the industrial time series data, and S_rec is the value of the corresponding sampling point in the reconstructed data.
9. The data mining system for multi-granularity analysis according to claim 1, wherein, when the adaptive scheduling and routing unit performs a reset, it locates the sampling point based on the physical offset of the distributed columnar storage architecture.
10. The data mining system for multi-granularity analysis according to claim 1, wherein the system also includes a global anomaly analysis module; the global anomaly analysis module is used to receive the feature sequence output by the adaptive scheduling and routing unit and the macro trend analysis view output by the residual determination unit, perform cross-granularity correlation mining of the industrial production process, and output the anomaly root cause analysis results to assist in manufacturing process optimization decisions.