Multi-source heterogeneous big data processing and knowledge extraction system and method based on cross-modal fusion

By assigning identity identifiers to target objects, constructing a hierarchical knowledge dimension system, and performing cross-modal standardization processing, personalized baseline models are generated. This solves the fusion barriers and individual adaptability issues of multi-source heterogeneous big data, realizes accurate association and structured archiving of multi-dimensional data, and improves the accuracy of knowledge extraction and management efficiency.

CN122241607A · Pending · Publication Date: 2026-06-19 · XIANKE GROUP HOLDINGS LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIANKE GROUP HOLDINGS LTD
Filing Date
2026-04-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing multi-source heterogeneous big data processing and management technologies suffer from barriers to cross-modal data fusion, poor individual adaptability, and difficulty in balancing timeliness and continuity. This results in insufficient knowledge extraction accuracy and a high risk of misjudging or missing abnormal knowledge, making it difficult to meet the needs of comprehensive and refined management.

Method used

By assigning identity identifiers to target objects, constructing a hierarchical knowledge dimension system, performing cross-modal standardization processing, building a personalized baseline model, and combining time-series validity screening and differentiated time-series processing, a dynamic knowledge indicator set is generated, and weighted fusion and correction calibration are performed to achieve accurate association and structured archiving of multi-source heterogeneous big data.

Benefits of technology

It achieves unified numerical conversion of multi-dimensional data, reduces the risk of misjudgment and omission, improves the accuracy of knowledge extraction, and adapts to the needs of refined big data management and knowledge application in multiple fields.

✦ Generated by Eureka AI based on patent content.

Figure CN122241607A_ABST

Abstract

This invention discloses a system and method for cross-modal fusion of multi-source heterogeneous big data processing and knowledge extraction, belonging to the field of big data processing and database management technology. It aims to solve the problems of chaotic multi-source heterogeneous big data management, prominent cross-modal fusion barriers, low knowledge extraction accuracy, and poor individual adaptability in existing technologies. This invention assigns a globally unique identifier to the target to be processed, constructs a hierarchical knowledge dimension system and a structured database of multi-source heterogeneous big data; through multi-modal differential quantification, two-stage standardization processing, and time-series validity screening, a dynamic knowledge indicator set is generated; and then, through weighted fusion and anomaly calibration, the knowledge dimension is quantified and graded. This invention effectively improves the efficiency of multi-source big data fusion and the accuracy and individual adaptability of knowledge extraction, and can be widely adapted to the big data management and knowledge application needs of various fields such as health monitoring, industrial operation and maintenance, and agricultural monitoring.

Description

Technical Field

[0001] This invention relates to the field of big data processing and database management technology, and more specifically to a system and method for cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction.

Background Technology

[0002] With the rapid development of fields such as industrial internet, smart operation and maintenance, and health management, the demand for comprehensive knowledge management and refined decision support for target objects in various scenarios continues to increase. The sources of monitoring data are constantly expanding, forming a multi-source heterogeneous big data system in various forms. This type of data contains multi-dimensional key information on changes in the state of target objects and is the core foundation for achieving accurate knowledge assessment, anomaly early warning, and hierarchical control.

[0003] However, existing multi-source heterogeneous big data processing and management technologies suffer from three major pain points. First, cross-modal data fusion faces significant barriers: existing approaches adapt poorly to the structural differences and representational characteristics of different modalities, which hinders the unified quantification and structured integration of multi-source, multi-type data, prevents large volumes of multi-dimensional data from forming a collaborative knowledge representation, and results in low analysis efficiency and low data utilization. Second, individually adaptive standardization mechanisms are lacking: data processing relies heavily on industry-standard baselines, which adapt poorly to the individual characteristics of different target objects, leading to insufficient knowledge extraction accuracy and a high risk of misjudging or missing abnormal knowledge. Third, it is difficult to balance timeliness and continuity in time-series big data processing, so stable and reliable data support for accurate knowledge level classification cannot be provided, and the needs of multi-dimensional, refined big data management and knowledge application in various scenarios are not met. To overcome these limitations, this invention proposes a cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction system and method.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, the present invention aims to provide a cross-modal fusion system and method for processing and extracting knowledge from multi-source heterogeneous big data, thereby solving the technical problem that existing technologies struggle to achieve comprehensive, accurate, and individualized knowledge extraction and knowledge level classification from multi-source, multi-type, and heterogeneous big data.

[0005] To achieve the above objectives, the present invention provides the following technical solution:

[0006] A method for cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction includes:

[0007] Assign identity identifiers to target objects to be processed, construct a hierarchical knowledge dimension system for the target objects, and predefine a set of data indicators associated with the knowledge dimensions; receive multi-source heterogeneous big data bound to the identity identifiers of the target objects in real time, and construct a multi-source heterogeneous structured database of the hierarchical knowledge dimension system of the target objects;

[0008] Based on the modality type corresponding to each data indicator in the multi-source heterogeneous structured database, cross-modal standardization processing is performed on the multi-source heterogeneous big data to convert it into primary standardized indicator values. The general baseline range of each data indicator is matched to construct the personalized baseline model of the target object. The primary standardized indicator values are then subjected to interval mapping secondary standardization processing to obtain the standardized indicator values of each data indicator.

[0009] In response to the knowledge extraction trigger, the standardized indicator values of the associated data indicators are filtered for time-series validity, a sequence of valid standardized indicator values is constructed, and differentiated time-series processing is performed according to the indicator change type identifier to generate knowledge representation indicator values corresponding to each data indicator, and a dynamic knowledge indicator set for the current knowledge extraction is constructed.

[0010] Based on the personalized baseline model, anomaly detection is performed on the knowledge representation index values in the dynamic knowledge index set. All knowledge representation index values under the knowledge dimension are weighted, fused, and corrected to generate the knowledge dimension fusion quantification value corresponding to the knowledge dimension and classify the knowledge level.

[0011] Specifically, the steps for performing cross-modal standardization processing on multi-source heterogeneous big data and converting it into primary standardized index values include:

[0012] A hierarchical retrieval method is used to retrieve, from the multi-source heterogeneous structured database, all multi-source heterogeneous big data belonging to the different knowledge dimensions and data indicators, and to classify the data by modality type; the modality types include numerical, signal, image, and text types.

[0013] For multi-source heterogeneous big data of different modalities, a modality standardization processing framework is built, and a standardized quantization method is configured for each modality type to perform cross-modal differentiated quantization on multi-source heterogeneous big data of different modalities.

[0014] The standardized quantization method includes numerical truncation, signal feature extraction, image feature level quantization, and text semantic scoring.

[0015] Define the numerical representation attribute of each data indicator, that is, the direction of the correlation (positive or negative) between the quantified value of the data indicator and the quality of the corresponding knowledge dimension.

[0016] Based on the numerical representation attribute of each data indicator, all multi-source heterogeneous big data of the various modality types that have completed cross-modal differentiated quantification are processed into the same direction, so that a larger value consistently represents a better state, to obtain the primary standardized indicator values, which are then bound and archived with the identity identifier and data indicator identifier of the corresponding target object.
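The quantification and same-direction steps above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function names, the clipping bounds, the `full_scale` normalizer, and the choice of RMS as the signal feature are all assumptions made for the example, and the image and text functions assume that an upstream model has already produced an ordinal level or a semantic score.

```python
import math
from typing import Sequence


def quantize_numerical(value: float, lo: float, hi: float) -> float:
    """Numerical truncation: clip to the assumed bounds [lo, hi], rescale to [0, 1]."""
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo)


def quantize_signal(samples: Sequence[float], full_scale: float) -> float:
    """Signal feature extraction: RMS amplitude relative to an assumed full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(rms / full_scale, 1.0)


def quantize_image_level(level: int, n_levels: int) -> float:
    """Image feature level quantization: ordinal level 0..n_levels-1 mapped to [0, 1]."""
    return level / (n_levels - 1)


def quantize_text(score: float) -> float:
    """Text semantic scoring: clamp a score from a hypothetical upstream scorer to [0, 1]."""
    return min(max(score, 0.0), 1.0)


def same_direction(quantized: float, higher_is_better: bool) -> float:
    """Same-direction processing: flip negatively correlated indicators."""
    return quantized if higher_is_better else 1.0 - quantized


# Vibration is negatively correlated with state quality, so it is flipped.
vibration = same_direction(
    quantize_signal([3.0, -3.0, 3.0, -3.0], full_scale=10.0),
    higher_is_better=False,
)
```

Keeping the direction flip separate from the per-modality quantizers means the numerical representation attribute can be configured per data indicator without touching the quantization code.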

[0017] Specifically, the steps for constructing a personalized baseline model of the target object include:

[0018] A pre-constructed population baseline database that matches the domain of the target object is retrieved. The population baseline database is divided into population clusters according to different attribute label combinations. The population clusters are configured with general baseline ranges for each data indicator and population steady-state standardized index values.

[0019] Based on the attribute tags bound to the identity of the target object, an attribute tag vectorization mapping method is adopted to transform the various attribute tags of the target object into attribute feature vectors of a unified dimension.

[0020] The similarity between the attribute feature vector of the target object and the attribute feature vector of each population cluster's attribute label combination is calculated. The population cluster with the highest similarity is selected as the target matching group, and the general baseline range corresponding to the target matching group is extracted to form the initial baseline reference range.

[0021] Retrieve the set of primary standardized index values for the target object and select, as reference standardized index values, those that fall within the initial baseline reference range;

[0022] Based on the comprehensive deviation between the reference standardized index values and the steady-state standardized index value of the target matching group, the initial baseline reference range is adjusted to form a personalized standard interval for the target object.

[0023] The personalized standard intervals corresponding to each data indicator are integrated and encapsulated to form a personalized baseline model for the target object.
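The baseline construction steps above can be sketched as follows. The text does not specify the vectorization scheme, the similarity measure, or the adjustment rule, so this example assumes one-hot encoding of attribute tags, cosine similarity, and a shift of the matched range by the mean deviation from the group's steady-state value. All names (`vectorize`, `match_cluster`, `personalize`, the tag vocabulary, the cluster values) are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def vectorize(tags: Dict[str, str], vocabulary: List[Tuple[str, str]]) -> List[float]:
    """One-hot encode attribute tags against a fixed (tag, value) vocabulary."""
    return [1.0 if tags.get(name) == value else 0.0 for name, value in vocabulary]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_cluster(target: List[float], clusters: List[dict]) -> dict:
    """Pick the population cluster whose tag vector is most similar to the target's."""
    return max(clusters, key=lambda c: cosine(target, c["vector"]))


def personalize(baseline: Tuple[float, float], steady_state: float,
                primary_values: List[float]) -> Tuple[float, float]:
    """Shift the matched range by the mean deviation of in-range values (assumed rule)."""
    lo, hi = baseline
    reference = [v for v in primary_values if lo <= v <= hi]
    if not reference:
        return baseline  # no usable history: keep the group range
    deviation = sum(reference) / len(reference) - steady_state
    return (lo + deviation, hi + deviation)


VOCAB = [("gender", "F"), ("gender", "M"), ("age_band", "40-59"), ("age_band", "60+")]
clusters = [
    {"name": "F/60+", "vector": vectorize({"gender": "F", "age_band": "60+"}, VOCAB),
     "baseline": (0.40, 0.80), "steady_state": 0.60},
    {"name": "M/60+", "vector": vectorize({"gender": "M", "age_band": "60+"}, VOCAB),
     "baseline": (0.30, 0.70), "steady_state": 0.50},
]
target = vectorize({"gender": "F", "age_band": "60+"}, VOCAB)
group = match_cluster(target, clusters)
# 0.95 lies outside the matched range and is excluded from the reference values.
personal = personalize(group["baseline"], group["steady_state"], [0.50, 0.55, 0.95, 0.60])
```

In this example the target's in-range history averages 0.55 against a group steady state of 0.60, so the range shifts down by 0.05 to roughly (0.35, 0.75).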

[0024] Specifically, the steps of performing interval-mapping secondary standardization on the primary standardized index values to obtain the standardized index values of each data index include:

[0025] Retrieve the personalized baseline model corresponding to the identity of the target object, and extract the personalized standard intervals corresponding to each data indicator as the normalization benchmark;

[0026] For each data metric, determine the lower and upper limits of its personalized standard range, and set a uniform value range applicable to all data metrics.

[0027] The primary standardized indicator value corresponding to each data indicator is transformed into a preset unified numerical range by using its personalized standard range as a reference through linear mapping; after standardization and normalization, a standardized indicator value is formed.

[0028] The standardized indicator values ​​are archived by associating them with the target object's identity identifier, knowledge dimension identifier, data indicator identifier, and collection timestamp.
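The linear mapping and archiving steps above can be sketched as follows, taking [0, 1] as the unified numerical interval. The names `interval_map` and `StandardizedValue` and the sample identifiers are hypothetical. The sketch deliberately does not clip the result: a value outside the personalized standard interval maps outside the unified interval, which is one way a later anomaly judgment could detect it. That choice is an assumption, not something the text states.

```python
from dataclasses import dataclass


def interval_map(x: float, lo: float, hi: float, a: float = 0.0, b: float = 1.0) -> float:
    """Linearly map x so that the personalized interval [lo, hi] lands on [a, b]."""
    if hi <= lo:
        raise ValueError("personalized standard interval must satisfy lo < hi")
    return a + (x - lo) * (b - a) / (hi - lo)


@dataclass(frozen=True)
class StandardizedValue:
    """One archived value, keyed the way paragraph [0028] describes."""
    object_id: str
    dimension_id: str
    indicator_id: str
    timestamp: float  # collection time, seconds since the epoch
    value: float


record = StandardizedValue(
    object_id="obj-001",
    dimension_id="tool_wear",
    indicator_id="spindle_vibration",
    timestamp=1_700_000_000.0,
    value=interval_map(0.55, lo=0.35, hi=0.75),
)
```

Here 0.55 sits at the midpoint of the personalized interval (0.35, 0.75) and therefore maps to about 0.5, while a primary value of 0.85 would map to about 1.25.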

[0029] Specifically, the step of constructing a dynamic knowledge indicator set for the current knowledge extraction includes:

[0030] Based on the set of data indicators associated with the knowledge dimension that triggered knowledge extraction, locate the corresponding data indicators and their standardized indicator values.

[0031] For each data indicator, valid standardized indicator values are selected based on the pre-configured time-series validity lifecycle, combined with the collection timestamp and the current knowledge evaluation timestamp;

[0032] The time-series validity lifecycle is the preset valid collection time range for each data indicator.

[0033] The valid standardized indicator values are sorted by collection timestamp to construct a time series of valid standardized indicator values for each data indicator.

[0034] Based on the pre-configured indicator change type identifier of each data indicator, differentiated time-series processing is performed to generate the knowledge representation indicator value corresponding to each data indicator; these values are then archived to form the dynamic knowledge indicator set.

[0035] Specifically, the indicator change types include instantaneous representation indicators and trend representation indicators;

[0036] The step of performing differentiated time-series processing based on the pre-configured indicator change type identifiers for each data indicator includes:

[0037] For an instantaneous representation indicator, the valid standardized indicator value whose collection timestamp is closest to the current knowledge evaluation timestamp is selected from the time series of valid standardized indicator values as the knowledge representation indicator value;

[0038] For a trend representation indicator, dynamic temporal weights are assigned by a temporal decay function based on the time interval between the collection timestamp of each valid standardized indicator value and the current knowledge evaluation timestamp. The time series of valid standardized indicator values is then weighted and fused using the dynamic temporal weights to obtain the knowledge representation indicator value.
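The validity screening and the two kinds of time-series processing can be sketched as follows. The text only requires "a temporal decay function"; this example assumes exponential decay with a configurable half-life and normalizes the weights so they sum to 1. The function names, the `half_life` parameter, and the sample data are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # (collection timestamp in seconds, standardized value)


def valid_series(samples: List[Sample], now: float, lifecycle: float) -> List[Sample]:
    """Keep samples collected within `lifecycle` seconds before `now`, oldest first."""
    return sorted((t, v) for t, v in samples if 0.0 <= now - t <= lifecycle)


def instantaneous(series: List[Sample]) -> float:
    """Instantaneous indicator: the value collected closest to the evaluation time."""
    if not series:
        raise ValueError("no valid samples")
    return series[-1][1]


def trend(series: List[Sample], now: float, half_life: float) -> float:
    """Trend indicator: decay-weighted average, newer samples weigh more."""
    if not series:
        raise ValueError("no valid samples")
    weights = [math.exp(-math.log(2) * (now - t) / half_life) for t, _ in series]
    return sum(w * v for w, (_, v) in zip(weights, series)) / sum(weights)


samples = [(0.0, 0.2), (50.0, 0.4), (90.0, 0.8), (100.0, 0.6)]
series = valid_series(samples, now=100.0, lifecycle=60.0)  # drops the sample at t=0
latest = instantaneous(series)
smoothed = trend(series, now=100.0, half_life=10.0)
```

With a half-life of 10 seconds the three surviving samples receive weights of 1/32, 1/2, and 1, so the trend value lands between the two most recent readings.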

[0039] Specifically, the steps of generating knowledge dimension fusion quantification values corresponding to knowledge dimensions and classifying knowledge levels include:

[0040] Extract all knowledge representation index values and corresponding preset association weights from the dynamic knowledge index set. The preset association weights are quantitative coefficients that represent the importance of each data index to the state representation of the corresponding knowledge dimension.

[0041] Retrieve the personalized baseline model corresponding to the target object, extract the personalized standard intervals corresponding to each data indicator, and use the linear mapping formula consistent with the secondary standardization process to synchronously map the personalized standard intervals to the preset unified numerical intervals to generate personalized judgment intervals corresponding to each data indicator.

[0042] Based on the personalized judgment range of each data indicator, anomaly judgment is made on the knowledge representation indicator values, and abnormal indicators are identified and their degree of abnormality is calculated.

[0043] Using the preset association weights of each data indicator within the dynamic knowledge indicator set as the fusion coefficients, the values of all knowledge representation indicators are weighted and summed to obtain the initial knowledge dimension fusion quantification value.

[0044] Based on the number of abnormal indicators, the degree of abnormality, and their corresponding preset association weights, a fusion correction coefficient is constructed to correct and calibrate the initial knowledge dimension fusion quantification value, thereby obtaining the knowledge dimension fusion quantification value; and based on the knowledge dimension grading standard, the knowledge dimension fusion quantification value is mapped to the corresponding knowledge level.
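The anomaly judgment, weighted fusion, and grading steps can be sketched as follows. Because the personalized standard interval is mapped with the same linear formula as the values, it lands exactly on the unified interval, so the judgment interval here is simply [0, 1]. The degree-of-abnormality measure (distance outside the interval relative to its width), the division by the weight total, the grading thresholds, and the fixed correction coefficient of 0.9 are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def anomaly_degree(value: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Distance outside the judgment interval relative to its width; 0.0 when inside."""
    if value < lo:
        return (lo - value) / (hi - lo)
    if value > hi:
        return (value - hi) / (hi - lo)
    return 0.0


def fuse(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of indicator values, divided by the weight total (assumed)."""
    total = sum(weights[name] for name in values)
    return sum(weights[name] * values[name] for name in values) / total


def grade(score: float, thresholds: List[Tuple[float, str]]) -> str:
    """Map a score to a level; thresholds are (lower bound, label), highest first."""
    for bound, label in thresholds:
        if score >= bound:
            return label
    return thresholds[-1][1]


values = {"vibration": 0.8, "current": 0.6, "defect_image": -0.2}
weights = {"vibration": 0.5, "current": 0.3, "defect_image": 0.2}
degrees = {name: anomaly_degree(v) for name, v in values.items()}
initial = fuse(values, weights)
correction = 0.9  # hypothetical fusion correction coefficient for this example
final = initial * correction
level = grade(final, [(0.8, "good"), (0.6, "fair"), (0.0, "poor")])
```

Only `defect_image` falls outside the judgment interval in this example, with a degree of about 0.2. The initial fused value is about 0.54, and the corrected value of about 0.49 maps to the lowest level.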

[0045] Specifically, the fusion correction coefficient is constructed as follows:

[0046] The number of abnormal indicators is counted, the sum of the preset association weights of all abnormal indicators is calculated to give the proportion of abnormal weights, and the average degree of abnormality of all abnormal indicators is calculated.

[0047] Using the proportion of abnormal weights and the average degree of abnormality as core constraint factors, the fusion correction coefficient is generated by a construction logic that decreases monotonically as the core constraint factors increase. When there are no abnormal indicators, the fusion correction coefficient is set to 1, and no decay correction is applied to the initial knowledge dimension fusion quantification value.
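One possible form of such a coefficient is sketched below. The text fixes only two properties: the coefficient decreases monotonically as either factor grows, and it equals 1 when there are no abnormal indicators. The exponential form, the sensitivity parameters `alpha` and `beta`, and the use of the total weight as the denominator of the abnormal weight proportion are assumptions chosen to satisfy those properties.

```python
import math
from typing import Dict


def correction_coefficient(degrees: Dict[str, float], weights: Dict[str, float],
                           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Return a coefficient in (0, 1] that shrinks as anomalies get heavier or stronger."""
    abnormal = [name for name, degree in degrees.items() if degree > 0.0]
    if not abnormal:
        return 1.0  # no abnormal indicators: no decay correction
    # Share of the total association weight carried by abnormal indicators.
    weight_share = sum(weights[name] for name in abnormal) / sum(weights.values())
    # Average degree of abnormality over the abnormal indicators only.
    mean_degree = sum(degrees[name] for name in abnormal) / len(abnormal)
    return math.exp(-(alpha * weight_share + beta * mean_degree))


weights = {"vibration": 0.5, "current": 0.3, "defect_image": 0.2}
clean = correction_coefficient({"vibration": 0.0, "current": 0.0, "defect_image": 0.0}, weights)
mild = correction_coefficient({"vibration": 0.0, "current": 0.0, "defect_image": 0.2}, weights)
severe = correction_coefficient({"vibration": 0.4, "current": 0.0, "defect_image": 0.2}, weights)
```

Any other monotonically decreasing function, such as 1 / (1 + x), would meet the stated constraints equally well. The exponential is used here only because it stays strictly positive and equals 1 at zero.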

[0048] Specifically, the hierarchical knowledge dimension system includes at least one knowledge dimension, and each knowledge dimension has a predefined set of associated data indicators that includes at least one data indicator.

[0049] The multi-source heterogeneous structured database uses the identity identifier of the target object as the index, knowledge dimension as the main classification dimension, and data indicators as the sub-classification dimension. It performs structured archiving and storage according to the data indicator type to which the multi-source heterogeneous big data belongs.
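The three-level archive described above can be sketched with nested dictionaries. This is a minimal in-memory illustration: the `Archive` and `Record` names are hypothetical, `uuid4` stands in for whatever scheme generates the identity identifier, and the sketch assumes each data indicator belongs to exactly one knowledge dimension.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Record:
    indicator: str
    modality: str     # "numerical", "signal", "image", or "text"
    timestamp: float  # collection timestamp
    source: str       # collection source identifier
    payload: Any


class Archive:
    """Index: identity identifier -> knowledge dimension -> data indicator -> records."""

    def __init__(self, dimension_system: Dict[str, List[str]]):
        # Reverse lookup from indicator to its knowledge dimension.
        self._dimension_of = {
            indicator: dimension
            for dimension, indicators in dimension_system.items()
            for indicator in indicators
        }
        self._store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def register(self) -> str:
        """Assign a new identity identifier (uuid4 is an assumption)."""
        return str(uuid.uuid4())

    def ingest(self, object_id: str, record: Record) -> None:
        dimension = self._dimension_of[record.indicator]
        self._store[object_id][dimension][record.indicator].append(record)

    def query(self, object_id: str, dimension: str, indicator: str) -> List[Record]:
        return list(self._store[object_id][dimension][indicator])


archive = Archive({"tool_wear": ["spindle_vibration", "cutting_force"]})
machine_id = archive.register()
archive.ingest(machine_id, Record("spindle_vibration", "signal", 1_700_000_000.0,
                                  "sensor-7", [0.1, -0.2, 0.15]))
hits = archive.query(machine_id, "tool_wear", "spindle_vibration")
```

Routing each record through the indicator-to-dimension lookup means callers only need to supply the data indicator identifier, and the archive places the record under the correct knowledge dimension.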

[0050] A cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction system, including a structured management module, a big data processing module, a dynamic knowledge extraction module, and a knowledge fusion evaluation module;

[0051] The structured management module is used to assign identity identifiers to target objects to be processed, construct a hierarchical knowledge dimension system for the target objects, and predefine a set of data indicators associated with the knowledge dimensions; it also receives multi-source heterogeneous big data bound to the identity identifiers of the target objects in real time, and constructs a multi-source heterogeneous structured database of the hierarchical knowledge dimension system of the target objects.

[0052] The big data processing module is used to perform cross-modal standardization processing on multi-source heterogeneous big data according to the modal type corresponding to each data indicator in the multi-source heterogeneous structured database, convert it into primary standardized indicator values, match the general baseline range of each data indicator, construct the personalized baseline model of the target object, and perform interval mapping secondary standardization processing on the primary standardized indicator values to obtain the standardized indicator values of each data indicator.

[0053] The dynamic knowledge extraction module is used to respond to the knowledge extraction trigger by performing time-series validity screening on the standardized indicator values of related data indicators, constructing a sequence of valid standardized indicator values, and performing differentiated time-series processing according to the indicator change type identifier to generate knowledge representation indicator values corresponding to each data indicator, and constructing a dynamic knowledge indicator set for the current knowledge extraction.

[0054] The knowledge fusion evaluation module is used to determine anomalies in the knowledge representation index values in the dynamic knowledge index set based on the personalized baseline model, perform weighted fusion and correction calibration on all knowledge representation index values under the knowledge dimension, generate the knowledge dimension fusion quantitative value corresponding to the knowledge dimension, and classify the knowledge level.

[0055] The beneficial effects of this invention are:

[0056] This application achieves precise association and structured archiving of multi-source heterogeneous big data by assigning unique identifiers to target objects, constructing a hierarchical knowledge dimension system, and predefining sets of associated indicators, thereby building a fully traceable structured database of multi-source heterogeneous big data. By configuring adapted standardized quantification methods for the different modalities and combining differentiated quantification with homogenization processing, it completes the unified numerical conversion of multi-source data, breaking down barriers to cross-modal fusion and improving the utilization of multi-dimensional data. Furthermore, by matching attribute labels to group baselines and correcting them with the target object's own steady-state data, it constructs a personalized baseline model adapted to individual characteristics and uses this model as the benchmark for secondary standardization, which converts all indicators into the same, comparable value range. This resolves the mismatch between general standards and individual characteristics and reduces the risk of misjudgment and omission. By applying time-series validity screening when knowledge extraction is triggered, and differentiated time-series processing for indicators with different temporal characteristics, it constructs a dynamic knowledge indicator set that balances the timeliness and continuity of knowledge extraction and eliminates interference from invalid data. Finally, it performs weighted fusion of multiple indicators using preset association weights, constructs a correction coefficient from the characteristics of the abnormal indicators to calibrate the knowledge evaluation value, generates the knowledge dimension fusion quantification result, and completes the knowledge level classification. This overcomes the one-sidedness of representing knowledge with a single indicator, improves the accuracy of knowledge extraction, and can be widely adapted to the needs of refined big data management and knowledge application in multiple fields.

Attached Figure Description

[0057] Figure 1 is a flowchart of the cross-modal fusion method for multi-source heterogeneous big data processing and knowledge extraction according to the present invention;

[0058] Figure 2 is a flowchart of how multi-source heterogeneous big data is transformed into primary standardized index values through cross-modal differentiated quantification and homogenization processing according to the present invention;

[0059] Figure 3 is a flowchart for constructing a personalized baseline model of the target object according to the present invention;

[0060] Figure 4 is a flowchart for generating the knowledge dimension fusion quantification values corresponding to knowledge dimensions and classifying knowledge levels according to the present invention.

Detailed Implementation

[0061] Example 1

[0062] Referring to Figure 1, this embodiment introduces a method for cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction, including:

[0063] Step S1: Assign a globally unique identity to the target object to be processed, and construct a hierarchical knowledge dimension system that matches the domain to which the target object belongs. The hierarchical knowledge dimension system contains at least one knowledge dimension, and a set of associated data indicators is predefined for each knowledge dimension. The set of associated data indicators contains at least one data indicator that can be acquired through collection and can represent changes in the corresponding knowledge representation. Multi-source heterogeneous big data bound to the target object's identity is received in real time. Multi-source heterogeneous big data includes, but is not limited to, numerical data, signal data, image data, and text data. The collection timestamp, collection source identifier, and data indicator identifier corresponding to each piece of multi-source heterogeneous big data are retained synchronously. Based on the data indicators to which the multi-source heterogeneous big data belongs, it is associated with the corresponding knowledge dimension. Using the target object's identity as the index, the knowledge dimension as the main classification dimension, and the data indicator as the sub-classification dimension, all the associated multi-source heterogeneous big data are structured and archived for storage, constructing a hierarchical knowledge dimension system multi-source heterogeneous structured database, providing complete and traceable data support for subsequent big data analysis, feature extraction, and knowledge quantification evaluation.

[0064] In this embodiment, the target object refers to the subject that needs to be subject to full-dimensional knowledge management and quantitative evaluation, including but not limited to natural persons subject to health status monitoring, industrial machinery and equipment subject to operational status monitoring, electrical equipment subject to maintenance status monitoring, and agricultural planting and breeding subject to growth status monitoring. The hierarchical knowledge dimension system refers to a set of dimensions that are pre-constructed to comprehensively represent the operation, health, and maintenance status of the target object and have hierarchical semantic relationships, based on the knowledge application needs of the target object's domain. Each knowledge dimension corresponds to a core knowledge representation target and evaluation dimension of the target object. Data indicators refer to standardized knowledge representation items that can directly or indirectly represent changes in the corresponding knowledge representation and can be obtained through corresponding collection methods.

[0065] For example, when the target object is a user to be processed in the field of TCM health monitoring, the hierarchical knowledge dimension system includes, but is not limited to, the knowledge dimension of liver function, the knowledge dimension of spleen and stomach function, the knowledge dimension of heart and kidney function, and the knowledge dimension of qi, blood and body fluids. For example, when the knowledge dimension is the knowledge dimension of liver function, its corresponding set of associated data indicators includes, but is not limited to, tongue image data indicators, facial complexion image data indicators, TCM consultation text data indicators, liver function biochemical test numerical data indicators, pulse signal data indicators, and sleep quality monitoring numerical data indicators.

[0066] For example, when the target object is a CNC machine tool in the field of mechanical processing, the hierarchical knowledge dimension system includes, but is not limited to, the spindle operation knowledge dimension, the feed system transmission knowledge dimension, the tool wear knowledge dimension, and the overall machine failure risk monitoring dimension; for example, when the knowledge dimension is the tool wear knowledge dimension, its corresponding set of associated data indicators includes, but is not limited to, spindle vibration signal data indicators, cutting force signal data indicators, workpiece surface defect image data indicators, spindle motor current numerical data indicators, tool feed deviation numerical data indicators, and cumulative machining time numerical data indicators.

[0067] Step S2: The multi-source heterogeneous structured database constructed in Step S1 contains multi-source heterogeneous big data belonging to different knowledge dimensions, data indicators, and modality types. A standardized quantification method adapted to the modality type of each data indicator is applied and combined with homogenization processing, so that the multi-source heterogeneous big data is transformed into primary standardized indicator values. The group baseline database is then retrieved and, based on the attribute tags bound to the target object's identity identifier, the target matching group is selected through vector similarity calculation. The general baseline intervals of all data indicators corresponding to the target matching group are extracted and integrated to form an initial baseline reference range adapted to the target object's group. The comprehensive deviation between the target object's reference standardized indicator values and the steady-state standardized indicator values of the target matching group is quantified, and the initial baseline reference range is adjusted accordingly to form a personalized baseline model specific to the current target object. The personalized baseline model contains the personalized standard interval corresponding to each data indicator, which defines the normal reference range of that data indicator for the target object itself. Using the corresponding personalized standard interval as the normalization benchmark, the primary standardized indicator values are subjected to interval mapping secondary standardization, which maps them to a preset unified numerical interval and yields standardized indicator values free of differences in dimension, magnitude, and indicator attribute. This allows the quantitative results of different data indicators under the same knowledge dimension to be directly weighted, fused, and compared, providing a unified, comparable, and standardized input, adapted to the individual characteristics of the target object, for subsequent cross-modal data fusion and knowledge dimension quantification.

[0068] In this embodiment, attribute tags refer to identification information used to distinguish the individual characteristics, category attributes, and operating environment of a target object. These tags are used to achieve accurate matching between the target object and similar groups in the population baseline database, thereby providing a matching basis for the construction of a personalized baseline model. For example, when the target object is a user to be processed in the field of traditional Chinese medicine health monitoring, the attribute tags include, but are not limited to, the target object's age, gender, constitution type, place of residence, past medical history, and chronic disease information. When the target object is a CNC machine tool in the field of machining, the attribute tags include, but are not limited to, machine tool model information, specification parameters, operating conditions, cumulative runtime, historical maintenance records, and processing object type information.

[0069] Modal types include numerical, signal, image, and text types. Standardized quantization methods are the processing methods applied to multi-source heterogeneous big data of different modal types to transform unstructured, non-numerical, or heterogeneous data into initial quantized values in a unified expression form. These methods include numerical truncation, signal feature extraction, image feature level quantization, and text semantic scoring. For example, when the target object is a user of traditional Chinese medicine health monitoring and the data indicator is tongue image data, the tongue coating color feature is extracted through a tongue image segmentation network, and the tongue coating color is classified into levels such as pale white, pale red, red, crimson, and purple according to traditional Chinese medicine diagnostic standards. The color levels are then converted into primary standardized indicator values within the range of 0 to 1 using a continuous mapping function, defined so that a larger indicator value indicates better knowledge representation, which completes the homogenization process. Based on the target object's age, gender, physical condition, and other attribute tags, the general baseline interval of the group with the highest attribute similarity is matched from the group baseline database and corrected using the target object's historical healthy tongue image data, yielding a personalized baseline model of the target object's tongue image indicator. Then, using the personalized standard interval as the benchmark, the primary standardized indicator values are linearly mapped to the unified interval [0,1] to obtain standardized indicator values that can be used for liver function knowledge assessment.

[0070] For example, when the target object is a CNC machine tool and the data index is the spindle vibration signal data index, the effective value of the vibration signal is extracted as the primary standardized index value. Since the smaller the vibration value, the better the knowledge representation, it is transformed inversely to achieve index homogenization. Based on the target object's model, working condition, cumulative running time and other attribute tags, the corresponding group baseline is matched and corrected to obtain a personalized vibration standard interval. Then, the homogenized primary standardized index value is mapped to a unified interval to obtain a standardized index value that can be used for spindle operation knowledge assessment.

[0071] Referring to Figure 2, in one embodiment, the specific steps for transforming multi-source heterogeneous big data into primary standardized indicator values through cross-modal differentiated quantization and homogenization processing include:

[0072] This method performs hierarchical retrieval on multi-source heterogeneous structured databases, extracts all multi-source heterogeneous big data belonging to different knowledge dimensions and data indicators, and classifies modal types to provide a classification basis for subsequent differentiated, standardized, and quantitative processing, thus solving the technical defects of chaotic modal classification and single processing method in existing technologies.

[0073] For multi-source heterogeneous big data of different modal types, a modality standardization processing framework is built, and a standardized quantization method that is highly adapted to the data characteristics and representation logic of each modality type is configured for cross-modal differentiated quantization of multi-source heterogeneous big data of different modal types.

[0074] For example, for multi-source heterogeneous big data corresponding to numerical data indicators, numerical truncation preprocessing and normalization quantization are performed. Specifically, a reasonable truncation boundary is set based on the physical meaning boundary, industry standard threshold, or historical steady-state distribution range of the data indicator, for example [0, 120 ℃] for an equipment temperature indicator and [40, 180 beats/min] for a heart rate indicator. Extreme abnormal data exceeding the truncation boundary is truncated, and valid data within the reasonable range of the data indicator is retained. Outlier identification and removal are then performed on the retained valid data, and random errors generated during collection are suppressed through data smoothing and normalization. Finally, the processed numerical data is transformed into initial quantized values with a unified expression form, ensuring that the quantification results of numerical data accurately reflect the knowledge representation characteristics of the corresponding data indicator.
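
The truncation and normalization described above can be sketched as follows. This is only a minimal illustration, assuming that samples outside the truncation boundary are discarded and that normalization is a min-max scaling against that boundary. The function name `truncate_and_normalize` and the sample heart rate readings are hypothetical, and the outlier removal and smoothing stages are omitted.

```python
from typing import List, Tuple


def truncate_and_normalize(values: List[float],
                           bounds: Tuple[float, float]) -> List[float]:
    """Drop samples outside the truncation boundary, then scale to [0, 1].

    Assumption: "truncation" means discarding out-of-range samples, and
    normalization is min-max scaling against the boundary itself.
    """
    low, high = bounds
    if high <= low:
        raise ValueError("upper bound must exceed lower bound")
    # Retain only the valid data inside the truncation boundary.
    retained = [v for v in values if low <= v <= high]
    # Min-max normalization relative to the boundary.
    return [(v - low) / (high - low) for v in retained]


# Hypothetical heart rate readings with two implausible samples (250, -5).
heart_rate = [72.0, 250.0, 40.0, 180.0, -5.0, 110.0]
print(truncate_and_normalize(heart_rate, (40.0, 180.0)))
```

Scaling against the fixed boundary rather than the observed minimum and maximum keeps the result comparable across collection batches, which is one reasonable design choice among several.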

[0075] For multi-source heterogeneous big data corresponding to signal data indicators, signal feature extraction and quantization are used to extract characteristic parameters such as amplitude, variance, and frequency of the signal, transforming unstructured signal waveforms into structured feature quantization values, realizing standardized representation of signal data, and solving the problems of signal data being difficult to quantify directly and having inconsistent representation.
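
A possible sketch of signal feature extraction is given below. It assumes a uniformly sampled, roughly zero-mean signal and uses a zero-crossing count as a crude frequency estimate; the source does not state which estimator is used. The function name `signal_features` and the synthetic 50 Hz test wave are hypothetical.

```python
import math
from statistics import pvariance
from typing import Dict, Sequence


def signal_features(samples: Sequence[float],
                    sample_rate_hz: float) -> Dict[str, float]:
    """Turn a raw waveform into a small set of structured feature values."""
    n = len(samples)
    if n < 2 or sample_rate_hz <= 0:
        raise ValueError("need at least two samples and a positive sample rate")
    # Effective (root-mean-square) value of the signal.
    rms = math.sqrt(sum(x * x for x in samples) / n)
    # Peak amplitude.
    peak = max(abs(x) for x in samples)
    # Population variance of the samples.
    variance = pvariance(samples)
    # Assumption: frequency is estimated from sign changes, which is only
    # meaningful for a roughly zero-mean, single-tone signal.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration_s = (n - 1) / sample_rate_hz
    frequency_hz = crossings / (2.0 * duration_s)
    return {
        "rms": rms,
        "peak_amplitude": peak,
        "variance": variance,
        "frequency_hz": frequency_hz,
    }


# Synthetic 50 Hz sine sampled at 1 kHz for one second (hypothetical data).
wave = [math.sin(2 * math.pi * 50 * k / 1000 + 0.1) for k in range(1000)]
print(signal_features(wave, 1000.0))
```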

[0076] For multi-source heterogeneous big data corresponding to image data indicators, image feature hierarchical quantization is adopted to extract visual features such as color, texture, and shape of the image. Based on the semantic connotation and representational meaning of visual features, hierarchical mapping or continuous function mapping is used to transform visual features into standardized quantitative values, realize the quantifiable expression of image data, and ensure that image features can accurately correspond to the knowledge representation changes of data indicators.

[0077] For multi-source heterogeneous big data corresponding to text-based data indicators, text semantic scoring quantification is adopted. Semantic information in the text is extracted and, in combination with the knowledge application requirements of the data indicators, the semantic description is transformed into standardized scoring values corresponding to knowledge representations. This realizes the quantitative transformation of text-based data and solves the technical problem that text-based data is highly fuzzy and difficult to use directly for quantitative evaluation.

[0078] The numerical representation attribute of each data indicator is defined. The numerical representation attribute refers to the relationship between the quantized value of a data indicator and the quality of the corresponding knowledge dimension. It defines the direction of change of the quantized value and its meaning for knowledge representation, provides the basis for homogenization processing, and ensures that the quantized values of different data indicators follow a unified representation logic. There are three types: larger-is-better (a larger value indicates better knowledge representation), smaller-is-better (a smaller value indicates better knowledge representation), and benchmark-deviation (the further the value deviates from a benchmark value, the worse the knowledge representation). Based on the numerical representation attribute of each data indicator, all multi-source heterogeneous big data of each modal type that has completed cross-modal differentiated quantization undergoes homogenization processing. Specifically, for the larger-is-better type, the original quantized value remains unchanged; for the smaller-is-better type, an inverse homogenization transformation is performed so that the direction of change of the quantized value is consistent with that of the larger-is-better type; for the benchmark-deviation type, a benchmark deviation quantization transformation first converts the original quantized value into a value representing its degree of deviation from the benchmark value, and homogenization calibration is then performed so that this type of quantized value follows the same representation logic as the other types. Through this differentiated homogenization processing, the differences in quantification logic caused by different numerical representation attributes are eliminated, and the primary standardized indicator values of each item of multi-source heterogeneous big data are obtained.
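
The three homogenization cases can be illustrated with the sketch below. It assumes the quantized value already lies in [0, 1], that the inverse transformation is `1 - value`, and that the benchmark deviation is normalized by a caller-supplied maximum deviation. These formulas, the `Direction` enum, and the `homogenize` function are illustrative assumptions, since the source does not specify the exact transformations.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    LARGER_BETTER = "larger_better"
    SMALLER_BETTER = "smaller_better"
    BENCHMARK_DEVIATION = "benchmark_deviation"


def homogenize(value: float,
               direction: Direction,
               benchmark: Optional[float] = None,
               max_deviation: Optional[float] = None) -> float:
    """Return a value in [0, 1] where larger always means better."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value is assumed to be quantized into [0, 1]")
    if direction is Direction.LARGER_BETTER:
        # Already in the target direction: keep unchanged.
        return value
    if direction is Direction.SMALLER_BETTER:
        # Assumed inverse transformation.
        return 1.0 - value
    # Benchmark-deviation type: quantify the deviation, then invert it.
    if benchmark is None or max_deviation is None or max_deviation <= 0:
        raise ValueError("benchmark and a positive max_deviation are required")
    deviation = min(abs(value - benchmark) / max_deviation, 1.0)
    return 1.0 - deviation


print(homogenize(0.25, Direction.SMALLER_BETTER))
print(homogenize(0.75, Direction.BENCHMARK_DEVIATION, benchmark=0.5, max_deviation=0.5))
```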

[0079] All primary standardized indicator values that have completed the homogenization process are standardized and regularized. Using a unified format encapsulation technology, the primary standardized indicator values corresponding to different modal types and different data indicators are bound and archived with the identity identifier of the corresponding target object and the data indicator identifier. This provides standardized quantitative input for subsequent personalized state baseline matching and secondary standardization processing.

[0080] Referring to Figure 3, in one embodiment, the specific steps for constructing a personalized baseline model of the target object include:

[0081] When the target object's multi-source heterogeneous structured database has accumulated historical big data that meets the preset minimum data volume requirement, a pre-constructed group baseline database matching the target object's domain is retrieved. The preset minimum data volume is the pre-set lower limit of the steady-state sample size needed to ensure the statistical reliability of individual steady-state characteristics and to cover a complete physiological or operating cycle. It can be preset based on statistical confidence, domain experience, or iterative verification, for example 30 to 50 groups of steady-state samples, or continuous data covering no fewer than 3 complete operating cycles, so that the central limit theorem can reasonably be applied. The group baseline database is formed by hierarchical clustering of the historical steady-state monitoring data of multiple target objects in the same domain and is divided into several group clusters according to different attribute tag combinations. Each group cluster is configured with a general baseline interval for each data indicator and with group steady-state standardized indicator values, providing a standardized group reference for subsequent baseline matching. The general baseline interval is the normal fluctuation range of a data indicator observed for the same group in the same domain under steady-state conditions; it characterizes the general reference interval of that data indicator for the group. The steady state is the state in which the target object is free from abnormalities and faults and meets the normal operation or health standards of the domain. The group steady-state standardized indicator values are pre-constructed offline as follows: a large amount of historical big data of target objects in the same domain is collected; the historical data judged to be in a steady state according to the standards of the domain is selected; the selected steady-state historical data undergoes cross-modal differentiated quantification through the standardized quantization methods and homogenization processing to obtain the individual steady-state standardized indicator values of each target object; the target objects are hierarchically clustered based on attribute tags; the individual steady-state standardized indicator values of all target objects in the same group cluster are statistically analyzed to determine the set of group steady-state standardized indicator values for that group cluster; and this set is stored in the group baseline database.

[0082] Based on the attribute tags bound to the target object's identity, an attribute tag vectorization mapping method is adopted to transform the various attribute tags of the target object into attribute feature vectors of a unified dimension, providing a standardized data format for subsequent similarity calculation.

[0083] By employing a vector similarity calculation method, the attribute feature vector corresponding to the target object is compared with the attribute feature vector of the attribute label combination of each group cluster in the group baseline database. This yields the label similarity between the target object and each group cluster, enabling accurate matching and filtering of the target object with similar groups.

[0084] The group cluster with the highest label similarity is selected as the target matching group. The general baseline range of all data indicators corresponding to the target matching group is extracted and integrated to form the initial baseline reference range that adapts to the target object's group.
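
The matching steps above could look like the following sketch. The source does not name the similarity measure, so cosine similarity is used here as an assumption; the cluster names, the three-dimensional attribute vectors, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two attribute feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must share a unified dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_matching_group(
        target: Sequence[float],
        clusters: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the group cluster with the highest label similarity."""
    if not clusters:
        raise ValueError("group baseline database has no clusters")
    best = max(clusters, key=lambda name: cosine_similarity(target, clusters[name]))
    return best, cosine_similarity(target, clusters[best])


# Hypothetical, already vectorized attribute tag combinations.
clusters = {
    "cluster_a": [1.0, 0.0, 0.0],
    "cluster_b": [0.9, 0.1, 0.0],
    "cluster_c": [0.0, 1.0, 0.0],
}
print(select_target_matching_group([1.0, 0.1, 0.0], clusters))
```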

[0085] After retrieving multi-source heterogeneous monitoring data of the target object and performing cross-modal differentiated quantification and homogenization processing, a set of primary standardized indicator values of the target object is obtained.

[0086] The set of primary standardized indicator values of the target object is compared against the initial baseline reference range and filtered. The primary standardized indicator values that fall within the initial baseline reference range are retained as the reference standardized indicator values of the target object, which characterize the historical data distribution of the target object that conforms to the steady-state characteristics of the same group.

[0087] Based on the comprehensive deviation between the reference standardized index value of the target object and the steady-state standardized index value of the target matching group, the initial baseline reference range is adjusted by interval correction to eliminate the deviation between the general baseline of the group and the individual data characteristics of the target object, thus forming personalized standard intervals for each data index specific to the current target object.

[0088] Specifically, the steps for interval correction of the initial baseline reference range include:

[0089] The individual deviations of each reference standardized index value under the same data index of the target object and the steady-state standardized index value of the target matching group are calculated separately. All individual deviations are statistically analyzed to determine the overall deviation level. Based on the overall deviation level, the upper and lower boundaries of the initial baseline reference range are synchronously shifted and adjusted. Specifically, the deviation offset between the distribution center of the target object's reference standardized index value and the distribution center of the target matching group's steady-state standardized index value is calculated. This offset is then synchronously superimposed on the upper and lower boundaries of the initial baseline reference range, keeping the interval width unchanged. This ensures that the adjusted interval center matches the distribution center of the target object's reference standardized index value, thereby forming a personalized standard interval specific to the current target object.
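
The interval correction can be sketched as below. The sketch assumes that the "distribution center" is the arithmetic mean; the source does not say which statistic is used. The function name `personalize_interval` and the sample values are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def personalize_interval(initial_range: Tuple[float, float],
                         target_reference_values: Sequence[float],
                         group_steady_values: Sequence[float]) -> Tuple[float, float]:
    """Shift both boundaries by the offset between distribution centers.

    Assumption: the distribution center is the arithmetic mean.
    The interval width is left unchanged.
    """
    lower, upper = initial_range
    offset = fmean(target_reference_values) - fmean(group_steady_values)
    return lower + offset, upper + offset


# Hypothetical values for a single data indicator.
initial = (0.4, 0.8)
target_refs = [0.5, 0.6, 0.7]   # target object's reference standardized values
group_refs = [0.4, 0.5, 0.6]    # group steady-state standardized values
print(personalize_interval(initial, target_refs, group_refs))
```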

[0090] The personalized standard ranges corresponding to each data indicator are integrated and encapsulated to form a personalized baseline model that is uniquely bound to the target object's identity. The personalized baseline model is then archived and stored in association with the target object's identity, providing an individual-specific normalized benchmark for subsequent secondary standardization processing.

[0091] In one embodiment, the specific steps for performing interval-mapping secondary standardization on the primary standardized indicator values include:

[0092] The system retrieves a personalized baseline model uniquely bound to the target object's identity, extracts personalized standard intervals corresponding to each data indicator, and uses these personalized standard intervals as the normalization benchmark for interval mapping of the current data indicators, providing a reference for the unified processing of indicators with different numerical ranges.

[0093] For each data indicator, determine its personalized standard range lower and upper limits, and set a unified numerical range applicable to all data indicators, so that different indicators have consistent numerical range constraints after processing. The unified numerical range refers to the target numerical range uniformly set for all data indicators for standardized mapping, used to unify the numerical range, dimensions, and magnitude differences of different data indicators, and to achieve comparability and fusion of multi-source heterogeneous indicators.

[0094] The primary standardized index value corresponding to each data indicator is transformed into a preset unified numerical range by using its personalized standard range as a reference through linear mapping, so that the relative position of the original value in the personalized standard range remains unchanged, thereby achieving a unified representation of indicators of different magnitudes and ranges.
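
A minimal sketch of this linear mapping follows. The function name `map_to_unified` and the default unified range are hypothetical. As an assumption, values outside the personalized standard interval are not clipped, so they land outside the unified range and remain detectable later.

```python
from typing import Tuple


def map_to_unified(value: float,
                   personal_interval: Tuple[float, float],
                   unified_interval: Tuple[float, float] = (0.0, 1.0)) -> float:
    """Linearly map a value so its relative position in the interval is kept."""
    p_low, p_high = personal_interval
    u_low, u_high = unified_interval
    if p_high <= p_low or u_high <= u_low:
        raise ValueError("interval upper limits must exceed lower limits")
    # Relative position of the value inside the personalized standard interval.
    relative_position = (value - p_low) / (p_high - p_low)
    # Same relative position inside the unified numerical range.
    return u_low + relative_position * (u_high - u_low)


print(map_to_unified(0.7, (0.5, 0.9)))                # midpoint of the interval
print(map_to_unified(0.7, (0.5, 0.9), (0.0, 100.0)))  # same position on [0, 100]
print(map_to_unified(0.3, (0.5, 0.9)))                # below the interval, not clipped
```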

[0095] The numerical values that have completed the linear mapping are standardized and encapsulated using a unified format and precision to form standardized indicator values that eliminate differences in dimensions, magnitudes, and indicator attributes.

[0096] The standardized indicator values ​​are linked and archived with the target object's identity identifier, knowledge dimension identifier, data indicator identifier, and collection timestamp, so that different data indicators under the same knowledge dimension can be directly weighted, integrated, and compared for evaluation, providing standardized input for subsequent cross-modal data fusion and knowledge quantification.

[0097] Step S3: Knowledge extraction is triggered by any of the following conditions: periodic inspection, an abnormal event alarm, an indicator exceeding its threshold, or a manual active request. When knowledge extraction is triggered, the standardized indicator values obtained in Step S2 are screened for time-series validity based on the pre-configured time-series effective lifespan of each data indicator, and a sequence of effective standardized indicator values is constructed for each data indicator. Indicator types are distinguished according to the indicator change type identifier. The indicator change type characterizes the temporal nature of how a data indicator reflects the knowledge representation of the target object, and comprises instantaneous representation indicators and trend representation indicators. Instantaneous representation indicators reflect the immediate knowledge representation of the target object at the time of collection, while trend representation indicators reflect the change law and development trend of the knowledge representation of the target object over a continuous time period. For instantaneous representation indicators, the most recent effective value is used as the current knowledge evaluation value; for trend representation indicators, time-series decay weighted fusion is used to obtain the knowledge representation indicator value. The results are archived in a structured manner based on the target object's identity identifier and the hierarchical knowledge dimension system, forming, for each knowledge dimension, a dynamic knowledge indicator set used for the current knowledge evaluation. This provides a temporally reasonable, weight-adapted, stable, and reliable input for subsequent cross-modal indicator fusion and knowledge quantification.

[0098] In this embodiment, the effective lifespan of the time series is the effective collection time range pre-set according to the physical meaning, rate of change and domain knowledge assessment requirements of each data indicator; instantaneous characterization indicators include but are not limited to real-time blood glucose values, real-time tongue coating color quantification values, instantaneous vibration amplitude, and instantaneous execution deviation values; trend characterization indicators include but are not limited to body temperature change curves over continuous periods, multi-day dietary records, cumulative processing errors, and periodic pulse change trends; the time series decay function is a function that monotonically decreases with increasing time interval, including but not limited to linear decay functions and exponential decay functions.

[0099] For example, when the target is a user of TCM health monitoring and the data indicator is the knowledge representation indicator value corresponding to the tongue image, the indicator is an instantaneous representation indicator, and only the effective standardized indicator value of the frame closest to the current knowledge evaluation time is selected as the input for the current knowledge evaluation; when the data indicator is a sleep quality indicator for multiple consecutive days, the indicator is a trend representation indicator, and based on the time decay function, higher time weights are assigned to sleep data that are closer to the current date, and a comprehensive sleep quality standardized indicator value is obtained through weighted fusion.

[0100] For example, when the target object is a CNC machine tool and the data index is the knowledge representation index value corresponding to a single spindle vibration signal, the index is an instantaneous representation index, and only the latest collected valid vibration index value is used; when the data index is the tool wear deviation index within a continuous period, the index is a trend representation index, and the knowledge representation index value that can reflect the wear development trend is obtained through time-series decay weighted fusion, ensuring that the evaluation of the knowledge dimension is more in line with the real change law.

[0101] In one implementation, the step of constructing a dynamic set of knowledge metrics for each knowledge dimension for current knowledge evaluation includes:

[0102] When any knowledge extraction is triggered, the corresponding data indicator and its standardized value are located based on the predefined set of related data indicators for that knowledge dimension.

[0103] For each data indicator, the time-series validity of each standardized indicator value is determined based on the pre-configured time-series effective lifespan, by comparing the data collection timestamp with the current knowledge evaluation timestamp. Standardized indicator values that have timed out, are incomplete, or deviate abnormally are removed, while valid standardized indicator values are retained. Specifically, standardized indicator values whose collection timestamps fall outside the time-series effective lifespan, whose data fields are missing, or which do not meet the completeness requirements are judged to be invalid and are removed. Only standardized indicator values that are within the valid time window, have complete data, and have reasonable values are considered effective standardized indicator values.

[0104] The retained effective standardized indicator values are sorted according to the collection timestamp, and a time series sequence of effective standardized indicator values corresponding to each data indicator is constructed.
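
The screening and sorting steps can be sketched as follows. The `Reading` record, the `screen_effective_values` function, and the plausibility range used to decide whether a value is "reasonable" are hypothetical; the source does not define how reasonableness is checked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    timestamp: float            # collection time, in seconds
    value: Optional[float]      # None models a missing data field


def screen_effective_values(readings: List[Reading],
                            now: float,
                            lifespan_s: float,
                            plausible_range: Tuple[float, float]) -> List[Reading]:
    """Keep readings that are complete, in the time window, and plausible."""
    low, high = plausible_range
    effective = [
        r for r in readings
        if r.value is not None                      # complete data
        and 0.0 <= now - r.timestamp <= lifespan_s  # inside the effective lifespan
        and low <= r.value <= high                  # assumed reasonableness check
    ]
    # Build the time series sequence ordered by collection timestamp.
    return sorted(effective, key=lambda r: r.timestamp)


readings = [
    Reading(90.0, 0.7),
    Reading(0.0, 0.5),     # timed out
    Reading(50.0, None),   # missing field
    Reading(60.0, 0.4),
    Reading(95.0, 9.9),    # implausible value
]
print(screen_effective_values(readings, now=100.0, lifespan_s=60.0,
                              plausible_range=(-1.0, 2.0)))
```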

[0105] Based on the pre-configured indicator change type identifier of each data indicator, instantaneous representation indicators and trend representation indicators are distinguished, and differentiated time series processing is performed on each. Based on the time series sequence of the effective standardized indicator values corresponding to each data indicator, the knowledge representation indicator value corresponding to each data indicator is generated.

[0106] Specifically, for instantaneous representation indicators, the effective standardized indicator value whose timestamp is closest to the current knowledge assessment timestamp within the corresponding effective standardized indicator value time series is selected as the knowledge representation indicator value for the current knowledge assessment.

[0107] For trend-representation indicators, dynamic time-series weights are allocated through a time-series decay function based on the time interval between the collection timestamp of each effective standardized indicator value in the corresponding effective standardized indicator value time-series sequence and the current knowledge evaluation timestamp. Based on the dynamic time-series weights, the effective standardized indicator value time-series sequence is weighted and fused to obtain the knowledge representation indicator value of the indicator in the current evaluation period.
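
The two time-series treatments can be sketched as below. An exponential decay function is assumed here, which is one of the options the description names; the decay rate, function names, and sample series are hypothetical, and weights are normalized so that they sum to 1.

```python
import math
from typing import List, Tuple

# Each item is (collection timestamp in seconds, effective standardized value).
Series = List[Tuple[float, float]]


def instantaneous_value(series: Series, now: float) -> float:
    """Pick the value whose timestamp is closest to the evaluation time."""
    if not series:
        raise ValueError("empty series")
    return min(series, key=lambda item: abs(now - item[0]))[1]


def decay_weighted_value(series: Series, now: float, decay_rate: float) -> float:
    """Fuse a trend indicator with exponentially decaying time weights."""
    if not series:
        raise ValueError("empty series")
    # Weight decreases monotonically as the time interval grows.
    weights = [math.exp(-decay_rate * max(now - t, 0.0)) for t, _ in series]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, series)) / total


series = [(0.0, 0.2), (10.0, 0.8)]
half_life_rate = math.log(2) / 10.0  # weight halves every 10 seconds (assumed)
print(instantaneous_value(series, now=10.0))
print(decay_weighted_value(series, now=10.0, decay_rate=half_life_rate))
```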

[0108] Using the identity identifier of the target object as an index, and following a structure in which the knowledge dimension is the main classification and the data indicators are the sub-classification, all knowledge representation indicator values are uniformly archived to form a dynamic knowledge indicator set that is adapted to the current evaluation node, subject to reasonable time constraints, and adaptively weighted. This provides a standardized and highly reliable input foundation for subsequent cross-modal indicator fusion and knowledge quantification evaluation.

[0109] Step S4: For the dynamic knowledge indicator set constructed in Step S3, the knowledge representation indicator values and preset association weights of all data indicators under the corresponding knowledge dimension are extracted. Based on the personalized baseline model constructed in Step S2 and the linear mapping relationship of the secondary standardization process, the personalized standard interval of each data indicator is synchronously mapped to the preset unified numerical interval, generating a personalized judgment interval for each data indicator within the unified numerical interval. Anomaly judgment is performed on the knowledge representation indicator value of each data indicator in the dynamic knowledge indicator set based on its personalized judgment interval, determining the abnormal state and degree of anomaly of each single data indicator. Using the preset association weights of the data indicators as the core fusion basis, and taking into account the abnormal state and degree of anomaly of each single data indicator, weighted fusion and correction calibration are performed on all knowledge representation indicator values under the knowledge dimension, generating the knowledge dimension fusion quantitative value corresponding to the knowledge dimension. Based on the preset knowledge dimension grading standard of the target object's domain, the knowledge dimension fusion quantitative value is classified into the corresponding knowledge level. The fusion quantitative value and knowledge level of each knowledge dimension are integrated with the anomaly judgment results and degrees of anomaly of each data indicator in the corresponding dynamic knowledge indicator set, and are bound and archived one by one with the unique identity identifier of the target object, the knowledge dimension identifier, and the knowledge evaluation timestamp. This forms an independent quantitative evaluation result for each knowledge dimension of the target object, providing an accurate quantitative basis for subsequent multi-dimensional knowledge positioning and anomaly tracing. At the same time, it solves the technical problems of the one-sidedness of single-modality, single-indicator evaluation and the inability to measure multi-source indicators comprehensively within the same dimension, thereby improving the comprehensiveness and accuracy of knowledge evaluation.

[0110] In this embodiment, the preset association weight is a quantitative coefficient that represents the importance of each data indicator to the state representation of the corresponding knowledge dimension. The weight value is positively correlated with the degree of influence of the indicator on the knowledge dimension. The sum of the preset association weights of all data indicators under the same knowledge dimension is 1. The knowledge dimension grading standard is preset based on the monitoring needs, industry norms, and safety standards of the target object's field. The knowledge level classification thresholds of different fields can be flexibly adjusted, and the grading standard corresponds one-to-one with the numerical range of the unified numerical interval to ensure the consistency and comparability of the knowledge evaluation results of all indicators.

[0111] For example, when the target object is a user of TCM health monitoring and the knowledge dimension is the liver function knowledge dimension, the dynamic knowledge indicator set corresponding to the liver function knowledge dimension output in Step S3 is directly extracted. The set includes knowledge representation indicator values for the tongue image, pulse signal, TCM consultation text, liver function biochemical test values, and sleep quality monitoring values. All knowledge representation indicator values are mapped to a unified numerical range of [0,1], and after homogenization, a larger value indicates a better liver function status. Association weights are preset for each indicator: 0.4 for the liver function biochemical test indicator, 0.25 for the pulse signal indicator, 0.2 for the tongue image indicator, 0.1 for the TCM consultation text indicator, and 0.05 for the sleep quality monitoring indicator; the sum of all weights is 1. Based on the user's personalized baseline model, the personalized standard intervals of each indicator are synchronously transformed to the [0,1] interval through a linear mapping relationship consistent with the secondary standardization, generating the personalized judgment interval corresponding to each indicator. Based on the personalized judgment intervals, it is determined whether each knowledge representation indicator value is within the normal range; abnormal indicators are marked, and their relative deviation is calculated to determine the degree of abnormality. Based on the preset association weights, all knowledge representation indicator values are weighted and summed to obtain the initial knowledge dimension fusion quantification value. In combination with the number of abnormal indicators, their degree of abnormality, and the corresponding weights, the initial knowledge dimension fusion quantification value is corrected and calibrated to obtain the final knowledge dimension fusion quantification value. Finally, according to the preset liver function grading standard in the field of traditional Chinese medicine health monitoring, the final knowledge dimension fusion quantification value is classified into the knowledge level corresponding to normal, mildly abnormal, moderately abnormal, or severely abnormal.

[0112] For example, when the target object is a CNC machine tool and the knowledge dimension is the tool wear knowledge dimension, the dynamic knowledge indicator set corresponding to the tool wear knowledge dimension output in Step S3 is directly extracted. The set includes knowledge representation indicator values for the spindle vibration signal, cutting force signal, workpiece surface defect image, tool feed deviation, and spindle motor current. All knowledge representation indicator values are mapped to a unified numerical range of [0,100], and after homogenization, a larger value indicates a better tool wear state. Association weights are preset for each indicator: 0.35 for the cutting force signal indicator, 0.3 for the workpiece surface defect image indicator, 0.2 for the spindle vibration signal indicator, 0.1 for the tool feed deviation indicator, and 0.05 for the spindle motor current indicator; the sum of all weights is 1. Based on the personalized baseline model of the machine tool, the personalized standard intervals of each indicator are synchronously mapped to the unified numerical range [0,100] to generate the personalized judgment interval of each indicator, and the abnormal state and degree of abnormality of each single indicator are judged. The initial knowledge dimension fusion quantification value is generated by weighted summation based on the preset association weights. After this value is corrected and calibrated in combination with the single-indicator anomaly results, it is classified into the corresponding knowledge level of normal, light wear, moderate wear, or heavy wear according to the preset tool wear grading standard in the field of machining.

[0113] Referring to Figure 4, in one embodiment, the specific steps for generating the knowledge dimension fusion quantification value corresponding to the knowledge dimension and classifying the knowledge level include:

[0114] Extract all knowledge representation index values and the corresponding preset association weights from the dynamic knowledge index set, output in step S3, that corresponds to the current knowledge dimension.

[0115] Retrieve the personalized baseline model corresponding to the target object in step S2, extract the personalized standard intervals corresponding to each data indicator in the dynamic knowledge indicator set, and use the linear mapping formula consistent with the secondary standardization process to synchronously map the personalized standard intervals to the preset unified numerical intervals to generate personalized judgment intervals corresponding to each data indicator.

[0116] Based on the personalized judgment range of each data indicator, anomaly judgment is performed on the corresponding knowledge representation indicator values within the dynamic knowledge indicator set:

[0117] If the knowledge representation index value is lower than the lower limit of the personalized judgment interval, it is judged as an abnormal index; if the knowledge representation index value is within the personalized judgment interval or higher than the upper limit of the interval, it is judged as a normal index. For abnormal indices, the relative deviation between the knowledge representation index value and the lower limit of the personalized judgment interval is calculated as the degree of abnormality of the abnormal index.
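The judgment rule of paragraph [0117] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `judge_indicator` is hypothetical, and the relative deviation is assumed to be (lower limit − value) / lower limit, clipped to [0, 1].

```python
from typing import Tuple


def judge_indicator(value: float, lower: float, upper: float) -> Tuple[bool, float]:
    """Return (is_abnormal, abnormality_degree) for one index value.

    Values are assumed to be homogenized so that larger is better, which is
    why only values below the lower limit are treated as abnormal.
    """
    if lower <= 0 or upper < lower:
        raise ValueError("expected 0 < lower <= upper")
    if value >= lower:
        # Inside the interval or above its upper limit: normal.
        return False, 0.0
    # Assumed definition of relative deviation from the lower limit.
    degree = (lower - value) / lower
    return True, min(max(degree, 0.0), 1.0)
```

With a personalized judgment interval of [60, 90], a value of 45 is abnormal, while 70 and 95 are both normal.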

[0118] Using the preset association weights of each data indicator within the dynamic knowledge indicator set as the fusion coefficients, the values of all knowledge representation indicators are weighted and summed to obtain the initial knowledge dimension fusion quantification value for the current knowledge dimension.
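The weighted summation of paragraph [0118] can be sketched as below. The function name and the indicator names in the usage note are hypothetical; the check that the weights sum to 1 reflects the constraint stated for weights within one knowledge dimension.

```python
import math
from typing import Mapping


def initial_fusion_value(values: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of knowledge representation index values."""
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same indicators")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("preset association weights must sum to 1")
    return sum(weights[name] * values[name] for name in values)
```

For example, index values of 80, 60, and 90 with weights 0.5, 0.3, and 0.2 give an initial fusion value of 76.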

[0119] Based on the number of abnormal indicators, the degree of abnormality and their corresponding preset association weights, a fusion correction coefficient is constructed to correct and calibrate the initial knowledge dimension fusion quantification value, eliminate the bias effect of single indicator abnormality on the overall evaluation result, and obtain the knowledge dimension fusion quantification value.

[0120] Specifically, the fusion correction coefficient is constructed as follows:

[0121] The number of abnormal indicators under the current knowledge dimension is counted, and two core constraint parameters are calculated at the same time: first, the preset association weights of all abnormal indicators are summed to obtain the abnormal weight ratio; second, the arithmetic mean of the abnormality degrees of all abnormal indicators is taken to obtain the abnormality degree mean.

[0122] Using the abnormal weight ratio and the abnormality degree mean as core constraint factors, a fusion correction coefficient is generated using a construction logic in which the coefficient decreases monotonically as either core constraint factor increases. For example, the formula for calculating the fusion correction coefficient is:

[0123] [Formula not reproduced in the source text.]

[0124] Here, the fusion correction coefficient takes values in (0, 1]. The abnormal weight ratio is the sum of the preset association weights of all abnormal indicators under the current knowledge dimension; since the preset association weights of all data indicators under the same knowledge dimension sum to 1, the abnormal weight ratio lies in [0, 1]. The abnormality degree mean is the average abnormality degree of all abnormal indicators under the current knowledge dimension; each abnormality degree is the relative deviation between the knowledge representation index value and the boundary of the personalized judgment interval, pre-normalized to [0, 1], so the abnormality degree mean also lies in [0, 1]. The weight-ratio influence coefficient and the abnormality-degree influence coefficient are preset constants greater than 0 and can be flexibly adjusted according to the monitoring needs of the target object's domain. Typical settings are as follows: industrial equipment operation and maintenance scenarios use values that emphasize the weight ratio of abnormal indicators; physiological health monitoring scenarios use values that emphasize the degree of abnormal deviation of a single indicator; power system dispatching scenarios use values that balance the weight ratio and the abnormality degree; general scenarios use the default values of 0.5 and 0.5.

[0125] When there are no abnormal indicators, the abnormal weight ratio and the abnormality degree mean are both 0, the fusion correction coefficient equals 1, and no decay correction is applied to the initial knowledge dimension fusion quantification value. The higher the abnormal weight ratio and the larger the abnormality degree mean, the smaller the fusion correction coefficient, which decreases monotonically; this achieves reasonable calibration of the initial knowledge dimension fusion quantification value and eliminates the biased impact of single-indicator anomalies on the overall evaluation result.
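The text fixes only the properties of the fusion correction coefficient: it lies in (0, 1], equals 1 when there are no abnormal indicators, and decreases monotonically as the abnormal weight ratio or the abnormality degree mean grows. One function with these properties is a negative exponential of the weighted sum of the two factors. The sketch below uses that form purely as an assumption for illustration; it is not the formula of the publication, and the names `alpha` and `beta` for the two influence coefficients are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fusion_correction(abnormal: Sequence[Tuple[float, float]],
                      alpha: float = 0.5,
                      beta: float = 0.5) -> float:
    """Illustrative correction coefficient in (0, 1].

    `abnormal` holds one (preset_weight, abnormality_degree) pair per
    abnormal indicator; normal indicators are not included.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("influence coefficients must be greater than 0")
    if not abnormal:
        return 1.0  # no abnormal indicators: no decay correction
    weight_ratio = sum(w for w, _ in abnormal)
    mean_degree = sum(d for _, d in abnormal) / len(abnormal)
    # Assumed form: exponential decay in the two constraint factors.
    return math.exp(-(alpha * weight_ratio + beta * mean_degree))


def corrected_value(initial_value: float, coefficient: float) -> float:
    # Assumes the coefficient is applied multiplicatively as a decay factor.
    return initial_value * coefficient
```

Under these assumptions, adding a second, more severe abnormal indicator lowers the coefficient, so the corrected value drops further below the initial weighted sum.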

[0126] Based on the pre-defined knowledge dimension grading standards of the target object's domain, the final knowledge dimension fusion quantification value is mapped to the corresponding knowledge level.
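The mapping of paragraph [0126] amounts to a threshold lookup. The sketch below assumes a [0, 100] scale on which larger values are better; the cut points 40, 60, and 80 are hypothetical, since the actual grading standard is domain-specific and not given in the text.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical cut points and level names for illustration only.
CUTS = (40.0, 60.0, 80.0)
LEVELS = ("severely abnormal", "moderately abnormal", "mildly abnormal", "normal")


def knowledge_level(value: float,
                    cuts: Sequence[float] = CUTS,
                    levels: Sequence[str] = LEVELS) -> str:
    """Map a fusion quantification value to a knowledge level."""
    if len(levels) != len(cuts) + 1:
        raise ValueError("need exactly one more level than cut points")
    # bisect_right places a value equal to a cut point in the higher level.
    return levels[bisect_right(list(cuts), value)]
```

A domain would substitute its own cut points and labels, for example the tool wear levels used in the machining example.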

[0127] The knowledge dimension fusion quantification value, the knowledge level, and the anomaly judgment results and abnormality degrees of each data indicator are uniformly bound and archived together with the target object's identity identifier, the knowledge dimension identifier, and the knowledge evaluation timestamp, forming a complete quantitative evaluation result for the knowledge dimension.

[0128] This embodiment introduces a cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction system, including a structured management module, a big data processing module, a dynamic knowledge extraction module, and a knowledge fusion evaluation module;

[0129] The structured management module assigns a globally unique identity identifier to each target object to be processed and constructs a hierarchical knowledge dimension system matching the domain of the target object. The system includes at least one knowledge dimension, and each knowledge dimension has a predefined set of associated data indicators. The module receives, in real time, multi-source heterogeneous big data bound to the target object's identity identifier, including but not limited to numerical data, signal data, image data, and text data, and synchronously retains the collection timestamp, collection source identifier, and associated data indicator identifier of each piece of data. Each piece of data is associated with the corresponding knowledge dimension according to the data indicator it belongs to. Using the target object's identity identifier as the index, the knowledge dimension as the primary classification dimension, and the data indicator as the sub-classification dimension, the module archives and stores all associated multi-source heterogeneous big data in structured form, constructing a multi-source heterogeneous structured database organized by the hierarchical knowledge dimension system. This provides complete and traceable data support for subsequent data analysis, feature extraction, and knowledge quantification evaluation.
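The three-level organization described above (identity identifier, then knowledge dimension, then data indicator) can be sketched as an in-memory structure. The class and field names below are hypothetical, and a real system would use a database rather than nested dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Record:
    collected_at: float  # collection timestamp, e.g. seconds since epoch
    source_id: str       # collection source identifier
    indicator_id: str    # associated data indicator identifier
    payload: Any         # numerical, signal, image, or text data


class StructuredArchive:
    """Index: object identifier -> knowledge dimension -> data indicator."""

    def __init__(self, indicator_to_dimension: Dict[str, str]) -> None:
        self._indicator_to_dimension = dict(indicator_to_dimension)
        self._store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def ingest(self, object_id: str, record: Record) -> None:
        # Raises KeyError for an indicator with no predefined dimension.
        dimension = self._indicator_to_dimension[record.indicator_id]
        self._store[object_id][dimension][record.indicator_id].append(record)

    def query(self, object_id: str, dimension: str, indicator_id: str) -> List[Record]:
        # .get avoids creating empty entries for unknown keys.
        by_dimension = self._store.get(object_id, {})
        by_indicator = by_dimension.get(dimension, {})
        return list(by_indicator.get(indicator_id, []))
```

Keeping the timestamp and source identifier on every record is what makes later time-series screening and tracing possible.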

[0130] The big data processing module retrieves, from the multi-source heterogeneous structured database, the multi-source heterogeneous big data belonging to different knowledge dimensions, data indicators, and modal types. According to the modal type of each data indicator, it applies a standardized quantification method adapted to that modal type, combined with homogenization processing, so that cross-modal differentiated quantification and homogenization convert the multi-source heterogeneous big data into primary standardized indicator values. It then retrieves a pre-constructed group baseline database matching the target object's domain. Based on the attribute tags bound to the target object's identity identifier, it uses vector similarity calculation to compute the tag similarity between the target object's attribute tags and the tags of each group cluster in the group baseline database, and selects as target matching groups those clusters whose tag similarity reaches a preset tag similarity threshold. The preset tag similarity threshold is the critical value for judging how well the target object's attributes match a group cluster, that is, the minimum acceptable similarity of attribute features. Its value typically lies in [0.7, 0.9] and is set according to the clustering accuracy of the group baseline database, the monitoring needs of the target object's domain, and the balance between matching-group coverage and matching accuracy. The general baseline ranges of all data indicators corresponding to the target matching group are extracted and integrated to form an initial baseline reference range adapted to the target object's group. The comprehensive deviation between the target object's primary standardized indicator values and the steady-state standardized indicator values of the target matching group is then quantified: the absolute deviations of the individual indicators are summed, weighted by the preset association weight of each indicator, to obtain the comprehensive deviation, from which an offset is calculated. The initial baseline reference range is corrected by this offset to form a personalized baseline model uniquely bound to the target object's identity identifier. The personalized baseline model contains a personalized standard range for each data indicator. Using these personalized standard ranges as the normalization benchmark, the primary standardized indicator values undergo secondary standardization by interval mapping, which maps them uniformly to a preset unified numerical range and yields standardized indicator values free of differences in dimension, magnitude, and indicator attributes. The quantitative results of different data indicators under the same knowledge dimension can therefore be directly weighted, fused, and compared, providing unified, comparable, standardized input adapted to the individual characteristics of the target object for subsequent cross-modal data fusion and knowledge dimension quantification.
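The group matching and comprehensive deviation steps can be sketched as follows. The text specifies only "vector similarity calculation", so cosine similarity is used here as an assumption. The function names and the threshold of 0.8 (inside the stated typical range [0.7, 0.9]) are illustrative. The offset calculation and baseline correction are omitted because the text does not give their formulas.

```python
import math
from typing import Dict, List, Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def match_groups(target: Sequence[float],
                 clusters: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> List[str]:
    """Names of clusters whose tag similarity reaches the threshold, best first."""
    scored = [(cosine_similarity(target, vec), name) for name, vec in clusters.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


def comprehensive_deviation(values: Mapping[str, float],
                            steady: Mapping[str, float],
                            weights: Mapping[str, float]) -> float:
    """Weighted sum of absolute single-indicator deviations."""
    return sum(weights[k] * abs(values[k] - steady[k]) for k in weights)
```

The threshold trades coverage against accuracy: a lower value admits more clusters into the initial baseline, while a higher value keeps only close matches.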

[0131] The dynamic knowledge extraction module, upon a knowledge extraction trigger, filters the standardized indicator values obtained from secondary standardization according to the pre-configured time-series effective lifecycle of each data indicator. It removes standardized indicator values that have timed out, are incomplete, or deviate abnormally, retains only the valid standardized indicator values, and sorts them by collection timestamp to construct a valid standardized indicator value sequence for each data indicator. The module distinguishes indicator types by the indicator change type identifier of each data indicator. The indicator change type characterizes the time dimension over which a data indicator reflects the target object's state, and is either instantaneous or trend-based: instantaneous indicators reflect the immediate state of the target object at the time of collection, while trend-based indicators reflect the state change pattern and development trend of the target object over a continuous period. For an instantaneous indicator, the module selects the valid standardized indicator value of the frame whose collection time is closest to the current knowledge evaluation time as the knowledge representation index value for the current evaluation. For a trend-based indicator, a temporal decay function is constructed from the time interval between each collection time and the current knowledge evaluation time, dynamic temporal weights are assigned to the valid standardized indicator values at different collection times, and the valid standardized indicator values of multiple frames are weighted and fused according to these weights to obtain the knowledge representation index value of the data indicator within the current evaluation period. Using the target object's identity identifier as the index, the knowledge dimension as the primary classification dimension, and the data indicator as the sub-classification dimension, all knowledge representation index values that have undergone validity screening, temporal type determination, and dynamic weighted fusion are archived in structured form, forming the dynamic knowledge index set of each knowledge dimension for the current knowledge evaluation. This provides temporally sound, weight-adapted, and reliable index input for subsequent cross-modal index fusion and knowledge dimension quantification.
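The differentiated time-series processing can be sketched as below. The text does not specify the temporal decay function, so an exponential decay with normalized weights is assumed here; the names `decay_rate` and `representation_value` are hypothetical. Only the timeout and incompleteness checks are shown, and the screening of abnormally deviating values is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (collection timestamp, standardized index value)


def valid_sequence(samples: Sequence[Sample], now: float, lifecycle: float) -> List[Sample]:
    """Keep finite samples collected within the effective lifecycle, sorted by time."""
    kept = [(t, v) for t, v in samples
            if 0.0 <= now - t <= lifecycle and math.isfinite(v)]
    return sorted(kept)


def representation_value(samples: Sequence[Sample], now: float, lifecycle: float,
                         change_type: str, decay_rate: float = 0.1) -> Optional[float]:
    seq = valid_sequence(samples, now, lifecycle)
    if not seq:
        return None  # no valid data for this indicator
    if change_type == "instantaneous":
        # The most recent valid frame is closest to the evaluation time.
        return seq[-1][1]
    if change_type == "trend":
        # Assumed decay: older frames get exponentially smaller weights.
        weights = [math.exp(-decay_rate * (now - t)) for t, _ in seq]
        total = sum(weights)
        return sum(w * v for w, (_, v) in zip(weights, seq)) / total
    raise ValueError(f"unknown indicator change type: {change_type}")
```

Normalizing the weights keeps the fused value inside the range of the contributing frames, so it remains on the same unified scale as the individual standardized values.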

[0132] The knowledge fusion evaluation module extracts, for a completed dynamic knowledge index set, the knowledge representation index values and preset association weights of all data indicators under the corresponding knowledge dimension. Based on the personalized baseline model of the target object and the linear mapping relationship used in the secondary standardization process, it synchronously maps the personalized standard interval of each data indicator to the preset unified numerical interval, generating a personalized judgment interval for each data indicator within the unified numerical interval. Based on the personalized judgment intervals, it performs anomaly judgment on the knowledge representation index value of each data indicator in the dynamic knowledge index set to determine the abnormal state and abnormality degree of each individual data indicator. Using the preset association weight of each data indicator as the fusion basis, it computes the weighted sum of all knowledge representation index values under the knowledge dimension to obtain the initial knowledge dimension fusion quantification value. Combining the number of abnormal indicators, their abnormality degrees, and their preset association weights, it constructs a fusion correction coefficient and uses it to correct and calibrate the initial knowledge dimension fusion quantification value, generating the final knowledge dimension fusion quantification value for the knowledge dimension. According to the preset knowledge dimension grading standard of the target object's domain, the final knowledge dimension fusion quantification value is classified into the corresponding knowledge level. At the same time, the deviation between the final knowledge dimension fusion quantification value and the steady-state benchmark value within the unified numerical interval is calculated to determine the degree of deviation. The knowledge dimension fusion quantification value, knowledge level, and deviation of each knowledge dimension, together with the anomaly judgment results and abnormality degrees of each data indicator in the corresponding dynamic knowledge index set, are bound and archived one by one with the unique identity identifier of the target object, the knowledge dimension identifier, and the knowledge evaluation timestamp. This forms an independent quantitative evaluation result for each knowledge dimension of the target object and provides an accurate quantitative basis for subsequent multi-dimensional knowledge positioning and anomaly tracing. It also addresses the one-sidedness of single-modality, single-indicator evaluation and the inability to measure multi-source indicators on a common scale, thereby improving the comprehensiveness and accuracy of knowledge evaluation.

[0133] Working principle and its effects:

[0134] Based on the core design concept of cross-modal fusion, this invention addresses industry pain points such as the difficulty of fusing multi-source heterogeneous big data, one-sided knowledge representation, and insufficient individual adaptability. It constructs a closed-loop technical system covering the entire process from structured data management, two-stage standardization, and dynamic indicator screening to multi-dimensional knowledge fusion evaluation. By tightly coupling the data standardization logic with individual characteristic adaptation, differentiated control of time-series characteristics, and multi-indicator fusion calibration, it connects the value transformation chain of multi-source heterogeneous big data across the data foundation, standardization benchmark, time-series adaptation, and knowledge fusion stages. The core result is comprehensive, accurate, and individually adapted knowledge extraction and knowledge level classification for target objects.

[0135] This invention first assigns a globally unique identifier to the target to be processed, constructs a hierarchical knowledge dimension system and a set of associated indicators, and uses the identifier as an index to complete the structured archiving of multi-source heterogeneous big data, achieving accurate association and traceable management of multimodal data, solving the problem of scattered and difficult collaboration of multi-source data, and laying a solid foundation for big data processing. Based on this, this invention configures adapted standardized quantification methods for different modalities of data, combining differentiated and unidirectional processing with unified indicator knowledge representation logic, breaking down cross-modal fusion barriers and improving the utilization rate of multi-dimensional data. Simultaneously, this invention constructs a personalized baseline model adapted to individual characteristics by matching target attribute labels to a general baseline of the group, combined with its own steady-state data correction, and completes secondary standardization based on this baseline. This achieves comparable conversion of indicators within the same value range and solves the problem of mismatch between general standards and individual characteristics, reducing the risk of misjudgment. 
When knowledge extraction is triggered, this invention filters valid data based on a preset time series effective life cycle, performs differentiated time series processing on instantaneous and trend-type indicators, constructs a dynamic knowledge indicator set, takes into account the timeliness and continuity of knowledge extraction, and ensures data reliability; finally, it completes the weighted fusion of multiple indicators with preset correlation weights, constructs a monotonically decreasing correction coefficient based on abnormal features to calibrate the knowledge evaluation value, generates a knowledge dimension fusion quantification result and completes the knowledge level classification, solves the problem of the one-sidedness of knowledge representation by a single indicator, and improves the accuracy of knowledge extraction.

[0136] In summary, this invention achieves intelligent management and control of multi-source heterogeneous big data throughout the entire process from collection, processing, screening to knowledge fusion and evaluation. It not only fully explores the collaborative knowledge value of multi-dimensional and multi-modal big data, but also takes into account the individual characteristics of different target objects and the temporal characteristics of different monitoring indicators. It effectively solves the core pain point in existing technologies where multi-source heterogeneous big data cannot support comprehensive, accurate, and individualized knowledge extraction and quantitative evaluation. It can be widely adapted to the needs of refined big data management and knowledge application in many fields such as health management, industrial equipment operation and maintenance, and agricultural planting monitoring.

[0137] The above description is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should also be considered within the scope of protection of the present invention.

Claims

1. A multi-source heterogeneous big data processing and knowledge extraction method based on cross-modal fusion, characterized in that it comprises: assigning identity identifiers to target objects to be processed, constructing a hierarchical knowledge dimension system for the target objects, and predefining a set of data indicators associated with the knowledge dimensions; receiving, in real time, multi-source heterogeneous big data bound to the identity identifiers of the target objects, and constructing a multi-source heterogeneous structured database of the hierarchical knowledge dimension system of the target objects; according to the modality type corresponding to each data indicator in the multi-source heterogeneous structured database, performing cross-modal standardization processing on the multi-source heterogeneous big data to convert it into primary standardized indicator values, matching the general baseline range of each data indicator to construct a personalized baseline model of the target object, and performing interval-mapping secondary standardization processing on the primary standardized indicator values to obtain the standardized indicator values of each data indicator; in response to a knowledge extraction trigger, filtering the standardized indicator values of the associated data indicators for time-series validity, constructing a sequence of valid standardized indicator values, performing differentiated time-series processing according to the indicator change type identifier to generate the knowledge representation indicator value corresponding to each data indicator, and constructing a dynamic knowledge indicator set for the current knowledge extraction; and, based on the personalized baseline model, performing anomaly detection on the knowledge representation indicator values in the dynamic knowledge indicator set, performing weighted fusion and correction calibration on all knowledge representation indicator values under the knowledge dimension, generating the knowledge dimension fusion quantification value corresponding to the knowledge dimension, and classifying the knowledge level.

2. The method as claimed in claim 1, wherein, The steps for cross-modal standardization of multi-source heterogeneous big data and its transformation into primary standardized index values ​​include: A hierarchical retrieval method is used to retrieve all multi-source heterogeneous structured databases belonging to different knowledge dimensions and data indicators, and to classify them by modality type; the modality types include numerical, signal, image, and text types. For multi-source heterogeneous big data of different modalities, a modality standardization processing framework is built, and a standardized quantization method is configured for each modality type to perform cross-modal differentiated quantization on multi-source heterogeneous big data of different modalities. The standardized quantization method includes numerical truncation, signal feature extraction, image feature level quantization, and text semantic scoring. Define the numerical representation attributes of each data indicator, whereby the numerical representation attributes refer to the correlation between the quantified value of the data indicator and the quality of the corresponding knowledge dimension. Based on the numerical representation attributes of each data indicator, all multi-source heterogeneous big data of various modal types that have completed cross-modal differentiated quantification are processed in the same direction to obtain the primary standardized indicator values ​​of each multi-source heterogeneous big data, and then bound and archived with the identity identifier and data indicator identifier of the corresponding target object.

3. The method as claimed in claim 1, wherein, The steps for constructing a personalized baseline model of the target object include: A pre-constructed population baseline database that matches the domain of the target object is retrieved. The population baseline database is divided into population clusters according to different attribute label combinations. The population clusters are configured with general baseline ranges for each data indicator and population steady-state standardized index values. Based on the attribute tags bound to the identity of the target object, an attribute tag vectorization mapping method is adopted to transform the various attribute tags of the target object into attribute feature vectors of a unified dimension. The similarity calculation is performed between the attribute feature vector corresponding to the target object and the attribute feature vector of the attribute label combination of each group cluster. The group cluster with the highest label similarity is selected as the target matching group. The general baseline interval corresponding to the target matching group is extracted to form the initial baseline reference range. Retrieve the set of primary standardized index values ​​for the target object and filter out the reference standardized index values ​​that fall within the initial baseline reference range; Based on the comprehensive deviation between the reference standardized index value and the steady-state standardized index value of the target matching group, the initial baseline reference range is adjusted to form a personalized standard range for the target object. The personalized standard intervals corresponding to each data indicator are integrated and encapsulated to form a personalized baseline model for the target object.

4. The method for cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction as described in claim 1, characterized in that, The steps for performing interval-mapping secondary standardization on the primary standardized index values ​​to obtain the standardized index values ​​of each data index include: Retrieve the personalized baseline model corresponding to the identity of the target object, and extract the personalized standard intervals corresponding to each data indicator as the normalization benchmark; For each data metric, determine the lower and upper limits of its personalized standard range, and set a uniform value range applicable to all data metrics. The primary standardized indicator value corresponding to each data indicator is transformed into a preset unified numerical range by using its personalized standard range as a reference through linear mapping; after standardization and normalization, a standardized indicator value is formed. The standardized indicator values ​​are archived by associating them with the target object's identity identifier, knowledge dimension identifier, data indicator identifier, and collection timestamp.

5. The method for cross-modal fusion of multi-source heterogeneous big data processing and knowledge extraction as described in claim 1, characterized in that, The steps for constructing a dynamic set of knowledge metrics for current knowledge assessment include: Based on the set of data metrics associated with the knowledge dimensions that trigger knowledge extraction, locate the corresponding data metrics and their standardized values. For each data metric, based on the pre-configured effective lifecycle of the time series, and combined with the collection timestamp and the current knowledge evaluation timestamp, effective standardized metric values ​​are selected; The effective lifespan of the time series is the pre-set effective collection time range for each data indicator. The effective standardized indicator values ​​are sorted according to the collection timestamp to construct a time series sequence of the effective standardized indicator values ​​corresponding to each data indicator. Based on the pre-configured indicator change type identifiers for each data indicator, differentiated time-series processing is performed to generate knowledge representation indicator values ​​corresponding to each data indicator; and these values ​​are then archived to form a dynamic knowledge indicator set.

6. The method for cross-modal fusion of multi-source heterogeneous big data processing and knowledge extraction as described in claim 5, characterized in that the indicator change types include instantaneous representation indicators and trend representation indicators; the step of performing differentiated time-series processing based on the pre-configured indicator change type identifier of each data indicator includes: for an instantaneous representation indicator, selecting, from the time series of valid standardized indicator values, the valid standardized indicator value whose collection timestamp is closest to the current knowledge evaluation timestamp as the knowledge representation indicator value; for a trend representation indicator, allocating dynamic temporal weights through a temporal decay function based on the time interval between the collection timestamp of each valid standardized indicator value and the current knowledge evaluation timestamp, and performing weighted fusion of the time series of valid standardized indicator values based on the dynamic temporal weights to obtain the knowledge representation indicator value.

7. The method for cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction as described in claim 1, characterized in that, The steps of generating knowledge dimension fusion quantification values ​​corresponding to knowledge dimensions and classifying knowledge levels include: Extract all knowledge representation index values ​​and corresponding preset association weights from the dynamic knowledge index set. The preset association weights are quantitative coefficients that represent the importance of each data index to the state representation of the corresponding knowledge dimension. Retrieve the personalized baseline model corresponding to the target object, extract the personalized standard intervals corresponding to each data indicator, and use the linear mapping formula consistent with the secondary standardization process to synchronously map the personalized standard intervals to the preset unified numerical intervals to generate personalized judgment intervals corresponding to each data indicator. Based on the personalized judgment range of each data indicator, anomaly judgment is made on the knowledge representation indicator values, and abnormal indicators are identified and their degree of abnormality is calculated. Using the preset association weights of each data indicator within the dynamic knowledge indicator set as the fusion coefficients, the values ​​of all knowledge representation indicators are weighted and summed to obtain the initial knowledge dimension fusion quantification value. 
Based on the number of abnormal indicators, the degree of abnormality, and their corresponding preset association weights, a fusion correction coefficient is constructed to correct and calibrate the initial knowledge dimension fusion quantification value, thereby obtaining the knowledge dimension fusion quantification value; and based on the knowledge dimension grading standard, the knowledge dimension fusion quantification value is mapped to the corresponding knowledge level.

8. The cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction method according to claim 7, characterized in that the fusion correction coefficient is constructed as follows: counting the number of abnormal indicators, calculating the sum of the preset association weights of all abnormal indicators, and calculating the average degree of abnormality of all abnormal indicators; taking the proportion of abnormal weights and the average degree of abnormality as core constraint factors, generating the fusion correction coefficient using a construction logic under which the coefficient decreases monotonically as the core constraint factors increase; and, when there are no abnormal indicators, setting the fusion correction coefficient to 1, so that no decay correction is applied to the initial knowledge dimension fusion quantification value.
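
One possible construction satisfying the constraints of claim 8 is sketched below. The claim only requires a coefficient that decreases monotonically as the core constraint factors increase and equals 1 when there are no abnormal indicators. The exponential form, the sensitivity parameters `alpha` and `beta`, and the reading of "proportion of abnormal weights" as the abnormal weight sum divided by the total weight sum are assumptions of this sketch, as is applying the coefficient multiplicatively.

```python
import math


def fusion_correction_coefficient(weights, degrees, alpha=1.0, beta=1.0):
    """Return a coefficient in (0, 1] that decays as anomalies grow.

    weights: preset association weights of all indicators in the set.
    degrees: degree of abnormality per indicator (0 means normal).
    alpha, beta: hypothetical sensitivity parameters, not taken from the claims.
    """
    abnormal = [(w, d) for w, d in zip(weights, degrees) if d > 0]
    if not abnormal:
        return 1.0  # no abnormal indicators: no decay correction
    # Assumed reading: abnormal weight sum relative to the total weight sum.
    weight_ratio = sum(w for w, _ in abnormal) / sum(weights)
    mean_degree = sum(d for _, d in abnormal) / len(abnormal)
    # exp(-x) is strictly decreasing in both factors when alpha, beta > 0.
    return math.exp(-(alpha * weight_ratio + beta * mean_degree))


weights = [0.5, 0.3, 0.2]
clean = fusion_correction_coefficient(weights, [0.0, 0.0, 0.0])
mild = fusion_correction_coefficient(weights, [0.0, 0.2, 0.0])
severe = fusion_correction_coefficient(weights, [0.0, 0.8, 0.0])
corrected = 66.0 * mild  # assumed multiplicative correction of an initial value of 66
```

Any other strictly decreasing function of the two factors, for example `1 / (1 + x)`, would satisfy the same stated constraints.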

9. The cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction method according to claim 1, characterized in that the hierarchical knowledge dimension system comprises at least one knowledge dimension, each knowledge dimension having a predefined set of associated data indicators that includes at least one data indicator; and the multi-source heterogeneous structured database uses the identity identifier of the target object as its index, the knowledge dimension as the main classification dimension, and the data indicator as the sub-classification dimension, and archives and stores the multi-source heterogeneous big data in structured form according to the data indicator type to which the data belongs.
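
The three-level organization of claim 9 (identity identifier, then knowledge dimension, then data indicator) can be pictured as a nested mapping. The sketch below is illustrative only: the `StructuredArchive` class, its method names, and the example identifiers are hypothetical, and it assumes each data indicator belongs to exactly one knowledge dimension, which the claim does not state.

```python
from collections import defaultdict


class StructuredArchive:
    """Hypothetical in-memory stand-in for the multi-source heterogeneous structured database."""

    def __init__(self, dimension_system):
        # dimension_system: {knowledge dimension: set of associated data indicators}
        if not dimension_system:
            raise ValueError("at least one knowledge dimension is required")
        self._dimension_of = {}
        for dimension, indicator_set in dimension_system.items():
            if not indicator_set:
                raise ValueError(f"dimension {dimension!r} needs at least one indicator")
            for indicator in indicator_set:
                self._dimension_of[indicator] = dimension
        # identity identifier -> knowledge dimension -> data indicator -> records
        self._store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def archive(self, object_id, indicator, timestamp, value):
        """File a record under the dimension that owns its data indicator type."""
        dimension = self._dimension_of[indicator]  # KeyError for unknown indicators
        self._store[object_id][dimension][indicator].append((timestamp, value))
        return dimension

    def records(self, object_id, dimension, indicator):
        """Read records back without creating empty entries as a side effect."""
        by_dimension = self._store.get(object_id, {})
        return list(by_dimension.get(dimension, {}).get(indicator, []))


archive = StructuredArchive({"dimension_1": {"indicator_x", "indicator_y"},
                             "dimension_2": {"indicator_z"}})
filed_under = archive.archive("object-001", "indicator_z", 1, 42.0)
archive.archive("object-001", "indicator_x", 2, 7.5)
```

A production system would presumably back this with a database whose primary key starts with the identity identifier. The nested dictionary only mirrors the classification hierarchy.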

10. A cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction system, implemented based on the cross-modal fusion multi-source heterogeneous big data processing and knowledge extraction method according to any one of claims 1-9, characterized in that it comprises a structured management module, a big data processing module, a dynamic knowledge extraction module, and a knowledge fusion and evaluation module; the structured management module is used to assign identity identifiers to target objects to be processed, construct a hierarchical knowledge dimension system for the target objects, and predefine a set of data indicators associated with each knowledge dimension, and is further used to receive, in real time, multi-source heterogeneous big data bound to the identity identifiers of the target objects and to construct a multi-source heterogeneous structured database organized by the hierarchical knowledge dimension system of the target objects; the big data processing module is used to perform cross-modal standardization processing on the multi-source heterogeneous big data according to the modal type corresponding to each data indicator in the multi-source heterogeneous structured database, convert the data into primary standardized indicator values, match the general baseline range of each data indicator, construct the personalized baseline model of the target object, and perform interval-mapping secondary standardization processing on the primary standardized indicator values to obtain the standardized indicator value of each data indicator; the dynamic knowledge extraction module is used to, in response to a knowledge extraction trigger, perform time-series validity screening on the standardized indicator values of the associated data indicators, construct a sequence of valid standardized indicator values, perform differentiated time-series processing according to the indicator change type identifier to generate the knowledge representation indicator value corresponding to each data indicator, and construct the dynamic knowledge indicator set for the current knowledge extraction; and the knowledge fusion and evaluation module is used to perform anomaly judgment on the knowledge representation indicator values in the dynamic knowledge indicator set based on the personalized baseline model, perform weighted fusion and correction calibration on all knowledge representation indicator values under the knowledge dimension, generate the knowledge dimension fusion quantification value corresponding to the knowledge dimension, and classify the knowledge level.