Production data fusion method, system and electronic device

CN122241532A · Pending · Publication Date: 2026-06-19 · CISDI INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CISDI INFORMATION TECH CO LTD
Filing Date
2026-04-17
Publication Date
2026-06-19

Abstract

This application relates to the field of production planning technology and provides a production data fusion method, system, and electronic device. This application clusters raw data from different data sources based on feature similarity, then normalizes the raw data within the same similar data cluster to generate standardized intermediate data. Next, it semantically fuses the intermediate data related to a single production theme according to indicators such as production time or production scenario, thereby obtaining production datasets corresponding to each production theme. This not only automatically identifies and integrates data with similar content but different formats or descriptions through clustering and normalization steps, ensuring the standardization of intermediate data, but also ensures that the production dataset is built around specific business needs through targeted semantic fusion based on production themes, avoiding the mixing of irrelevant data. This results in a high-value thematic dataset without data redundancy, improving data quality in two ways.

Description

Technical Field

[0001] This application relates to the field of production planning technology, specifically to a production data fusion method, system, and electronic device.

Background Technology

[0002] In the current construction of smart mines, multiple independent information systems such as scheduling management, safety monitoring, and equipment management are usually deployed. During the operation of these systems, a large amount of structured, semi-structured, and unstructured data is generated. Due to the lack of unified standards and governance mechanisms, problems such as naming differences, redundant records, and low quality exist among the data, forming "information silos" that seriously restrict data sharing, integration, and value mining.

[0003] Currently, if the underlying raw data has quality problems such as naming conflicts, inconsistent units, and redundant records, these low-quality data are carried directly into the fusion process, producing a large number of inconsistencies and errors. The fused data is of poor quality and low credibility, making it difficult for users in different positions (such as safety officers, equipment engineers, and dispatchers) to use it directly. They still need to spend effort screening the data and building highly cohesive thematic datasets for specific business topics such as safety incidents, energy consumption analysis, or production efficiency. This indicates that existing fused data is of poor quality and cannot directly meet actual production needs.

Summary of the Invention

[0004] To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or important components or to delineate the scope of protection of these embodiments; rather, it serves as a prelude to the detailed description that follows.

[0005] In view of the shortcomings of the prior art described above, this application provides a production data fusion method, system and electronic device to improve the data quality of fused data.

[0006] This application provides a production data fusion method, comprising: acquiring raw data corresponding to multiple target data sources; clustering the raw data according to the feature similarity between them to obtain multiple similar data clusters, and normalizing the raw data in the same similar data cluster to obtain intermediate data; and performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

[0007] In one embodiment of this application, obtaining raw data corresponding to multiple target data sources includes: if the target data source includes a device data source, the raw data corresponding to the device data source includes at least one of device identifier, runtime, device current, and device load rate; if the target data source includes a safety data source, the raw data corresponding to the safety data source includes at least one of sensor identifier, gas concentration value, and temperature sampling value; if the target data source includes a personnel data source, the raw data corresponding to the personnel data source includes at least one of personnel identifier, location coordinates, and location timestamp; if the target data source includes a monitoring data source, the raw data corresponding to the monitoring data source includes at least one of camera identifier, image stream address, and image timestamp.

[0008] In one embodiment of this application, the method further includes: standardizing each of the original data using ETL (Extract, Transform, Load) methods, wherein the standardization process includes at least one of data cleaning, data format unification, data unit conversion, and data integration.

[0009] In one embodiment of this application, the method further includes at least one of the following: determining an integrity score based on the proportion of null data in the original data, wherein the proportion of null data in the original data is negatively correlated with the integrity score; determining an accuracy score based on the proportion of abnormal data in the original data, wherein the abnormal data includes original data located within a preset abnormal range; and determining a consistency score based on the proportion of differing data in the original data, wherein the differing data includes original data that is different from preset benchmark data.

[0010] In one embodiment of this application, the method further includes: pre-training a model based on a random forest model to obtain an anomaly data classification model, wherein the random forest model includes multiple decision tree sub-models; inputting the original data into the anomaly data classification model to determine whether the original data is anomaly data based on the output results of each decision tree sub-model.

[0011] In one embodiment of this application, semantic fusion is performed on intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, including at least one of the following: semantic fusion is performed on intermediate data corresponding to a target time to obtain a production dataset, wherein the target time includes production time within a preset time window; semantic fusion is performed on intermediate data corresponding to a target location to obtain a production dataset, wherein the target location includes coordinate positions within a preset production scene.

[0012] In one embodiment of this application, after semantic fusion of intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: establishing theme relationships between the production themes, wherein the theme relationships include spatial relationships and temporal relationships; establishing a spatiotemporal knowledge graph between the production datasets according to the theme relationships, and training a preset graph neural network model based on the spatiotemporal knowledge graph to obtain a risk inference model; and in response to an alarm event, using the risk inference model to determine the risk propagation path corresponding to the alarm event.

[0013] In one embodiment of this application, after semantically fusing intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: constructing a production scenario model based on the production scenario, wherein the production scenario model includes equipment operating parameters corresponding to the production scenario; establishing a correspondence between the equipment operating parameters and the production dataset based on the production scenario; and using a preset operating data prediction model to predict the corresponding equipment operating parameters based on the production dataset at the current time node to obtain the expected operating parameters corresponding to the future time node, wherein the current time node is located before the future time node.

[0014] This application also provides a production data fusion system, comprising: an acquisition module for acquiring raw data corresponding to multiple target data sources; a merging module for clustering the raw data according to the feature similarity between them to obtain multiple similar data clusters, and normalizing the raw data in the same similar data cluster to obtain intermediate data; and a semantic fusion module for performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

[0015] This application also provides an electronic device, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to cause the electronic device to perform the method described above.

[0016] The beneficial effects of this application are:

[0017] Raw data from different data sources is clustered using feature similarity. Then, the raw data within the same similar cluster is normalized to generate standardized intermediate data. Next, semantic fusion is performed on intermediate data related to a single production theme, based on indicators such as production time or production scenario, resulting in production datasets corresponding to each production theme. This approach not only automatically identifies and integrates data with similar content but different formats or descriptions through clustering and normalization, effectively solving fundamental quality issues at the raw data level such as naming conflicts, inconsistent units, and record redundancy, ensuring the standardization of intermediate data, but also ensures that production datasets are built around specific business needs through targeted semantic fusion based on production themes, avoiding the mixing of irrelevant data. This results in high-value thematic datasets free of data redundancy, improving data quality in two ways.

Brief Description of the Drawings

[0018] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.

[0019] In the drawings: Figure 1 is a flowchart of a production data fusion method in an embodiment of this application; Figure 2 is a flowchart of another production data fusion method in an embodiment of this application; Figure 3 is a schematic structural diagram of a production data fusion system in an embodiment of this application; Figure 4 is a schematic structural diagram of an electronic device in an embodiment of this application.

Detailed Description

[0020] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. This application can also be implemented or applied through other different specific embodiments. Various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. In the absence of conflict, the following embodiments and features in the embodiments can be combined with each other.

[0021] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of this application. The drawings only show the components related to this application and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0022] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present application. However, it will be apparent to those skilled in the art that embodiments of the present application may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the present application.

[0023] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that objects so distinguished can be interchanged where appropriate for the embodiments of this application described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion.

[0024] Unless otherwise stated, the term "multiple" means two or more.

[0025] In this application, the character " / " indicates that the objects before and after it are in an "or" relationship. For example, A / B means: A or B.

[0026] The term "and / or" describes an association between objects, indicating that three relationships can exist. For example, A and / or B means: A or B, or A and B.

[0027] As shown in Figure 1, this application provides a production data fusion method, including: Step S101: obtaining raw data corresponding to multiple target data sources; Step S102: clustering the raw data according to the feature similarity between them to obtain multiple similar data clusters, and normalizing the raw data in the same similar data cluster to obtain intermediate data; Step S103: performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain the production dataset corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

[0028] The production data fusion method provided in this application clusters raw data from different data sources based on feature similarity. Then, it normalizes the raw data within the same similar data cluster to generate standardized intermediate data. Finally, it semantically fuses the intermediate data related to a single production theme according to indicators such as production time or production scenario, thus obtaining production datasets corresponding to each production theme. This not only automatically identifies and integrates data with similar content but different formats or descriptions through clustering and normalization steps, effectively solving fundamental quality problems at the raw data level such as naming conflicts, inconsistent units, and record redundancy, ensuring the standardization of intermediate data, but also ensures that the production datasets are built around specific business needs through targeted semantic fusion based on production themes, avoiding the mixing of irrelevant data. This results in high-value thematic datasets without data redundancy, improving data quality in two ways.

[0029] Optionally, obtaining the raw data corresponding to multiple target data sources includes: if the target data source includes a device data source, the raw data corresponding to the device data source includes at least one of device identifier, runtime, device current, and device load rate; if the target data source includes a safety data source, the raw data corresponding to the safety data source includes at least one of sensor identifier, gas concentration value, and temperature sampling value; if the target data source includes a personnel data source, the raw data corresponding to the personnel data source includes at least one of personnel identifier, location coordinates, and location timestamp; if the target data source includes a monitoring data source, the raw data corresponding to the monitoring data source includes at least one of camera identifier, image stream address, and image timestamp.

[0030] In some embodiments, multi-source heterogeneous data from various business systems in mine production are collected. The data includes structured, semi-structured and unstructured data, and the data sources include, but are not limited to, equipment operation data, safety monitoring data, personnel positioning data, video surveillance data, environmental monitoring data, and transport vehicle data. A metadata model is established for each data source to record attributes such as field name, data type, collection period, system to which it belongs and security level, so as to achieve unified description and traceable management of the data sources.

[0031] In some embodiments, the collected raw data is shown in Table 1, including: equipment operation data, which is structured data, with a collection period of 1 minute, and raw data fields including equipment ID, running time (min), current (A), load rate (%), etc.; safety monitoring data, which is structured data, with a collection period of 10 seconds, and raw data fields including sensor ID, gas concentration (ppm, parts per million), temperature (°C); personnel positioning data, which is semi-structured data, with a collection period of 5 seconds, and raw data fields including personnel ID, positioning coordinates (X,Y), and timestamp; and video surveillance data, which is unstructured data, with a collection period of 1 second, and raw data fields including camera ID, video stream address, and timestamp.

[0032] Table 1
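
The metadata model described above can be sketched as a small registry. This is only an illustration: the class name `SourceMetadata`, the attribute names, the source keys, and the default security level are assumptions; the structure types, collection periods, and raw fields are taken from the description of the collected raw data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceMetadata:
    """Metadata record for one data source (attribute names are illustrative)."""
    system: str                       # system the source belongs to
    structure: str                    # "structured", "semi-structured" or "unstructured"
    period_seconds: int               # collection period in seconds
    fields: Tuple[str, ...]           # raw field names
    security_level: str = "internal"  # hypothetical default level


REGISTRY: Dict[str, SourceMetadata] = {
    "equipment_operation": SourceMetadata(
        "equipment management", "structured", 60,
        ("equipment_id", "running_time_min", "current_a", "load_rate_pct")),
    "safety_monitoring": SourceMetadata(
        "safety monitoring", "structured", 10,
        ("sensor_id", "gas_concentration_ppm", "temperature_c")),
    "personnel_positioning": SourceMetadata(
        "personnel positioning", "semi-structured", 5,
        ("personnel_id", "x", "y", "timestamp")),
    "video_surveillance": SourceMetadata(
        "video surveillance", "unstructured", 1,
        ("camera_id", "video_stream_address", "timestamp")),
}


def sources_faster_than(registry: Dict[str, SourceMetadata],
                        max_period_seconds: int) -> List[str]:
    """Return the names of sources sampled at least every max_period_seconds."""
    return sorted(name for name, meta in registry.items()
                  if meta.period_seconds <= max_period_seconds)
```

Keeping the records immutable (`frozen=True`) is one way to make the source descriptions traceable: a change to a source then requires registering a new record rather than editing one in place.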

[0033] Optionally, the method further includes: standardizing the original data using ETL, wherein the standardization process includes at least one of data cleaning, data format unification, data unit conversion, and data integration.

[0034] In some embodiments, an ETL process is used to unify heterogeneous data. This involves establishing a field lookup table to map synonymous fields in different systems to a unified name, performing unit conversions for indicators such as temperature, pressure, and voltage to unify them to a standard unit system, adopting the ISO (International Organization for Standardization) 8601 standard to standardize the time format to resolve time zone differences and format inconsistencies, and using an enumerated value dictionary to unify coding rules for personnel roles, equipment types, process codes, etc., thereby outputting a standardized data table with consistent structure and unified semantics.
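
A minimal sketch of the standardization steps described above (field mapping, unit conversion, ISO 8601 time normalization, and enumerated-value coding) might look as follows. The alias table, the role codes, the Fahrenheit input, the source time format, and the UTC+8 offset are all assumptions made for illustration; the application does not specify them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field lookup table: source field name -> unified field name.
FIELD_ALIASES = {"dev_id": "equipment_id", "EquipNo": "equipment_id",
                 "temp_f": "temperature_c", "ts": "timestamp"}
# Hypothetical enumerated value dictionary for personnel roles.
ROLE_CODES = {"safety officer": "R01", "dispatcher": "R02"}


def fahrenheit_to_celsius(value: float) -> float:
    """Unit conversion to the unified unit system (Celsius assumed)."""
    return (value - 32.0) * 5.0 / 9.0


def to_iso8601_utc(text: str, fmt: str, utc_offset_hours: int) -> str:
    """Parse a local timestamp and emit it as ISO 8601 in UTC."""
    local = datetime.strptime(text, fmt).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()


def standardize(record: dict, time_fmt: str = "%Y/%m/%d %H:%M:%S",
                utc_offset_hours: int = 8) -> dict:
    """Return a copy of record with unified names, units, times and codes."""
    out = {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key, key)
        if key == "temp_f":                      # convert only Fahrenheit inputs
            value = round(fahrenheit_to_celsius(value), 2)
        elif name == "timestamp":
            value = to_iso8601_utc(value, time_fmt, utc_offset_hours)
        elif name == "role":
            value = ROLE_CODES.get(value.strip().lower(), value)
        out[name] = value
    return out
```

For example, `standardize({"dev_id": "E012", "temp_f": 212.0, "ts": "2026/04/17 08:30:00", "role": "Safety Officer"})` yields a record keyed by `equipment_id`, `temperature_c`, `timestamp`, and `role`. Unknown roles are passed through unchanged so that unmapped values remain visible for later review.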

[0035] In some embodiments, unit conversion includes unifying indicators such as temperature, pressure, and voltage into a standard unit system.

[0036] In some embodiments, the original data is shown in Table 2, the standardization rules are shown in Table 3, and the original data is standardized according to the standardization rules to obtain the standardized original data as shown in Table 4.

[0037] Table 2

[0038] Table 3

[0039] Table 4

[0040] Optionally, the method further includes: determining an integrity score based on the proportion of null data in the original data, wherein the ratio of null data in the original data is negatively correlated with the integrity score; determining an accuracy score based on the proportion of outlier data in the original data, wherein outlier data includes original data located within a preset outlier range; and determining a consistency score based on the proportion of differential data in the original data, wherein differential data includes original data that differs from a preset baseline data.

[0041] In some embodiments, a data quality assessment engine is built on the standardized data to perform verification along dimensions such as completeness, accuracy, and consistency. Completeness checks detect the proportion of missing and null values in fields; accuracy checks identify physical errors (e.g., a temperature sensor returning -273°C, which is effectively absolute zero and physically implausible) and logical errors (e.g., a runtime of less than 0 seconds or a load rate greater than 100%); and consistency checks identify the same object under different descriptions.

[0042] In some embodiments, the total number of original data entries is 5000, and the number of null data entries is 250. The integrity score is then expressed as: $S_{\text{int}} = 1 - \frac{250}{5000} = 0.95$ (1). In formula (1), $S_{\text{int}}$ is the integrity (completeness) score.

[0043] In some embodiments, the total number of original data entries is 5000, and the number of outlier data entries is 400. The accuracy score is then expressed as: $S_{\text{acc}} = 1 - \frac{400}{5000} = 0.92$ (2). In formula (2), $S_{\text{acc}}$ is the accuracy score.

[0044] In some embodiments, the total number of original data entries is 5000, and the number of discrepancy data entries is 300. The consistency score is then expressed as: $S_{\text{con}} = 1 - \frac{300}{5000} = 0.94$ (3). In formula (3), $S_{\text{con}}$ is the consistency score.

[0045] In some embodiments, the overall quality score of the raw data determined based on the integrity score, accuracy score, and consistency score is expressed as follows: $Q = w_1 S_{\text{int}} + w_2 S_{\text{acc}} + w_3 S_{\text{con}}$ (4). In formula (4), $w_1$, $w_2$, and $w_3$ are the score weights corresponding to the integrity score, accuracy score, and consistency score, respectively.

[0046] In some embodiments, the overall quality score is calculated as follows: Q = 0.4 × 0.95 + 0.35 × 0.92 + 0.25 × 0.94 = 0.937.
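
The scoring arithmetic above can be checked with a few lines of code. This sketch assumes each dimension score is one minus the proportion of offending entries, which is consistent with the values 0.95, 0.92, and 0.94 used in the overall calculation; the function names are illustrative.

```python
def ratio_score(bad_count: int, total: int) -> float:
    """Score a quality dimension as 1 minus the share of offending entries."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= bad_count <= total:
        raise ValueError("bad_count must lie between 0 and total")
    return 1.0 - bad_count / total


def overall_quality(scores, weights) -> float:
    """Weighted sum of dimension scores; weights are expected to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


integrity = ratio_score(250, 5000)    # null entries
accuracy = ratio_score(400, 5000)     # outlier entries
consistency = ratio_score(300, 5000)  # discrepancy entries
quality = overall_quality((integrity, accuracy, consistency), (0.4, 0.35, 0.25))
```

With the example counts, `quality` evaluates to approximately 0.937, matching the worked figure.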

[0047] Optionally, the method further includes: pre-training the model based on the random forest model to obtain an anomaly data classification model, wherein the random forest model includes multiple decision tree sub-models; inputting the original data into the anomaly data classification model to determine whether the original data is anomaly data based on the output results of each decision tree sub-model.

[0048] In some embodiments, data quality assessment includes integrity (detection of missing fields and null value ratios), accuracy (detection of physical and logical errors), and consistency (identification of the same object under different descriptions).

[0049] In some embodiments, a supervised classification model based on Random Forest is introduced for anomaly detection. Historical equipment operation data is used as the training set to extract features such as runtime, current fluctuation, and load rate. The anomaly probability is output through voting by multiple decision trees. When the probability is higher than a threshold, it is marked as an anomaly and submitted to manual review. The model supports periodic retraining to achieve dynamic optimization by combining the latest feedback samples.

[0050] In some embodiments, equipment operation data from the past 6 months is collected to obtain 20,000 historical data entries. Data features such as runtime, current fluctuation coefficient, load rate, and temperature change rate are extracted from the historical data to obtain a model training set. A random forest classification model is trained based on the model training set to obtain an anomaly data classification model, wherein the random forest classification model includes N=100 decision trees. If the collected raw data includes "Equipment ID: E012, runtime: 8 min, current fluctuation coefficient: 0.35, load rate: 120%, temperature change rate: 6.5°C/min", this raw data is input into the anomaly data classification model. If 89 out of the 100 decision trees determine that the raw data is anomalous, then the anomaly probability of the raw data is expressed as: $P_{\text{anomaly}}(x) = \frac{1}{N}\sum_{i=1}^{N} h_i(x) = \frac{89}{100} = 0.89$ (5). In formula (5), $P_{\text{anomaly}}(x)$ is the anomaly probability of the raw data $x$, $N$ is the number of decision trees, and $h_i(x)$ is the prediction result of the i-th decision tree (1 if the tree judges the data anomalous, 0 otherwise). Since the anomaly probability of 0.89 is greater than the probability threshold of 0.8, the raw data is marked as anomalous data.
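
The voting step in this example can be sketched without any machine learning library. The sketch assumes each decision tree outputs 1 for "anomalous" and 0 for "normal"; the function names are illustrative, and training the trees themselves is out of scope here.

```python
from typing import Sequence


def anomaly_probability(tree_votes: Sequence[int]) -> float:
    """Fraction of trees voting 'anomalous' (1) among all trees."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    if any(v not in (0, 1) for v in tree_votes):
        raise ValueError("votes must be 0 or 1")
    return sum(tree_votes) / len(tree_votes)


def is_anomaly(tree_votes: Sequence[int], threshold: float = 0.8) -> bool:
    """Mark the record as anomalous when the vote share exceeds the threshold."""
    return anomaly_probability(tree_votes) > threshold


# 89 of the 100 trees vote 'anomalous', as in the example above.
votes = [1] * 89 + [0] * 11
probability = anomaly_probability(votes)
flagged = is_anomaly(votes)
```

Because the comparison is strict, a vote share exactly equal to the threshold is not flagged; whether the boundary case should be flagged is a design choice the application leaves open.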

[0051] In some embodiments, the abnormal data is corrected, and the abnormal data and the corrected data are added to the model training set to optimize the abnormal data classification model through model training.

[0052] In some embodiments, clustering is performed based on the feature similarity between the original data to obtain multiple similar data clusters, and the original data in the same similar data cluster is normalized to obtain intermediate data, including: establishing a unified master data management model, performing unique coding management on core entities such as personnel, equipment, and materials, and when naming differences or field conflicts are detected, using the K-Means clustering algorithm to calculate the similarity of multi-field feature vectors, automatically merging records with similarity higher than a preset threshold, and retaining merge logs and traceability information.

[0053] In some embodiments, feature similarity is calculated based on the original data after feature vectorization. Feature vectorization includes text similarity encoding of text fields and normalization of numerical fields to finally generate multidimensional feature vectors. Feature similarity includes, but is not limited to, cosine similarity, weighted similarity, or other vector similarity calculation methods.

[0054] In some embodiments, a portion of the raw data collected is shown in Table 5, where there are differences in job title, number, and department name, requiring determination of whether they belong to the same person.

[0055] Table 5

[0056] In some embodiments, the text fields (name, position, department) are encoded using BERT pre-trained word vectors, and the numbers are normalized to obtain feature vectors. These feature vectors include "Original Data 1 (0.92, 0.88, 0.95)", "Original Data 2 (0.91, 0.87, 0.94)", and "Original Data 3 (0.90, 0.85, 0.96)". The feature similarity between the feature vectors is calculated using the cosine similarity formula, expressed as: $\cos(x_i, \mu_j) = \frac{x_i \cdot \mu_j}{\lVert x_i \rVert \, \lVert \mu_j \rVert}$ (6). The original data that are similar to the center vectors of the similar data clusters are determined using a clustering objective function, which expresses that, in partitioning the data into $K$ clusters, the clustering result is optimized by minimizing the cosine distance between each sample vector and its cluster center vector: $J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \bigl(1 - \cos(x_i, \mu_j)\bigr)$ (7). In formulas (6) and (7), $K$ is the number of clusters, $x_i$ is the feature vector of the i-th original data, $C_j$ is the sample set of the j-th similar data cluster, $\mu_j$ is the center vector of the j-th similar data cluster, and $\cos(x_i, \mu_j)$ is the cosine similarity between the original data and the center of the similar data cluster, with a value range of [0,1], where a larger value indicates greater similarity. Based on the cosine similarity formula, the cosine similarity between original data 1 and original data 2 is calculated, and the results include:
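
As a rough illustration of the cosine similarity and the cosine-distance objective described above, the following sketch applies them to the three example feature vectors. The function names and the choice of the component-wise mean as the cluster center are assumptions; the BERT encoding step is not reproduced.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean, used here as the cluster center (an assumption)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def clustering_objective(clusters: Dict[int, List[Sequence[float]]]) -> float:
    """Sum of cosine distances between each vector and its cluster center."""
    total = 0.0
    for members in clusters.values():
        center = mean_vector(members)
        total += sum(1.0 - cosine_similarity(x, center) for x in members)
    return total


VECTORS = {
    "original data 1": (0.92, 0.88, 0.95),
    "original data 2": (0.91, 0.87, 0.94),
    "original data 3": (0.90, 0.85, 0.96),
}
sim_12 = cosine_similarity(VECTORS["original data 1"], VECTORS["original data 2"])
objective = clustering_objective({0: list(VECTORS.values())})
```

For these vectors every pairwise similarity is very close to 1, so placing all three records in a single cluster keeps the objective near zero.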

[0057] In some embodiments, the K-Means clustering algorithm is used to classify the original data 1, original data 2 and original data 3 into the same similar data cluster. Through normalization, the record number of the original data in the same similar data cluster is unified as P001, the position is unified as safety officer, and the department is unified as safety supervision department.
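
The normalization of a similar data cluster, together with the merge log mentioned earlier, might be sketched as follows. The application does not state how the unified value is chosen; this sketch assumes a majority vote per field. The three records are hypothetical stand-ins, not the contents of Table 5.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def normalize_cluster(records: List[Dict[str, str]],
                      fields: Sequence[str]) -> Tuple[dict, list, list]:
    """Unify the given fields across one cluster and log every change."""
    # Majority vote per field (an assumption; ties keep the first value seen).
    canonical = {f: Counter(r[f] for r in records).most_common(1)[0][0]
                 for f in fields}
    merged, log = [], []
    for record in records:
        changes = {f: (record[f], canonical[f])
                   for f in fields if record[f] != canonical[f]}
        merged.append({**record, **canonical})
        if changes:  # keep traceability information for each rewritten record
            log.append({"source": record["source"], "changes": changes})
    return canonical, merged, log


# Hypothetical records describing the same person in three systems.
RECORDS = [
    {"source": "scheduling", "number": "P001",
     "position": "safety officer", "department": "safety supervision department"},
    {"source": "safety monitoring", "number": "P-001",
     "position": "safety officer", "department": "safety supervision dept."},
    {"source": "equipment", "number": "P001",
     "position": "security officer", "department": "safety supervision department"},
]
canonical, merged, merge_log = normalize_cluster(
    RECORDS, ("number", "position", "department"))
```

The log records the original and unified value of every changed field, so a merge can be audited or reversed later.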

[0058] Optionally, semantic fusion is performed on intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, including at least one of the following: semantic fusion is performed on intermediate data corresponding to the target time to obtain production datasets, wherein the target time includes production time within a preset time window; semantic fusion is performed on intermediate data corresponding to the target location to obtain production datasets, wherein the target location includes coordinate positions within a preset production scenario.

[0059] In some embodiments, multi-source data from systems such as the scheduling system, safety monitoring system, and equipment platform are semantically fused according to business themes to generate fused datasets for themes such as safety events, energy consumption analysis, and production efficiency, which can be consumed by BI (Business Intelligence) platforms, large-screen visualization, and AI (Artificial Intelligence) model training.

[0060] In some embodiments, a portion of the intermediate data is shown in Table 6.

[0061] Table 6

[0062] In some embodiments, by matching multiple system events within the same time window (e.g., 1 minute), determining whether the events belong to the same scenario based on location coordinates or device binding relationships, and merging multiple events into a single comprehensive record, the resulting production dataset is shown in Table 7.

[0063] Table 7
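
The time-window and location matching described above can be sketched as follows. The events are hypothetical and do not reproduce Tables 6 or 7, and the sketch uses fixed, non-overlapping one-minute buckets for simplicity, so two events that straddle a bucket boundary are not merged; a sliding window would avoid that at the cost of more bookkeeping.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def fuse_events(events: List[Dict[str, str]],
                window_seconds: int = 60) -> List[dict]:
    """Merge events sharing a time bucket and a location into one record."""
    buckets = defaultdict(list)
    for event in events:
        ts = datetime.fromisoformat(event["time"])  # expects an explicit UTC offset
        key = (int(ts.timestamp()) // window_seconds, event["location"])
        buckets[key].append((ts, event))
    fused = []
    for (bucket, location), items in sorted(buckets.items()):
        items.sort(key=lambda pair: pair[0])        # chronological order
        start = datetime.fromtimestamp(bucket * window_seconds, tz=timezone.utc)
        fused.append({
            "window_start": start.isoformat(),
            "location": location,
            "sources": [e["source"] for _, e in items],
            "descriptions": [e["what"] for _, e in items],
        })
    return fused


# Hypothetical intermediate data from three systems.
EVENTS = [
    {"source": "safety", "time": "2026-04-17T08:30:05+00:00",
     "location": "roadway-3", "what": "gas concentration above limit"},
    {"source": "equipment", "time": "2026-04-17T08:30:40+00:00",
     "location": "roadway-3", "what": "fan current drop"},
    {"source": "personnel", "time": "2026-04-17T08:30:50+00:00",
     "location": "working-face-1", "what": "person entered area"},
    {"source": "safety", "time": "2026-04-17T08:45:00+00:00",
     "location": "roadway-3", "what": "gas concentration back to normal"},
]
fused = fuse_events(EVENTS)
```

Here the location string stands in for the "same scenario" test; in practice that test could also rely on coordinates within a region or on device binding relationships, as the paragraph above notes.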

[0064] In some embodiments, BI analysis is used to display, in real time on a large screen, the number, trend, and risk level distribution of safety incidents in each region over the past 30 days.

[0065] In some embodiments, the fused event data is used as training samples to establish a dangerous area prediction model, providing early warning of high-risk events 5 minutes in advance.

[0066] As shown in Figure 2, this application provides a production data fusion method, including: Step S201: obtaining raw data corresponding to multiple target data sources; Step S202: standardizing the raw data using ETL, wherein the standardization process includes at least one of data cleaning, data format unification, data unit conversion, and data integration; Step S203: inputting the raw data into the anomaly data classification model to determine whether the raw data is anomalous data based on the output results of each decision tree sub-model; Step S204: clustering the raw data according to the feature similarity between them to obtain multiple similar data clusters, and normalizing the raw data in the same similar data cluster to obtain intermediate data; Step S205: performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain the production dataset corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

[0067] Optionally, after semantically fusing intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: establishing theme relationships between production themes, wherein the theme relationships include spatial relationships and temporal relationships; establishing a spatiotemporal knowledge graph between production datasets according to the theme relationships, and training a preset graph neural network model based on the spatiotemporal knowledge graph to obtain a risk inference model; and in response to an alarm event, using the risk inference model to determine the risk propagation path corresponding to the alarm event.

[0068] In some embodiments, based on existing master data, geological bodies (e.g., faults, folds, coal seams, goafs, etc.) and spatial locations (e.g., roadway coordinates, working faces) are used as entity nodes, and thematic relationships are established. Spatial relationships include "adjacent," "containing," "through," etc., and temporal relationships include "prior to," "triggered," etc. Production datasets are used as event nodes and associated with entity nodes to construct a dynamic spatiotemporal knowledge graph. For example, the spatiotemporal knowledge graph records that "blasting event E1" is triggered at "time T1," that "blasting event E1" is located at "location P1," that "location P1" is adjacent to "fault F1," that "blasting event E2" is located at "fault F1," and so on. A graph neural network is used to learn and infer risk propagation paths from the spatiotemporal knowledge graph. When a new alarm event (e.g., "gas exceeding limit W1") occurs, the node relationships of the alarm event are traversed in reverse to find other event nodes that precede it in time and are spatially associated with it (e.g., "blasting event," "geological anomaly," etc.), and a complete risk propagation path is automatically generated.
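The reverse traversal described above might look like the following minimal sketch. It is a plain rule-based walk over a hand-built graph and does not include the graph neural network; the event times, the location assigned to the alarm event, and the rule of always stepping to the most recent qualifying predecessor are assumptions made for illustration.

```python
# Hypothetical event nodes: each has a time and the entity node where it occurred.
EVENTS = {
    "blasting event E1": {"time": 1, "location": "location P1"},
    "blasting event E2": {"time": 2, "location": "fault F1"},
    "gas exceeding limit W1": {"time": 3, "location": "location P1"},
}

# Spatial "adjacent" relationships between entity nodes (treated as undirected).
ADJACENT = {("location P1", "fault F1")}


def spatially_associated(a, b):
    """Two entity nodes are associated if they are identical or adjacent."""
    return a == b or (a, b) in ADJACENT or (b, a) in ADJACENT


def trace_risk_path(alarm):
    """Walk backwards from the alarm through events that precede the current
    event in time and are spatially associated with it."""
    path = [alarm]
    current = alarm
    while True:
        cur = EVENTS[current]
        candidates = [
            name for name, ev in EVENTS.items()
            if ev["time"] < cur["time"]
            and spatially_associated(ev["location"], cur["location"])
        ]
        if not candidates:
            break
        # Assumed rule: step to the most recent qualifying predecessor.
        current = max(candidates, key=lambda n: EVENTS[n]["time"])
        path.append(current)
    return list(reversed(path))  # earliest event first, alarm last


path = trace_risk_path("gas exceeding limit W1")
```

The loop terminates because each step moves to a strictly earlier event. A trained model would replace the hand-written association rule with learned edge scores.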

[0069] Optionally, after semantically fusing intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: constructing a production scenario model based on the production scenario, wherein the production scenario model includes equipment operating parameters corresponding to the production scenario; establishing a correspondence between equipment operating parameters and production datasets based on the production scenario; and using a preset operating data prediction model to predict the corresponding equipment operating parameters based on the production dataset at the current time node to obtain the expected operating parameters corresponding to the future time node, wherein the current time node is located before the future time node.

[0070] In some embodiments, a gridded production scenario model is constructed based on geological exploration data (e.g., borehole data, channel wave seismic data, rock strength, gas content, etc.). The operating parameters in the production dataset are mapped to the production scenario model according to the production scenario. A deep learning model is trained on the production scenario model and the production dataset to learn the nonlinear mapping relationship between the geological exploration data and the operating parameters, thereby obtaining an operating data prediction model. In actual production, the operating data prediction model recommends the expected operating parameters for the next stage based on the real-time production scenario model and the production dataset.
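To make the mapping step concrete, the sketch below assigns operating records to grid cells and fits a predictor from a geological attribute to an operating parameter. A one-variable least-squares line is used here purely as a stand-in for the deep learning model, and the grid spacing, the attribute `rock_strength`, the parameter `cutting_speed`, and all values are hypothetical.

```python
from statistics import mean

CELL_SIZE = 10.0  # hypothetical grid spacing


def cell_of(x, y, size=CELL_SIZE):
    """Return the index of the grid cell that contains the point (x, y)."""
    return (int(x // size), int(y // size))


# Hypothetical gridded production scenario model: one attribute per cell.
SCENARIO = {
    (0, 0): {"rock_strength": 40.0},
    (1, 0): {"rock_strength": 80.0},
}

# Hypothetical production dataset: position plus one operating parameter.
HISTORY = [
    {"x": 3.0, "y": 4.0, "cutting_speed": 5.0},
    {"x": 7.0, "y": 1.0, "cutting_speed": 5.4},
    {"x": 12.0, "y": 2.0, "cutting_speed": 3.0},
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Map each operating record onto the grid and pair it with the cell attribute.
xs, ys = [], []
for rec in HISTORY:
    cell = cell_of(rec["x"], rec["y"])
    xs.append(SCENARIO[cell]["rock_strength"])
    ys.append(rec["cutting_speed"])

slope, intercept = fit_line(xs, ys)


def recommend_speed(rock_strength):
    """Expected operating parameter for a cell with the given rock strength."""
    return slope * rock_strength + intercept
```

A real implementation would use many attributes per cell and a nonlinear model; the linear fit only shows where the scenario model and the production dataset meet.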

[0071] As shown in Figure 3, this application provides a production data fusion system, including an acquisition module 301, a merging module 302, and a semantic fusion module 303.

[0072] The acquisition module 301 is used to acquire the raw data corresponding to multiple target data sources.

[0073] The merging module 302 is used to cluster the raw data based on the feature similarity between the raw data to obtain multiple similar data clusters, and then normalize the raw data in the same similar data cluster to obtain intermediate data.

[0074] The semantic fusion module 303 is used to perform semantic fusion on intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme. The production indicators include production time and / or production scenario.
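One possible way to wire the three modules together is sketched below. The class name `FusionSystem`, the callable interfaces, and the stub behaviours are assumptions for illustration; the application does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class FusionSystem:
    acquire: Callable[[], List[Record]]                      # acquisition module 301
    merge: Callable[[List[Record]], List[Record]]            # merging module 302
    fuse: Callable[[List[Record]], Dict[str, List[Record]]]  # semantic fusion module 303

    def run(self) -> Dict[str, List[Record]]:
        raw = self.acquire()            # raw data from the target data sources
        intermediate = self.merge(raw)  # clustering and normalization
        return self.fuse(intermediate)  # one production dataset per theme


# Trivial stand-ins so the sketch runs; real modules would be far richer.
def acquire_stub() -> List[Record]:
    return [{"field": "Gas", "scene": "face-1"}, {"field": "gas", "scene": "face-2"}]


def merge_stub(records: List[Record]) -> List[Record]:
    # Lower-casing field names stands in for clustering plus normalization.
    return [{**r, "field": str(r["field"]).lower()} for r in records]


def fuse_stub(records: List[Record]) -> Dict[str, List[Record]]:
    # Grouping by production scenario stands in for semantic fusion.
    themes: Dict[str, List[Record]] = {}
    for r in records:
        themes.setdefault(str(r["scene"]), []).append(r)
    return themes


system = FusionSystem(acquire_stub, merge_stub, fuse_stub)
result = system.run()
```

Passing the modules in as callables keeps each one replaceable, which matches the later statement that the division into units is a logical one.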

[0075] The production data fusion system provided in this application clusters raw data from different data sources based on feature similarity. Then, it normalizes the raw data within the same similar data cluster to generate standardized intermediate data. Finally, it semantically fuses the intermediate data related to a single production theme according to indicators such as production time or production scenario, thus obtaining production datasets corresponding to each production theme. This not only automatically identifies and integrates data with similar content but different formats or descriptions through clustering and normalization steps, effectively solving fundamental quality problems at the raw data level such as naming conflicts, inconsistent units, and record redundancy, ensuring the standardization of intermediate data, but also ensures that the production datasets are built around specific business needs through targeted semantic fusion based on production themes, avoiding the mixing of irrelevant data. This results in high-value thematic datasets without data redundancy, improving data quality in two ways.

[0076] This application also provides an electronic device, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the above-described method.

[0077] Figure 4 shows a schematic diagram of a computer system suitable for implementing the embodiments of this application. It should be noted that the computer system 400 of the electronic device shown in Figure 4 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0078] As shown in Figure 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes, such as executing the methods described in the above embodiments, based on programs stored in Read-Only Memory (ROM) 402 or programs loaded from storage section 408 into Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data required for system operation. The CPU 401, ROM 402, and RAM 403 are interconnected via a bus 404. An Input / Output (I / O) interface 405 is also connected to the bus 404.

[0079] The following components are connected to I / O interface 405: an input section 406 including a keyboard, mouse, etc.; an output section 407 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 408 including a hard disk, etc.; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to I / O interface 405 as needed. A removable medium 411, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 410 as needed so that computer programs read from it can be installed into storage section 408 as needed.

[0080] The electronic device disclosed in this embodiment includes a processor, a memory, a transceiver, and a communication interface. The memory and the communication interface are connected to the processor and the transceiver and enable communication between them. The memory is used to store computer programs, the communication interface is used for communication, and the processor and transceiver are used to run the computer programs, causing the electronic device to perform the various steps of the above method. The above description and drawings fully illustrate the embodiments of this disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, procedural, and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Parts and features of some embodiments may be included in or replace parts and features of other embodiments. Moreover, the terminology used in this application is only for describing embodiments and is not intended to limit the claims. As used in the description of embodiments and claims, the singular forms “a,” “an,” and “the” are intended to equally include the plural forms unless the context clearly indicates otherwise. Similarly, the term “and / or” as used in this application means including one or more of the associated listed items and all possible combinations thereof. Additionally, when used in this application, the term "comprise" and its variations "comprises" and / or "comprising" refer to the presence of stated features, wholes, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and / or groups thereof. Without further limitations, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, or apparatus that includes the element. In this document, each embodiment may focus on its differences from other embodiments, and for similar or identical parts the embodiments may be referred to one another. For methods, products, etc., disclosed in the embodiments, if they correspond to the method section disclosed in the embodiments, reference may be made to the description of the method section for the relevant parts.

[0081] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0082] The methods and products (including but not limited to devices and equipment) disclosed in the embodiments herein can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of units may be merely a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or of other forms. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to implement this embodiment according to actual needs. Furthermore, the functional units in this application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

[0083] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of the systems, methods, and computer program products according to this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than those disclosed in the description; sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

Claims

1. A production data fusion method, characterized in that it comprises: obtaining raw data corresponding to multiple target data sources; clustering the raw data based on the feature similarity between the raw data to obtain multiple similar data clusters, and normalizing the raw data in the same similar data cluster to obtain intermediate data; and performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

2. The method according to claim 1, characterized in that obtaining raw data corresponding to multiple target data sources includes: if the target data sources include an equipment data source, the raw data corresponding to the equipment data source includes at least one of equipment identifier, running time, equipment current, and equipment load rate; if the target data sources include a safety data source, the raw data corresponding to the safety data source includes at least one of sensor identifier, gas concentration value, and temperature sampling value; if the target data sources include a personnel data source, the raw data corresponding to the personnel data source includes at least one of personnel identifier, location coordinates, and location timestamp; if the target data sources include a monitoring data source, the raw data corresponding to the monitoring data source includes at least one of camera identifier, image stream address, and image timestamp.

3. The method according to claim 1, characterized in that the method further includes: standardizing the raw data using ETL, wherein the standardization processing includes at least one of data cleaning, data format unification, data unit conversion, and data integration.

4. The method according to claim 1, characterized in that the method further includes at least one of the following: determining an integrity score based on the proportion of null data in the raw data, wherein the proportion of null data in the raw data is negatively correlated with the integrity score; determining an accuracy score based on the proportion of abnormal data in the raw data, wherein the abnormal data includes raw data located within a preset abnormal range; determining a consistency score based on the proportion of discrepant data in the raw data, wherein the discrepant data includes raw data that differs from preset benchmark data.

5. The method according to claim 1, characterized in that the method further includes: pre-training a random forest model to obtain an abnormal data classification model, wherein the random forest model includes multiple decision tree sub-models; and inputting the raw data into the abnormal data classification model to determine, based on the output results of each decision tree sub-model, whether the raw data is abnormal data.

6. The method according to claim 1, characterized in that performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme includes at least one of the following: performing semantic fusion on the intermediate data corresponding to a target time to obtain the production dataset, wherein the target time includes a production time within a preset time window; performing semantic fusion on the intermediate data corresponding to a target location to obtain the production dataset, wherein the target location includes a coordinate position located within a preset production scenario.

7. The method according to claim 1, characterized in that, after performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: establishing thematic relationships between the production themes, wherein the thematic relationships include spatial relationships and temporal relationships; establishing a spatiotemporal knowledge graph between the production datasets according to the thematic relationships, and training a preset graph neural network model based on the spatiotemporal knowledge graph to obtain a risk inference model; and, in response to an alarm event, using the risk inference model to determine the risk propagation path corresponding to the alarm event.

8. The method according to claim 1, characterized in that, after performing semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, the method further includes: constructing a production scenario model based on the production scenario, wherein the production scenario model includes the equipment operating parameters corresponding to the production scenario; establishing a correspondence between the equipment operating parameters and the production datasets based on the production scenario; and using a preset operating data prediction model to predict the corresponding equipment operating parameters based on the production dataset at the current time node to obtain the expected operating parameters corresponding to a future time node, wherein the current time node is located before the future time node.

9. A production data fusion system, characterized in that it comprises: an acquisition module, used to acquire raw data corresponding to multiple target data sources; a merging module, used to cluster the raw data based on the feature similarity between the raw data to obtain multiple similar data clusters, and to normalize the raw data in the same similar data cluster to obtain intermediate data; and a semantic fusion module, used to perform semantic fusion on the intermediate data corresponding to the same production theme according to production indicators to obtain production datasets corresponding to each production theme, wherein the production indicators include production time and / or production scenario.

10. An electronic device, characterized in that it comprises: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to cause the electronic device to perform the method according to any one of claims 1-8.