A multi-source enterprise data fusion method and device based on conflict reduction

By constructing data source fusion metadata and cross-source ontology mapping, and combining RDF graph indexing with a multi-perspective deep fusion model, the problem of insufficient adaptability in multi-source data fusion is solved, and high-accuracy data fusion is achieved in complex semantics and dynamic scenarios.

CN122241576A · Pending · Publication Date: 2026-06-19 · HANGZHOU ANSHU ENTERPRISE CONSULTING CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU ANSHU ENTERPRISE CONSULTING CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing multi-source data fusion solutions lack adaptive adjustment capabilities when the quality of data sources changes dynamically or business scenarios are frequently adjusted, leading to deviations in fusion results. Furthermore, methods based on similarity calculation are prone to mismatches or omissions when dealing with complex semantic expressions and multi-format fields.

Method used

The method constructs data source fusion metadata, uses natural language processing and structured mapping techniques to generate RDF resource sets, and then performs cross-source ontology mapping and RDF graph indexing. Combined with a multi-perspective intermediate deep fusion model, it identifies multi-valued, homonym, and source-inconsistency conflicts and reduces them using a semantic similarity threshold strategy and a weight rule strategy.

Benefits of technology

It achieves stability and consistency of fusion results under complex semantics and dynamic scenarios, reduces the risk of mismatch and missed match, and improves the accuracy and reliability of multi-source data fusion.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122241576A_ABST
Patent Text Reader

Abstract

This disclosure provides a method and apparatus for multi-source enterprise data fusion based on conflict reduction, relating to the field of data processing. The method includes: acquiring heterogeneous enterprise data; performing unified processing through natural language processing and structured mapping techniques to construct a local RDF resource set; performing cross-source ontology mapping on the local RDF resources to form a global mapping table; performing RDF graph indexing and subgraph matching based on the global mapping table to determine a set of candidate fusion positions; extracting structured fields and textual semantic features of the candidate fusion content and vectorizing and encoding them to construct a unified feature space representation; combining the feature representation with the data source fusion metadata to identify multi-valued, homonym, and source-inconsistency conflicts, and determining a conflict reduction field set through semantic similarity thresholds and weight rules; inputting the field set into a multi-view intermediate deep fusion model to generate unified enterprise entity records; and finally converting the records into fused RDF resources based on the global mapping table and writing them into a result database.

Description

Technical Field

[0001] This disclosure relates to the field of data processing, and in particular to a method and apparatus for multi-source enterprise data fusion based on conflict reduction.

Background Technology

[0002] As enterprises deepen their informatization and digital transformation, they continuously generate massive amounts of heterogeneous data in production operations, supply chain management, financial management, customer relationship management, risk control, and external cooperation. This data typically originates from different business systems, organizational departments, and external data platforms, exhibiting characteristics such as diverse sources, inconsistent structures, varying update frequencies, and inconsistent semantic expressions. To achieve enterprise-wide data sharing and intelligent decision support, it is necessary to integrate and process multi-source data.

[0003] In existing technologies, multi-source data fusion is typically achieved through data cleaning, entity alignment, field mapping, and rule matching. Common methods include exact matching based on primary keys or unique identifiers, fuzzy matching based on similarity calculations, and entity disambiguation techniques combined with machine learning models. Regarding conflict resolution, some solutions employ priority strategies (such as setting an authoritative data source), timestamp-first strategies (selecting the latest data), or majority decision-making methods based on voting mechanisms to determine the final retained data.

[0004] However, some existing multi-source data fusion solutions still rely heavily on fixed priorities or rule-based conflict resolution strategies. While these methods are applicable to scenarios with stable data source quality and well-defined business rules, they lack adaptive adjustment capabilities when data source quality changes dynamically or business scenarios are frequently adjusted, potentially leading to deviations in the fusion results. Furthermore, entity matching methods based on similarity calculations still suffer from limited matching accuracy when dealing with complex semantic expressions, multi-format fields, or unstructured data, easily resulting in false or missed matches, thus affecting the consistency and reliability of the fusion results.

Summary of the Invention

[0005] This invention discloses a method and device for multi-source enterprise data fusion based on conflict reduction, so as to at least solve the above-mentioned technical problems existing in the prior art.

[0006] According to a first aspect disclosed in this invention, a method for multi-source enterprise data fusion based on conflict reduction is provided, comprising:

S1: Acquire enterprise data from multiple heterogeneous data sources and generate data source fusion metadata for the enterprise data;

S2: Process the enterprise data using natural language processing and structured mapping techniques to construct a local RDF resource set;

S3: Perform cross-source ontology mapping on the local RDF resource set to generate a global mapping table between the global ontology and the local ontologies;

S4: Based on the global mapping table, perform RDF graph indexing and subgraph matching on the local RDF resource set to determine the candidate fusion position set;

S5: Extract the structured field features and textual semantic features contained in the cross-source candidate fusion content corresponding to the candidate fusion position set, and perform vectorized encoding to form a unified feature space representation;

S6: Based on the unified feature space representation and the data source fusion metadata, identify multi-valued, homonym, and source-inconsistency conflicts, and use a semantic similarity threshold determination strategy and a weight rule determination strategy to determine the conflict reduction field set;

S7: Input the conflict reduction field set into the preset multi-perspective intermediate deep fusion model according to the data source perspective, learn the marginal representations of each perspective and fuse them to obtain a joint representation, and generate a unified enterprise entity record;

S8: Using the global mapping table, convert the unified enterprise entity record into a fused RDF resource and write it into the fusion result database. The candidate fusion position set and the data source fusion metadata are used as traceability indexes and associated with the fused RDF resource, so that the fusion result can be output through a unified query.

[0007] In one possible implementation, the data source fusion metadata specifically includes: data source identifier, data source timestamp, data source field, data source text location, data source access level, and initial value of data source credibility.

[0008] In one possible implementation, S2 specifically includes:

S201: According to the data format, divide the enterprise data into structured data, semi-structured data, and unstructured data, and generate an RDF assembly unit with triples as the basic carrier form for each category of data;

S202: Perform R2RML mapping on the structured data to generate structured RDF sub-resources;

S203: Perform natural language processing extraction on the semi-structured data and the unstructured data to generate text RDF sub-resources;

S204: Combine the triples of the structured RDF sub-resources with those of the text RDF sub-resources, and construct an RDF graph model from the combined triple set to form a local RDF graph resource that can be used for subsequent indexing and matching;

S205: Based on the local RDF graph resource, generate a preset-dimension semantic vector annotation for each RDF triple, and write the annotation back to the corresponding triple as an annotation attribute to obtain a local RDF resource with semantic vector annotation;

S206: Encapsulate the local RDF resources according to the data source dimension to construct the local RDF resource set.

[0009] In one possible implementation, S3 specifically includes:

S301: Extract the local ontology elements corresponding to each data source in the local RDF resource set to form a set of elements to be mapped;

S302: Perform similarity detection, including concept similarity, attribute similarity, relationship similarity and data format similarity, on the set of elements to be mapped, and generate a set of candidate mapping pairs;

S303: Perform a merging operation on local ontology items that express the same meaning in the candidate mapping pair set to obtain a global ontology candidate set;

S304: Perform conflict resolution mapping on local ontology items with the same or similar names but different meanings in the candidate mapping pair set, map the current local ontology item to different global ontology identifiers, and output a set of conflict differentiation rules;

S305: Introduce an expert knowledge mapping channel to complete mapping pairs that cannot be stably determined through similarity detection, and format the mapping completion results and automatic mapping results in a unified manner to obtain expert mapping results;

S306: Based on the global ontology candidate set, the conflict differentiation rule set, and the expert mapping result, generate a global mapping table between the global ontology and the local ontology.

[0010] In one possible implementation, S4 specifically includes:

S401: Based on the global mapping table, the local RDF triples are normalized and rewritten to generate matching index units;

S402: Based on the set of triples corresponding to the matching index units, construct an RDF graph model;

S403: Partition the RDF graph model according to themes to generate multiple non-overlapping node clusters;

S404: Using the node clusters as leaf nodes, merge them in pairs from bottom to top to generate parent nodes, and record the index description information obtained by aggregating the leaf nodes in the parent nodes to form a tree index structure for reducing the retrieval space;

S405: Calculate the target distance between the matching index unit and the tree index node in the tree index structure, compare the target distance with the radius recorded by the tree index node, and determine the subgraph covered by the tree index node whose target distance is less than the radius as the candidate retrieval subgraph range;

S406: Perform two-stage subgraph matching within the candidate retrieval subgraph to determine the candidate fusion position set.
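The bottom-up index construction of S404 and the radius test of S405 can be sketched as follows. This is only an illustration under stated assumptions: each node cluster is summarized by the centroid of its index vectors and a covering radius, the "target distance" is taken to be Euclidean, and adjacent nodes are paired in list order. The names `IndexNode`, `make_leaf`, `merge`, `build_tree`, and `candidate_clusters` are hypothetical and do not come from the disclosure.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class IndexNode:
    center: List[float]                 # aggregated index description (assumed: centroid)
    radius: float                       # radius of a ball covering everything below this node
    size: int                           # number of index vectors below this node
    cluster_id: Optional[str] = None    # set on leaves only (one leaf per node cluster)
    children: List["IndexNode"] = field(default_factory=list)


def make_leaf(cluster_id: str, vectors: Sequence[Sequence[float]]) -> IndexNode:
    """Summarize one node cluster by its centroid and covering radius."""
    dim = len(vectors[0])
    center = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    radius = max(math.dist(center, v) for v in vectors)
    return IndexNode(center, radius, len(vectors), cluster_id)


def merge(a: IndexNode, b: IndexNode) -> IndexNode:
    """Create a parent whose ball encloses both child balls (S404)."""
    total = a.size + b.size
    center = [(a.size * x + b.size * y) / total for x, y in zip(a.center, b.center)]
    radius = max(math.dist(center, c.center) + c.radius for c in (a, b))
    return IndexNode(center, radius, total, children=[a, b])


def build_tree(leaves: List[IndexNode]) -> IndexNode:
    """Merge nodes pairwise, level by level, until a single root remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def candidate_clusters(node: IndexNode, query: Sequence[float]) -> List[str]:
    """Keep only subtrees whose target distance is below the recorded radius (S405)."""
    if math.dist(node.center, query) >= node.radius:
        return []                   # prune the whole subtree
    if not node.children:
        return [node.cluster_id]
    return [cid for child in node.children for cid in candidate_clusters(child, query)]
```

Because every parent ball encloses its child balls, a query that falls inside a leaf ball is never pruned higher up the tree; the subgraphs of the returned clusters would then be passed to the two-stage matching of S406.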

[0011] In one possible implementation, S5 specifically includes:

S501: Extract the candidate fusion position and combine it with the corresponding cross-source candidate fusion content, and organize the cross-source candidate fusion content according to the data source to form structured fields and text semantic fragments for subsequent feature extraction;

S502: Vectorize and encode the structured fields to generate structured features for each data source;

S503: Vectorize and encode the text semantic fragments to generate text semantic features for each data source;

S504: Concatenate the structured features and the text semantic features according to the fusion position to form a cross-source candidate fusion feature representation;

S505: Using an optimal transport method, align and map the cross-source candidate fusion feature representation to form the unified feature space representation.
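S505 does not name a specific solver for the alignment step. One common way to compute an optimal transport alignment between two sets of feature vectors is entropy-regularized Sinkhorn iteration, sketched below in plain Python. The uniform masses, the squared Euclidean cost, the regularization strength `eps`, and the function names `sinkhorn_plan` and `barycentric_map` are all assumptions made for illustration, not details from the disclosure.

```python
import math
from typing import List

Vector = List[float]


def sinkhorn_plan(src: List[Vector], tgt: List[Vector],
                  eps: float = 0.05, iters: int = 200) -> List[List[float]]:
    """Entropy-regularized transport plan between two point sets with uniform mass."""
    n, m = len(src), len(tgt)
    a, b = [1.0 / n] * n, [1.0 / m] * m
    # Squared Euclidean cost and the corresponding Gibbs kernel K = exp(-C / eps).
    cost = [[sum((x - y) ** 2 for x, y in zip(s, t)) for t in tgt] for s in src]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan: List[List[float]], tgt: List[Vector]) -> List[Vector]:
    """Map each source vector to the plan-weighted average of the target vectors."""
    dim = len(tgt[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(row[j] * tgt[j][d] for j in range(len(tgt))) / mass
                       for d in range(dim)])
    return mapped
```

In this sketch the features of one data source (`src`) are moved into the coordinate system of a reference source (`tgt`); applying the same mapping to every non-reference source would yield one shared feature space. Very small `eps` values relative to the cost scale can underflow the kernel, so costs may need rescaling in practice.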

[0012] In one possible implementation, S6 specifically includes:

S601: Based on the unified feature space representation and the data source fusion metadata, construct a conflict determination unit according to the fusion position;

S602: The conflict determination unit performs multi-valued conflict identification, determines multi-valued conflicts, and generates a syntax fusion candidate set based on the multi-valued conflicts;

S603: Perform syntax fusion on the syntax fusion candidate set to obtain a set of redundant fields;

S604: Based on the aforementioned set of redundant fields, identify homonymous conflicts and generate a set of semantic adjudication pairs;

S605: Apply the semantic similarity threshold determination strategy to the set of semantic adjudication pairs to reduce conflicts, and obtain a set of semantic reduction fields;

S606: Identify source inconsistency conflicts based on the semantic reduction field set, and make a decision using the weight rule judgment strategy to determine the conflict reduction field set.
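The S602 to S606 sequence can be pictured as three successive reductions on the values competing for one field at one fusion position. The sketch below is a simplified illustration, assuming that syntax fusion means merging values that are equal after normalization, that semantic adjudication uses cosine similarity against a fixed threshold, and that the weight rule is a credibility-weighted vote. `Candidate`, `normalize`, `reduce_conflict`, and the threshold value are hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    source_id: str
    value: str
    vector: List[float]   # row of the unified feature space representation
    credibility: float    # taken from the data source fusion metadata


def normalize(value: str) -> str:
    """Strip punctuation and whitespace and fold case (assumed syntax fusion rule)."""
    return re.sub(r"[\W_]+", "", value).casefold()


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reduce_conflict(cands: List[Candidate], sim_threshold: float = 0.9) -> str:
    # Syntax fusion: values that differ only in formatting collapse to one group.
    by_form: Dict[str, List[Candidate]] = {}
    for c in cands:
        by_form.setdefault(normalize(c.value), []).append(c)

    # Semantic threshold: groups whose representatives are similar enough are merged.
    reduced: List[List[Candidate]] = []
    for group in by_form.values():
        for kept in reduced:
            if cosine(group[0].vector, kept[0].vector) >= sim_threshold:
                kept.extend(group)
                break
        else:
            reduced.append(list(group))

    # Weight rule: the group with the largest total credibility wins, and its
    # most credible member supplies the retained value.
    best = max(reduced, key=lambda g: sum(c.credibility for c in g))
    return max(best, key=lambda c: c.credibility).value
```

A fuller version would also use the timestamp and access level from the fusion metadata in the weight rule; only credibility is used here to keep the example short.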

[0013] In one possible implementation, S7 specifically includes:

S701: The conflict reduction field set is split into multiple perspective input branches according to the data source perspective, and each perspective input branch corresponds to a business domain data source group;

S702: Extract and encode features from each of the business domain data source groups to obtain marginal representations from each perspective;

S703: Construct a multi-view weighted objective function based on each of the aforementioned marginal representations, and determine the contribution weights of multiple views under the Lagrange normalization constraint;

S704: Based on the contribution weights of each perspective, the marginal representations are weighted and fused to obtain a joint representation;

S705: Based on the joint representation, generate the unified enterprise entity record.
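The disclosure does not state the form of the multi-view weighted objective in S703. As one plausible reading, suppose each view v has a positive loss l_v and the weights minimize the sum of w_v^gamma * l_v subject to the weights summing to 1, with gamma > 1. Setting the derivative of the Lagrangian to zero gives w_v proportional to (1 / l_v)^(1 / (gamma - 1)). The sketch below implements that closed form and the weighted fusion of S704; the objective, `gamma`, and the function names `view_weights` and `fuse` are assumptions for illustration only.

```python
from typing import Dict, List


def view_weights(losses: Dict[str, float], gamma: float = 2.0) -> Dict[str, float]:
    """Closed-form weights for min sum_v w_v**gamma * loss_v  s.t.  sum_v w_v = 1."""
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    if any(loss <= 0.0 for loss in losses.values()):
        raise ValueError("per-view losses must be positive")
    raw = {view: (1.0 / loss) ** (1.0 / (gamma - 1.0)) for view, loss in losses.items()}
    total = sum(raw.values())
    return {view: r / total for view, r in raw.items()}   # normalization constraint


def fuse(marginals: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-view marginal representations (S704)."""
    dim = len(next(iter(marginals.values())))
    return [sum(weights[view] * marginals[view][d] for view in marginals)
            for d in range(dim)]
```

With losses of 1.0, 2.0, and 4.0 and `gamma = 2`, the unnormalized weights are 1, 0.5, and 0.25, so the lowest-loss view contributes 4/7 of the joint representation. Views with lower loss receive larger weights, which matches the intent of letting more reliable business domain groups dominate the fused record.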

[0014] According to a second aspect disclosed in this invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described in this disclosure.

[0015] According to a third aspect of the present invention, a non-transitory computer-readable storage medium is provided storing computer instructions for causing the computer to perform the methods described herein.

[0016] Compared with existing technologies, the multi-source enterprise data fusion method based on conflict reduction disclosed in this invention has the following beneficial effects: By constructing a data source fusion metadata model, source attributes, temporal features, and credibility are uniformly modeled to provide a dynamic decision-making basis for conflict determination. Based on this, natural language processing and cross-source ontology mapping are used to transform heterogeneous and unstructured data into semantically aligned RDF representations, reducing matching biases caused by field differences and semantic ambiguities at the source. Furthermore, RDF graph indexing and subgraph matching are combined to strengthen structural consistency constraints, and structured features and textual semantic features are uniformly encoded. Semantic similarity thresholds and weight rules are used to identify and reduce multi-valued, homonym, and source-inconsistency conflicts. In addition, a multi-view intermediate deep fusion model is introduced for joint modeling, and a result traceability and association mechanism is established to achieve adaptive optimization of conflict handling, effectively reducing the risk of mismatches and missed matches, and improving the stability and consistency of fusion results under complex semantics and dynamic scenarios.

[0017] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Brief Description of the Drawings

[0018] The above and other objects, features, and advantages of this disclosure will become readily apparent from the following detailed description of exemplary embodiments, taken in conjunction with the accompanying drawings. Several embodiments of this disclosure are illustrated in the drawings by way of example and not limitation, in which: In the accompanying drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

[0019] Figure 1 is a schematic diagram of the implementation process of a multi-source enterprise data fusion method based on conflict reduction according to an embodiment of this disclosure. Figure 2 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0020] To make the objectives, features, and advantages of this disclosure more apparent and understandable, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.

[0021] The following description, in conjunction with the accompanying drawings, details the multi-source enterprise data fusion method based on conflict reduction provided by the embodiments of the present invention through specific implementations and application scenarios.

[0022] Referring to Figure 1 of the accompanying drawings, a schematic diagram of the implementation process of a multi-source enterprise data fusion method based on conflict reduction according to an embodiment of the present disclosure is shown.

[0023] This invention provides a method for multi-source enterprise data fusion based on conflict reduction, which may include the following steps: S1: Acquire enterprise data from multiple heterogeneous data sources and generate data source fusion metadata for the enterprise data.

[0024] In one possible implementation, the data source fusion metadata specifically includes: data source identifier, data source timestamp, data source field, data source text location, data source access level, and initial value of data source credibility.

[0025] Specifically, data acquisition relies on the enterprise's internal authorization mechanism: relational database data inside the enterprise is acquired through the database access control system. Once data acquisition is complete, data from different sources is uniformly received and registered, and data source fusion metadata is generated for each data entry. This fusion metadata is bound to the original data and stored together with it, providing foundational support for subsequent ontology mapping, conflict resolution, and fusion tracing.
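As a rough picture of what "binding" fusion metadata to a data entry could look like, the sketch below models the six metadata items listed above as a small record type and stores it next to the raw entry. The field names, the `register` helper, and the assumption that credibility lies in [0, 1] are hypothetical; the disclosure only lists the items themselves.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FusionMetadata:
    source_id: str                 # data source identifier
    timestamp: datetime            # data source timestamp
    field_name: str                # data source field
    text_location: Optional[str]   # position in the source text, if any
    access_level: str              # data source access level
    initial_credibility: float     # initial value of data source credibility


def register(entry: Dict[str, Any], meta: FusionMetadata) -> Dict[str, Any]:
    """Store the raw entry together with its fusion metadata."""
    if not 0.0 <= meta.initial_credibility <= 1.0:   # assumed value range
        raise ValueError("initial_credibility is expected to lie in [0, 1]")
    return {"data": entry, "fusion_meta": asdict(meta)}


# Usage: one CRM row registered with its metadata.
meta = FusionMetadata(
    source_id="crm",
    timestamp=datetime(2026, 1, 5, tzinfo=timezone.utc),
    field_name="customer_name",
    text_location=None,
    access_level="internal",
    initial_credibility=0.8,
)
registered = register({"customer_name": "Acme Co., Ltd."}, meta)
```

Keeping the metadata immutable (`frozen=True`) reflects its later use as a traceability index: downstream steps read it but should not alter it.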

[0026] S2: Process the enterprise data using natural language processing and structured mapping techniques to construct a local RDF resource set.

[0027] Natural Language Processing (NLP) is a technology that uses computers to analyze, understand, and structurally represent human natural language. It typically includes processes such as word segmentation, part-of-speech tagging, named entity recognition, relation extraction, semantic representation, and vectorized encoding. In multi-source enterprise data fusion scenarios, NLP technology is mainly used to extract semantic elements from unstructured or semi-structured data such as contract texts, work order records, email content, and announcement descriptions, transforming entity, attribute information, and relationships between entities in the text into a computable structured representation.

[0028] Structured mapping is a method for converting existing structured data schemas (such as relational database table structures, field definitions, or data models) into a unified semantic representation framework. It typically establishes a correspondence between data table fields and concepts, attributes, or relationships in the target ontology through mapping rules. In enterprise data fusion scenarios, structured mapping is usually based on predefined field-ontology mapping rules or mapping languages (such as R2RML) to convert database records into a unified triple form or graph model representation, thereby eliminating differences in field naming and data structure between different systems and achieving a unified semantic expression of data across systems.

[0029] In one possible implementation, S2 specifically includes: S201: Based on the data format, enterprise data is divided into structured data, semi-structured data, and unstructured data, and an RDF assembly unit with triples as the basic carrier is generated for each data.

[0030] S202: Perform R2RML mapping on structured data to generate structured RDF sub-resources.

[0031] Specifically, for structured data, based on the mapping rules between relational database fields and entities / attributes, R2RML is used to convert relational database records into RDF triples and write them into structured RDF sub-resources. The data source fusion metadata from step S1 is then bound to the generated triples to form a traceable structured RDF sub-resource.
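The row-to-triple conversion described here can be illustrated without an R2RML engine. The sketch below is a plain-Python stand-in that only mimics the shape of an R2RML triples map (a subject template plus predicate/column pairs) and attaches source metadata to each generated triple; a real deployment would use an actual R2RML processor. The `example.com` IRIs, the column names, and `row_to_triples` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Stand-in for an R2RML triples map: subject template and predicate/column pairs.
TRIPLES_MAP = {
    "subject_template": "http://example.com/company/{company_id}",
    "predicate_object_maps": [
        ("http://example.com/ont#legalName", "name"),
        ("http://example.com/ont#registeredCapital", "reg_capital"),
    ],
}


def row_to_triples(row: Dict[str, Any], triples_map: Dict[str, Any],
                   source_id: str) -> List[Dict[str, Any]]:
    """Convert one relational row into triples bound to their source metadata."""
    subject = triples_map["subject_template"].format(**row)
    out = []
    for predicate, column in triples_map["predicate_object_maps"]:
        value = row.get(column)
        if value is None:
            continue   # as in R2RML, a NULL column value produces no triple
        triple: Triple = (subject, predicate, str(value))
        out.append({"triple": triple, "source_id": source_id, "source_field": column})
    return out


# Usage: a row with a NULL registered capital yields a single triple.
row = {"company_id": 42, "name": "Acme Co., Ltd.", "reg_capital": None}
structured = row_to_triples(row, TRIPLES_MAP, source_id="erp")
```

Carrying `source_id` and `source_field` on every triple is what makes the structured RDF sub-resource traceable back to the metadata generated in S1.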

[0032] S203: Perform natural language processing extraction on semi-structured and unstructured data to generate text RDF sub-resources.

[0033] Specifically, for semi-structured / unstructured data sets, a natural language processing workflow is used to extract RDF-compatible semantic elements and transcribe them into RDF triples, which are then written into text RDF sub-resources: candidate entities are extracted from the text and determined as the subject / object of the triples; the relationships or field semantics between entities are extracted as predicates; and the extraction results are uniformly organized into RDF triples and written into local RDF resources to achieve unified RDF storage of multi-source data.

[0034] S204: Combine structured RDF sub-resources with text RDF sub-resources into triples, and construct an RDF graph model from the combined triple set to form a local RDF graph resource that can be used for subsequent indexing and matching.

[0035] Specifically, the RDF graph model is represented as $G = (V, E, L)$, where $G$ is the constructed RDF graph model, $V$ is the set of vertices consisting of subjects and objects, and $E$ is the set of edges represented by predicates. $L = L_v \cup L_p$ is the set of label functions: $L_v$ is the set of vertex label functions, which assign a semantic label to each node in $V$, and $L_p$ is the set of edge label functions, which assign a semantic label to each relation in $E$.
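A labeled graph of this kind can be held in a very small data structure. The sketch below builds vertices, edges, and the two label mappings from a list of triples; the `RDFGraph` class, the `local_name` labeling rule (the text after the last `#`, `/`, or `:`), and the `ex:` identifiers are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def local_name(term: str) -> str:
    """Assumed labeling rule: keep the part after the last '#', '/' or ':'."""
    return re.split(r"[#/:]", term)[-1]


@dataclass
class RDFGraph:
    vertices: Set[str] = field(default_factory=set)              # V: subjects and objects
    edges: List[Triple] = field(default_factory=list)            # E: (subject, predicate, object)
    vertex_labels: Dict[str, str] = field(default_factory=dict)  # vertex label function
    edge_labels: Dict[str, str] = field(default_factory=dict)    # edge label function


def build_graph(triples: Iterable[Triple],
                label: Callable[[str], str] = local_name) -> RDFGraph:
    graph = RDFGraph()
    for s, p, o in triples:
        for vertex in (s, o):
            graph.vertices.add(vertex)
            graph.vertex_labels[vertex] = label(vertex)
        graph.edges.append((s, p, o))
        graph.edge_labels[p] = label(p)
    return graph


# Usage: structured and text-derived triples end up in the same graph.
g = build_graph([
    ("ex:acme", "ex:locatedIn", "ex:hangzhou"),   # e.g. from a database row
    ("ex:acme", "ex:signedWith", "ex:globex"),    # e.g. extracted from a contract
])
```

Storing the label functions as dictionaries keeps lookup cheap during the later indexing and subgraph matching steps.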

[0036] S205: Based on the local RDF graph resource, generate a preset dimension semantic vector annotation for each RDF triple, and write the preset dimension semantic vector annotation back to the corresponding triple as an annotation attribute to obtain the local RDF resource with semantic vector annotation.

[0037] The preset-dimension semantic vector annotations are obtained through a vectorization process: for each RDF triple $(S, P, O)$, the subject text label (a text sequence formed by concatenating the entity name, entity alias, entity description text, or attribute fields bound to the subject resource identifier), the predicate semantic label (relationship name, relationship description), and the object text label (the attribute value text, entity name, or semantic description text corresponding to the object resource identifier) are used to construct a triple text sequence. This triple text sequence is then input into the Word2Vec vectorization model, which outputs a $k$-dimensional semantic vector. The $k$-dimensional semantic vector is written back as an annotation attribute of the RDF triple, forming a local RDF resource with semantic vector annotation.
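The write-back step can be sketched independently of the embedding model. In the example below, the triple text sequence is built from label lookups and passed to a pluggable `embed` function. The default `placeholder_embed` is a hashed bag-of-words vector used only so the example runs without a trained model; it is not Word2Vec, and a trained Word2Vec model would be substituted in practice. All names and the dimension `k = 8` are hypothetical.

```python
import hashlib
import math
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def triple_tokens(triple: Triple, labels: Dict[str, str]) -> List[str]:
    """Concatenate subject, predicate, and object text labels into one token sequence."""
    text = " ".join(labels.get(term, term) for term in triple)
    return text.lower().split()


def placeholder_embed(tokens: List[str], k: int = 8) -> List[float]:
    """Deterministic hashed bag-of-words vector; a stand-in for a Word2Vec model."""
    vec = [0.0] * k
    for tok in tokens:
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % k
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def annotate(triple: Triple, labels: Dict[str, str], k: int = 8,
             embed: Callable[[List[str], int], List[float]] = placeholder_embed):
    """Attach a k-dimensional semantic vector to the triple as an annotation attribute."""
    return {"triple": triple, "semantic_vector": embed(triple_tokens(triple, labels), k)}


# Usage with hypothetical identifiers and labels.
labels = {
    "ex:acme": "Acme Co., Ltd. Acme Corporation",
    "ex:locatedIn": "located in",
    "ex:hangzhou": "Hangzhou",
}
annotated = annotate(("ex:acme", "ex:locatedIn", "ex:hangzhou"), labels)
```

Because the vector is stored on the triple itself, later steps such as similarity detection and index construction can read it without recomputing embeddings.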

[0038] S206: Encapsulate local RDF resources according to the data source dimension to build a local RDF resource collection.

[0039] Specifically, the generated local RDF triples are grouped and organized according to the data source identifier. RDF graph resources, semantic vector annotations and corresponding data source fusion metadata belonging to the same data source are uniformly encapsulated to form independent data source-level RDF resource units. Then, each data source-level RDF resource unit is indexed and organized according to the data source number to construct a local RDF resource set.

[0040] It should be noted that the binding relationship between local RDF resources and data source fusion metadata is preserved.

[0041] In this embodiment of the invention, by combining natural language processing technology with structured mapping technology, a unified RDF representation of multi-source heterogeneous enterprise data is achieved. This enables structured and unstructured data to be organized and stored within the same semantic framework, thereby solving the problems of inconsistent representation and semantic alignment of different data formats in existing technologies. Simultaneously, by binding data source fusion metadata with triples and generating semantic vector annotations for each triple, local RDF resources not only possess graph representation capabilities at the structural level but also measurable feature expression capabilities at the semantic level. This provides a unified and computable foundation for subsequent ontology mapping, subgraph matching and location, and conflict reduction, thereby improving the accuracy of the multi-source data fusion process.

[0042] S3: Perform cross-source ontology mapping on the local RDF resource collection to generate a global mapping table about the global ontology and the local ontology.

[0043] Cross-source ontology mapping refers to the technical process of semantic alignment and unified modeling of local concepts, attributes and relationships from different data sources during multi-source data fusion. Its core lies in identifying ontology elements in different data sources that express the same business meaning or similar semantics, mapping them to a unified global ontology identifier, and distinguishing and standardizing ontology items with the same name but different meanings or semantic conflicts.

[0044] The global mapping table refers to the unified semantic alignment result data structure generated after the cross-source ontology mapping is completed. It is used to record the correspondence between local ontology items (including concepts, attributes and relations) of each data source and the global ontology identifier.

[0045] In one possible implementation, S3 specifically includes: S301: Extract the local ontology elements corresponding to each data source in the local RDF resource collection to form a set of elements to be mapped.

[0046] Specifically, local ontology elements include at least concepts, attributes, and predicates (relationships), which are organized into a "set of elements to be mapped" as a unified input for subsequent similarity detection and mapping decisions.

[0047] S302: Perform similarity detection on the set of elements to be mapped, including concept similarity, attribute similarity, relationship similarity, and data format similarity, and generate a set of candidate mapping pairs.

[0048] Concept similarity is a metric used to measure whether two concepts from different data sources express the same or similar business meanings at the semantic level. It can be calculated based on the string similarity of concept names, the distance between hierarchical relationships of concepts in the ontology hierarchy, and the similarity between semantic vector representations. Attribute similarity is a metric used to measure the consistency of attribute items in semantic meaning and functional purpose across different data sources. Relationship similarity is a metric used to measure the closeness of relations or predicates in semantic expression across different data sources. It can be calculated based on the textual similarity of relation names, the matching degree of the concept pairs (subject type and object type) connected by the relation, and the distance between relation semantic vectors. Data format similarity is a metric used to measure the consistency of data items in structural form or data type across different data sources. It typically includes comparisons of data types (e.g., strings, integers, dates), length ranges, value distribution characteristics, and format patterns (e.g., date formats, encoding rules).
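To make the combination of these signals concrete, the sketch below scores pairs of local ontology elements from different sources using name string similarity, semantic vector similarity, and data type agreement, and keeps pairs above a threshold as candidate mapping pairs. The weights, the threshold, the `OntologyElement` fields, and the simple weighted sum are assumptions; the disclosure does not specify how the individual similarities are combined.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


@dataclass
class OntologyElement:
    source_id: str
    name: str
    kind: str             # "concept", "attribute", or "relation"
    datatype: str         # e.g. "string", "integer", "date"
    vector: List[float]   # semantic vector of the element


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity(a: OntologyElement, b: OntologyElement,
               weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted mix of name, semantic, and data format similarity (assumed weights)."""
    w_name, w_vec, w_fmt = weights
    name_sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    fmt_sim = 1.0 if a.datatype == b.datatype else 0.0
    return w_name * name_sim + w_vec * cosine(a.vector, b.vector) + w_fmt * fmt_sim


def candidate_pairs(elements: List[OntologyElement],
                    threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Compare elements of the same kind across different sources."""
    pairs = []
    for a, b in combinations(elements, 2):
        if a.source_id == b.source_id or a.kind != b.kind:
            continue
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a.name, b.name, round(score, 3)))
    return pairs
```

Hierarchy distance and value distribution statistics, which the paragraph above also mentions, could be added as further terms in the same weighted sum.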

[0049] S303: Perform a merging operation on local ontology items that express the same meaning in the candidate mapping pair set to obtain a global ontology candidate set.

[0050] Specifically, local concepts / attributes / relationships from different data sources but with consistent semantics are mapped to the same global ontology identifier, and redundant items are eliminated. During merging, a stable global identifier is generated for each global ontology candidate, and its corresponding local source set is recorded, providing a structured basis for subsequent tracing and conflict adjudication.

[0051] S304: Perform conflict resolution mapping on local ontology items with the same or similar names but different meanings in the candidate mapping set, map the current local ontology item to different global ontology identifiers, and output a set of conflict differentiation rules.

[0052] Here, conflict resolution mapping refers to the following handling: when local ontology items in the candidate mapping pair set have the same or similar names, but semantic verification reveals that their actual business meanings differ, synonym merging is not performed; instead, a distinguishing mapping is applied.

[0053] Specifically, firstly, semantic conflict detection is performed on candidate mapping pairs, based on at least one or more of the following conditions: whether the conceptual context to which the local ontology item belongs is consistent, whether the subject or object types they connect are consistent, whether there are significant differences in the distribution characteristics of their attribute values, and whether the distance between their semantic vectors exceeds a preset semantic separation threshold. When the conflict determination conditions are met, different global ontology identifiers are assigned to each conflicting local ontology item, and distinction rules are generated simultaneously. These distinction rules are used to limit the data source, business domain, conceptual context, or type constraints under which the item should be mapped to the corresponding global identifier. The distinction rules are stored in the form of structured conditional expressions and written into the global mapping table.
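The detection conditions and the resulting distinction rules can be sketched as follows. The example checks three of the listed conditions (conceptual context, connected subject type, and semantic vector distance) and, when any of them indicates a conflict, emits one structured rule per local item. The dictionary keys, the `global:` identifier scheme, and the separation threshold of 0.5 are hypothetical.

```python
import math
from typing import Any, Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def differentiate(a: Dict[str, Any], b: Dict[str, Any],
                  separation_threshold: float = 0.5) -> Optional[Dict[str, Any]]:
    """Return distinction rules for two same-named items, or None if they do not conflict."""
    reasons = []
    if a["context"] != b["context"]:
        reasons.append("context")
    if a["subject_type"] != b["subject_type"]:
        reasons.append("subject_type")
    if 1.0 - cosine(a["vector"], b["vector"]) > separation_threshold:
        reasons.append("semantic_distance")
    if not reasons:
        return None   # no conflict detected: the pair may go to synonym merging

    rules = []
    for item in (a, b):
        rules.append({
            "local": (item["source_id"], item["name"]),
            # One global identifier per conflicting item (hypothetical naming scheme).
            "global_id": f"global:{item['name']}#{item['source_id']}",
            # Structured condition under which this mapping applies.
            "when": {"source_id": item["source_id"], "context": item["context"]},
        })
    return {"reasons": reasons, "rules": rules}


# Usage: "balance" means an account balance in one system and a stock level in another.
crm_balance = {"source_id": "crm", "name": "balance", "context": "customer_account",
               "subject_type": "Customer", "vector": [1.0, 0.0]}
wms_balance = {"source_id": "wms", "name": "balance", "context": "inventory",
               "subject_type": "Warehouse", "vector": [0.0, 1.0]}
result = differentiate(crm_balance, wms_balance)
```

The `when` clause plays the role of the structured conditional expression that is written into the global mapping table.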

[0054] It should be noted that while mapping local ontology items to different global ontology identifiers, distinguishing rules are generated for them to ensure their distinguishability under the global ontology.

[0055] S305: Introduce an expert knowledge mapping channel to complete mapping pairs that cannot be stably determined through similarity detection, and format the completed mapping results and the automatic mapping results in a unified format to obtain the expert mapping results.

[0056] Among them, the expert knowledge mapping channel refers to a manual or rule-driven supplementary mapping mechanism set up in the process of cross-source ontology mapping to solve the problem that automatic similarity calculation cannot stably determine the mapping relationship or that there is semantic ambiguity.

[0057] Specifically, when the similarity results of candidate mapping pairs are in an uncertain range, or when the automatically determined results deviate from the business semantics, the relevant local ontology items are semantically confirmed and mapped by means of a preset expert rule base, business dictionary, industry standard ontology, or authorized domain expert configuration interface. The completed results are written to the global mapping table in a structured form, and their source channel and priority are marked for use in subsequent normalization rewriting and matching stages.

[0058] S306: Based on the global ontology candidate set, the conflict differentiation rule set, and the expert mapping results, generate a global mapping table between the global ontology and the local ontology.

[0059] It should be noted that the global mapping table includes: the mapping relationship between local concept / attribute / relation identifiers and global ontology identifiers, equivalence class information and source list generated by synonym merging, and distinction rules and applicable conditions generated by distinguishing homonyms. The global mapping table is output to the subsequent step S4 for normalized indexing and subgraph matching and positioning.

[0060] Specifically, a unique GlobalID is assigned to each global ontology candidate and an index is created. The local ontology items obtained in step S303 are merged according to GlobalID to form equivalence classes and written into the equivalence mapping. The differentiation rules from step S304 are attached to the corresponding GlobalIDs according to data source / business domain / context conditions and written into the condition mapping. The expert mappings from step S305 are written according to priority; new condition mappings can be added and conflict markers can be retained. Finally, all mapping record fields are standardized, and a global mapping table containing local ontology item identifiers, types, GlobalIDs, mapping types, applicable conditions, and source channels is output.
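As a non-limiting illustration, the global mapping table described above could be sketched as follows. The class names `MappingRecord` and `GlobalMappingTable`, the field names, and the tie-breaking rule in `resolve` (higher priority first, then the more specific condition) are assumptions made for this sketch and are not specified in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class MappingRecord:
    local_id: str            # local ontology item identifier
    item_type: str           # "concept", "attribute", or "relation"
    global_id: str           # GlobalID assigned to the global ontology candidate
    mapping_type: str        # "equivalence", "conditional", or "expert"
    condition: dict = field(default_factory=dict)  # applicable conditions (hypothetical keys)
    channel: str = "auto"    # source channel of the mapping
    priority: int = 0        # expert mappings may carry a higher priority


class GlobalMappingTable:
    def __init__(self):
        self.records = []

    def add(self, record):
        self.records.append(record)

    def resolve(self, local_id, context):
        """Return the GlobalID for local_id under the given context, or None."""
        candidates = [
            r for r in self.records
            if r.local_id == local_id
            and all(context.get(k) == v for k, v in r.condition.items())
        ]
        if not candidates:
            return None
        # Assumed tie-break: higher priority wins, then the more specific condition.
        best = max(candidates, key=lambda r: (r.priority, len(r.condition)))
        return best.global_id
```

With this layout, two homonymous local items named `code` can be kept apart by attaching a `{"source": ...}` condition to each record, while synonym merging simply points several `local_id` values at the same `global_id`.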

[0061] In this embodiment of the invention, cross-source ontology mapping unifies the mapping of concepts, attributes, and relationships from different systems to a global identifier, eliminating the semantic fragmentation caused by naming differences. Simultaneously, a synonym merging mechanism reduces redundant expressions, making semantic expression more focused and standardized.

[0062] Furthermore, by supplementing the expert knowledge mapping channel, the shortcomings of automatic similarity calculation in complex business scenarios can be compensated for, thereby improving the stability and business adaptability of the mapping results.

S4: Based on the global mapping table, perform RDF graph indexing and subgraph matching on the local RDF resource set to determine the set of candidate fusion locations.

[0063] In one possible implementation, S4 specifically includes: S401: Normalize and rewrite local RDF triples based on the global mapping table to generate matchable index units.

[0064] Specifically, based on the global mapping table, the predicates and entity identifiers of the triples <S, P, O> in the local RDF resource set are normalized and rewritten, mapping cross-source local predicates / concepts to global ontology identifiers and thereby generating "matchable index units" for subsequent indexing and matching. A matchable index unit includes at least: a normalized subject identifier, a normalized predicate identifier, a normalized object identifier, and the data source fusion metadata bound to it.
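A minimal sketch of the normalized rewriting in S401 is given below. It assumes the global mapping table has already been flattened into a plain dictionary from local identifiers to global identifiers; the function name `normalize_triples` and the dictionary keys are hypothetical.

```python
def normalize_triples(triples, mapping, source_meta):
    """Rewrite local (S, P, O) triples into matchable index units."""
    units = []
    for s, p, o in triples:
        units.append({
            "subject": mapping.get(s, s),
            "predicate": mapping.get(p, p),
            # Literal objects are not in the mapping and pass through unchanged.
            "object": mapping.get(o, o),
            # Data source fusion metadata stays bound to every unit.
            "meta": source_meta,
        })
    return units
```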

[0065] It should be noted that the matchable index unit is used for cross-source matching and location.

[0066] S402: Construct an RDF graph model based on the set of triples corresponding to the matchable index units.

[0067] It should be noted that the constructed RDF graph model is used to transform data query processing into a subgraph matching problem on a large graph.

[0068] S403: Partition the RDF graph model by subject to generate multiple non-overlapping node clusters.

[0069] It should be noted that the subject usually carries the "entity primary key / master data identifier". Clustering by subject can quickly converge the query space to the entity-level candidate subgraph, providing a more stable candidate range for subsequent fusion and location positioning.

[0070] S404: Using node clusters as leaf nodes, merge them in pairs from bottom to top to generate parent nodes, and record the index description information obtained from the aggregation of leaf nodes in the parent nodes to form a tree index structure for reducing the retrieval space.

[0071] S405: Calculate the target distance between the matchable index unit and each tree index node in the tree index structure, compare the target distance with the radius recorded by the tree index node, and determine the subgraphs covered by the tree index nodes whose target distance is less than the radius as the candidate retrieval subgraph range.
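Steps S404 and S405 can be illustrated with the following sketch. The disclosure does not fix the index description information or the distance measure, so this example assumes each node stores a center vector and a covering radius and uses Euclidean distance; `Node`, `merge`, `build_tree`, and `search` are hypothetical names.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    def __init__(self, center, radius, children=(), cluster_id=None):
        self.center, self.radius = center, radius
        self.children, self.cluster_id = list(children), cluster_id


def merge(left, right):
    """Build a parent whose radius covers both children."""
    center = [(x + y) / 2 for x, y in zip(left.center, right.center)]
    radius = max(dist(center, c.center) + c.radius for c in (left, right))
    return Node(center, radius, [left, right])


def build_tree(leaves):
    """Merge node clusters pairwise, bottom-up, until one root remains (leaves must be non-empty)."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        level = nxt
    return level[0]


def search(node, query):
    """Return the cluster ids whose covering ball contains the query vector."""
    if dist(query, node.center) >= node.radius:
        return []  # prune the whole subtree
    if not node.children:
        return [node.cluster_id]
    return [cid for child in node.children for cid in search(child, query)]
```

Because a parent's radius is computed to enclose both children, pruning at a parent never discards a leaf that would itself have passed the radius test.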

[0072] S406: Perform two-stage subgraph matching within the candidate retrieval subgraph to determine the set of candidate fusion locations.

[0073] Specifically, first, similarity matching is performed on the subject to obtain subject matching results. If the subject matches successfully, object similarity is further calculated and a threshold is determined. If the object similarity exceeds a preset threshold, the matching result is determined as the fusion position, and the fusion positions are summarized to form a candidate fusion position set.
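The two-stage matching of S406 might be sketched as follows. The similarity measure is not specified in the disclosure; this example assumes a character-level ratio from Python's `difflib.SequenceMatcher`, and the threshold values `subj_thr` and `obj_thr` are illustrative only.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Character-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def two_stage_match(unit, candidates, subj_thr=0.9, obj_thr=0.8):
    positions = []
    for cand in candidates:
        # Stage 1: the subject must match before the object is examined.
        if sim(unit["subject"], cand["subject"]) < subj_thr:
            continue
        # Stage 2: the object similarity must exceed the preset threshold.
        if sim(unit["object"], cand["object"]) > obj_thr:
            positions.append(cand)
    return positions
```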

[0074] In this embodiment of the invention, a global mapping table is generated in step S3, and normalization rewriting and two-stage subgraph matching are performed on the triples in step S4, thereby accurately determining the set of candidate fusion positions at the fusion front end, significantly reducing the cross-source matching space and avoiding erroneous fusion.

[0075] S5: Extract the structured field features and textual semantic features contained in the cross-source candidate fusion content corresponding to the candidate fusion position set, and perform vectorized encoding to form a unified feature space representation.

[0076] In one possible implementation, S5 specifically includes: S501: Extract the cross-source candidate fusion content corresponding to the candidate fusion position set, and organize the cross-source candidate fusion content according to the data source to form structured fields and text semantic fragments for subsequent feature extraction.

[0077] S502: Vectorize and encode structured fields to generate structured features for each data source.

[0078] Specifically, structured fields are categorized and encoded according to their type. Categorical fields are vectorized using one-hot encoding or embedding encoding. Numerical fields are processed for missing values and normalized before being directly used as numerical features. Date or coded fields have their structured sub-features extracted and then represented numerically. Subsequently, the vectors of each structured field at the same fusion location are concatenated in a preset order or their dimensions are unified through linear mapping to form the structured feature vector of the corresponding data source.
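A minimal sketch of the type-specific encoding in S502 follows. The schema layout `(name, kind, params)`, the min-max normalization, the default used for missing values, and the particular date sub-features are assumptions of this example.

```python
def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]


def min_max(value, lo, hi, default=0.0):
    if value is None:
        return default  # missing values are imputed with an assumed default
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def date_features(iso_date):
    """Split a YYYY-MM-DD string into numeric sub-features (value must not be None)."""
    year, month, day = (int(x) for x in iso_date.split("-"))
    return [float(year), month / 12.0, day / 31.0]


def encode_record(record, schema):
    """Concatenate the per-field vectors in the preset order given by schema."""
    vector = []
    for name, kind, params in schema:
        value = record.get(name)
        if kind == "categorical":
            vector.extend(one_hot(value, params))
        elif kind == "numeric":
            lo, hi = params
            vector.append(min_max(value, lo, hi))
        elif kind == "date":
            vector.extend(date_features(value))
        else:
            raise ValueError(f"unknown field kind: {kind}")
    return vector
```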

[0079] S503: Vectorize and encode text semantic segments to generate text semantic features from various data sources.

[0080] Specifically, the text semantic segments extracted in step S501 are preprocessed, including word segmentation, stop word removal, and necessary standardization. Then, the processed text is input into a preset semantic encoding model (such as Word2Vec) for vectorization to obtain fixed-dimensional text semantic vectors. If the same fusion position contains multiple text segments, the multiple text vectors are averaged or weighted and fused to generate text semantic feature vectors from the corresponding data source.
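The pooling described for S503 can be illustrated as below. A trained Word2Vec model is replaced here by a toy lookup table `embeddings`, and whitespace splitting stands in for word segmentation; both are assumptions of this sketch (Chinese text would need a proper segmenter).

```python
def embed_segment(text, embeddings, stop_words=frozenset()):
    """Mean-pool the vectors of the known, non-stop-word tokens in one segment."""
    tokens = [t for t in text.lower().split()
              if t not in stop_words and t in embeddings]
    if not tokens:
        return None
    dim = len(embeddings[tokens[0]])
    return [sum(embeddings[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def fuse_segments(vectors, weights=None):
    """Average (or weight and fuse) several segment vectors at one fusion position."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = float(sum(weights))
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total for i in range(dim)]
```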

[0081] S504: Concatenate structured features and textual semantic features according to the fusion position to form a cross-source candidate fusion feature representation.

[0082] S505: Use the optimal transport method to align and map the cross-source candidate fusion feature representations, forming a unified feature space representation.

[0083] It should be noted that Optimal Transport (OT) is a mathematical method for measuring and aligning the differences between two probability distributions. Its core idea is to find a "minimum-cost mass transfer scheme" that maps one distribution to another with the minimum total cost, given a cost function.

[0084] Specifically, using the determined candidate fusion position as the minimum alignment granularity, candidate fusion feature representations from different data sources at the same fusion position are constructed into an aligned sample set. A cost matrix C is constructed based on the structured feature distance and the text semantic feature distance, where the elements in the cost matrix C represent the alignment cost between samples from any data source and samples from another data source. Based on the cost matrix C, an optimal transport solution with an entropy regularization term is used to obtain the alignment mapping matrix. The alignment mapping matrix is then used to map the cross-source candidate fusion feature representations to a unified feature space, outputting a unified feature space representation for subsequent conflict determination and conflict reduction.
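The entropy-regularized optimal transport solution referred to above is commonly computed with Sinkhorn iterations, sketched below in plain Python. The marginals `a` and `b`, the regularization strength `reg`, and the barycentric projection used to map source samples into the target space are assumptions of this example; the disclosure does not state how the alignment mapping matrix is applied.

```python
import math


def sinkhorn(cost, a, b, reg=0.1, n_iter=200):
    """Return the transport plan for cost matrix `cost` with marginals a and b.

    Note: a very small `reg` can underflow exp() to 0 and cause division by zero.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, targets):
    """Map each source sample to the plan-weighted mean of the target samples."""
    dim = len(targets[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([
            sum(row[j] * targets[j][d] for j in range(len(targets))) / mass
            for d in range(dim)
        ])
    return mapped
```

In practice the cost entries would combine the structured feature distance and the text semantic feature distance, for example as a weighted sum; the weighting is likewise left open by the text.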

[0085] In this embodiment of the invention, by vectorizing and encoding structured fields and text semantic fragments respectively and concatenating them at the fusion location granularity, different data formats can be uniformly converted into computable numerical feature representations, placing fields that were originally not directly comparable with text content in the same feature space. Furthermore, by constructing a unified feature space representation through an optimal transport alignment mechanism, the distribution differences between different data sources can be minimized while maintaining the internal structural information of each data source, avoiding feature misalignment or bias propagation problems caused by simple concatenation. Therefore, subsequent conflict identification and conflict mitigation steps can be based on feature representations with consistent distribution and uniform scale, thereby improving the stability and accuracy of conflict determination.

[0086] S6: Based on the unified feature space representation and the data source fusion metadata, identify conflicts such as multiple values, homonyms, and inconsistent sources, and use a semantic similarity threshold determination strategy and a weighted rule determination strategy to determine the conflict reduction field set.

[0087] The semantic similarity threshold determination strategy refers to a method for judging the semantic consistency of candidate conflicting fields or triples based on distance or similarity indicators in the semantic vector space. Specifically, the text representation of the field value or triple to be judged is transformed into a fixed-dimensional semantic vector. The semantic similarity value is obtained by calculating the cosine similarity or Euclidean distance between the vectors. When the similarity is higher than a preset threshold, it is determined that they express the same or highly similar semantics and are merged to retain one. When the similarity is lower than the preset threshold, it is determined that their semantics are different and are distinguished or both are retained.
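As an illustration of the threshold decision just described, the following sketch compares two candidate values by the cosine similarity of their semantic vectors. The threshold value 0.85 and the choice to keep the first value on a merge are assumptions of this example.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def adjudicate(pair, threshold=0.85):
    """Return the values to retain for a pair of (value, vector) candidates."""
    (val_a, vec_a), (val_b, vec_b) = pair
    if cosine(vec_a, vec_b) > threshold:
        return [val_a]          # same semantics: merge and retain one
    return [val_a, val_b]       # different semantics: retain both
```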

[0088] The weighted rule determination strategy refers to a decision-making mechanism that, when semantics cannot be directly determined or there are inconsistencies in the sources, evaluates candidate fields based on multi-dimensional business rules and outputs a decision result. Specifically, for candidate fields from different data sources at the same fusion location, a weight calculation model is constructed based on factors such as the initial value of data source credibility, timestamp freshness, business domain priority, or system level. Each candidate field is assigned a comprehensive weight and ranked. Based on the weight ranking result, the field with the highest weight is selected as the primary field, or a set of primary and secondary fields is generated.
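The weighted rule determination strategy could be sketched as a linear scoring of the listed factors. The coefficient values, the exponential half-life used for timestamp freshness, and the candidate dictionary keys are all assumptions of this example.

```python
from datetime import datetime


def composite_weight(cand, now, w_cred=0.5, w_fresh=0.3, w_prio=0.2,
                     half_life_days=365.0):
    """Combine credibility, freshness, and domain priority into one weight."""
    age_days = (now - cand["timestamp"]).total_seconds() / 86400.0
    # Freshness halves every `half_life_days` (an assumed decay model).
    freshness = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return (w_cred * cand["credibility"]
            + w_fresh * freshness
            + w_prio * cand["domain_priority"])


def rank_candidates(candidates, now):
    """Return (primary, secondaries) ordered by descending composite weight."""
    ranked = sorted(candidates, key=lambda c: composite_weight(c, now), reverse=True)
    return ranked[0], ranked[1:]
```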

[0089] In one possible implementation, S6 specifically includes: S601: Based on the unified feature space representation and data source fusion metadata, a conflict determination unit is constructed according to the fusion position.

[0090] It should be noted that the conflict resolution unit refers to a structured decision object constructed during the conflict reduction phase in order to systematically compare and adjudicate candidate fields or triples from different data sources at the same fusion location.

[0091] Specifically, using the determined fusion location as the organizational granularity, the unified feature space representations from different data sources at that location are aggregated, and the corresponding data source fusion metadata, including data source identifier, timestamp, and initial confidence value, are bound together to form a structured conflict determination unit.

[0092] S602: Perform multi-valued conflict identification on the conflict determination unit, determine the multi-valued conflicts, and generate a syntactic fusion candidate set based on the multi-valued conflicts.

[0093] S603: Perform syntactic fusion on the syntactic fusion candidate set to obtain a de-redundant field set.

[0094] Specifically, the field values in the syntactic fusion candidate set are standardized, including unifying capitalization, removing invalid characters, standardizing date or encoding formats, and converting numerical units. Based on this standardization, duplicate or equivalent expressions are identified through string matching or rule matching. Syntactically consistent field values are merged, retaining only one standardized representation, while recording the source information. Difference values that do not meet the semantic conflict criteria are temporarily retained, ultimately resulting in a de-redundant field set with duplicate expressions removed and formatted uniformly.
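The syntactic fusion of S603 can be illustrated with the sketch below. The specific normalization rules (lowercasing, whitespace collapsing, one date pattern, and the set of characters treated as invalid) are assumptions; unit conversion is omitted for brevity.

```python
import re


def normalize_value(value):
    text = str(value).strip().lower()
    text = re.sub(r"\s+", " ", text)
    # Dates written as YYYY/M/D, YYYY.M.D, or YYYY-M-D become YYYY-MM-DD.
    m = re.fullmatch(r"(\d{4})[./-](\d{1,2})[./-](\d{1,2})", text)
    if m:
        year, month, day = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    # Drop characters other than word characters, whitespace, '.' and '-'.
    return re.sub(r"[^\w\s.\-]", "", text)


def dedupe(candidates):
    """Merge syntactically equal values, recording the sources of each."""
    merged = {}
    for source, value in candidates:
        merged.setdefault(normalize_value(value), []).append(source)
    return merged
```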

[0095] S604: Identify homonym conflicts based on the de-redundant field set and generate a set of semantic adjudication pairs.

[0096] Specifically, based on the obtained de-redundant field set, the fields are grouped according to their names or predicate identifiers. Semantic comparison is performed on fields with the same or similar names but different sources. The semantic differences are determined by combining their unified feature space representation and contextual information. Field pairs that are determined to have potential semantic differences are organized into a set of semantic adjudication pairs, and the corresponding data source identifiers and feature representations are recorded for subsequent semantic similarity threshold determination and conflict reduction processing.

[0097] S605: Apply a semantic similarity threshold judgment strategy to the semantic adjudication pair set to reduce conflicts, and obtain a semantic reduction field set.

[0098] Optionally, the semantic similarity threshold determination strategy includes: for the two RDF triples in a semantic adjudication pair, generate their k-dimensional semantic vectors and calculate the cosine similarity between them. If the cosine similarity exceeds a preset threshold, one of the two triples is retained; otherwise, both are retained.

[0099] S606: Identify source-inconsistency conflicts based on the semantic reduction field set, and use the weighted rule determination strategy to adjudicate them and determine the conflict reduction field set.

[0100] It should be noted that the conflict reduction field set targets two categories: attribute value conflicts and relation assertion conflicts. Attribute value conflicts correspond to multi-source inconsistencies in the values of the `object` element in a triple, while relation assertion conflicts correspond to multi-source inconsistencies in the semantic references of the `predicate` or `object` element in a triple. For relation assertion conflicts, a semantic adjudication method consistent with the semantic similarity threshold determination strategy is used to determine whether the `predicate` / `object` should be merged as synonyms or allowed to coexist. The adjudication result is output to the conflict reduction field set in the form of a normalized triple, thus ensuring that the conflict reduction field set can serve as the basis for the subsequent steps.

[0101] Optionally, the weighting rule determination strategy may be based at least on the initial value of source credibility, timestamp freshness and business domain priority to weight and sort the candidate fields, and retain the field value with the highest weight or generate a set of master and slave fields.

[0102] In this embodiment of the invention, step S5 introduces an optimal transport alignment mechanism with the candidate fusion position as the smallest granularity to form a unified feature space representation, and step S6 constructs a dual-strategy conflict resolution process comprising the semantic similarity threshold determination strategy and the weighted rule determination strategy, thereby achieving unified reduction of conflicts such as multiple values, homonyms, and inconsistent sources.

[0103] S7: Input the conflict reduction field set into the preset multi-view intermediate deep fusion model according to the data source perspective, learn the marginal representations of each perspective and fuse them to obtain a joint representation, and generate a unified enterprise entity record.

[0104] Among them, the multi-view intermediate deep fusion model is a model structure within a deep learning framework that branches and models information from different data sources or feature perspectives, and then fuses them in the intermediate layers of the network. Its basic idea is: first, an independent feature extraction branch is constructed for each data source perspective, and marginal representations are learned for each perspective input separately. Then, in the intermediate layers of the network, the marginal representations of each perspective are concatenated, weighted, or fused with attention to learn a joint representation across perspectives. Finally, a unified entity model or output is completed based on the joint representation.

[0105] In one possible implementation, S7 specifically includes: S701: Split the conflict reduction field set into multiple perspective input branches according to the data source perspective, with each perspective input branch corresponding to a business domain data source group.

[0106] S702: Extract and encode features from each business domain data source group to obtain marginal representations from each perspective.

[0107] Marginal representation refers to the feature representation results learned for each independent perspective input in a multi-view or multi-data source fusion model. That is, the representation vector formed by the data of the perspective itself through the feature extraction network without considering information from other perspectives.

[0108] S703: Construct a multi-perspective weighted objective function based on each marginal representation, and determine the contribution weights of multiple perspectives under the Lagrange normalization constraint.

[0109] It should be noted that the multi-view weighted objective function and the Lagrange normalization constraint are existing technologies, and will not be elaborated upon here.

[0110] S704: Based on the contribution weights of each perspective, the marginal representations are weighted and fused to obtain a joint representation.
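The disclosure does not spell out the multi-view weighted objective. One commonly used form, given here purely as an assumed example, minimizes the sum of w_v^gamma * L_v over the views subject to the normalization constraint that the weights sum to 1, where L_v is a per-view loss and gamma > 1. Setting the derivative of the Lagrangian to zero yields weights proportional to (1 / L_v)^(1 / (gamma - 1)), which the sketch below implements together with the weighted fusion of S704. The names `view_weights` and `fuse` are hypothetical.

```python
def view_weights(losses, gamma=2.0):
    """Closed-form weights under the sum-to-one constraint (requires gamma > 1, losses > 0)."""
    raw = [(1.0 / loss) ** (1.0 / (gamma - 1.0)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(marginals, weights):
    """Weighted sum of the per-view marginal representation vectors."""
    dim = len(marginals[0])
    return [sum(w * m[i] for w, m in zip(weights, marginals)) for i in range(dim)]
```

Under this assumed objective, a view with a lower loss receives a larger contribution weight, and gamma controls how sharply the weights concentrate on the best view.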

[0111] S705: Generate unified enterprise entity records based on joint representation.

[0112] Specifically, the obtained joint representation vector is input into a preset attribute generation layer or classification regression layer to predict or select each target attribute. For categorical attributes, the optimal category result is output through the Softmax layer; for numerical attributes, the corresponding numerical estimate is output through the regression layer; and for relational attributes, the association between them and other entities is determined through the relation determination layer. Subsequently, the normalized attribute values and relation sets obtained from the prediction or determination are bound to the corresponding global entity identifier to generate a unified enterprise entity record containing the entity identifier, attribute set, and relation set.

[0113] It should be noted that a unified enterprise entity record includes at least a global entity identifier and a set of normalized attributes and relationships associated with the global entity identifier, which serve as input for the subsequent writing of fused RDF resources.

[0114] In this embodiment of the invention, the generated conflict reduction field set is input into the multi-view intermediate deep fusion model from the perspective of the data source. Under the Lagrange normalization constraint, the perspective contribution weights are determined and a joint representation is generated, thereby preventing conflict noise from entering the model learning process and improving the consistency of unified enterprise entity records.

[0115] S8: Using a global mapping table, the unified enterprise entity records are converted into fused RDF resources and written into the fusion result database. The candidate fusion location set and the data source fusion metadata are used as traceability indexes to establish a relationship with the fused RDF resources, and a unified query output fusion result is obtained.

[0116] Specifically, the generated unified enterprise entity records are read, and their attributes and relationships are normalized and mapped based on the global mapping table generated in step S3, converting them into RDF triples in the <Subject, Predicate, Object> format, where Subject is the global entity identifier, Predicate is the normalized global relation identifier, and Object is the normalized attribute value or associated entity identifier. The fused RDF triples are then written to the fusion result database. Simultaneously, the candidate fusion location set obtained in step S4 and the data source fusion metadata generated in step S1 are used as traceability indexes and associated with the corresponding fused RDF resources. This enables the original source and the fusion process to be traced while the fusion results are uniformly queried and output.
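A minimal sketch of S8 follows. It models the fusion result database as a Python set of tuples and the traceability index as a dictionary keyed by triple; the record layout and the names `record_to_triples` and `write_fused` are assumptions, and a real deployment would use an RDF store.

```python
def record_to_triples(record, mapping):
    """Convert a unified enterprise entity record into (S, P, O) tuples."""
    subject = record["global_id"]
    triples = [(subject, mapping.get(attr, attr), value)
               for attr, value in record["attributes"].items()]
    triples += [(subject, mapping.get(rel, rel), target)
                for rel, target in record["relations"]]
    return triples


def write_fused(store, trace_index, record, mapping, fusion_positions, source_meta):
    """Write fused triples and associate each with its traceability entry."""
    for triple in record_to_triples(record, mapping):
        store.add(triple)
        trace_index[triple] = {
            "fusion_positions": fusion_positions,
            "source_metadata": source_meta,
        }
```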

[0117] In this embodiment of the invention, by converting unified enterprise entity records into fused RDF resources based on a global mapping table and writing them into the fusion result library, centralized storage and standardized expression of cross-source data under a unified semantic framework can be achieved, thereby supporting unified query output oriented towards global entity identifiers. Simultaneously, by establishing a connection between the candidate fusion location set and the data source fusion metadata as a traceability index and the fused RDF resources, the fusion results at the output layer possess the capabilities of traceable source, reproducible path, and interpretable adjudication process. This effectively solves the problems of difficulty in tracing the source of fusion results, difficulty in defining responsibility, and difficulty in data auditing in existing technologies, improving the transparency and credibility of multi-source enterprise data fusion systems.

[0118] According to the embodiments disclosed herein, the present invention also provides an electronic device and a readable storage medium.

[0119] Figure 2 shows a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the invention disclosed herein. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention disclosed herein and / or claimed.

[0120] As shown in Figure 2, device 800 includes a computing unit 801, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 802 or a computer program loaded from storage unit 808 into random access memory (RAM) 803. RAM 803 may also store various programs and data required for the operation of device 800. The computing unit 801, ROM 802, and RAM 803 are interconnected via bus 804. Input / output (I / O) interface 805 is also connected to bus 804.

[0121] Multiple components in device 800 are connected to I / O interface 805, including: input unit 806, such as keyboard, mouse, etc.; output unit 807, such as various types of monitors, speakers, etc.; storage unit 808, such as disk, optical disk, etc.; and communication unit 809, such as network card, modem, wireless transceiver, etc. Communication unit 809 allows device 800 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0122] The computing unit 801 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as a conflict-reduction-based multi-source enterprise data fusion method. For example, in some embodiments, the conflict-reduction-based multi-source enterprise data fusion method can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and / or installed on device 800 via ROM 802 and / or communication unit 809. When the computer program is loaded into RAM 803 and executed by the computing unit 801, one or more steps of the conflict-reduction-based multi-source enterprise data fusion method described above can be performed. Alternatively, in other embodiments, computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform a conflict reduction-based multi-source enterprise data fusion method.

[0123] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0124] Program code for implementing the methods disclosed in this invention may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on the machine, partly on the machine, as a standalone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

[0125] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0126] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0127] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0128] Computer systems can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. A server can be a cloud server, a server in a distributed system, or a server incorporating blockchain technology.

[0129] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed in this respect.

[0130] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this disclosure, "a plurality of" means two or more, unless otherwise explicitly specified.

[0131] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims

1. A multi-source enterprise data fusion method based on conflict reduction, characterized in that the method includes:
S1: Acquire enterprise data from multiple heterogeneous data sources and generate data source fusion metadata for the enterprise data;
S2: Process the enterprise data using natural language processing and structured mapping techniques to construct a local RDF resource set;
S3: Perform cross-source ontology mapping on the local RDF resource set to generate a global mapping table between the global ontology and the local ontology;
S4: Based on the global mapping table, perform RDF graph indexing and subgraph matching on the local RDF resource set to determine the candidate fusion position set;
S5: Extract the structured field features and text semantic features contained in the cross-source candidate fusion content corresponding to the candidate fusion position set, and perform vectorized encoding to form a unified feature space representation;
S6: Based on the unified feature space representation and the data source fusion metadata, identify conflicts such as multi-value conflicts, homonym conflicts, and source inconsistency conflicts, and use a semantic similarity threshold determination strategy and a weight rule determination strategy to determine the conflict reduction field set;
S7: Input the conflict reduction field set, organized by data source perspective, into the preset multi-perspective mid-term deep fusion model, learn the marginal representation of each perspective, fuse the marginal representations to obtain a joint representation, and generate a unified enterprise entity record;
S8: Using the global mapping table, convert the unified enterprise entity record into a fused RDF resource and write it into the fusion result database; use the candidate fusion position set and the data source fusion metadata as traceability indexes associated with the fused RDF resource, to obtain a fusion result for unified query output.

2. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that the data source fusion metadata specifically includes: a data source identifier, a data source timestamp, a data source field, a data source text location, a data source access level, and an initial value of data source credibility.
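The metadata fields listed in claim 2 could be carried as one record per extracted value. The following is a minimal sketch only: the class name, the field names, the example values, and the assumption that credibility lies in [0, 1] are illustrative and are not specified by the claims.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FusionMetadata:
    """One metadata record per extracted value (all names are hypothetical)."""
    source_id: str        # data source identifier
    timestamp: datetime   # data source timestamp
    field_name: str       # data source field
    text_location: str    # where the value sits in the source text or table
    access_level: int     # data source access level
    credibility: float    # initial credibility; the [0, 1] range is an assumption

    def __post_init__(self):
        if not 0.0 <= self.credibility <= 1.0:
            raise ValueError("credibility must lie in [0, 1]")


meta = FusionMetadata(
    source_id="registry",
    timestamp=datetime(2025, 1, 15, tzinfo=timezone.utc),
    field_name="registered_capital",
    text_location="row=42,col=7",
    access_level=2,
    credibility=0.9,
)
```

Keeping the record immutable makes it safe to reuse the same object later as a traceability index entry.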

3. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S2 specifically includes:
S201: Divide the enterprise data into structured data, semi-structured data, and unstructured data according to data format, and generate, for each category of data, an RDF assembly unit that uses triples as its basic carrier form;
S202: Perform R2RML mapping on the structured data to generate structured RDF sub-resources;
S203: Perform natural language processing extraction on the semi-structured data and the unstructured data to generate text RDF sub-resources;
S204: Merge the structured RDF sub-resources and the text RDF sub-resources at the triple level, and construct an RDF graph model from the merged triple set to form a local RDF graph resource that can be used for subsequent indexing and matching;
S205: Based on the local RDF graph resource, generate a semantic vector annotation of a preset dimension for each RDF triple, and write the annotation back to the corresponding triple as an annotation attribute to obtain a local RDF resource with semantic vector annotations;
S206: Encapsulate the local RDF resources by data source to construct the local RDF resource set.
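Steps S202 to S204 can be illustrated with plain Python tuples standing in for RDF triples. This is a rough sketch, not an R2RML engine: the namespace `BASE`, the helper names, the column-to-predicate map, and the sample row are all hypothetical, and the text triples are written by hand where an NLP extractor would normally produce them.

```python
BASE = "http://example.org/"  # hypothetical namespace


def row_to_triples(row, subject_key, predicate_map):
    """Map one relational row to (subject, predicate, object) triples,
    mimicking what an R2RML triples map would declare."""
    subject = f"{BASE}company/{row[subject_key]}"
    return {
        (subject, BASE + predicate, str(row[column]))
        for column, predicate in predicate_map.items()
        if row.get(column) is not None  # NULL columns yield no triple
    }


def merge_triples(*triple_sets):
    """Union of triple sets; identical triples collapse automatically."""
    merged = set()
    for triples in triple_sets:
        merged |= triples
    return merged


def build_graph(triples):
    """Adjacency view of the graph: subject -> list of (predicate, object)."""
    graph = {}
    for s, p, o in sorted(triples):
        graph.setdefault(s, []).append((p, o))
    return graph


row = {"id": "c001", "name": "Acme Ltd", "capital": 5000000, "phone": None}
predicate_map = {"name": "legalName", "capital": "registeredCapital", "phone": "phone"}
structured = row_to_triples(row, "id", predicate_map)

# Stand-in for triples extracted from text by an NLP step.
text = {
    (BASE + "company/c001", BASE + "legalName", "Acme Ltd"),
    (BASE + "company/c001", BASE + "locatedIn", "Hangzhou"),
}

merged = merge_triples(structured, text)
graph = build_graph(merged)
```

Because triples are stored in a set, the `legalName` triple that appears in both sub-resources is kept once, which is the simplest form of triple-level merging.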

4. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S3 specifically includes:
S301: Extract the local ontology elements corresponding to each data source in the local RDF resource set to form a set of elements to be mapped;
S302: Perform similarity detection, including concept similarity, attribute similarity, relationship similarity, and data format similarity, on the set of elements to be mapped, and generate a set of candidate mapping pairs;
S303: Merge the local ontology items in the candidate mapping pair set that express the same meaning to obtain a global ontology candidate set;
S304: Perform conflict resolution mapping on local ontology items in the candidate mapping pair set that have the same or similar names but different meanings, map each such local ontology item to a different global ontology identifier, and output a set of conflict differentiation rules;
S305: Introduce an expert knowledge mapping channel to complete the mapping pairs that cannot be stably determined through similarity detection, and format the completed mappings and the automatic mapping results in a unified manner to obtain expert mapping results;
S306: Based on the global ontology candidate set, the conflict differentiation rule set, and the expert mapping results, generate the global mapping table between the global ontology and the local ontology.
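The similarity detection of S302 might be sketched as a weighted score over several signals. The example below uses only two of them, a name similarity from Python's `difflib` and an exact datatype match, as stand-ins for the four similarities named in the claim. The weights, the threshold, and the sample ontology elements are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def element_similarity(x, y, w_name=0.6, w_type=0.4):
    """Weighted mix of name similarity and data format (datatype) agreement."""
    type_score = 1.0 if x["datatype"] == y["datatype"] else 0.0
    return w_name * name_similarity(x["name"], y["name"]) + w_type * type_score


def candidate_pairs(source_a, source_b, threshold=0.75):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for x in source_a:
        for y in source_b:
            score = element_similarity(x, y)
            if score >= threshold:
                pairs.append((x["name"], y["name"], round(score, 3)))
    return pairs


crm = [
    {"name": "company_name", "datatype": "string"},
    {"name": "capital", "datatype": "decimal"},
]
registry = [
    {"name": "CompanyName", "datatype": "string"},
    {"name": "capital", "datatype": "string"},
]

pairs = candidate_pairs(crm, registry)
```

In this data the two `capital` elements share a name but not a datatype, so they stay below the threshold. Under the claim, a pair like that would be a natural input for the conflict resolution mapping of S304 or the expert channel of S305 rather than an automatic merge.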

5. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S4 specifically includes:
S401: Based on the global mapping table, normalize and rewrite the local RDF triples to generate matching index units;
S402: Based on the set of triples corresponding to the matching index units, construct an RDF graph model;
S403: Partition the RDF graph model by theme to generate multiple non-overlapping node clusters;
S404: Using the node clusters as leaf nodes, merge them in pairs from the bottom up to generate parent nodes, and record in each parent node the index description information obtained by aggregating its leaf nodes, to form a tree index structure that reduces the retrieval space;
S405: Calculate the target distance between the matching index unit and each tree index node in the tree index structure, compare the target distance with the radius recorded by the tree index node, and determine the subgraph covered by any tree index node whose target distance is less than the radius as the candidate retrieval subgraph range;
S406: Perform two-stage subgraph matching within the candidate retrieval subgraph range to determine the candidate fusion position set.
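The bottom-up tree index of S404 and the radius test of S405 resemble a ball-tree style structure. The sketch below assumes each node cluster is summarized by embedding points with a Euclidean distance, and that the index description information is a center plus a radius; the claims do not fix either choice, and all names and coordinates are hypothetical.

```python
import math


def centroid(points):
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def make_leaf(name, points):
    """Leaf = one node cluster, described by its center and covering radius."""
    c = centroid(points)
    return {"name": name, "center": c,
            "radius": max(math.dist(c, p) for p in points), "children": []}


def make_parent(left, right):
    """Parent ball is sized so that it encloses both child balls."""
    c = centroid([left["center"], right["center"]])
    radius = max(math.dist(c, ch["center"]) + ch["radius"] for ch in (left, right))
    return {"name": left["name"] + "+" + right["name"], "center": c,
            "radius": radius, "children": [left, right]}


def build_tree(leaves):
    """Merge nodes pairwise, level by level, until a single root remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = [make_parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is carried up unchanged
        level = nxt
    return level[0]


def search(node, query, hits=None):
    """Collect leaves whose distance to the query is less than their radius."""
    hits = [] if hits is None else hits
    if math.dist(query, node["center"]) >= node["radius"]:
        return hits  # prune this whole subtree
    if not node["children"]:
        hits.append(node["name"])
    for child in node["children"]:
        search(child, query, hits)
    return hits


finance = make_leaf("finance", [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)])
legal = make_leaf("legal", [(10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)])
root = build_tree([finance, legal])

near_hits = search(root, (1.5, 1.0))
far_hits = search(root, (50.0, 50.0))
```

Sizing each parent radius to enclose its children means that pruning at a parent never discards a leaf that would have passed the radius test on its own.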

6. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S5 specifically includes:
S501: Extract the cross-source candidate fusion content corresponding to the candidate fusion position set, and organize the cross-source candidate fusion content by data source to form structured fields and text semantic fragments for subsequent feature extraction;
S502: Perform vectorized encoding on the structured fields to generate structured features for each data source;
S503: Perform vectorized encoding on the text semantic fragments to generate text semantic features for each data source;
S504: Concatenate the structured features and the text semantic features according to fusion position to form a cross-source candidate fusion feature representation;
S505: Using an optimal transport method, align and map the cross-source candidate fusion feature representation to form the unified feature space representation.
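Assuming the alignment in S505 refers to optimal transport in the usual sense, one common way to realize it is entropy-regularized transport solved with Sinkhorn iterations, followed by a barycentric projection of one source's features onto another's. The claims do not name a solver, so the algorithm choice, the uniform marginals, the regularization value, and the toy vectors below are all assumptions.

```python
import math


def sinkhorn(cost, reg=0.5, iterations=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def squared_distance(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y))


def align(source, target, reg=0.5):
    """Project each source vector to the plan-weighted mean of target vectors."""
    cost = [[squared_distance(s, t) for t in target] for s in source]
    plan = sinkhorn(cost, reg)
    aligned = []
    for row in plan:
        mass = sum(row)
        aligned.append(tuple(
            sum(w * t[d] for w, t in zip(row, target)) / mass
            for d in range(len(target[0]))
        ))
    return plan, aligned


source = [(0.0, 0.0), (1.0, 1.0)]   # features from one data source
target = [(0.1, 0.0), (1.1, 1.0)]   # features from another data source
plan, aligned = align(source, target)
```

For large costs relative to `reg`, the kernel entries can underflow to zero, so a production version would typically work in the log domain; that refinement is omitted here for brevity.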

7. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S6 specifically includes:
S601: Based on the unified feature space representation and the data source fusion metadata, construct a conflict determination unit for each fusion position;
S602: Perform multi-value conflict identification on the conflict determination unit to determine multi-value conflicts, and generate a syntax fusion candidate set based on the multi-value conflicts;
S603: Perform syntax fusion on the syntax fusion candidate set to obtain a set of redundant fields;
S604: Based on the set of redundant fields, identify homonym conflicts and generate a set of semantic adjudication pairs;
S605: Apply the semantic similarity threshold determination strategy to the set of semantic adjudication pairs to reduce conflicts, and obtain a set of semantic reduction fields;
S606: Identify source inconsistency conflicts based on the set of semantic reduction fields, and adjudicate them using the weight rule determination strategy to determine the conflict reduction field set.
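The three reduction stages of claim 7 can be sketched with simple stand-ins: a formatting normalizer for syntax fusion, a cosine similarity threshold for homonym adjudication, and a weighted vote for source inconsistency. The normalization rule, the threshold of 0.85, the `freshness` signal, and the weights are hypothetical; the claims only state that a threshold strategy and a weight rule strategy are used.

```python
import math


def syntactic_key(value):
    """Collapse formatting variants of the same value (syntax fusion stand-in)."""
    return value.replace(",", "").strip().lower()


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def same_meaning(vec_a, vec_b, threshold=0.85):
    """Semantic similarity threshold test for a pair of same-named fields."""
    return cosine(vec_a, vec_b) >= threshold


def resolve_by_weight(candidates, w_cred=0.7, w_fresh=0.3):
    """Pick the value whose supporting sources carry the highest total weight."""
    scores = {}
    for c in candidates:
        key = syntactic_key(c["value"])
        weight = w_cred * c["credibility"] + w_fresh * c["freshness"]
        scores[key] = scores.get(key, 0.0) + weight
    return max(scores, key=scores.get), scores


candidates = [
    {"value": "8,000,000", "credibility": 0.5, "freshness": 0.9},
    {"value": "8000000", "credibility": 0.4, "freshness": 0.8},
    {"value": "5,000,000", "credibility": 0.9, "freshness": 0.4},
]
winner, scores = resolve_by_weight(candidates)

similar = same_meaning((1.0, 0.0), (0.9, 0.1))
different = same_meaning((1.0, 0.0), (0.0, 1.0))
```

Here two moderately credible but recent sources agree once their formatting is normalized, and together they outweigh the single highly credible but stale source.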

8. The multi-source enterprise data fusion method based on conflict reduction according to claim 1, characterized in that S7 specifically includes:
S701: Split the conflict reduction field set into multiple perspective input branches according to data source perspective, each perspective input branch corresponding to a business domain data source group;
S702: Extract and encode features from each business domain data source group to obtain a marginal representation for each perspective;
S703: Construct a multi-view weighted objective function based on the marginal representations, and determine the contribution weight of each view under the Lagrange normalization constraint;
S704: Based on the contribution weight of each perspective, perform weighted fusion of the marginal representations to obtain a joint representation;
S705: Based on the joint representation, generate the unified enterprise entity record.
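The claims do not give the form of the multi-view weighted objective in S703. One common choice, assumed here purely for illustration, is to minimize the sum over views of w_v^gamma times a per-view loss, subject to the weights summing to 1. Setting the derivative of the Lagrangian to zero gives w_v proportional to loss_v^(-1/(gamma-1)), which the sketch below computes before fusing the marginal representations as in S704. The loss values, `gamma`, and the toy vectors are hypothetical.

```python
def view_weights(losses, gamma=2.0):
    """Closed-form minimizer of sum_v w_v**gamma * loss_v with sum_v w_v == 1.

    Assumes every loss is strictly positive and gamma > 1.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1")
    raw = [loss ** (-1.0 / (gamma - 1.0)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(marginals, weights):
    """Weighted sum of per-view marginal representations (joint representation)."""
    dims = len(marginals[0])
    return [sum(w * m[d] for w, m in zip(weights, marginals)) for d in range(dims)]


losses = [0.2, 0.4, 0.8]                            # one loss per view
marginals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # one vector per view

weights = view_weights(losses)
joint = fuse(marginals, weights)
```

With `gamma` equal to 2 the weights are inversely proportional to the losses, so the view with the lowest loss contributes most to the joint representation.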

9. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the conflict reduction-based multi-source enterprise data fusion method according to any one of claims 1-8.

10. A non-transitory computer-readable storage medium having stored thereon computer instructions, wherein the computer instructions are used to cause a computer to execute the conflict reduction-based multi-source enterprise data fusion method according to any one of claims 1-8.