A knowledge graph construction method and system based on data fusion
By constructing traditional databases, graph databases, and deep learning environments to process structured and unstructured data, the problems of limited data volume and low accuracy in knowledge graphs are solved, and high-precision knowledge graph construction is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2022-10-26
- Publication Date
- 2026-06-12
AI Technical Summary
In existing knowledge graph construction, the limited amount of data, low accuracy, and outdated and distorted data result in low information value and make it difficult to meet the thematic mining and analysis needs of specific users or application scenarios.
By constructing traditional databases, graph databases, and deep learning environments, structured and unstructured data are filtered and processed. Deep learning models are used for data annotation, preprocessing, and disambiguation to achieve data fusion and ultimately generate accurate knowledge graphs.
It generates accurate and continuously updated knowledge graphs, improving data volume and accuracy, and meeting the information needs of specific users.
Figure CN115618016B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and in particular to a method and system for constructing a knowledge graph based on data fusion.
Background Technology
[0002] The timeliness of information collection and acquisition, the accuracy of analysis and processing, and the effectiveness of distribution and application directly impact the effectiveness of national strategic planning and the defense and military system. Currently, the international internet is highly integrated with political, economic, social, and military fields. Strategic plans, research reports, and policy recommendations released by government departments, organizations, and expert think tanks, as well as open-source information accessible to anyone through portals, social media, and online media, have become important sources of information for various countries. Currently, even encyclopedia data and related online public account data already contain a large amount of knowledge related to specific fields. While the collection, acquisition, filtering, processing, subscription, and distribution of this open-source information can be efficiently completed using automated methods, thematic mining and analysis targeting specific users or application scenarios still mainly relies on manual labor. The strong or weak correlations existing in the information data are easily overlooked by analysts, directly resulting in low information value and limited usability. Therefore, how to utilize open-source data to construct a highly structured knowledge graph has become an urgent research topic.
[0003] In the past, knowledge graph construction often extracted information only from structured data, ignoring the vast amount of unstructured data. Current knowledge graph construction often applies information extraction techniques directly, without taking the existing structured data into account, which easily introduces a large amount of erroneous or contradictory information into the constructed knowledge graph.
[0004] Therefore, providing a data fusion-based knowledge graph construction method and system that can effectively solve the problems of limited data volume, low accuracy, and outdated and distorted data in the knowledge graph construction process is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to provide a knowledge graph construction method based on data fusion. This method has clear logic, is simple to operate, and can effectively solve the technical problems of limited data volume and low accuracy in existing knowledge graph construction. The system also has the same beneficial effects.
[0006] Based on the above objectives, the technical solution provided by the present invention is as follows:
[0007] A knowledge graph construction method based on data fusion includes the following steps:
[0008] S1. Construct traditional databases, graph databases, and deep learning environments;
[0009] S2. Obtain the object from the traditional database;
[0010] S3. Filter the objects to obtain first data and second data, and store them respectively;
[0011] S4. Obtain third data based on key information from the public account;
[0012] S5. Process the second data and the third data to generate the fourth data and the fifth data;
[0013] S6. Process the fourth data and the fifth data to generate and store the sixth data;
[0014] S7. Process the sixth data and the first data to generate a knowledge graph;
[0015] S8. Repeat steps S2 to S7 until the knowledge graph is complete.
[0016] Preferably, step S2 specifically comprises:
[0017] The traditional database is filtered according to preset keywords to obtain the object.
[0018] Preferably, step S3 includes the following steps:
[0019] The objects are filtered by means of a programming language to obtain the first data and the second data;
[0020] The first data is stored in the graph database in a first format;
[0021] The second data is stored in the traditional database.
[0022] Preferably, step S4 specifically comprises:
[0023] Obtain key information from the public account;
[0024] The third data is obtained by filtering each public account based on the public account information and the preset keywords.
[0025] Preferably, before step S5, the method further includes:
[0026] Define entity type, entity relationship, and entity attribute respectively.
[0027] Preferably, step S5 specifically comprises:
[0028] Based on the defined entity type, entity relationship, and entity attribute, the second data and the third data are labeled to generate the fourth data and the fifth data;
[0029] The fourth data is a labeled dataset, and the fifth data is an unlabeled dataset.
[0030] Preferably, step S6 includes the following steps:
[0031] Training the fourth data;
[0032] Preprocess the fifth data;
[0033] Based on the defined entity type, entity relationship, and entity attribute, extraction is performed using the trained fourth data and the preprocessed fifth data to generate the sixth data, which is stored in the graph database.
[0034] Preferably, step S7 includes the following steps:
[0035] Disambiguation is performed on the sixth data;
[0036] The sixth data after disambiguation and the first data are fused to generate a knowledge graph.
[0037] A knowledge graph construction system based on data fusion includes:
[0038] Modules for building traditional databases, graph databases, and deep learning environments;
[0039] The acquisition module is used to acquire objects from the traditional database.
[0040] The acquisition module is also used to obtain third data based on key information from the public account;
[0041] A filtering module is used to filter the objects to obtain first data and second data;
[0042] A storage module is used to store the first data and the second data respectively;
[0043] The storage module is also used to store sixth data;
[0044] A processing module is used to process the second data and the third data to generate the fourth data and the fifth data;
[0045] The processing module is also used to process the fourth data and the fifth data to generate the sixth data;
[0046] The processing module is also used to process the sixth data and the first data to generate a knowledge graph.
[0047] Preferably, it further includes:
[0048] The definition module is used to define entity types, entity relationships, and entity attributes;
[0049] The processing module includes: an annotation submodule, a training submodule, a preprocessing submodule, an extraction submodule, a disambiguation submodule, and a data fusion submodule;
[0050] The annotation submodule is used to annotate the second data and the third data;
[0051] The training submodule is used to train the fourth data;
[0052] The preprocessing submodule is used to preprocess the fifth data;
[0053] The extraction submodule is used to extract the trained fourth data and the preprocessed fifth data;
[0054] The disambiguation submodule is used to disambiguate the sixth data;
[0055] The data fusion submodule is used to fuse the disambiguated sixth data and the first data to generate a knowledge graph.
[0056] This invention provides a knowledge graph construction method based on data fusion. It involves constructing a traditional database, a graph database, and a deep learning environment; acquiring objects from the traditional database; selecting and storing first and second data from the objects; acquiring third data from a public account; processing the second and third data to generate fourth and fifth data; processing the fourth and fifth data again to generate and store sixth data; and processing the sixth data and the first data to generate the knowledge graph. The first, second, third, and sixth data are continuously updated until the knowledge graph is complete.
[0057] This method integrates traditional databases, graph databases, and public account data to construct a continuously updated knowledge graph. During the construction process, erroneous or irrelevant information is filtered out, and the data accuracy is improved through three processing steps. The accurate and massive knowledge graph can provide visitors with precise, effective, and novel relevant knowledge.
[0058] This invention also provides a knowledge graph construction system based on data fusion. This system and the method share the same technical concept, and therefore this system should have the same beneficial effects as the method. It will not be described in detail here.
Attached Figure Description
[0059] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0060] Figure 1 is a flowchart of a knowledge graph construction method based on data fusion provided in an embodiment of the present invention;
[0061] Figure 2 is a flowchart of step S6 provided in an embodiment of the present invention;
[0062] Figure 3 is a flowchart of step S7 provided in an embodiment of the present invention;
[0063] Figure 4 is a schematic diagram of a knowledge graph construction system based on data fusion provided in an embodiment of the present invention;
[0064] Figure 5 is a schematic diagram of the specific structure of a knowledge graph construction system based on data fusion provided in an embodiment of the present invention.
Detailed Implementation
[0065] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0066] The embodiments of this invention are written in a progressive manner.
[0067] This invention provides a knowledge graph construction method based on data fusion. It primarily addresses the technical problems in existing knowledge graph construction processes, such as limited data volume, low accuracy, and outdated or distorted data.
[0068] To aid in understanding this invention, embodiments are illustrated using the construction of a military knowledge graph as an example, as detailed below:
[0069] As shown in Figure 1, a knowledge graph construction method based on data fusion includes the following steps:
[0070] S1. Construct traditional databases, graph databases, and deep learning environments;
[0071] S2. Retrieve objects from traditional databases;
[0072] S3. Filter the objects to obtain the first data and the second data, and store them separately;
[0073] S4. Obtain third data based on key information from the public account;
[0074] S5. Process the second and third data to generate the fourth and fifth data;
[0075] S6. Process the fourth and fifth data to generate the sixth data and store it;
[0076] S7. Process the sixth and first data to generate a knowledge graph;
[0077] S8. Repeat steps S2 to S7 until the knowledge graph is complete.
[0078] In step S1, traditional databases, graph databases, and deep learning environments such as TensorFlow are constructed. The traditional databases and graph databases are ordinary open-source databases.
[0079] In step S2, military knowledge objects are retrieved from a traditional database;
[0080] In step S3, the first data (i.e., structured data) and the second data (i.e., unstructured data) are selected from the military knowledge objects. The first data is stored in the graph database, and the second data is stored in the traditional database.
[0081] In step S4, key information from military knowledge-related public accounts is obtained, and third data (i.e., unstructured data from public accounts) is obtained based on this key information.
[0082] In step S5, the second and third data are processed, that is, the unstructured data needs to be integrated and classified to generate the fourth and fifth data.
[0083] In step S6, the fourth and fifth data are processed respectively, that is, the unstructured data is processed to generate the sixth data (processed structured data).
[0084] In step S7, the sixth data is stored in the graph database and processed together with the first data to generate a military knowledge graph;
[0085] In step S8, steps S2 to S7 are repeated to continuously add new content to the knowledge graph, ensuring that the amount of data increases, the accuracy is high, and the data is not outdated or distorted, until the knowledge graph is perfected.
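The repetition in step S8 can be sketched as a simple loop. This is a minimal illustration only: the patent does not state when the knowledge graph counts as complete, so the stopping rule used here (stop when a round adds no new triples, or after `max_rounds`) is an assumption, and `extract_round` is a hypothetical callback standing in for one pass through steps S2 to S7.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation or attribute, tail)


def build_graph(extract_round: Callable[[int], Set[Triple]],
                max_rounds: int = 10) -> Set[Triple]:
    """Repeat one S2-S7 pass per round and accumulate the resulting triples.

    Assumed stopping rule: stop as soon as a round contributes nothing new.
    """
    graph: Set[Triple] = set()
    for round_no in range(max_rounds):
        new_triples = extract_round(round_no) - graph
        if not new_triples:
            break  # nothing new was learned in this round
        graph |= new_triples
    return graph
```

In a real deployment the loop would more likely run on a schedule so that newly published entries and articles keep flowing into the graph.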
[0086] Preferably, step S2 specifically includes:
[0087] Filter traditional databases using preset keywords to obtain objects.
[0088] In practical applications, SQL is used to filter entries in the traditional database, retrieving encyclopedia entries that contain keywords such as "fighter jet" and "submarine" and thereby constructing the set of potential military knowledge objects. The preset keywords can be concrete categories of military knowledge familiar to the general public, such as aircraft, weapons, radar, and electronic communication equipment. The retrieved objects are publicly available texts such as articles, patents, and news items that contain these keywords.
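A minimal sketch of such a keyword filter, assuming the traditional database is SQLite and that entries live in a hypothetical table `entries(id, title, body)`; neither the database product nor the table layout is given in the patent.

```python
import sqlite3


def find_candidate_entries(conn, keywords):
    """Return (id, title) of entries whose title or body contains any keyword."""
    if not keywords:
        return []
    # One "title LIKE ? OR body LIKE ?" pair per keyword, joined with OR.
    clause = " OR ".join("title LIKE ? OR body LIKE ?" for _ in keywords)
    params = []
    for kw in keywords:
        pattern = f"%{kw}%"
        params += [pattern, pattern]
    sql = f"SELECT id, title FROM entries WHERE {clause} ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

The keywords are passed as bound parameters rather than pasted into the SQL string, so a keyword containing quotes cannot break the query.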
[0089] Preferably, step S3 includes the following steps:
[0090] Filter objects by means of a programming language to obtain the first and second data;
[0091] Store the first data in the graph database in the first format;
[0092] The second data is stored in a traditional database.
[0093] In practical application, web crawler scripts written in languages such as Python are used to scrape all content from encyclopedia entries targeting all potential military knowledge objects, thereby obtaining open-source encyclopedia data. The content of each entry is saved in a first format (JSON). In this format, the research and development unit and service date are used as attribute names in the JSON; the specific research and development unit name and service date value are used as attribute values. This constitutes the first data (structured encyclopedia data), which is directly saved to the graph database using the graph database's interface functions. In addition, the content obtained by the crawler also includes a large amount of descriptive second data (unstructured encyclopedia data), such as technical characteristics and power systems. For each entry, the text of this unstructured data is saved separately to a traditional database for subsequent processing.
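The split between first data and second data can be sketched as follows. The key names `development_unit` and `service_date`, the `name` key, and the shape of the crawled entry are illustrative assumptions; the patent only states that the development unit and service date serve as JSON attribute names.

```python
import json

# Hypothetical keys treated as structured attributes (first data).
STRUCTURED_FIELDS = ("development_unit", "service_date")


def split_entry(entry):
    """Split one crawled entry into structured JSON and unstructured text fields."""
    first = {
        "name": entry["name"],
        "attributes": {k: entry[k] for k in STRUCTURED_FIELDS if k in entry},
    }
    # Everything else is descriptive free text (second data).
    second = {k: v for k, v in entry.items()
              if k not in STRUCTURED_FIELDS and k != "name"}
    return json.dumps(first, ensure_ascii=False), second
```

The JSON string would then go to the graph database through its own interface functions, while the free-text fields are written back to the traditional database for later annotation.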
[0094] Preferably, step S4 specifically includes:
[0095] Obtain key information from the public account;
[0096] The third data is obtained by filtering each public account based on the public account information and the preset keywords.
[0097] In practical application, it is necessary to register a WeChat Official Account, log in to the account on the web page, select the public account to be crawled, and click "Search" to obtain key information such as the user-agent, URL, cookie, token, and fakeid. This information is then used by a web crawler to obtain the name and URL of each article published by that account. The article content is then retrieved through the article's URL using the previously written crawler script. This third data (unstructured data) is stored in the traditional database under the article name.
[0098] Preferably, before step S5, the method further includes:
[0099] Define entity type, entity relationship, and entity attribute respectively.
[0100] Before processing the second and third data (unstructured data), entity types need to be defined, categorized hierarchically as aircraft, weapons, radar, electronic communication equipment, electronic countermeasures equipment, etc. For entity relationship extraction, relationships such as system and device, system deployment unit, and system-equipped weapons are defined. For attribute value extraction, common attribute values such as development time, Chinese name, English name, country, and development unit are defined.
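One way to pin these definitions down before annotation starts is a small schema object. The identifier spellings below (for example `AIRCRAFT` or `system_has_device`) are illustrative renderings of the categories named above, not names taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    entity_types: tuple
    relation_types: tuple
    attribute_names: tuple

    def bio_labels(self):
        """Label set for sequence tagging: "O" plus B-/I- for each entity type."""
        labels = ["O"]
        for entity_type in self.entity_types:
            labels += [f"B-{entity_type}", f"I-{entity_type}"]
        return labels


SCHEMA = Schema(
    entity_types=("AIRCRAFT", "WEAPON", "RADAR", "COMM_EQUIPMENT", "ECM_EQUIPMENT"),
    relation_types=("system_has_device", "system_deployed_by_unit",
                    "system_equipped_with_weapon"),
    attribute_names=("development_time", "chinese_name", "english_name",
                     "country", "development_unit"),
)
```

Keeping the schema in one place means the annotation configuration, the extraction models, and the graph database all agree on the same label names.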
[0101] Preferably, step S5 specifically includes:
[0102] Based on the defined entity types, entity relationships, and entity attributes, the second and third data are labeled to generate the fourth and fifth data.
[0103] The fourth data set is the labeled dataset, and the fifth data set is the unlabeled dataset.
[0104] In practical application, brat is used to annotate entities in the second and third data, producing an .ann file that records the position of each entity in the text and the relationship types between entities. The text is then labeled with BIO annotation; that is, a BIO-annotated text file is generated from the .ann file produced by brat and the original text file. When annotating relationships, each line in the file describes a single statement. Specifically, each line contains the statement expanded into a word-by-word array, the start and end positions of the two entities, and a textual description of the relationship between the entities.
[0105] It should be noted that brat is an annotation tool used in the knowledge graph construction process. BIO annotation labels each element as "B-X", "I-X", or "O". "B-X" indicates that the element belongs to a segment of type X and is at the beginning of that segment; "I-X" indicates that the element belongs to a segment of type X and is inside that segment rather than at its beginning; and "O" indicates that the element does not belong to any type.
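The conversion from brat output to BIO labels can be sketched as below. It assumes brat's standoff layout for entity lines (an ID starting with `T`, then a tab, then `Type start end`, then a tab and the covered text) and tags one character per element, which suits Chinese text; discontinuous spans and relation lines are simply skipped in this sketch.

```python
def parse_ann_entities(ann_text):
    """Collect (start, end, label) spans from the entity lines of a brat .ann file."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations, attributes, and notes are ignored here
        _ann_id, info, _surface = line.split("\t", 2)
        if ";" in info:
            continue  # discontinuous span, not handled in this sketch
        label, start, end = info.split(" ")
        spans.append((int(start), int(end), label))
    return spans


def to_bio(text, spans):
    """Return (character, tag) pairs with B-X / I-X / O tags."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(text, tags))
```

Writing each pair on its own line, with a blank line between statements, yields the usual column-style BIO text file consumed by sequence-labeling trainers.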
[0106] As shown in Figure 2, preferably, step S6 includes the following steps:
[0107] A1. Train the fourth data;
[0108] A2. Preprocess the fifth data;
[0109] A3. Based on the defined entity types, entity relationships, and entity attributes, perform extraction using the trained fourth data and the preprocessed fifth data to generate the sixth data, and store it in the graph database.
[0110] In practical applications, the fourth data (labeled dataset) is fed into a pre-trained model in the deep learning environment for training, yielding the trained fourth data. The remaining large amount of unlabeled second and third data is preprocessed to remove statements that cannot be parsed or decoded, resulting in the preprocessed fifth data (unlabeled dataset). Subsequently, knowledge extraction is performed using entity extraction models, relation extraction models, and attribute extraction models.
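The preprocessing step might look like the following sketch. It assumes statements arrive either as raw bytes or as already decoded strings and that UTF-8 is the expected encoding; the whitespace normalization is an extra convenience, not something the patent specifies.

```python
def preprocess(raw_statements):
    """Drop statements that cannot be decoded and normalize whitespace in the rest."""
    cleaned = []
    for raw in raw_statements:
        try:
            text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        except UnicodeDecodeError:
            continue  # undecodable statement: discard it
        text = " ".join(text.split())  # collapse runs of whitespace
        if text:
            cleaned.append(text)
    return cleaned
```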
[0111] The specific implementation is as follows. The entity extraction model consists of a first input layer, a first BERT layer, a BiLSTM layer, and a linear-chain CRF layer. The BERT + BiLSTM + CRF model operates as follows: first, the text is input into the BERT pre-trained language model to obtain the corresponding word vectors; then the word vectors are input into the BiLSTM for further processing; finally, the BiLSTM outputs are input into the CRF for decoding to obtain a label sequence. Relation extraction proceeds roughly as follows: first, the input to the BERT model is adjusted to obtain the features of the entities and the sentence; then the pre-labeled data is used for fine-tuning to obtain the required word vectors; the word vectors are then fed into a fully connected layer followed by a Softmax layer for relation classification. Finally, the sixth data (the triple data set generated by extraction) is obtained and stored in the graph database.
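The CRF decoding step at the end of the entity extraction model can be illustrated in isolation. The patent does not name a decoding algorithm; Viterbi decoding is the standard choice for a linear-chain CRF and is used here as an assumption. The emission scores stand in for BiLSTM outputs and, like the transition scores, are made-up numbers for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[t][j]: score of label j at position t (e.g. from the BiLSTM).
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at this position.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

With labels ordered as O, B, I and a strongly negative O-to-I transition, the decoder avoids the invalid sequence "O I" even when the per-position scores alone would pick it, which is the practical reason for placing a CRF on top of the BiLSTM.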
[0112] It is important to note that BERT stands for Bidirectional Encoder Representations from Transformers, a pre-trained language representation model. Rather than pre-training with a traditional unidirectional language model or a shallow concatenation of two unidirectional models, it uses a masked language model (MLM) objective to produce deep bidirectional language representations. BiLSTM is an abbreviation for Bi-directional Long Short-Term Memory and is composed of a forward LSTM and a backward LSTM. LSTM stands for Long Short-Term Memory, a type of RNN (Recurrent Neural Network). A conditional random field (CRF) is a conditional probability distribution model (i.e., a discriminative model) that, given a set of input random variables, assumes the output random variables form a Markov random field.
[0113] As shown in Figure 3, preferably, step S7 includes the following steps:
[0114] B1. Disambiguate the sixth data;
[0115] B2. Perform data fusion on the disambiguated sixth data and the first data to generate a knowledge graph.
[0116] In practical applications, the triple data sets obtained from entity extraction and relation extraction can be directly saved into the graph database using the graph database interface functions after data disambiguation and fusion.
[0117] Data disambiguation refers to the process of disambiguating semantic / lexical meanings in data. It's important to note that semantic / lexical disambiguation is a core and challenging aspect of natural language processing tasks, impacting the performance of almost all tasks, such as search engines, opinion mining, text understanding and generation, and reasoning. Throughout the long development of linguistics, language itself has accumulated many polysemous uses. The emergence of language is the result of multiple factors. Language usage is constantly evolving; a word may have many specific meanings during its development, and some meanings are still commonly used today. Different regions may have different usages of a word, different industries may use it differently, and even different groups, individuals, and tones of voice may have their own unique interpretations. Semantic disambiguation is a method of language understanding. On the one hand, we need to understand the polysemous meanings and applications of commonly used words; on the other hand, we must consider specific scenarios and utilize relevant knowledge bases and corpora for training to improve the performance of polysemous word understanding.
[0118] The data disambiguation method used in this embodiment may be a dictionary-based method that uses knowledge base and knowledge graph technology, or a supervised, unsupervised, or semi-supervised learning method based on words or word vectors.
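A dictionary-based variant can be sketched as a lookup from known aliases to canonical entity names. The alias dictionary, the lower-casing rule, and the example names are assumptions for illustration; in practice the dictionary would be derived from a knowledge base.

```python
def disambiguate(triples, alias_to_canonical):
    """Map entity mentions to canonical names using an alias dictionary.

    alias_to_canonical keys are expected in lower case; unknown mentions
    are kept unchanged (apart from trimming surrounding whitespace).
    """
    resolved = set()
    for head, relation, tail in triples:
        head = alias_to_canonical.get(head.strip().lower(), head.strip())
        tail = alias_to_canonical.get(tail.strip().lower(), tail.strip())
        resolved.add((head, relation, tail))
    return resolved
```

Because the result is a set, two triples that differ only in how they spell the same entity collapse into one after resolution.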
[0119] Data fusion technology includes the collection, transmission, integration, filtering, correlation and synthesis of useful information from various information sources, so as to assist people in situation/environment judgment, planning, detection, verification and diagnosis. Data fusion technology provides an important data processing technology foundation for advanced combat management and C3I systems. Data fusion plays an important processing and coordination role in multi-information source, multi-platform and multi-user systems, ensuring the connectivity and timely communication between the data processing system units and the aggregation center, and enabling many functions that were originally performed by military operators and intelligence analysts to be completed automatically by the data processing system quickly, accurately and effectively.
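In the narrower sense of step B2, fusion means merging the disambiguated sixth data with the first data. The sketch below assumes the first data is a mapping from entity name to attributes and the sixth data is a set of triples; the conflict rule (keep the structured value when both sources give one) is an assumption, chosen because the structured encyclopedia data is the more reliable source.

```python
def fuse(first_data, sixth_triples, attribute_names):
    """Merge structured attributes with extracted triples into nodes and edges.

    first_data: {entity_name: {attribute: value}} from the structured source.
    sixth_triples: iterable of (head, relation_or_attribute, tail).
    """
    nodes = {name: dict(attrs) for name, attrs in first_data.items()}
    edges = set()
    for head, relation, tail in sixth_triples:
        node = nodes.setdefault(head, {})
        if relation in attribute_names:
            # Assumed rule: an existing structured value wins on conflict.
            node.setdefault(relation, tail)
        else:
            nodes.setdefault(tail, {})
            edges.add((head, relation, tail))
    return nodes, edges
```

The resulting nodes and edges can then be written to the graph database through its interface functions.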
[0120] As shown in Figure 4, a knowledge graph construction system based on data fusion includes:
[0121] Modules for building traditional databases, graph databases, and deep learning environments;
[0122] The acquisition module is used to acquire objects from the traditional database;
[0123] The acquisition module is also used to obtain third data based on key information from the public account;
[0124] The filtering module is used to filter objects to obtain first data and second data.
[0125] The storage module is used to store the first data and the second data respectively;
[0126] The storage module is also used to store sixth data;
[0127] The processing module is used to process the second and third data to generate the fourth and fifth data;
[0128] The processing module is also used to process the fourth and fifth data to generate the sixth data;
[0129] The processing module is also used to process the sixth and first data to generate a knowledge graph.
[0130] In practical applications, the knowledge graph construction system based on data fusion includes a construction module, an acquisition module, a filtering module, a storage module, and a processing module. During operation, the construction module constructs traditional databases, graph databases, and a deep learning environment; the acquisition module retrieves objects from the traditional database; the acquisition module retrieves third data from key information in public accounts; the filtering module filters objects to obtain first and second data; the storage module stores the first and second data respectively; the storage module also stores a sixth data; the processing module calls and processes the second and third data to generate fourth and fifth data; the processing module further processes the fourth and fifth data to generate the sixth data; and the processing module performs a third processing on the sixth and first data to generate the knowledge graph.
[0131] As shown in Figure 5, preferably, it also includes:
[0132] The definition module is used to define entity types, entity relationships, and entity attributes;
[0133] The processing module includes: an annotation submodule, a training submodule, a preprocessing submodule, an extraction submodule, a disambiguation submodule, and a data fusion submodule;
[0134] The annotation submodule is used to annotate the second and third data.
[0135] The training submodule is used to train the fourth data;
[0136] The preprocessing submodule is used to preprocess the fifth data.
[0137] The extraction submodule is used to extract the trained fourth data and the preprocessed fifth data;
[0138] The disambiguation submodule is used to disambiguate the sixth data.
[0139] The data fusion submodule is used to fuse the disambiguated sixth data and the first data to generate a knowledge graph.
[0140] In practical applications, the knowledge graph construction system based on data fusion also includes a definition module, which defines entity types, entity relationships, and entity attributes. The processing module includes annotation, training, preprocessing, extraction, disambiguation, and data fusion submodules. These submodules respectively implement the steps of annotation, training, preprocessing, extraction, disambiguation, and data fusion, ultimately generating a knowledge graph.
[0141] In the embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple modules or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, and can be electrical, mechanical, or other forms.
[0142] Furthermore, in the various embodiments of the present invention, each functional module can be fully integrated into a processor, or each module can be a separate device, or two or more modules can be integrated into a device; each functional module in the various embodiments of the present invention can be implemented in hardware or in the form of hardware plus software functional units.
[0143] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by program instructions and related hardware. The aforementioned program instructions can be stored in a computer-readable storage medium. When the program instructions are executed, they perform the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as mobile storage devices, read-only memory (ROM), magnetic disks, or optical disks.
[0144] It should be understood that the use of terms such as "system," "device," "unit," and / or "module" in this application is merely one method of distinguishing different components, elements, parts, sections, or assemblies at different levels. However, if other terms can achieve the same purpose, they may be replaced by other expressions.
[0145] As indicated in this application and claims, unless the context clearly indicates otherwise, the words "a," "an," and/or "the" do not specifically denote the singular and may include the plural. Generally, the terms "comprising" and "including" only indicate the inclusion of expressly identified steps and elements, which do not constitute an exclusive list, and the method or apparatus may also include other steps or elements. An element defined by the phrase "comprising an..." does not exclude the presence of other identical elements in the process, method, product, or apparatus that includes the element.
[0146] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
[0147] If a flowchart is used in this application, it is used to illustrate the operations performed by the system according to embodiments of this application. It should be understood that the preceding or following operations are not necessarily performed in exact order. Instead, the steps can be processed in reverse order or simultaneously. Furthermore, other operations can be added to these processes, or one or more steps can be removed from them.
[0148] The foregoing has provided a detailed description of a knowledge graph construction method based on data fusion provided by the present invention. The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A knowledge graph construction method based on data fusion, characterized in that it comprises the following steps:
S1. Construct a traditional database, a graph database, and a deep learning environment;
S2. Obtain objects from the traditional database;
S3. Filter the objects to obtain first data and second data, and store them respectively;
S4. Obtain third data based on key information of public accounts;
S5. Process the second data and the third data to generate fourth data and fifth data;
S6. Process the fourth data and the fifth data to generate and store sixth data;
S7. Process the sixth data and the first data to generate a knowledge graph;
S8. Repeat steps S2 to S7 until the knowledge graph is complete;
wherein step S2 specifically comprises: filtering the traditional database according to preset keywords to obtain the objects;
before step S5, the method further comprises: defining entity types, entity relationships, and entity attributes respectively;
step S5 specifically comprises: annotating the second data and the third data based on the defined entity types, entity relationships, and entity attributes to generate the fourth data and the fifth data, wherein the fourth data is an annotated dataset and the fifth data is an unannotated dataset;
step S6 comprises the following steps: training on the fourth data; preprocessing the fifth data; and, based on the defined entity types, entity relationships, and entity attributes, performing extraction on the trained fourth data and the preprocessed fifth data to generate the sixth data, which is stored in the graph database;
step S7 comprises the following steps: performing disambiguation on the sixth data; and fusing the disambiguated sixth data with the first data to generate the knowledge graph.
2. The knowledge graph construction method based on data fusion as described in claim 1, characterized in that step S3 comprises the following steps: filtering the objects according to the programming language to obtain the first data and the second data; storing the first data in the graph database in a first format; and storing the second data in the traditional database.
3. The knowledge graph construction method based on data fusion as described in claim 1, characterized in that step S4 specifically comprises: obtaining key information of the public accounts; and filtering each public account based on the public account information and the preset keywords to obtain the third data.
4. A knowledge graph construction system based on data fusion, characterized in that it comprises:
a module for constructing a traditional database, a graph database, and a deep learning environment;
an acquisition module, used to acquire objects from the traditional database, specifically by filtering the traditional database according to preset keywords to acquire the objects; the acquisition module is also used to obtain third data based on key information of public accounts;
a filtering module, used to filter the objects to obtain first data and second data;
a storage module, used to store the first data and the second data respectively; the storage module is also used to store sixth data;
a processing module, used to process the second data and the third data to generate fourth data and fifth data; the processing module is also used to process the fourth data and the fifth data to generate the sixth data, and to process the sixth data and the first data to generate a knowledge graph;
and further comprises a definition module, used to define entity types, entity relationships, and entity attributes respectively;
wherein the processing module comprises an annotation submodule, a training submodule, a preprocessing submodule, an extraction submodule, a disambiguation submodule, and a data fusion submodule;
the annotation submodule is used to annotate the second data and the third data according to the defined entity types, entity relationships, and entity attributes to generate the fourth data and the fifth data, wherein the fourth data is an annotated dataset and the fifth data is an unannotated dataset;
the training submodule is used to train on the fourth data;
the preprocessing submodule is used to preprocess the fifth data;
the extraction submodule is used to perform extraction on the trained fourth data and the preprocessed fifth data according to the defined entity types, entity relationships, and entity attributes to generate the sixth data and store it in the graph database;
the disambiguation submodule is used to disambiguate the sixth data;
the data fusion submodule is used to fuse the disambiguated sixth data with the first data to generate the knowledge graph.