Method and device for constructing knowledge graph, electronic equipment and storage medium
A knowledge-graph and storage technology, applied to the creation of semantic tools, electronic digital data processing, unstructured text data retrieval, and so on. It solves the problems of low efficiency, high design cost, and poor versatility, and achieves low construction cost, strong versatility, and high efficiency.
Examples
Embodiment 1
[0094] This embodiment takes the construction of a knowledge graph in the biological field as an example. Documents such as papers and journal articles in the biological field are collected as the first corpus, and the training data of the word vector model is used as the second corpus. Some documents in the first and second corpora may be the same.
[0095] In this embodiment, the knowledge-graph construction process, as shown in Figure 3, includes the following steps S210-S230:
[0096] S210. Mine new words from the first corpus as professional vocabulary in the biological field, and store them in the first thesaurus.
[0097] S220. Use the second corpus to train the Word2Vec model to obtain a word vector for each professional word in the first thesaurus.
[0098] S230. Establish a knowledge graph in the biological field from all the professional words and their word vectors in the first thesaurus.
Embodiment 2
[0100] This embodiment details the process of mining professional vocabulary from the first corpus, which, as shown in Figure 4, includes steps 401-406:
[0101] 401. Load an existing second thesaurus; the loaded second thesaurus may include only a general vocabulary, or may include an existing specialized vocabulary for the biological field in addition to the general vocabulary.
[0102] 402. Perform word segmentation on the first corpus to obtain a first word sequence, and use every two adjacent words in the first word sequence as a candidate word.
[0103] 403. Calculate the left information entropy, right information entropy and mutual information value for each candidate word.
[0104] 404. For each candidate word, sum the left information entropy, right information entropy and mutual information value of the candidate word, and use the obtained result as the score of the candidate word;
[0105] 405. Sort all the candidate words in descending order of score...
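Steps 402-405 above can be sketched in Python. The toy word sequence, the helper name `mine_candidates`, and the use of natural-log entropy and pointwise mutual information are illustrative assumptions, not details fixed by this embodiment:

```python
import math
from collections import Counter, defaultdict

def mine_candidates(words):
    """Score each adjacent-word pair (step 402) by the sum of its
    left information entropy, right information entropy, and mutual
    information (steps 403-404), then sort by score (step 405)."""
    n = len(words)
    word_freq = Counter(words)
    pair_freq = Counter(zip(words, words[1:]))

    # Collect the words appearing immediately left/right of each candidate.
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(n - 1):
        pair = (words[i], words[i + 1])
        if i > 0:
            left[pair][words[i - 1]] += 1
        if i + 2 < n:
            right[pair][words[i + 2]] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum((c / total) * math.log(c / total)
                    for c in counter.values()) if total else 0.0

    total_pairs = n - 1
    scores = {}
    for pair, freq in pair_freq.items():
        # Mutual information: log p(xy) / (p(x) * p(y))
        p_xy = freq / total_pairs
        p_x = word_freq[pair[0]] / n
        p_y = word_freq[pair[1]] / n
        mi = math.log(p_xy / (p_x * p_y))
        scores[pair] = entropy(left[pair]) + entropy(right[pair]) + mi

    # Step 405: rank candidates by score, descending.
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy pre-segmented first word sequence (illustrative only).
words = ["gene", "expression", "regulates", "gene", "expression",
         "in", "cell", "gene", "expression", "level"]
ranked = mine_candidates(words)
```

On this toy input, the recurring pair ("gene", "expression") scores highest because it is both frequent (high mutual information) and appears in varied contexts (high left/right entropy), which is exactly the signal steps 403-404 combine.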
Embodiment 3
[0111] This embodiment details the process of using the second corpus to train the Word2Vec model, which, as shown in Figure 5, includes the following steps 501-504:
[0112] 501. Acquire the first thesaurus and the second thesaurus.
[0113] 502. Load the first thesaurus and the second thesaurus.
[0114] 503. Perform word segmentation on the second corpus to obtain the second word sequence. In this embodiment, jieba word segmentation is used to obtain the second word sequence; other word segmentation tools or algorithms may also be used, and this application does not limit this.
[0115] 504. Use the second word sequence to train the Word2Vec model to obtain a word vector for each professional word in the first thesaurus.
[0116] In this embodiment, a Word2Vec model with the CBOW structure is used for training. In the CBOW-structured Word2Vec model, each neuron in the input layer receives a different word from the context of the center word...
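In practice this embodiment would train a standard Word2Vec implementation on the jieba-segmented second word sequence. Below is only a minimal NumPy sketch of the CBOW idea the paragraph describes, namely averaging the context-word vectors to predict the center word; the toy corpus, hyperparameters, and full-softmax training loop are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

# Toy tokenized "second word sequence" (illustrative, not from the patent).
sentence = ["cell", "membrane", "protein", "cell", "membrane",
            "channel", "cell", "membrane", "protein"]
vocab = sorted(set(sentence))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window = len(vocab), 8, 1  # vocab size, vector size, context window

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output (center-word) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# CBOW: the hidden layer is the mean of the context-word vectors,
# and the model is trained to predict the center word from it.
lr = 0.1
for epoch in range(200):
    for t in range(window, len(sentence) - window):
        ctx = [idx[sentence[t + o]]
               for o in range(-window, window + 1) if o != 0]
        center = idx[sentence[t]]
        h = W_in[ctx].mean(axis=0)            # average context vectors
        p = softmax(h @ W_out)                # predicted center-word dist.
        grad = p.copy()
        grad[center] -= 1.0                   # cross-entropy gradient
        W_out -= lr * np.outer(h, grad)
        W_in[ctx] -= lr * (W_out @ grad) / len(ctx)

# After training, each row of W_in is the word vector for one word.
vec = W_in[idx["membrane"]]
```

The trained rows of `W_in` play the role of the word vectors obtained in step 504; in the embodiment, the vectors corresponding to the professional words in the first thesaurus would then be carried into the knowledge-graph construction step.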