
Method and device for constructing knowledge graph, electronic equipment and storage medium

A knowledge graph and storage-medium technology, applied in the creation of semantic tools, electronic digital data processing, unstructured text data retrieval, etc. It addresses the problems of low efficiency, high design cost, and poor versatility, and achieves low construction cost, strong versatility, and high efficiency.

Pending Publication Date: 2021-09-28
ZHUHAI KINGSOFT OFFICE SOFTWARE +1

AI Technical Summary

Problems solved by technology

[0008] However, having domain experts design the knowledge graph is very costly. The basic attributes of nodes / edges designed by the demanders and domain experts apply only to a particular scenario or field; when the scenario / field changes, the original design cannot be reused and must be redone, so the above method of generating knowledge graphs has poor versatility and low efficiency. In addition, the cost of manually labeling data is very high.



Examples


Embodiment 1

[0094] This embodiment takes the construction of a knowledge graph for the biological field as an example. Documents such as papers and journal articles in the biological field are collected as the first corpus; the training data of the word vector model is used as the second corpus. Some documents in the first and second corpora may be the same.

[0095] In this embodiment, the knowledge graph construction process is shown in Figure 3 and includes the following steps S210-S230:

[0096] S210. Mine new words from the first corpus as professional vocabulary in the biological field, and store them in the first thesaurus.

[0097] S220. Use the second corpus to train the Word2Vec model to obtain a word vector for each professional word in the first thesaurus.

[0098] S230. Establish a knowledge graph in the biological field according to all professional words and their word vectors in the first the...

Embodiment 2

[0100] This embodiment illustrates the process of mining professional vocabulary from the first corpus. As shown in Figure 4, it includes steps 401-406:

[0101] 401. Load an existing second thesaurus. The loaded second thesaurus may include only a general thesaurus, or may also include an existing specialized thesaurus for the biological field in addition to the general thesaurus.

[0102] 402. Perform word segmentation on the first corpus to obtain a first word sequence, and use every two adjacent words in the first word sequence as a candidate word.

[0103] 403. Calculate the left information entropy, right information entropy and mutual information value for each candidate word.

[0104] 404. For each candidate word, sum its left information entropy, right information entropy and mutual information value, and use the result as the score of the candidate word.

[0105] 405. Sort all the candidate words according to the descend...
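Steps 402-404 can be illustrated with a minimal sketch, under several assumptions. The function names (`entropy`, `score_candidates`, `rank_candidates`) are hypothetical, the "mutual information value" is computed here as pointwise mutual information with natural logarithms (the excerpt does not give the formula), and the final sort by descending score is only our reading of the truncated step 405. The input is assumed to be an already segmented word sequence.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def score_candidates(words):
    """Score each adjacent word pair: left entropy + right entropy + PMI."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))  # step 402: adjacent pairs
    left = defaultdict(Counter)   # candidate -> counts of the preceding word
    right = defaultdict(Counter)  # candidate -> counts of the following word
    for i in range(len(words) - 1):
        cand = (words[i], words[i + 1])
        if i > 0:
            left[cand][words[i - 1]] += 1
        if i + 2 < len(words):
            right[cand][words[i + 2]] += 1

    n_uni = len(words)
    n_bi = max(len(words) - 1, 1)
    scores = {}
    for cand, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[cand[0]] / n_uni
        p_y = unigrams[cand[1]] / n_uni
        pmi = math.log(p_xy / (p_x * p_y))  # assumed form of "mutual information"
        # Step 404: the score is the plain sum of the three quantities.
        scores[cand] = entropy(left[cand]) + entropy(right[cand]) + pmi
    return scores


def rank_candidates(words):
    """Return candidates ordered from highest to lowest score (assumed order)."""
    scores = score_candidates(words)
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up word sequence purely for illustration.
sample = ["gene", "expression", "in", "cell", "gene", "expression",
          "of", "protein", "gene", "expression", "and"]
ranking = rank_candidates(sample)
```

A pair that recurs with varied neighbours on both sides and a high co-occurrence ratio scores well, which is the intuition behind treating it as a candidate new word.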

Embodiment 3

[0111] This embodiment describes the process of using the second corpus to train the Word2Vec model. As shown in Figure 5, it includes the following steps 501-504:

[0112] 501. Acquire the first thesaurus and the second thesaurus.

[0113] 502. Load the first thesaurus and the second thesaurus.

[0114] 503. Perform word segmentation on the second corpus to obtain the second word sequence. This embodiment uses the jieba word segmentation tool; other word segmentation tools or algorithms can also be used, and this application does not limit this.

[0115] 504. Use the second word sequence to train the Word2Vec model to obtain a word vector of each professional word in the first thesaurus.

[0116] In this embodiment, the Word2Vec model with CBOW structure is used for training. In the Word2Vec model of the CBOW structure, each neuron in the input layer inputs different words in the context of the cen...
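To make the CBOW input/output arrangement concrete, the sketch below builds the training examples a CBOW-style model would consume: the context words around a position form the input and the center word is the prediction target. The function name `cbow_pairs` and the `window` parameter are hypothetical, and the window size is an assumption, since the excerpt does not state one. This shows only the data preparation, not the network or its training.

```python
def cbow_pairs(words, window=2):
    """Return (context_words, center_word) examples for CBOW-style training.

    For each position, the context is up to `window` words on each side,
    and the center word is what the model would be asked to predict.
    """
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        context = words[lo:i] + words[i + 1:i + 1 + window]
        if context:  # skip positions that have no neighbours at all
            pairs.append((context, center))
    return pairs


# Toy segmented sequence; real input would be the second word sequence.
examples = cbow_pairs(["a", "b", "c", "d"], window=1)
# examples == [(["b"], "a"), (["a", "c"], "b"), (["b", "d"], "c"), (["c"], "d")]
```

In practice an off-the-shelf implementation would normally be used instead of hand-built pairs; for instance, gensim's `Word2Vec` class selects CBOW with `sg=0`.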


Abstract

The invention provides a method and device for constructing a knowledge graph, electronic equipment and a computer storage medium. The method for constructing the knowledge graph comprises the following steps: determining nodes of the knowledge graph according to first vocabularies contained in a first word library; obtaining word vectors of the first vocabularies based on a word vector model, with the word vector model being obtained through pre-training in an unsupervised training mode; and according to the relationship between the word vectors of the first vocabularies, determining nodes for establishing edges in the knowledge graph, and establishing edges for the determined nodes. According to the method and the device, the knowledge graph can be automatically constructed without manual participation, the efficiency is high, the universality is high, and the construction cost is low.
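The abstract's final step, choosing which nodes to connect from the relationship between their word vectors, can be sketched as follows. This is only one possible reading: the excerpt does not say which relationship is used, so cosine similarity with a fixed threshold is an assumption, as are the names `cosine` and `build_graph`, the threshold value, and the toy vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(word_vectors, threshold=0.8):
    """Use every word as a node; link pairs whose similarity meets the threshold.

    The similarity measure and threshold are illustrative assumptions,
    not details taken from the patent text.
    """
    nodes = sorted(word_vectors)
    edges = []
    for a, b in combinations(nodes, 2):
        sim = cosine(word_vectors[a], word_vectors[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return nodes, edges


# Made-up 2-dimensional vectors purely for illustration.
toy_vectors = {
    "enzyme": [1.0, 0.0],
    "protein": [0.9, 0.1],
    "river": [0.0, 1.0],
}
nodes, edges = build_graph(toy_vectors, threshold=0.8)
```

With these toy vectors only "enzyme" and "protein" are linked; a lower threshold would produce a denser graph.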

Description

Technical field

[0001] The present application relates to the field of computer information, and in particular to a method, device, electronic equipment and storage medium for constructing a knowledge graph.

Background technique

[0002] At present, knowledge graphs are increasingly used in intelligent question answering systems, search recommendation and other fields. A knowledge graph can be regarded as a semantic network graph that describes various concepts in the real world and the connections or relationships between them. It includes multiple nodes and edges between nodes: a node represents a concept / entity in the real world, and an edge between nodes represents the relationship between the corresponding concepts / entities.

[0003] In related technologies, the process of generating a knowledge graph includes the following steps 101-104:

[0004] 101. The basic attributes of nodes and edges in the knowledge graph...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/332, G06F40/284, G06F40/289
CPC: G06F16/367, G06F16/3329, G06F40/284, G06F40/289
Inventor: 潘云嵩
Owner: ZHUHAI KINGSOFT OFFICE SOFTWARE