Joint keyword generation method and system based on semantics and a knowledge graph

The invention applies knowledge graph and keyword technology to a joint keyword generation method and system based on semantics and a knowledge graph. It addresses the inability of earlier methods to extract keywords from clickbait ("headline party") articles, the difficulty of controlling annotation cost and labeling time, and losses in accuracy, coverage, and diversity.

Pending Publication Date: 2020-04-24
过群
8 cites, 6 cited by

AI Technical Summary

Problems solved by technology

[0005] Analysis of the above-mentioned published patent document reveals the following disadvantages. Selecting candidate keywords from a "preset vocabulary" requires additional manual work. The method depends heavily on the assumption that the body text and the title are semantically similar, so it cannot handle keyword extraction for clickbait ("headline party") articles or untitled articles. When computing semantic relevance, it uses attention weights to relate candidate words to each character in the title, and it lacks an understanding of overall and topic-level semantics, which may lead to insufficient coverage of the extracted keywords. It is a supervised method (sequence labeling) whose training corpus is built from the semantic similarity between title and document; this reduces the annotation workload but also costs accuracy, coverage, and diversity. Finally, the process is complicated, the LSTM neural network it uses is difficult to parallelize, prediction is slow, and it does not solve the tendency to extract high-frequency words and redundant keywords.
[0007] Analysis of the above-mentioned published patent document reveals the following disadvantages. The semantic model it adopts relies on manual labeling; the amount of labeling required is relatively high, and the cost and labeling time are difficult to control. The method is based on a graph model, so its time complexity is high and computation is slow. The paragraph and position information it uses depends on document structure, so it generalizes poorly to keyword extraction for short texts. It is based on word frequency and inverse document frequency information, and cannot solve the problem. It also relies on a large number of manually summarized features, so its scalability is limited; it is suited only to travel documents and lacks generality.

Examples

Embodiment 1

[0127] As shown in Figures 1-7, a joint keyword generation method based on semantics and a knowledge graph comprises a preparation phase, training phase 1, training phase 2, and an inference phase.

[0128] The preparation phase comprises the following steps:

[0129] S11: According to the characteristics of the target domain, manually construct the corresponding knowledge graph M (for example as a tree, an undirected graph, or a directed graph) to represent the structural relationships among the required keywords (such as inclusion and causality);

[0130] S12: Prepare a domain-related, unlabeled Chinese corpus of more than 1 million sentences. The corpus must contain every word in the label graph M, and each word must occur at least 50 times;

[0131] S13: Install the sentence vector toolkit sent2vec;
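
Steps S11 and S12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the triple representation, the example keywords, and the helper names `build_graph`, `graph_vocabulary`, and `underrepresented` are all hypothetical, and the corpus is assumed to be already segmented into word tokens.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph M as (head, relation, tail) triples.
# The relation names mirror the two examples in S11: inclusion and causality.
EDGES = [
    ("金融", "includes", "股票"),
    ("金融", "includes", "债券"),
    ("加息", "causes", "债券"),
]


def build_graph(edges):
    """Directed adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in edges:
        graph[head].append((relation, tail))
    return graph


def graph_vocabulary(edges):
    """All keywords that appear as a node of the graph."""
    vocab = set()
    for head, _relation, tail in edges:
        vocab.update((head, tail))
    return vocab


def underrepresented(vocab, tokenized_sentences, min_count=50):
    """Map each graph word seen fewer than min_count times to its count (S12)."""
    counts = Counter(
        token
        for sentence in tokenized_sentences
        for token in sentence
        if token in vocab
    )
    return {word: counts[word] for word in vocab if counts[word] < min_count}
```

An empty result from `underrepresented` means the corpus satisfies the occurrence requirement of S12; the 1-million-sentence requirement would be checked separately on the corpus size.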

[0132] Training phase 1 comprises the following steps:

[0133] S21: clean the prepared corpus, remove special characters, programming language (such as html statemen...
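
Step S21 is cut off in this excerpt, so the exact cleaning rules are not known. The sketch below only illustrates the general idea of stripping markup and special characters from a sentence. The regular expressions, the set of characters kept, and the function name `clean_sentence` are assumptions rather than details from the patent.

```python
import html
import re

# Assumption: anything between angle brackets is treated as markup.
TAG_RE = re.compile(r"<[^>]+>")
# Assumption: keep CJK ideographs, ASCII letters and digits, spaces,
# and common Chinese and ASCII punctuation; replace everything else.
KEEP_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。！？；：、,.!?;: ]")
SPACE_RE = re.compile(r"\s+")


def clean_sentence(text):
    """Strip markup and special characters from one raw sentence."""
    text = TAG_RE.sub(" ", text)    # drop HTML tags
    text = html.unescape(text)      # decode entities such as &nbsp;
    text = KEEP_RE.sub(" ", text)   # replace remaining special characters
    return SPACE_RE.sub(" ", text).strip()
```

For instance, `clean_sentence("<p>今天&nbsp;股市★大涨！</p>")` yields `今天 股市 大涨！`.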

Embodiment 2

[0188] As shown in Figures 1-7, a joint keyword generation system based on semantics and a knowledge graph comprises a preparation unit, training unit 1, training unit 2, and an inference unit.

[0189] The preparation unit is used to install the sentence vector toolkit, and performs the following steps:

[0190] According to the characteristics of the target domain, manually construct the corresponding knowledge graph M (for example as a tree, an undirected graph, or a directed graph) to represent the structural relationships among the required keywords (such as inclusion and causality);

[0191] Prepare a domain-related, unlabeled Chinese corpus of more than 1 million sentences. The corpus must contain every word in the label graph M, and each word must occur at least 50 times;

[0192] Install the sentence vector toolkit sent2vec;

[0193] Training unit 1 is used to train the sent2vec model, and performs the following steps:

[0194] Clean the prepared corpus, remove special chara...

Abstract

The invention relates to a joint keyword generation method and system based on semantics and a knowledge graph. The joint keyword generation method comprises a preparation stage, training stage 1, training stage 2, and an inference stage. The preparation stage comprises the following steps: according to the characteristics of the application field, manually constructing a corresponding knowledge graph M (such as a tree, an undirected graph, or a directed graph) to represent the structural relationships (such as inclusion and causality) among the required keywords; preparing an unlabeled Chinese corpus related to the field; and installing a sentence vector toolkit (such as Python 3.5 and sent2vec). The method offers high accuracy, high semantic relevance, high semantic divergence, and wide coverage, and it combines the keyword extraction, keyword assignment, and keyword generation methods so that they complement one another better.

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence, and in particular relates to a method and system for jointly generating keywords based on semantics and a knowledge graph.

Background art

[0002] With the explosive growth of Internet text data, businesses often need to extract keywords that summarize the core ideas of articles, in order to achieve accurate recommendation and improve reading efficiency. This type of task is characterized by highly subjective standards and by the difficulty of obtaining usable annotated corpora, which leads to low accuracy for traditional methods and high computation time for some methods. In related technologies, keywords can be obtained in two ways: keyword extraction (for words that appear in the text) and keyword generation (for words that do not appear in the text). The main methods of keyword extraction are: statistics-based methods, grap...

Application Information

IPC(8): G06F16/33, G06F16/332, G06F16/35, G06F16/36, G06F40/30
CPC: G06F16/3329, G06F16/3344, G06F16/35, G06F16/367, Y02D10/00
Inventor: 过群, 朱郑州, 朱若飞
Owner: 过群