Target keyword extraction system

A keyword extraction technology based on a keyword library, applied in the computer field. It addresses the problem that extraction accuracy cannot be guaranteed, and it reduces the amount of calculation, improves efficiency and accuracy, and has wide application value.

Active Publication Date: 2021-12-07
BEIJING YUCHEN SHIMEI SCI & TECH

AI Technical Summary

Problems solved by technology

Existing approaches have at least the following disadvantage: words that are not keywords but occur very often may appear in the document, such as "的" in Chinese or prepositions in English. Even if some of these words are removed, determining the target keyword from word frequency alone cannot guarantee accuracy.

Examples

Embodiment approach 1

[0047] Step S5 may specifically include: sorting the second candidate keywords in the second candidate keyword set by their distance from the center point, from nearest to farthest, and determining the first M second candidate keywords as the target keywords.
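
The ranking in this embodiment can be sketched in Python as follows. This is only an illustrative sketch: the source does not say how the center point or the distance is computed, so the code assumes the center point is the mean of the word vectors in the second candidate keyword set and that distance is Euclidean. The function name and the `vectors` mapping are hypothetical.

```python
import math

def top_m_by_center_distance(vectors, m):
    """Return the m keywords whose vectors lie nearest the center point.

    vectors: dict mapping keyword -> word vector (list of floats).
    Assumption: the center point is the mean of all vectors and the
    distance is Euclidean; the source specifies neither.
    """
    n = len(vectors)
    # Column-wise mean of all vectors gives the assumed center point.
    center = [sum(col) / n for col in zip(*vectors.values())]
    # Sort keywords from nearest to farthest from the center.
    ranked = sorted(vectors, key=lambda kw: math.dist(vectors[kw], center))
    return ranked[:m]

# Toy vectors: the center is at x = 11/3, so "b" is nearest, then "a".
example = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [10.0, 0.0]}
print(top_m_by_center_distance(example, 2))  # ['b', 'a']
```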

Embodiment approach 2

[0049] Step S5 may specifically include: acquiring, for each second candidate keyword in the second candidate keyword set, its word frequency in the document to be processed, and determining the second candidate keywords whose word frequencies rank in the top M as the target keywords.

[0050] It should be noted that in step S5 the target keywords are further determined by word frequency on the basis of the second candidate keyword set. On the one hand, the second candidate keywords are already keywords of the professional field and therefore have a certain degree of accuracy. On the other hand, compared with counting the word frequency of every segmented word as in the prior art, computing word frequencies only for the second candidate keyword set greatly reduces the calculation required for target keyword extraction and can improve accuracy.
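
A minimal Python sketch of this embodiment is given below. It assumes that word frequency is the number of non-overlapping occurrences of the keyword string in the document text, which avoids word segmentation altogether; the source does not fix the counting method, and the function name is hypothetical.

```python
def top_m_by_frequency(document, candidates, m):
    """Return the m candidates that occur most often in the document.

    Assumption: frequency is a plain substring count. Only the second
    candidate keywords are counted, not every word in the document.
    """
    counts = {kw: document.count(kw) for kw in candidates}
    # sorted() is stable, so ties keep the order of `candidates`.
    ranked = sorted(candidates, key=lambda kw: counts[kw], reverse=True)
    return ranked[:m]

text = ("a neural network is trained by gradient descent; "
        "the neural network has layers; each neural network differs")
candidates = ["gradient descent", "neural network", "learning rate"]
print(top_m_by_frequency(text, candidates, 2))
# ['neural network', 'gradient descent']
```

Because only the candidates are counted, the cost grows with the size of the second candidate keyword set rather than with the vocabulary of the document.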

Embodiment approach 3

[0052] Vocabulary in some professional fields may be important even though its word frequency is often not high. The result can therefore be adjusted by setting weights, which improves the accuracy of keyword extraction. On the basis of the second embodiment, the system also includes a keyword weight configuration list, in which the weight of each keyword in the keyword library is configured, and step S5 includes:

[0053] Step S51, acquiring, for each second candidate keyword in the second candidate keyword set, its word frequency in the document to be processed;

[0054] Specifically, the TF-IDF algorithm may be used to obtain the word frequency of each second candidate keyword in the document to be processed. TF-IDF is an existing algorithm and is not described further here.
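
For readers unfamiliar with TF-IDF, the following Python sketch shows one common smoothed variant. The source does not state which variant is used or which reference corpus supplies the document frequencies, so the formula, the `corpus` argument, and the function name are assumptions made for illustration.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    """Score `term` in one document against a reference corpus.

    doc_tokens: list of tokens of the document to be processed.
    corpus: list of token lists (hypothetical reference documents).
    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1.
    """
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf

doc = ["a", "b", "a", "c"]
corpus = [["a", "b"], ["a", "c"], ["a", "d"]]
# "a" is frequent in doc but appears in every corpus document: idf = 1.
print(tf_idf("a", doc, corpus))  # 0.5
# "b" is rarer in the corpus, so its lower tf is scaled up by idf.
print(round(tf_idf("b", doc, corpus), 4))  # 0.4233
```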

[0055] Step S52, multiplying the word frequency of each second candidate keyword in the document to be processed by the weight...
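
The weighting in steps S51 and S52 might look like the Python sketch below. The remainder of step S52 is truncated in the source, so the final ranking and top-M selection are assumptions carried over from the second embodiment. The function name, the `default_weight` parameter, and the example weights are hypothetical.

```python
def top_m_by_weighted_frequency(frequencies, weights, m, default_weight=1.0):
    """Rank keywords by word frequency multiplied by a configured weight.

    frequencies: dict keyword -> word frequency in the document (S51).
    weights: dict keyword -> weight from the weight configuration list.
    Assumption: keywords without a configured weight use default_weight,
    and the top m products are kept as target keywords.
    """
    scores = {kw: freq * weights.get(kw, default_weight)
              for kw, freq in frequencies.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:m]

freqs = {"network": 10, "backpropagation": 3, "layer": 6}
weights = {"backpropagation": 5.0, "network": 1.0}
# Products: network 10.0, backpropagation 15.0, layer 6.0.
print(top_m_by_weighted_frequency(freqs, weights, 2))
# ['backpropagation', 'network']
```

In this toy example the rare but heavily weighted term overtakes the most frequent one, which is the effect the weight list is meant to achieve.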

Abstract

The invention relates to a target keyword extraction system, which is implemented as follows:

S1, acquiring a to-be-processed document, extracting a first candidate keyword from the to-be-processed document based on a keyword library, and constructing a first candidate keyword set;
S2, converting each candidate keyword into a corresponding first candidate word vector, and constructing a first candidate word vector set;
S3, clustering all first candidate word vectors in the first candidate word vector set to obtain N first candidate word vector subsets, and obtaining a first candidate keyword subset corresponding to each first candidate word vector subset based on the first candidate keyword set;
S4, obtaining the average character number of all the first candidate keywords in each first candidate keyword subset, and determining the first candidate keyword subset with the maximum average character number as a second candidate keyword set; and
S5, determining a target keyword from the second candidate keyword set.

According to the invention, the accuracy of target keyword extraction is improved.
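
Steps S1 to S4 can be sketched in Python as shown below. This is an illustration under stated assumptions, not the patented implementation: the source names neither the embedding model nor the clustering algorithm, so the sketch uses a hypothetical lookup table `embed` for word vectors and a plain k-means with naive initialization. All function names are hypothetical, and step S5 is left out.

```python
import math

def kmeans_labels(points, n_clusters, iterations=20):
    """Plain k-means (assumed algorithm); returns one label per point."""
    centers = [list(p) for p in points[:n_clusters]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(n_clusters),
                      key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def second_candidate_set(document, keyword_library, embed, n_clusters):
    # S1: library keywords that occur in the document.
    first = [kw for kw in keyword_library if kw in document]
    if not first:
        return []
    # S2: one word vector per first candidate keyword.
    vectors = [embed[kw] for kw in first]
    # S3: cluster the vectors and map clusters back to keyword subsets.
    n = min(n_clusters, len(first))
    labels = kmeans_labels(vectors, n)
    subsets = [[kw for kw, lab in zip(first, labels) if lab == c]
               for c in range(n)]
    subsets = [s for s in subsets if s]
    # S4: keep the subset with the largest average character count.
    return max(subsets, key=lambda s: sum(len(kw) for kw in s) / len(s))

library = ["data", "model", "convolutional network", "gradient descent"]
embed = {"data": [0.0, 0.0], "model": [0.0, 1.0],
         "convolutional network": [10.0, 10.0],
         "gradient descent": [10.0, 11.0]}
doc = "the model uses data; a convolutional network trained by gradient descent"
print(second_candidate_set(doc, library, embed, 2))
# ['convolutional network', 'gradient descent']
```

In the toy data, the short general-purpose words and the longer domain terms fall into separate clusters, and step S4 keeps the cluster of longer terms.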

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a target keyword extraction system.

Background technique

[0002] In the prior art, to extract the target keywords of a document, the document is usually segmented into words, and the words that occur most often are then taken as the target keywords by counting word frequency or similar means. But there are at least the following disadvantages: words that are not keywords but occur very often may appear in the document, such as "的" in Chinese or prepositions in English. Even if some of these words are removed, determining the target keyword from word frequency alone cannot guarantee accuracy. Especially for documents in professional fields, extraction based on word frequency may return common words from non-professional fields rather than the target keywords. It can be seen that how to improve the accuracy of target keyword extraction has...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06F40/216
CPC: G06F40/284, G06F40/216
Inventor: 刘羽傅晓航林方刘宸
Owner: BEIJING YUCHEN SHIMEI SCI & TECH