Topic semantic perception-based feature keyword extraction method

An extraction method and keyword technology, applied in semantic analysis, special data processing applications, natural language data processing, etc.

Active Publication Date: 2020-12-18
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Existing technology does not combine the LDA topic model with information gain to perform keyword extraction.


Embodiment Construction

[0038] Embodiments of the present invention are disclosed below with reference to the drawings. For the sake of clarity, many practical details are described together in the following description. It should be understood, however, that these practical details should not be used to limit the invention. That is, in some embodiments of the invention, these practical details are not necessary.

[0039] For the convenience of description, the relevant symbols are defined as follows:

[0040] The document set is D = {d_1, d_2, ..., d_n}. The words contained in the documents of D form the keyword set W = {w_1, w_2, ..., w_u}, and the topic set used in the LDA model is T = {t_1, t_2, ..., t_m}. IG(w_i, T) is the information gain score of keyword w_i under T; TI(w_i, d_j) is the TF-ITF score between w_i and each document d_j in D; TR(w_i, d_j) is the topic correlation score between w_i and d_j; FW_IG is the set of global information gain feature keywords; FW_TR is the feature keyword se...

Abstract

The invention discloses a topic semantic perception-based feature keyword extraction method, which comprises the following steps: first, calculating an information gain score for each keyword in a keyword set by using a quantitative measure of the semantic relevancy between the keyword and the document topics; calculating a topic frequency-inverse topic frequency (TF-ITF) score for each keyword in each document; then, selecting the k keywords with the highest information gain scores to form a global information gain feature keyword set; selecting, for each document, the lambda keywords with the highest topic relevancy scores in that document, and combining these selections into a global topic information feature keyword set; and finally, merging the global information gain feature keyword set and the global topic information feature keyword set to generate the final feature keyword set. The invention comprehensively considers the topic semantic relations among keywords and between keywords and documents, and extracts feature keywords that represent the topic semantic information of the documents.
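The selection and merging steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the information gain scores IG(w_i, T) and the topic relevancy scores TR(w_i, d_j) have already been computed elsewhere (the visible text does not give their formulas), and the names `top_keywords`, `extract_feature_keywords`, `ig_scores`, and `tr_scores`, as well as the sample values, are hypothetical.

```python
from typing import Dict, Set


def top_keywords(scores: Dict[str, float], n: int) -> Set[str]:
    """Return the n highest-scoring keywords (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:n])


def extract_feature_keywords(
    ig_scores: Dict[str, float],             # assumed: keyword -> IG(w, T)
    tr_scores: Dict[str, Dict[str, float]],  # assumed: document -> keyword -> TR(w, d)
    k: int,
    lam: int,
) -> Set[str]:
    """Merge the global top-k IG keywords with each document's top-lambda TR keywords."""
    # Global information gain feature keyword set (FW_IG).
    fw_ig = top_keywords(ig_scores, k)

    # Global topic information feature keyword set (FW_TR):
    # the union of every document's lambda most topic-relevant keywords.
    fw_tr: Set[str] = set()
    for doc_scores in tr_scores.values():
        fw_tr |= top_keywords(doc_scores, lam)

    # Final feature keyword set.
    return fw_ig | fw_tr


# Made-up scores, for illustration only.
ig_scores = {"topic": 0.9, "lda": 0.7, "gain": 0.6, "corpus": 0.1}
tr_scores = {
    "d1": {"topic": 0.8, "lda": 0.3, "gain": 0.2},
    "d2": {"corpus": 0.5, "gain": 0.4, "lda": 0.1},
}
result = extract_feature_keywords(ig_scores, tr_scores, k=2, lam=1)
print(sorted(result))
```

In this toy run, "corpus" has a low global information gain score but is still kept because it is the most topic-relevant keyword of document d2, which is the reason for merging the two sets rather than relying on one ranking alone.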

Description

Technical field

[0001] The invention belongs to the fields of natural language processing and text mining, and in particular relates to a method for extracting feature keywords based on topic semantic perception.

Background technique

[0002] With the advent of the era of big data and the explosive growth of information, people are exposed to more and more document data. In the face of huge and complicated data, it is particularly important to retrieve data quickly and accurately and to dig useful information out of it. Feature keyword extraction is an effective method that plays an important role in the utilization of document data. For example, in information retrieval scenarios, accurate keyword extraction can greatly improve retrieval efficiency. Feature keyword extraction aims to capture the feature words that best represent the subject and content of a document. Feature keyword extraction, as a key technology in the field of natural language processing and tex...

Application Information

IPC(8): G06F16/31, G06F40/279, G06F40/30
CPC: G06F16/313, G06F40/279, G06F40/30
Inventor: 戴华, 姜莹莹, 戴雪龙, 周倩, 杨庚, 黄海平
Owner NANJING UNIV OF POSTS & TELECOMM