
Multi-text comparison method based on keyword extraction

A multi-text comparison technique based on keyword extraction. It addresses the failure of existing keyword extraction methods to account for the similarities and differences between multiple texts.

Active Publication Date: 2014-04-23
北京优捷信达信息科技有限公司
8 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

CN103399901A extracts keywords based on word frequency and the co-occurrence of word pairs, and CN101196904 relies on word frequency and part-of-speech patterns. Both methods extract keywords from a single text and cannot compare the similarities and differences between multiple texts.



Examples


Embodiment Construction

[0046] The present invention will be further described below in conjunction with the accompanying drawings and preferred embodiments.

[0047] Referring to Figure 1, the text comparison method based on keyword extraction proposed by the present invention is implemented through two main processes: keyword extraction and text comparison.

[0048] 1. The keyword extraction process includes the following steps:

[0049] 1.1 Part-of-speech tagging and word segmentation. Each sentence in natural language is treated as a hidden Markov chain, and the Viterbi algorithm is used to find the most probable part-of-speech tag sequence for the observed data. This process implicitly performs word segmentation. For example, "Tiananmen" (天安门, three characters) will be tagged "NSB-NSM-NSE", meaning "beginning of place name", "middle of place name" and "end of place name".

[0050] 1.2 Custom thesaurus and inseparable words. The hidden Markov chain model in the keyword extraction process supports...
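The tagging in step 1.1 can be illustrated with a minimal Viterbi sketch. The function name `viterbi`, the toy probabilities, and the restriction to three place-name tags are all assumptions made for illustration; the source does not give the model parameters or the full tag set.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state (tag) sequence for the observations."""
    tiny = 1e-12  # floor for unseen events so that log() is always defined

    def lp(p):
        return math.log(max(p, tiny))

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + lp(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model (hypothetical numbers): one tag per character of a place name.
states = ["NSB", "NSM", "NSE"]
start_p = {"NSB": 1.0}
trans_p = {
    "NSB": {"NSM": 0.7, "NSE": 0.3},
    "NSM": {"NSM": 0.3, "NSE": 0.7},
    "NSE": {"NSB": 1.0},
}
emit_p = {
    "NSB": {"天": 0.6, "安": 0.2, "门": 0.2},
    "NSM": {"天": 0.2, "安": 0.6, "门": 0.2},
    "NSE": {"天": 0.2, "安": 0.2, "门": 0.6},
}

tags = viterbi(list("天安门"), states, start_p, trans_p, emit_p)
print(tags)
```

Because each tag marks a position inside a word (beginning, middle, end), reading off the tag sequence also yields the word boundaries, which is why the tagging step implies segmentation.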


Abstract

The invention discloses a multi-text comparison method based on keyword extraction. The method includes: A, treating each sentence in natural language as a hidden Markov chain and determining the part-of-speech tags and word segmentation; B, adding a custom thesaurus to the hidden Markov chain model, marking the words in the custom thesaurus as strongly correlated, and preferentially merging them into single words during part-of-speech sequence tagging; C, filtering the segmentation results according to a given part-of-speech list and removing stop words; D, performing multi-text comparison according to the final part-of-speech tags and segmentation. The method completes keyword extraction for a single text and provides a feasible scheme for rapid multi-text comparison; for texts that share a theme but cover different aspects, it can identify the common theme and the aspects specific to each text.
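Steps C and D can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the visible text does not specify how the comparison is computed, so the sketch simply treats keywords shared by every text as the common theme and keywords found in only one text as that text's specific aspect. The function names, the tag labels (`n`, `v`, `u`), and the sample tokens are hypothetical.

```python
from collections import Counter

def extract_keywords(tagged_tokens, keep_pos, stop_words):
    """Step C (sketch): keep tokens whose tag is in keep_pos and drop stop words."""
    return Counter(
        word for word, pos in tagged_tokens
        if pos in keep_pos and word not in stop_words
    )

def compare_texts(keyword_counts):
    """Step D (assumed form): keywords common to all texts and unique to each one."""
    vocab = {name: set(counts) for name, counts in keyword_counts.items()}
    common = set.intersection(*vocab.values()) if vocab else set()
    distinct = {}
    for name, words in vocab.items():
        others = set().union(*(w for n, w in vocab.items() if n != name))
        distinct[name] = words - others
    return common, distinct

# Hypothetical tagged output of the tagging and segmentation step for two texts.
doc_a = [("battery", "n"), ("the", "u"), ("phone", "n"), ("lasts", "v"), ("phone", "n")]
doc_b = [("camera", "n"), ("the", "u"), ("phone", "n"), ("shoots", "v")]

keep_pos = {"n"}      # assumed part-of-speech list: nouns only
stop_words = {"the"}  # assumed stop-word list

counts = {
    "a": extract_keywords(doc_a, keep_pos, stop_words),
    "b": extract_keywords(doc_b, keep_pos, stop_words),
}
common, distinct = compare_texts(counts)
print(common, distinct)
```

Keeping the per-text counts in a `Counter` leaves room for frequency-based ranking of keywords, should the comparison need weights rather than plain set membership.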

Description

Technical field

[0001] The invention relates to the field of text recognition, and more specifically to a multi-text comparison method based on keyword extraction.

Background art

[0002] Keyword extraction is a common technique for counting and analyzing large amounts of text information. Limited by manpower and time, people usually cannot read massive text libraries word by word. The goal of keyword extraction is to find the words that best reflect the gist of a text, so that information can be browsed and selected quickly.

[0003] Patent document CN101216825 discloses a method for predicting indexing keywords of a target web page, the method comprising: obtaining a training data set, and training a decision tree according to the acquired training data set; using the trained decision tree to generate an indexing keyword filter; and using the trained decision tree and the generated filter to predict the indexing keywords...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 陈里波, 胡子扬, 祁点点
Owner: 北京优捷信达信息科技有限公司