Method for extracting feature word of text

A technique for extracting the subject words of a text. It addresses the problem that new words, which have a high probability of being subject words, cannot be extracted when they are absent from existing language databases, which leaves the extracted subject words incomplete. The method improves the comprehensiveness of subject word extraction.

Active Publication Date: 2009-06-24
SHENZHEN SHI JI GUANG SU INFORMATION TECH
Cites: 0; Cited by: 107

AI Technical Summary

Problems solved by technology

[0005] However, in actual network applications, many new words appear that are not included in existing language databases, such as "Bawangmian", "Bei Piao Family", and "Nian Lao", and such new words have a high probability of being the subject words of a text.
[0006] Consequently, once a word that expresses the subject of the text to be processed (hereinafter, a subject word of the text to be processed) is absent from the existing language database, it cannot be segmented out of the text according to that database. It therefore cannot be extracted as a subject word, and the extracted subject words are incomplete.

Examples

Embodiment Construction

[0016] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples.

[0017] Figure 1 is a flow chart of the method for extracting text keywords provided by the present invention.

[0018] Steps 101-102 extract the text to be processed and perform word segmentation and part-of-speech tagging on it; both steps can be implemented using existing technologies.
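
The source leaves the segmentation technique open ("existing technologies"). As one possible illustration only, the sketch below uses forward maximum matching against a toy lexicon; the function name `segment`, the `max_len` parameter, and the ASCII stand-in characters are assumptions made for this example, and part-of-speech tagging is omitted. It also shows the limitation described in [0006]: a string missing from the lexicon is broken into single characters.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching (an assumed, illustrative segmenter).

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon: "ab" and "cd" are known words, "xy" is a new word.
lexicon = {"ab", "cd"}
print(segment("abxycd", lexicon))  # ['ab', 'x', 'y', 'cd']
```

Because "xy" is not in the lexicon, it never appears as a token, so a later frequency count over tokens cannot select it as a subject word.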

[0019] Steps 103-104 are used to discover new words from the text to be processed.
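
According to the abstract, new-word discovery starts from candidate character strings whose frequency of occurrence in the text exceeds a preset frequency. A minimal sketch of that first stage follows, assuming candidates are contiguous substrings of 2 to `max_len` characters that are not already in the lexicon; the names `candidate_strings`, `min_freq`, and `max_len` are hypothetical and do not come from the source.

```python
from collections import Counter


def candidate_strings(text, lexicon, min_freq=2, max_len=4):
    """Return out-of-lexicon substrings occurring more than min_freq times."""
    counts = Counter(
        text[i:i + n]
        for n in range(2, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return {s: c for s, c in counts.items() if c > min_freq and s not in lexicon}


# ASCII characters stand in for the characters of the text to be processed.
print(candidate_strings("xyabxycdxy", {"ab", "cd"}))  # {'xy': 3}
```

The candidates found this way still have to be filtered before they count as new words; frequency alone would also admit frequent but meaningless character sequences.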

[0020] Step 105 extracts the subject words of the text from the existing words and new words contained in the text to be processed. Step 105 can be implemented using the prior-art scheme of taking the words whose frequency of occurrence falls within the first predetermined range as the subject words, or by using the subject wo...
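
The frequency-range scheme mentioned for step 105 can be sketched as below. This is an illustration under assumptions: the source does not give the bounds of the first predetermined range, so `low` and `high` are hypothetical parameters, treated here as inclusive, and the token list is assumed to already contain both existing words and discovered new words.

```python
from collections import Counter


def subject_words(tokens, low, high):
    """Select words whose occurrence count lies in the inclusive range [low, high]."""
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if low <= count <= high)


# "xy" stands in for a new word that was added to the token stream.
tokens = ["ab", "xy", "ab", "xy", "xy", "cd"]
print(subject_words(tokens, low=2, high=3))  # ['ab', 'xy']
```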

Abstract

The embodiment of the invention discloses a method for extracting subject headings of a text. The method comprises the following steps: a text to be processed is divided into combination sequences of the existing words; for each text to be processed, candidate character strings with a frequency of occurrence greater than a preset frequency in the text to be processed are found and extracted, and new words are filtered from the candidate character strings according to the lexicalization probability of the prefixes and/or suffixes of the candidate character strings; and subject headings of the text to be processed are extracted from the existing words and the new words according to the frequency of occurrences of the existing words and the new words. The invention ensures that the comprehensiveness of extracting subject headings from the text to be processed is improved.
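
The filtering step in the abstract can be sketched as follows. This excerpt does not define "lexicalization probability" or say how it is computed, so the sketch makes loud assumptions: `lex_prob` is a hypothetical, precomputed table mapping a single character to a probability, the prefix and suffix are taken to be the first and last characters of the candidate, and a candidate is kept only if both values reach a hypothetical `threshold`. The patent's actual rule may differ.

```python
def filter_new_words(candidates, lex_prob, threshold=0.5):
    """Keep candidates whose first and last characters both have an assumed
    lexicalization probability of at least threshold."""
    return [
        s for s in candidates
        if lex_prob.get(s[0], 0.0) >= threshold
        and lex_prob.get(s[-1], 0.0) >= threshold
    ]


# Hypothetical probabilities: "t" rarely forms part of a word, "x" and "y" often do.
lex_prob = {"x": 0.9, "y": 0.8, "t": 0.1}
print(filter_new_words(["xy", "tx", "yt"], lex_prob))  # ['xy']
```

Characters missing from the table default to 0.0 here, which rejects the candidate; that default is a design choice of this sketch, not something stated in the source.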

Description

Technical field

[0001] The invention relates to the technical field of Internet information processing, in particular to a method for extracting text keywords.

Background technique

[0002] Extracting text keywords is a technical problem that many network applications need to face. For example, in a content-based online advertising application, it is necessary to extract the keywords of the web page content currently browsed by the user, and then send advertisements related to those keywords to the user; in text retrieval, it is necessary to extract the subject words of each text, and then build the index of each text according to the subject words extracted from it, so as to improve retrieval efficiency; in content-based text classification, it is also necessary to extract the subject words that reflect the text content, and then classify the text according to the extracted subject words.

[0003] At present, the general method of extracting text keywords is: firstly, establish a large-scale corpus...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 方高林, 郑全战
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH