Method and device for extracting subject term from simple sentence

A subject-term extraction method and device, applied in special data processing applications, instruments, electrical digital data processing and related fields. It addresses problems such as insufficient stability, inaccurate results and difficulty of distinction, improving accuracy and efficiency and overcoming low accuracy and poor stability.

Active Publication Date: 2011-05-11
SHENZHEN SHI JI GUANG SU INFORMATION TECH
Cites: 4; Cited by: 12

AI Technical Summary

Problems solved by technology

However, the dependency analysis method has two disadvantages. First, it is not stable enough on complex real-world sentences and often fails to achieve the expected results; ... sometimes it is a modifier, and sometimes it is a modified word.
However, when ...

Embodiment Construction

[0053] To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0054] Referring to Figure 1, an embodiment of the present invention provides a method for extracting the subject term from a single sentence, comprising:

[0055] 101: Count a plurality of keywords and a plurality of multi-element combinations in the corpus;

[0056] 102: Calculate the feature value of each of the plurality of keywords;

[0057] 103: Determine the sequence of each of the plurality of multi-element combinations according to its frequency of occurrence in the corpus;

[0058] 104: Take each keyword in the single sentence in turn as the current keyword, extract the multi-element combinations containing the current keyword from those obtained in 101, and calculate the grade of the current ...
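A minimal Python sketch of steps 101, 103 and the first half of 104 follows. The excerpt does not say how keywords are identified, what a multi-element combination contains, or how the sequence is assigned, so these are assumptions: each sentence is already segmented into a list of keywords, a combination is modeled as an unordered pair of keywords that co-occur in a sentence, and the sequence is a frequency rank (1 = most frequent). All function names are hypothetical. Step 102 is omitted because the feature value calculation is not given here.

```python
from collections import Counter
from itertools import combinations


def count_keywords_and_pairs(corpus):
    """Step 101 (sketch): count keywords and keyword pairs in a tokenized corpus.

    Assumption: each sentence is already segmented into keywords, and a
    "multi-element combination" is modeled as an unordered pair of distinct
    keywords that co-occur in the same sentence.
    """
    keyword_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        keyword_counts.update(sentence)
        # Sort the distinct keywords so each pair has one canonical tuple form.
        distinct = sorted(set(sentence))
        pair_counts.update(combinations(distinct, 2))
    return keyword_counts, pair_counts


def rank_combinations(pair_counts):
    """Step 103 (sketch): assign each combination a sequence number by frequency.

    Assumption: sequence 1 is the most frequent combination; ties are broken
    by the pair's alphabetical order so the result is deterministic.
    """
    ordered = sorted(pair_counts.items(), key=lambda item: (-item[1], item[0]))
    return {pair: rank for rank, (pair, _count) in enumerate(ordered, start=1)}


def combinations_containing(keyword, ranks):
    """Step 104, first half (sketch): keep only combinations with the keyword."""
    return {pair: rank for pair, rank in ranks.items() if keyword in pair}


corpus = [
    ["phone", "price", "cheap"],
    ["phone", "price"],
    ["phone", "screen"],
]
keyword_counts, pair_counts = count_keywords_and_pairs(corpus)
ranks = rank_combinations(pair_counts)
print(ranks)
print(combinations_containing("price", ranks))
```

Counting co-occurring pairs per sentence is a simplification chosen for illustration; if the combinations are meant to be contiguous word sequences, a sliding window over each sentence would replace `combinations(distinct, 2)`.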

Abstract

The invention discloses a method and a device for extracting the subject term from a simple sentence, belonging to the technical field of subject term extraction. The method comprises the following steps: counting a plurality of keywords and a plurality of multi-element combinations in a corpus, calculating the feature value of each keyword, and determining the sequence of each multi-element combination; taking each keyword of the simple sentence in turn as the current keyword, extracting the multi-element combinations containing the current keyword, calculating the grade of the current keyword according to the sequences of the extracted multi-element combinations, and calculating the metric of the current keyword from its feature value and grade; and, once the metrics of all keywords of the simple sentence have been obtained, selecting the subject term of the simple sentence from among these keywords according to their metrics. The device comprises a counting module, a feature value calculation module, a sequence calculation module, an extraction module, a grade calculation module, a metric calculation module and a subject term selection module. Because the invention makes full use of the text information of the simple sentence, it can improve the accuracy and efficiency of extracting the subject term from a simple sentence.
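The abstract does not give formulas for the grade, the metric, or the selection rule, so the Python sketch below only illustrates the data flow under assumed definitions. The grade is taken as the reciprocal of the best (smallest) sequence number among the combinations that contain the keyword, the metric as the product of feature value and grade, and the subject term as the keyword with the largest metric. These formulas, the function names, and the sample inputs are all hypothetical.

```python
def keyword_grade(keyword, ranks):
    """Hypothetical grade: reciprocal of the best (smallest) sequence number
    among the combinations that contain the keyword, or 0.0 if none do."""
    matched = [rank for pair, rank in ranks.items() if keyword in pair]
    return 1.0 / min(matched) if matched else 0.0


def keyword_metric(feature_value, grade):
    """Hypothetical metric: product of the feature value and the grade."""
    return feature_value * grade


def select_subject_term(sentence_keywords, feature_values, ranks):
    """Compute a metric for every keyword, then pick the largest one."""
    if not sentence_keywords:
        return None, {}
    metrics = {}
    for keyword in sentence_keywords:
        grade = keyword_grade(keyword, ranks)
        metrics[keyword] = keyword_metric(feature_values.get(keyword, 0.0), grade)
    # max() returns the first maximal item, so ties go to the earlier keyword.
    best = max(sentence_keywords, key=lambda kw: metrics[kw])
    return best, metrics


# Illustrative inputs: sequence numbers (1 = most frequent) and feature values.
ranks = {("phone", "price"): 1, ("cheap", "phone"): 2,
         ("cheap", "price"): 3, ("phone", "screen"): 4}
feature_values = {"phone": 0.5, "price": 1.2, "cheap": 2.0}
best, metrics = select_subject_term(["cheap", "phone", "price"], feature_values, ranks)
print(best, metrics)
```

With these inputs "cheap" has the highest feature value but a weaker grade (its best combination has sequence 2), so "price" ends up with the largest metric and is selected. Any other monotone combination of feature value and grade would fit the abstract's description equally well.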

Description

Technical Field

[0001] The present invention relates to the technical field of subject term extraction, and in particular to a method and device for extracting the subject term from a single sentence.

Background Art

[0002] A corpus is a body of language material, that is, text information. The objective and detailed language evidence provided by large-scale corpora can support linguistic research and guide the development of natural language information processing systems. A single sentence is a sentence composed of phrases or individual words. Keywords are the words in a text that characterize its content, best explain the problem, and play a key role. Existing keyword extraction usually adopts one of two methods: dependency analysis and TF-IDF (Term Frequency-Inverse Document Frequency).

[0003] The dependency analysis method is a method for analyz...
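For reference, the TF-IDF approach named in the background can be sketched in Python as below. TF-IDF has several variants; this uses one common textbook form, tf(t, d) × log(N / df(t)), and is shown only to illustrate the prior-art method, not the invention. The function name and sample documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score every term of every document with tf(t, d) * log(N / df(t)).

    tf is the term's count divided by the document length; df is the number
    of documents that contain the term; N is the number of documents.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [["phone", "price", "phone"], ["phone", "screen"], ["price", "cheap"]]
for doc, doc_scores in zip(docs, tf_idf(docs)):
    # The highest-scoring term is a TF-IDF keyword candidate for that document.
    print(max(doc_scores, key=doc_scores.get), doc_scores)
```

Because IDF needs many documents to be informative and TF is nearly flat within a single short sentence, scores like these carry little signal at the single-sentence level, which is the setting the invention targets.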

Application Information

IPC(8): G06F17/30
Inventors: 姜中博 (Jiang Zhongbo), 刘怀军 (Liu Huaijun), 方高林 (Fang Gaolin)
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH