
Method and device for generating a Chinese comparative sentence classifier model and identifying Chinese comparative sentences

A model-generation and classifier technology in the field of instruments, computing, and electrical digital data processing. It addresses the poor results of approaches that ignore word positions and dependencies between words, and improves the recognition of comparative sentences.

Active Publication Date: 2010-06-09
PEKING UNIV +2
Cites: 0; cited by: 14

AI Technical Summary

Problems solved by technology

Traditional text classifiers represent text as a bag of words, ignoring word positions and the dependencies between words. As a result, they perform poorly when applied to the task of classifying sentences as "comparative" or "non-comparative".
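To make this limitation concrete, here is a minimal Python sketch. The toy tokens are hypothetical and do not come from the patent. A bag-of-words representation, modeled with `collections.Counter`, cannot distinguish two token sequences that contain the same words in a different order. An order-sensitive subsequence check can.

```python
from collections import Counter

def bag_of_words(tokens):
    """Order-free representation: only how often each word occurs survives."""
    return Counter(tokens)

def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order,
    not necessarily next to each other."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` advances the iterator

# Hypothetical segmented tokens: s1 reads "A is better (好) than (比) B";
# s2 is an artificial reordering of the same four words.
s1 = ["A", "比", "B", "好"]
s2 = ["好", "A", "比", "B"]

print(bag_of_words(s1) == bag_of_words(s2))  # True: the bags are identical
print(is_subsequence(["比", "好"], s1))       # True: 比 ... 好 in order
print(is_subsequence(["比", "好"], s2))       # False: 好 precedes 比
```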

Method used

Each clause that contains a preset comparative keyword is converted into a sequence. A sequential pattern mining algorithm then mines comparative patterns from these sequences. The results of matching the patterns against each sequence form feature vectors, which are used to train the classifier model. The trained model then decides whether a new sentence is comparative.

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0035] An embodiment of the present invention provides a method for generating a Chinese comparative sentence classifier model. The method processes a data set of labeled Chinese sentences and produces a comparative pattern set and a Chinese comparative sentence classifier model. As shown in the flow chart of Figure 1, the steps are as follows:

[0036] Step S10: Read the next sentence from the data set.

[0037] All sentences in the dataset have been manually annotated with their categories.

[0038] Step S11: Use automatic word segmentation and part-of-speech tagging to split the sentence into words and attach a part-of-speech tag to each word.

[0039] For example, existing Chinese word segmentation software can split each sentence into words and assign a part-of-speech tag to each word.
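The patent only refers to "existing Chinese word segmentation software". Production segmenters and taggers are usually statistical. As a simple stand-in, the sketch below uses forward maximum matching over a tiny hypothetical lexicon. The tag letters are PKU-style and chosen only for illustration: nz = other proper noun, p = preposition, a = adjective. Characters not found in the lexicon fall back to a single-character token, which this sketch tags `x`. The example sentence 英特尔比AMD便宜 ("Intel is cheaper than AMD") is hypothetical and is not the sentence used in [0040]. The sketch shows the output shape of Step S11: a list of (word, tag) pairs.

```python
# Hypothetical toy lexicon mapping words to part-of-speech tags (illustration only).
LEXICON = {
    "英特尔": "nz",  # Intel
    "AMD": "nz",
    "比": "p",       # "than"
    "便宜": "a",     # "cheap"
}

def segment_and_tag(sentence, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon
    word that matches; otherwise emit one character tagged 'x'."""
    result = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if word in lexicon:
                result.append((word, lexicon[word]))
                i += size
                break
        else:  # no lexicon word starts at position i
            result.append((sentence[i], "x"))
            i += 1
    return result

print(segment_and_tag("英特尔比AMD便宜", LEXICON))
# [('英特尔', 'nz'), ('比', 'p'), ('AMD', 'nz'), ('便宜', 'a')]
```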

[0040] Taking the sentence "INTEL has a price advantage over AMD" as an example, the s...

Abstract

The invention discloses a method and a device for generating a Chinese comparative sentence classifier model and for identifying Chinese comparative sentences. The method comprises the following steps:

- For each sentence in a data set, convert each clause that contains a preset comparative keyword into a sequence. Label each sequence with the category of the sentence the clause comes from. The result is a sequence set.
- Mine a number of comparative patterns from the sequence set with a sequential pattern mining algorithm to form a comparative pattern set.
- Match every comparative pattern in the set against each sequence. From the matching results and the total number of comparative patterns, derive a feature vector for each sequence.
- Generate a classifier model from the feature vectors and the corresponding category labels of the sequences.
- Use the obtained comparative pattern set and classifier model to classify a new sentence of unknown category and determine whether it is a comparative sentence.

By automatically learning the pattern features of comparative sentences, the generated classifier model can identify comparative sentences in text automatically and effectively.
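The pipeline in the abstract can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical choices:

- the keyword list (比, 不如) and the clause delimiters
- the rule that non-keyword words are replaced by their POS tags
- brute-force subsequence counting in place of a named sequential pattern mining algorithm
- binary match features
- a perceptron as the classifier

The five tagged training sentences are toy data.

```python
from itertools import combinations

# Hypothetical settings (not taken from the patent).
KEYWORDS = {"比", "不如"}
DELIMITERS = {"，", "。"}

def to_sequences(tagged):
    """Split [(word, pos), ...] into clauses at delimiters; turn each clause
    containing a keyword into a sequence that keeps keywords literally and
    replaces every other word with its POS tag."""
    clauses, current = [], []
    for word, pos in tagged:
        if word in DELIMITERS:
            clauses.append(current)
            current = []
        else:
            current.append((word, pos))
    clauses.append(current)
    return [tuple(w if w in KEYWORDS else p for w, p in clause)
            for clause in clauses
            if any(w in KEYWORDS for w, _ in clause)]

def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def mine_patterns(sequences, min_support, max_len=3):
    """Brute-force sequential pattern mining: count how many sequences contain
    each ordered subsequence of up to max_len items; keep keyword patterns
    whose support reaches min_support."""
    support = {}
    for seq in sequences:
        found = set()
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                found.add(tuple(seq[i] for i in idx))
        for pat in found:
            support[pat] = support.get(pat, 0) + 1
    return sorted(p for p, c in support.items()
                  if c >= min_support and any(t in KEYWORDS for t in p))

def feature_vector(seq, patterns):
    """One binary feature per mined pattern: 1 if it matches seq, else 0."""
    return [1 if is_subsequence(p, seq) else 0 for p in patterns]

def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(vectors, labels, epochs=20):
    """Plain perceptron as a stand-in classifier; labels are +1 / -1."""
    model = ([0.0] * len(vectors[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            if y * score(model, x) <= 0:  # misclassified: nudge toward y
                w, b = model
                model = ([wi + y * xi for wi, xi in zip(w, x)], b + y)
    return model

def is_comparative(tagged, patterns, model):
    """Comparative if any keyword clause of the sentence scores above 0."""
    return any(score(model, feature_vector(s, patterns)) > 0
               for s in to_sequences(tagged))

# Toy labeled data: +1 = comparative, -1 = non-comparative.
TRAIN = [
    ([("英特尔", "nz"), ("比", "p"), ("AMD", "nz"), ("便宜", "a")], +1),
    ([("AMD", "nz"), ("不如", "v"), ("英特尔", "nz"), ("快", "a")], +1),
    ([("他", "r"), ("比", "p"), ("我", "r"), ("高", "a")], +1),
    ([("不如", "c"), ("我们", "r"), ("走", "v")], -1),
    ([("不如", "c"), ("明天", "t"), ("再", "d"), ("说", "v")], -1),
]

sequences, labels = [], []
for tagged, label in TRAIN:
    for seq in to_sequences(tagged):  # each clause inherits its sentence label
        sequences.append(seq)
        labels.append(label)

patterns = mine_patterns(sequences, min_support=2)
model = train_perceptron([feature_vector(s, patterns) for s in sequences], labels)

print(patterns)  # [('不如',), ('不如', 'v'), ('比',), ('比', 'a')]
print(is_comparative([("他", "r"), ("不如", "v"), ("我", "r"), ("高", "a")], patterns, model))  # True
print(is_comparative([("不如", "c"), ("回家", "v")], patterns, model))  # False
```

On this toy data the perceptron learns a negative weight for the pattern (不如, v). As a result, 不如 followed by a verb, as in 不如回家 ("might as well go home"), is rejected. By contrast, 他不如我高 ("he is not as tall as me") is accepted. The ordered pattern carries the signal that a bag of words would lose.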

Description

Technical field

[0001] The invention relates to the technical field of intelligent information processing. In particular, it relates to a method and device for generating a Chinese comparative sentence classifier model and for automatically identifying Chinese comparative sentences.

Background art

[0002] With the rapid development of the Internet, the amount of Chinese information has grown explosively. Much of this information involves comparisons between things, such as comparisons and recommendations of similar products, so identifying such comparative information automatically has great practical value. Automatically detecting the comparative sentences in an article is a prerequisite for accurately extracting the compared entities and the relationships between them.

[0003] Comparative sentences have long been studied in traditional Chinese linguistics. Regarding the definition of comparative sentences, Ma Jianzhong pointed out that "the same ...

Application Information

IPC(8): G06F17/30; G06F17/27
Inventors: 黄小江 (Huang Xiaojiang), 万小军 (Wan Xiaojun), 杨建武 (Yang Jianwu), 肖建国 (Xiao Jianguo)
Owner: PEKING UNIV