Method and device for classifying text and constructing a text classifier using feature expansion

A classifier and classifier-construction technology, applied in the fields of instruments, character and pattern recognition, and special data processing applications, that improves classifier performance and achieves good recognition and classification ability

Inactive Publication Date: 2010-08-04
CHONGQING UNIV OF POSTS & TELECOMM +1
View PDF1 Cites 28 Cited by

AI Technical Summary

Problems solved by technology

[0006] The technical solution of the present invention is to use the training corpus and manually constructed resources (such as HowNet) to mine useful information, such as combinations of features with specific relationships, and to form feature expansion patterns for expanding short texts.

Examples

Embodiment approach 1

[0054] Embodiment 1: Extract feature expansion patterns using an association rule algorithm. Figure 2 shows a flow chart of this extraction. The process is as follows:

[0055] Step 1. 2- to N-order frequent feature item extraction processing 310

[0056] Set the information table for the left part of the input feature expansion pattern. The table includes the maximum number of features N that the left part may contain, together with the support and confidence thresholds. The scanning module scans the feature sequence set 125' of the text training instances and processes each feature sequence as follows: using an association rule mining algorithm (the classic FP-Growth algorithm is one option), it extracts the X-order frequent items that meet the support requirement from the feature sequence and constructs the 2- to N-order frequent feature itemsets, where 2≤X≤N+1.
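The support-based extraction in this step can be illustrated with a minimal sketch. It is not the patented implementation: brute-force enumeration stands in for FP-Growth, support is assumed to be the fraction of feature sequences containing every feature of an itemset, and the function name, parameter names, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sequences, max_left, min_support):
    """Return {itemset: support} for itemsets of order 2..max_left+1.

    Assumption: brute-force enumeration replaces FP-Growth, and support is
    the fraction of feature sequences that contain the whole itemset.
    """
    counts = Counter()
    for seq in sequences:
        features = sorted(set(seq))  # ignore order and repeats within one text
        for order in range(2, max_left + 2):  # 2 <= X <= N+1
            for combo in combinations(features, order):
                counts[frozenset(combo)] += 1
    total = len(sequences)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical toy feature sequences (texts already split into word features).
training_sequences = [
    ["cheap", "loan", "call"],
    ["cheap", "loan", "now"],
    ["meeting", "call", "now"],
    ["cheap", "loan", "call", "now"],
]

itemsets = frequent_itemsets(training_sequences, max_left=2, min_support=0.5)
for itemset, support in sorted(
    itemsets.items(), key=lambda kv: (-kv[1], sorted(kv[0]))
):
    print(sorted(itemset), round(support, 2))
```

Enumerating every combination is only practical for short texts and small N; a real FP-Growth implementation avoids generating candidates that cannot be frequent.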

[0057] Step...

Embodiment approach 2

[0061] Embodiment 2: Use a knowledge dictionary (such as HowNet) to extract feature expansion patterns. In this embodiment, the feature types in the feature set are limited to words. Figure 3 shows a flow chart of feature expansion pattern extraction using HowNet. The specific process is as follows:

[0062] Step 1. Word Pair Extraction Processing 410

[0063] The feature sequence set of the text training instances is fed to the extraction processing module as input, and the output is a set of word pairs.

[0064] Set the distance threshold θ between word pairs. The scanning module scans the feature sequence set 125' of the text training instances, and the extraction processing module processes each feature sequence as follows: obtain the positions of two words in the feature sequence, calculate the difference between the two positions, compare the difference with the distance threshold θ, and extract word pairs whose d...
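A minimal sketch of the position-difference check described above follows. The source text is cut off before it states the exact extraction condition, so the comparison used here (keep pairs whose position difference is at most θ) is an assumption, as are the function name, the unordered-pair representation, and the toy data.

```python
from itertools import combinations


def extract_word_pairs(sequences, theta):
    """Collect unordered word pairs whose position difference is <= theta.

    Assumption: the "<= theta" comparison is a guess; the source is
    truncated before it gives the actual condition.
    """
    pairs = set()
    for seq in sequences:
        # enumerate() yields (position, word); combinations() keeps i < j.
        for (i, w1), (j, w2) in combinations(enumerate(seq), 2):
            if w1 != w2 and j - i <= theta:
                pairs.add(tuple(sorted((w1, w2))))
    return pairs


# Hypothetical feature sequences limited to words, as in this embodiment.
sequences = [["win", "free", "prize", "today"], ["free", "gift", "today"]]
word_pairs = extract_word_pairs(sequences, theta=1)
print(sorted(word_pairs))
```

With `theta=1` only adjacent words are paired; raising the threshold admits more distant pairs at the cost of a larger, noisier word pair set.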

Abstract

The invention discloses a method for constructing a text classifier by performing feature expansion on the text information objects in a training set, and a text classification device that applies the method. Texts are classified after feature expansion, and the classifier is constructed from training texts that have undergone feature expansion; this improves classifier performance and allows harmful information in short texts to be intercepted and filtered. The invention has good recognition and classification ability on short texts and is particularly suitable for processing texts in instant messaging systems such as QQ and MSN, mobile phone short messages, and online comments.

Description

Technical field

[0001] The invention relates to computer information processing systems, in particular to a method and device for constructing a text classifier by performing feature expansion processing on text information objects in a training set.

Technical background

[0002] Short text classification uses computers to automatically classify short texts (usually fewer than 160 characters), such as texts in the instant messaging systems QQ and MSN and mobile phone short messages, to determine whether they belong to a certain category.

[0003] Short text classification is a challenging key technology that must be solved in the field of short text applications, and it has important application prospects. For example, short text classification is the basis for the practical task of mobile phone short message filtering. Mobile phone short messaging has grown explosively in recent years and has become an important information ...

Application Information

IPC(8): G06F17/30, G06K9/62
Inventor 樊兴华
Owner CHONGQING UNIV OF POSTS & TELECOMM