Method and device for classifying text and constructing a text classifier using feature extension

Classifier construction technology, applied in the fields of instruments, character and pattern recognition, and special-purpose data processing applications; it improves classifier performance and provides good recognition and classification ability.

Status: Inactive. Publication date: 2010-08-04

CHONGQING UNIV OF POSTS & TELECOMM +1

Cites: 1. Cited by: 28.


AI Technical Summary

Problems solved by technology

[0006] To solve the above technical problems, the present invention uses the training corpus and manually constructed resources (such as HowNet) to mine useful information, such as feature combinations with specific relationships, and forms feature extension patterns for expanding short texts, compensating for their inherent weakness of sparse concept signals. A classifier is constructed by applying feature extension to the text information objects in the training set. When classifying, feature extension is first applied to a text information object, which is then classified as either belonging or not belonging to a given category.
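The extend-then-classify idea in [0006] can be illustrated with a minimal sketch. It assumes, hypothetically, that an extension pattern is a rule "if all left-hand features occur in the text, add the right-hand feature", and it stands in a toy linear scorer for the classifier; `ExtensionPattern`, `extend_features`, `classify`, the weights and the threshold are all illustrative names and values, not the patent's.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class ExtensionPattern(NamedTuple):
    """Hypothetical pattern form: if every `left` feature occurs, add `right`."""
    left: FrozenSet[str]
    right: str
    confidence: float


def extend_features(features: Iterable[str],
                    patterns: List[ExtensionPattern]) -> Set[str]:
    """Return the original features plus every right-hand feature whose
    left-hand side is contained in the original features (single pass)."""
    base = set(features)
    extended = set(base)
    for p in patterns:
        if p.left <= base:  # match against the original features only
            extended.add(p.right)
    return extended


def classify(features: Iterable[str], weights: Dict[str, float],
             threshold: float) -> bool:
    """Toy binary classifier: True means 'belongs to the category'."""
    score = sum(weights.get(f, 0.0) for f in set(features))
    return score >= threshold


if __name__ == "__main__":
    patterns = [ExtensionPattern(frozenset({"free", "prize"}), "scam", 0.8)]
    weights = {"scam": 2.0, "free": 0.5}
    text = ["you", "won", "free", "prize"]
    print(classify(text, weights, 1.0))                            # False
    print(classify(extend_features(text, patterns), weights, 1.0))  # True
```

In this toy case the short text alone scores 0.5 and is not flagged, while the extended feature set gains the "scam" feature and crosses the threshold, which is the kind of weak-signal compensation the paragraph describes.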



Examples


Embodiment approach 1

[0054] Embodiment 1: extracting feature extension patterns with an association rule algorithm. Figure 2 shows the flow chart of this extraction process, which proceeds as follows:

[0055] Step 1. Extraction of 2- to N-order frequent feature itemsets (310)

[0056] Set the information table for the left-hand side of the feature extension patterns. The table specifies the maximum number of features N that the left-hand side of a pattern may contain, together with the support and confidence thresholds. The scanning module scans the feature sequence set 125' of the text training instances and processes each feature sequence as follows: using an association rule mining algorithm (for example, the classic FP-Growth algorithm), it extracts from the feature sequences the X-order frequent itemsets that meet the support requirement, constructing the 2- to N-order frequent feature itemsets, where 2≤X≤N+1.

[0057] Step...
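Step 1 of Embodiment 1 can be sketched as follows. The patent suggests FP-Growth; for brevity this sketch swaps in plain brute-force subset counting, which finds the same frequent itemsets but only scales to small N. It also assumes that support is the fraction of feature sequences containing an itemset (each sequence treated as a set) and that each pattern has a single right-hand feature with up to N left-hand features, consistent with 2≤X≤N+1. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (left side, right feature, confidence)


def frequent_itemsets(sequences: Sequence[Sequence[str]], max_left: int,
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Brute-force stand-in for FP-Growth: count every itemset of size
    1..max_left+1 in each sequence and keep those meeting min_support.
    Size-1 itemsets are kept so that rule confidence can be computed."""
    transactions = [frozenset(s) for s in sequences]
    n = len(transactions)
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_left + 1, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def extension_rules(freq: Dict[FrozenSet[str], float],
                    min_conf: float) -> List[Rule]:
    """Turn each frequent itemset of order >= 2 into rules left -> right with
    a single right-hand feature, keeping those meeting min_conf."""
    rules: List[Rule] = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for right in itemset:
            left = itemset - {right}
            # A subset of a frequent itemset is itself frequent, so it is in freq.
            conf = supp / freq[left]
            if conf >= min_conf:
                rules.append((left, right, conf))
    return rules
```

For example, with the sequences `[["free", "prize", "win"], ["free", "prize"], ["hello", "free"], ["prize", "win"]]`, `max_left=2` and `min_support=0.5`, a confidence threshold of 0.9 leaves the single rule `{win} -> prize` with confidence 1.0.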

Embodiment approach 2

[0061] Embodiment 2: extracting feature extension patterns with a knowledge dictionary (such as HowNet). In this embodiment, the feature types in the feature set are restricted to words. Figure 3 shows the flow chart for extracting feature extension patterns with HowNet. The specific process is as follows:

[0062] Step 1. Word pair extraction (410)

[0063] The extraction processing module takes the feature sequence set of the text training instances as input and outputs a set of word pairs.

[0064] Set a distance threshold θ between word pairs. The scanning module scans the feature sequence set 125' of the text training instances, and the extraction processing module processes each feature sequence as follows: obtain the positions of the two words in the feature sequence, compute the difference between their positions, compare this difference with the distance threshold θ, and extract word pairs whose d...
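The word pair extraction step can be sketched as below. The source text is cut off before it states the exact comparison, so this sketch assumes, as a labeled guess, that a pair is kept when the position difference does not exceed θ. It counts ordered pairs (earlier word, later word) of distinct words across all sequences and does not cover the later HowNet lookup; `extract_word_pairs` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence, Tuple


def extract_word_pairs(sequences: Sequence[Sequence[str]],
                       theta: int) -> "Counter[Tuple[str, str]]":
    """Count ordered word pairs whose position difference is at most theta.
    ASSUMPTION: '<= theta' is a guess; the source text is truncated here."""
    pairs: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                if j - i > theta:  # position difference compared with theta
                    break
                if seq[i] != seq[j]:  # skip pairs of identical words
                    pairs[(seq[i], seq[j])] += 1
    return pairs
```

Counting occurrences rather than returning a bare set is a design choice here: it leaves room for a later frequency filter, though the source does not say whether one is applied.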


Abstract

The invention discloses a method for constructing a text classifier by applying feature extension to the text information objects in a training set, and a text classification device using the method. By applying feature extension both to the texts being classified and to the training texts used to construct the classifier, the invention improves classifier performance and makes it possible to intercept and filter harmful information in short texts. The invention has good recognition and classification ability on short texts and is particularly suitable for processing texts in instant messaging systems such as QQ and MSN, mobile phone short messages, and online comments.

Description

Technical field

[0001] The invention relates to computer information processing systems, in particular to a method and device for constructing a text classifier by applying feature extension to the text information objects in a training set.

Technical background

[0002] Short text classification uses computers to automatically classify short texts (usually fewer than 160 characters), such as texts in instant messaging systems like QQ and MSN and mobile phone short messages, to determine whether they belong to a certain category.

[0003] Short text classification is a challenging key technology that must be solved in short text applications, and it has important application prospects. For example, short text classification is the basis for the practical task of filtering mobile phone short messages. Mobile phone short messaging has grown explosively in recent years and has become an important information ...


Application Information


IPC(8): G06F17/30; G06K9/62

Inventor: 樊兴华 (Fan Xinghua)

Owner: CHONGQING UNIV OF POSTS & TELECOMM
