Text handling method and system

A text processing technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as the low reliability of training data that cannot be classified accurately by hand and the resulting drop in classification accuracy, with the effect of reducing the impact of noise data, improving accuracy, and improving reliability.

Status: Inactive
Publication Date: 2007-08-22
HUAWEI TECH CO LTD
Cites: 0; Cited by: 40

AI Technical Summary

Problems solved by technology

[0005] In the above prior art, the traditional text feature extraction method assumes that each document in the training document set is strongly correlated with its corresponding category.
For SMS texts, a large number of messages is needed as the training data set. Because the volume of training text is so large, it is impossible to classify every piece of training data accurately by hand, so the training text set itself contains a large amount of noise data and has low reliability. Extracting SMS features from such a training set with the traditional feature extraction method causes the extracted feature set to contain many noise features, which reduces the reliability of the feature set extracted from the training text and in turn reduces classification accuracy.
[0006] In addition, SMS texts differ from traditional document texts. They often include variant and irregular words, called singular words, such as QQ, SG, and MM, and they often contain many separators or inconsistent separators, called singular symbols. As a result, too much interference information is mixed into the SMS text, and a large number of wrong features, or noise features, are extracted during text feature extraction and feature selection. This further reduces the reliability of the feature set extracted from the training text during training, and the classification ability of the classification system.
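
The effect of singular symbols described in [0006] can be illustrated with a small sketch. It is only an illustration of the problem, not part of the patented method: the sample messages, the whitespace tokenizer, and the `strip_separators` cleanup rule are all assumptions made for this example.

```python
import re

def naive_features(message: str) -> set[str]:
    """Whitespace tokenizer: separators stay glued to the characters around them."""
    return set(message.lower().split())

def strip_separators(message: str) -> str:
    """Hypothetical cleanup rule: drop every character that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", message)

# Four spellings of the same two-word message, three of them using "singular symbols".
messages = ["free prize", "f.r.e.e prize", "f-r-e-e prize", "f*r*e*e prize"]

# Without cleanup, every separator variant becomes its own (noise) feature.
raw = set().union(*(naive_features(m) for m in messages))

# With the hypothetical cleanup, all variants collapse onto the same two features.
cleaned = set().union(*(naive_features(strip_separators(m)) for m in messages))

print(sorted(raw))
print(sorted(cleaned))
```

In this toy input, the raw feature set holds five entries for what is really a two-word vocabulary, which is the kind of feature-set inflation the paragraph above attributes to singular symbols.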

Examples

Embodiment Construction

[0025] In the embodiment of the present invention, the feature vector space extracted in the text feature extraction process is adaptively optimized, noise features are removed, and an optimal low-dimensional feature space is finally obtained.

[0026] Specifically, a text processing method provided by an embodiment of the present invention is applied to text feature extraction technology, and the method includes:

[0027] Step A: during text training, classify the training texts using the trained model parameters and delete the misclassified training texts, so that the new training text set keeps only the correctly classified training texts; then construct a new feature set from the correctly classified training texts. This step can be performed after the model parameters have been trained on the training text set and the feature set obtained from feature representation.

[0028] Step B, training model parameters based on the new training text se...
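
Steps A and B can be sketched as a single filter-and-retrain pass. This is a minimal sketch under stated assumptions, not the patent's implementation: the excerpt does not name a classifier or a feature construction rule, so the naive Bayes model, the "every token is a feature" rule, the toy data, and all function names (`build_features`, `train`, `classify`, `filter_and_retrain`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_features(texts):
    """Assumed feature rule: every whitespace token in the training texts is a feature."""
    return {tok for t in texts for tok in t.split()}

def train(texts, labels, features):
    """Assumed model: multinomial naive Bayes; 'model parameters' are the counts below."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(tok for tok in text.split() if tok in features)
    return {"class_docs": Counter(labels), "counts": counts,
            "features": features, "n_docs": len(labels)}

def classify(model, text):
    """Return the label with the highest add-one-smoothed log score."""
    vocab_size = len(model["features"])

    def log_score(label):
        counts = model["counts"][label]
        total = sum(counts.values())
        score = math.log(model["class_docs"][label] / model["n_docs"])
        for tok in text.split():
            if tok in model["features"]:
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    return max(model["class_docs"], key=log_score)

def filter_and_retrain(texts, labels):
    # Initial training on the full (noisy) training set and its feature set.
    model = train(texts, labels, build_features(texts))
    # Step A: keep only training texts the trained model classifies correctly,
    # then construct a new feature set from the kept texts.
    kept = [(t, y) for t, y in zip(texts, labels) if classify(model, t) == y]
    new_texts = [t for t, _ in kept]
    new_labels = [y for _, y in kept]
    new_features = build_features(new_texts)
    # Step B: retrain on the new training text set and the new feature set.
    new_model = train(new_texts, new_labels, new_features)
    return new_model, new_texts, new_labels, new_features

# Toy data; the last message is deliberately mislabeled to act as noise.
texts = ["football match goal", "football team goal", "match team score",
         "football score goal", "stock market price", "stock bank price",
         "market bank loan", "stock loan price", "stock market qq"]
labels = ["sport"] * 4 + ["finance"] * 4 + ["sport"]

new_model, new_texts, new_labels, new_features = filter_and_retrain(texts, labels)
print(len(texts), "->", len(new_texts), "training texts")
print("qq" in build_features(texts), "->", "qq" in new_features)
```

In this toy run the mislabeled message is the only one the initial model classifies against its label, so it is dropped, and the token `qq`, which appears only in that message, leaves the feature set. Whether the pass is repeated, and with what stopping rule, is not stated in the excerpt and is left out of the sketch.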

Abstract

The invention discloses a text processing method for use in text feature extraction. The method classifies the training texts using trained model parameters and deletes the misclassified training texts, so that the new training text set retains only the correctly classified training texts. It then constructs a new feature set from the correctly classified training texts and trains the model parameters on the new training text set and the new feature set. The invention also provides a text processing system.

Description

Technical field

[0001] The invention relates to the technical field of intelligent text information processing, and in particular to a text processing method and a text processing system.

Background

[0002] Mobile phone text messages have great potential as an advertising channel, but at present spam text messages are a serious nuisance. To solve this problem, advertisement publishers need effective methods for obtaining relevant information about their audience, so that they can deliver targeted SMS advertisements that recipients are likely to respond to.

[0003] To obtain this information about the advertising audience, users' points of interest must be mined from a large number of user text messages. How to obtain users' points of interest quickly and effectively from a large number of user text messages is an open problem, and text mining of text messages is just a met...

Application Information

IPC(8): G06F17/21, G06F17/30
Inventors: 尚明生, 林劼, 傅彦, 邵刚
Owner: HUAWEI TECH CO LTD