Information processing feature extracting method

A feature extraction and feature item technology, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses the heavy computation, easily skewed scores, and high feature-vector dimensionality of existing methods, and improves extraction speed.

Inactive Publication Date: 2012-07-11
SHANGHAI DIANJI UNIV
3 Cites, 26 Cited by

AI Technical Summary

Problems solved by technology

The disadvantage of mutual information is that its score is strongly affected by the marginal probability of the term.
The disadvantage of the information gain method is that it also takes into account the case where the feature does not occur.
There are two main reasons: 1) The amount of calculation for feature extraction is too large, and the effic
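For reference, the two criticisms above can be read off the standard textbook definitions of the two scores. The block below is a sketch using the usual notation, which is an assumption and not taken from the patent text: t is a term, \bar{t} its absence, and c_1, ..., c_m are the classes.

```latex
% Mutual information between term t and class c
MI(t, c) = \log \frac{P(t, c)}{P(t)\,P(c)} = \log \frac{P(t \mid c)}{P(t)}

% Information gain of term t over classes c_1, \dots, c_m
IG(t) = -\sum_{i=1}^{m} P(c_i) \log P(c_i)
        + P(t) \sum_{i=1}^{m} P(c_i \mid t) \log P(c_i \mid t)
        + P(\bar{t}) \sum_{i=1}^{m} P(c_i \mid \bar{t}) \log P(c_i \mid \bar{t})
```

In the first formula the marginal probability P(t) sits in the denominator, so for equal P(t | c) a rarer term receives a higher score. In the second, the final sum is weighted by P(\bar{t}), which is the contribution from documents where the term does not occur.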


Examples

Detailed Description of Embodiments

[0046] To aid understanding of the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.

[0047] The present invention calculates the occurrence frequency (word frequency) of each term in the word frequency matrix and selects a predetermined number of feature items, ranked by word frequency, to form a feature subset (i.e., keywords). On this basis it designs a word frequency space feature extraction method.
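As a minimal sketch of the selection step in [0047], the following counts each term across a set of tokenized documents and keeps the K most frequent ones. The function name `top_k_terms_by_frequency`, the sample documents, and the alphabetical tie-break are illustrative assumptions; the source does not specify how ties are resolved.

```python
from collections import Counter


def top_k_terms_by_frequency(documents, k):
    """Return the k most frequent terms across tokenized documents.

    `documents` is a list of token lists. This is a hypothetical helper
    for illustration, not the patent's own implementation.
    """
    totals = Counter()
    for tokens in documents:
        totals.update(tokens)  # accumulate word frequency over all documents
    # Sort by descending frequency, then alphabetically so ties are deterministic
    # (the tie-break rule is an assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["feature", "extraction", "text", "feature"],
    ["text", "mining", "feature"],
    ["text", "classification"],
]
keywords = top_k_terms_by_frequency(docs, 2)
print(keywords)
```

Here K plays the role of the "predetermined number" of feature items, and the returned list is the feature subset.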

[0048] Feature items are selected according to a chosen algorithm, using either word frequency or a feature evaluation function. The position of a term must also be considered: terms that appear in the article title, in a subtitle, or in the keyword table should all be retained.
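The retention rule in [0048] could be layered on top of any ranking, for example as below. This is a sketch under assumptions: the function `select_with_position` and its parameters are hypothetical names, and appending the retained terms after the top K (in sorted order) is one possible reading of "should all be retained".

```python
def select_with_position(ranked_terms, k, title_terms=(), subtitle_terms=(),
                         keyword_table=()):
    """Keep the first k ranked terms, plus every term found in the title,
    a subtitle, or the keyword table, regardless of its rank."""
    protected = set(title_terms) | set(subtitle_terms) | set(keyword_table)
    selected = list(ranked_terms[:k])
    chosen = set(selected)
    # Append protected terms that the ranking cut off (sorted for determinism).
    for term in sorted(protected):
        if term not in chosen:
            selected.append(term)
            chosen.add(term)
    return selected


ranked = ["feature", "text", "mining", "svm"]
result = select_with_position(ranked, 2, title_terms=["svm"], keyword_table=["text"])
print(result)
```

In this example "svm" ranks below the cutoff but is kept because it appears in the title, while "text" is not duplicated.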

[0049] After preprocessing, the feature vector dimension of the text is still very high. Such high-dimensional features may not be beneficial for classification. High...

Abstract

The invention provides an information processing feature extraction method. The method comprises the following steps: establishing a feature item set that contains the original feature items; calculating, for each class, the weight of each feature item in that class; sorting the calculated weights from high to low within each class and extracting the first K feature items; and combining the feature items extracted from all classes to form a unified feature space. The invention thus provides a feature extraction algorithm for information processing and implements a method that extracts features in the word frequency space. In selecting the algorithm, its time and space complexity and its extraction effect were weighed together, and a simple, practicable feature extraction algorithm was designed and implemented.
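The four steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's algorithm: the weight of a feature item in a class is taken to be its raw word frequency within that class, and the names `build_feature_space` and `vectorize`, the sample data, and the alphabetical tie-break are all hypothetical.

```python
from collections import Counter, defaultdict


def build_feature_space(labeled_docs, k):
    """Pick the top-k terms per class and merge them into one feature space.

    `labeled_docs` is a list of (class_label, token_list) pairs.
    Weight = within-class word frequency (an assumption for this sketch).
    """
    per_class = defaultdict(Counter)
    for label, tokens in labeled_docs:
        per_class[label].update(tokens)  # weight of each term in its class

    feature_space, seen = [], set()
    for label in sorted(per_class):
        # Sort weights from high to low within the class; ties alphabetical.
        ranked = sorted(per_class[label].items(), key=lambda kv: (-kv[1], kv[0]))
        for term, _ in ranked[:k]:  # first K feature items of this class
            if term not in seen:    # union across classes, no duplicates
                seen.add(term)
                feature_space.append(term)
    return feature_space


def vectorize(tokens, feature_space):
    """Represent a document as word counts over the unified feature space."""
    counts = Counter(tokens)
    return [counts[term] for term in feature_space]


labeled = [
    ("sports", ["match", "goal", "match", "team"]),
    ("sports", ["team", "goal", "match"]),
    ("tech", ["chip", "code", "chip"]),
    ("tech", ["code", "chip", "team"]),
]
space = build_feature_space(labeled, 2)
vector = vectorize(["chip", "goal", "chip"], space)
print(space, vector)
```

Because each class contributes at most K terms, the unified space has at most K times the number of classes dimensions, which is how this scheme keeps the feature vector small.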

Description

Technical field

[0001] The invention relates to the field of information processing algorithms, and in particular to an information processing feature extraction method.

Background

[0002] The representation of text and the selection of feature items are basic problems in text mining and information retrieval: feature words extracted from the text are quantified to represent the information in the text. This transforms unstructured original text into structured information that a computer can recognize and process. In other words, the text is abstracted and a mathematical model is built to describe and stand in for it, so that the computer can recognize the text by computing on this model. Since text is unstructured data, mining useful information from a large amount of text first requires converting the text into a processable structured form. At present, people usually use the vector space mode...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 赵孟德
Owner: SHANGHAI DIANJI UNIV