
Feature selection method based on word frequency reordering at document level

A feature selection method using word frequency reordering, applied to unstructured text data retrieval and to text database clustering and classification. It addresses the problem of low classification accuracy and improves classification accuracy.

Active Publication Date: 2019-02-22
XIAN UNIV OF TECH
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a feature selection method based on word frequency reordering at the document level that solves the problem of low classification accuracy in the prior art.



Examples


Embodiment Construction

[0041] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0042] Relevant definitions in the present invention are as follows:

[0043] Definition 1: Word frequency. The ratio of the number of occurrences of term t_i in document d_j to the total number of terms in that document, denoted tf_ij.
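Definition 1 can be illustrated with a minimal Python sketch. It assumes a document has already been split into a list of tokens; the function name `term_frequencies` and the sample tokens are hypothetical and do not come from the patent.

```python
from collections import Counter

def term_frequencies(tokens):
    """Return {term: tf}, where tf = occurrences of the term / total tokens."""
    total = len(tokens)
    if total == 0:
        return {}  # an empty document has no term frequencies
    counts = Counter(tokens)
    return {term: n / total for term, n in counts.items()}

# Hypothetical four-token document d_j.
doc = ["data", "mining", "data", "text"]
tf = term_frequencies(doc)
```

Because every value is a share of the same document, the frequencies of one document add up to 1.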

[0044] Definition 2: Intra-class term frequency sum. The sum of the word frequencies of term t_i over all documents in category C_k, denoted tf_ki. It is calculated as follows:

[0045]

[0046] Here, k is the category label, N is the total number of documents in the data set, and I(d_j, C_k) is the function that judges whether document d_j belongs to category C_k.
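The formula for paragraph [0045] does not appear in this text. One form that agrees with Definition 2 and with the symbols explained in [0046] is given below. It is a reconstruction from those definitions, and both the 0/1 form of the indicator and the exact notation are assumptions rather than the patent's own typesetting.

```latex
tf_{ki} = \sum_{j=1}^{N} tf_{ij} \cdot I(d_j, C_k),
\qquad
I(d_j, C_k) =
\begin{cases}
1, & d_j \in C_k \\
0, & \text{otherwise}
\end{cases}
```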

[0047] Definition 3: Total term frequency sum. The sum of the word frequencies of term t_i over every document in the entire data set, denoted tf_i. The calculation ...
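Definitions 2 and 3 can be sketched together in Python. The sketch assumes each document is already a dictionary of word frequencies (Definition 1) and that each document has exactly one category label. The function name and the sample data are hypothetical, and since the patent's formula for Definition 3 is cut off here, the total is simply taken as the sum over all documents, as the definition's wording states.

```python
from collections import defaultdict

def class_and_total_tf_sums(doc_tfs, labels):
    """Return (per_class, total): per_class[C_k][t_i] sums tf_ij over the
    documents labelled C_k; total[t_i] sums tf_ij over every document."""
    per_class = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for tf, label in zip(doc_tfs, labels):
        for term, value in tf.items():
            per_class[label][term] += value  # only counts documents in this class
            total[term] += value
    return per_class, total

# Hypothetical data set of three documents in two categories.
doc_tfs = [
    {"ball": 0.5, "the": 0.5},
    {"cpu": 0.25, "the": 0.75},
    {"ball": 0.25, "the": 0.75},
]
labels = ["sport", "tech", "sport"]
per_class, total = class_and_total_tf_sums(doc_tfs, labels)
```

Grouping by label plays the role of the membership test I(d_j, C_k): a document contributes to a category's sum only when it belongs to that category.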



Abstract

The invention discloses a feature selection method based on word frequency reordering at the document level. Its main purpose is to reduce the dimension of the feature space and improve classification accuracy. Starting from an existing data set, redundant features that carry very little information are removed, and the dimension of the data set is then reduced according to the feature selection method. Finally, a classification model is built on the current feature set, its classification F1 value is obtained by 5-fold cross-validation, and the feature set with the highest F1 value is selected as the optimal feature set. Used for feature selection, the method helps discover terms with discriminating power and, by reordering term frequencies at the document level, overcomes both the limitation of a single document frequency calculation mode and the problem of data set imbalance.
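The final selection step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: `ranked_terms` stands in for the ordering produced by the reordering method, which this excerpt does not spell out; the nearest-centroid classifier is a placeholder, not the patent's classification model; macro-averaged F1 is assumed; and the toy data set is hypothetical.

```python
from collections import Counter

def fit_centroids(docs, labels, features):
    """Average the tf vectors of each class, restricted to `features`."""
    sums = {y: Counter() for y in set(labels)}
    sizes = Counter(labels)
    for doc, y in zip(docs, labels):
        for term, value in doc.items():
            if term in features:
                sums[y][term] += value
    return {y: {t: v / sizes[y] for t, v in c.items()} for y, c in sums.items()}

def predict(centroids, doc):
    """Pick the class whose centroid has the largest dot product with doc."""
    def score(y):
        return sum(doc.get(t, 0.0) * w for t, w in centroids[y].items())
    return max(sorted(centroids), key=score)  # ties go to the first label

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cv_f1(docs, labels, features, k=5):
    """Mean macro-F1 over k folds; document i goes to fold i % k."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        model = fit_centroids([docs[i] for i in train],
                              [labels[i] for i in train], features)
        preds = [predict(model, docs[i]) for i in test]
        scores.append(macro_f1([labels[i] for i in test], preds))
    return sum(scores) / k

def select_best_size(docs, labels, ranked_terms, sizes, k=5):
    """Try the top-n terms for each n in `sizes`; keep the highest mean F1."""
    best_size, best_score = None, -1.0
    for size in sizes:
        score = cv_f1(docs, labels, set(ranked_terms[:size]), k)
        if score > best_score:  # on a tie the earlier size in `sizes` is kept
            best_size, best_score = size, score
    return best_size, best_score

# Hypothetical toy data: ten documents alternating between two categories.
docs = [{"ball": 0.5, "the": 0.5} if i % 2 == 0 else {"cpu": 0.5, "the": 0.5}
        for i in range(10)]
labels = ["sport" if i % 2 == 0 else "tech" for i in range(10)]
best_size, best_score = select_best_size(
    docs, labels, ["ball", "cpu", "the"], [1, 2, 3])
```

On this toy data the single feature "ball" cannot separate the two categories, while the top two terms can, so the search settles on a two-term feature set. Adding the uninformative term "the" does not raise the score further, and the smaller set is kept.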

Description

Technical field

[0001] The invention belongs to the technical field of data mining methods, and relates to a feature selection method based on word frequency reordering at the document layer.

Background technique

[0002] With the continuous development of the Internet, scientific knowledge, Internet data and various resources have shown massive characteristics. With the continuous improvement of data processing and data storage technology, the number of documents in the network is also increasing exponentially. How to quickly and accurately obtain valuable information from massive information has become an urgent problem that people need to solve. The ability to manually process data is far from meeting the requirements of real life. Effectively organizing and managing information and quickly distinguishing useful and useless information are all facing great challenges. Classification technology has become the key technology to solve this problem, and is widely used in dif...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
Inventor: 周红芳, 张英杰, 刘虹江, 张尧, 张懿辉, 吴珞风
Owner: XIAN UNIV OF TECH