Multi-label text data feature selection method and device

A technology involving data features and text data, applied in the fields of electrical digital data processing, natural language data processing, instruments, etc. It addresses problems such as low accuracy and complex algorithms.

Pending Publication Date: 2020-08-18
HENAN NORMAL UNIV
Cites: 0; Cited by: 12

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a multi-label text data feature selection method and device that solve the problems of low accuracy and algorithmic complexity in current multi-label text data feature selection methods.



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0046] The specific embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0047] Method embodiment

[0048] Before the specific means of the present invention are described, the related background is introduced: the concepts of mutual information, the Fisher-Score algorithm, and the neighborhood rough set algorithm.

[0049] 1) Related concepts of mutual information

[0050] Assuming that A and B are two events, and P(A)>0, the conditional probability of event B under the condition that event A occurs is:

[0051] $$P(B \mid A) = \frac{P(AB)}{P(A)}$$
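The definition in [0050] and [0051] can be checked on a small, finite, equally likely sample space. The sketch below is only an illustration; the die example and the helper names `prob` and `conditional_probability` are hypothetical and not taken from the patent.

```python
from fractions import Fraction


def prob(event, omega):
    """Classical probability of an event over a finite, equally likely sample space."""
    return Fraction(len(event & omega), len(omega))


def conditional_probability(b, a, omega):
    """P(B | A) = P(AB) / P(A), defined only when P(A) > 0."""
    p_a = prob(a, omega)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return prob(a & b, omega) / p_a


omega = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}             # the roll is even
B = {4, 5, 6}             # the roll is greater than 3
print(conditional_probability(B, A, omega))  # 2/3
```

Here AB = {4, 6}, so P(AB) = 1/3 and P(A) = 1/2, giving P(B | A) = 2/3. Using `Fraction` keeps the arithmetic exact.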

[0052] For a discrete random variable $X=\{x_1,x_2,\ldots,x_n\}$, the information entropy of $X$ can be expressed as:

[0053] $$H(X) = -\sum_{i=1}^{n} P(x_i)\log P(x_i)$$

[0054] In the formula, $P(x_i)$ is the probability that event $x_i$ occurs, and $n$ is the total number of possible events (states). Clearly, for a fully determined variable X, H(X)=0; for a random variable X, H(X)>0 (non-negativity), and the value of H(X) in...
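The entropy formula in [0053] can be checked numerically. The following minimal sketch makes two assumptions for illustration: P(x_i) is estimated from the relative frequencies of observed values, and the logarithm is taken in base 2 (the excerpt does not state the base). The function name `entropy` is hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) in bits, with P(x_i) estimated by relative frequency.

    Uses p * log2(1/p), which equals -p * log2(p) and avoids printing -0.0.
    """
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


print(entropy(["a", "a", "a"]))        # fully determined variable: 0.0
print(entropy(["a", "b", "c", "d"]))   # four equally likely states: 2.0
```

The first call reproduces the H(X)=0 case for a fully determined variable. The second shows that n equally likely states give log2(n) bits.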


Abstract

The invention relates to a multi-label text data feature selection method and device, and belongs to the technical field of text data processing. The method first considers the second-order correlation between labels in a text data set and groups the labels so that the method is better suited to multi-label data sets. It then determines a final score for each feature from the scores the feature obtains in each label group, and selects a set number of features with the highest final scores to form a feature set. Next, based on the obtained feature set, the neighborhood granularity of each sample is determined from the sample's classification margin with respect to the labels in the text data set, yielding a multi-label neighborhood decision system. Feature importance is calculated using the dependency degree of an improved neighborhood rough set, and the obtained feature set is screened, thereby realizing feature selection for multi-label text data. Compared with the original neighborhood rough set feature selection method applied to all attributes, this method has lower time complexity and yields a more accurate optimal feature subset.
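The first stage in the abstract consists of label grouping, per-group feature scores, a final score, and top-k selection. It can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented procedure. The per-label score is the standard Fisher score. The hypothetical helper `pair_labels` greedily pairs the labels with the highest absolute Pearson correlation, as a stand-in for the patent's second-order label-correlation grouping. A feature's score in a group is assumed to be the mean of its per-label Fisher scores, and its final score is assumed to be the maximum over groups. All function names are illustrative.

```python
import numpy as np


def fisher_score(X, y):
    """Standard Fisher score of every column of X with respect to a discrete label y."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)  # guard against zero within-class variance


def pair_labels(Y):
    """Hypothetical grouping: greedily pair the labels with the highest |Pearson correlation|."""
    q = Y.shape[1]
    Yc = Y - Y.mean(axis=0)
    norms = np.linalg.norm(Yc, axis=0)
    corr = np.abs(Yc.T @ Yc) / np.maximum(np.outer(norms, norms), 1e-12)
    pairs = sorted(((corr[i, j], i, j) for i in range(q) for j in range(i + 1, q)), reverse=True)
    used, groups = set(), []
    for _, i, j in pairs:
        if i not in used and j not in used:
            groups.append([i, j])
            used.update((i, j))
    groups += [[k] for k in range(q) if k not in used]  # labels left unpaired
    return groups


def grouped_fisher_selection(X, Y, k):
    """Return the indices of the k features with the highest final score, plus all scores."""
    groups = pair_labels(Y)
    # Assumption: a feature's score in a group is the mean of its per-label Fisher scores.
    group_scores = np.array(
        [np.mean([fisher_score(X, Y[:, l]) for l in g], axis=0) for g in groups]
    )
    # Assumption: the final score is the best score the feature achieves in any group.
    final = group_scores.max(axis=0)
    return np.argsort(final)[::-1][:k], final
```

Here `X` is an n × d feature matrix (e.g. term weights of n documents) and `Y` is an n × q binary label matrix. Pairing is one simple way to act on second-order, pairwise label correlation; clustering or threshold-based grouping would be equally plausible readings of the abstract.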
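The second stage uses margin-based neighborhoods, neighborhood rough set dependency, and screening of the pre-selected features. It can be sketched the same way. This is also an illustrative sketch under assumptions, because the excerpt does not give the patent's "improved" dependency formula. The hypothetical `margin_radii` sets each sample's neighborhood radius to the mean, over labels, of its non-negative classification margin, defined as the distance to the nearest miss minus the distance to the nearest hit. The multi-label dependency is taken as the average over labels of the standard neighborhood rough set dependency |POS_B(D_l)| / n. Features are then added greedily while that dependency increases.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)


def margin_radii(X, Y):
    """Hypothetical per-sample radius: mean over labels of max(margin, 0),
    where margin = d(x, nearest miss) - d(x, nearest hit)."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a sample is not its own nearest hit
    n, q = Y.shape
    radii = np.zeros(n)
    for l in range(q):
        same = Y[:, l][:, None] == Y[:, l][None, :]
        hit = np.where(same, D, np.inf).min(axis=1)
        miss = np.where(~same, D, np.inf).min(axis=1)
        with np.errstate(invalid="ignore"):
            margin = miss - hit
        # Samples with no hit or no miss for label l get a zero margin.
        radii += np.where(np.isfinite(margin), np.maximum(margin, 0.0), 0.0)
    return radii / q


def dependency(X, Y, B, radii):
    """Mean over labels of the neighborhood rough set dependency |POS_B(D_l)| / n."""
    if not B:
        return 0.0
    D = pairwise_distances(X[:, B])
    nbr = D <= radii[:, None]  # row i: the neighborhood of sample i in subspace B
    gammas = []
    for l in range(Y.shape[1]):
        same = Y[:, l][:, None] == Y[:, l][None, :]
        # x is in the positive region if every neighbour shares its value of label l.
        in_pos = np.all(same | ~nbr, axis=1)
        gammas.append(in_pos.mean())
    return float(np.mean(gammas))


def greedy_reduct(X, Y, candidates, radii):
    """Forward selection: repeatedly add the candidate that most increases dependency."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scores = {a: dependency(X, Y, selected + [a], radii) for a in remaining}
        a = max(scores, key=scores.get)
        if scores[a] <= best:
            break  # no candidate improves the dependency any further
        selected.append(a)
        remaining.remove(a)
        best = scores[a]
    return selected, best
```

In a full pipeline, `candidates` would be the feature indices returned by the first stage, and `radii` would be computed on `X[:, candidates]`. Restricting the rough set search to that smaller candidate set, rather than to all attributes, is where the abstract's claimed reduction in time complexity would come from.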

Description

Technical field

[0001] The invention relates to a multi-label text data feature selection method and device, belonging to the technical field of text data processing.

Background art

[0002] Multi-label learning is a research hotspot in the fields of pattern recognition, machine learning, data mining and data analysis. In multi-label learning, each instance is not only described by a feature vector but also corresponds to multiple decision attributes (labels). Many real-life problems fall into the category of multi-label learning. For example, a movie can belong to several categories at once, such as "action", "sci-fi" and "war"; a document may cover several topics at once, such as "medicine", "technology" and "artificial intelligence"; and an image may be annotated with several semantic labels at once, such as "street", "car" and "pedestrian". It is difficult to accurately classify this type of problem using single-label clas...


Application Information

Patent type & authority: Applications (China)
IPC: IPC(8) G06F40/117
CPC: G06F40/117
Inventors: 孙林 (Sun Lin), 王天翔 (Wang Tianxiang), 李文凤 (Li Wenfeng), 李梦梦 (Li Mengmeng)
Owner HENAN NORMAL UNIV