Multi-label text data feature selection method and device

A feature selection technology for text data, applied in the fields of electrical digital data processing, natural language data processing, and instruments, that addresses the low accuracy and algorithmic complexity of existing methods.

Pending Publication Date: 2020-08-18

HENAN NORMAL UNIV

Cites: 0 | Cited by: 12

Problems solved by technology

[0006] The purpose of the present invention is to provide a multi-label text data feature selection method and device that solve the low accuracy and algorithmic complexity of current multi-label text data feature selection methods.

Embodiment Construction

[0046] The specific embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0047] Method embodiment

[0048] Before the specific means of the present invention are described, some related background knowledge, the Fisher-Score algorithm, and the neighborhood rough set algorithm are introduced.

[0049] 1) Related concepts of mutual information

[0050] Assuming that A and B are two events, and P(A)>0, the conditional probability of event B under the condition that event A occurs is:

[0051] $P(B \mid A) = \frac{P(AB)}{P(A)}$

[0052] For a discrete random variable $X = \{x_1, x_2, \ldots, x_n\}$, the information entropy of $X$ can be expressed as:

[0053] $H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$

[0054] In the formula, $P(x_i)$ is the probability that event $x_i$ occurs; n is the total number of possible events (states). Obviously, for a fully determined variable X, H(X)=0; for a random variable X, H(X)>0 (non-negativity), and the value of H(X) in...
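The entropy definition above can be checked numerically with a minimal sketch. It assumes probabilities are estimated from observed frequencies and that the logarithm is base 2 (the excerpt does not state the base); the function name `entropy` is hypothetical and not taken from the patent.

```python
import math
from collections import Counter

def entropy(values, base=2):
    """Empirical information entropy H(X) of a sequence of observed states.

    Assumption: P(x_i) is estimated as the relative frequency of x_i,
    and the log base defaults to 2 (the source does not specify it).
    """
    n = len(values)
    counts = Counter(values)
    # Sum of -P(x_i) * log P(x_i) over the distinct observed states.
    return sum(-(c / n) * math.log(c / n, base) for c in counts.values())

# A fully determined variable has zero entropy.
h_fixed = entropy(["a", "a", "a", "a"])
# Two equally likely states carry one bit of uncertainty.
h_fair = entropy(["a", "b", "a", "b"])
print(h_fixed, h_fair)
```

The two calls illustrate the properties stated in [0054]: entropy is zero when only one state occurs and positive otherwise.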

Abstract

The invention relates to a multi-label text data feature selection method and device, and belongs to the technical field of text data processing. The method first considers the second-order correlation between labels in a text data set and groups the labels to better suit multi-label data sets; it then determines a final score for each feature from the scores that feature obtains in each label group, and selects a set number of the highest-scoring features to form a feature set. Next, based on this feature set, it determines the neighborhood granularity of each sample according to the sample's classification margin with respect to the labels, which yields a multi-label neighborhood decision system; it calculates feature importance using the dependency degree of the improved neighborhood rough set and screens the feature set accordingly, thereby realizing feature selection for multi-label text data. Compared with the original neighborhood rough set feature selection method applied to all attributes, the method has lower time complexity and yields a more accurate optimal feature subset.
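The first stage described in the abstract (score each feature, then keep a set number of the highest-scoring ones) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: it uses the standard single-label Fisher score for each label and averages the scores across labels, whereas the patent groups labels by second-order correlation and combines group scores in a way this excerpt does not specify. The names `fisher_score` and `top_k_features` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance

def fisher_score(column, labels):
    """Standard Fisher score of one feature column for one label vector."""
    overall = fmean(column)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(column, labels) if y == c]
        # Between-class scatter: class mean versus overall mean.
        between += len(group) * (fmean(group) - overall) ** 2
        # Within-class scatter: population variance inside the class.
        within += len(group) * pvariance(group)
    return between / (within + 1e-12)  # small constant avoids division by zero

def top_k_features(X, Y, k):
    """Rank features by their mean Fisher score over all labels; keep k.

    Assumption: plain averaging over labels stands in for the patent's
    label grouping and final-score rule, which are not given here.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for f in range(n_features):
        column = [row[f] for row in X]
        per_label = [fisher_score(column, [row[j] for row in Y])
                     for j in range(n_labels)]
        scores.append(fmean(per_label))
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked[:k], scores

# Toy data: feature 0 separates both labels, feature 1 does not.
X = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.0]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]
selected, scores = top_k_features(X, Y, k=1)
print(selected, scores)
```

On this toy data, feature 0 is selected because its class means differ while feature 1 has identical means in every class. The second stage (screening with neighborhood rough set dependency) is omitted because the excerpt does not define the improved dependency degree.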

Description

Technical field

[0001] The invention relates to a multi-label text data feature selection method and device, belonging to the technical field of text data processing.

Background technique

[0002] Multi-label learning is a research hotspot in the fields of pattern recognition, machine learning, data mining and data analysis. In multi-label learning, each instance is not only described by a set of feature vectors, but also corresponds to multiple decision attributes. Many real-life problems also fall into the category of multi-label learning, for example: a movie can belong to multiple categories at the same time, such as "action", "sci-fi" and "war"; a document may have multiple topics at the same time, such as "medicine", "technology" and "artificial intelligence"; an image may be labeled with multiple semantics at the same time, such as "street", "car" and "pedestrian". It is difficult to accurately classify this type of problem using single-label clas...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F40/117

CPC: G06F40/117

Inventor: 孙林, 王天翔, 李文凤, 李梦梦

Owner: HENAN NORMAL UNIV
