Text Classification Feature Selection Method and Its Application in Biomedical Text Classification

A feature selection method and text classification technology, applied in special data processing applications, instruments, calculations, etc. It addresses the problem that existing methods do not consider the specific patterns of feature words, and achieves the effects of optimizing the feature set, reducing dimensionality, and improving classification performance.

Active Publication Date: 2018-11-23
南京睿晖数据技术有限公司

AI Technical Summary

Problems solved by technology

[0008] When most Filter methods evaluate irrelevant features, their evaluation functions assume that each feature is isolated and do not consider the specific patterns that may exist between feature words.

Examples

Embodiment 1

[0038] Embodiment 1: A text classification feature selection method based on a local context similarity measure, characterized in that it is carried out as follows:

[0039] S1. Extract feature words t_i and t_j from the data set; the similarity between the local contexts context_l(t_i, N) and context_l'(t_j, N) of the feature words t_i and t_j is:

[0040]

[0041] Here, N is the number of contextual N-grams; t_il is the feature word t_i contained in the local context context_l(t_i, N), and t_jl' is the feature word t_j contained in the local context context_l'(t_j, N). The context N-gram number N is determined by 10-fold cross-validation. In this formula, the cosine similarity cosin_sim is used as the measure of text similarity between local context pairs: if the two texts are exactly the same, the similarity is 1; if the two texts are completely different, the similarity is 0; otherwise the similarity is between 0 and 1. By ...
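
The cosine measure described in [0041] can be illustrated with a minimal Python sketch. It is not the patent's formula, which is not reproduced in this text. It assumes that a local context is the window of up to N tokens on each side of a feature word occurrence (the word itself excluded) and that contexts are compared as bag-of-words count vectors. The function names local_context and cosine_similarity are hypothetical.

```python
import math
from collections import Counter


def local_context(tokens, index, n):
    """Return up to n tokens on each side of tokens[index].

    Assumption: the feature word itself is left out of its own context.
    """
    left = tokens[max(0, index - n):index]
    right = tokens[index + 1:index + 1 + n]
    return left + right


def cosine_similarity(context_a, context_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(context_a), Counter(context_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context shares nothing with any other context
    return dot / (norm_a * norm_b)
```

With non-negative counts the value stays within the range from 0 to 1, which matches the behavior described above: identical contexts score 1 and contexts with no shared tokens score 0.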

Abstract

The present invention provides a text classification feature selection method and its application to biomedical text classification. A feature selection algorithm is proposed that uses local context similarity calculation based on shallow grammatical analysis: by measuring the local context similarity of feature words, it determines whether features occur in certain specific patterns, and thereby measures the importance of the features. Moreover, a feature selection method based on local context similarity (the LLFilter method) filters the features so that the samples obtain the best classification effect, i.e. after feature filtering the inter-class dispersion of the samples is highest and the within-class dispersion is lowest, which improves the ability to distinguish classes. The method and its application are aimed mainly at biomedical text classification tasks, where local context information in the text is used to rank feature importance automatically, so that the feature set is optimized, the dimension of the feature space is reduced, and the performance of text classification can be effectively improved.
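
The idea of ranking features by how consistently they appear in similar local contexts can be sketched as follows. This is only an illustration under stated assumptions, not the LLFilter scoring function, which is not given in this text: each feature word is scored by the mean pairwise cosine similarity of its local contexts, class labels and the dispersion criteria are ignored, and the names contexts_of, context_consistency, and rank_features are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(context_a, context_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(context_a), Counter(context_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contexts_of(word, documents, n):
    """Collect the window of up to n tokens around every occurrence of word."""
    contexts = []
    for tokens in documents:
        for i, tok in enumerate(tokens):
            if tok == word:
                contexts.append(tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n])
    return contexts


def context_consistency(word, documents, n):
    """Assumed score: mean pairwise cosine similarity of the word's contexts."""
    pairs = list(combinations(contexts_of(word, documents, n), 2))
    if not pairs:
        return 0.0  # fewer than two occurrences: no pattern can be measured
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def rank_features(vocabulary, documents, n, top_k):
    """Keep the top_k words whose local contexts are most alike."""
    ranked = sorted(vocabulary,
                    key=lambda w: context_consistency(w, documents, n),
                    reverse=True)
    return ranked[:top_k]
```

In this sketch, a word that recurs in similar surroundings receives a high score and survives filtering, while a word whose surroundings vary freely is dropped. The window size n would be tuned on held-out data, in the spirit of the 10-fold cross-validation mentioned in the embodiment.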

Description

Technical field

[0001] The invention relates to a text classification feature selection method, in particular to a text classification feature selection method based on local context similarity, and belongs to the technical field of big data mining.

Background technique

[0002] With the advent of the information age and the rapid development of information technology, the Internet provides people with extremely rich information resources, resulting in an exponential growth in the amount of information. In order to effectively manage and utilize the information, content-based information retrieval and data mining have gradually become areas of concern. Among them, the realization of automatic text classification has become a key technology with practical value, especially now that manual classification is powerless in the face of massive texts, automatic text classification is particularly important. Text Classification (TC) technology is an important basis for information ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/355, G06F40/253
Inventor: 陈一飞
Owner: 南京睿晖数据技术有限公司