An FCBF-based user-defined feature dimension text feature selection algorithm
A feature selection technology with a user-defined feature dimension, applied to unstructured text data retrieval, text database clustering/classification, computing, etc., that addresses problems such as reduced classification accuracy and impaired classification efficiency.
Embodiment Construction
[0051] The present invention will be further described below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solution of the present invention more clearly, but not to limit the protection scope of the present invention.
[0052] An FCBF-based text feature selection algorithm with a user-defined feature dimension, comprising the following steps:
[0053] Step 1: let the vectorized text matrix be $X$ and the text category matrix be $C$. Initialize $T = \{t_1, t_2, \ldots, t_m\}$ according to the text matrix $X$, and let $S_{list} = \{\}$, $S_{best}$;
[0054] Assign $S_{list}$: $S_{list}$ is initially an empty set. For each $t_k \in T$, calculate the correlation $Corr(t_k, C)$ between the $k$-th feature word $t_k$ of the text and the category; when $Corr(t_k, C) \geq thresh$, add $t_k$ to $S_{list}$. Here $T$ is the set of all feature words, $S_{list}$ is the set of feature words whose correlation with the categories meets the above requirement, and $thresh$ is a decimal number (a number between 0 and...
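The thresholding in Step 1 can be illustrated with a minimal Python sketch. The excerpt does not say how Corr is computed, so this sketch assumes symmetrical uncertainty, the measure commonly used with FCBF. The function names (`entropy`, `symmetrical_uncertainty`, `select_relevant`), the toy data, and the value of `thresh` are hypothetical. Features are identified by column index rather than by feature word.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Assumed Corr measure: SU(x, y) = 2 * I(x; y) / (H(x) + H(y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_relevant(X, C, thresh):
    """Return the column indices k with Corr(t_k, C) >= thresh (S_list)."""
    s_list = []  # S_list starts as an empty set
    for k in range(len(X[0])):
        column = [row[k] for row in X]  # values of feature word t_k
        if symmetrical_uncertainty(column, C) >= thresh:
            s_list.append(k)
    return s_list


# Toy data (hypothetical): 4 documents, 3 binary features, 2 categories.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
C = [1, 1, 0, 0]
print(select_relevant(X, C, thresh=0.5))
```

In this toy data only the first feature tracks the category labels, so it is the only one that clears the threshold. The other two are statistically independent of C.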