A text feature selection method based on full-coverage granular computing

A text feature selection method based on full-coverage granular computing, applied in the field of text mining, that addresses problems of existing methods such as poor accuracy and weak feature representation.

Inactive Publication Date: 2019-01-08
TAIYUAN UNIV OF TECH
Cites: 3; Cited by: 1

AI Technical Summary

Problems solved by technology

[0007] To overcome the shortcomings of existing feature selection methods, such as poor accuracy and weak feature representation, the present invention proposes a text feature selection method based on full-coverage granular computing.


Examples

Embodiment Construction

[0041] To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with practical examples.

[0042] A web crawler is used to collect a certain number of news articles from different fields on Sohu News. The articles are analyzed and organized, duplicate news items and non-text symbols are removed, and the result is used as the sample set.
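
The cleaning step in [0042] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `clean_text` and `build_sample_set` are hypothetical, "non-text symbols" is assumed to mean any character that is neither a word character nor whitespace, and "the same news" is assumed to mean articles that are identical after cleaning. Crawling itself is left out.

```python
import hashlib
import re

# Assumption: "non-text symbols" are characters that are neither word
# characters (letters, digits, CJK characters, underscore) nor whitespace.
_NON_TEXT = re.compile(r"[^\w\s]")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace non-text symbols with spaces and collapse runs of whitespace."""
    return _SPACES.sub(" ", _NON_TEXT.sub(" ", text)).strip()


def build_sample_set(articles):
    """Clean each (title, body) pair and drop exact duplicates, keeping order."""
    seen = set()
    samples = []
    for title, body in articles:
        cleaned = (clean_text(title), clean_text(body))
        digest = hashlib.sha1("\n".join(cleaned).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # an identical article is already in the sample set
        seen.add(digest)
        samples.append(cleaned)
    return samples
```

Hashing the cleaned text only catches exact duplicates; near-duplicate detection would need a different technique and is not covered here.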

[0043] To select a representative set of feature words from the text, the titles and bodies in the sample set are segmented into words, stop words are removed, and the remaining words are tagged with their parts of speech.

[0044] The improved TFIDF method is used to calculate the probability of feature words, and words with different positions and parts of speech are given different weight coefficients. For example, a news article can be expressed as $d_i = \{t_i \mid t_{i1}, t_{i2}, t_{i3}, t_{i4}, \ldots, t_{im}\}$, where $t_i$ represents the set of news words, $t_{i1}, t_{i2}, t_{i3}$ represent the words in the title, and the rest repre...
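
The idea of weighting term frequency by position and part of speech can be sketched as below. The source text is truncated before it gives the actual formula or coefficients, so everything specific here is an assumption: the weight tables `POSITION_WEIGHT` and `POS_WEIGHT`, the smoothed IDF term, and the per-document normalization that turns scores into a probability distribution are illustrative choices, and `weighted_tfidf` is a hypothetical name.

```python
import math
from collections import defaultdict

# Hypothetical weight coefficients; the source does not state the values.
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}
POS_WEIGHT = {"n": 1.5, "v": 1.0}  # noun, verb
DEFAULT_POS_WEIGHT = 0.5           # any other part of speech


def weighted_tfidf(docs):
    """Score words with a position- and POS-weighted TF-IDF.

    docs: list of documents, each a list of (word, pos_tag, position) tokens,
    where position is "title" or "body".
    Returns one dict per document mapping word -> score; the scores of each
    document are normalized to sum to 1.
    """
    n_docs = len(docs)
    doc_freq = defaultdict(int)
    for doc in docs:
        for word in {w for w, _, _ in doc}:
            doc_freq[word] += 1

    results = []
    for doc in docs:
        # Each occurrence contributes its position weight times its POS weight.
        weighted_tf = defaultdict(float)
        for word, pos, position in doc:
            weight = POSITION_WEIGHT[position] * POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
            weighted_tf[word] += weight
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
        scores = {
            w: tf * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0)
            for w, tf in weighted_tf.items()
        }
        total = sum(scores.values())
        results.append({w: s / total for w, s in scores.items()} if total else {})
    return results
```

With these assumed weights, a noun that appears in the title and is rare across the sample set ends up with a larger share of the document's probability mass than a common verb in the body.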

Abstract

The invention relates to a text feature selection method based on full-coverage granular computing, which comprises the following steps: 1) segmenting a sample text set into words, removing stop words, and performing part-of-speech tagging; 2) extending the TFIDF algorithm with position and part-of-speech factors, which are given different weight coefficients, to calculate the text-word frequency probability of the feature words; 3) using a bLDA topic model to generate feature-word probabilities that capture the semantic information of the feature words; 4) granulating the feature words and reducing them with the knowledge reduction algorithm of full-coverage granular computing to obtain the reduced feature word set; 5) combining bLDA and the improved TFIDF algorithm to calculate the weights of the feature words, obtaining the text-word frequency probability of the reduced feature word set. The invention takes the part of speech, position, and semantic factors of the feature words into account and removes feature words that contribute little meaning to the text, so a more representative feature word set is selected and clustering precision is improved.
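
Step 4 can be illustrated with a small sketch. The abstract does not spell out the granulation or the reduction rule, so the following rests on stated assumptions: each feature word is assumed to induce a granule (the set of documents that contain it), so the granules form a cover of the document set, and the reduction shown is the reducible-element rule from covering-based rough set theory, which drops a granule when it equals the union of other, strictly smaller granules in the cover. The names `granulate` and `reduce_cover` are hypothetical, and the patent's own knowledge reduction algorithm may differ.

```python
def granulate(doc_words):
    """Map each feature word to its granule: the set of documents containing it.

    doc_words: dict mapping a document id to the set of feature words in it.
    """
    granules = {}
    for doc_id, words in doc_words.items():
        for word in words:
            granules.setdefault(word, set()).add(doc_id)
    return granules


def reduce_cover(granules):
    """Drop every granule that is the union of strictly smaller granules.

    The remaining granules still cover the same documents. Words whose
    granules are identical are all kept, because only proper subsets are
    considered when rebuilding a granule.
    """
    kept = dict(granules)
    for word in sorted(granules, key=lambda w: (-len(granules[w]), w)):
        block = kept[word]
        union = set()
        for other, granule in kept.items():
            if other != word and granule < block:  # proper subset only
                union |= granule
        if union == block:
            del kept[word]  # reducible: fully explained by finer granules
    return kept
```

For example, if "economy" occurs in documents d1 and d2, "bank" only in d1, and "stock" only in d2, the granule of "economy" is the union of the other two and is removed, while the cover of d1 and d2 is preserved.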

Description

Technical Field

[0001] The invention lies at the intersection of text mining and full-coverage granular computing. It relates specifically to text feature selection and the full-coverage granular computing model, and in particular to the application of full-coverage granular computing knowledge reduction to text feature selection.

Background

[0002] Text clustering is an important research topic in pattern recognition, machine learning, and data mining. Its main task is to group a collection of text objects into multiple classes composed of similar objects, so that unknown text data can be clustered. At present, the vector space model is the main way to represent text information in structured form, but the model suffers from a high-dimensional feature space and data sparsity. The high-dimensional feature space not only increases the time and space complexity of the system, but also contains a large number ...

Application Information

IPC(8): G06F16/35
Inventors: 谢珺, 邹雪君, 靳红伟, 续欣莹
Owner: TAIYUAN UNIV OF TECH