A Text Data Classification and Information Mining Method

A text data classification and information mining technology, applied in the computer field, that addresses inaccurate classification by overcoming the high-dimensional, sparse nature of text features, reducing dimensionality, and improving accuracy.

Active Publication Date: 2021-05-28
JIANGNAN UNIV

AI Technical Summary

Problems solved by technology

[0005] To address the inaccurate classification caused by high dimensionality and sparseness in current text classification methods, as well as their need for already-classified texts, the present invention provides a text data classification and information mining method, the method comprising:

Examples

Embodiment 1

[0036] This embodiment provides a text data classification and information mining method (see Figures 1-3). The method includes:

[0037] Step 1: Text Preprocessing

[0038] After obtaining a large amount of text data from the 12345 mayor's hotline, first segment each text using the NLPIR Chinese word segmentation system, then remove stop words using an existing stop-word dictionary. The result is the discretized text data, that is, the initial text vector.
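Step 1 can be illustrated with a minimal sketch. It is not the patent's implementation: `segment` is a hypothetical placeholder that assumes the text is already space-separated (a real pipeline would call the NLPIR segmenter here), and `STOP_WORDS` is a tiny illustrative stand-in for a full stop-word dictionary.

```python
# Hypothetical stand-ins: a real pipeline would call the NLPIR segmenter
# and load a complete stop-word dictionary instead of these.
STOP_WORDS = {"的", "了", "在", "是"}


def segment(text):
    """Placeholder segmenter: assumes words are already space-separated."""
    return text.split()


def preprocess(text, stop_words=STOP_WORDS):
    """Segment a text and drop stop words, giving the initial text vector."""
    return [tok for tok in segment(text) if tok not in stop_words]


initial_vector = preprocess("小区 的 路灯 坏 了")
print(initial_vector)  # ['小区', '路灯', '坏']
```

Keeping the segmenter behind a single function makes it straightforward to swap the placeholder for a real word-segmentation call without touching the stop-word filtering.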

[0039] Step 2: Get the text feature vector

[0040] Establish keyword databases of different levels and categories, and determine the text feature vector corresponding to each piece of text data according to the keyword database;

[0041] According to the actual characteristics of the text keywords and categories, the category keyword database is established, and the initial text vector obtained in step 1 is matched with the established keyword databases of different levels and categories to obtain the word frequency...
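One way to picture the matching in Step 2 is sketched below, under stated assumptions. The database contents, the category names, and the single-level layout are all hypothetical (the source describes databases of different levels and categories, and its text is truncated before the details). Each vector component here is simply the number of tokens that match one category's keywords.

```python
from collections import Counter

# Hypothetical single-level keyword database: category -> keywords.
KEYWORD_DB = {
    "street_lighting": {"路灯", "照明"},
    "water_supply": {"停水", "水管", "水压"},
    "noise": {"噪音", "施工"},
}


def feature_vector(tokens, keyword_db=KEYWORD_DB):
    """Count keyword hits per category; one vector component per category."""
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur in the text.
    return [sum(counts[kw] for kw in keywords) for keywords in keyword_db.values()]


vec = feature_vector(["小区", "路灯", "坏", "路灯", "停水"])
print(vec)  # [2, 1, 0]
```

The vector length equals the number of categories rather than the vocabulary size, which is the sense in which a keyword database keeps the representation low-dimensional and dense.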


Abstract

The invention discloses a text data classification and information mining method, which belongs to the technical field of computers. The invention uses a keyword database built from the actual data, so that the dimension of the text feature vector is greatly reduced and the information is concentrated. This overcomes the high-dimensional, sparse nature of text big data processing and facilitates use of the support vector machine algorithm. In addition, a differentiation coefficient of membership degree is defined to select reliable individuals, which enhances the accuracy of text classification. Further, before training the support vector machine, the invention uses two-layer fuzzy classification to obtain initial categories for the training data, so the categories need not be known in advance. For the text data of the 12345 hotline, this application also proposes using swarm intelligence for each category to provide solutions to the finally classified question texts, which can fully mobilize professionals in different fields to give answers.
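The idea of selecting reliable individuals before training a classifier can be sketched as follows. The excerpt does not give the formula for the differentiation coefficient or the details of the two-layer fuzzy classification, so everything here is an illustrative assumption: memberships are taken as normalized per-category keyword counts, the coefficient is taken as the gap between the two largest memberships, and `threshold` is an arbitrary value.

```python
def memberships(counts):
    """Normalize per-category keyword counts into membership degrees."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def differentiation(mu):
    """Hypothetical coefficient: gap between the two largest memberships."""
    top = sorted(mu, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def select_reliable(vectors, threshold=0.5):
    """Keep (vector, provisional_label) pairs with coefficient >= threshold."""
    selected = []
    for vec in vectors:
        mu = memberships(vec)
        if differentiation(mu) >= threshold:
            selected.append((vec, mu.index(max(mu))))
    return selected


samples = [[4, 1, 0], [2, 2, 1], [0, 0, 3]]
print(select_reliable(samples))  # [([4, 1, 0], 0), ([0, 0, 3], 2)]
```

Under these assumptions, the ambiguous sample `[2, 2, 1]` is dropped because its two leading memberships tie, while the clearly separated samples keep a provisional label that could then serve as training data for a support vector machine.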

Description

Technical field

[0001] The invention relates to a text data classification and information mining method, belonging to the technical field of computers.

Background technique

[0002] Text classification is a very important problem in the field of natural language processing. It is widely used in spam filtering, user comment sentiment recognition, user query intent recognition, news classification, etc. The purpose is to better extract the common information contained in the text, discover regular characteristics, and improve the efficiency of further text processing.

[0003] For example, the text data classification of the 12345 mayor hotline, which is closely related to people's daily life, can better summarize a large number of problems reflected by the people through classification, and professionals can give accurate and unified answers to form a knowledge base, avoiding duplication of work by government personnel and improving work efficiency; through classification, diffe...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/2411
Inventor: 鲁玥王玉曲皓张逍玉孔祥智
Owner: JIANGNAN UNIV