Text classification method and device

A text classification technology applied in the computer field. It addresses problems such as traditional statistical analysis tools being unable to handle large-scale data processing, which makes big data text classification infeasible.

Status: Inactive. Publication Date: 2018-02-09
CHINA UNITED NETWORK COMM GRP CO LTD
4 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] With the rapid growth in data volume in recent years and the advent of the big data era, traditional statistical analysis tools and their data processing capabilities can no longer meet the requirements of large-scale data processing, so big data text classification cannot be realized.

Embodiment Construction

[0023] Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.

[0024] First, the terms involved in the present invention are explained:

[0025] R language: R is a free, open-source statistical analysis tool and an efficient programming language. It offers rich statistical models and data analysis methods, but its data processing scalability is poor. Because its core engine is memory-based, the amount of data it can process is very limited, and i...

Abstract

The embodiments of the invention provide a text classification method and device. The method comprises: preprocessing all sample texts in a sample set to obtain the category information and segmented word corpus of each sample text; dividing the sample set into a test set and a training set by summarizing the category information and segmented word corpora of all the sample texts in the sample set; and performing text classification on all the sample texts in the sample set using the test set and the training set. In this way, the embodiments realize classification of big data texts by combining the R language with Hadoop.
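
The first two steps of the abstract (preprocessing, then dividing the sample set) can be sketched as follows. This is a minimal single-machine illustration in Python, not the R and Hadoop implementation the patent describes. The whitespace tokenizer, the `(category, text)` input format, the per-category split, the 20% test ratio, the toy sample texts, and the function names `preprocess` and `split_by_category` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def preprocess(samples):
    """Turn (category, raw_text) pairs into (category, tokens) records.

    Assumption: lowercasing plus whitespace splitting stands in for
    real word segmentation.
    """
    return [(category, text.lower().split()) for category, text in samples]


def split_by_category(records, test_ratio=0.2, seed=0):
    """Group records by category, then split each group into train/test."""
    by_category = defaultdict(list)
    for category, tokens in records:
        by_category[category].append((category, tokens))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for category in sorted(by_category):
        group = by_category[category]
        rng.shuffle(group)
        # Hold out at least one text per category when it has more than one.
        n_test = max(1, int(len(group) * test_ratio)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy sample set: three texts in each of two categories.
samples = [
    ("sports", "The team won the match"),
    ("sports", "The coach praised the team"),
    ("sports", "A late goal decided the match"),
    ("finance", "The stock market fell"),
    ("finance", "The bank raised rates"),
    ("finance", "Investors sold the stock"),
]
records = preprocess(samples)
train, test = split_by_category(records)
print(len(train), len(test))  # 4 2
```

Splitting within each category keeps every category represented in both sets, which is one plausible reading of "summarizing the category information" before the division.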

Description

Technical field

[0001] Embodiments of the present invention relate to the field of computer technologies, and in particular, to a text classification method and apparatus.

Background art

[0002] Text classification uses computers to automatically classify and mark text sets (or other entities or objects) according to a certain classification system or standard. It is a form of automatic classification based on a classification system and is a naive Bayesian classification method.

[0003] Text classification generally includes text expression, classifier selection and training, and evaluation and feedback of classification results. Text expression can be subdivided into text preprocessing, indexing and statistics, feature extraction, and other steps. The overall functional modules of the text classification system are:

(1) Preprocessing: format the original corpus into the same format to facilitate subsequent unified processing;
(2) Indexing: deco...
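
The background above describes the approach as naive Bayesian classification. A minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing is given here in Python for illustration only. The excerpt does not say which naive Bayes variant or smoothing the patent uses, so both are assumptions, as are the class name `NaiveBayes`, the `(category, tokens)` record format, and the toy training data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over tokenized texts (illustrative sketch)."""

    def fit(self, records):
        self.class_counts = Counter()            # texts per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for category, tokens in records:
            self.class_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, tokens):
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for category in sorted(self.class_counts):
            counts = self.word_counts[category]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[category] / self.total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical toy training set of (category, tokens) records.
train = [
    ("sports", ["team", "won", "match"]),
    ("sports", ["coach", "praised", "team"]),
    ("finance", ["stock", "market", "fell"]),
    ("finance", ["bank", "raised", "rates"]),
]
model = NaiveBayes().fit(train)
print(model.predict(["team", "match"]))   # sports
print(model.predict(["stock", "rates"]))  # finance
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, and add-one smoothing keeps a token unseen in one category from forcing that category's probability to zero.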

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/289
Inventor: 许丹丹, 刘静沙, 刘颖慧
Owner: CHINA UNITED NETWORK COMM GRP CO LTD