Quick text classification method for corpus

A text classification technology applied in the field of fast text classification of corpora. It addresses two problems with existing methods: they cannot improve the efficiency and accuracy of corpus classification, and they make in-depth analysis and research of corpora inconvenient. It improves both classification speed and classification accuracy.

Pending Publication Date: 2021-02-05
BOHAI UNIV
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] The object of the present invention is to provide a fast text classification method for a corpus that solves the problem raised in the background art above. Existing fast text classification methods for corpora often classify according to part of speech and cannot classify the corpora in a corpus quickly and accurately. As a result, the efficiency and accuracy of corpus classification cannot be improved, and it is inconvenient for researchers and scholars to conduct in-depth analysis and research on corpora.

Embodiment Construction

[0024] To make the technical means, creative features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to specific embodiments.

[0025] As shown in Figure 1, the fast text classification method for a corpus comprises the following steps:

[0026] (1) Select the existing corpus to be used;

[0027] (2) Extract information data in the corpus, and preprocess the information data;

[0028] (3) Input the preprocessing result into the vector space model;

[0029] (4) Carry out feature word processing;

[0030] (5) Select a classifier for the feature words;

[0031] (6) Evaluate the effect of the classifier;

[0032] (7) Use the classifier to classify the corpus.
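Steps (3) to (7) can be sketched as follows. The method does not specify the term weighting, the feature word processing, the classifier, or the evaluation metric, so each is an assumption here: smoothed TF-IDF weights in the vector space model, keeping the `k` terms with the highest document frequency as feature words, a nearest-centroid classifier using cosine similarity, and accuracy on held-out documents. All names (`select_features`, `CentroidClassifier`, `accuracy`, and so on) are hypothetical, and documents are assumed to arrive as token lists produced by step (2).

```python
import math
from collections import Counter, defaultdict

# A sparse vector: term -> weight.
Vector = dict[str, float]


def select_features(docs: list[list[str]], k: int) -> set[str]:
    """Step (4), assumed: keep the k terms that occur in the most documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, _ in df.most_common(k)}


def fit_idf(docs: list[list[str]], vocab: set[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for each feature word."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocab)
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}


def normalize(vec: dict[str, float]) -> Vector:
    """Scale a vector to unit L2 length (empty vectors stay empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class CentroidClassifier:
    """Step (5), assumed: nearest centroid in the TF-IDF vector space model."""

    def fit(self, docs: list[list[str]], labels: list[str], k: int = 1000):
        self.idf = fit_idf(docs, select_features(docs, k))
        sums: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for doc, label in zip(docs, labels):
            for t, w in self.vectorize(doc).items():
                sums[label][t] += w
        # One unit-length centroid per class.
        self.centroids = {label: normalize(v) for label, v in sums.items()}
        return self

    def vectorize(self, doc: list[str]) -> Vector:
        """Step (3): map a token list to a unit-length TF-IDF vector."""
        tf = Counter(t for t in doc if t in self.idf)
        return normalize({t: c * self.idf[t] for t, c in tf.items()})

    def predict(self, doc: list[str]) -> str:
        """Step (7): assign the class whose centroid is most similar."""
        vec = self.vectorize(doc)
        return max(self.centroids, key=lambda label: dot(vec, self.centroids[label]))


def accuracy(clf: CentroidClassifier, docs: list[list[str]], labels: list[str]) -> float:
    """Step (6), assumed metric: fraction of held-out documents labelled correctly."""
    correct = sum(clf.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)
```

For example, after `clf = CentroidClassifier().fit(train_docs, train_labels)`, calling `accuracy(clf, test_docs, test_labels)` gives a step (6) score. Any other classifier exposing the same `fit`/`predict` shape could be scored the same way, which is one possible way to compare candidates in step (5).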

[0033] The existing corpus in step (1) is specifically a Chinese corpus of the monolingual type.
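Because the corpus is monolingual Chinese, the preprocessing in step (2) has to cope with text that has no spaces between words. The method does not say how this is done; the following is a minimal sketch, assuming character bigrams are used as tokens in place of a word segmenter, with a hypothetical five-character stop list (`STOP_CHARS`). The function name `preprocess` is likewise illustrative.

```python
import re
import unicodedata

# Hypothetical stop characters; a real system would load a curated Chinese stopword list.
STOP_CHARS = set("的了是在和")


def preprocess(text: str, n: int = 2) -> list[str]:
    """Normalize a Chinese document and return character n-gram tokens."""
    # NFKC folds full-width Latin letters and digits to their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Treat stop characters as separators so that no n-gram spans one.
    for ch in STOP_CHARS:
        text = text.replace(ch, " ")
    tokens: list[str] = []
    # Keep only runs of CJK Unified Ideographs (basic block U+4E00 to U+9FFF).
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) < n:
            tokens.append(run)  # a run shorter than n is kept whole
        else:
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens
```

For instance, `preprocess("语料库的文本分类")` splits at 的 and returns `["语料", "料库", "文本", "本分", "分类"]`. A dictionary-based or statistical word segmenter could replace the bigram step; bigrams are used here only because they need no external resources.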

[0034] The information data in ...

Abstract

The invention discloses a quick text classification method for a corpus. The method comprises the following steps: selecting an existing corpus to be used; extracting information data from the corpus and preprocessing the information data; inputting the preprocessing result into a vector space model; carrying out feature word processing; selecting a classifier for the feature words; evaluating the effect of the classifier; and using the classifier to classify the corpus. With this method, the corpora in the corpus can be classified quickly and accurately, which improves the efficiency and accuracy of corpus classification and makes it convenient for researchers and scholars to analyze and study the corpora in depth.

Description

Technical field

[0001] The invention relates to the technical field of corpus text classification, and in particular to a fast text classification method for a corpus.

Background art

[0002] A corpus is a large-scale electronic text library that has been scientifically sampled and processed. Its role is to enable researchers to carry out language theory and application research using computer analysis tools. Since corpora are the basic resources that carry language knowledge, the corpus is one of the main data foundations for researchers and scholars conducting linguistic research, and it stores language material that has actually occurred in real use. The corpus is therefore also one of the important theoretical sources of linguistic research methods. It is mainly used in dictionary compilation, language teaching, and traditional language research. With the continuous development of the times and the continuous improvement of computer technology ...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/35
CPC: G06F16/353
Inventor: Wang Dapeng (王大鹏)
Owner: BOHAI UNIV