Text Classification Methods, Electronic Devices

A text classification and word segmentation technology, applied in text database clustering/classification, unstructured text data retrieval, electronic digital data processing, etc. It addresses the limited coverage and accuracy, low performance ceiling, and time-consuming, labor-intensive nature of earlier approaches.

Active Publication Date: 2021-01-05
SHANGHAI GUAN AN INFORMATION TECH
3 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0002] Text classification is a classic problem in natural language processing. Related research dates back to the 1950s, when texts were classified through expert rules (patterns); by the early 1980s this had developed into using knowledge engineering to build expert systems. The advantage of this approach is that it solves the most pressing problems quickly, but its ceiling is clearly low: it is time-consuming and labor-intensive, and its coverage and accuracy are very limited.

Embodiment Construction

[0053] In implementing this application, the inventors found that existing text classification methods rely on a set of sample data (such as sample texts), also known as the training sample set. Each item in the sample set has a label; that is, the relationship between each item in the sample set and its category is known. When unlabeled data is input, each feature of the new data is compared with the corresponding feature of the data in the sample set, and the classification labels of the most similar items (nearest neighbors) in the sample set are extracted. Generally, only the top k most similar items in the sample set are considered, where k is usually an integer not greater than 20. Finally, the category that occurs most often among those k items is chosen as the category of the new data.
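
The nearest-neighbor procedure in [0053] can be illustrated with a minimal sketch. The source does not specify the features or the similarity measure, so the whitespace tokenization, the word-count vectors, the cosine similarity, and the names `bag_of_words`, `cosine_similarity`, `knn_classify`, and `SAMPLES` are all assumptions made for illustration only.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Word-count vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(new_text, samples, k=3):
    """Label new_text by majority vote among its k most similar samples.

    samples is a list of (text, label) pairs: the labeled training set.
    """
    query = bag_of_words(new_text)
    scored = [
        (cosine_similarity(query, bag_of_words(text)), label)
        for text, label in samples
    ]
    # Most similar samples first; keep only the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in scored[:k]]
    # The label that occurs most often among the k neighbors wins.
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical labeled sample set.
SAMPLES = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell on rate fears", "finance"),
    ("the central bank cut rates", "finance"),
]

print(knn_classify("the bank changed interest rates", SAMPLES, k=3))
```

With these samples, two of the three nearest neighbors carry the label "finance", so the vote returns "finance". Each neighbor counts equally here, no matter how similar it is to the new text.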

[0054] However, when the above method is applied to samples with an unbalanced quantity, the prediction devia...

Abstract

Embodiments of the present application provide a text classification method, an electronic device, and a computer program product. In the scheme of the present application, center texts are obtained from the sample texts; a preset number of center texts are selected as similar center texts according to the similarity between each center text and the text to be classified; a weight is determined for each similar center text; and the category of the text to be classified is determined according to those weights. After the preset number of similar center texts has been selected, the category is not determined simply by counting the similar center texts in each category. Instead, each similar center text is weighted and the category is determined from the weights, which can improve the accuracy of text classification.
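
The difference between counting and weighting can be sketched as follows. This is not the patented implementation: the abstract does not say how center texts are obtained or how weights are computed, so the sketch takes the center texts as given input and assumes, purely for illustration, that the weight of a center text is its cosine similarity to the text being classified. All function names and the `CENTERS` data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def bag_of_words(text):
    """Word-count vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar_centers(text, centers, n):
    """Return the n center texts most similar to text as (similarity, label)."""
    query = bag_of_words(text)
    scored = [
        (cosine_similarity(query, bag_of_words(center)), label)
        for center, label in centers
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def classify_by_count(text, centers, n=3):
    """Baseline: the category with the most selected center texts wins."""
    selected = select_similar_centers(text, centers, n)
    return Counter(label for _, label in selected).most_common(1)[0][0]


def classify_by_weight(text, centers, n=3):
    """Weighted variant: the category with the largest total weight wins.

    Assumption: the weight of a center text is its similarity score.
    """
    totals = defaultdict(float)
    for similarity, label in select_similar_centers(text, centers, n):
        totals[label] += similarity
    return max(totals, key=totals.get)


# Hypothetical center texts: two weak "sports" matches, one strong "finance" match.
CENTERS = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
]

query = "the bank changed interest rates"
print(classify_by_count(query, CENTERS))   # counting favors the larger group
print(classify_by_weight(query, CENTERS))  # weighting favors the closest match
```

In this example counting selects "sports" because two of the three selected center texts belong to it, while weighting selects "finance" because the single finance center text is far more similar to the query than the two sports center texts combined.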

Description

Technical field

[0001] The present application relates to natural language processing technology, and in particular to a text classification method and an electronic device.

Background

[0002] Text classification is a classic problem in natural language processing. Related research dates back to the 1950s, when texts were classified through expert rules (patterns); by the early 1980s this had developed into using knowledge engineering to build expert systems. The advantage of this approach is that it solves the most pressing problems quickly, but its ceiling is clearly low: it is time-consuming and labor-intensive, and its coverage and accuracy are very limited.

[0003] With the development of statistical learning methods, and especially with the growth in the volume of online text and the rise of machine learning after the 1990s, a set of classic methods for solving large-scale text classification problems has gra...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289
CPC: G06F40/289
Inventor: 唐海龙, 张岩, 杨柳, 方蒙
Owner: SHANGHAI GUAN AN INFORMATION TECH