
A text classification method and equipment based on K-nearest neighbor (KNN)

A text classification technology based on nearest neighbors, applied in the field of artificial intelligence. It addresses the insufficiently stable calculation performance of the table matching algorithm, and its effects include reducing the amount of calculation, improving accuracy, and alleviating the sparse distribution of text feature information.

Active Publication Date: 2021-05-11
DEEPBLUE TECH (SHANGHAI) CO LTD
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] However, in this method, the calculation performance of the table matching algorithm is not stable enough because of noise.



Examples


Embodiment 1

[0101] The traditional KNN text classification method encodes the text into a numerical vector and then inputs that vector into the KNN model for classification. However, encoding text into a numerical vector produces vectors of large dimension in which the text feature information is sparsely distributed. As a result, when KNN classifies text encoded as numerical vectors, the accuracy of the classification results is low. Therefore, this embodiment of the present invention provides a KNN-based text classification method that encodes the text into a character string vector and inputs it into the KNN model, so that the text classification result can be obtained quickly and effectively.
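To make the dimension and sparsity problem concrete, the toy Python sketch below contrasts a bag-of-words numerical vector with a string vector that simply keeps the feature words. The vocabulary, document, and function names are hypothetical, and representing a string vector as a plain list of words is an assumption; the patent excerpt does not give its exact format. Real vocabularies contain thousands of terms, so the proportion of zeros is far higher than in this example.

```python
from collections import Counter

def bag_of_words_vector(words, vocabulary):
    """Numerical encoding: one count per vocabulary term, mostly zeros."""
    counts = Counter(words)
    # Words outside the vocabulary (here "scored") are silently dropped.
    return [counts[term] for term in vocabulary]

def string_vector(feature_words):
    """String encoding (assumed form): keep only the feature words themselves."""
    return list(feature_words)

# Hypothetical toy vocabulary and document, for illustration only.
vocabulary = ["ball", "bank", "goal", "loan", "match", "money", "player", "rate"]
document = ["player", "scored", "goal", "match", "goal"]

numeric = bag_of_words_vector(document, vocabulary)
print(numeric)                    # [0, 0, 2, 0, 1, 0, 1, 0]
print(numeric.count(0), "of", len(numeric), "entries are zero")  # 5 of 8 entries are zero

strings = string_vector(["goal", "player", "match"])
print(strings)                    # ['goal', 'player', 'match']
```

The numerical vector's length is fixed by the vocabulary, whereas the string vector's length depends only on how many feature words are kept.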

[0102] As shown in Figure 1, the specific implementation steps are as follows:

[0103] Step 10: decomposing the text into words, and extracting words representing feature information of the text from the words;

[0104] The above-mentioned words...
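Step 10 can be sketched in Python as follows. The excerpt is cut off before it states how the feature words are chosen, so the stop-word filter, the ranking by term frequency, the parameter k, and the regex tokenizer are all assumptions for illustration. Chinese text, which has no spaces between words, would need a dedicated word segmenter instead of this English-style tokenizer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "was"}

def decompose(text):
    """Split text into lowercase alphanumeric tokens (English-style tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_feature_words(words, k=3):
    """Keep the k most frequent non-stop-words as the text's feature words.

    Ranking by term frequency is an assumed criterion, not the patent's.
    """
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]

words = decompose("The striker scored a goal, and the goal won the match.")
features = extract_feature_words(words)
print(features)  # "goal" ranks first because it occurs twice
```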

Embodiment 2

[0159] Based on the same inventive concept, this embodiment of the present invention also provides a text classification device based on K-nearest neighbor (KNN). As shown in Figure 3, the device includes a processor 30 and a memory 31. The memory stores program code, and when the program code is executed by the processor, the processor 30 is configured to perform the following:

[0160] decomposing the text into words, and extracting words representing feature information of the text from the words;

[0161] encoding the text into a character string vector using the extracted words;

[0162] using the KNN model to calculate the similarity between the character string vector and the sample character string vectors in the KNN model, and determining and outputting the classification label of the character string vector according to the similarity and the classification labels corresponding to the sample character string vectors.

[0163] As an optional implementation manner, the processor 30 is specifically configu...
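The encoding and classification steps of Embodiment 2 can be illustrated with a small Python sketch. The excerpt does not say how two character string vectors are compared or how the neighbors' labels are combined, so the Jaccard word-set similarity, the majority vote over the k most similar samples, the function names, and the sample data are all assumptions rather than the patent's own method.

```python
from collections import Counter

def encode(feature_words):
    """Encode a text as a string vector: an ordered tuple of its feature words."""
    return tuple(feature_words)

def similarity(vec_a, vec_b):
    """Jaccard similarity of the two vectors' word sets (an assumed measure)."""
    a, b = set(vec_a), set(vec_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def knn_classify(query, samples, k=3):
    """Return the majority label among the k most similar sample vectors.

    samples: list of (string_vector, label) pairs.
    """
    ranked = sorted(samples, key=lambda s: similarity(query, s[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled sample string vectors.
samples = [
    (encode(["goal", "match", "player"]), "sports"),
    (encode(["goal", "striker", "league"]), "sports"),
    (encode(["loan", "bank", "rate"]), "finance"),
    (encode(["bank", "money", "interest"]), "finance"),
]

query = encode(["goal", "striker", "scored"])
print(knn_classify(query, samples))  # sports
```

Here the two nearest samples (similarities 0.5 and 0.2) are both labelled "sports", which outvotes the third neighbor. A weighted vote that sums similarities per label would be an equally plausible reading of "according to the similarity and the classification label".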

Embodiment 3

[0188] The present invention provides another text classification device based on K-nearest neighbor (KNN). As shown in Figure 4, the device includes a decomposition module 40, an encoding module 41, and a classification module 42, wherein:

[0189] A decomposition module 40, configured to decompose the text into words and extract, from those words, the words representing the feature information of the text;

[0190] An encoding module 41, configured to encode the text into a character string vector using the extracted words;

[0191] A classification module 42, configured to use the KNN model to calculate the similarity between the character string vector and the sample character string vectors in the KNN model, and to determine and output the classification label of the character string vector according to the similarity and the classification labels corresponding to the sample character string vectors.

[0192] As an optional implementation manner, the classification module 42 is specifically used for:...
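Embodiment 3 describes the same method as three cooperating modules. The Python sketch below shows one way to mirror that decomposition; the class and method names, the whitespace tokenizer, the frequency-based feature selection, and the shared-word-count similarity are hypothetical simplifications, not details taken from the patent.

```python
from collections import Counter

class DecompositionModule:
    """Module 40: split text into words and keep the k most frequent as features."""
    def __init__(self, k=3):
        self.k = k

    def run(self, text):
        counts = Counter(text.lower().split())
        return [word for word, _ in counts.most_common(self.k)]

class EncodingModule:
    """Module 41: encode the feature words as a string vector (a tuple)."""
    def run(self, feature_words):
        return tuple(feature_words)

class ClassificationModule:
    """Module 42: KNN vote using shared-word counts as the similarity."""
    def __init__(self, samples, k=1):
        self.samples = samples  # list of (string_vector, label) pairs
        self.k = k

    def run(self, vector):
        query = set(vector)
        ranked = sorted(self.samples,
                        key=lambda s: len(query & set(s[0])),
                        reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

class TextClassificationDevice:
    """Wires modules 40, 41 and 42 into one pipeline."""
    def __init__(self, decomposition, encoding, classification):
        self.decomposition = decomposition
        self.encoding = encoding
        self.classification = classification

    def classify(self, text):
        feature_words = self.decomposition.run(text)
        vector = self.encoding.run(feature_words)
        return self.classification.run(vector)

# Hypothetical labelled samples.
samples = [(("goal", "match"), "sports"), (("bank", "loan"), "finance")]
device = TextClassificationDevice(DecompositionModule(),
                                  EncodingModule(),
                                  ClassificationModule(samples))
print(device.classify("goal goal match won"))  # sports
```

Keeping the modules separate means one could, for example, swap module 40 for a word segmenter suited to Chinese text without touching the encoding or classification logic.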



Abstract

The invention discloses a text classification method and equipment based on K-nearest neighbor (KNN), intended to reduce the computational load of text classification, represent text feature information more effectively, and improve the accuracy of text classification. The method includes: decomposing the text into words and extracting, from those words, the words representing the feature information of the text; encoding the text into a character string vector using the extracted words; using the KNN model to calculate the similarity between the character string vector and the sample string vectors in the KNN model; and determining and outputting the classification label of the string vector according to the similarity and the classification labels corresponding to the sample string vectors.

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, and in particular to a KNN (k-nearest neighbor)-based text classification method and equipment.

Background art

[0002] Text classification automatically classifies and labels a collection of texts according to a given classification system or standard; it is a form of automatic classification based on a classification system. The process can be understood as matching the data to be classified against sample data according to certain characteristics of the data to be classified. Generally, there are two ways to extract and classify text data, as follows:

[0003] One is to encode the feature information in the text data into a numerical vector, calculate the similarity between this numerical vector and the sample numerical vectors, and determine the classification result of the text data corresponding to the numerical vector according ...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06K9/62; G06F16/35
CPC: G06F18/24143; G06F18/214
Inventor: 陈海波 (Chen Haibo)
Owner: DEEPBLUE TECH (SHANGHAI) CO LTD