Improvement-based KNN (K Nearest Neighbor) text classification method

A text classification technology applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc.

Active Publication Date: 2015-03-11

CHINA TECHENERGY +1

2 Cites, 22 Cited by

AI Technical Summary
Problems solved by technology

[0004] To address the problem that existing text classification algorithms cannot achieve both accuracy and speed, the present invention proposes an improved text classification algorithm based on the KNN algorithm, and uses text classification technology to solve the rule matching and failure mode matching problems in software reliability evaluation and analysis.

Embodiment Construction

[0056] The invention proposes an improved KNN-based text classification algorithm, which is applied in the review process of software requirements and design documents (especially software reliability review). The algorithm first preprocesses the training text and builds a feature vector space model. Preprocessing includes word segmentation (this algorithm uses a general word segmentation method that combines statistical word segmentation with dictionary-based word segmentation), stop-word removal (stop words are words that occur with high frequency in the file set but contribute little or nothing to the classification task; function words such as adverbs, pronouns, articles, prepositions, and conjunctions, which carry no actual semantics, belong to this category), and feature word extraction (the purpose is to select words that are helpful for classification and to reduce the dimension, using the chi-square test method, see...
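The stop-word removal and chi-square feature extraction steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the whitespace tokenizer stands in for the combined statistical/dictionary word segmentation, and `STOP_WORDS`, `tokenize`, `chi_square`, and `select_features` are hypothetical names introduced here. Scoring each term by its maximum chi-square value over all classes is also an assumption, since the excerpt is truncated before it gives the exact formula.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}


def tokenize(text):
    """Whitespace tokenizer standing in for the patent's word segmentation."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def chi_square(docs, labels, term, cls):
    """Chi-square statistic of `term` with respect to class `cls`.

    a: docs in cls containing term      b: docs outside cls containing term
    c: docs in cls without term         d: docs outside cls without term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if has_term and label == cls:
            a += 1
        elif has_term:
            b += 1
        elif label == cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(texts, labels, k):
    """Keep the k terms with the highest chi-square score (assumed: max over classes)."""
    docs = [set(tokenize(t)) for t in texts]
    vocab = set().union(*docs)
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]
```

Terms that appear in every document of one class and in no document of another receive the highest score, so they are kept first when the dimension is reduced.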


Abstract

The invention provides an improvement-based KNN (K Nearest Neighbor) text classification method. The method comprises the following steps: preprocessing the training text, computing the feature vector of each training sample, and constructing a feature vector space model of the training set; defining a density and a distance, dividing the whole sample space into a plurality of spherical regions and outliers according to type, and storing these as a training set library; during testing, judging whether a text to be tested falls into one of the spherical regions and, if so, determining the type of the text from the corresponding mark number; otherwise, using the outliers and the center point of each sphere as the training set library, calling the KNN algorithm, and determining the type of the text to be tested. The method balances classification speed, classification accuracy, and sensitivity to data skew. It can be applied well to classification problems with non-spherical distributions, and is particularly suitable for text classification problems with high-dimensional feature vectors and irregular distributions.
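The two-stage decision described in the abstract (sphere lookup first, KNN over a reduced library otherwise) can be sketched as below. This is a sketch under stated assumptions, not the patented procedure: the excerpt does not give the patent's density and distance definitions, so the code assumes one sphere per class, centered at the class mean, with a radius that stops short of the nearest sample of any other class. The names `build_library` and `classify` are hypothetical.

```python
import math
from collections import Counter


def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_library(points, labels):
    """Return (spheres, reduced).

    spheres: list of (center, radius, label), one per class (assumption).
    reduced: sphere centers plus outliers, used as the KNN training library.
    """
    spheres, reduced = [], []
    for cls in sorted(set(labels)):
        own = [p for p, l in zip(points, labels) if l == cls]
        other = [p for p, l in zip(points, labels) if l != cls]
        center = tuple(sum(xs) / len(own) for xs in zip(*own))
        # Assumed rule: the sphere must not reach any sample of another class.
        limit = min((dist(center, q) for q in other), default=math.inf)
        inside = [p for p in own if dist(center, p) < limit]
        radius = max((dist(center, p) for p in inside), default=0.0)
        spheres.append((center, radius, cls))
        reduced.append((center, cls))
        # Samples of this class left outside the sphere are kept as outliers.
        reduced.extend((p, cls) for p in own if dist(center, p) > radius)
    return spheres, reduced


def classify(x, spheres, reduced, k=3):
    """Sphere lookup first; fall back to KNN over centers and outliers."""
    for center, radius, cls in spheres:
        if dist(x, center) <= radius:
            return cls
    nearest = sorted(reduced, key=lambda pl: dist(x, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A test sample that lands inside a sphere is labeled with a handful of distance computations, and only the remaining samples pay for a KNN search, which itself runs over centers and outliers rather than the full training set.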

Description

Technical field

[0001] The invention belongs to the technical field of verification and validation of nuclear safety level software, and in particular relates to an improved KNN-based text classification method.

Background art

[0002] In recent years, with the gradual standardization of the software development process, the quality requirements for technical documents have become higher and higher. In particular, the development of nuclear safety level software generates a large number of technical documents, such as requirements documents and design documents. According to the relevant nuclear power standards, each requirement item and design item must satisfy certain evaluation rules; and as the software is developed iteratively, the technical documents are constantly updated and upgraded, so each item (such as a requirement item or a design item) in each version of the document must also meet the requ...

Application Information

IPC(8): G06F17/30

CPC: G06F16/35

Inventor: 冯素梅, 赵云飞, 刘建龙, 张亚栋, 刘邦信, 周小波, 程建明

Owner: CHINA TECHENERGY
