
KNN-based text classification method

A KNN-based text classification technique, applied to nuclear safety-level software verification and reliability verification, that addresses the loss of important title information, reduces storage requirements and online computation, lowers feature dimensionality, and improves classification accuracy.

Active Publication Date: 2016-09-28
CHINA TECHENERGY +1

AI Technical Summary

Problems solved by technology

[0005] In addition, technical documents in the nuclear power field must be compiled in conformance with standard specifications, and titles of the same kind in particular are highly generalized and similar to one another. A large amount of the important information that titles bring to classification is lost.



Examples


Embodiment Construction

[0019] The KNN-based text classification method of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0020] In the KNN-based text classification method proposed by the present invention, the training sample data set is represented by two kinds of information: the original text itself and all titles in the text. For all texts and all titles, two DBM models are constructed according to the shallow-to-deep feature hierarchy of the text, and low-dimensional, highly discriminative deep features are extracted and stored. During testing, the contribution of the text titles to the similarity calculation is taken into account with an appropriate weight to determine the category of the text to be tested. This method makes full use of the information in the text titles, significantly improves classification performance compared with using shallow feature vectors as the training set, and can reduce ...
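The test-time step described above can be illustrated with a minimal sketch. It assumes that the deep features for the text body and the titles have already been extracted and stored as plain numeric vectors; the DBM feature extraction itself is not shown. The linear mixing rule (1 - w) * sim_body + w * sim_title, the cosine similarity measure, the majority vote, and the names `cosine`, `knn_classify`, `title_weight` and `TRAINING_SET` are all assumptions made for illustration, since the source does not state the exact formula or weight.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query_body, query_title, training_set, k=3, title_weight=0.3):
    """Majority vote over the k training samples most similar to the query.

    training_set: list of (body_features, title_features, label) tuples.
    title_weight: assumed mixing weight in [0, 1] for the title similarity.
    """
    scored = []
    for body, title, label in training_set:
        # Assumed combination rule: weighted sum of body and title similarity.
        sim = ((1.0 - title_weight) * cosine(query_body, body)
               + title_weight * cosine(query_title, title))
        scored.append((sim, label))
    # Highest combined similarity first; keep the k nearest neighbours.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up vectors standing in for stored deep features.
TRAINING_SET = [
    ([0.9, 0.1, 0.0], [1.0, 0.0], "requirement"),
    ([0.8, 0.2, 0.1], [0.9, 0.1], "requirement"),
    ([0.1, 0.9, 0.2], [0.0, 1.0], "design"),
    ([0.2, 0.8, 0.1], [0.1, 0.9], "design"),
]

predicted = knn_classify([0.85, 0.15, 0.05], [0.95, 0.05], TRAINING_SET, k=3)
```

Setting `title_weight` to 0 reduces this to ordinary KNN on the body features alone, so the parameter makes the contribution of the titles explicit and tunable.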



Abstract

The invention discloses a KNN-based text classification method suitable for nuclear safety software verification and reliability verification. The method comprises a training-process step and a test-process step. The training sample data sets are represented by the information of the original texts and of all the titles in the texts. Two DBM models are constructed according to the shallow-to-deep hierarchical feature structure of the texts, and deep features with low dimensionality and high discrimination are extracted and stored. In the test process, the categories of the texts to be tested are determined by taking into account, with a proper weight, the contribution of the text titles to the similarity calculation. The method makes full use of the information in the text titles and remarkably improves classification performance relative to using shallow feature vectors as training sets, while reducing both the storage demand and the amount of online calculation. The curse of dimensionality caused by high-dimensional feature vectors is thereby addressed and classification accuracy is improved. The method can be used for rule matching in safety-level software reliability evaluation and analysis and for establishing a failure mode library.

Description

Technical field

[0001] The invention relates to a KNN-based text classification method, in particular one for nuclear safety-level software verification and reliability verification.

Background

[0002] In the development of nuclear safety software, a large number of technical documents are generated, and as the software is developed iteratively these documents are constantly updated. Each item (such as a requirement or design item) in each version of a document must meet the requirements of the evaluation rules set out in the nuclear power-related software standards. Quickly and accurately determining the relationship between items and rules is therefore an urgent problem for quality personnel. In addition, in the entire life cycle of software product development, in order to detect potential failures early, iteratively perform failure mode and effect analysis (FMEA) at each stage, establish a failure mode library and be able to aut...


Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/35, G06F18/24147
Inventor: 冯素梅, 赵云飞, 张亚栋, 江国进, 白涛, 王晓燕, 宁祾, 程建明
Owner: CHINA TECHENERGY