Method for improving vector distance classifying quality

A vector distance classification technique, applied to improving the quality of automatic classification. It addresses the problem that classification quality and classification techniques still need further improvement.

Status: Inactive; Publication Date: 2014-06-04
DALIAN LINGDONG TECH DEV

AI Technical Summary

Problems solved by technology

[0006] Although these algorithms have achieved good classification results in some fields or in some specif

Embodiment Construction

[0034] The present invention is further described below with reference to the drawings. Figure 1 is a schematic diagram of the professional dictionaries constructed in the VSM-based feature weighting process. Figure 2 is a schematic diagram of an adaptive system based on the professional dictionaries. Figure 3 is a schematic diagram of an adaptive system based on the training corpus. The experimental process is as follows:

[0035] A. Feature weighting based on VSM

[0036] A1. Word frequency weighting of feature items based on word meaning

[0037] The present invention establishes three dictionaries: a professional main dictionary, a professional synonym dictionary, and a professional connotative word dictionary, which are used for entry segmentation and word frequency statistics. The entries of the professional main dictionary are required to be as independent as possible in meaning.

[0038] When performing word frequency statistics, feature extr...
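The dictionary-based word frequency statistics described in A1 might be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary entries, the function name `weighted_term_frequency`, and the `connotative_weight` parameter are all hypothetical, and the sketch assumes that synonyms and connotative words are folded into the count of the main-dictionary entry they point to. It also assumes the text has already been segmented into tokens.

```python
from collections import Counter

# Hypothetical toy dictionaries; the source does not list actual entries.
MAIN_DICT = {"classifier", "vector", "corpus"}
# Assumed mapping: synonym -> main-dictionary entry.
SYNONYM_DICT = {"categorizer": "classifier"}
# Assumed mapping: connotative (implied) word -> main-dictionary entry.
CONNOTATIVE_DICT = {"centroid": "vector"}


def weighted_term_frequency(tokens, connotative_weight=0.5):
    """Count main-dictionary entries, folding in synonyms and connotative words.

    connotative_weight is an illustrative assumption: a connotative word
    contributes only part of a full occurrence to its main entry.
    """
    counts = Counter()
    for tok in tokens:
        if tok in MAIN_DICT:
            counts[tok] += 1.0
        elif tok in SYNONYM_DICT:
            counts[SYNONYM_DICT[tok]] += 1.0
        elif tok in CONNOTATIVE_DICT:
            counts[CONNOTATIVE_DICT[tok]] += connotative_weight
        # Tokens found in none of the dictionaries are ignored.
    return dict(counts)


tokens = ["classifier", "categorizer", "centroid", "vector", "banana"]
print(weighted_term_frequency(tokens))
```

Folding synonyms into a single entry keeps the feature items closer to mutually independent, which is the requirement stated for the professional main dictionary.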

Abstract

The invention discloses a method for improving the quality of vector distance classification. The method comprises the following steps: feature weighting based on the vector space model (VSM), which includes word-frequency weighting of feature items based on word meaning and frequency weighting of feature items based on document structure; stem extraction for English search terms; analysis of the user query log; and correction and expansion of the training corpus. Constructing a segmentation dictionary overcomes the main defect of VSM, which arises from the conflict between the requirement that feature entries be mutually independent and the diversity of natural language. Stemming simplifies the classification algorithm while improving classification quality and algorithm efficiency. Analyzing the user query log identifies the queries users are most interested in, which are then used to guide and amend the terminology dictionary. Correcting and expanding the training corpus lets the corpus change dynamically as the various specialized technologies advance, and guides a Robot program to collect the latest specialized technical data.

Description

Technical field

[0001] The invention relates to a technology for improving the quality of automatic classification, in particular to a method for improving the quality of vector distance classification.

Background technique

[0002] The key problem of automatic document classification is how to construct a classification function or classification model (also called a classifier), and use this classification model to map unknown documents to a given category space. There are many construction algorithms for classifiers, mainly including probability and statistics algorithms, machine learning algorithms, and neural network algorithms. The probability and statistics algorithm uses a relatively simple mechanism. It has achieved satisfactory results in processing large-scale real documents.

[0003] The idea of the simple vector distance algorithm is very simple. According to the arithmetic average, it generates a center vector for each type of document set through sample training;...
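The center-vector idea in [0003] can be illustrated with a short sketch. The source text is cut off after the training step, so the classification step here is an assumption: the sketch assigns a new document to the class whose center vector has the highest cosine similarity, which is a common choice for this kind of classifier. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Arithmetic average of equal-length document vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Build one center vector per class from labeled training vectors."""
    return {label: centroid(vecs) for label, vecs in samples.items()}


def classify(centers, doc):
    """Assumed decision rule: pick the class with the most similar center."""
    return max(centers, key=lambda label: cosine(centers[label], doc))


# Toy training set: two classes, two feature dimensions.
samples = {
    "sports": [[1, 0], [3, 0]],
    "finance": [[0, 2], [0, 4]],
}
centers = train(samples)
print(centers)
print(classify(centers, [5, 1]))
```

Because each class is reduced to a single averaged vector, classification costs one similarity computation per class, which is part of why the quality of the feature weights matters so much for this algorithm.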

Application Information

IPC(8): G06F17/30
CPC: G06F16/951, G06F40/279
Inventor: 李聪慧, 王秀坤
Owner: DALIAN LINGDONG TECH DEV