Document classification method and apparatus

A document classification method and apparatus, applied in the computer field, that address the difficulty of adjusting fixed, inflexible classification methods and make document classification more flexible.

Status: Inactive
Publication Date: 2016-06-01
INSPUR QILU SOFTWARE IND
Cites: 2; Cited by: 13

AI Technical Summary

Problems solved by technology

[0003] In the prior art, webpage text data is mainly processed with a preset, fixed classification method that is difficult to adjust to users' needs. For example, when the accuracy of the classification results falls short of what a user needs, the user cannot readily adjust the classification method to reach the required accuracy. The classification methods in the prior art are therefore not flexible enough.


Examples

Embodiment Construction

[0055] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative work fall within the scope of protection of the present invention.

[0056] As shown in Figure 1, an embodiment of the present invention provides a document classification method, which may include the following steps:

[0057] S1: Obtain multiple training documents, and determine the category corresponding to each training document;

[0058] S2: According to the training documents corresponding to each cate...


Abstract

The invention provides a document classification method and apparatus. The method comprises: obtaining a plurality of training documents and determining the category corresponding to each training document; determining, from the training documents of each category, a feature vector for that category, where the feature vector comprises the word strings that occur in the category and the occurrence probability of each word string in the category; obtaining a current document to be classified and extracting from it a matching feature vector, which comprises the word strings to be matched that occur in that document; determining the similarity between the matching feature vector and the feature vector of each category from the word strings to be matched and their occurrence probabilities in each category's feature vector; and taking the category whose feature vector has the highest similarity as the category of the current document. The method and apparatus allow documents to be classified more flexibly.
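
The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text available here does not say how a word string is extracted or how similarity is computed, so the sketch assumes a word string is a lowercase whitespace-delimited token and that similarity is the sum of the occurrence probabilities of the matched word strings. The function names (`word_strings`, `train`, `similarity`, `classify`) and the sample documents are hypothetical.

```python
from collections import Counter


def word_strings(text):
    # Assumption: a "word string" is a lowercase whitespace-delimited token.
    return text.lower().split()


def train(training_docs):
    """Build one feature vector per category from (text, category) pairs.

    Each feature vector maps a word string to its occurrence probability
    within the training documents of that category.
    """
    counts = {}
    for text, category in training_docs:
        counts.setdefault(category, Counter()).update(word_strings(text))
    vectors = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        vectors[category] = {w: n / total for w, n in counter.items()}
    return vectors


def similarity(matched, vector):
    # Assumption: similarity is the sum of the category's occurrence
    # probabilities over the word strings found in the document.
    return sum(vector.get(w, 0.0) for w in matched)


def classify(text, vectors):
    """Return the category whose feature vector is most similar."""
    matched = set(word_strings(text))  # the matching feature vector
    return max(vectors, key=lambda c: similarity(matched, vectors[c]))


# Hypothetical training documents with known categories.
docs = [
    ("stock market shares rise", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("football player scores goal", "sports"),
]
vectors = train(docs)
print(classify("football match tonight", vectors))
```

In this sketch the per-category feature vectors are derived entirely from the training documents, so supplying a different training set to `train` changes how later documents are classified.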

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for classifying documents.

Background technique

[0002] With the continuous development of technology, natural language processing has received unprecedented attention, made great progress, and grown into a relatively independent discipline. Now, as popular concepts and technologies such as Internet + and big data attract attention, various industries are trying to make full use of webpage text data on the Internet, and natural language processing is the main tool for processing, analyzing, and using that text.

[0003] In the prior art, the processing of webpage text data is mainly based on a preset and fixed classification method, which is difficult to adjust according to user requirements. For example, the accuracy rate of the classification resul...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F16/35; G06F18/2415
Inventor: 唐旋, 毛立花, 王传超
Owner: INSPUR QILU SOFTWARE IND