Tax document hierarchical classification method based on multi-label classification

Hierarchical classification and multi-label technology, applied in special data processing applications, instruments, and electrical digital data processing, that addresses problems such as information overload.
CN104199857A · Active · Publication Date: 2014-12-10 · XI AN JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
XI AN JIAOTONG UNIV
Publication Date
2014-12-10


Abstract

A tax document hierarchical classification method based on multi-label classification is provided. First, the topic distribution generated by a latent Dirichlet allocation (LDA) model is extracted and used to build LDA topic features for each tax document. Next, tf-idf feature vectors are built for the training data and the documents to be classified, and the similarity between them is computed to obtain candidate category labels. Finally, the source data of the candidate category label nodes are supplemented with auxiliary data, a transfer-learning-based multi-label classification model is built with the transfer learning algorithm TrAdaBoost, and the documents to be classified are classified. The method splits the hierarchical classification problem into a search stage and a classification stage. Incremental candidate category search greatly reduces the amount of computation and lowers the computational complexity, while the transfer-learning-based multi-label classification model maps tax documents onto the tax category hierarchy, makes effective use of the auxiliary data, and improves classification performance.
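The search stage described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes pre-tokenized documents, a smoothed idf formula, cosine similarity as the similarity measure, and a simple top-k cut-off, none of which the abstract specifies. The function names, the toy documents, and the category labels are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption; the source does not give a formula).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_labels(train_docs, train_labels, query, k=2):
    """Return the labels of the k training documents most similar to the query."""
    # tf-idf is computed over the training data plus the document to classify.
    vectors = tfidf_vectors(train_docs + [query])
    query_vec = vectors[-1]
    scored = sorted(
        zip((cosine(v, query_vec) for v in vectors[:-1]), train_labels),
        key=lambda pair: pair[0],
        reverse=True,
    )
    candidates = []
    for _, labels in scored[:k]:
        for label in labels:
            if label not in candidates:
                candidates.append(label)
    return candidates


# Hypothetical toy data: tokenized tax documents with hierarchical labels.
train_docs = [
    ["value", "added", "tax", "invoice"],
    ["income", "tax", "deduction", "salary"],
    ["customs", "duty", "import", "tariff"],
]
train_labels = [
    ["turnover tax/value added tax"],
    ["income tax/individual income tax"],
    ["customs/tariff"],
]
query = ["invoice", "value", "added", "rate"]
top = candidate_labels(train_docs, train_labels, query, k=1)
```

Restricting the later classification stage to the returned candidate labels is what keeps the number of classifiers to evaluate small. In the method as described, that stage would be a TrAdaBoost-based multi-label model trained on the candidate nodes' source data plus auxiliary data, which this sketch does not cover.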

Description

technical field

[0001] The invention belongs to the field of data mining, and in particular relates to a tax document hierarchical classification method based on multi-label classification.

Background technique

[0002] With the rapid development of the Internet, resources of all kinds have grown exponentially, and a large number of tax documents have appeared online, causing information overload for people trying to obtain them. Organizing and managing tax documents effectively is the key to solving this information overload in obtaining tax resources, and it is a task of great significance to taxation.

[0003] In order to effectively organize and manage the massive tax documents on the Internet, tax documents are usually classified according to a subject category hierarchy or large-scale concept to better access and search these tax documents. Tax classification is the classification of various taxes according to certain standards. A...
