Tax document hierarchical classification method based on multi-tag classification

Hierarchical classification with multi-label techniques, applied in special data processing applications, instruments, electrical digital data processing, etc., can address problems such as information overload.

Active Publication Date: 2014-12-10
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a tax document hierarchical classification method based on multi-label classification,



Embodiment Construction

[0054] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0055] Tax documents are materials and articles that describe, analyze and research taxation. Tax category classification refers to the system of tax categories formed by classifying the various taxes according to certain standards.

[0056] The tax document hierarchical classification method based on multi-label classification provided by the present invention includes the following three processes:

[0057] 1) Tax document subject feature construction, including 2 steps:

[0058] 1-1) Perform denoising preprocessing on the tax documents to be classified: convert all documents to plain text, clean the data, delete documents garbled by the conversion, remove duplicate documents, and remove the document title, author and other metadata information to obtain the document to be clas...
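The denoising step in [0058] can be illustrated with a minimal Python sketch. It is not the patented implementation: the garbled-text threshold `GARBLED_RATIO`, the `Title:`/`Author:` line layout, and the function names are all assumptions made for illustration, and the documents are assumed to be already converted to plain text.

```python
from __future__ import annotations

import hashlib
import re

# Hypothetical threshold: documents with a larger share of debris
# characters are treated as garbled by the text conversion.
GARBLED_RATIO = 0.1

# Assumed metadata layout: lines such as "Title: ..." or "Author: ...".
METADATA_LINE = re.compile(r"^\s*(title|author|date)\s*:", re.IGNORECASE)


def garbled_ratio(text: str) -> float:
    """Share of replacement or control characters in the text."""
    if not text:
        return 1.0
    bad = sum(
        1
        for ch in text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\r\t")
    )
    return bad / len(text)


def strip_metadata(text: str) -> str:
    """Drop title/author style lines and keep the document body."""
    kept = [ln for ln in text.splitlines() if not METADATA_LINE.match(ln)]
    return "\n".join(kept).strip()


def preprocess(docs: list[str]) -> list[str]:
    """Remove garbled documents, metadata lines, and duplicate bodies."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in docs:
        if garbled_ratio(raw) > GARBLED_RATIO:
            continue  # garbled by conversion
        body = strip_metadata(raw)
        # Normalize case and whitespace so near-identical copies collide.
        normalized = " ".join(body.split()).lower()
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if not body or key in seen:
            continue  # empty or duplicate
        seen.add(key)
        cleaned.append(body)
    return cleaned
```

Hashing the normalized body is one simple way to detect exact duplicates; near-duplicate detection (for example shingling) would need a different technique and is not shown here.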


Abstract

Provided is a tax document hierarchical classification method based on multi-label classification. First, the topic distribution generated by a latent Dirichlet allocation (LDA) model is extracted to build LDA topic features for the tax documents. Then, TF-IDF feature vectors are built for the training data, TF-IDF feature vectors covering both the training data and the documents to be classified are computed, and similarity is calculated to obtain candidate category labels. Finally, the source data of the candidate category label nodes are supplemented with auxiliary data, a transfer-learning-based multi-label classification model is built with the TrAdaBoost transfer learning algorithm, and the documents to be classified are classified. The method splits the hierarchical classification problem into a search stage and a classification stage. Incremental candidate category search greatly reduces the amount of calculation and lowers computational complexity; the transfer-learning-based multi-label classification model maps tax documents to the tax category hierarchy, makes effective use of the auxiliary data, and improves classification performance.
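The search stage summarized in the abstract (TF-IDF vectors plus a similarity measure yielding candidate category labels) might look roughly like the following Python sketch. The abstract does not give the exact formulas, so several choices here are assumptions: the smoothed IDF variant, cosine similarity as the similarity measure, one summed centroid vector per category, and a fixed top-`k` cutoff. The incremental aspect of the search and the later TrAdaBoost classification stage are not shown.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def tfidf_vectors(token_docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed idf = log((1+N)/(1+df)) + 1."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in token_docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_categories(
    train_docs: list[list[str]],
    train_labels: list[list[str]],
    query_doc: list[str],
    k: int = 2,
) -> list[str]:
    """Return the k category labels most similar to the query document."""
    # Vectors are computed over the training data plus the query document.
    vectors = tfidf_vectors(train_docs + [query_doc])
    query_vec = vectors[-1]
    # Assumption: each category is represented by the sum of its member
    # vectors (cosine similarity ignores the scale of the centroid).
    centroids: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for vec, labels in zip(vectors, train_labels):
        for label in labels:  # a document may carry several labels
            for term, weight in vec.items():
                centroids[label][term] += weight
    ranked = sorted(
        centroids,
        key=lambda label: cosine(query_vec, centroids[label]),
        reverse=True,
    )
    return ranked[:k]
```

Restricting the second stage to a handful of candidate labels is what keeps the classifier from having to consider every node in the category hierarchy for every document.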

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to a tax document hierarchical classification method based on multi-label classification.

Background art

[0002] With the rapid development of the Internet, online resources have grown exponentially, and a large number of tax documents have appeared on the Internet, causing information overload for people trying to obtain them. Organizing and managing tax documents effectively is the key to solving this information overload, and it is a task of great significance to taxation.

[0003] To organize and manage the massive volume of tax documents on the Internet, the documents are usually classified according to a subject category hierarchy or large-scale concept so that they can be accessed and searched more easily. Tax classification is the classification of various taxes according to certain standards. A...


Application Information

IPC(8): G06F17/30
CPC: G06F16/353
Inventor: 刘均, 马健, 郑庆华, 张未展, 吴蓓
Owner: XI AN JIAOTONG UNIV