Taxpayer industry classification method based on noise label learning

A taxpayer industry classification method, applied in neural learning methods, text database clustering/classification, instruments, etc., that can solve problems such as the decline of industry classification accuracy.

Active Publication Date: 2021-05-07
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

However, in practice, limited by the professional knowledge and experience of the reporting personnel, there is considerable noise in the taxpayer industry cat...

Examples

Embodiment

[0135] The experimental data are the taxpayer records registered with the national tax authority of a certain region from 2017 to 2019, covering 97 major industry categories. The present invention is described in further detail below through experimental cases and specific implementations, with reference to the accompanying drawings. All technologies implemented based on the content of the present invention fall within its scope.

[0136] As shown in Figure 1, in a specific implementation of the present invention, taxpayer industry classification based on noise label learning includes the following steps:

[0137] Step 1. Taxpayer Text Information Processing

[0138] Much of the useful information in the taxpayer industry registration form is stored in the database as string text. The five columns {taxpayer name, main business, concurrent business, business mode, business scope} are extracted from the registered taxpayer i...
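As an illustration of the column extraction described in Step 1, the following is a minimal Python sketch. It assumes each registration record is available as a dictionary; the key names in TEXT_COLUMNS, the helper functions, and the sample record are hypothetical and are not taken from the patent. The source text is truncated at this point, so any later processing of the extracted fields is not shown.

```python
# Hypothetical column keys standing in for the five text columns named in [0138].
TEXT_COLUMNS = [
    "taxpayer_name",
    "main_business",
    "concurrent_business",
    "business_mode",
    "business_scope",
]


def extract_text_fields(record):
    """Return the five text fields of one record; missing or empty values become ''."""
    return {col: str(record.get(col) or "").strip() for col in TEXT_COLUMNS}


def join_text_fields(record, sep=" "):
    """Concatenate the non-empty text fields into one string for later text embedding."""
    fields = extract_text_fields(record)
    return sep.join(value for value in fields.values() if value)


# Made-up sample record; non-text columns such as registered_capital are ignored here.
sample = {
    "taxpayer_name": " Example Trading Co. ",
    "main_business": "garment manufacturing",
    "concurrent_business": None,
    "business_mode": "wholesale",
    "registered_capital": 100,
}
print(join_text_fields(sample))
```

Keeping the fields in a dictionary before joining them leaves room to embed each column separately, should a later step require per-column features.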

Abstract

A taxpayer industry classification method based on noise label learning comprises the following steps. First, the text information to be mined in the taxpayer industry information is extracted for text embedding, and feature processing is conducted on the embedded information. Second, the non-text information in the taxpayer industry information is extracted and encoded. Third, a BERT-CNN deep network structure suited to the taxpayer industry classification problem is constructed, and the number of network layers, the number of neurons in each layer, and the input and output dimensions are determined according to the processed feature information and the number of target categories. Fourth, the constructed network is pre-trained through contrastive learning, nearest neighbor semantic clustering, and self-label learning in sequence. Fifth, a noise modeling layer is added on top of the constructed deep network, the noise distribution is modeled through network self-trust and noise label information, and the model is trained on the noisy label data. Finally, the deep network in front of the noise modeling layer is taken as the classification model, and taxpayer industry classification is performed based on this model.
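The role of a noise modeling layer can be illustrated with a generic label-transition formulation. The abstract does not specify how the layer is built from network self-trust and noise label information, so the Python sketch below is an assumption: it uses a fixed row-stochastic transition matrix T, where T[i][j] is the probability that true class i is recorded as label j, and all function names are hypothetical. Training minimizes the loss on the noisy-label distribution, while prediction uses only the network output before the noise layer, as in the last step of the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_label_probs(clean_probs, transition):
    """Noise layer: p(noisy = j) = sum_i p(clean = i) * T[i][j]."""
    k = len(clean_probs)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


def noisy_nll(logits, noisy_label, transition):
    """Training loss: negative log-likelihood of the observed (possibly wrong) label."""
    probs = noisy_label_probs(softmax(logits), transition)
    return -math.log(probs[noisy_label])


def predict(logits):
    """Inference: drop the noise layer and take the argmax of the clean distribution."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Assumed two-class transition matrix; each row sums to 1.
T = [[0.9, 0.1],
     [0.2, 0.8]]
print(noisy_label_probs(softmax([0.0, 0.0]), T))
print(predict([2.0, 0.1]))
```

In practice the transition matrix would be learned or estimated rather than fixed. The sketch only shows why the layer is removed at prediction time: it explains the label noise during training and is not part of the classifier itself.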

Description

Technical field

[0001] The invention belongs to the technical field of text classification with noisy labels, and in particular relates to a taxpayer industry classification method based on noise label learning.

Background

[0002] In recent years, with the rapid development of the national economy and the continued prosperity of the market economy, the division of labor among China's enterprises has become increasingly refined. Studying the industry classification of corporate taxpayers is the basic work of tax source classification management. It is a key prerequisite for improving the electronic level of tax file management and implementing information-based management, and an important support for promoting industry modeling and for carrying out tax source classification monitoring, early warning, analysis, and professional tax assessment. The "Taxpayer Classification and Diversity Management Measures" issued by the State Administration of Taxation ...

Application Information

IPC(8): G06F16/35, G06F40/117, G06F40/289, G06K9/62, G06N3/04, G06N3/08, G06Q40/00
CPC: G06F16/35, G06Q40/10, G06F40/289, G06F40/117, G06N3/084, G06N3/045, G06F18/2415, G06F40/30, G06F40/129
Inventor: 郑庆华, 赵锐, 阮建飞, 董博, 师斌
Owner: XI AN JIAOTONG UNIV