A taxpayer tax registration address information clustering method based on a K-means algorithm model

The method applies the K-means algorithm to tax registration data, in the computer field, and addresses problems such as the inability to accurately identify registered address information.

Status: Inactive
Publication Date: 2019-01-25
HEBEI AISINO TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] To solve the technical problem that, in the prior art, the address information in taxpayers' enterprise registration records cannot be accurately identified, the present invention proposes a taxpayer tax registration address information clustering method based on the K-means algorithm model. The technical solution adopted is as follows:



Examples


Embodiment 1

[0041] A clustering method for taxpayer tax registration address information based on the K-means algorithm model. First, the registration address undergoes natural-language semantic mining, which includes expanding the lexicon (thesaurus) and performing word segmentation. The word segmentation results are then converted into text vectors using the Vector Space Model (VSM). Next, K-means, a partition-based clustering algorithm, clusters the addresses that have been converted into text vectors. The appropriate number of clusters K is selected in an unsupervised manner, and the structure of the clustering results is specified as required. Word segmentation is the process of dividing a sequence of Chinese characters into mutually independent words according to semantics. According to research, word-level features are more effective than character-level features...
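The steps described in this embodiment (segmentation against a lexicon, conversion to VSM vectors, K-means clustering) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the source does not say which segmentation method, term weighting, or centroid initialization is used, so forward maximum matching, L2-normalized term counts, and farthest-point initialization are stand-ins chosen here. The lexicon and the sample addresses are made up, and K is fixed at 2 rather than selected automatically.

```python
import math
from collections import Counter

def segment(address, lexicon):
    """Forward maximum matching (an assumed segmentation method):
    take the longest lexicon word at each position, else one character."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(address):
        for size in range(min(max_len, len(address) - i), 0, -1):
            piece = address[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def vectorize(token_lists):
    """Vector Space Model: one dimension per distinct token,
    term counts scaled to unit length (the weighting is an assumption)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialization
    (the initialization scheme is an assumption, not from the source)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical lexicon and addresses, for illustration only.
lexicon = {"石家庄", "长安区", "中山东路", "保定", "莲池区", "东风路"}
addresses = [
    "石家庄市长安区中山东路10号",
    "石家庄长安区中山东路10号",
    "保定市莲池区东风路5号",
    "保定莲池区东风路5号",
]
vocab, vectors = vectorize([segment(a, lexicon) for a in addresses])
for address, label in zip(addresses, kmeans(vectors, k=2)):
    print(label, address)
```

In this toy data, the two spellings of each address differ only by the optional character 市, so their vectors stay close and they should land in the same cluster. Expanding the lexicon matters because any word missing from it falls back to single characters, which dilutes the word-level features the embodiment prefers.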



Abstract

The invention provides a taxpayer tax registration address information clustering method based on a K-means algorithm model, and belongs to the technical field of computers. First, the registration address undergoes natural-language semantic mining, including lexicon (thesaurus) expansion and word segmentation. The segmentation results are converted into text vectors using a Vector Space Model (VSM), and the K-means algorithm then clusters the addresses that have been converted into text vectors. The appropriate number of clusters K is selected in an unsupervised manner, and the structure of the clustering result is specified as needed.

Description

Technical field

[0001] The invention relates to a taxpayer tax registration address information clustering method based on a K-means algorithm model, and belongs to the field of computer technology.

Background

[0002] In current taxpayer information analysis, the addresses recorded in taxpayers' business registrations are often ambiguous or inaccurately filled in, so exact matching cannot determine whether multiple taxpayers are registered at the same address. Moreover, the heterogeneity of Chinese characters makes it even harder to recognize identical addresses. As a result, the address information in taxpayers' business registration records cannot be accurately identified.

Contents of the invention

[0003] In order to solve the technical problem that the address information in the taxpayer enterprise registration address registration cannot be accurately identified in the prior ar...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/28
Inventor: 杨为琛, 伺彦伟, 张婷, 李慧, 祁洪波, 郭冰洁, 徐爱华
Owner: HEBEI AISINO TECH CO LTD