
Data consistency evaluation method based on data pattern clustering

A data pattern and consistency technology, applied to database updating, structured data retrieval, special data processing applications, and similar areas. It addresses problems such as data engineers not understanding the business, application systems being complex, and the impossibility of specifying a standard pattern for judging whether data is reasonable.

Pending Publication Date: 2020-04-21
UESTC COMSYS INFORMATION
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] In today's big data era, businesses are diverse and application systems are complex. During system construction, the importance of data quality is often overlooked and sufficient measures are not taken. As a result, as the system and its data come into deeper use, data quality problems such as validity, accuracy, and consistency are gradually exposed.
In fact, this evaluation method is very inefficient and cannot solve the following problems:
[0006] If a field allows multiple patterns to coexist and the data engineer does not understand the business, the engineer does not know which patterns are legitimate. In that case, it is impossible to specify a standard pattern for judging whether the data is reasonable.




Embodiment Construction

[0016] To help those skilled in the art understand the technical content of the present invention, it is further explained below with reference to the accompanying drawings.

[0017] First, the usage scenario of the present invention is introduced. The invention can be used in any scenario where it is necessary to evaluate whether the value patterns of the data in a field meet requirements. In particular, when the data engineer cannot know the standard pattern of the field under test in advance, the invention helps the data engineer evaluate the consistency of the field data easily.

[0018] As shown in Figure 1, assume there is a two-dimensional table T with a phone number field F, and we want to find the data that does not conform to the phone number format. We can then use the following pattern clustering algorithm to cluster the values of field F by pattern.
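The clustering algorithm itself is truncated in this excerpt, so the following is only a minimal sketch of what pattern clustering on field F could look like. It assumes a common character-class abstraction (every digit becomes `9`, every letter becomes `A`, other characters are kept), which is not confirmed by the source. The names `value_pattern`, `cluster_by_pattern`, and the sample values are hypothetical.

```python
from collections import defaultdict


def value_pattern(value: str) -> str:
    """Abstract a value into a pattern string (assumed scheme, not the patent's):
    digit -> '9', letter -> 'A', any other character is kept unchanged."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A")
        else:
            symbols.append(ch)
    return "".join(symbols)


def cluster_by_pattern(values):
    """Group values that share the same pattern string."""
    clusters = defaultdict(list)
    for v in values:
        clusters[value_pattern(v)].append(v)
    return dict(clusters)


# Hypothetical values of the phone number field F in table T.
field_f = ["13800138000", "028-61830001", "13912345678", "1380013800x", "phone"]
clusters = cluster_by_pattern(field_f)

for pattern, members in clusters.items():
    print(pattern, members)
```

Under this assumed scheme, mobile numbers and landline numbers fall into two different clusters, and malformed values form small clusters of their own that are easy to inspect.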

[0019] I...


Abstract

The invention discloses a data consistency evaluation method based on data pattern clustering. The method is applied in the field of big data analysis and processing, and solves the problem that the prior art cannot evaluate the consistency of fields in which multiple patterns coexist. The method comprises the following steps: first, the fields to be evaluated are read from a database and their values are clustered by pattern according to a chosen pattern clustering algorithm; then, the standard patterns are determined among the clustered patterns; finally, the values of the fields to be evaluated are matched against the standard patterns to obtain the dirty data. The method is especially suitable for scenarios in which data engineers do not understand the business and it is difficult to determine which patterns are legitimate.
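The three steps in the abstract (cluster, determine standard patterns, match) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not say how the pattern of a value is computed or how standard patterns are chosen. Here the pattern is a simple digit/letter abstraction, and a pattern is treated as standard when its share of values reaches a hypothetical threshold `min_share`; in practice a data engineer might instead review the clusters and pick the standard patterns by hand.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    # Assumed abstraction: digits -> '9', ASCII letters -> 'A', rest unchanged.
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def evaluate_consistency(values, min_share=0.2):
    """Return (standard_patterns, dirty_values).

    min_share is a hypothetical parameter: a pattern counts as standard
    when at least this fraction of the values follow it.
    """
    # Step 1: cluster the field values by pattern.
    patterns = [value_pattern(v) for v in values]
    counts = Counter(patterns)
    total = len(values)
    # Step 2: determine the standard patterns (assumed frequency rule).
    standard = {p for p, n in counts.items() if n / total >= min_share}
    # Step 3: values whose pattern matches no standard pattern are dirty.
    dirty = [v for v, p in zip(values, patterns) if p not in standard]
    return standard, dirty


# Hypothetical field values in which two legitimate patterns coexist.
values = [
    "13800138000", "13912345678", "13700001111",
    "028-61830001", "028-85550000",
    "1380013800x", "N/A",
]
standard, dirty = evaluate_consistency(values)
print(sorted(standard))
print(dirty)
```

Because more than one pattern can pass the threshold, a field where mobile and landline formats coexist is not reported as inconsistent; only values outside every standard pattern are flagged.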

Description

Technical field

[0001] The invention belongs to the field of big data analysis and processing, and in particular relates to a consistency evaluation technology for structured data.

Background technique

[0002] Structured data, simply put, is data stored in a database. It is easier to understand in the context of typical scenarios, such as enterprise ERP and financial systems, medical HIS databases, education cards, government administrative approval, and other core databases.

[0003] Its basic requirements include high-speed storage, data backup, data sharing, and data disaster recovery.

[0004] Structured data, also known as row data, is logically expressed and implemented as a two-dimensional table structure, strictly follows data format and length specifications, and is mainly stored and managed in relational databases. The opposite of structured data is unstructured data that is not suitable for representation by two-dimensi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/23
CPC: G06F16/2365
Inventor: 唐雪飞, 蒲高飞, 黄永鑫, 王东方, 胡茂秋
Owner: UESTC COMSYS INFORMATION