Analysis preprocessing method and device for data quality, storage medium and terminal

A data quality analysis preprocessing method and device, applied in the field of data processing, addresses inaccurate data quality assessment caused by the inability to distinguish quality differences among data providers. It achieves a small search volume, handles multi-level influence relationships, and improves accuracy.

Status: Inactive
Publication Date: 2018-05-15
上海数据发展科技有限责任公司

AI Technical Summary

Problems solved by technology

[0004] The existing technology cannot distinguish differences in quality among the data providers used for data evaluation, which leads to inaccurate evaluation of data quality.



Detailed Description of Embodiments

[0031] As mentioned in the background section, the existing technology cannot distinguish differences in quality among the data providers used for data evaluation, which leads to inaccurate evaluation of data quality.

[0032] The technical scheme of the present invention forms a network from the label value sources and the preset association relationships among them, and establishes a Bayesian network, which is a directed acyclic network, by scoring the likelihood and the complexity of candidate networks. A Bayesian network built this way can be used to evaluate the quality of the data to be evaluated and improves the accuracy of that evaluation. In addition, because network complexity is taken into account during construction, the search volume in data quality analysis is small compared with existing Bayesian networks.
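The idea of trading likelihood against complexity can be illustrated with a minimal sketch. The source does not give its scoring formulas, so the sketch assumes a BIC-style score (maximum-likelihood log-likelihood minus a penalty proportional to the number of free parameters). The source names `operator` and `dating_site`, the records, and the two candidate structures are all hypothetical.

```python
import math
from collections import Counter

def log_likelihood(records, parents):
    """Maximum-likelihood log-likelihood of the records under a DAG structure."""
    total = 0.0
    for node, pars in parents.items():
        joint = Counter((tuple(r[p] for p in pars), r[node]) for r in records)
        given = Counter(tuple(r[p] for p in pars) for r in records)
        for (pa_vals, _value), n in joint.items():
            total += n * math.log(n / given[pa_vals])
    return total

def complexity(records, parents):
    """Number of free parameters in the conditional probability tables."""
    card = {v: len({r[v] for r in records}) for v in parents}
    return sum((card[v] - 1) * math.prod(card[p] for p in pars)
               for v, pars in parents.items())

def score(records, parents):
    """Assumed BIC-style score: likelihood minus a complexity penalty."""
    penalty = 0.5 * math.log(len(records)) * complexity(records, parents)
    return log_likelihood(records, parents) - penalty

# Hypothetical label values reported by two sources for eight users.
records = ([{"operator": "male", "dating_site": "male"}] * 4
           + [{"operator": "female", "dating_site": "female"}] * 4)

# Candidate structures: each node maps to a tuple of its parent nodes.
candidates = {
    "independent": {"operator": (), "dating_site": ()},
    "operator_to_site": {"operator": (), "dating_site": ("operator",)},
}
best = max(candidates, key=lambda name: score(records, candidates[name]))
print(best)
```

Because the two hypothetical sources always agree here, the structure with an edge gains enough likelihood to outweigh its extra parameter. With weakly related sources the penalty would favor the simpler structure.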

[0033] In order to make the above objects, features and advantages of the...


Abstract

The invention relates to an analysis preprocessing method and device for data quality, a storage medium and a terminal. The method comprises: extracting the label value sources of data provided by multiple suppliers; taking the label value sources and their corresponding label values as nodes of a network, and determining the association relation between every two nodes according to a preset association relation among the label value sources, where each association relation comprises set membership and association strength; calculating a likelihood score and a complexity score for each subnetwork of the network, where each subnetwork is a directed acyclic network; and, based on the likelihood score and the complexity score, selecting a subnetwork as the Bayesian network used to evaluate the accuracy of the data to be evaluated. The technical scheme establishes a Bayesian network for analyzing data quality, thereby improving the accuracy of data quality analysis.
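The network-construction step and the acyclicity requirement in the abstract can be sketched as follows. This is only an illustration: the node names, the association triples, and the strength values are hypothetical, and the check uses the standard-library `graphlib` module (Python 3.9+) rather than anything described in the patent.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical preset associations: (parent source, child source, strength).
PRESET = [
    ("operator.app_list", "operator.gender", 0.8),
    ("dating_site.profile", "dating_site.gender", 0.9),
    ("operator.gender", "dating_site.gender", 0.3),
]

def build_network(associations):
    """Return (node -> set of parents, (parent, child) -> strength)."""
    parents, strength = {}, {}
    for parent, child, weight in associations:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
        strength[(parent, child)] = weight
    return parents, strength

def is_acyclic(parents):
    """True if the directed network has no cycle, so it can serve as a DAG."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

parents, strength = build_network(PRESET)
print(is_acyclic(parents))
```

Only subnetworks that pass a check like `is_acyclic` would be eligible for likelihood and complexity scoring.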

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to an analysis preprocessing method and device for data quality, a storage medium, and a terminal.

Background

[0002] Data quality analysis relies on comparison with the true value of the data, but in the field of big data the true value is often difficult to obtain.

[0003] Existing methods mainly determine the accuracy of data by voting among different data sources. Different information sources make only rough judgments about the data. For example, for a user of a mobile device, a mobile phone operator may determine that the user is male based on the applications (apps) the user has downloaded, while a dating website may determine that the user is female based on the information filled in by the user.

[0004] The existing technology cannot distinguish the difference in the quality of the data provider u...
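The voting approach described in paragraph [0003] can be sketched in a few lines. The source names and claimed values are hypothetical; the point is that every source counts equally, regardless of how reliable it is.

```python
from collections import Counter

def majority_vote(claims):
    """Return the most common claimed value and its share of the votes."""
    counts = Counter(claims.values())
    value, votes = counts.most_common(1)[0]
    return value, votes / len(claims)

# Hypothetical gender labels claimed by three sources for one user.
claims = {
    "mobile_operator": "male",
    "dating_site": "female",
    "ad_network": "male",
}
value, share = majority_vote(claims)
print(value, share)
```

Here "male" wins with two of three votes, even though the dating site's value comes from information the user filled in directly.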


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/215
Inventors: 汤奇峰, 王也, 蒋宇一
Owner: 上海数据发展科技有限责任公司