Visual data cleaning script cleaning process analysis system

A process analysis system and data cleaning technology, applied in climate sustainability, energy-saving computing, software maintenance/management, etc. It addresses three limitations of existing tools: they cannot visualize how a data cleaning script cleans data, cannot display changes to multiple tables at the same time, and do not support complex transformation operations. The intended effect is to make cleaning scripts easier to understand and to debug.

Pending Publication Date: 2022-05-27
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Many existing works aim to visualize data lineage. For example, TACO is a visual comparison tool that shows how tabular data differs over time, but it cannot visualize the cleaning process of a data cleaning script written in a programming language. Datamations uses animation to explain the steps in a data cleaning process, but it can only reveal the data changes of one table and cannot display changes to multiple tables at the same time.
In addition, these tools can visualize only a small subset of data transformation operations and do not support complex ones, such as transformations that change the table structure (Pivot, Unpivot, etc.).
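To illustrate why Pivot and Unpivot are described as structural transformations, here is a minimal sketch in plain Python. The function names, the column names, and the sample table are assumptions made for this illustration and do not come from the patent text.

```python
from collections import defaultdict


def unpivot(rows, id_col, var_name, value_name):
    """Wide -> long: every non-id column of each row becomes its own row."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                long_rows.append(
                    {id_col: row[id_col], var_name: col, value_name: value}
                )
    return long_rows


def pivot(rows, id_col, var_name, value_name):
    """Long -> wide: distinct values of var_name become column headers."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[id_col]][row[var_name]] = row[value_name]
    return [{id_col: key, **cols} for key, cols in wide.items()]


# Hypothetical sample data: one row per city, one column per year.
wide_table = [
    {"city": "A", "2020": 10, "2021": 12},
    {"city": "B", "2020": 7, "2021": 9},
]
long_table = unpivot(wide_table, "city", "year", "value")
round_trip = pivot(long_table, "city", "year", "value")
```

Unlike a filter or a fill, both operations change the number of rows and the set of columns at once, which is why a visualization that only tracks cell values within a fixed table shape cannot describe them.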


Embodiment Construction

[0029] The invention provides a visual data cleaning script cleaning process analysis system which, as shown in Figure 1, includes:

[0030] A program adapter and a visualization generator. The program adapter includes a program executor, a code parser, and a data conversion inferrer. The program executor includes a program processor and a program interpreter: the program processor marks, in the original data cleaning script, the table information data that carries intermediate table information, and the program interpreter detects and saves the intermediate table information and the column status information of each intermediate table. The intermediate table information includes the name of the intermediate table, its number of rows, and its number of columns. The name of the intermediate table consists of the line number of the table information data in the original data cleaning script and the variable name of the intermediate table in the table informatio...
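The intermediate table record described in [0030] could be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `L<line>_<variable>` name format, the `IntermediateTableInfo` class, the `snapshot` helper, and the use of a plain list of column names as "column status" are all hypothetical choices; the source only states that the name combines a script line number with a variable name and that row and column counts are stored.

```python
from dataclasses import dataclass


@dataclass
class IntermediateTableInfo:
    name: str        # line number + variable name (format is an assumption)
    n_rows: int
    n_cols: int
    columns: list    # stand-in for per-column status information


def make_table_name(line_no, var_name):
    # Hypothetical format; the source does not specify how the parts are joined.
    return f"L{line_no}_{var_name}"


def snapshot(line_no, var_name, table):
    """Record the shape of a table given as a list of dict rows."""
    columns = list(table[0].keys()) if table else []
    return IntermediateTableInfo(
        make_table_name(line_no, var_name), len(table), len(columns), columns
    )


# Hypothetical two-step cleaning run: load, then drop duplicate rows.
raw = [
    {"name": "Ann", "age": 31},
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": None},
]
log = [snapshot(1, "raw", raw)]

seen, deduped = set(), []
for row in raw:
    key = tuple(row.items())
    if key not in seen:
        seen.add(key)
        deduped.append(row)
log.append(snapshot(2, "deduped", deduped))
```

Including the line number in the name keeps two snapshots of the same variable distinct when a script reassigns it.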

Abstract

The invention discloses a visual data cleaning script cleaning process analysis system. The system comprises: a program executor, which detects and stores the intermediate table information and the column state information of each intermediate table; a code parser, which extracts, for the table information data, the input table information, the output table information, and the functions that perform data transformation operations on the intermediate tables; a data conversion inferrer, which determines the type and parameters of each data transformation operation and generates the data transformation operation description information; a cleaning layout constructor, which matches the intermediate table information to the nodes of a directed acyclic graph framework to obtain an intermediate table directed acyclic graph; and a glyph drawer, which places the data transformation operation glyphs between the nodes to form the edges of the directed acyclic graph, yielding the visualization of the data cleaning process. The system can display changes to multiple tables in a data cleaning script and can visualize complex data transformation operations.
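As a rough sketch of how input tables, output tables, and functions might be extracted from a script and arranged into a directed acyclic graph, the following uses Python's standard `ast` module. It is an assumption-laden illustration, not the method claimed in the patent: it only handles simple `out = table.func(...)` and `out = func(table, ...)` assignments, the sample script and its `load` function are hypothetical, and the node naming reuses an assumed `L<line>_<variable>` format.

```python
import ast


def extract_operations(script):
    """Return one record per simple assignment whose right side is a call."""
    ops, known_tables = [], set()
    for node in ast.parse(script).body:
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if not targets:
            continue
        call, candidates = node.value, []
        if isinstance(call.func, ast.Attribute):      # table.func(...)
            func_name = call.func.attr
            if isinstance(call.func.value, ast.Name):
                candidates.append(call.func.value.id)
        elif isinstance(call.func, ast.Name):         # func(table, ...)
            func_name = call.func.id
        else:
            continue
        candidates += [a.id for a in call.args if isinstance(a, ast.Name)]
        # Only names assigned earlier in the script count as input tables.
        inputs = [name for name in candidates if name in known_tables]
        ops.append({"line": node.lineno, "output": targets[0],
                    "inputs": inputs, "function": func_name})
        known_tables.add(targets[0])
    return ops


def build_dag(ops):
    """Edges (source node, target node, function). Nodes carry the line
    number, so a reassigned variable gets a new node and no cycle appears."""
    latest, edges = {}, []
    for op in ops:
        out_node = f"L{op['line']}_{op['output']}"
        for name in op["inputs"]:
            edges.append((latest[name], out_node, op["function"]))
        latest[op["output"]] = out_node
    return edges


# Hypothetical cleaning script; it is parsed only, never executed.
SCRIPT = """orders = load("orders.csv")
users = load("users.csv")
orders_clean = orders.drop_duplicates()
joined = orders_clean.merge(users)
"""
ops = extract_operations(SCRIPT)
edges = build_dag(ops)
```

In this sketch the `merge` step produces two edges into the same node, which is the multi-table case that single-table visualizations cannot show.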

Description

Technical field

[0001] The invention belongs to the field of program visualization and relates to a visual data cleaning script cleaning process analysis system.

Background

[0002] Data wrangling is the process of organizing complex, messy data into a desired format through cleaning and transformation operations. It is an important preliminary step for tasks such as data access, data modeling, and visual data analysis. The two-dimensional data table is an effective means of organizing data, and tables are widely used in communication, scientific research, and data analysis. Because the original table often contains "dirty" data, or its format or content does not meet the intended goal, data workers must clean the table.

[0003] Data cleaning involves various data transformation operations, such as removing duplicate rows, filling missing values, splitting composite columns, etc. Using programming languages such as R and Python to write speci...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/70
CPC: G06F8/70, Y02D10/00
Inventor: 巫英才, 熊凯, 傅四维
Owner: ZHEJIANG UNIV