Incremental data cleaning method based on memory mapping

An incremental data cleaning technique based on memory mapping, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as the poor cache utilization of the T-tree, which restricts performance potential, and achieves the effect of reducing design cost.

Status: Inactive
Publication Date: 2012-06-13
Owner: UESTC COMSYS INFORMATION
Cites: 3; Cited by: 15

AI Technical Summary

Problems solved by technology

However, the cache utilization rate of the T-tree is very poor [...

Examples

Embodiment Construction

[0044] A method for cleaning incremental data based on memory images, comprising the steps of:

[0045] The first step is to load all the databases into memory, select the table that needs to be indexed in the in-memory database, and build an MDB-tree index structure for the key values of the columns that need to be indexed in the table:

[0046] First, build a B+-tree-like structure: insert the keywords into the tree according to the B+ tree construction rules. Nodes do not store data, only keywords. The keywords in each node are ordered, and a node with n keywords holds n + 1 pointers to child nodes.

[0047] After the B+ tree has been built for all keywords, the leaf nodes are filled with a HASH table: for each keyword, a hash function with low computational cost is used to calculate the keyword's hash value, and the keyword, together with the memory address of its corresponding data, is then recorded in the HASH table. If the HASH address conflicts, store ...
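
The structure described in [0046] and [0047] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `Leaf`, `Inner`, `build`, and `search`, the `leaf_capacity` and `fanout` parameters, and the bottom-up bulk build (instead of keyword-by-keyword insertion with node splits) are all assumptions made for brevity. A Python `dict` stands in for the leaf HASH table, so the source's own conflict-handling rule, which is truncated above, is not reproduced. Keywords are assumed to be unique.

```python
from bisect import bisect_right


class Leaf:
    """Leaf node: a hash table from keyword to the memory address of its data."""

    def __init__(self):
        # A Python dict stands in for the HASH table (assumption).
        self.table = {}


class Inner:
    """Inner node: ordered keywords only, with len(keys) + 1 child pointers."""

    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys
        self.children = children


def build(items, leaf_capacity=4, fanout=4):
    """Bulk-build the index from (keyword, address) pairs with unique keywords."""
    items = sorted(items)
    if not items:
        return Leaf()
    # Lowest level: pack sorted entries into hash-table leaves.
    level = []  # list of (smallest keyword in subtree, node)
    for i in range(0, len(items), leaf_capacity):
        chunk = items[i:i + leaf_capacity]
        leaf = Leaf()
        leaf.table.update(chunk)
        level.append((chunk[0][0], leaf))
    # Upper levels: group nodes until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # Separator keys are the smallest keywords of children 1..n.
            keys = [min_key for min_key, _ in group[1:]]
            children = [node for _, node in group]
            parents.append((group[0][0], Inner(keys, children)))
        level = parents
    return level[0][1]


def search(node, key):
    """Top-down lookup: descend by ordered keywords, then probe the leaf hash table."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node.table.get(key)


if __name__ == "__main__":
    # Hypothetical data: even keywords 0..38 mapped to made-up addresses.
    root = build([(k, 1000 + k) for k in range(0, 40, 2)])
    print(search(root, 10))
    print(search(root, 11))
```

In this sketch the inner nodes narrow the search to one leaf by ordered comparison, and the final lookup inside the leaf is a hash probe rather than a scan of the leaf's keywords.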

Abstract

The invention discloses an incremental data cleaning method based on memory mapping. Data from the data source to be cleaned are loaded into memory, and an in-memory MDB (meta database)-tree is built with the aid of a HASH table. Fast data queries are performed with a top-down strategy, and operations such as data insertion and deletion are built on top of them. Views in the memory mapping are normalized into AUSPJ (aggregation, union, selection, projection and join) sections and D (differential) sections, and incremental maintenance of nodes is realized through relational algebra operations. Finally, the data cleaning operation is carried out by a result-set merging algorithm. The method can be widely applied in industries and fields with large data volumes and high data quality requirements, such as education, telecommunications, and government.
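
The idea of maintaining a view incrementally through relational algebra can be illustrated for the simplest case, a selection view under set semantics. The sketch below is only an illustration and not the patent's AUSPJ/D procedure. The helper names `select` and `apply_delta` are hypothetical. It relies on the fact that selection distributes over set union and difference, so the view can be updated from the inserted and deleted rows alone.

```python
def select(rows, pred):
    """Relational selection over a set of tuples."""
    return {r for r in rows if pred(r)}


def apply_delta(view, inserted, deleted, pred):
    """Update a selection view using only the changed rows.

    For base' = (base - deleted) | inserted, selection distributes over
    union and difference, so
    select(base') = (select(base) - select(deleted)) | select(inserted).
    """
    return (view - select(deleted, pred)) | select(inserted, pred)


if __name__ == "__main__":
    # Hypothetical rows: (id, label).
    base = {(1, "a"), (2, "b"), (3, "c")}
    keep = lambda r: r[0] >= 2
    view = select(base, keep)

    inserted = {(4, "d"), (0, "z")}
    deleted = {(2, "b")}
    new_base = (base - deleted) | inserted

    # The incrementally maintained view matches a full recomputation.
    print(apply_delta(view, inserted, deleted, keep) == select(new_base, keep))
```

Views that involve projection or aggregation generally need extra bookkeeping (for example, duplicate counts) before deletions can be applied this way, which this sketch does not cover.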

Description

Technical field

[0001] The invention belongs to the field of computer information analysis and data processing, and in particular relates to a method for cleaning incremental data based on memory images.

Background

[0002] When building an information system, even with good design and planning, there is no guarantee that the quality of the stored data will meet the user's requirements in all cases. User entry errors, business mergers, and changes in the corporate environment over time can all affect the quality of the stored data. Metadata is therefore needed to represent data quality. Four indicators are used to describe data quality: consistency, correctness, completeness, and minimality. The data cleaning process resolves differing representations of the same concept and eliminates schema conflicts and similar duplicate records when multiple data sources are integrated. [...

Application Information

IPC(8): G06F17/30
Inventor: 唐雪飞, 陈科, 汪海良, 李应洪
Owner UESTC COMSYS INFORMATION