MES-oriented mass data redundancy elimination method and system

A mass data redundancy removal technology, applied in the field of data processing, addresses the weak redundancy removal capability and low processing efficiency of existing methods on massive similar duplicate data. It increases redundancy removal capability, reduces time complexity, and improves overall efficiency.

Active Publication Date: 2021-01-01
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Traditional data de-redundancy methods have problems such as low processing effic

Embodiment Construction

[0073] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0074] The object of the present invention is to provide an MES-oriented mass data de-redundancy method and system, so as to improve the data de-redundancy capability while improving the MES data processing efficiency.

[0075] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodime...

Abstract

The invention relates to an MES-oriented mass data redundancy elimination method and system. A MinHash (minimum hash) algorithm compresses the preprocessed data into MinHash signatures, and an LSH (locality-sensitive hashing) algorithm divides the data into buckets according to hash values, which avoids computing similarity between every pair of records. This greatly reduces the time complexity of finding similar duplicate data in mass data and improves the overall efficiency of data processing. In addition, Jaccard similarity is used as a screening condition: data whose Jaccard similarity exceeds a threshold is defined as potential similar data, and similarity detection from distribution to overall is then performed on the potential similar data to remove similar duplicate data, which improves the redundancy removal capability.
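The pipeline in the abstract (MinHash signature, LSH bucketing, Jaccard threshold screening) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the character 3-gram shingling, the BLAKE2b-based hash family, the 64 hash functions split into 16 bands, and the 0.5 threshold are all assumptions chosen for the example, and the later "distribution to overall" detection step is not covered.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a whitespace-normalized record."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(token_set, num_hashes=64):
    """Compress a token set into a tuple of per-hash-function minimum values."""
    return tuple(min(seeded_hash(seed, t) for t in token_set)
                 for seed in range(num_hashes))


def lsh_candidates(signatures, bands=16):
    """Bucket signatures band by band; records sharing a bucket become candidates."""
    rows = len(signatures[0]) // bands
    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two non-empty sets."""
    return len(a & b) / len(a | b)


def potential_duplicates(records, threshold=0.5, num_hashes=64, bands=16):
    """Return index pairs whose Jaccard similarity exceeds the threshold.

    Jaccard is computed only for pairs that LSH placed in a common bucket,
    so most dissimilar pairs are never compared.
    """
    sets = [shingles(r) for r in records]
    sigs = [minhash_signature(s, num_hashes) for s in sets]
    return sorted((i, j) for i, j in lsh_candidates(sigs, bands)
                  if jaccard(sets[i], sets[j]) > threshold)
```

Calling `potential_duplicates` on a list of record strings returns index pairs to pass to a later, more detailed similarity check. More bands with fewer rows per band make the bucketing more permissive, and the Jaccard threshold then filters out the extra candidates.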

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to an MES-oriented mass data redundancy removal method and system.

Background

[0002] Similar duplicate data is defined as follows: if two records R1 and R2 in a database have the same or similar content and both correspond to the same real-world entity, then the pair R1 and R2 is similar duplicate data. A real database may contain many such pairs. Their presence lowers data quality, may hinder the normal operation of the system, and can even affect the correctness of decisions made by the enterprise information management system (MES).

[0003] Similar duplicate data is widespread in industry. It arises because, during data acquisition or data storage, the same data takes different forms, for example: misspellings of the same word, printing errors, inconsistent...

Application Information

IPC(8): G06F16/215
CPC: G06F16/215
Inventor: 柴森春, 黄经纬, 王昭洋, 崔灵果, 李慧芳, 姚分喜, 张百海
Owner: BEIJING INSTITUTE OF TECHNOLOGY