
Duplicate data identification method and device

A duplicate data identification technology, applied in the field of data processing, that addresses problems such as the infeasibility of identifying duplicate data manually.

Status: Inactive. Publication Date: 2017-01-04
ALIBABA GRP HLDG LTD
Cites: 3. Cited by: 2.

AI Technical Summary

Problems solved by technology

Alternatively, developers who are familiar with both businesses may recognize the duplicate data between them; there is no good way to solve this problem at the platform level.
[0004] However, this method has the following problems: someone must manually become familiar with all of the data to fully identify the duplicate data on the large-scale data processing platform, and once the data on the platform grows beyond a certain scale, manual identification is no longer possible.



Embodiment Construction

[0016] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar modules or modules having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are only for explaining the present application, and should not be construed as limiting the present application. On the contrary, the embodiments of the present application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

[0017] Figure 1 is a schematic flow chart of a method for identifying duplicate data proposed in an embodiment of the present application. The method comprises:

[0018] S11: Obtain the similarity feature values required in the current scenario, where each similarity feature value is obtained by performing a similarity calculation on the corresponding feature...
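Step S11 leaves the features and the similarity calculation unspecified in this excerpt. As a minimal sketch, assume each group of data is a table described by its column names, a sample of its values, and its row count (hypothetical features chosen for illustration, not taken from the patent), with Jaccard similarity for set-valued features and a min/max ratio for counts. The `required` argument stands in for "the similarity feature values required in the current scenario".

```python
from typing import Callable, Dict, Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ratio_similarity(x: float, y: float) -> float:
    """min/max ratio of two non-negative magnitudes; two zeros count as identical."""
    if x == 0 and y == 0:
        return 1.0
    return min(x, y) / max(x, y)


# Hypothetical feature comparators: each compares one corresponding feature
# of two data groups and returns a similarity value in [0, 1].
FEATURE_FUNCS: Dict[str, Callable[[dict, dict], float]] = {
    "columns": lambda a, b: jaccard(set(a["columns"]), set(b["columns"])),
    "sample_values": lambda a, b: jaccard(set(a["sample_values"]), set(b["sample_values"])),
    "row_count": lambda a, b: ratio_similarity(a["row_count"], b["row_count"]),
}


def similarity_features(group_a: dict, group_b: dict, required: Iterable[str]) -> Dict[str, float]:
    """Return one similarity feature value per required feature name."""
    return {name: FEATURE_FUNCS[name](group_a, group_b) for name in required}


if __name__ == "__main__":
    table_a = {"columns": ["user_id", "order_id", "amount"],
               "sample_values": ["u1", "o9", "12.5"], "row_count": 1000}
    table_b = {"columns": ["user_id", "order_id", "price"],
               "sample_values": ["u1", "o9", "13.0"], "row_count": 980}
    print(similarity_features(table_a, table_b, ["columns", "row_count"]))
```

In the usage example the two tables share two of four distinct column names, giving a column similarity of 0.5, and their row counts give 980 / 1000 = 0.98. Requesting only some feature names mimics a scenario in which only those features are needed.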


Abstract

The invention provides a duplicate data identification method and a duplicate data identification device. The method comprises the following steps: obtaining the similarity feature values required in the current scenario, where each similarity feature value is obtained by performing a similarity calculation on corresponding features of two groups of data to be identified; calculating a data similarity value between the two groups of data by using the similarity feature values as parameters of a preset similarity model; and identifying duplicate data according to the data similarity value. With this method, duplicate data can be identified automatically.
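The abstract does not specify the preset similarity model or the decision rule. The following is a minimal sketch, assuming a weighted average of the similarity feature values as the model and a fixed threshold for deciding duplicates; the feature names, the weights, and the 0.8 threshold are hypothetical values chosen for illustration, not parameters from the patent.

```python
from typing import Dict, Optional

# Hypothetical preset similarity model: per-feature weights (illustrative only).
PRESET_WEIGHTS: Dict[str, float] = {"columns": 0.5, "sample_values": 0.3, "row_count": 0.2}
# Hypothetical decision threshold for treating two data groups as duplicates.
DUPLICATE_THRESHOLD = 0.8


def data_similarity(features: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Feed similarity feature values into the weighted-average model.

    Only features that have a weight contribute, and the weights are
    renormalized over those features so the result stays in [0, 1].
    """
    weights = PRESET_WEIGHTS if weights is None else weights
    used = [name for name in features if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight <= 0:
        raise ValueError("none of the supplied features has a positive weight")
    return sum(weights[name] * features[name] for name in used) / total_weight


def is_duplicate(features: Dict[str, float],
                 threshold: float = DUPLICATE_THRESHOLD) -> bool:
    """Identify duplicate data: similarity at or above the threshold."""
    return data_similarity(features) >= threshold


if __name__ == "__main__":
    near_copy = {"columns": 1.0, "sample_values": 1.0, "row_count": 0.9}
    different = {"columns": 0.5, "sample_values": 0.5, "row_count": 0.98}
    print(is_duplicate(near_copy), is_duplicate(different))
```

Here the first pair scores about 0.98 and is flagged as duplicate data, while the second scores about 0.596 and is not. Renormalizing the weights lets the same model accept whichever subset of feature values the current scenario requires; a trained model could replace the weighted average without changing the surrounding steps.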

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a method and device for identifying duplicate data.

Background

[0002] In the era of big data, more and more businesses within an enterprise need to use big data technology to analyze and support their operations. However, different business teams share many similar business logics when analyzing their businesses, and because communication between teams is not timely, a large amount of similar data accumulates on the large-scale offline data processing platform. As the business develops, this similar data keeps growing, which wastes not only the storage resources but also the computing resources of the large-scale offline data processing platform.

[0003] In the existing technology, it is generally the developers who see the similar data of other business teams before discovering that t...


Application Information

Patent Type & Authority: Application (China)
IPC (8th edition): G06F17/30
CPC: G06F16/00, G06F16/215
Inventor: 王丰金 (Wang Fengjin)
Owner: ALIBABA GRP HLDG LTD