Method and device for managing data set

A data set management technology, applied in the Internet field, that addresses the problem that existing users' data grows in place and cannot be dispersed. It achieves flexible splitting, cost savings, and uniform data distribution.

Active Publication Date: 2014-12-31
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 5; Cited by: 12

AI Technical Summary

Problems solved by technology

[0005] However, although the current technical solution disperses the load from new users onto new databases well, it cannot cope with the in-place growth of existing users' data. For userids smaller than the limit value M, data growth keeps pressing on the N machines numbered 0 to N-1 and cannot be dispersed.




Embodiment Construction

[0026] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0027] Figure 1 shows a schematic diagram of a management device for managing data sets according to one aspect of the present invention. Specifically, the detection device 11 detects whether the trigger condition for splitting the data set is met, wherein the data set includes one or more data subsets stored in the current storage device. When the trigger condition is satisfied, the determination device 12 determines, according to the subset identification information corresponding to at least one of the one or more data subsets, a preferred storage device corresponding to the at least one data subset from a plurality of candidate storage devices, wherein a backup of the data set is stored in the candidate storage devices. The updating device 13 updates the stored information of the at least one data subset in the current storage devic...
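The three steps in [0027] (detect, determine, update) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the source text is truncated and does not give the trigger condition or the selection rule, so the capacity threshold, the CRC32-based choice of candidate, and all function and variable names here are hypothetical. Because the source says the candidate storage devices already hold a backup of the data set, the sketch only rewrites routing information and does not copy data.

```python
import zlib


def trigger_met(subset_sizes: dict[str, int], capacity: int) -> bool:
    """Detection step. Assumed rule: split once stored data exceeds capacity."""
    return sum(subset_sizes.values()) > capacity


def preferred_device(subset_id: str, candidates: list[str]) -> str:
    """Determination step. Assumed rule: a stable hash of the subset
    identification information selects one candidate storage device."""
    if not candidates:
        raise ValueError("no candidate storage devices")
    index = zlib.crc32(subset_id.encode("utf-8")) % len(candidates)
    return candidates[index]


def split_data_set(
    routing: dict[str, str],
    subset_sizes: dict[str, int],
    current: str,
    candidates: list[str],
    capacity: int,
) -> dict[str, str]:
    """Update step: return a new routing table in which every subset stored
    on `current` points to its preferred candidate device. The input table
    is left unchanged, and nothing moves if the trigger is not met."""
    on_current = {s: subset_sizes[s] for s, dev in routing.items() if dev == current}
    updated = dict(routing)
    if not trigger_met(on_current, capacity):
        return updated
    for subset_id in on_current:
        updated[subset_id] = preferred_device(subset_id, candidates)
    return updated


# Hypothetical data: two subsets overload "db0", one small subset sits on "db1".
routing = {"user-0": "db0", "user-1": "db0", "user-2": "db1"}
sizes = {"user-0": 60, "user-1": 70, "user-2": 10}
candidates = ["db0-backup-a", "db0-backup-b"]
new_routing = split_data_set(routing, sizes, "db0", candidates, capacity=100)
```

Moving every subset on the overloaded device is a simplification; the source only requires that at least one data subset be reassigned.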



Abstract

The invention provides a method and a device for managing a data set. In the method, when a trigger condition for sharding the data set is met, a management device determines a preferred storage device for a data subset according to the subset identification information corresponding to that data subset. It then updates the storage information corresponding to the subset identification information, based on the storage information of the data subset in the current storage device combined with related information about the preferred storage device. Compared with the prior art, updating the storage information in this way enables segmented, multi-stage database sharding of a target data subset. This addresses the problem that existing users' data grows in place, keeps data uniformly distributed, makes the sub-databases simple to maintain, provides scalability and load balancing across the sub-databases, allows flexible sharding, and reduces cost.

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a technology for managing data sets.

Background

[0002] To cope with the continuous and rapid growth of the user database, the original database needs to be sharded to obtain higher throughput, better performance, and larger storage capacity.

[0003] At present, database sharding mainly uses a segmented modulo method. For example, since in most cases the user ID (userid) is the foreign key of all user-related data, splitting the database by user ID effectively avoids cross-database transactions and cross-database joins, so that each user's data is hashed to a single shard. When the userid is used as the splitting criterion and the userid is less than an upper limit value M, the userid is taken modulo N and hashed to one of N sub-databases numbered 0, 1, ..., N-1; if the data volume con...
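The segmented modulo rule in [0003] can be illustrated with a short sketch. It covers only the first segment (userid below M), because the source text is truncated before it describes what happens beyond M. The function name and the values of M and N are illustrative assumptions.

```python
def route_below_limit(userid: int, limit_m: int, n_shards: int) -> int:
    """Sub-database index for a userid in the first segment (userid < M)."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    if not 0 <= userid < limit_m:
        raise ValueError("this sketch only covers 0 <= userid < M")
    return userid % n_shards


M, N = 1000, 4  # illustrative values, not taken from the source
shards_used = {route_below_limit(u, M, N) for u in range(M)}
print(sorted(shards_used))  # prints [0, 1, 2, 3]
```

Every userid below M maps to one of the same N sub-databases no matter how many machines are added later, which is why growth in existing users' data stays concentrated on machines 0 to N-1.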


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/214, G06F16/217, G06F16/2453, G06F16/95
Inventors: 刘泽胤, 曾黎
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD