Scalable analysis platform for semi-structured data

A technology for data and data sources, applied in the field of scalable interactive database platforms, that addresses problems such as Hadoop being too slow for interactive use and the difficulty of finding qualified personnel.

Active Publication Date: 2015-12-02
AMAZON TECH INC

AI Technical Summary

Problems solved by technology

Although Hadoop is flexible, using it requires specialized administrators and programmers with deep technical knowledge, who are often difficult to find.
Hadoop is also too slow to be interactive.

Examples

[0505] Examples of BI, AI, MI and VI. Consider a tweet similar to the one above, with the addition of a "retweet_freq" attribute that records how many times a tweet was retweeted in a day:

[0506]

[0507]

[0508] The schema for these records is:

[0509]

[0510] The JSON Schema for these records is:

[0511]

[0512] If retweet_freq is not treated as a map, then the relational schema is:

[0513] Root(text: str,

[0514] user.id: num, user.screen_name: str,

[0515] tags: join_key,

[0516] retweet_freq.2012-12-01: num,

[0517] retweet_freq.2012-12-02: num,

[0518] retweet_freq.2012-12-03: num,

[0519] retweet_freq.2012-12-04: num,

[0520] retweet_freq.2012-12-05: num)

[0521] Root.tags(id_jk: join_key,

[0522] index: int,

[0523] val: str)

[0524] In this case, the example records above would populate these relations as follows:

[0525] Root:

[0526] ("Love #muffins...", 29471...
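How a record fills the Root and Root.tags relations can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name flatten_tweet, the join-key value, and the sample record are all hypothetical, and the sample does not reproduce the elided example tweet.

```python
def flatten_tweet(record, join_key):
    """Split one tweet-like record into a Root row and Root.tags rows.

    Assumption: retweet_freq is NOT treated as a map, so each date key
    becomes its own Root column, as in the relational schema above.
    """
    root_row = {
        "text": record["text"],
        "user.id": record["user"]["id"],
        "user.screen_name": record["user"]["screen_name"],
        # The array itself is replaced by a join key into Root.tags.
        "tags": join_key,
    }
    for day, count in record.get("retweet_freq", {}).items():
        root_row[f"retweet_freq.{day}"] = count

    # One Root.tags row per array element, keyed back to the Root row.
    tag_rows = [
        {"id_jk": join_key, "index": i, "val": tag}
        for i, tag in enumerate(record.get("tags", []))
    ]
    return root_row, tag_rows


# Hypothetical sample record (values invented for illustration only).
sample = {
    "text": "example text #demo",
    "user": {"id": 1, "screen_name": "example_user"},
    "tags": ["demo", "sample"],
    "retweet_freq": {"2012-12-01": 10, "2012-12-02": 13},
}

root_row, tag_rows = flatten_tweet(sample, join_key=1)
```

Because every distinct date becomes a separate column in this variant, the Root relation grows wider as new dates appear, which is the cost of not treating retweet_freq as a map.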

Abstract

A data transformation system includes a schema inference module and an export module. The schema inference module is configured to dynamically create a cumulative schema for objects retrieved from a first data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. Dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema. The export module is configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
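The cumulative-schema step described above (infer a schema per object, then update the cumulative schema only where the object adds something) might be sketched as below. This is an assumption-laden illustration rather than the claimed implementation: the schema representation (a dict with "types", "fields", and "elems") and the function names infer_schema, merge_schema, and build_cumulative_schema are invented for this example.

```python
def infer_schema(value):
    """Infer a schema from a single JSON-like value."""
    schema = {"types": set(), "fields": {}, "elems": None}
    if isinstance(value, dict):
        schema["types"].add("object")
        for key, sub in value.items():
            schema["fields"][key] = infer_schema(sub)
    elif isinstance(value, list):
        schema["types"].add("array")
        for item in value:
            schema["elems"] = merge_schema(schema["elems"], infer_schema(item))
    elif isinstance(value, bool):  # checked before int: bool is an int subclass
        schema["types"].add("bool")
    elif isinstance(value, (int, float)):
        schema["types"].add("num")
    elif isinstance(value, str):
        schema["types"].add("str")
    elif value is None:
        schema["types"].add("null")
    else:
        raise TypeError(f"unsupported value: {value!r}")
    return schema


def merge_schema(cumulative, inferred):
    """Return a schema that describes everything either input describes."""
    if cumulative is None:
        return inferred
    if inferred is None:
        return cumulative
    merged = {
        "types": cumulative["types"] | inferred["types"],
        "fields": dict(cumulative["fields"]),
        "elems": merge_schema(cumulative["elems"], inferred["elems"]),
    }
    for name, sub in inferred["fields"].items():
        merged["fields"][name] = merge_schema(merged["fields"].get(name), sub)
    return merged


def build_cumulative_schema(objects):
    """Fold the per-object schemas into one cumulative schema."""
    cumulative = None
    for obj in objects:
        cumulative = merge_schema(cumulative, infer_schema(obj))
    return cumulative


# Hypothetical objects: "id" appears as both a number and a string.
objects = [
    {"id": 1, "tags": ["a"]},
    {"id": "x", "user": {"name": "n"}},
]
cumulative = build_cumulative_schema(objects)
```

In this sketch a field seen with two different types simply records both, so the cumulative schema never has to be rebuilt when a later object disagrees with earlier ones.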

Description

[0001] Cross-Reference to Related Applications

[0002] This disclosure claims priority to US Patent Application No. 14/213,941, filed March 14, 2014, and also claims the benefit of US Provisional Application No. 61/800,432, filed March 15, 2013. The entire disclosures of the above-mentioned applications are incorporated herein by reference.

Technical Field

[0003] The present disclosure relates to a scalable interactive database platform, and more particularly, to a scalable interactive database platform for semi-structured data that incorporates storage and computation.

Background

[0004] The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this Background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor implicitly considered to be p...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/00
CPC: G06F17/30292, G06F17/30917, G06F16/22, G06F16/86, G06F16/235, G06F16/254, G06F16/211
Inventor: D·特思罗吉安尼斯, N·A·宾克特, S·哈里佐保罗斯, M·A·沙赫, B·A·索维尔, B·D·卡普兰, K·R·美亚
Owner: AMAZON TECH INC