Data integration for distributed and massively parallel processing environments

A data integration technology for distributed and massively parallel processing environments, applied in multi-dimensional databases, database management systems, instruments, and similar fields. It addresses the conflict that arises when analytics are applied to diversely structured datasets, the growing number of challenges that conventional techniques cannot solve, and the limited focus of traditional database techniques on those challenges.

Inactive Publication Date: 2022-08-11
THOUGHTSPOT INC

AI Technical Summary

Benefits of technology

The present invention provides a user interface and supporting data integration structure and process that allows users to easily build set-based transformation rules for integrating data from multiple sources. Users can understand the outcome of each transformation rule as it is applied. The invention also includes stateless agents that assist in extracting, loading, and transforming data in a highly distributed network of data systems, resulting in increased efficiency and speed without requiring that all data be kept in a single repository. The metadata rules can be maintained in any convenient location accessible by one or more controller components. Multiple agents can be implemented within a single server farm for load balancing and further efficiency. Overall, the invention provides an improved data integration solution.
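As a rough illustration of set-based transformation rules whose outcome is visible after each step, the sketch below applies a list of rules to a whole row set and keeps every intermediate result. It is a minimal sketch and not the patented implementation: the function `apply_rules`, the rule names, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
RowSet = List[Row]
# A rule is a (hypothetical) name plus a function over the whole row set.
Rule = Tuple[str, Callable[[RowSet], RowSet]]


def apply_rules(rows: RowSet, rules: List[Rule]) -> List[Tuple[str, RowSet]]:
    """Apply each rule to the full row set and record every intermediate result."""
    trace = []
    current = rows
    for name, rule in rules:
        current = rule(current)  # each rule returns a new row set
        trace.append((name, current))
    return trace


# Sample data and rules invented purely for illustration.
orders = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 40.0},
]

rules = [
    ("keep_eu", lambda rs: [r for r in rs if r["region"] == "EU"]),
    ("add_tax", lambda rs: [{**r, "amount_with_tax": round(r["amount"] * 1.2, 2)} for r in rs]),
]

trace = apply_rules(orders, rules)
for name, result in trace:
    print(name, "->", len(result), "rows")
```

Keeping the result of every rule, rather than only the final output, is one simple way to let a user inspect what each rule did before adding the next one.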

Problems solved by technology

However, traditional database techniques have not generally focused on the challenges that result from trying to mine data from large repositories that are not organized for such searching, for linking what is found to other data, or for reusing and repurposing the data without massive effort.
An increasing number of challenges are not amenable to solution by these conventional techniques.
This conflict is magnified when the objective is to apply data analytics to unstructured or diversely structured datasets.
While sufficient data integration can overcome at least some of this conflict, conventional approaches to such data integration typically result in unworkable complexity and a lack of transparency that hinders or prevents successful debugging of transformation logic.
The result is that attempts at efficient integration of large datasets from diverse sources have been largely unsuccessful.
Further, data integration has typically involved moving large amounts of data across relatively long distances.
Given the confidential and proprietary nature of such data, these movements have historically run the risk of exposing confidential information to third parties.
While various encryption techniques have been used, the challenges of encrypting large data sets for transmission across long distances can be daunting.
Compression techniques have been used in the past, but again the challenges can become daunting because of the volume of compression typically needed and the security risks involved concerning both privacy and confidentiality.


Examples


Embodiment Construction

[0036] Referring first to FIG. 1A, an embodiment of the environment 100 in which the present data integration invention operates can be better appreciated. As will be appreciated in greater detail hereinafter, data integration in accordance with the present invention comprises two related aspects: in a first aspect, a job flow must be developed based on the particular data which a user seeks to integrate, taking into account the sources of the data, their formats, and their geographical location, among other things. The development of a job flow involves development of a data flow for each such source, typically involving one or more extract/load/transform (sometimes “E/L/T” hereinafter) functions, together with any necessary E/L/T functions appropriate to move the data or results to a target. Then, in a second aspect, following the development of a job flow, the data integration job must execute efficiently, taking into account appropriate security, audit, and other data transfer co...
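The first aspect described above, a job flow made up of one data flow per source plus the E/L/T functions needed to reach the target, could be modeled roughly as follows. This is only a sketch under stated assumptions: the class names (`Source`, `Step`, `DataFlow`, `JobFlow`), their fields, and the sample sources are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    name: str
    data_format: str  # e.g. "csv" or "relational" (illustrative values)
    location: str     # geographical location of the source


@dataclass
class Step:
    kind: str    # one of "extract", "load", "transform"
    detail: str

    def __post_init__(self) -> None:
        if self.kind not in ("extract", "load", "transform"):
            raise ValueError(f"unknown E/L/T step kind: {self.kind}")


@dataclass
class DataFlow:
    source: Source
    steps: List[Step] = field(default_factory=list)


@dataclass
class JobFlow:
    target: str
    data_flows: List[DataFlow] = field(default_factory=list)
    target_steps: List[Step] = field(default_factory=list)

    def plan(self) -> List[str]:
        """List the per-source steps first, then the steps that reach the target."""
        lines = []
        for flow in self.data_flows:
            for step in flow.steps:
                lines.append(f"{flow.source.name}: {step.kind} ({step.detail})")
        for step in self.target_steps:
            lines.append(f"{self.target}: {step.kind} ({step.detail})")
        return lines


# Hypothetical sources and target, for illustration only.
crm = Source("crm", "relational", "Frankfurt")
logs = Source("weblogs", "csv", "Oregon")
job = JobFlow(
    target="warehouse",
    data_flows=[
        DataFlow(crm, [Step("extract", "customers table")]),
        DataFlow(logs, [Step("extract", "daily files"), Step("transform", "parse timestamps")]),
    ],
    target_steps=[Step("load", "staging area"), Step("transform", "join customers to visits")],
)
for line in job.plan():
    print(line)
```

Recording the format and location on each source keeps the information a planner would need to decide where each step should run, which is the concern the second aspect (efficient execution) picks up.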


Abstract

Methods and systems for large-scale data integration in distributed or massively parallel environments comprise a development phase in which the user can view the results of a proposed job flow during development, including the results of upstream units, and in which the data sources and data targets can be any of a variety of different platforms. They further comprise the use of remote agents proximate to those data sources and data targets, with direct communication between the associated agents under the direction of a topologically central controller, to provide, among other things, improved security, reduced latency, reduced bandwidth requirements, and faster throughput.
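The arrangement of remote agents directed by a central controller can be pictured with a small in-process simulation. The sketch below is an assumption-laden illustration, not the patented design: the `Agent` and `Controller` classes and their methods are hypothetical, dictionaries stand in for the data systems, compression uses Python's standard `zlib` module, and encryption and networking are omitted entirely. The point it shows is that the controller only issues instructions and logs outcomes, while the compressed payload passes directly from the source agent to the target agent.

```python
import json
import zlib
from typing import Dict, List


class Agent:
    """Stateless worker placed near one data system (hypothetical design)."""

    def __init__(self, name: str, store: Dict[str, List[dict]]) -> None:
        self.name = name
        self.store = store  # stands in for the nearby data system, not agent state

    def extract_and_send(self, table: str, peer: "Agent", dest_table: str) -> int:
        # Compress before sending to reduce the bytes moved between sites.
        payload = zlib.compress(json.dumps(self.store[table]).encode("utf-8"))
        peer.receive_and_load(dest_table, payload)  # direct agent-to-agent transfer
        return len(payload)

    def receive_and_load(self, table: str, payload: bytes) -> None:
        rows = json.loads(zlib.decompress(payload).decode("utf-8"))
        self.store.setdefault(table, []).extend(rows)


class Controller:
    """Directs agents but never handles the data itself."""

    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents
        self.log: List[str] = []

    def run_transfer(self, src: str, table: str, dst: str, dest_table: str) -> None:
        sent = self.agents[src].extract_and_send(table, self.agents[dst], dest_table)
        self.log.append(f"{src}.{table} -> {dst}.{dest_table}: {sent} compressed bytes")


# Hypothetical source and target systems, for illustration only.
source_system = {"sales": [{"id": 1, "total": 10}, {"id": 2, "total": 25}]}
target_system = {}
controller = Controller({
    "src_agent": Agent("src_agent", source_system),
    "dst_agent": Agent("dst_agent", target_system),
})
controller.run_transfer("src_agent", "sales", "dst_agent", "sales_copy")
print(controller.log[0])
```

Because the controller sees only a byte count, the data never leaves the path between the two agents, which is one way to read the security and bandwidth benefits the abstract claims.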

Description

RELATED APPLICATION

[0001] This application is a Continuation of U.S. patent application Ser. No. 16/611,199 filed on Nov. 5, 2019, which claims priority to and the benefit of International Patent Application No. PCT/US2018/031220 filed May 4, 2018, which claims priority to U.S. provisional patent application No. 62/502,594 filed 5 May 2017, each of which are incorporated herein by reference in their entirety.

BACKGROUND

Field of the Invention

[0002] The present invention relates generally to data integration in either distributed or massively parallel processing environments, and, in one aspect, more particularly relates to interactive development of extract, load and transform data flows while, in another aspect, relates to the use of geographically dispersed agents to simplify extract, load and transform processes with enhanced security and improved data compression.

Related Art

[0003] More and more, data analysts require the use of data outside the control of their own organizations. Gre...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F16/25, G06F16/28, G06F21/62
CPC: G06F16/254, G06F21/6218, G06F16/252, G06F16/283, G06F7/00, G06F16/25
Inventor: PUNURU, RAVINDRA; VYAS, SANJAY; TUMATI, SRIPATHI
Owner: THOUGHTSPOT INC