Cross-domain information system-oriented real-time on-demand data aggregation method and system

An information system and data aggregation technology, applied in the fields of structured data retrieval, electronic digital data processing, and database management systems. It addresses the problems that existing approaches cannot effectively handle distributed, autonomous, real-time and flexible aggregation, that they increase the risk of network control and information leakage, and that no data aggregation framework and system has been proposed. It achieves rapid expansion, reduced network load, and rapid component updates.

Pending Publication Date: 2020-11-17
SHANGHAI JIAO TONG UNIV
2 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

Such an architecture can avoid the heterogeneity problem of distributed independent systems, but it has three important defects: 1. Direct communication between software in different security domains will increase the risk of network control and information leakage; 2. It cannot effectively deal with distributed, autonomous, H...

Method used

The filter of the present invention acquires source data and data aggregation rules in real time through the stream APIs of the source Kafka cluster and the destination Kafka cluster, respectively. It filters source data efficiently using the OpIndex algorithm and the PhSIH parallelization mechanism, and then sends the matched source data to the corresponding application Topic. OpIndex is designed for scalability in terms of data volume, velocity, and variety, and can handle high-dimensional and sparse datasets. In addition, OpIndex has low memory requirements and maintenance costs, and can easily be extended to support more complex application data interests. PhSIH is a flexible method for parallelizing filtering: it dynamically adjusts the number of threads performing filtering operations according to performance requirements, so as to ensure the real-time performance of data aggregation.
The present invention uses a peer-to-peer (P2P) VLAN network to enable communication between hosts across security domains. The VLAN-related applications are containerized, and a heartbeat mechanism and port mapping are used to ensure the high availability and high reliability of the overlay network. On the basis of open-source Kafka, a data filtering component that adaptively adjusts its concurrency is implemented. Combined with InfluxDB, a web visualization interface is developed to monitor the aggregation status of multiple data source systems. The data aggregation components are containerized and, based on Docker's remote API, a private registry, the Docker container structure and related technologies, a web visualization interface is developed that supports visual configuration of data aggregation requirements and automatic, dynamic customization of data aggregation components.
[0062] When the aggregation system is extended to a new source information system, the filter is deployed automatically through the configurator once the access point host has been installed, which completes the expansion of the aggregation system.
[0074] Both the converging end and the source end include a connector, and a virtual local area network tunnel can be established between the connectors to communicate with each other, thereby forming an ove...

Abstract

The invention provides a cross-domain information system-oriented real-time on-demand data aggregation system and method. The system comprises an aggregation end deployed in a destination information system and a plurality of source ends deployed in a plurality of source information systems. The aggregation-end connector and the source-end connectors establish virtual local area network tunnels to communicate with each other, forming an overlay network. An aggregator collects the aggregation requirements of the various applications in the destination information system for source data, forms aggregation rules from them, and gathers the data in the source information systems that satisfies those rules. A filter performs filtering operations on the data in the source information system according to the aggregator's aggregation rules and transmits the data that meets application requirements to the aggregator in the destination information system, and the aggregator provides an interface through which upper-layer applications acquire the data. The invention provides a real-time cross-domain data aggregation framework in which data aggregation requirements can be customized dynamically, data transmission delay is at the millisecond level, and expansion and updating are easy.



Example Embodiment

[0042] Embodiment 1
[0043] A real-time on-demand data aggregation system oriented to cross-domain information systems according to the present invention includes an aggregation end deployed in a destination information system and multiple source ends deployed in multiple source information systems;
[0044] Each source end includes a filter and a connector;
[0045] The aggregation end includes a connector, an aggregator and a configurator;
[0046] The aggregation-end connector and the source-end connectors establish virtual local area network tunnels to communicate with each other, forming an overlay network;
[0047] The aggregator collects the aggregation requirements of the various applications in the destination information system for source data, forms aggregation rules from them, and distributes the rules to all source-end filters; it also aggregates the data in the source information systems that satisfies the aggregation rules;
[0048] The filter performs filtering operations on the data in the source information system according to the aggregator's aggregation rules and transmits the data that meets application requirements to the aggregator in the destination information system, and the aggregator provides an interface through which upper-layer applications obtain the data;
[0049] The configurator at the aggregation end provides a visual monitoring interface for observing the working status and performance of the filters in each source information system and the state of the aggregation system's overlay network.
[0050] Specifically, the overlay network can shield the local area network settings of the underlying information systems and, where the firewall permits, enable hosts located in different local area networks to communicate with each other across security domains.
[0051] Specifically, the filter implements its filtering function with a matching algorithm from the content-based publish/subscribe model.
[0052] Specifically, the configurator is further used to configure and deploy a new filter or to update an existing filter.
[0053] Specifically, the overlay network of the aggregation system is implemented with the open-source N2N virtual local area network technology;
[0054] The N2N virtual local area network includes a super-node program and an edge-node program;
[0055] The edge-node program is deployed on the access point servers in the destination information system and the source information systems to form the overlay network; the super-node program is deployed in the destination information system to coordinate and assist the edge-node programs in forming the overlay network.
[0056] Specifically, the aggregator implements real-time on-demand data aggregation through Kafka clusters, namely a source Kafka cluster and a destination Kafka cluster;
[0057] The destination Kafka cluster is responsible for collecting and distributing the application data aggregation rules and the data that meets the filter conditions;
[0058] The source Kafka cluster includes the Kafka cluster that the aggregation system itself is already running.
[0059] Specifically, the filter acquires source data and data aggregation rules in real time through the stream APIs of the source Kafka cluster and the destination Kafka cluster, respectively; it filters the source data using the OpIndex algorithm and the PhSIH parallelization mechanism, and after filtering sends the matching source data to the corresponding application.
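The idea of adjusting the number of filtering threads to meet a performance requirement can be illustrated with a minimal Python sketch. This is not the PhSIH mechanism itself, whose details are not given here. The function names, the latency target, the 0.5 slack threshold and the worker bounds are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def next_worker_count(current: int, observed_ms: float, target_ms: float,
                      min_workers: int = 1, max_workers: int = 16) -> int:
    """Grow the pool when latency exceeds the target; shrink it when there is ample slack."""
    if observed_ms > target_ms:
        proposed = current + 1
    elif observed_ms < 0.5 * target_ms:  # assumed slack threshold
        proposed = current - 1
    else:
        proposed = current
    return max(min_workers, min(max_workers, proposed))


def filter_batch(records: Sequence[T], keep: Callable[[T], bool], workers: int) -> List[T]:
    """Evaluate the predicate on up to `workers` threads and keep matching records in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(keep, records))
    return [record for record, flag in zip(records, flags) if flag]


workers = 2
# Observed latency (12 ms) is above the assumed 5 ms target, so one worker is added.
workers = next_worker_count(workers, observed_ms=12.0, target_ms=5.0)
kept = filter_batch(range(10), lambda value: value % 2 == 0, workers)
```

A real filter would measure latency continuously and resize between batches. In CPython, threads mainly help when the per-record work waits on I/O, so a production design might prefer processes or a different runtime for CPU-bound matching.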
[0060] Specifically, the filter implements its filtering function with a matching algorithm from the content-based publish/subscribe model. Each filtered data record is combined with the list of all its destination applications into a single message and sent to the aggregator, and a decoding component is added to the aggregator. The decoding component deconstructs the message sent by the filter into the data record and the list of applications that require that record, and sends the record to each of those applications.
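The combine-then-decode step can be sketched as follows. The JSON layout, the field names `apps` and `record`, and the in-memory inboxes standing in for per-application Topics are assumptions made for illustration; the source does not specify the message format.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def encode_message(record: Dict[str, Any], apps: List[str]) -> bytes:
    """Filter side: bundle one record with every application that wants it."""
    return json.dumps({"apps": apps, "record": record}).encode("utf-8")


def decode_message(payload: bytes) -> Tuple[Dict[str, Any], List[str]]:
    """Aggregator side: split a message back into the record and its application list."""
    message = json.loads(payload.decode("utf-8"))
    return message["record"], message["apps"]


def dispatch(payload: bytes, inboxes: Dict[str, List[Dict[str, Any]]]) -> None:
    """Deliver the decoded record to the inbox of each application that requested it."""
    record, apps = decode_message(payload)
    for app in apps:
        inboxes[app].append(record)


# Hypothetical application names and record contents.
inboxes: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
payload = encode_message({"sensor": "temperature", "value": 35}, ["ml-app", "olap-app"])
dispatch(payload, inboxes)
```

With this layout a record wanted by several applications crosses the domain boundary once rather than once per application, which is consistent with the stated aim of reducing network load.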
[0061] Specifically, the configurator is used to view aggregation system information and to enter the configuration parameters of a new filter; the configurator generates a new filter container from the entered parameters and deploys it on the remote machine;
[0062] When the aggregation system is extended to a new source information system, the filter is deployed automatically through the configurator once the access point host has been installed, which completes the expansion of the aggregation system.
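As a rough illustration of how a configurator might turn entered parameters into a remote deployment, the Python sketch below builds, without sending, a container-creation request for the Docker Engine API (`POST /containers/create`). The host address, container name, image name and environment variable names are hypothetical; the source does not say which parameters the configurator actually passes.

```python
import json
import urllib.request
from typing import Dict


def build_create_request(docker_host: str, name: str, image: str,
                         env: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a Docker Engine API request that creates a container."""
    body = {
        "Image": image,
        # The Engine API expects environment variables as "KEY=value" strings.
        "Env": [f"{key}={value}" for key, value in sorted(env.items())],
        "HostConfig": {"RestartPolicy": {"Name": "always"}},
    }
    return urllib.request.Request(
        url=f"{docker_host}/containers/create?name={name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    docker_host="http://10.0.0.12:2375",               # hypothetical access point host
    name="filter-source-a",                            # hypothetical container name
    image="registry.example/aggregation/filter:1.0",   # hypothetical private-registry image
    env={"SOURCE_BOOTSTRAP": "kafka-src:9092", "DEST_BOOTSTRAP": "10.0.0.1:9092"},
)
```

Sending the request with `urllib.request.urlopen(request)` would create the container, and a second `POST /containers/{id}/start` call would start it. Updating a filter could then amount to creating a container from a newer image and removing the old one.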
[0063] A real-time on-demand data aggregation method oriented to cross-domain information systems provided by the present invention uses the above-mentioned real-time on-demand data aggregation system to perform the following steps:
[0064] Step M1: The aggregation-end connector and the source-end connectors establish virtual local area network tunnels to communicate with each other, forming an overlay network;
[0065] Step M2: The aggregation end collects the aggregation requirements of the various applications in the destination information system for source data, forms aggregation rules, and distributes them to all source-end filters;
[0066] Step M3: Each source-end filter performs filtering operations on the data in its source information system according to the data aggregation rules received from the aggregator, and transmits the data that meets application requirements to the aggregator in the destination information system;
[0067] Step M4: The aggregator provides an interface through which upper-layer applications obtain the data that meets their requirements.
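The control flow of steps M2 to M4 can be simulated in a single process, as in the Python sketch below. Step M1 (the overlay network) is replaced by direct method calls, rules are plain Python callables, and all class and method names are assumptions for illustration rather than the components described in the embodiment.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]


class Aggregator:
    """Aggregation-end component: collects rules and exposes matched data."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.filters: List["SourceFilter"] = []
        self.inboxes: Dict[str, List[Record]] = defaultdict(list)

    def attach(self, source_filter: "SourceFilter") -> None:
        """Register a source-end filter and hand it the current rule set."""
        self.filters.append(source_filter)
        source_filter.rules = dict(self.rules)

    def register(self, app: str, rule: Rule) -> None:
        """Step M2: record an application's requirement and distribute it to all filters."""
        self.rules[app] = rule
        for source_filter in self.filters:
            source_filter.rules = dict(self.rules)

    def receive(self, record: Record, apps: List[str]) -> None:
        """Store a record forwarded by a filter for each application that wants it."""
        for app in apps:
            self.inboxes[app].append(record)

    def fetch(self, app: str) -> List[Record]:
        """Step M4: interface through which an application obtains its data."""
        return list(self.inboxes[app])


class SourceFilter:
    """Source-end component: forwards only records that some rule accepts."""

    def __init__(self, aggregator: Aggregator) -> None:
        self.rules: Dict[str, Rule] = {}
        self.aggregator = aggregator
        aggregator.attach(self)

    def on_record(self, record: Record) -> None:
        """Step M3: match the record and forward it if any application wants it."""
        apps = [app for app, rule in self.rules.items() if rule(record)]
        if apps:
            self.aggregator.receive(record, apps)


aggregator = Aggregator()
source_a, source_b = SourceFilter(aggregator), SourceFilter(aggregator)
aggregator.register("alerts", lambda record: record["value"] > 30)
source_a.on_record({"source": "a", "value": 35})  # forwarded
source_b.on_record({"source": "b", "value": 10})  # dropped at the source
```

Because filtering happens at the source end, the record from `source_b` never leaves its own domain; only data that some application has asked for is transmitted.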

Example Embodiment

[0068] Embodiment 2
[0069] Embodiment 2 is a modification of Embodiment 1.
[0070] 1 Framework structure
[0071] As shown in Figure 1, the real-time on-demand data aggregation framework proposed by the present invention consists of two parts: the aggregation end deployed in the destination information system and the source end deployed in a source information system. There can be multiple source ends, deployed in different source information systems. Each source end consists of two modules, a filter and a connector, and the aggregation end consists of three modules: a connector, an aggregator and a configurator.
[0072] 2 Module workflow
[0073] (1) Connector
[0074] Both the aggregation end and the source end contain a connector, and virtual local area network tunnels can be established between the connectors so that they communicate with each other, thereby forming an overlay network. The overlay network can shield the complex LAN design of each underlying information system and, where the firewall permits, enable hosts located in different LANs to communicate with each other across security domains. All communication between the other components of the aggregation framework is carried by the connectors, which lets the upper-level components be designed against a simplified network model. As shown in Figure 1, the logical network between the filter and the aggregator is represented by a dotted line, and the actual network traffic, which is carried by the connectors, is represented by a solid black line.
[0075] (2) Aggregator
[0076] The aggregator in the aggregation end performs two important functions. The first is to collect the source data aggregation requirements of the various applications in the destination information system (such as machine learning applications and database OLAP applications), express them as aggregation rules, and distribute the rules to the filters of all source ends. The second is to aggregate the data from all source information systems that satisfies the applications' aggregation rules, and to provide interfaces through which upper-layer applications obtain the data. The aggregator is the main component that enables on-demand access to source data.
[0077] (3) Filter
[0078] The main function of the filter in the source end is to perform filtering operations on the data in the source information system according to the data aggregation rules received from the aggregator, and to transmit the data that meets application requirements to the aggregator in the destination information system. The present invention uses a matching algorithm from the content-based publish/subscribe model to implement the filtering function.
[0079] The publish/subscribe model is a communication paradigm for distributed systems that decouples the communicating parties in time, space and synchronization. The content-based publish/subscribe model gives users fine-grained expressive power: users define the conditions they are interested in based on the content of events (also called messages), which enables fine-grained event distribution. The matching algorithm is the core component of the content-based publish/subscribe model. The server receives each event, compares it with the users' subscriptions, and sends the event to every user whose subscription conditions are met.
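Content-based matching can be sketched in Python as below, under the assumption that a subscription is a conjunction of predicates of the form (attribute, operator, constant). This is a naive linear scan over all subscriptions for illustration only; it is not the OpIndex algorithm, which indexes subscriptions to avoid such a scan. The subscription ids and event attributes are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# A predicate is (attribute, operator symbol, constant); a subscription is a
# conjunction of predicates.
Predicate = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def satisfies(event: Dict[str, Any], predicate: Predicate) -> bool:
    """Return True if the event carries the attribute and the comparison holds."""
    attribute, op, constant = predicate
    if attribute not in event:
        return False
    return OPS[op](event[attribute], constant)


def match(event: Dict[str, Any], subscriptions: Dict[str, List[Predicate]]) -> List[str]:
    """Return the ids of all subscriptions whose predicates are all satisfied."""
    return [
        sub_id
        for sub_id, predicates in subscriptions.items()
        if all(satisfies(event, predicate) for predicate in predicates)
    ]


subscriptions = {
    "ml-app": [("sensor", "==", "temperature"), ("value", ">", 30)],
    "olap-app": [("region", "==", "east")],
}
event = {"sensor": "temperature", "value": 35, "region": "west"}
matched = match(event, subscriptions)  # only "ml-app" has every predicate satisfied
```

The cost of this scan grows with the number of subscriptions and predicates per event, which is the reason indexed matching algorithms are used when data volume and velocity are high.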
[0080] (4) Configurator
[0081] At the aggregation end, the configurator is mainly responsible for providing a visual monitoring interface to the administrator of the data aggregation system. Administrators can use the configurator to monitor the working status and performance of the filters in each source information system and the status of the aggregation system's overlay network. In addition, administrators can configure and deploy new filters or update old filters through the configurator.
[0082] 3 System implementation