Distributed mass dynamic task engine and method for processing data with the same
A large-volume dynamic task technology, applied in database retrieval, network data retrieval, network data indexing, and related fields. It addresses problems such as tasks that are difficult to adjust and change, a single server whose performance cannot meet the requirements of the system, and a system that cannot meet the requirements of task execution.
Examples
Embodiment 1
[0024] Embodiment 1 is an example in which the distributed large-batch dynamic task engine of the present invention is used for data management in a hotel system. Referring to Figure 2, the hotel system needs to capture real-time hotel prices throughout the year. This requires sending about 10,000,000 messages to the hotel system and storing the captured prices in the corresponding data warehouses (i.e., Data Warehouses A-D in Figure 2). To achieve this goal, we use three groups of servers. The first group of servers is responsible for managing the crawling tasks. Because different types of tasks have different parameters and different frequencies, they are managed by different servers (i.e., Task Managers A-D in Figure 2). The second group of servers consists of a DTE gateway server (DTE Gateway) and a DTE proxy server (DTE Agent). The third group of servers hosts the data warehouses that store the crawled data (i.e., Data Warehouses A-D in Figure 2). When the task rea...
Embodiment 2
[0026] Embodiment 2 presents the process of using the distributed large-batch dynamic task engine of the present invention to regularly synchronize data in different data warehouses.
[0027] Referring to Figure 3, when the data in different data warehouses needs to be synchronized regularly, data warehouses A1, A2, A3 and A4 are checked every five minutes for data updates; if any are found, the changed data is synchronized to data warehouse B. The data in data warehouse B comes from multiple data warehouses (namely data warehouses A1-A4), and data warehouse B contains only part of their data, not all of it.
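The periodic check described in paragraph [0027] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Warehouse` class, the per-row version numbers, the `tracked_keys` set and the `checkpoints` dictionary are all assumptions introduced here for illustration, and the warehouses are plain in-memory dictionaries. The sketch shows one polling pass that copies only changed rows, and only those rows that data warehouse B actually holds.

```python
from dataclasses import dataclass, field

# Five-minute interval taken from the description; a scheduler (not shown)
# would call sync_once() once per interval.
POLL_INTERVAL_SECONDS = 5 * 60


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for a data warehouse: key -> (value, version)."""
    name: str
    rows: dict = field(default_factory=dict)

    def put(self, key, value, version):
        self.rows[key] = (value, version)

    def changed_since(self, version):
        """Return the rows whose version is newer than the given checkpoint."""
        return {k: v for k, v in self.rows.items() if v[1] > version}


def sync_once(sources, target, tracked_keys, checkpoints):
    """Run one polling pass and return the number of rows copied to the target.

    tracked_keys models the fact that the target holds only part of the data.
    checkpoints maps a source name to the highest version already processed.
    """
    synced = 0
    for src in sources:
        last_seen = checkpoints.get(src.name, 0)
        changes = src.changed_since(last_seen)
        for key, (value, version) in changes.items():
            if key in tracked_keys:
                target.put(key, value, version)
                synced += 1
        if changes:
            # Advance the checkpoint so unchanged rows are skipped next time.
            checkpoints[src.name] = max(ver for _, ver in changes.values())
    return synced
```

Keeping one checkpoint per source means a pass with no updates copies nothing, which matches the "check, and synchronize only if there are changes" behavior described above.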
[0028] In order to realize this function, the task manager is responsible for fragmenting the data in data warehouse B, obtaining the data source from the data source location server according to the keyword of the data, grouping the synchronization script according to the number of data keywords of the data source, a set of keywords plus the address of the source da...
Embodiment 3
[0031] Embodiment 3 presents the process of using the distributed large-batch dynamic task engine of the present invention to grab data from different websites.
[0032] Referring to Figure 4, when it is necessary to grab data from different websites and, after analysis, store it in a unified data warehouse, the task manager submits the tasks to the DTE gateway server (DTE Gateway) and the DTE proxy server (DTE Agent). The DTE proxy server is responsible for fetching data from the different target websites (namely websites 1-4) and finally delivers the results to the data warehouse for updating. Because crawling tasks differ from website to website, we store the task execution scripts directly on the task manager. When a new website needs to be crawled, only a new task script and its configuration need to be added to the task manager.
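The arrangement in paragraph [0032], where per-website scripts live on the task manager and the agent only executes them, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented design: the `TaskManager` class, its `register` and `build_tasks` methods, the `run_agent` function, and the stub scripts are hypothetical names, the gateway is omitted, the scripts return fixed values instead of contacting any website, and the data warehouse is a plain dictionary.

```python
from typing import Callable, Dict, List, Tuple

# A task script takes its configuration and returns the parsed result.
TaskScript = Callable[[dict], dict]


class TaskManager:
    """Hypothetical task manager that stores one script and one config per website."""

    def __init__(self) -> None:
        self._scripts: Dict[str, TaskScript] = {}
        self._configs: Dict[str, dict] = {}

    def register(self, site: str, script: TaskScript, config: dict) -> None:
        """Adding a new website only requires a new script and configuration."""
        self._scripts[site] = script
        self._configs[site] = config

    def build_tasks(self) -> List[Tuple[str, TaskScript, dict]]:
        """Package every registered script with its config for submission."""
        return [(s, self._scripts[s], self._configs[s]) for s in sorted(self._scripts)]


def run_agent(tasks: List[Tuple[str, TaskScript, dict]], warehouse: dict) -> None:
    """Stand-in for the agent: run each script and store its result."""
    for site, script, config in tasks:
        warehouse[site] = script(config)


# Stub scripts (assumptions): real ones would fetch and parse a page.
def crawl_site_1(config: dict) -> dict:
    return {"source": config["url"], "items": 3}


def crawl_site_2(config: dict) -> dict:
    return {"source": config["url"], "items": 5}
```

A usage example: register the two stub scripts with `manager.register("site1", crawl_site_1, {"url": "https://example.com/1"})` and a similar call for `site2`, then call `run_agent(manager.build_tasks(), warehouse)`. The agent code does not change when a third website is registered, which is the point made in the paragraph above.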
[0033] Through the description of the first to third embodiments above, the distributed large-batch dynamic task engine proposed by the present inventio...