A method for managing multi-source heterogeneous data based on a meta model
By adopting a multi-source heterogeneous data management method based on meta-models, dynamic monitoring of file directory changes and automatic generation of search conditions are achieved, solving the problems of high coupling and low flexibility in existing technologies, and realizing efficient data management and flexible data query.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU EBOYLAMP ELECTRONICS CO LTD
- Filing Date
- 2023-10-30
- Publication Date
- 2026-06-26
AI Technical Summary
Existing methods for managing multi-source heterogeneous data suffer from problems such as high coupling, low flexibility, high development costs, and reduced scanning efficiency when faced with new data types or structural changes, resulting in low efficiency in data parsing and storage.
By adopting a meta-model-based approach, the data frame format is defined on the client side and mapped to the server side. Combined with file monitoring, parsing, and database storage components, changes in file directories are dynamically monitored, and search conditions are automatically generated based on meta-model information, enabling efficient data management and flexible querying.
It improves the sensitivity and computational efficiency of file monitoring, reduces the need to traverse the file system, enhances the timeliness of data processing, and provides flexible data query methods.
Abstract
Description
Technical Field
[0001] This invention belongs to the field of multi-source heterogeneous data management, and specifically relates to a multi-source heterogeneous data management method based on a meta-model.
Background Technology
[0002] Multi-source heterogeneous data refers to data originating from a wide range of sources, including different systems, devices, and applications, with varying structures and formats. Current mainstream multi-source heterogeneous data management methods typically employ independent business processing systems based on the type of data. During data generation or after data is formatted into files, the data parsing and import process is initiated via function calls or message notifications. Data parsing and import generally involves defining the database table structure, data parsing, data import, and defining database query statements for data retrieval, depending on the data format.
[0003] Current mainstream multi-source heterogeneous data management technologies rely on calls from business systems or on scanning the file system to find the files that need to be parsed. The call-based method requires integration with external business systems, and a new call flow is needed for each new data type. The file system scanning method suffers from decreased scanning efficiency as the number of files increases. The process involves defining database table structures, data parsing, data insertion, defining database query statements, and data retrieval. All of these steps depend on the data structure and require writing the corresponding database table structure definition statements, data parsing functions, database insertion statements, and database query statements. If a new data type is added or an existing data structure changes, corresponding modifications or new development are required, so every data type has to be handled case by case, leading to high coupling, low flexibility, and high development costs. As the number of multi-source heterogeneous data types increases and data structures change, the efficiency of data parsing and insertion often decreases, negatively impacting subsequent data applications.
Summary of the Invention
[0004] The purpose of this invention is to address the problems raised in the background art by proposing a multi-source heterogeneous data management method based on a meta-model.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0006] This invention proposes a multi-source heterogeneous data management method based on a meta-model, comprising:
[0007] Users define the data frame format of the metamodel in the client interface, map the data frame format of the metamodel to the metamodel definition component located on the server, associate the data in the metamodel with the file storage directory under the metamodel definition component, and store the association relationship in the database;
[0008] The server-side file monitoring component recursively adds folders under the file storage directory to the monitoring directory set. When a new file storage directory or a new folder is created, the new file storage directory or new folder is added to the monitoring directory set, and a notification is sent when a new file is created.
[0009] The server-side file parsing component responds to the notification by parsing the new file based on the fields in the data frame format;
[0010] The server-side data import component saves the parsed results to the database;
[0011] Users select fields and search criteria on the client side to search for files. The client sends the search request to the data retrieval component on the server side. The data retrieval component parses the search criteria, calls the database interface to perform the query, and then sends the returned results back to the client. After receiving the data, the client displays it on the interface.
[0012] Preferably, the data frame format of the metamodel includes model ID, model name, frame length, identifier length and identifier value, as well as the name, type and length of each keyword field.
[0013] Preferably, the metamodel definition component supports adding, deleting, modifying, and querying data frame formats. When the user performs one of these operations through the client interface, the client sends the request to the metamodel definition component on the server. Upon receiving the request, the metamodel definition component adds, modifies, deletes, or retrieves the records of the corresponding fields in the database.
[0014] Preferably, if a new file or folder is generated in a new file storage directory while that directory is being added to the monitoring directory set, the new file storage directory is first added to the monitoring directory set and then traversed. When a new folder is encountered during traversal, the new folder is added to the monitoring directory set, and a notification is issued when a new file is encountered during traversal.
[0015] Preferably, the server-side file parsing component responds to the notification by parsing the new file based on fields in the data frame format, including:
[0016] After receiving a notification from the file listening component, the file parsing component reads the data from the new file into the cache in sequence. Based on the frame length, identifier length, and identifier value of the data in the cache, it finds the corresponding data frame format and then parses the data according to the definition of each field in the data frame format.
[0017] Preferably, the server-side data import component saves the parsed results to the database, including:
[0018] Based on the fields in the data frame format, define the corresponding data table in the database, and add two table entries: offset position and frame length. The parsing result is divided into fields according to the table entries, and corresponds one-to-one with the table entries.
[0019] Preferably, the user selects fields and search criteria on the client to search for files. The client sends the search request to the server's data retrieval component. The data retrieval component parses the search criteria, calls the database interface to perform the query, and then sends the returned results back to the client. After receiving the data, the client displays it on the interface, including:
[0020] The user first selects the file to be searched on the client interface. Based on the selected file, the client finds the corresponding data frame format, displays the searchable fields to the user, and the user selects search criteria. The client then sends the user's search request to the server's data retrieval component. After receiving the search request, the data retrieval component parses the client's search criteria and automatically constructs a field search SQL statement. After the search SQL statement is automatically constructed, it calls the database interface to perform a data query and sends the query results back to the client. The client then displays the query results on the interface.
[0021] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0022] 1. This meta-model-based multi-source heterogeneous data management method maps the data frame format of the meta-model defined on the client to the meta-model definition component located on the server, associates the data in the meta-model with the file storage directory, and stores the association relationship in the database. This enables the file monitoring component on the server to monitor the folders and files in the file storage directory as well as the creation of new file storage directories. Newly generated files are parsed by the file parsing component, and the parsing results are stored in the database by the data import component, thereby realizing data management.
[0023] 2. The file monitoring component of this meta-model-based multi-source heterogeneous data management method is event-driven: it passively receives operating system event notifications triggered by file events. Compared with the traditional polling technique for detecting file changes, this greatly improves sensitivity and computational efficiency, does not require traversing the file system, and makes data processing more timely.
[0024] 3. The retrieval conditions of this meta-model-based multi-source heterogeneous data management method can be automatically generated based on meta-model information, while the retrieval scope is specified by the user. Combining the two provides a general and more flexible data query method.
Description of the Drawings
[0025] Figure 1 is a block diagram of the meta-model-based multi-source heterogeneous data management method of the present invention;
[0026] Figure 2 is a schematic diagram illustrating the changes in the monitored directory set according to the present invention;
[0027] Figure 3 is a schematic diagram of the query page of the client of the present invention.
Detailed Implementation
[0028] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0029] It should be noted that when a component is referred to as being "connected" to another component, it can be directly connected to the other component or there may be an intervening component. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the specification of this application is for the purpose of describing particular embodiments only and is not intended to limit the application.
[0030] As shown in Figures 1-3, a multi-source heterogeneous data management method based on a meta-model includes:
[0031] S1. The user defines the data frame format of the metamodel in the client interface, maps the data frame format of the metamodel to the metamodel definition component located on the server, associates the data in the metamodel with the file storage directory under the metamodel definition component, and stores the association relationship in the database.
[0032] Specifically, the user defines the data frame format of the metamodel on the client side. The client encapsulates the data frame format into network data and sends it to the metamodel definition component on the server side via the TCP protocol. The metamodel definition component defines the model's data frame format based on the received network data. The model's data frame format includes the model ID, model name, frame length, identifier length, and identifier value, as well as the name, type, and length of each keyword field (i.e., each field records the name, type, and length of a keyword; the specific number and type of keywords are not limited; for example, the keywords displayed in Figure 3 are type, sequence (seq), length (len), and information (info)). For frame length, 0 indicates a variable-length frame; any other value indicates a fixed-length frame. Types include integer, floating-point, and string. Integer fields support 1, 2, 4, and 8 bytes, in both signed and unsigned variants. Floating-point fields support 4 bytes and 8 bytes. At the same time, data tables are defined in the database based on the fields in the data frame format.
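As an illustration only, the data frame format described above might be represented as follows. This is a minimal Python sketch, not the patented implementation: the class names `FieldType`, `KeywordField`, and `DataFrameFormat`, and the `validate` checks, are assumptions introduced here; only the listed attributes and the permitted byte lengths come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FieldType(Enum):
    SIGNED_INT = "int"
    UNSIGNED_INT = "uint"
    FLOAT = "float"
    STRING = "string"


# Byte lengths the description allows for each numeric type.
ALLOWED_LENGTHS = {
    FieldType.SIGNED_INT: {1, 2, 4, 8},
    FieldType.UNSIGNED_INT: {1, 2, 4, 8},
    FieldType.FLOAT: {4, 8},
}


@dataclass
class KeywordField:
    name: str
    type: FieldType
    length: int  # length in bytes

    def validate(self) -> None:
        allowed = ALLOWED_LENGTHS.get(self.type)  # None for strings
        if allowed is not None and self.length not in allowed:
            raise ValueError(
                f"{self.name}: {self.type.value} cannot be {self.length} bytes"
            )


@dataclass
class DataFrameFormat:
    model_id: int
    model_name: str
    frame_length: int  # 0 means variable-length frames
    identifier_length: int
    identifier_value: bytes
    fields: List[KeywordField] = field(default_factory=list)

    @property
    def is_variable_length(self) -> bool:
        return self.frame_length == 0

    def validate(self) -> None:
        # Hypothetical consistency check, not stated in the description.
        if len(self.identifier_value) != self.identifier_length:
            raise ValueError("identifier value does not match identifier length")
        for f in self.fields:
            f.validate()
```

A format with the keywords shown in Figure 3 would then be a `DataFrameFormat` whose `fields` list holds four `KeywordField` entries named type, seq, len, and info; their concrete types and lengths are not given in the description.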
[0033] S2. The server-side file monitoring component recursively adds folders under the file storage directory to the monitoring directory set. When a new file storage directory or a new folder is created, it adds the new file storage directory or new folder to the monitoring directory set and sends a notification when a new file is created.
[0034] Specifically, the monitoring directory set is the collection of directories monitored by the file monitoring component. After the data in the metamodel is bound to the file storage directory under the metamodel definition component, the file monitoring component recursively adds the folders under the file storage directory to the monitoring directory set. When a new file storage directory or folder is created, it is added to the monitoring directory set. When a new file is created (new files monitored by the file monitoring component include not only files created by users on the client side, but also files created by the server side during operation), the server-side file parsing component is notified to parse the new file (i.e., the file parsing component is called to parse the new file). When a file storage directory or folder is deleted, it is removed from the monitoring directory set, and the corresponding parsing results in the database are also deleted. The file monitoring interface provided by existing operating systems monitors a specified first-level directory. This interface only supports monitoring events such as file creation, deletion, and modification in subdirectories under the first-level directory and cannot monitor deeper directories. This method instead dynamically adds new file storage directories and folders to the monitoring directory set (and removes deleted ones from it) to achieve monitoring of deeper directories.
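The bookkeeping of the monitoring directory set can be sketched as follows. This Python sketch only maintains the set in memory; it does not register any operating system watch, and the class name `WatchSet` and its methods are assumptions made for illustration. In a real component, each added directory would additionally be registered with the operating system's file event interface.

```python
import os
from typing import Set


class WatchSet:
    """In-memory stand-in for the monitoring directory set (hypothetical)."""

    def __init__(self) -> None:
        self.dirs: Set[str] = set()

    def add_recursive(self, root: str) -> None:
        # os.walk yields the root itself and then every nested folder.
        for dirpath, _dirnames, _filenames in os.walk(root):
            self.dirs.add(os.path.abspath(dirpath))

    def remove_recursive(self, root: str) -> None:
        # Drop a deleted directory together with everything beneath it.
        root = os.path.abspath(root)
        prefix = root + os.sep
        self.dirs = {
            d for d in self.dirs if d != root and not d.startswith(prefix)
        }
```

Calling `add_recursive` on the file storage directory corresponds to the initial recursive registration; `remove_recursive` corresponds to the deletion case, where the description also deletes the matching parsing results from the database (not shown here).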
[0035] The file event listening component passively receives operating system event notifications triggered by file events, which greatly improves sensitivity and computational efficiency compared to traditional polling techniques for determining file changes.
[0036] Figure 2 illustrates the changes in the monitored directory set. In Figure 2, d_root represents the file storage directory; dir_1, dir_2, dir_3, dir_1_1, dir_1_2, and dir_1_1_1 represent folders; and file_1, file_2, file_3, and file_4 represent files.
[0037] S3. The server-side file parsing component responds to the notification and parses the new file based on the fields in the data frame format.
[0038] Specifically, after receiving a notification from the file monitoring component, the file parsing component reads the data from the new file into the cache sequentially. Based on the frame length, identifier length, and identifier value of the data in the cache, it finds the corresponding data frame format. Then, it parses the data according to the definitions of each field in that data frame format (the field definitions are obtained from the metamodel definition component, meaning the file parsing component needs to call the metamodel definition component to obtain the fields in the data frame format). First, it determines the field type, and based on the field type, it handles the data in the following ways:
[0039] a) Signed integers
[0040] Determine the length of the field and, based on that length, convert the corresponding bytes of the frame data into an integer value.
[0041] i. Field length is 1
[0042] Extract 1 byte of data from the frame and convert it into a signed int8_t integer.
[0043] ii. Field length is 2
[0044] Extract 2 bytes of data from the frame and convert them into signed int16_t integer data.
[0045] iii. Field length is 4
[0046] Extract 4 bytes of data from the frame and convert them into signed int32_t integer data.
[0047] iv. Field length is 8
[0048] Extract 8 bytes of data from the frame and convert them into signed int64_t integer data.
[0049] b) Unsigned integer
[0050] Similar to the signed integer processing method, such as:
[0051] i. Field length is 1
[0052] Extract 1 byte of data from the frame and convert it into an unsigned uint8_t integer.
[0053] ii. Field length is 2
[0054] Extract 2 bytes of data from the frame and convert them into unsigned uint16_t integer data.
[0055] iii. Field length is 4
[0056] Extract 4 bytes of data from the frame and convert them into unsigned uint32_t integer data.
[0057] iv. Field length is 8
[0058] Extract 8 bytes of data from the frame and convert them into unsigned uint64_t integer data.
[0059] c) Floating-point type
[0060] Determine the length of the field and, based on that length, convert the corresponding bytes of the frame data into a floating-point value.
[0061] i. Field length is 4
[0062] Extract 4 bytes of data from the frame and convert them into 4 bytes of float data.
[0063] ii. Field length is 8
[0064] Extract 8 bytes of data from the frame and convert them into 8 bytes of double floating-point data.
[0065] d) String
[0066] First, determine the length of the field from its field length value. If the field is fixed-length data, extract data of the specified length from the corresponding offset position of the frame. If the field is variable-length data, obtain the length of the string from the index length field, and then extract data of that length from the corresponding offset position of the frame.
[0067] The values of each field in the extracted frame data format are entered into the database by calling the data entry component interface (i.e., the file parsing component calls the data entry component to enter the data into the database).
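The type-and-length dispatch in items a) to d) can be sketched with Python's standard `struct` module. This is an illustration under stated assumptions, not the patented parser: the description does not specify byte order (little-endian is assumed here), string encoding (UTF-8 is assumed), or how a variable-length string locates its index length field (the hypothetical `length_from` attribute names an earlier field that holds the length). `FieldSpec` and `parse_frame` are names introduced for this sketch.

```python
import struct
from typing import Any, Dict, List, NamedTuple, Optional

# (type, length in bytes) -> struct format character.
STRUCT_CODES = {
    ("int", 1): "b", ("int", 2): "h", ("int", 4): "i", ("int", 8): "q",
    ("uint", 1): "B", ("uint", 2): "H", ("uint", 4): "I", ("uint", 8): "Q",
    ("float", 4): "f", ("float", 8): "d",
}

BYTE_ORDER = "<"  # assumption: little-endian, standard sizes, no padding


class FieldSpec(NamedTuple):
    name: str
    type: str  # "int", "uint", "float" or "string"
    length: int  # bytes; ignored when length_from is set
    length_from: Optional[str] = None  # earlier field holding a string's length


def parse_frame(frame: bytes, fields: List[FieldSpec]) -> Dict[str, Any]:
    values: Dict[str, Any] = {}
    offset = 0
    for spec in fields:
        if spec.type == "string":
            # Variable-length strings take their size from an earlier field.
            size = values[spec.length_from] if spec.length_from else spec.length
            raw = frame[offset:offset + size]
            if len(raw) != size:
                raise ValueError(f"frame too short for field {spec.name}")
            values[spec.name] = raw.decode("utf-8")  # assumption: UTF-8 text
        else:
            code = STRUCT_CODES[(spec.type, spec.length)]
            (values[spec.name],) = struct.unpack_from(
                BYTE_ORDER + code, frame, offset
            )
            size = spec.length
        offset += size
    return values
```

For a frame carrying the Figure 3 keywords, the `info` string could be declared as `FieldSpec("info", "string", 0, "len")`, so that its size is read from the previously parsed `len` field; the concrete types of those keywords are again an assumption.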
[0068] S4. The server-side data import component saves the parsed results to the database.
[0069] Specifically, based on the fields in the data frame format, a corresponding data table is defined in the database, and two table entries, offset position and frame length, are added. The parsing results are divided into fields according to the table entries, corresponding one-to-one with the table entries.
[0070] The specific steps are as follows:
[0071] When the data import component is called by the file parsing component, it matches the corresponding data frame format based on the file storage directory where the file is located and the binding relationship between the data in the metamodel and the file storage directory.
[0072] Define corresponding data tables in the database based on the fields in the corresponding data frame format (data tables are automatically created when defining the data frame format of the metamodel). Use each field in the data frame format as table entries, and add two additional table entries: offset position and frame length. The offset position is used to record the offset position of the data in the file, and the frame length is used to record the length of each frame of data.
[0073] When the parsed results are saved to the database, they are divided according to the fields in the data frame format. The information to be inserted into the database corresponds one-to-one with the table entries. After receiving the file parsing results, the data insertion component will construct an INSERT statement (representing a database insertion operation) according to the relationship between the data table and the parsed structure. The construction method is as follows:
[0074] The component retrieves the number and names of the fields in the data frame format and begins constructing an SQL (Structured Query Language) statement, concatenating all field names with the corresponding number of placeholders in the form "VALUE(?)". In database usage, placeholders are special symbols in an SQL statement that stand for values to be filled in dynamically; they are typically used in prepared statements to prevent SQL injection attacks and improve performance. The order of the data fields is exactly the same as the order of the table entries.
[0075] After the SQL statement is prepared, the data in the file parsing result is bound to the statement handle, which is equivalent to filling the corresponding data into the placeholders;
[0076] Call the database interface to execute complete SQL statements (i.e., the data entry component calls the database to store data).
[0077] To improve the efficiency of data import, multiple parsing results are inserted into the database in a single batch instead of being inserted one at a time. After all parsing results of a file have been inserted into the data table and saved, the status of the successfully parsed file is set to "parsed complete" in the database, so that the specified files can be retrieved later.
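The table definition, the placeholder-based INSERT statement, and the batch insertion described above can be sketched as follows. The sketch uses Python's built-in `sqlite3` module purely for illustration; the description does not name a database, and the function names, the column names `offset_position` and `frame_length`, and the type mapping are assumptions of this sketch.

```python
import sqlite3
from typing import Iterable, Sequence, Tuple

# Assumed mapping from field types to SQLite column types. SQLite INTEGER is
# a signed 64-bit value, so very large uint64 values would need another
# representation.
SQL_TYPES = {"int": "INTEGER", "uint": "INTEGER", "float": "REAL", "string": "TEXT"}


def build_create_table(table: str, fields: Sequence[Tuple[str, str]]) -> str:
    """fields is a sequence of (name, type) pairs from the data frame format."""
    cols = [f'"{name}" {SQL_TYPES[ftype]}' for name, ftype in fields]
    # Two extra table entries: where the frame sits in the file and its length.
    cols += ['"offset_position" INTEGER', '"frame_length" INTEGER']
    return f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(cols)})'


def build_insert(table: str, field_names: Sequence[str]) -> str:
    columns = list(field_names) + ["offset_position", "frame_length"]
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)  # one "?" per column
    return f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'


def import_rows(
    conn: sqlite3.Connection,
    table: str,
    field_names: Sequence[str],
    rows: Iterable[Sequence],
) -> None:
    sql = build_insert(table, field_names)
    with conn:  # one transaction for the whole batch
        conn.executemany(sql, rows)  # binds each row to the placeholders
```

Each row passed to `import_rows` holds the field values in format order followed by the offset position and frame length, matching the one-to-one correspondence between parsed values and table entries.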
[0078] S5. The user selects fields and search criteria on the client to search for files. The client sends the search request to the data retrieval component on the server. The data retrieval component parses the search criteria, calls the database interface to perform the query, and then sends the returned results back to the client. After receiving the data, the client displays it on the interface.
[0079] Specifically, the user first selects the file to be searched on the client interface. The client finds the corresponding data frame format based on the selected file and displays the searchable fields to the user. The client sends the user's search request to the server's data retrieval component. After receiving the search request, the data retrieval component parses the client's search conditions and then automatically constructs a field search SQL statement. After the search SQL statement is automatically constructed, it calls the database interface to perform data query and sends the query results to the client. The client then displays the query results on the interface.
[0080] The specific steps are as follows:
[0081] Users select the file to search on the client interface;
[0082] Based on the selected file, the client finds the associated data frame format and displays the searchable fields to the user;
[0083] Users can select the field to search in the client interface and choose search conditions for each field according to the field type. For integer and floating-point fields, the search conditions support the selection of equal to, not equal to, greater than, greater than or equal to, less than, and less than or equal to. For string fields, the search conditions support the selection of equal to, not equal to, contain, start with the search string, and end with the search string.
[0084] The client sends the user's search request to the server's data retrieval component via the TCP protocol;
[0085] After receiving a search request, the data retrieval component parses the client's search criteria and then automatically constructs a field-based search SQL statement. Multiple fields are concatenated using "AND" in the SQL statement to form a combined search statement. The construction of a single-field search SQL statement mainly falls into the following categories depending on the search type:
[0086] a) equals
[0087] Use the "=" symbol to concatenate SQL statements;
[0088] b) Not equal to
[0089] Use the "<>" symbol to concatenate SQL statements;
[0090] c) greater than
[0091] Use the ">" symbol to concatenate SQL statements;
[0092] d) less than
[0093] Use the "<" symbol to concatenate SQL statements;
[0094] e) Greater than or equal to
[0095] Use the ">=" symbol to concatenate SQL statements;
[0096] f) less than or equal to
[0097] Use the "<=" symbol to concatenate SQL statements;
[0098] g) Contains
[0099] This search condition is only applicable to string type fields. Use "LIKE '%search keyword%'" to construct the SQL.
[0100] h) Does not contain
[0101] This search condition is only applicable to string type fields. Use "NOT LIKE '%search keyword%'" to construct the SQL.
[0102] i) Starts with the search string
[0103] This search condition is only applicable to string type fields. Use "LIKE 'search keyword%'" to construct the SQL.
[0104] j) Ending with the search string
[0105] This search condition is only applicable to string type fields. Use "LIKE '%search keyword'" to construct the SQL.
[0106] After the SQL query statement is automatically constructed, the database interface is called to query the data.
[0107] Send the query results to the client;
[0108] The client displays the query results on the interface.
[0109] Figure 3 is a diagram of the client's query page.
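The construction of the combined search statement from per-field conditions can be sketched as follows. Note one deliberate difference from the description: the description concatenates the search keyword into the SQL text, whereas this sketch binds every value through a "?" placeholder, which is a design choice of the sketch. The operator keys such as `"ge"` and `"starts_with"`, the function name `build_search`, and the `allowed_fields` check are all assumptions introduced for illustration.

```python
from typing import Any, Iterable, List, Set, Tuple

COMPARISONS = {"eq": "=", "ne": "<>", "gt": ">", "lt": "<", "ge": ">=", "le": "<="}

# String-only conditions: SQL keyword and the pattern wrapped around the value.
LIKE_PATTERNS = {
    "contains": ("LIKE", "%{}%"),
    "not_contains": ("NOT LIKE", "%{}%"),
    "starts_with": ("LIKE", "{}%"),
    "ends_with": ("LIKE", "%{}"),
}


def build_search(
    table: str,
    conditions: Iterable[Tuple[str, str, Any]],
    allowed_fields: Set[str],
) -> Tuple[str, List[Any]]:
    """conditions is an iterable of (field name, operator key, value)."""
    clauses: List[str] = []
    params: List[Any] = []
    for field_name, op, value in conditions:
        # Only field names known from the data frame format may reach the SQL.
        if field_name not in allowed_fields:
            raise ValueError(f"unknown field: {field_name}")
        if op in COMPARISONS:
            clauses.append(f'"{field_name}" {COMPARISONS[op]} ?')
            params.append(value)
        elif op in LIKE_PATTERNS:
            keyword, pattern = LIKE_PATTERNS[op]
            clauses.append(f'"{field_name}" {keyword} ?')
            # "%" or "_" inside the value still act as wildcards here.
            params.append(pattern.format(value))
        else:
            raise ValueError(f"unknown operator: {op}")
    sql = f'SELECT * FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)  # multiple fields joined by AND
    return sql, params
```

The returned statement and parameter list can be passed together to a database interface that supports placeholders, for example `cursor.execute(sql, params)` in a Python DB-API driver that uses the "?" parameter style.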
[0110] The order of steps S1, S2, S3, S4 and S5 is not restricted; they can be executed simultaneously or their order can be changed according to the actual situation.
[0111] In one embodiment, the metamodel definition component supports adding, deleting, modifying, and querying data frame formats. When the user performs one of these operations through the client interface, the client sends the request to the metamodel definition component on the server. Upon receiving the request, the metamodel definition component adds, modifies, deletes, or retrieves the records of the corresponding fields in the database.
[0112] Specifically, the metamodel definition component performs the add operation on a data frame format as follows:
[0113] The user defines the data frame format to be added in the client interface. The client encapsulates the data frame format into network data and sends it to the metamodel definition component on the server via the TCP protocol. After receiving the data frame format, the metamodel definition component first checks whether its model name already exists. If it does not exist, the component adds the fields of the data frame format to the data table in the database and returns the model ID of the data frame format.
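The name check and ID return of the add operation can be sketched as follows, again using Python's `sqlite3` module only as a stand-in database. The table name `metamodel`, its columns, and the functions `init_schema` and `add_model` are hypothetical. The description does not say what happens when the model name already exists, so returning `None` in that case is an assumption of this sketch; storing the per-keyword field records is omitted for brevity.

```python
import sqlite3
from typing import Optional


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table holding one record per data frame format.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metamodel ("
        "model_id INTEGER PRIMARY KEY, "
        "model_name TEXT UNIQUE NOT NULL, "
        "frame_length INTEGER NOT NULL, "
        "identifier_length INTEGER NOT NULL, "
        "identifier_value BLOB NOT NULL)"
    )


def add_model(
    conn: sqlite3.Connection,
    model_name: str,
    frame_length: int,
    identifier_value: bytes,
) -> Optional[int]:
    """Return the new model ID, or None if the model name is already taken."""
    existing = conn.execute(
        "SELECT 1 FROM metamodel WHERE model_name = ?", (model_name,)
    ).fetchone()
    if existing is not None:
        return None  # assumption: duplicates are rejected without an error
    with conn:  # commit the insertion as one transaction
        cur = conn.execute(
            "INSERT INTO metamodel "
            "(model_name, frame_length, identifier_length, identifier_value) "
            "VALUES (?, ?, ?, ?)",
            (model_name, frame_length, len(identifier_value), identifier_value),
        )
    return cur.lastrowid  # database-assigned model ID
```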
[0114] To modify, delete, or query a data frame format, the user performs the corresponding operation through the client interface, and the client sends the request to the metamodel definition component on the server. After receiving the request, the metamodel definition component modifies, deletes, or retrieves the records of the corresponding fields in the database table.
[0115] In one embodiment, if a new file or folder is generated in a new file storage directory while that directory is being added to the monitoring directory set, the new file storage directory is first added to the monitoring directory set and then traversed. When a new folder is encountered during traversal, the new folder is added to the monitoring directory set, and a notification is issued when a new file is encountered during traversal.
[0116] It should be noted that the process of adding a new file storage directory to the monitoring directory set requires a certain amount of computation time. If a new file is generated in the file storage directory during this time, the new file will not be captured. To solve this problem, after adding the new file storage directory to the monitoring directory set, the new file storage directory is traversed, and any new folders found during the traversal are added to the monitoring directory set. When a new file is found during the traversal, the file parsing component is notified to parse the file.
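The register-then-traverse order described above can be sketched as follows. As before, this Python sketch keeps the monitoring directory set as a plain set and does not register any operating system watch; `watch_new_directory` and the `notify` callback are names assumed for illustration.

```python
import os
from typing import Callable, Set


def watch_new_directory(
    new_dir: str, watch_set: Set[str], notify: Callable[[str], None]
) -> None:
    """Add new_dir to the watch set, then scan it for anything created meanwhile."""
    pending = [os.path.abspath(new_dir)]
    while pending:
        current = pending.pop()
        # Register first, so that later events in this directory are reported ...
        watch_set.add(current)
        # ... then scan, to catch entries created before registration finished.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)  # nested folder: watch it as well
                elif entry.is_file(follow_symlinks=False):
                    notify(entry.path)  # hand the file to the parsing component
```

In a real component, a file created between registration and the scan may be reported both by the operating system event and by the scan; the description does not discuss deduplication, so a caller of this sketch would need to handle repeated notifications itself.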
[0117] This metamodel-based multi-source heterogeneous data management method maps the data frame format of the client-defined metamodel to the metamodel definition component on the server. It associates the data in the metamodel with file storage directories and stores these associations in a database. This allows the server-side file monitoring component to monitor folders or files in the file storage directory, as well as the creation of new file storage directories. Newly created files are parsed by a file parsing component, and the parsing results are stored in the database by a data storage component, thus achieving data management. The file event monitoring component passively receives operating system event notifications triggered by file events. Compared to traditional polling techniques for determining file changes, this significantly improves sensitivity and computational efficiency, eliminating the need for file system traversal and resulting in more timely data processing. Search criteria are automatically generated based on metamodel information, while the search scope is specified by the user. The combination of these two approaches provides a general and more flexible data query method.
[0118] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0119] The embodiments described above are merely specific and detailed examples of the embodiments described in this application, and should not be construed as limiting the scope of the patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this patent application should be determined by the appended claims.
Claims
1. A method for managing multi-source heterogeneous data based on a meta-model, characterized in that: The meta-model-based multi-source heterogeneous data management method includes: Users define the data frame format of the metamodel in the client interface, map the data frame format of the metamodel to the metamodel definition component located on the server, associate the data in the metamodel with the file storage directory under the metamodel definition component, and store the association relationship in the database; The server-side file monitoring component recursively adds folders under the file storage directory to the monitoring directory set. When a new file storage directory or a new folder is created, the new file storage directory or new folder is added to the monitoring directory set, and a notification is sent when a new file is created. The server-side file parsing component responds to the notification and parses the new file based on the fields in the data frame format; The server-side data import component saves the parsed results to the database; Users select fields and search criteria on the client to search for files. The client sends the search request to the data retrieval component on the server. The data retrieval component parses the search criteria, calls the database interface to perform the query, and then sends the returned results back to the client. The client receives the data and displays it on the interface. The data frame format of the meta-model includes model ID, model name, frame length, identifier length and identifier value, as well as the name, type and length of each keyword field; The metamodel definition component supports adding, deleting, modifying, and querying data frame formats. By performing these operations through the client interface, requests are sent to the metamodel definition component on the server. Upon receiving the request, the metamodel definition component adds, modifies, deletes, and retrieves records with corresponding fields from the database.
2. The multi-source heterogeneous data management method based on meta-model as described in claim 1, characterized in that: When a new file storage directory is added to the monitoring directory set and a new file or folder is generated in that file storage directory, the new file storage directory is added to the monitoring directory set and then traversed. When a new folder is encountered during traversal, the new folder is added to the monitoring directory set, and when a new file is encountered during traversal, a notification is issued.
3. The multi-source heterogeneous data management method based on meta-model as described in claim 1, characterized in that: The server-side file parsing component responds to the notification by parsing the new file based on fields in the data frame format, including: After receiving a notification from the file listening component, the file parsing component reads the data from the new file into the cache in sequence. Based on the frame length, identifier length, and identifier value of the data in the cache, it finds the corresponding data frame format and then parses the data according to the definition of each field in the data frame format.
4. The multi-source heterogeneous data management method based on meta-model as described in claim 1, characterized in that: The server-side data import component saves the parsed results to the database, including: Based on the fields in the data frame format, define the corresponding data table in the database, and add two table entries: offset position and frame length. The parsing result is divided into fields according to the table entries, and corresponds one-to-one with the table entries.
5. The multi-source heterogeneous data management method based on meta-model as described in claim 1, characterized in that: The user selects fields and search criteria on the client to search for files. The client sends the search request to the server's data retrieval component. The data retrieval component parses the search criteria, calls the database interface to perform the query, and then sends the returned results back to the client. After receiving the data, the client displays it on the interface, including: The user first selects the file to be searched on the client interface. Based on the selected file, the client finds the corresponding data frame format, displays the searchable fields to the user, and the user selects search criteria. The client then sends the user's search request to the server's data retrieval component. After receiving the search request, the data retrieval component parses the client's search criteria and automatically constructs a field search SQL statement. After the search SQL statement is automatically constructed, it calls the database interface to perform a data query and sends the query results back to the client. The client then displays the query results on the interface.