A web front-end-based intelligent data model acquisition system
By using a web-based intelligent data model acquisition system, which employs a recursive parsing algorithm and an optimal acquisition strategy generation module, the problems of low efficiency and poor flexibility in existing data acquisition are solved. This enables efficient and accurate data acquisition and verification, and adapts to dynamic changes in data models.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: 紫金诚征信有限公司
- Filing Date: 2026-04-24
- Publication Date: 2026-06-19
AI Technical Summary
Existing data collection methods are inefficient, manual collection has a high error rate, and traditional automated collection tools are inflexible, unable to cope with dynamic changes in data models, and have insufficient verification capabilities, resulting in reduced availability of collected data.
An intelligent data model acquisition system based on a web front-end is adopted, including a data model parsing module, an acquisition strategy generation module, a web front-end acquisition interface generation module, and a data acquisition execution module. It identifies the hierarchical relationship of the data model through a recursive parsing algorithm, generates the optimal acquisition strategy, and performs data acquisition and verification on an interactive interface.
It achieves efficient and accurate data collection, can quickly adapt to changes in data models, generate interactive web front-end interfaces, improve the flexibility and verification capabilities of data collection, and ensure the quality and timeliness of data.
Figure CN122240711A_ABST
Description
Technical Field
[0001] This application relates to the field of web data acquisition technology, and more specifically, to an intelligent data model acquisition system based on a web front-end.
Background Technology
[0002] In today's global digital revolution, data has become a core driving force for development across all industries. Data collection, as the starting point of the data lifecycle, directly determines the effectiveness of subsequent data mining, analysis, and decision-making processes through its quality and efficiency.
[0003] Currently, data collection methods on the market have many limitations. Manual data collection is not only costly in terms of manpower and resources, but also suffers from an error rate as high as 5%-10%. Furthermore, its inefficiency is particularly pronounced for massive data collection tasks, often requiring several days or even weeks to complete a single large-scale data collection. Traditional automated data collection tools, such as those based on fixed scripts, reduce human intervention to some extent, but lack flexibility. When changes occur in the data model, such as the addition or removal of fields, type changes, or adjustments to relationships, technical personnel need to rewrite or modify the scripts, resulting in long response times, typically 1-3 days to complete the adaptation, severely impacting the timeliness of data collection.
[0004] Furthermore, existing systems have weak data verification capabilities. They often stop at simple format verification and cannot perform in-depth verification based on business rules, which reduces the usability of the collected data. For example, in e-commerce scenarios, product specifications are adjusted frequently and pricing systems change with promotional activities; in the financial sector, customer information dimensions expand and data fields are added because of regulatory requirements. Traditional data collection systems struggle to cope with all of these changes.
Summary of the Invention
[0005] The main purpose of this application is to provide an intelligent data model acquisition system based on a web front-end to solve the problem of weak dynamic data model acquisition capabilities and poor adaptability in various industries.
[0006] To achieve the above objectives, the first aspect of this application proposes an intelligent data model acquisition system based on a web front-end, comprising:
a data model parsing module, used to parse the definition rules of the data model and form rule expressions;
a data acquisition strategy generation module, connected to the data model parsing module and used to generate the optimal data acquisition strategy based on the rule expression;
a WEB front-end data acquisition interface generation module, connected to the data acquisition strategy generation module and used to generate an interactive WEB front-end data acquisition interface based on the rule expression and the optimal data acquisition strategy; and
a data acquisition execution module, connected to the WEB front-end acquisition interface generation module and used to acquire target data in the interactive WEB front-end acquisition interface according to the optimal data acquisition strategy, and to verify and process the acquired target data.
[0007] In some feasible implementations, the data model parsing module includes:
a hierarchical relationship parsing unit, used to parse the original multi-format model definition file layer by layer using a recursive parsing algorithm, identify the inclusion relationship and inheritance relationship between the parent model and the child model, as well as the association rules between the fields of the child model and the fields of the parent model, and output a structured model hierarchical relationship graph;
a format unification processing unit, connected to the hierarchical relationship parsing unit and used to receive the original multi-format model definition file and the structured model hierarchical relationship graph, unify the file format of the original multi-format model definition file, and combine it with the structured model hierarchical relationship graph to form a unified intermediate format model definition file, wherein the unified intermediate format model definition file represents a rule description containing the complete hierarchy and inheritance relationships;
a metadata extraction unit, connected to the format unification processing unit and used to identify, with a lexical analyzer, the attributes of key fields in the unified intermediate format model definition file to obtain a structured model metadata list, wherein the attributes of the key fields represent at least one of the following: field name, data type, length limit, value range, required attribute, default value, and associated fields; and
a rule expression generation unit, connected to the metadata extraction unit and the format unification processing unit, respectively, and used to combine the structured model metadata list with the unified intermediate format model definition file to form a rule expression.
[0008] In some feasible implementations, the acquisition strategy generation module includes:
a multi-source acquisition and adaptation unit, connected to the data model parsing module and used to receive the rule expression and various types of data sources, and to generate corresponding acquisition and adaptation strategies for each type of data source in combination with the rule expression, wherein multiple data acquisition and adaptation strategies can be generated for the same type of data source;
a strategy decision engine unit, connected to the multi-source acquisition and adaptation unit and used to select, based on the rule expression, the optimal strategy from the multiple data acquisition and adaptation strategies generated for the same type of data source, the selection being made according to a preset strategy generation rule base and using the strategy selection experience and effect indicators of historical acquisition data, so as to obtain the preliminary optimal data acquisition strategy for that type of data source;
a data acquisition scheduling strategy unit, connected to the strategy decision engine unit and used to integrate the current data scheduling parameters into the preliminary optimal data acquisition strategy to obtain the optimized optimal data acquisition strategy for the same type of data source, wherein the data scheduling parameters include at least one of data size, update frequency, and priority; and
a data cleaning strategy unit, connected to the data acquisition scheduling strategy unit and used to extract, from the metadata extraction unit, the attributes of key fields that match the optimized optimal data acquisition strategy of the same type of data source, and to generate corresponding data cleaning rules, wherein the data cleaning rules represent at least one of the following: removing spaces, filtering special characters, standardizing date formats, and converting numerical units.
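The four kinds of cleaning rules named above can be illustrated with a minimal Python sketch. The field names, the accepted date formats, the allowed character set, and the grams-to-kilograms conversion are all hypothetical choices made for illustration; the application does not specify them.

```python
import re
from datetime import datetime

def strip_spaces(value: str) -> str:
    # Remove leading and trailing whitespace.
    return value.strip()

def filter_special_chars(value: str) -> str:
    # Keep word characters, whitespace, and a few common symbols (assumed set).
    return re.sub(r"[^\w\s.@-]", "", value)

def standardize_date(value: str) -> str:
    # Try a few assumed input formats and emit ISO 8601 (YYYY-MM-DD).
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def grams_to_kilograms(value: float) -> float:
    # Example unit conversion; the unit pair is an assumption.
    return value / 1000

# Hypothetical mapping from field name to the cleaning rules applied in order.
CLEANING_RULES = {
    "name": [strip_spaces, filter_special_chars],
    "birthday": [strip_spaces, standardize_date],
    "weight": [grams_to_kilograms],
}

def clean_record(record: dict) -> dict:
    cleaned = {}
    for field, value in record.items():
        for rule in CLEANING_RULES.get(field, []):
            value = rule(value)
        cleaned[field] = value
    return cleaned

raw = {"name": "  Li#Lei ", "birthday": "1990/01/05", "weight": 1500}
cleaned = clean_record(raw)
```

Keeping each rule as a small function and attaching rules to fields through a lookup table lets the rule set be regenerated when field attributes change, without touching the cleaning loop.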
[0009] In some feasible implementations, the web front-end data acquisition interface generation module includes:
an interface layout engine, connected to the data model parsing module and the acquisition strategy generation module respectively, and used to receive the attributes of key fields in the rule expression and the optimized data acquisition strategy of the same type of data source, and to group and lay out the key fields according to their importance using an adaptive layout algorithm that combines grid layout and flow layout, so as to obtain an interactive WEB front-end acquisition interface; and
an intelligent control matching unit, connected to the data model parsing module and used to receive the data type of the key field and a preset mapping relationship library between data types and UI controls, and to match, based on the mapping relationship library, the corresponding UI controls for the key fields of different data types to obtain the UI control configuration corresponding to each key field, wherein the mapping relationship library includes at least one of the following mapping relationships between string, integer / floating-point number, enumeration, boolean, date / time, file, array type and corresponding UI control:
[0010] In some feasible implementations, the data acquisition execution module includes:
a data storage and processing unit, connected to the WEB front-end acquisition interface generation module and used to receive the optimized optimal data acquisition strategy, the target data acquired at the front end, and the data source type, select the appropriate storage method according to the data source type, and perform field type mapping and encrypted storage of the target data; and
a front-end data input processing unit, connected to the WEB front-end acquisition interface generation module and used to receive user input data from the interactive WEB front-end acquisition interface and verify the user input data against preset verification rules, wherein if the verification passes, the input data is accepted; otherwise, the user is prompted to correct the input data.
[0011] In some feasible implementations, the system further includes a data model update monitoring and adaptation module, which is connected to the data model parsing module, the acquisition strategy generation module, the web front-end acquisition interface generation module, and the data acquisition execution module, respectively. This module is used to detect changes in the data model and, based on the changes, drive those four modules to make adaptive adjustments.
[0012] In some feasible implementations, the data model update monitoring and adaptation module includes:
a model change monitoring unit, connected to the data model parsing module and used to monitor the rule expression output by the data model parsing module and obtain model change notifications;
a change analysis unit, connected to the model change monitoring unit and used to receive the old and new data models, perform a difference comparison analysis on the old and new data models through a tree structure comparison algorithm, identify the change content, and generate a change report;
a data acquisition strategy adjustment unit, connected to both the change analysis unit and the data acquisition strategy generation module, and used to receive the change report and issue an instruction to the data acquisition strategy generation module to update the data acquisition strategy; and
an interface dynamic update unit, connected to the acquisition strategy adjustment unit and the WEB front-end acquisition interface generation module, and used to receive updated acquisition strategies and update the interface controls of the WEB front-end acquisition interface generation module using an incremental update method.
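One simple way to picture the tree structure comparison performed by the change analysis unit is a recursive walk over two nested model definitions. The sketch below is only an illustration under assumed conventions: models are represented as nested Python dictionaries, and the change report is a list of `(kind, path)` tuples. Neither representation comes from the application.

```python
from typing import Any, Dict, List, Tuple

Change = Tuple[str, str]  # (kind, dotted path); report format is an assumption

def diff_models(old: Dict[str, Any], new: Dict[str, Any], path: str = "") -> List[Change]:
    """Recursively compare two model trees and list added, removed, and changed nodes."""
    changes: List[Change] = []
    for key in sorted(set(old) | set(new)):
        here = f"{path}.{key}" if path else key
        if key not in old:
            changes.append(("added", here))
        elif key not in new:
            changes.append(("removed", here))
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Both sides are subtrees: descend one level.
            changes.extend(diff_models(old[key], new[key], here))
        elif old[key] != new[key]:
            changes.append(("changed", here))
    return changes

old_model = {"user": {"name": {"type": "string"}, "age": {"type": "integer"}}}
new_model = {
    "user": {
        "name": {"type": "string"},
        "age": {"type": "float"},
        "email": {"type": "string"},
    }
}
report = diff_models(old_model, new_model)
```

A report of this shape names only the nodes that differ, which is the information an incremental interface update needs in order to refresh only the affected controls.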
[0013] In some feasible implementations, the system further includes a system management and monitoring module, which is connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, and is used for user management, access control, operation monitoring, and log analysis.
[0014] In some feasible implementations, the system management and monitoring module includes:
a user and permission management unit, connected to the WEB front-end acquisition interface generation module, the acquisition strategy generation module, and the data acquisition execution module, respectively, and used to manage user permissions based on a role-based access control model;
an operation status monitoring unit, connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, and used to monitor the operation status of these modules against preset operation status monitoring thresholds; and
a log analysis and reporting unit, connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, and used to analyze the logs generated by these modules and generate various core reports.
[0015] In a second aspect, this application provides a computer program that, when executed by a processor, implements the aforementioned system.
[0016] The beneficial effects of the technical solutions provided in the embodiments of this application are given in the specific embodiments.
Attached Figure Description
[0017] The accompanying drawings, which form part of this application, are used to provide a further understanding of the application and to make other features, objects, and advantages of the application more apparent. The illustrative embodiments and descriptions of this application are used to explain the application and do not constitute an undue limitation of the application. In the drawings: Figure 1 is a logical diagram of the intelligent data model acquisition system based on a web front-end provided by this application.
Detailed Implementation
[0018] The following explanations of some terms used in this application are provided to aid in understanding the technical solution. Terms that are not explained separately are interpreted in the conventional manner.
ANTLR is an abbreviation for ANother Tool for Language Recognition, an open-source parser generation tool.
[0019] An Abstract Syntax Tree (AST) is an intermediate data structure that represents the syntactic logic of a text in a tree structure, generated after lexical and syntactic analysis of source code or structured text.
[0020] LL(1) is a top-down parsing algorithm, belonging to the non-backtracking deterministic parsing method. The first L represents scanning the input string from left to right. The second L represents constructing the leftmost derivation, that is, always prioritizing the derivation of the leftmost non-terminal symbol in the grammar rule. 1 means that each step of the analysis only needs to look at one subsequent input symbol to determine which grammar rule to use.
[0021] GridLayout, also known as CSS grid layout, is a two-dimensional layout system introduced in CSS3. It divides the front-end page container into a grid composed of "rows + columns" (similar to the cells of a table). It can precisely control the position, size, spacing, and alignment of elements in both horizontal (column) and vertical (row) dimensions. Each element can occupy one or more grid cells.
[0022] FlexLayout, also known as CSS flexible layout, is a one-dimensional layout system introduced in CSS3. It uses a "flexible container + flexible items" model to control the arrangement, alignment, and space allocation of elements in a single row or column (horizontal / vertical). Elements will automatically expand and contract according to the size of the container (for example, when the container becomes narrower, the elements will automatically adjust the spacing or wrap).
[0023] The WebHook mechanism is an event-driven real-time notification mechanism based on the HTTP protocol, used for real-time monitoring of data model changes.
[0024] Incremental update technology is a partial update technology that differs from full update. It updates only the parts of the system or interface that have changed, rather than updating the entire system or all interface elements.
[0025] As shown in Figure 1, this application provides an intelligent data model acquisition system based on a web front-end, comprising: a data model parsing module, used to parse the definition rules of the data model and form rule expressions.
[0026] It should be noted that in the web-based intelligent data model acquisition system, the data model parsing module is the core foundation module. Its core function is to comprehensively and accurately decompose and transform the definition rules of the data model, organize the scattered and non-standardized model rules into rule expressions that the system can directly call, and provide a unified and standardized rule basis for subsequent acquisition strategy generation, web front-end acquisition interface construction, target data acquisition and verification, etc., so as to ensure the standardization and accuracy of the entire system's data acquisition process.
[0027] The data model parsing module includes a hierarchical relationship parsing unit, a format unification processing unit, a metadata extraction unit, and a rule expression generation unit.
[0028] Furthermore, the hierarchical relationship parsing unit is used to parse the original multi-format model definition file layer by layer using a recursive parsing algorithm, identify the inclusion relationship and inheritance relationship between the parent model and the child model, as well as the association rules between the fields of the child model and the fields of the parent model, and output a structured model hierarchical relationship graph.
[0029] Specifically, the hierarchical relationship parsing unit decomposes and organizes the original multi-format model definition files (supporting mainstream formats such as JSON, XML, and YAML, while also being compatible with enterprise-defined non-standard model definition formats) layer by layer from top to bottom. Starting from the top-level model, this unit gradually penetrates to each level of sub-model, identifying the inclusion relationship between parent and child models (i.e., which parent model the child model belongs to, and whether multi-level nesting exists), the inheritance relationship between different models (i.e., whether the child model reuses the basic rules and fields of the parent model), and clarifying the association mapping rules between fields within the child model and fields in the parent model (such as field reuse relationships, field dependency relationships, and field value association relationships). During the parsing process, this unit structurally records the identified hierarchical and association information, ultimately outputting a structured model hierarchical relationship graph. This structured model hierarchical relationship graph presents the association logic of all models and fields in a clear hierarchical structure, marking the parent / child affiliation of each model and the association mapping relationships of fields, providing a hierarchical basis for subsequent format unification and rule transformation.
[0030] For example, a recursive parsing algorithm is used to perform deep parsing on data models containing nested structures and inheritance relationships. For a nested model containing "user basic information" and "user account information", it can accurately identify "user account information" as a sub-model of "user basic information", as well as the association relationships between fields in the sub-model and fields in the parent model.
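The recursive, top-down traversal described above can be sketched in Python as follows. This is an illustration only: the nested-dictionary layout and the `fields`, `models`, and `extends` keys are hypothetical conventions chosen for the example, and the output dictionary stands in for the structured model hierarchical relationship graph.

```python
from typing import Any, Dict, FrozenSet, Optional

def build_hierarchy(
    name: str,
    model: Dict[str, Any],
    parent: Optional[str] = None,
    parent_fields: FrozenSet[str] = frozenset(),
    graph: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Walk a nested model definition top-down and record its relationships."""
    if graph is None:
        graph = {}
    fields = set(model.get("fields", {}))
    graph[name] = {
        "parent": parent,                          # containment relationship
        "extends": model.get("extends"),           # inheritance relationship
        "children": sorted(model.get("models", {})),
        # Field association: fields the child shares with its parent.
        "shared_with_parent": sorted(fields & parent_fields),
    }
    for child_name, child in model.get("models", {}).items():
        # Recurse into each sub-model, passing this model down as the parent.
        build_hierarchy(child_name, child, name, frozenset(fields), graph)
    return graph

definition = {
    "fields": {"user_id": "string", "name": "string"},
    "models": {
        "user_account_info": {
            "extends": "user_basic_info",
            "fields": {"user_id": "string", "balance": "float"},
        }
    },
}
hierarchy = build_hierarchy("user_basic_info", definition)
```

Here `user_account_info` is recorded as a sub-model of `user_basic_info`, and the shared `user_id` field is recorded as an association between the two levels.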
[0031] The format unification processing unit, connected to the hierarchical relationship parsing unit, is used to receive the original multi-format model definition file and the structured model hierarchical relationship graph, unify the file format of the original multi-format model definition file, and combine it with the structured model hierarchical relationship graph to form a unified intermediate format model definition file; wherein, the unified intermediate format model definition file represents a rule description containing complete hierarchy and inheritance relationship.
[0032] Specifically, the format unification processing unit receives the original multi-format model definition files and the structured model hierarchy relationship graph. Based on the conventional principles of format adaptation and standardization, this unit first identifies the format of the received original multi-format model definition files, distinguishing different file format types. Then, through a built-in format conversion mechanism, it uniformly converts model definition files of various formats (including non-standard formats) into an intermediate format that the system can universally recognize. During the conversion process, redundant format information in the original files is removed, and format non-standard issues are corrected to ensure that the converted files have unified syntax and can be stably read by the system. Simultaneously, the logic of the hierarchy and association rules formed by the hierarchy relationship parsing unit is preserved, ultimately forming a unified intermediate format model definition file. This unified intermediate format model definition file serves as a standardized rule description carrier, fully containing the hierarchical structure, inheritance relationships, and basic field association rules of the data model, and can be directly used as the core input for subsequent metadata extraction and rule transformation.
[0033] For example, the format unification processing unit supports mainstream data model definition formats such as JSON, XML, and YAML, and has a built-in format conversion engine (a conventional format engine) that can convert non-standard format model definition files into intermediate formats that the system can recognize.
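As a rough sketch of format unification, the following Python code converts a JSON and an XML definition of the same model into one intermediate dictionary. The intermediate layout, the input schemas, and the function names are assumptions made for illustration. YAML is omitted only because parsing it needs a third-party library.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict

def _from_json(text: str) -> Dict[str, Any]:
    raw = json.loads(text)
    fields = {
        name: {"type": spec["type"], "required": bool(spec.get("required", False))}
        for name, spec in raw["fields"].items()
    }
    return {"model": raw["model"], "fields": fields}

def _from_xml(text: str) -> Dict[str, Any]:
    root = ET.fromstring(text)
    fields = {
        f.get("name"): {"type": f.get("type"), "required": f.get("required") == "true"}
        for f in root.iter("field")
    }
    return {"model": root.get("name"), "fields": fields}

def to_intermediate(text: str) -> Dict[str, Any]:
    """Detect the source format from its first character and normalize it."""
    stripped = text.lstrip()
    if stripped.startswith("{"):
        return _from_json(stripped)
    if stripped.startswith("<"):
        return _from_xml(stripped)
    raise ValueError("unsupported model definition format")

JSON_DEF = (
    '{"model": "user", "fields": {"name": {"type": "string", "required": true},'
    ' "age": {"type": "integer"}}}'
)
XML_DEF = (
    '<model name="user">'
    '<field name="name" type="string" required="true"/>'
    '<field name="age" type="integer"/>'
    "</model>"
)
```

Both inputs normalize to the same dictionary, so downstream units such as metadata extraction only need to understand one representation.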
[0034] The metadata extraction unit, connected to the format unification processing unit, is used to use a lexical analyzer to identify the attributes of key fields in the unified intermediate format model definition file, thereby obtaining a structured model metadata list. The attributes of the key fields represent at least one of the following: field name, data type, length limit, value range, required attributes, default value, and associated fields.
[0035] Specifically, the metadata extraction unit, based on the principles of content segmentation and keyword matching, performs comprehensive content parsing of the unified intermediate format model definition file using a lexical analyzer (built using the ANTLR tool). This metadata extraction unit first segments the content of the unified intermediate format file into its smallest semantic units, then locates the core key fields in the file through a key field matching mechanism. Subsequently, it identifies and extracts the attribute information of each key field one by one. The extracted key field attribute information includes, but is not limited to: field name (the unique identifier of the field), data type (basic types: string, integer, floating-point number, boolean value, etc.; complex types: array, object, enumeration, date and time, etc.), field length constraints (maximum / minimum length of character fields, digit limit of numeric fields, etc.), value range constraints (value range of numeric fields, list of optional values for enumeration fields, etc.), required attributes (whether the field is required, i.e., whether it must be filled in during data collection), default value (default value when the field is not filled in), and related field information (dependence or mapping relationship between this field and other fields). After extraction, the metadata extraction unit organizes and summarizes the attribute information of all fields according to a standardized structure, and finally obtains a structured model metadata list. This model metadata list clearly presents the complete attribute constraints of each key field, providing precise field-level basis for the subsequent generation of rule expressions.
[0036] For example, the metadata extraction unit performs word segmentation on the model definition file based on the ANTLR tool to identify key fields; it uses a parser (using the LL(1) parsing algorithm) to construct an abstract syntax tree (AST) and extracts metadata such as field names, data types (including basic types such as strings, integers, and floating-point numbers, and complex types such as arrays, objects, and enumerations), field length limits, value range constraints, required attributes, default values, and related field information.
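The tokenize-then-parse flow can be illustrated with a small hand-written lexer and recursive-descent parser. This does not use ANTLR; it is a stand-in that shows the LL(1) property, where every parsing decision is made by looking at one upcoming token. The field definition syntax (`name: type(max_length) required;`) is a hypothetical mini-language invented for this example.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

Token = Tuple[str, str]
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>[:();]))")

def tokenize(text: str) -> List[Token]:
    """Split the definition text into (kind, value) tokens."""
    text = text.rstrip()
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    tokens.append(("EOF", ""))
    return tokens

class Parser:
    """Grammar: fields := field* ; field := IDENT ':' IDENT ['(' NUMBER ')'] ['required'] ';'"""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.index = 0

    def peek(self) -> Token:
        # The single lookahead token that drives every decision.
        return self.tokens[self.index]

    def expect(self, kind: str, value: Optional[str] = None) -> str:
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.index += 1
        return tok_value

    def parse_fields(self) -> List[Dict[str, Any]]:
        fields = []
        while self.peek()[0] != "EOF":
            fields.append(self.parse_field())
        return fields

    def parse_field(self) -> Dict[str, Any]:
        node: Dict[str, Any] = {"name": self.expect("IDENT")}
        self.expect("PUNCT", ":")
        node["type"] = self.expect("IDENT")
        node["max_length"] = None
        node["required"] = False
        if self.peek() == ("PUNCT", "("):          # optional length limit
            self.expect("PUNCT", "(")
            node["max_length"] = int(self.expect("NUMBER"))
            self.expect("PUNCT", ")")
        if self.peek() == ("IDENT", "required"):   # optional required flag
            self.expect("IDENT", "required")
            node["required"] = True
        self.expect("PUNCT", ";")
        return node

metadata = Parser(tokenize("name: string(32) required; age: integer;")).parse_fields()
```

The resulting list of dictionaries plays the role of the structured model metadata list; a generated ANTLR parser would produce a richer tree from which the same attributes could be read.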
[0037] The rule expression generation unit is connected to the metadata extraction unit and the format unification processing unit, respectively, and is used to combine the structured model metadata list with the unified intermediate format model definition file to form a rule expression.
[0038] Specifically, the rule expression generation unit obtains the structured model metadata list and the unified intermediate format model definition file. Based on the principle of rule integration and standardized transformation, it transforms the scattered field attribute constraints and business rules into system-executable rule expressions. The rule expression generation unit first extracts business rule descriptions (including but not limited to field validation rules, numerical calculation rules, and related field constraint rules) from the unified intermediate format model definition file. Then, it matches the business rules with the corresponding field attribute constraints by referring to the structured model metadata list. For business rules involving numerical calculations, formulas can be defined and executed according to the formulas. Finally, through a conventional rule standardization transformation mechanism, all associated rules (including field attribute constraints, business rules, calculation rules, etc.) are integrated into a rule expression that the system can directly call. This rule expression will serve as the core rule input for the subsequent data acquisition strategy generation module, the web front-end data acquisition interface generation module, and the data acquisition execution module, ensuring the accuracy of data acquisition strategy optimization, dynamic interface generation, and data validation processing.
[0039] For example, the unit parses the business rule descriptions contained in the model definition, such as the calculation rule "order amount = unit price of goods × quantity + shipping fee - discount amount" and the verification rule "ID number must be 18 digits and meet the check digit rule", and converts them into rule expressions that can be executed by the system.
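The two example rules can be expressed as executable functions. The sketch below assumes that the "check digit rule" refers to the ISO 7064 MOD 11-2 scheme used by 18-digit Chinese resident identity numbers (GB 11643-1999); the application does not name the scheme, and the function names are illustrative.

```python
import re
from decimal import Decimal

def order_amount(unit_price: Decimal, quantity: int,
                 shipping_fee: Decimal, discount: Decimal) -> Decimal:
    # order amount = unit price of goods x quantity + shipping fee - discount amount
    return unit_price * quantity + shipping_fee - discount

# Position weights and check characters of the assumed MOD 11-2 scheme.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"

def is_valid_id_number(value: str) -> bool:
    """Check the length, the character set, and the final check character."""
    if not re.fullmatch(r"[0-9]{17}[0-9Xx]", value):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(value[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == value[17].upper()

amount = order_amount(Decimal("19.90"), 3, Decimal("8.00"), Decimal("5.00"))
```

`Decimal` is used for the amount so that currency arithmetic does not pick up binary floating-point rounding errors.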
[0040] The data acquisition strategy generation module is connected to the data model parsing module and is used to generate the optimal data acquisition strategy based on the rule expression.
[0041] The data acquisition strategy generation module receives the rule expression output by the data model parsing module. Based on the constraints of the rule expression, it generates an optimal data acquisition strategy that covers various types of data sources and balances acquisition efficiency and data quality through multi-unit collaborative operation (multi-source adaptation, optimal filtering, scheduling optimization, and matching cleaning rules). At the same time, it generates matching data cleaning rules to ensure the standardization and usability of the acquired data.
[0042] The data acquisition strategy generation module includes a multi-source data acquisition adaptation unit, a strategy decision engine unit, a data acquisition scheduling strategy unit, and a data cleaning strategy unit.
[0043] The multi-source acquisition and adaptation unit is connected to the data model parsing module. It is used to receive the rule expression and various types of data sources, and generate corresponding acquisition and adaptation strategies for each type of data source in combination with the rule expression. It can also generate multiple data acquisition and adaptation strategies for the same type of data source.
[0044] Specifically, the data sources handled by the multi-source acquisition and adaptation unit can include relational databases, non-relational databases, third-party API interfaces, local file systems, and so on. In conjunction with the core content of the rule expression, such as field constraints and data format requirements, the unit generates acquisition adaptation strategies suited to the data storage format and access protocol of each type of data source. For the same type of data source, it generates multiple data acquisition adaptation strategies with differentiated execution logic based on different acquisition scenario requirements (such as full acquisition, incremental acquisition, and high-frequency update scenarios), so that each type of data source has multiple feasible acquisition schemes to choose from.
[0045] After receiving the rule expression, the multi-source data acquisition and adaptation unit parses the core information it contains, such as field types, value ranges, and required constraints. It also identifies the access characteristics of each data source type (e.g., relational databases support SQL queries, API interfaces support HTTP / HTTPS calls, and file systems support path access). Secondly, for each data source type, it generates a basic data acquisition and adaptation strategy that adapts to its access characteristics and conforms to the rule expression constraints. Finally, for different business scenarios of the same type of data source, the basic data acquisition and adaptation strategy can be optimized to generate multiple differentiated strategies, or different basic data acquisition and adaptation strategies can be formed as needed. For example, for relational databases (the same type of data source), three differentiated adaptation strategies can be generated: a "full SQL query acquisition strategy," a "timestamp-based incremental SQL query acquisition strategy," and a "batch-and-pagination SQL query acquisition strategy." These strategies adapt to scenarios such as data initialization, daily incremental updates, and large-volume batch acquisition, providing ample alternatives for subsequent optimal strategy selection.
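The three relational-database strategies mentioned above might look like the following parameterized query builders, demonstrated against an in-memory SQLite database. The table name, column names, and function names are hypothetical. Identifiers are interpolated directly into the SQL text, so this sketch assumes they come from the trusted, already-parsed model definition and never from user input.

```python
import sqlite3
from typing import Sequence, Tuple

Query = Tuple[str, tuple]  # (SQL text, bound parameters)

def full_query(table: str, columns: Sequence[str]) -> Query:
    # Full acquisition: read every row, e.g. for data initialization.
    return f"SELECT {', '.join(columns)} FROM {table}", ()

def incremental_query(table: str, columns: Sequence[str],
                      ts_column: str, since: str) -> Query:
    # Timestamp-based incremental acquisition: only rows updated after `since`.
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {ts_column} > ?", (since,)

def paged_query(table: str, columns: Sequence[str], key_column: str,
                page_size: int, offset: int) -> Query:
    # Batch-and-pagination acquisition: a stable order plus LIMIT/OFFSET.
    sql = f"SELECT {', '.join(columns)} FROM {table} ORDER BY {key_column} LIMIT ? OFFSET ?"
    return sql, (page_size, offset)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02"), (3, 30.0, "2024-01-03")],
)

cols = ["id", "amount"]
full_rows = conn.execute(*full_query("orders", cols)).fetchall()
new_rows = conn.execute(*incremental_query("orders", cols, "updated_at", "2024-01-01")).fetchall()
second_page = conn.execute(*paged_query("orders", cols, "id", 2, 2)).fetchall()
```

Values are always passed as bound parameters, which keeps the generated statements safe to reuse with different timestamps and page offsets.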
[0046] It should be noted that, for the basic data collection adaptation strategy, a conventional basic data collection adaptation strategy can be selected according to the scenario corresponding to each type of data source. This application does not limit the specific content of the strategy corresponding to the data collection, and can be selected and adjusted according to the specific situation.
[0047] For example, the multi-source data acquisition and adaptation unit generates corresponding acquisition and adaptation strategies for different types of data sources. For database data sources, it automatically generates SQL query statements (supporting dynamic condition concatenation) or data synchronization scripts; for API interface data sources, it generates configurations such as interface call parameters, request header information, and authentication methods (such as Token authentication, OAuth2.0); for file data sources, it generates file parsing rules (such as delimiter recognition for CSV files and sheet selection for Excel files).
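The three relational-database strategies named above (full, timestamp-based incremental, and batch-and-pagination) can be illustrated with a minimal Python sketch. The function name `build_query`, the `updated_at` and `id` columns, and the `orders` table are assumptions made for illustration only; the application does not specify them.

```python
import sqlite3
from typing import Optional


def build_query(table: str, fields: list[str], strategy: str,
                since: Optional[str] = None, page: int = 0,
                page_size: int = 1000) -> tuple[str, list]:
    """Return (sql, params) for one acquisition strategy.

    Table and column names are interpolated directly, so they must come
    from trusted metadata; values always go through placeholders.
    """
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    params: list = []
    if strategy == "incremental":
        # Timestamp-based incremental acquisition ("updated_at" is assumed).
        sql += " WHERE updated_at > ?"
        params.append(since)
    elif strategy == "paged":
        # Batch-and-pagination acquisition ("id" ordering is assumed).
        sql += " ORDER BY id LIMIT ? OFFSET ?"
        params += [page_size, page * page_size]
    elif strategy != "full":
        raise ValueError(f"unknown strategy: {strategy}")
    return sql, params


# Hypothetical in-memory data source used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 9.5, "2023-10-01"), (2, 20.0, "2023-10-05"), (3, 7.25, "2023-10-06")],
)

sql, params = build_query("orders", ["id", "amount"], "incremental",
                          since="2023-10-04")
incremental_rows = conn.execute(sql, params).fetchall()
```

Dynamic conditions are concatenated onto the base statement, while user-supplied values are bound as parameters rather than spliced into the SQL text.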
[0048] The strategy decision engine unit, connected to the multi-source acquisition adaptation unit, operates on the multiple data acquisition adaptation strategies generated from the rule expression for the same type of data source. Using the preset strategy generation rule base together with the strategy selection experience and effect indicators of historical acquisition data, it selects the optimal strategy among them, so as to obtain the preliminary optimal data acquisition strategy for the same type of data source.
[0049] Specifically, the strategy decision engine unit relies on a preset strategy generation rule base, a pre-built library of rules that includes strategy filtering rules, priority determination rules, and effect evaluation rules. Using historical data collection experience and effect indicators (specifically collection time, data integrity rate, data accuracy rate, and system resource utilization rate), it comprehensively evaluates the multiple data acquisition adaptation strategies for the same type of data source and selects the best one, yielding the preliminary optimal data acquisition strategy for that type of data source. The ultimate goal of these rules is to select, from the multiple data acquisition adaptation strategies, the optimal data acquisition strategy for the corresponding data source.
[0050] The strategy decision engine unit works on the principle of rule constraints plus historical experience plus comprehensive evaluation and selection. It does not rely on complex algorithms; the optimal strategy is selected by combining preset rules with historical data experience. First, the strategy decision engine unit obtains the multiple candidate adaptation strategies that the multi-source data acquisition and adaptation unit generated for the same type of data source, and at the same time loads the rule expression and the preset strategy generation rule base. The filtering rules in the rule base must be consistent with the rule expression constraints (e.g., if the rule expression requires "real-time acquisition," the filtering rules prioritize candidate strategies that support real-time acquisition). Second, it retrieves historical acquisition data and extracts each candidate strategy's past performance indicators and selection experience (e.g., historical data shows that the "incremental SQL query acquisition strategy" has a shorter acquisition time and lower resource consumption in daily update scenarios). Finally, it comprehensively evaluates the performance indicators of each candidate strategy against the preset rule base. During evaluation, the results can be quantified using a basic evaluation formula to keep the selection objective.
The basic evaluation formula is:
E = α×A + β×B + γ×C + δ×D
where:
E is the comprehensive evaluation score of a candidate strategy (the higher the score, the better the strategy adaptability);
A is the data completeness rate (the proportion of collected data to the total data to be collected, ranging from 0 to 1, with a value closer to 1 being better);
B is the data accuracy rate (the proportion of collected data that conforms to the constraints of the rule expression, ranging from 0 to 1, with a value closer to 1 being better);
C is the collection efficiency coefficient (the amount of data collected per unit time, ranging from 0 to 100, with a higher value indicating higher efficiency);
D is the resource utilization coefficient (the combined proportion of system CPU and memory used during the collection process, ranging from 0 to 1, with a value closer to 0 being better);
α, β, γ, and δ are the weighting coefficients, which can be set according to business needs subject to α+β+γ+δ=1 (for example, γ=0.4 can be set for scenarios with high real-time requirements, and β=0.4 for scenarios with high accuracy requirements).
[0051] The comprehensive evaluation score of each alternative strategy is calculated using the above formula. The strategy with the highest score is selected as the preliminary optimal data collection strategy for this type of data source. If there are cases with the same score, the preliminary optimal data collection strategy is determined by combining historical strategy selection experience (such as the strategy that has been selected most often in the same scenario in the past).
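The scoring and tie-breaking described above can be sketched in Python as follows. The sketch applies the formula exactly as stated, so D enters with the positive weight δ. The class name `Candidate`, the field `times_selected` (standing in for "historical strategy selection experience"), and the sample numbers are illustrative assumptions, not values from the application.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    completeness: float   # A, in [0, 1]
    accuracy: float       # B, in [0, 1]
    efficiency: float     # C, in [0, 100]
    resource_use: float   # D, in [0, 1]
    times_selected: int   # assumed proxy for historical selection experience


def score(c: Candidate, alpha: float, beta: float,
          gamma: float, delta: float) -> float:
    """E = alpha*A + beta*B + gamma*C + delta*D, with weights summing to 1."""
    if abs(alpha + beta + gamma + delta - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (alpha * c.completeness + beta * c.accuracy
            + gamma * c.efficiency + delta * c.resource_use)


def select_best(candidates: list[Candidate], weights: tuple) -> Candidate:
    # Highest score wins; equal scores fall back to the historical count.
    return max(candidates,
               key=lambda c: (score(c, *weights), c.times_selected))


weights = (0.3, 0.3, 0.3, 0.1)  # hypothetical alpha, beta, gamma, delta
full = Candidate("full_sql", 1.0, 0.98, 40.0, 0.8, times_selected=3)
incremental = Candidate("incremental_sql", 0.99, 0.98, 85.0, 0.3,
                        times_selected=12)
best = select_best([full, incremental], weights)
```

Because C ranges up to 100 while the other terms range up to 1, the efficiency term dominates the score unless the weights or the scale of C are chosen with that in mind.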
[0052] For example, the strategy decision engine unit can generate collection strategies by combining rule-based reasoning with machine learning. The rule-based reasoning module applies the preset strategy generation rule base, with rules such as "use drop-down selection controls for enumeration type fields" and "use file upload controls for file type fields". The machine learning module optimizes and adjusts the strategies by analyzing historical collection data (including indicators such as collection efficiency and data quality); for example, it automatically strengthens the verification strategy for fields that are frequently entered incorrectly.
[0053] The data acquisition and scheduling strategy unit, connected to the strategy decision engine unit, is used to integrate the current data scheduling parameters into the preliminary optimal data acquisition strategy to obtain the optimized optimal data acquisition strategy for the same type of data source. The data scheduling parameters include at least one of data size, update frequency, and priority.
[0054] Specifically, the data acquisition and scheduling strategy unit obtains the current data scheduling parameters (the data scheduling parameters include at least one of the following: data size, update frequency, and priority, and may also include auxiliary parameters such as the number of concurrent acquisition threads, acquisition time window, and retry mechanism threshold), integrates the above data scheduling parameters into the preliminary optimal data acquisition strategy, and obtains the optimized optimal data acquisition strategy for the same type of data source by adjusting the execution logic of the strategy, such as the execution sequence, resource allocation, and triggering mechanism.
[0055] Furthermore, the data acquisition and scheduling strategy unit works by optimizing the preliminary optimal strategy from the previous step and adapting it to real-time scheduling parameters, so that the strategy fits the current system operating status and business scheduling requirements. First, the unit receives the preliminary optimal data acquisition strategy output by the strategy decision engine unit and simultaneously acquires the real-time scheduling parameters of the current system. The data volume determines how acquisition tasks are split (large data volumes are split across multiple threads, small data volumes run in a single thread); the update frequency determines the acquisition trigger cycle (high-frequency update data uses short-cycle timed triggering, low-frequency update data uses long-cycle timed triggering or trigger-based acquisition); and the priority determines the execution order of acquisition tasks (high-priority tasks claim system resources first). Second, the execution logic of the preliminary optimal strategy is adjusted according to these scheduling parameters. For example, suppose the preliminary optimal strategy is the "relational database incremental SQL query collection strategy." If the current data volume is 1 million records (large data volume), the update frequency is once per hour (high frequency), and the priority is the highest, then the optimized optimal data collection strategy is to "execute incremental SQL query collection in relational databases using 10 concurrent threads, triggering at regular intervals every hour, occupying the highest priority system resources, and setting the retry mechanism threshold to 3 times (retrying 3 times after collection failure)".
Finally, the optimized optimal data collection strategy is output to ensure that the strategy not only has the best collection adaptability but also adapts to the current system scheduling requirements, ensuring efficient and stable execution of the collection task. For example, the data collection and scheduling strategy unit can generate a data collection and scheduling plan based on factors such as data volume, update frequency, and priority. It supports multiple modes, including real-time collection (such as order payment data), scheduled collection (such as daily product inventory data), and triggered collection (such as incremental collection triggered when the data model is updated), and can configure the number of concurrent collection threads to avoid putting excessive pressure on the data source.
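How scheduling parameters might be folded into a preliminary strategy can be sketched as below. The threshold `large_volume`, the default thread and retry counts, and all names are assumptions chosen so that the sketch reproduces the worked example above (1 million records, hourly updates, 10 threads, 3 retries); the application does not fix these values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleParams:
    record_count: int        # data size
    updates_per_day: float   # update frequency
    priority: int            # larger means more urgent (assumed convention)


@dataclass(frozen=True)
class ExecutionPlan:
    strategy: str
    threads: int
    trigger_interval_s: int
    priority: int
    max_retries: int


def plan_execution(strategy: str, p: ScheduleParams,
                   large_volume: int = 500_000,
                   max_threads: int = 10,
                   max_retries: int = 3) -> ExecutionPlan:
    # Large volumes are split across threads; small ones stay single-threaded.
    threads = max_threads if p.record_count >= large_volume else 1
    # The trigger period follows the update frequency (default: once a day).
    if p.updates_per_day > 0:
        interval = int(86_400 / p.updates_per_day)
    else:
        interval = 86_400
    return ExecutionPlan(strategy, threads, interval, p.priority, max_retries)


plan = plan_execution("incremental_sql",
                      ScheduleParams(record_count=1_000_000,
                                     updates_per_day=24, priority=10))
```

Here `plan` carries 10 threads, a 3600-second trigger interval, and a retry threshold of 3, matching the example in the text.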
[0056] The data cleaning strategy unit, connected to the acquisition scheduling strategy unit, is used to extract from the metadata extraction unit the attributes of the key fields that match the optimized optimal data acquisition strategy of the same type of data source, and to generate the corresponding data cleaning rules. The data cleaning rules include at least one of the following: removing spaces, filtering special characters, standardizing date formats, and converting numerical units.
[0057] Specifically, the data cleaning strategy unit extracts the attributes of key fields that match the optimized optimal data collection strategy from the metadata extraction unit (the metadata extraction unit is a component of the data model parsing module and is used to extract the structured metadata of the data model). The specific content of the key field attributes can be found in the definition of key fields in the metadata extraction unit, such as field name and data type. Based on the extracted key fields, the corresponding data cleaning rules are generated.
[0058] The purpose of the data cleaning strategy unit is to ensure the standardization of collected data through data cleaning, laying the foundation for subsequent data processing. First, the data cleaning strategy unit receives the optimized data collection strategy output by the collection scheduling strategy unit and determines core information such as the data source type and the range of collected fields corresponding to the strategy. At the same time, it extracts the key fields matching the strategy from the metadata extraction unit (only the key fields covered by the strategy are extracted, to avoid redundant extraction). Second, based on the extracted key fields, it generates targeted cleaning rules. For example, if the extracted key field is "Field Name: Order Date, Data Type: Date, Format Requirement: YYYY-MM-DD", the following cleaning rule is generated: "Date Format Standardization (convert dates not in YYYY-MM-DD format to YYYY-MM-DD format)". If the extracted key field is "Field Name: Order Amount, Data Type: Numeric, Format Requirements: Keep 2 decimal places, Value Range: Greater than 0", the following cleaning rules are generated: "Remove Spaces (remove space characters from the amount field), Filter Special Characters (filter non-numeric and non-decimal-point special characters from the amount field), Convert Numerical Units (convert yuan/jiao/fen units to yuan units, keeping 2 decimal places)". Finally, the generated data cleaning rules are output. These cleaning rules are used together with the optimized optimal data collection strategy: data cleaning is performed synchronously during the collection task so that the collected data meets the constraints of the rule expression and the requirements of the data model, thereby improving data quality.
[0059] For example, the data cleaning strategy unit can generate data cleaning rules based on the characteristics of the parsed key fields, including removing spaces, filtering special characters, standardizing date formats (such as uniformly converting "2023/10/05" and "05-10-2023" to "2023-10-05"), and converting numerical units (such as converting "1kg" to "1000g").
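The cleaning rules listed above can be illustrated with a short Python sketch using only the standard library. The accepted date formats, the function names, and the unit table are assumptions for illustration; note in particular that "05-10-2023" is read as day-month-year, which is what the example conversion in the text implies.

```python
import re
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Candidate input formats, tried in order (assumed for this sketch).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y")

# Conversion factors to yuan: 1 jiao = 0.1 yuan, 1 fen = 0.01 yuan.
UNIT_TO_YUAN = {"yuan": Decimal("1"), "jiao": Decimal("0.1"),
                "fen": Decimal("0.01")}


def standardize_date(raw: str) -> str:
    """Remove whitespace, then normalize the date to YYYY-MM-DD."""
    text = re.sub(r"\s+", "", raw)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def clean_amount(raw: str) -> Decimal:
    """Drop spaces and every character that is not a digit or a decimal
    point, then keep two decimal places."""
    digits = re.sub(r"[^0-9.]", "", raw)
    return Decimal(digits).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def to_yuan(value: Decimal, unit: str) -> Decimal:
    """Convert a yuan/jiao/fen amount to yuan with two decimal places."""
    return (value * UNIT_TO_YUAN[unit]).quantize(Decimal("0.01"),
                                                 rounding=ROUND_HALF_UP)
```

For instance, `standardize_date("05-10-2023")` and `standardize_date("2023 / 10 / 05")` both yield `"2023-10-05"`. `Decimal` is used instead of `float` so that two-decimal amounts are not distorted by binary rounding.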
[0060] The WEB front-end data acquisition interface generation module is connected to the data acquisition strategy generation module and is used to generate an interactive WEB front-end data acquisition interface based on the rule expression and the optimal data acquisition strategy.
[0061] The WEB front-end data acquisition interface generation module is responsible for the visualization construction of the interface in the intelligent data model acquisition system based on the WEB front-end. Its function is to receive the rule expressions refined by the acquisition strategy generation module, as well as the optimized data acquisition strategy of the same type of data source output by the acquisition strategy generation module. Through the collaborative work of internal functional units, it generates an interactive WEB front-end data acquisition interface that adapts to business needs, is easy to operate, and is compatible with multiple devices. It provides users with an intuitive and efficient data acquisition entry point, ensuring the smoothness of the acquisition operation and the standardization of the acquired data.
[0062] Furthermore, the web front-end data collection interface generation module is connected to the data collection strategy generation module. Its inputs come from two sources. First, the optimized optimal data collection strategy output by the data collection strategy generation module (this strategy has refined the rule expression output by the data model parsing module, clarifying core information such as the scope of collection fields, field display priority, and adaptation requirements of collection scenarios). Second, the attributes of the key fields covered in the rule expression, obtained through an indirect connection with the data model parsing module via internal functional units (including basic information such as the importance, correlation, required attributes, and data type of key fields). This basic information is the basis for interface construction. The two inputs work together to ensure that the generated interactive web front-end data collection interface not only meets the constraints of the rule expression but also accurately adapts to the execution requirements of the optimal data collection strategy.
[0063] Specifically, the web front-end data collection interface generation module may include an interface layout engine and an intelligent control matching unit.
[0064] The interface layout engine is connected to the data model parsing module and the acquisition strategy generation module, respectively. It is used to receive the attributes of key fields in the rule expression and the optimized data acquisition strategy of the same type of data source. It uses an adaptive layout algorithm that combines grid layout and flow layout to group and lay out the data according to the importance of the key fields to obtain an interactive WEB front-end acquisition interface.
[0065] Specifically, the interface layout engine works by combining strategy requirements with key field attribute adaptation. By integrating dual input information, it completes the planning and adaptive adjustment of the interface layout, ensuring that the interface display logic is clear, the operation is convenient, and it is compatible with various terminal devices.
[0066] The interface layout engine first receives the optimized optimal data collection strategy output by the collection strategy generation module and extracts from it the collection field range, the field display priority (for example, core identity information fields and key business fields are displayed first), and the collection scenario adaptation requirements (for example, a simplified layout for batch collection scenarios and emphasis on required fields for precise collection scenarios), using these as the basis for layout planning. The collection field range is the list of key fields to be presented on the interface. This list can be generated from the structured model metadata list: the model metadata list is a complete metadata set containing the full attributes of all key fields, whereas the key field list contains only the key fields, extracted from the optimal data collection strategy, that need to be presented on the web front-end collection interface (with only core identification information such as field names). Only key fields that suit the current collection scenario and meet the requirements of the optimal collection strategy are kept; fields that do not need to be collected and displayed on the front end are excluded. Second, it receives the attributes of the key fields covered in the rule expression, which supplement the importance of each field (e.g., primary and secondary important fields, where primary fields are mandatory and secondary fields carry supplementary information), the relationships between fields (e.g., "province" is hierarchically related to "city" and "street," and "contact person's name" is related to "contact number"), and the required attributes (indicating which fields are mandatory so that they can be clearly marked on the interface).
Finally, based on adaptive layout principles and combining the advantages of grid and flow layouts (grid layouts fix the position of core fields to keep the interface structure neat; flow layouts adapt to different device display sizes, improving interface flexibility), the interface layout is planned according to preset rules: primary important fields are placed in the core area at the top of the interface, and secondary important fields are placed in the lower area; related fields are displayed in adjacent groups, such as grouping the "province," "city," "street," and "detailed address" fields related to "home address" into one group with a group title; mandatory fields are marked with a prominent "..." label next to the control to remind users to fill them in first. At the same time, the interface automatically adjusts the field arrangement, control size, and spacing based on the display resolution and screen size of different devices such as PCs, tablets, and mobile phones, ensuring proper display on all devices and convenient, unobstructed user operation. Ultimately, the engine outputs a complete interactive web front-end data collection interface layout, providing a basic framework for subsequent UI control embedding.
[0067] For example, in personal information collection scenarios, the optimized optimal collection strategy identifies "Name," "ID Number," and "Contact Information" as core collection fields that must be displayed first. The rule expression identifies these fields as first-level important fields, with "Name" and "ID Number" being associated required fields and "Contact Information" (including mobile phone number and email address) being a set of associated fields. Based on this, the interface layout engine places the "Name" and "ID Number" fields in the core area at the top of the interface, arranged adjacently and marked as required; the "Contact Information" field is placed below, with the mobile phone number and email address fields grouped and displayed adjacently. At the same time, the layout automatically adapts to the display size of mobile and PC devices, using a single-column layout on mobile devices and a double-column layout on PC devices to ensure convenient operation for users on different devices.
[0068] For example, the interface layout engine employs an adaptive layout algorithm combining GridLayout and FlexLayout. It groups and lays out fields based on their importance and relationships, prioritizing important fields and arranging related fields adjacently. For instance, core identity information fields such as "Name," "ID Number," and "Contact Information" are placed at the top of the screen, while related fields such as "Province," "City," and "Street" for "Home Address" are grouped and displayed. Furthermore, it supports responsive design, automatically adjusting the layout across different devices such as PCs, tablets, and mobile phones to ensure optimal display.
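The grouping and ordering logic described above can be sketched in Python. This is only a planning step that produces a layout description; it does not render grid or flex CSS. The `Field` class, the 768-pixel `mobile_breakpoint`, and the group titles are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Field:
    name: str
    importance: int        # 1 = primary, 2 = secondary
    group: str             # related fields share a group title
    required: bool = False


def plan_layout(fields: list[Field], viewport_width: int,
                mobile_breakpoint: int = 768) -> dict:
    # Single column on narrow (mobile) screens, two columns otherwise.
    columns = 1 if viewport_width < mobile_breakpoint else 2
    rank: dict = {}        # most important level found in each group
    first_seen: dict = {}  # input position of each group, used for ties
    for index, f in enumerate(fields):
        rank[f.group] = min(rank.get(f.group, f.importance), f.importance)
        first_seen.setdefault(f.group, index)
    # Important groups first; members of a group stay adjacent.
    ordered = sorted(
        fields,
        key=lambda f: (rank[f.group], first_seen[f.group], f.importance),
    )
    groups = [
        {"title": title,
         "fields": [{"name": f.name, "required": f.required}
                    for f in members]}
        for title, members in groupby(ordered, key=lambda f: f.group)
    ]
    return {"columns": columns, "groups": groups}


fields = [
    Field("Province", 2, "Home Address"),
    Field("Name", 1, "Identity", required=True),
    Field("ID Number", 1, "Identity", required=True),
    Field("City", 2, "Home Address"),
    Field("Mobile phone number", 1, "Contact Information"),
    Field("Email", 1, "Contact Information"),
]
mobile_layout = plan_layout(fields, viewport_width=375)
```

With this input, the identity group comes first, contact information second, and the secondary home address group last, regardless of the order in which the fields were supplied.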
[0069] The intelligent control matching unit, connected to the data model parsing module, is used to receive the data type of each key field and a preset mapping relationship library between data types and UI controls. Based on the mapping relationship library, it matches the corresponding UI controls to the key fields of different data types to obtain the UI control configuration for each key field. The mapping relationship library includes at least one of the mappings between the string, integer/floating-point number, enumeration, boolean, date/time, file, and array types and their corresponding UI controls.
[0070] Specifically, the intelligent control matching unit works by "precisely matching data types with UI controls". By matching a suitable UI input control to each key field, it makes data collection more convenient for users while ensuring that the format of the collected data conforms to the constraints of the rule expression, thus reducing data entry errors.
[0071] The intelligent control matching unit first extracts the data type of each key field from the rule expression (data type is one of the attributes of the key field, such as "name" as a string type (short text), "age" as an integer type, "gender" as an enumeration type, "date of birth" as a date type, "home address" as a string type (long text), "attached materials" as a file type, "emergency contact" as an array type, etc.), using this as the basis for UI control matching; secondly, it loads a preset mapping relationship library between data types and UI controls (this mapping relationship library is preset based on common data collection scenarios and user operation habits, and can be flexibly adjusted according to business needs); finally, based on the mapping relationship library, it accurately matches the corresponding UI controls for key fields of different data types, and generates UI control configurations containing control type, control size, control prompt information, input constraint rules, etc., which are synchronously output to the interface layout engine, which embeds each UI control into the corresponding position according to the planned layout scheme to form a complete interface prototype.
[0072] The preset mapping rules between data types and UI controls are as follows: string types (short text, such as name, mobile phone number, ID number) correspond to single-line text boxes, with input length limits and format validation prompts configured; string types (long text, such as home address, remarks) correspond to multi-line text boxes, supporting automatic line wrapping and scrolling input; integer/floating-point types (such as age, amount, quantity) correspond to numeric input boxes, with a step size configured (e.g., the step size for the amount field is 0.01, supporting two decimal places) and value range constraints; enumeration types (such as gender, education level, occupation) correspond to dropdown selection boxes or radio button groups, where dropdown selection boxes are suitable for scenarios with many enumeration values and radio button groups are suitable for scenarios with a small number of enumeration values (2-4); boolean types (such as whether married or whether an agreement is accepted) correspond to checkboxes or toggle buttons, with toggle buttons suitable for scenarios requiring a choice between two options and easy operation; date/time types (such as date of birth or collection time) correspond to date or time pickers, supporting quick selection of dates and times and avoiding format errors from manual input; file types (such as attachments or ID photos) correspond to file upload controls with preview functionality, supporting file format validation and previewing after upload; array types (such as emergency contacts or family member information) correspond to list controls whose items can be dynamically added and removed, allowing users to add or delete input items as needed and improving interface flexibility.
[0073] For example, the intelligent control matching unit establishes a mapping library between data types and UI controls. For instance, string types (short text) correspond to single-line text boxes, string types (long text) correspond to multi-line text boxes, integer / floating-point types correspond to numeric input boxes with configured step sizes, enumeration types correspond to drop-down selection boxes or radio button groups, boolean types correspond to check boxes or toggle buttons, date / time types correspond to date pickers or time pickers, file types correspond to file upload controls with preview functionality, and array types correspond to list controls that can be dynamically added or removed.
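A mapping library of this kind can be sketched as a plain dictionary plus a small matching function. The type keys, control identifiers, and configuration fields below are hypothetical names chosen for illustration; only the pairings themselves and the 2-4 option rule for radio button groups come from the text.

```python
# Data type -> UI control (identifiers are assumed names for this sketch).
CONTROL_LIBRARY = {
    "string": "single_line_text",
    "long_text": "multi_line_text",
    "integer": "number_input",
    "float": "number_input",
    "boolean": "checkbox",
    "date": "date_picker",
    "time": "time_picker",
    "file": "file_upload_with_preview",
    "array": "dynamic_list",
}

# Radio button groups suit 2-4 enumeration values; dropdowns suit more.
RADIO_MAX_OPTIONS = 4


def match_control(field: dict) -> dict:
    """Return a UI control configuration for one key field."""
    dtype = field["type"]
    config = {"field": field["name"],
              "required": field.get("required", False)}
    if dtype == "enum":
        options = field["options"]
        few = len(options) <= RADIO_MAX_OPTIONS
        config["control"] = "radio_group" if few else "dropdown"
        config["options"] = options
    elif dtype in CONTROL_LIBRARY:
        config["control"] = CONTROL_LIBRARY[dtype]
        if dtype == "float":
            # For example, 0.01 for an amount with two decimal places.
            config["step"] = field.get("step", 0.01)
        elif dtype == "integer":
            config["step"] = 1
    else:
        raise ValueError(f"no control mapped for type {dtype!r}")
    return config


gender = match_control({"name": "gender", "type": "enum",
                        "options": ["male", "female"]})
amount = match_control({"name": "amount", "type": "float", "required": True})
```

Keeping the library as data rather than code means the mapping can be adjusted for business needs without changing the matching function.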
[0074] In addition, the web front-end data collection interface generation module may also include an interaction logic implementation unit.
[0075] The interaction logic implementation unit is connected to the intelligent control matching unit and the data model parsing module. It is used to receive the UI control configuration and linkage rules of each key field, as well as the user's custom interaction logic configuration requirements. Through the JavaScript script generation module, it implements control linkage interaction, real-time validation of key field input, and custom linkage rule configuration. Combined with the interface layout from the interface layout engine and the UI control configuration, it generates a complete interactive WEB front-end data collection interface. The linkage interaction includes at least data loading linked to key field selection and condition-triggered showing or hiding of key fields.
[0076] Specifically, the interaction logic implementation unit is one of the functional units of the WEB front-end data collection interface generation module. Its role is to take over the results of interface layout and control configuration. By building adaptive interaction logic and verification mechanisms, it integrates the static interface layout and scattered UI controls into a complete interactive WEB front-end data collection interface that is easy to operate and has data verification capabilities. This ensures the smoothness of user data collection operations and the standardization of collected data. It is a key link connecting interface display and user operation.
[0077] The interaction logic implementation unit is connected to the intelligent control matching unit in order to receive the dedicated UI control configuration that this unit outputs for each key field, including core content such as control type, control size, and basic prompt information. These configurations are the basis for the subsequent implementation of control linkage and input validation, ensuring that the interaction logic accurately matches the control attributes (such as configuring linkage loading logic for drop-down selection boxes and format validation logic for text boxes).
[0078] The interaction logic implementation unit is connected to the data model parsing module. The purpose is to obtain the key field linkage rules covered in the rule expression (these rules are an important part of the rule expression and are derived from the data model parsing results), such as the hierarchical association between key fields and the conditional trigger association, to ensure that the interaction logic conforms to the constraints of the rule expression and avoids illegal interaction design.
[0079] The interaction logic implementation unit is indirectly associated with the interface layout engine: Although no direct connection is established, it receives the interface layout basics (including field arrangement order, grouping relationship, control embedding position, etc.) output by the interface layout engine to ensure that the implementation of the interaction logic is adapted to the interface layout (such as when the associated fields are arranged adjacently, the linkage interaction effect is more in line with the user's operating habits).
[0080] The interaction logic implementation unit can be implemented through the following steps:
Step 1: Data reception and verification. The unit receives the UI control configurations from the intelligent control matching unit, the key field linkage rules from the data model parsing module, and the user-defined interaction logic configuration requirements. Compatibility verification is performed on these three types of data to ensure that user-defined rules do not violate the rule expression constraints and that the interaction logic is compatible with the UI control types (e.g., linkage loading logic is configured only for dropdown selection boxes, and input validation logic is not configured for static text controls). If verification passes, the process proceeds to the next step; if verification fails, the user is prompted to adjust the custom requirements.
Step 2: Interaction logic conversion and script generation. Using the JavaScript script generation module, the validated linkage rules (including preset and user-defined rules) are converted into script code executable on the web front end. The generated scripts are of two main types. The first is control linkage interaction, which binds the association rules between fields to the corresponding UI controls; for example, the "country-province" linkage rule is bound to two dropdown selection controls so that the corresponding province data is loaded automatically after a country is selected. The second is real-time validation of key field input, which binds validation scripts to the fields requiring validation (such as phone number, email, and ID number) to provide real-time format validation and prompts during input (e.g., if the email address contains the "@" symbol, the prompt indicates that the format is correct; otherwise, the prompt reads "Please enter the correct email format").
Step 3: Interface integration and effect implementation. The generated interaction scripts are integrated with the interface layout output by the interface layout engine and the UI control configurations output by the intelligent control matching unit, and are bound to the corresponding interface controls and operation scenarios. For example, the "condition-triggered field show/hide" script is bound to the selection operation of the corresponding condition field, and the "real-time input validation" script is bound to the input operation of the corresponding field. At the same time, the interaction logic of basic operation buttons such as submit, reset, and save draft is integrated (e.g., on submit, checking whether all required fields have been filled in).
Step 4: Output of the complete interactive interface. After integration, a complete interactive web front-end data collection interface is generated. This interface has a well-organized layout, suitable controls, smooth interaction logic, and real-time input validation. Users experience linked fields, real-time error prompts, and personalized interactive responses. This keeps operation easy while reducing data entry error rates, and satisfies both the rule expression constraints and the optimal data collection strategy.
[0081] For example, the interaction logic implementation unit uses a JavaScript script generation module to achieve interactive linkage between UI controls in the UI layout engine. For instance, when the "Country" selection box changes, the "Province / State" selection box automatically loads the corresponding country's province data; when "Do you have children?" is selected as "Yes", fields such as "Number of children" and "Child information" are dynamically displayed; when the "Email" field is entered, format validation is performed in real time and prompts are provided. Customizable interaction logic configuration is supported, allowing users to configure the linkage rules between fields through a visual interface.
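The linkage behavior described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than as the generated front-end script, and the rule format, the `PROVINCES` table, and the function names are assumptions made for illustration; the application does not specify them.

```python
# Hypothetical linkage data; the application does not define a rule format.
PROVINCES = {
    "China": ["Beijing", "Jiangsu"],
    "USA": ["California", "Texas"],
}


def province_options(country: str) -> list:
    """Options to load into the "Province / State" box when "Country" changes."""
    return PROVINCES.get(country, [])


def visible_fields(form: dict) -> set:
    """Condition-triggered show/hide: child fields appear only for the answer "Yes"."""
    fields = {"country", "province", "has_children", "email"}
    if form.get("has_children") == "Yes":
        fields |= {"number_of_children", "child_information"}
    return fields
```

In a generated interface, `province_options` would be called from the change handler of the "Country" control and `visible_fields` from the handler of the "Do you have children?" control.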
[0082] The data acquisition execution module is connected to the WEB front-end acquisition interface generation module and is used to acquire target data in the interactive WEB front-end acquisition interface according to the optimal data acquisition strategy, and to verify and process the acquired target data.
[0083] The data acquisition execution module includes a data storage and processing unit and a front-end data input processing unit.
[0084] The data storage and processing unit is connected to the WEB front-end acquisition interface generation module. It is used to receive the optimized optimal data acquisition strategy, the front-end acquisition target data and the data source type, select the appropriate storage method according to the data source type, and perform field type mapping and encrypted storage of the target data.
[0085] Specifically, the data storage processing unit receives the target data collected from the front end, selects the appropriate storage method based on the data source type and the optimized data collection strategy, and completes the data type adaptation processing and secure encrypted storage. The goal is to ensure that the data storage conforms to the characteristics of the medium, the data format is standardized, and sensitive information is not leaked, thereby preventing problems such as data storage chaos, loss, or leakage.
[0086] The specific workflow of the data storage and processing unit is as follows: Step 1: After receiving the three types of core data (the optimized optimal data acquisition strategy, the front-end acquisition target data, and the data source type), the data storage and processing unit first performs a preliminary validity check on the front-end acquisition target data to confirm that it has passed front-end input validation (no format errors, out-of-range values, etc.). It also checks that the data source type is compatible with the optimized optimal data acquisition strategy, so that the choice of storage method has a clear basis. If the data is invalid, or the data source type and the strategy are incompatible, the unit returns an error message and suspends the storage process. Step 2: Storage Method Selection. Based on the data source type, a storage method that matches the characteristics of the storage medium is selected, following the requirements of the optimized optimal data acquisition strategy. The adaptation scenarios are as follows: (1) For relational database data sources: generate SQL insert / update statements that conform to database normalization, so that the data can be written to the database correctly; (2) For non-relational database data sources (such as MongoDB, Redis, etc.): organize the front-end collected target data according to the JSON format specification, so that field names and field values correspond one-to-one and the field format matches the storage structure of the non-relational database, which facilitates subsequent retrieval and reading; (3) For file-based data sources (such as Excel spreadsheets, PDF documents, text files, etc.): generate standardized file names (name format example: "collection batch-data source name-collection time-file extension", such as "20260121-personal information collection-09:30:00.xlsx") and plan fixed storage paths (organized by data source type and collection date, such as " / collection data / personal information / 20260121 / "), so that files are stored in an orderly way and can be retrieved quickly.
[0087] Step 3: Data Encryption and Storage. To ensure the security of sensitive data (such as core information like names, ID numbers, mobile phone numbers, and bank account numbers), compliant data after type mapping is encrypted using the AES-256 encryption algorithm. The encryption method can be a standard encryption method, encrypting only sensitive fields or the entire dataset (as required by the policy), while non-sensitive data can be stored directly.
[0088] Taking personal information collection as an example, the optimized optimal data collection strategy specifies that the data source type is a relational database, that sensitive data (ID number, mobile phone number) must be encrypted before storage, and that field type mapping follows the relational database data type specification. After the user enters data through the front-end interface, the data is verified by the front-end data input processing unit and transmitted to the data storage processing unit. The data storage processing unit first verifies the validity of the data and confirms that there are no format errors. Then, according to the relational database data source type, it generates an INSERT SQL statement, converting the "birth date (string: 2000-01-01)" transmitted by the front end into the DATE type and the "mobile phone number (string: 13800138000)" into the VARCHAR(11) type. Next, the ID number and mobile phone number are encrypted using the AES-256 encryption algorithm to generate ciphertext. Finally, the ciphertext and the other non-sensitive data are written into the relational database through the SQL statement, completing the storage process.
[0089] For another example, the data storage processing unit selects the appropriate storage method based on the data source type. For relational databases, it generates SQL insert / update statements that conform to database normalization and handles field type mapping (such as converting front-end string dates to database DATE type); for non-relational databases, it organizes data for storage according to JSON format; for file storage, it generates standardized file naming rules and storage paths. Data is encrypted before storage (using the AES-256 encryption algorithm) to ensure the security of sensitive data.
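The three storage paths can be sketched as follows. This is a minimal illustration under stated assumptions: the `person` table, the `source_type` labels, and all function names are hypothetical, SQLite stands in for the relational database, and the AES-256 encryption step is omitted (ciphertext would replace the sensitive values before the insert).

```python
import json
import sqlite3
from datetime import date


def store_relational(conn: sqlite3.Connection, record: dict) -> None:
    """Map front-end strings to column types and write with a parameterized INSERT."""
    # Raises ValueError on a malformed date, i.e. a field type mapping failure.
    birth = date.fromisoformat(record["birth_date"])
    conn.execute(
        "INSERT INTO person (name, birth_date, mobile) VALUES (?, ?, ?)",
        (record["name"], birth.isoformat(), record["mobile"]),
    )


def to_document(record: dict) -> str:
    """Non-relational source: one JSON document, field names mapped one-to-one."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def file_target(batch: str, category: str, source_name: str,
                collected_at: str, ext: str) -> tuple:
    """File source: (directory, file name) following the naming pattern in the text."""
    directory = f"/collection data/{category}/{batch}/"
    filename = f"{batch}-{source_name}-{collected_at}.{ext}"
    return directory, filename


def store(source_type: str, record: dict, conn=None):
    """Select the storage method from the data source type (labels are assumed)."""
    if source_type == "relational":
        store_relational(conn, record)
        return None
    if source_type == "non_relational":
        return to_document(record)
    raise ValueError(f"unsupported data source type: {source_type}")
```

Using a parameterized statement rather than string concatenation keeps user input out of the SQL text, which is one way to satisfy the requirement that data be written to the database correctly.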
[0090] The front-end data input processing unit is connected to the WEB front-end acquisition interface generation module. It is used to receive user input data from the interactive WEB front-end acquisition interface and verify the user input data against preset verification rules. If the verification passes, the user input data is accepted; otherwise, the user is prompted to correct the input data.
[0091] Specifically, the workflow of the front-end data input processing unit is as follows: Step 1: Loading and Initializing Validation Rules. After the front-end data input processing unit starts, it loads the preset validation rules matching the current collection scenario through the web front-end collection interface generation module (the specific rules can be set as needed; this application does not limit the specific validation rules), clarifies the validation standards and validation types corresponding to each key field, and completes the validation rule initialization; among them, the preset validation rules include at least three core validation types, as follows: (1) Format validation: For key fields with fixed format requirements (such as mobile phone number, email, ID card number, date, etc.), validate whether the input data conforms to the preset format specifications (e.g., mobile phone number must be 11 digits, email must contain "@" symbol and domain name suffix, ID card number must be 18 digits (including X), date must be in "YYYY-MM-DD" format, etc.). (2) Data range validation: For key fields with value range constraints (such as age, amount, quantity, etc.), validate whether the input data is within the preset valid range (e.g., age must be between 0 and 120, amount must be greater than 0 and less than or equal to 1,000,000, quantity must be a non-negative integer, etc.). (3) Data type validation: For key fields with explicit data type requirements (such as integers, floating-point numbers, strings, etc.), validate whether the input data type is consistent with the preset type. If they are inconsistent, perform a forced type conversion (for example, force the user-input string number "100" to be converted to the numeric integer 100, and force the string decimal "99.99" to be converted to the numeric floating-point number 99.99). If the conversion fails, it is determined to be invalid data. Step 2: Real-time input event listening. 
Through the front-end native event listening mechanism, capture user input operations in real time (such as keyboard input completion, mouse click confirmation, drop-down selection, etc.). For each key field, after each input operation is completed (or triggered in real time during input, according to the strategy configuration), immediately extract the current input data of the key field and trigger the validation process; Step 3: Data and rule comparison and verification. The extracted user input raw data is compared and verified one by one with the preset verification rules corresponding to the key field. The verification process is carried out in the order of "format verification → data type verification → data range verification" (the verification order can be adjusted according to the strategy), and the data is judged one by one to determine whether it meets each verification standard. Step 4: Verification result feedback and processing.
[0092] (1) If the verification passes: allow the user to continue to enter the content of subsequent key fields, or allow the user to submit the overall collected data (the front-end interface can hide the prompt information or display a slight "verification passed" sign, without interfering with the user's operation). (2) If the verification fails: Immediately provide clear correction prompts next to the UI control of the corresponding key field (or in a prominent position on the interface). The prompts should include the error type and correction criteria (e.g., "Incorrect mobile phone number format, please enter 11 digits", "Age exceeds the valid range (0-120), please re-enter", "Incorrect email format, must include @ symbol and domain name suffix"). At the same time, intercept the user's submission operation. Disable the "Submit" button or refuse to accept the submission request until the user enters data that meets the preset verification rules. Once the verification is successful, the submission interception will be lifted.
[0093] For example, the front-end data input processing unit listens for user input events in real time on the web front-end data collection interface and performs instant validation on the input data. It uses methods such as regular expression validation (e.g., phone number format, email format), data range validation (e.g., age must be between 0-120), and data type forced conversion (e.g., converting string-type numbers to numeric types) to promptly prompt the user to correct any data that does not meet the requirements, thus reducing invalid submissions.
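The verification order "format verification → data type verification → data range verification" can be sketched as below. The `RULES` table, the field names, and the prompt wording are assumptions for illustration; the application leaves the concrete rules open, and a real front end would run equivalent checks in the browser.

```python
import re

# Hypothetical rule table: pattern for format, type for conversion, min/max for range.
RULES = {
    "mobile": {"pattern": r"\d{11}"},
    "email": {"pattern": r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"},
    "age": {"pattern": r"-?\d+", "type": int, "min": 0, "max": 120},
}


def validate(field: str, raw: str) -> tuple:
    """Return (True, converted value) or (False, correction prompt)."""
    rule = RULES[field]
    # 1. Format verification with a regular expression.
    if "pattern" in rule and not re.fullmatch(rule["pattern"], raw):
        return False, f"Incorrect {field} format"
    # 2. Data type verification with forced conversion, e.g. "100" -> 100.
    value = raw
    if "type" in rule:
        try:
            value = rule["type"](raw)
        except ValueError:
            return False, f"{field} must be of type {rule['type'].__name__}"
    # 3. Data range verification.
    if "min" in rule and not (rule["min"] <= value <= rule["max"]):
        return False, f"{field} exceeds the valid range ({rule['min']}-{rule['max']})"
    return True, value
```

A submit handler would call `validate` for every key field and keep the "Submit" button disabled while any call returns `False`.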
[0094] In addition, the data acquisition execution module may also include an acquisition log recording unit.
[0095] The acquisition log recording unit is connected to the data storage and processing unit and the front-end data input processing unit, respectively. It is used to receive key information during the data collection process, record the collection time, collection batch, collection quantity, number of successful collections, number of failed collections, reason for failure, operating user, and data source information, and store the log data in a structured manner.
[0096] Specifically, the acquisition log recording unit receives, in real time, the key information that the data storage and processing unit and the front-end data input processing unit generate while they operate, so that the log covers the entire collection process without omission.
[0097] Furthermore, the acquisition log recording unit is connected to the front-end data input processing unit to receive key information from the front-end data input verification process. This information includes information related to the user input operation (user account / name, terminal IP address, and timestamp) and information related to input verification (the name of the verified field, the verification result (pass / fail), the error type and reason for failure (e.g., format error, out-of-range value, type mismatch), and the number of fields that passed / failed verification). The front-end data input processing unit transmits all of this information synchronously in real time, so that the logs can trace every key step of the front-end input verification process.
[0098] The acquisition log recording unit is also connected to the data storage and processing unit, from which it receives key information about the data storage process. This includes information related to the storage operation (storage start time, storage end time, storage medium type (relational database / non-relational database / file), and data source information (data source name, data source address, data source type)) and information related to the storage result (total number of collections, number of successful storages, number of failed storages, reasons for failure (such as field type mapping failure, encryption failure, storage medium connection abnormality, or insufficient permissions), and collection batch number). The data storage and processing unit transmits this information synchronously while the storage process executes, so that the logs can trace the status of the entire data storage process.
[0099] It should be noted that all key information received by the acquisition log recording unit is associated with the core data of the data acquisition execution module (the optimized optimal data collection strategy and the front-end collection target data). The log data contains information such as the collection batch number and the strategy association identifier, which makes it possible to trace log data back to the corresponding collection tasks and collection strategies.
[0100] For example, the acquisition log recording unit records key information in detail during the data collection process, including collection time, collection batch, collection quantity, number of successful collections, number of failed collections, reason for failure, operating user, data source information, etc. The log data is stored in a structured manner and supports querying and statistical analysis by time, type, status and other dimensions.
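One possible shape for such a structured log entry is sketched below. The class name, field names, and the `status` convention are assumptions; the application only lists the information to be recorded.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CollectionLog:
    """One structured log entry; field names are illustrative."""
    collection_time: str
    batch: str
    total: int
    succeeded: int
    failed: int
    failure_reason: str
    operator: str
    data_source: str

    @property
    def status(self) -> str:
        # Assumed convention: any failed item marks the whole batch as failed.
        return "failed" if self.failed else "success"


def to_json(log: CollectionLog) -> str:
    """Serialize an entry for structured storage."""
    return json.dumps(asdict(log), ensure_ascii=False)


def query(logs: list, *, status=None, batch=None) -> list:
    """Filter entries by status and/or collection batch."""
    return [
        log for log in logs
        if (status is None or log.status == status)
        and (batch is None or log.batch == batch)
    ]
```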
[0101] In one embodiment, an intelligent data model acquisition system based on a web front-end may further include a data model update monitoring and adaptive module, which is connected to the data model parsing module, the acquisition strategy generation module, the web front-end acquisition interface generation module, and the data acquisition execution module, respectively. This module is used to detect changes in the data model and, based on the changes, drive the data model parsing module, the acquisition strategy generation module, the web front-end acquisition interface generation module, and the data acquisition execution module to make adaptive adjustments.
[0102] Furthermore, the data model update monitoring and adaptive module includes a model change monitoring unit, a change analysis unit, a data acquisition strategy adjustment unit, and an interface dynamic update unit.
[0103] Specifically, the model change monitoring unit is connected to the data model parsing module and is used to monitor the rule expression output by the data model parsing module to obtain model change notifications.
[0104] The model change monitoring unit is responsible for real-time monitoring of the rule expressions (including a structured list of model metadata) output by the data model parsing module, accurately capturing data model change signals, generating standardized data model change notifications, providing a trigger basis for subsequent difference analysis and adjustments to various modules, and ensuring that data model changes are identified in a timely manner.
[0105] The model change monitoring unit connects with the data model parsing module to obtain its output rule expressions and structured model metadata list in real time (the core carrier of the data model, in which any changes to the model will be directly reflected).
[0106] The model change monitoring unit works as follows: Step 1: After the model change monitoring unit starts, it loads the preset monitoring configuration parameters, including the rule expression comparison cycle (such as once a day, adjustable according to system needs), the change judgment criteria (such as addition or deletion of rule expression content, or changes in metadata list field attributes), and the standardized format of the model change notification. At the same time, the rule expression and metadata list currently output by the data model parsing module are cached as the baseline data for subsequent comparisons. Step 2: At each preset cycle, the unit obtains the latest rule expressions and metadata list from the data model parsing module and compares them with the locally cached baseline data field by field and attribute by attribute to determine whether the data model has changed. Step 3: If the comparison shows that the latest data is completely consistent with the baseline data, it is determined that there is no model change, the parsing timestamp of the local cache is updated, and the next round of comparison is awaited; if the comparison shows differences (such as field addition, type modification, or constraint adjustment), it is determined that the data model has changed, and the change notification generation process is triggered. Step 4: A standardized model change notification is generated according to the preset format. The core content of the notification may include the change trigger time, the model association identifier, the change perception source (rule expression comparison in the data model parsing module), and a preliminary change signal (indicating only that a change exists, without the specific change content), so that the subsequent change analysis unit can quickly obtain the basic information. After generation, the change notification is transmitted synchronously to the change analysis unit, and the locally cached baseline data is replaced with the latest rule expression and metadata list, laying the foundation for the next round of monitoring and comparison.
[0107] For example, the model change monitoring unit adopts a monitoring mechanism that combines timed polling with event notification. It periodically polls the data model storage address (such as a version control system or model server) and compares the hash value or version number of the model file; at the same time, it receives model change event notifications sent by the data source (based on the WebHook mechanism), and immediately triggers the model update process when a model change is detected.
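The polling half of this mechanism can be sketched with a hash comparison. The model is assumed here to be a JSON-serializable dictionary, and the class, method, and notification key names are hypothetical; the WebHook path is not shown.

```python
import hashlib
import json


def fingerprint(model: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not affect the hash."""
    canonical = json.dumps(model, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ModelChangeMonitor:
    def __init__(self, baseline: dict, model_id: str):
        self.model_id = model_id
        self.baseline = baseline
        self._baseline_hash = fingerprint(baseline)

    def poll(self, latest: dict, now: str):
        """Compare the latest model with the baseline; return a notification or None."""
        latest_hash = fingerprint(latest)
        if latest_hash == self._baseline_hash:
            return None  # no change: wait for the next cycle
        previous, self.baseline = self.baseline, latest
        self._baseline_hash = latest_hash
        # The notification signals only that a change exists, not what changed.
        return {
            "triggered_at": now,
            "model_id": self.model_id,
            "source": "rule expression comparison",
            "changed": True,
            "previous_hash": fingerprint(previous),
        }
```

A scheduler would call `poll` once per comparison cycle, while a WebHook handler could call the same method as soon as the data source reports a change.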
[0108] The change analysis unit, connected to the model change monitoring unit, is used to receive the old and new data models, perform a difference comparison analysis on the old and new data models through a tree structure comparison algorithm, identify the changes, and generate a change report.
[0109] Specifically, the change analysis unit is connected to the model change monitoring unit, from which it receives the model change notification (providing basic information such as the change trigger time and the model association identifier) together with the data models before and after the change. The data before the change is the baseline data cached by the model change monitoring unit, i.e., the rule expression and structured model metadata list before the change; the data after the change is the latest rule expression and metadata list synchronized by the model change monitoring unit.
[0110] The model change notification, the rule expression before the change (including the metadata list), and the rule expression after the change (including the metadata list) are used together as the basis for the difference comparison analysis.
[0111] In essence, the analysis compares the current data with the historical data to determine the specific differences. Because a model change notification is generated only when the monitoring comparison finds a difference, receipt of a notification by the change analysis unit implies that the current data and the historical data differ.
[0112] For example, the change analysis unit performs a comparative analysis of the differences between the old and new data models. Through a tree-structured comparison algorithm, it identifies changes such as the addition of fields, deletion of fields, changes in field types, modification of field constraints, and adjustment of relationships, and generates a detailed change report.
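A recursive comparison over a tree-shaped model can be sketched as follows. The node layout (a dictionary with optional `type`, `constraints`, and `children` keys) and the change labels are assumptions for illustration; the application does not specify the comparison algorithm in detail, and relationship adjustments are not covered here.

```python
def diff_models(old: dict, new: dict, path: str = "") -> list:
    """Walk both trees together and list (change type, field path) pairs."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        here = f"{path}/{name}"
        if name not in new:
            changes.append(("field_deleted", here))
        elif name not in old:
            changes.append(("field_added", here))
        else:
            o, n = old[name], new[name]
            if o.get("type") != n.get("type"):
                changes.append(("type_changed", here))
            if o.get("constraints") != n.get("constraints"):
                changes.append(("constraint_modified", here))
            # Recurse into nested fields so changes at any depth are reported.
            changes += diff_models(o.get("children", {}), n.get("children", {}), here)
    return changes
```

The returned list can serve as the detailed part of a change report, with each entry naming the change type and the full path of the affected field.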
[0113] The data acquisition strategy adjustment unit is connected to both the change analysis unit and the data acquisition strategy generation module. It is used to receive the change report and issue instructions to the data acquisition strategy generation module to update the data acquisition strategy.
[0114] Specifically, the strategy adjustment unit is connected to the change analysis unit, receives the change reports transmitted by the unit, and obtains the change type, detailed list, scope of impact, and adjustment suggestions, which serve as the sole basis for generating strategy update instruction information. The data acquisition strategy adjustment unit is connected to the data acquisition strategy generation module: it sends strategy update instructions to the module and receives the strategy update status (success / failure) from the module, ensuring a closed loop in the strategy update process.
[0115] For example, based on the content of the model change, the data acquisition strategy adjustment unit causes the data acquisition strategy generation module to update the acquisition strategy automatically. For newly added fields, corresponding acquisition control configurations and validation rules are generated; for changes in field type, the data input controls and data processing rules are updated; for deleted fields, the fields are marked as obsolete in the acquisition strategy and the related acquisition operations are stopped. To this end, the data acquisition strategy adjustment unit sends instruction information to the data acquisition strategy generation module, which may specify both how to adjust and what has changed.
[0116] For another example, after receiving the change report transmitted by the change analysis unit, the data collection strategy adjustment unit analyzes the change type, detailed list, scope of impact, and adjustment suggestions one by one to determine the strategy adjustment required for each type of change. It also takes into account the current strategy status fed back by the data collection strategy generation module, to ensure that the instruction information is relevant and feasible. Based on the adjustment requirements, standardized strategy update instruction information is generated according to the change type. The instruction types and their contents are as follows: (1) For a "field addition" change: generate a "new collection strategy configuration instruction", which instructs the collection strategy generation module to configure, for the new field, the corresponding collection control type (e.g., a single-line text box for a string field), data validation rules (e.g., format validation, non-empty validation), and data processing rules (e.g., type mapping rules that meet subsequent storage requirements), and to integrate the new configuration into the original collection strategy; (2) For a "field deletion" change: generate an "obsolete collection strategy configuration instruction", which instructs the collection strategy generation module to mark the configuration item of the deleted field as obsolete in the original collection strategy and to stop the related collection configuration of that field (e.g., no longer generating the collection control configuration and deleting the corresponding verification rule); (3) For a "field type change" change: generate an "update collection strategy configuration instruction", which instructs the collection strategy generation module to update the collection control type of the corresponding field (e.g., a numeric input box for an integer field), the data processing rules (e.g., adjusted type conversion rules), and the verification rules (e.g., value range verification adapted to the new type); (4) For a "field constraint modification / association relationship adjustment" change: generate an "optimized collection strategy configuration instruction", which instructs the collection strategy generation module to update the verification rules of the corresponding field (e.g., value range adjustment, required-field verification changes) and the collection linkage rules of the associated fields (e.g., the linkage logic after the association relationship is adjusted).
[0117] The generated strategy update instruction information is checked for compliance and, once the check passes, is sent to the data acquisition strategy generation module. The data acquisition strategy adjustment unit then receives the strategy update status (success / failure) from the data acquisition strategy generation module. If the update succeeds, the updated data acquisition strategy information (such as new verification rules and control configuration items) is transmitted synchronously to the data acquisition execution module to drive it to adjust execution logic such as input verification and data processing. If the update fails, the unit issues exception information (including the reason for failure, such as an instruction compatibility issue) so that maintenance personnel can intervene and investigate.
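The mapping from change type to instruction type, together with a simple compliance check, can be sketched as below. The change report is assumed to be a list of (change type, field) pairs, and the change labels and function name are hypothetical; only the four instruction names come from the text.

```python
# Instruction names follow the text; the change-type keys are assumed labels.
INSTRUCTION_FOR_CHANGE = {
    "field_added": "new collection strategy configuration instruction",
    "field_deleted": "obsolete collection strategy configuration instruction",
    "type_changed": "update collection strategy configuration instruction",
    "constraint_modified": "optimized collection strategy configuration instruction",
    "relationship_adjusted": "optimized collection strategy configuration instruction",
}


def build_instructions(change_report: list) -> list:
    """Turn each reported change into one standardized strategy update instruction."""
    instructions = []
    for change_type, field in change_report:
        # Compliance check: reject change types that have no defined instruction.
        if change_type not in INSTRUCTION_FOR_CHANGE:
            raise ValueError(f"no instruction defined for change type: {change_type}")
        instructions.append(
            {"instruction": INSTRUCTION_FOR_CHANGE[change_type], "field": field}
        )
    return instructions
```

If `build_instructions` raises, nothing is sent to the strategy generation module, which corresponds to the failure branch in which exception information is issued for maintenance personnel.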
[0118] The interface dynamic update unit is connected to the acquisition strategy adjustment unit and the WEB front-end acquisition interface generation module. It is used to receive updated acquisition strategies and update the interface controls of the WEB front-end acquisition interface generation module using an incremental update method.
[0119] The interface dynamic update unit is connected to the data collection strategy adjustment unit, and receives the updated data collection strategy transmitted by it (focusing on obtaining information such as the changed data collection control configuration, verification rules, and field association relationships), which serves as the core basis for front-end interface updates.
[0120] The interface dynamic update unit connects with the WEB front-end acquisition interface generation module: it sends interface update driving instructions to the module and receives feedback on the interface update status (success / failure, update progress) to ensure a closed loop in the interface update process.
[0121] For example, the web front-end data collection interface generation module dynamically updates the interface controls according to the updated data collection strategy, allowing for interface adjustments without requiring the user to refresh the page. Incremental update technology is used, updating only the changed interface elements, reducing resource consumption.
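The incremental update can be pictured as computing a minimal set of control operations between the current and the updated control configurations. The configuration shape (field name mapped to a control configuration dictionary) and the operation names are assumptions for illustration.

```python
def incremental_patch(current: dict, updated: dict) -> list:
    """Return only the operations needed to turn the current controls into the updated ones."""
    ops = []
    # Controls whose fields no longer exist are removed.
    for field in sorted(current.keys() - updated.keys()):
        ops.append(("remove", field, None))
    for field, config in updated.items():
        if field not in current:
            ops.append(("add", field, config))
        elif current[field] != config:
            ops.append(("replace", field, config))
        # Unchanged controls produce no operation and are left untouched.
    return ops
```

Applying only these operations to the rendered interface leaves unchanged controls, and any values the user has already typed into them, in place.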
[0122] In addition, the data model update monitoring and adaptive module also includes a historical data migration unit, which is connected to the change analysis unit and the data acquisition and execution module respectively. The historical data migration unit is used to receive the change report, extract the historical data and corresponding change rules involved in the data model change, and perform migration processing on the historical data according to the change rules.
[0123] Specifically, the historical data migration unit extracts the historical data (stored in the storage medium associated with the data acquisition execution module, such as the data storage and processing unit) and the corresponding change migration rules involved in the model change from the report. Based on the change type (field type change, field splitting, field merging, etc.), the historical data is migrated in a targeted manner according to the preset migration rules. During the migration process, data consistency verification is carried out in real time (including data integrity, accuracy and format compatibility verification before and after migration) to ensure that the historical data after migration is completely matched with the changed data model and the updated acquisition strategy and is accurate.
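A field type change migration with a basic consistency check can be sketched as follows. The row format, the function names, and the choice to collect failing rows rather than abort are assumptions; field splitting and merging would need their own rules.

```python
def migrate_type_change(rows: list, field: str, convert) -> tuple:
    """Convert one field in every historical row; return (migrated rows, failed rows)."""
    migrated, failed = [], []
    for row in rows:
        try:
            migrated.append({**row, field: convert(row[field])})
        except (KeyError, TypeError, ValueError):
            # The row cannot be mapped to the new type and needs manual handling.
            failed.append(row)
    return migrated, failed


def is_consistent(before: list, migrated: list, failed: list) -> bool:
    """Integrity check: every historical row is accounted for after migration."""
    return len(before) == len(migrated) + len(failed)
```

For instance, calling `migrate_type_change(rows, "age", int)` converts string ages to integers and sets aside rows such as `{"age": "n/a"}` for review.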
[0124] In one embodiment, an intelligent data model acquisition system based on a web front-end further includes a system management and monitoring module, which is connected to the data model parsing module, the acquisition strategy generation module, the web front-end acquisition interface generation module, and the data acquisition execution module, respectively, for user management, access control, operation monitoring, and log analysis.
[0125] The system management and monitoring module includes a user and permission management unit, an operation status monitoring unit, and a log analysis and reporting unit.
[0126] Furthermore, the user and permission management unit is connected to the WEB front-end data collection interface generation module, the data collection strategy generation module, and the data collection execution module, respectively, and is used to manage user permissions based on the role-based access control model.
[0127] Specifically, the user and permission management unit is connected to the WEB front-end data collection interface generation module. First, it pushes permission configuration information for different roles, driving the front-end interface to display only the functions the user may operate (for example, hiding buttons and menus for which the user has no permission). Second, it receives login and operation requests from front-end users and performs permission verification. The user and permission management unit is connected to the data collection strategy generation module, to which it pushes strategy configuration permission information, controlling user permissions for adding, modifying, and deleting data collection strategies. The user and permission management unit is also connected to the data acquisition execution module, to which it pushes data operation permission information, managing user permissions such as viewing and exporting collected data and accessing logs.
[0128] The operation status monitoring unit is connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, and is used to monitor the operation status of the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module through preset operation status monitoring thresholds.
[0129] Specifically, the operation status monitoring unit establishes connections with the data model parsing module, the data acquisition strategy generation module, the web front-end data acquisition interface generation module, and the data acquisition execution module, and monitors the real-time operation status data of each module. For example, hardware resource indicators: CPU utilization, memory usage, disk storage space, and network bandwidth usage; module operation indicators: response time of each module, task processing throughput, and interface call success rate; business operation indicators: data acquisition task execution progress, task success rate, data source connection status, and interface update success rate.
[0130] The log analysis and reporting unit is connected to the data model parsing module, the collection strategy generation module, the web front-end collection interface generation module, and the data collection execution module, respectively. It is used to analyze the logs generated by the data model parsing module, the collection strategy generation module, the web front-end collection interface generation module, and the data collection execution module, and generate various core reports.
[0131] Specifically, the log analysis and reporting unit handles two types of logs. Data collection logs are generated by the data collection execution module and include information such as the collection task number, execution time, number of records collected, number of records collected successfully, reason for failure, and operating user. System operation logs are jointly generated by the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, and include module start / stop times, interface call records, exception and error messages, configuration modification records, and other information.
[0132] For example, the system management and monitoring module is responsible for user management, access control, operation monitoring, and log analysis of the system.
[0133] User and Access Control: Based on the RBAC (Role-Based Access Control) model, this system implements user registration, login, role assignment, and access control configuration. Different roles are assigned different access permissions; for example, administrators have system configuration and user management permissions, data collectors have data collection and viewing permissions, and auditors have log viewing permissions, ensuring system operational security.
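The role-based check described above can be sketched as follows. This is a minimal illustration only: the role names, permission strings, and the helpers hasPermission and visibleControls are hypothetical and do not appear in the application.

```javascript
// Hypothetical role-to-permission table; names are illustrative assumptions
// modelled on the administrator / data collector / auditor roles above.
const ROLE_PERMISSIONS = new Map([
  ['administrator', ['system:configure', 'user:manage']],
  ['collector', ['data:collect', 'data:view']],
  ['auditor', ['log:view']],
]);

// Returns true when at least one of the user's roles grants the permission.
function hasPermission(user, permission) {
  return user.roles.some(role =>
    (ROLE_PERMISSIONS.get(role) ?? []).includes(permission)
  );
}

// Keeps only the interface controls the user is allowed to operate,
// which is how buttons and menus without permission would be hidden.
function visibleControls(user, controls) {
  return controls.filter(control => hasPermission(user, control.requires));
}

const auditor = { name: 'alice', roles: ['auditor'] };
const controls = [
  { id: 'menu-view-logs', requires: 'log:view' },
  { id: 'btn-edit-strategy', requires: 'system:configure' },
];
console.log(visibleControls(auditor, controls).map(c => c.id));
```

In this sketch the same hasPermission check would serve both the interface filtering and the server-side verification of operation requests, so the two cannot drift apart.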
[0134] Operational Status Monitoring: The system's operational status is displayed in real time through a monitoring dashboard, including metrics such as CPU utilization, memory usage, network bandwidth, module response time, data acquisition task execution progress, and data source connection status. A threshold alarm mechanism is configured to promptly notify the administrator via email, SMS, and in-system notifications when metrics exceed thresholds.
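A threshold alarm check of this kind might look like the sketch below. The metric names, limit values, and the function checkThresholds are assumptions made for illustration; the application does not specify them, and notification delivery (email, SMS, in-system) is left out.

```javascript
// Hypothetical thresholds; metric names and limits are assumptions.
const THRESHOLDS = {
  cpuUtilization: { max: 0.9 },
  memoryUsage: { max: 0.85 },
  interfaceSuccessRate: { min: 0.99 },
};

// Compares one metrics sample against the thresholds and returns
// one alarm record per violated limit.
function checkThresholds(sample, thresholds = THRESHOLDS) {
  const alarms = [];
  for (const [metric, limit] of Object.entries(thresholds)) {
    const value = sample[metric];
    if (typeof value !== 'number') continue; // metric not reported in this sample
    if (limit.max !== undefined && value > limit.max) {
      alarms.push({ metric, value, reason: `above ${limit.max}` });
    }
    if (limit.min !== undefined && value < limit.min) {
      alarms.push({ metric, value, reason: `below ${limit.min}` });
    }
  }
  return alarms;
}

const sample = { cpuUtilization: 0.95, memoryUsage: 0.5, interfaceSuccessRate: 0.97 };
console.log(checkThresholds(sample).map(a => `${a.metric} ${a.reason}`));
```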
[0135] Log analysis and reporting: Multi-dimensional analysis is performed on the data collection logs and system operation logs to generate data collection efficiency reports (such as daily collection volume and average collection time), data quality reports (such as data error rate and error type distribution), and system performance reports (such as module response time trends), providing data support for system optimization and decision-making.
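As a rough illustration of how such a report could be derived from collection logs, the sketch below aggregates entries per day. The log field names (date, collected, succeeded, durationMs) and the function buildDailyReport are assumed for the example and are not defined in the application.

```javascript
// Builds a per-day efficiency and quality report from collection log entries.
// The entry fields used here are illustrative assumptions.
function buildDailyReport(logs) {
  const byDay = new Map();
  for (const entry of logs) {
    const day = byDay.get(entry.date) ??
      { tasks: 0, collected: 0, succeeded: 0, durationMs: 0 };
    day.tasks += 1;
    day.collected += entry.collected;
    day.succeeded += entry.succeeded;
    day.durationMs += entry.durationMs;
    byDay.set(entry.date, day);
  }
  return [...byDay].map(([date, d]) => ({
    date,
    collected: d.collected,                      // daily collection volume
    averageDurationMs: d.durationMs / d.tasks,   // average collection time per task
    errorRate: d.collected > 0 ? (d.collected - d.succeeded) / d.collected : 0,
  }));
}

const logs = [
  { date: '2026-01-01', collected: 100, succeeded: 98, durationMs: 2000 },
  { date: '2026-01-01', collected: 100, succeeded: 100, durationMs: 4000 },
  { date: '2026-01-02', collected: 50, succeeded: 50, durationMs: 1000 },
];
console.log(buildDailyReport(logs));
```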
[0136] Example: Taking a financial institution's customer information collection system as an example. Application scenario: A commercial bank needs to collect basic information, financial information, and risk assessment information of individual customers. The customer information model must comply with regulatory requirements and will change with policy adjustments and business expansion, such as adding a "personal tax residency declaration" field and refining the "occupation information" dimension.
[0137] System features: Emphasis is placed on data security and compliance, sensitive information must be stored in encrypted form, the data collection process must be traceable and auditable, and both online data collection by account managers and self-service data collection by customers are supported.
[0138] Data Model Parsing: Parse the XML model file containing fields such as customer name (string, required), ID number (string, must conform to 18-digit verification rules), annual income (numerical, tiered enumeration), and risk tolerance (enumeration type, conservative / moderate / aggressive). The focus is on extracting the ID number verification rules and the logical relationships between risk assessment fields (such as the matching rules between annual income and risk tolerance).
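The application does not spell out the 18-digit verification rule. Assuming it refers to the standard check character of the Chinese resident identity number (weighted sum of the first 17 digits modulo 11, per GB 11643-1999), a validator could be sketched as below. The function name isValidIdNumber is hypothetical, and the sketch checks only format and check character, not the region code or birth date.

```javascript
// Weights and check characters of the 18-digit resident ID checksum
// (assumed here to be the rule the model file refers to).
const ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
const ID_CHECK_CHARS = '10X98765432';

// Returns true when the value has 17 digits plus a matching check character.
function isValidIdNumber(id) {
  const value = String(id).toUpperCase();
  if (!/^\d{17}[\dX]$/.test(value)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += Number(value[i]) * ID_WEIGHTS[i];
  }
  return ID_CHECK_CHARS[sum % 11] === value[17];
}

console.log(isValidIdNumber('11010519491231002X')); // widely used sample number
console.log(isValidIdNumber('110105194912310021')); // wrong check character
```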
[0139] Secure data collection strategy: The ID number input field is masked (only the first and last characters are displayed), and the public security interface is called for real-name verification after input; sensitive fields are transmitted using HTTPS encryption and stored using the AES-256 encryption algorithm; the collection log records detailed operation traces, including input modification history.
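The masking rule described above (only the first and last characters displayed) can be illustrated with a small helper. The function name maskIdNumber and the choice of '*' as the mask character are assumptions; encryption and real-name verification are outside this sketch.

```javascript
// Replaces every character except the first and last with a mask character.
// Values of two characters or fewer are returned unchanged.
function maskIdNumber(id, maskChar = '*') {
  const value = String(id);
  if (value.length <= 2) return value;
  return value[0] + maskChar.repeat(value.length - 2) + value[value.length - 1];
}

console.log(maskIdNumber('11010519491231002X'));
```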
[0140] Model Change Handling: When regulatory requirements add an "Anti-Money Laundering Risk Level" field, the system detects the model change, automatically adds a drop-down selection box (low / medium / high) for this field to the data collection interface, and generates correlation verification rules with other risk fields; at the same time, it marks existing customer data and prompts account managers to collect information for this field.
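A simplified view of this change handling is sketched below: the old and new field lists are compared, and a control configuration is generated for each added field. The model shape, the CONTROL_BY_TYPE mapping, and the helpers diffFields and controlFor are all hypothetical. The comparison here is a flat, name-based one, not the tree structure comparison algorithm recited in the claims.

```javascript
// Hypothetical mapping from field data type to UI control.
const CONTROL_BY_TYPE = {
  string: 'text', number: 'number', enum: 'select', boolean: 'checkbox', date: 'date',
};

// Compares two model versions by field name and reports added and removed fields.
function diffFields(oldModel, newModel) {
  const oldNames = new Set(oldModel.fields.map(f => f.name));
  const newNames = new Set(newModel.fields.map(f => f.name));
  return {
    added: newModel.fields.filter(f => !oldNames.has(f.name)),
    removed: oldModel.fields.filter(f => !newNames.has(f.name)),
  };
}

// Produces the control configuration for one field; enumerations become
// drop-down selection boxes that carry their option list.
function controlFor(field) {
  const control = {
    name: field.name,
    control: CONTROL_BY_TYPE[field.type] ?? 'text',
    required: Boolean(field.required),
  };
  if (field.type === 'enum') control.options = [...field.values];
  return control;
}

const oldModel = { fields: [{ name: 'riskTolerance', type: 'enum', values: ['conservative', 'moderate', 'aggressive'] }] };
const newModel = { fields: [
  ...oldModel.fields,
  { name: 'amlRiskLevel', type: 'enum', values: ['low', 'medium', 'high'], required: true },
] };
console.log(diffFields(oldModel, newModel).added.map(controlFor));
```

Only the added controls are generated, which mirrors the incremental update idea: existing controls are left untouched.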
[0141] Implementation results: The completeness rate of customer information collection has increased to 98%, the data error rate has been reduced to 0.2%, meeting regulatory compliance requirements; the response time for model changes has been shortened from 3 days to 15 minutes, reducing technical maintenance costs by approximately 500,000 yuan per year.
[0142] Example: Algorithms and design logic for an intelligent data model acquisition system based on a web front-end.
1. Data extraction algorithm
The core of data extraction is to extract the required fields or information from the original data source according to rules, supporting multiple extraction modes:
[0143]
/**
 * Data extraction algorithm
 * @param {Array|Object} data - Raw data
 * @param {Object|Array} rules - Extraction rules
 * @param {String} mode - Extraction mode: 'include' or 'exclude'
 * @returns {Array|Object} Extracted data
 */
function extractData(data, rules, mode = 'include') {
  // Process array data
  if (Array.isArray(data)) {
    return data.map(item => extractData(item, rules, mode));
  }
  // Process object data
  if (typeof data === 'object' && data !== null) {
    const result = {};
    const keys = Object.keys(data);
    if (mode === 'include') {
      // Include mode: keep only the fields specified in the rules
      if (Array.isArray(rules)) {
        // Simple field-list rules
        rules.forEach(key => {
          if (keys.includes(key)) {
            result[key] = data[key];
          }
        });
      } else if (typeof rules === 'object' && rules !== null) {
        // Complex rules, supporting nested extraction
        Object.keys(rules).forEach(key => {
          if (keys.includes(key)) {
            if (typeof rules[key] === 'object') {
              // An object rule means nested data must be extracted further
              result[key] = extractData(data[key], rules[key], mode);
            } else if (rules[key] === true) {
              // Directly include this field
              result[key] = data[key];
            }
          }
        });
      }
    } else {
      // Exclude mode: remove the fields specified in the rules
      // (rules given as an object are treated as a list of its keys)
      const excluded = Array.isArray(rules) ? rules : Object.keys(rules || {});
      keys.forEach(key => {
        if (!excluded.includes(key)) {
          result[key] = data[key];
        }
      });
    }
    return result;
  }
  // Non-object, non-array data is returned as-is
  return data;
}
2. Data overlay algorithm
Data overlay is used to merge multiple datasets according to certain rules, supporting field mapping and conflict handling.
[0144]
/**
 * Data overlay algorithm
 * @param {Array} datasets - Array of datasets
 * @param {Object} options - Overlay options
 * @param {Object} options.mapping - Field mapping relationships (target field -> source field)
 * @param {String} options.conflict - Conflict handling strategy: 'overwrite', 'merge', 'skip'
 * @param {String} options.key - The primary key field used for matching
 * @returns {Array} Overlaid dataset
 */
function overlayData(datasets, options = {}) {
  const { mapping = {}, conflict = 'overwrite', key = null } = options;
  // Reverse the mapping relationship for reverse lookup
  const reverseMapping = {};
  Object.keys(mapping).forEach(targetKey => {
    reverseMapping[mapping[targetKey]] = targetKey;
  });
  // Only plain (non-null, non-array) objects are merged field by field
  const isPlainObject = value =>
    typeof value === 'object' && value !== null && !Array.isArray(value);
  const result = [];
  datasets.forEach(dataset => {
    dataset.forEach(item => {
      // Handle field mapping
      const mappedItem = {};
      Object.keys(item).forEach(field => {
        const targetField = reverseMapping[field] || field;
        mappedItem[targetField] = item[field];
      });
      if (key && key in mappedItem) {
        // Given a primary key, attempt to match an existing record
        const existingIndex = result.findIndex(r => r[key] === mappedItem[key]);
        if (existingIndex >= 0) {
          // Already exists: handle according to the conflict policy
          switch (conflict) {
            case 'overwrite':
              result[existingIndex] = { ...result[existingIndex], ...mappedItem };
              break;
            case 'merge':
              // Merge fields of object type; other fields are overwritten
              Object.keys(mappedItem).forEach(field => {
                const existing = result[existingIndex][field];
                if (isPlainObject(existing) && isPlainObject(mappedItem[field])) {
                  result[existingIndex][field] = { ...existing, ...mappedItem[field] };
                } else {
                  result[existingIndex][field] = mappedItem[field];
                }
              });
              break;
            case 'skip':
              // Retain existing data without processing
              break;
          }
        } else {
          // Does not exist: add directly
          result.push(mappedItem);
        }
      } else {
        // No primary key: add directly
        result.push(mappedItem);
      }
    });
  });
  return result;
}
3. Group extraction algorithm
Group extraction allows data to be grouped according to specified dimensions, and supports extracting specific information from each group:
[0145]
/**
 * Group extraction algorithm
 * @param {Array} data - Raw data
 * @param {String|Array} groupBy - Grouping field or array of fields
 * @param {Object} extractRules - Extraction rules, defining the information to be extracted from each group
 * @returns {Array} Grouped extraction results, one entry per group
 */
function groupExtract(data, groupBy, extractRules) {
  // Ensure groupBy is an array
  const groupFields = Array.isArray(groupBy) ? groupBy : [groupBy];
  // Resolve a possibly nested field path, such as 'user.name'
  const getValue = (item, path) =>
    path.split('.').reduce(
      (obj, key) => (obj && obj[key] !== undefined ? obj[key] : null),
      item
    );
  // Create the group container
  const groups = {};
  data.forEach(item => {
    // Generate the group key
    const groupKeyParts = groupFields.map(field => getValue(item, field));
    const groupKey = groupKeyParts.join('||');
    // Create the group if it does not exist
    if (!groups[groupKey]) {
      // Initialize group information, including the group key-value pairs
      groups[groupKey] = {
        _groupKeys: groupKeyParts.reduce((obj, val, index) => {
          obj[groupFields[index]] = val;
          return obj;
        }, {})
      };
      // Initialize the fields defined in the extraction rules
      Object.keys(extractRules).forEach(key => {
        const rule = extractRules[key];
        if (rule.type === 'array' || rule.type === 'unique') {
          groups[groupKey][key] = [];
        } else if (rule.type === 'count' || rule.type === 'sum') {
          groups[groupKey][key] = 0;
        } else if (rule.type === 'first') {
          groups[groupKey][key] = null;
        }
      });
    }
    // Apply the extraction rules
    Object.keys(extractRules).forEach(key => {
      const rule = extractRules[key];
      // A 'count' rule may omit the field
      const fieldValue = rule.field ? getValue(item, rule.field) : null;
      switch (rule.type) {
        case 'array':
          // Collect all values into an array
          groups[groupKey][key].push(fieldValue);
          break;
        case 'count':
          // Count
          groups[groupKey][key]++;
          break;
        case 'sum':
          // Summation
          if (typeof fieldValue === 'number') {
            groups[groupKey][key] += fieldValue;
          }
          break;
        case 'first':
          // Get the first value
          if (groups[groupKey][key] === null) {
            groups[groupKey][key] = fieldValue;
          }
          break;
        case 'unique':
          // Collect unique values
          if (!groups[groupKey][key].includes(fieldValue)) {
            groups[groupKey][key].push(fieldValue);
          }
          break;
      }
    });
  });
  // Return as an array
  return Object.values(groups);
}
4. Hierarchical summary algorithm
Hierarchical summarization performs statistical summarization layer by layer according to the hierarchical structure of the data:
[0146]
/**
 * Hierarchical summarization algorithm
 * @param {Array} data - Raw data
 * @param {Array} levels - Hierarchy definition, such as [{field: 'region'}, {field: 'city'}, {field: 'district'}]
 * @param {Object} metrics - Summary metric definitions
 * @returns {Object} Hierarchical summary results
 */
function hierarchicalSummary(data, levels, metrics) {
  // Read a numeric field, substituting a fallback for missing or non-numeric values
  const num = (item, field, fallback) =>
    typeof item[field] === 'number' ? item[field] : fallback;

  // Calculate every configured metric over one set of rows
  function computeMetrics(rows) {
    const result = {};
    Object.keys(metrics).forEach(metricKey => {
      const metric = metrics[metricKey];
      switch (metric.type) {
        case 'count':
          result[metricKey] = rows.length;
          break;
        case 'sum':
          result[metricKey] = rows.reduce((sum, item) => sum + num(item, metric.field, 0), 0);
          break;
        case 'avg': {
          const total = rows.reduce((sum, item) => sum + num(item, metric.field, 0), 0);
          result[metricKey] = rows.length > 0 ? total / rows.length : 0;
          break;
        }
        case 'max':
          result[metricKey] = rows.reduce(
            (max, item) => Math.max(max, num(item, metric.field, -Infinity)), -Infinity);
          break;
        case 'min':
          result[metricKey] = rows.reduce(
            (min, item) => Math.min(min, num(item, metric.field, Infinity)), Infinity);
          break;
      }
    });
    return result;
  }

  // Recursive summary function
  function summarize(rows, levelIndex) {
    if (levelIndex >= levels.length) {
      // Past the lowest level: only the metrics remain to be calculated
      return computeMetrics(rows);
    }
    // Current level field
    const currentField = levels[levelIndex].field;
    // Group by the current level
    const groups = {};
    rows.forEach(item => {
      const key = item[currentField] || 'unknown';
      if (!groups[key]) {
        groups[key] = [];
      }
      groups[key].push(item);
    });
    // Summarize each group and recursively process the next level
    const isLowestLevel = levelIndex + 1 >= levels.length;
    const result = {};
    Object.keys(groups).forEach(key => {
      const groupRows = groups[key];
      result[key] = {
        // Metrics for the current level
        ...computeMetrics(groupRows),
        [currentField]: key,
        // Groups at the lowest level have no children
        children: isLowestLevel ? null : summarize(groupRows, levelIndex + 1)
      };
    });
    return result;
  }

  // Summarize starting from the first level
  return summarize(data, 0);
}
In summary, the intelligent data model acquisition system based on a web front-end provided in this application has the following beneficial effects: 1. Significantly improves data acquisition efficiency: Through automatic data model parsing and strategy generation, the adaptation time of new data models is shortened from the traditional 1-3 days to 5-10 minutes, improving efficiency by more than 95%; the batch data import function improves the efficiency of large-scale data acquisition by 5-10 times; asynchronous transmission and breakpoint resume technology reduce data transmission time loss, shortening the overall data acquisition cycle by 60%-80%.
[0147] 2. Significantly improves data quality: Multi-dimensional data verification mechanisms (real-time front-end verification, batch import verification, and business rule verification) reduce the data error rate from the traditional 5%-10% to below 0.5%; data cleaning strategies effectively remove redundant data and correct abnormal data, improving data availability by more than 70% and providing a reliable foundation for subsequent data analysis.
[0148] 3. Extremely strong system adaptability: The data model change monitoring and adaptive adjustment function realizes "the system changes automatically when the model changes", without the need for manual modification of code or configuration. The response speed to adapt to business changes is improved by more than 90%, which is especially suitable for industry scenarios where data models change frequently.
[0149] 4. Reduced labor costs: The automated data collection process reduces manual intervention by more than 80%, saving a significant amount of manpower costs for data entry, script writing, and model adaptation; the user-friendly interface design reduces the learning cost for operators, and ordinary employees can be competent in data collection work after simple training.
[0150] 5. High security and maintainability: Encrypted data storage and transmission ensure data security; comprehensive logging and version management functions facilitate system problem troubleshooting and tracing; modular design makes the responsibilities of each module clear, facilitating function expansion and maintenance upgrades.
[0151] It should be noted that the modules and units involved in the intelligent data model acquisition system based on a web front-end in this application are... Obviously, those skilled in the art should understand that the various units or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computing device, or fabricating them separately as individual integrated circuit modules, or fabricating multiple modules or steps into a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
Claims
1. A web-based intelligent data model acquisition system, characterized in that it includes: The data model parsing module is used to parse the definition rules of the data model and form rule expressions; The data acquisition strategy generation module is connected to the data model parsing module and is used to generate the optimal data acquisition strategy based on the rule expression; The WEB front-end data acquisition interface generation module is connected to the data acquisition strategy generation module and is used to generate an interactive WEB front-end data acquisition interface based on the rule expression and the optimal data acquisition strategy; The data acquisition execution module is connected to the WEB front-end acquisition interface generation module and is used to acquire target data in the interactive WEB front-end acquisition interface according to the optimal data acquisition strategy, and to verify and process the acquired target data.
2. The intelligent data model acquisition system based on a web front-end as described in claim 1, characterized in that, The data model parsing module includes: The hierarchical relationship parsing unit is used to parse the original multi-format model definition file layer by layer using a recursive parsing algorithm, identify the inclusion relationship and inheritance relationship between the parent model and the child model, as well as the association rules between the fields of the child model and the fields of the parent model, and output a structured model hierarchical relationship graph. The format unification processing unit, connected to the hierarchical relationship parsing unit, is used to receive the original multi-format model definition file and the structured model hierarchical relationship graph, unify the file format of the original multi-format model definition file, and combine it with the structured model hierarchical relationship graph to form a unified intermediate format model definition file, wherein the unified intermediate format model definition file represents a rule description containing complete hierarchy and inheritance relationship; The metadata extraction unit, connected to the format unification processing unit, is used to use a lexical analyzer to identify the attributes of key fields in the unified intermediate format model definition file to obtain a structured model metadata list. The attributes of the key fields represent at least one of the following: field name, data type, length limit, value range, required attributes, default value, and associated fields. The rule expression generation unit is connected to the metadata extraction unit and the format unification processing unit, respectively, and is used to combine the structured model metadata list with the unified intermediate format model definition file to form a rule expression.
3. The intelligent data model acquisition system based on a web front-end as described in claim 2, characterized in that, The data acquisition strategy generation module includes: A multi-source acquisition and adaptation unit is connected to the data model parsing module. It is used to receive the rule expression and various types of data sources, and generate corresponding acquisition and adaptation strategies for each type of data source in combination with the rule expression. It can also generate multiple data acquisition and adaptation strategies for the same type of data source. The strategy decision engine unit, connected to the multi-source acquisition and adaptation unit, is used to select, based on the rule expression and in the preset strategy generation rule base, the optimal strategy from among the multiple data acquisition adaptation strategies generated for the same type of data source, using the strategy selection experience and effect indicators of historical acquisition data, so as to obtain the preliminary optimal data acquisition strategy for the same type of data source. A data acquisition scheduling strategy unit, connected to the strategy decision engine unit, is used to integrate the current data scheduling parameters into the preliminary optimal data acquisition strategy to obtain the optimized optimal data acquisition strategy for the same type of data source. The data scheduling parameters include at least one of data size, update frequency, and priority. The data cleaning strategy unit, connected to the acquisition scheduling strategy unit, is used to extract the attributes of key fields that match the optimized optimal data acquisition strategy from the metadata extraction unit according to the optimized optimal data acquisition strategy of the same type of data source, and generate corresponding data cleaning rules. The data cleaning rules represent at least one of the following: removing spaces, filtering special characters, standardizing date formats, and converting numerical units.
4. The intelligent data model acquisition system based on a web front-end as described in claim 3, characterized in that, The WEB front-end data acquisition interface generation module includes: The interface layout engine is connected to the data model parsing module and the acquisition strategy generation module respectively. It is used to receive the attributes of key fields in the rule expression and the optimized data acquisition strategy of the same type of data source. It uses an adaptive layout algorithm that combines grid layout and flow layout to group and lay out the key fields according to their importance to obtain an interactive WEB front-end acquisition interface. The intelligent control matching unit, connected to the data model parsing module, is used to receive the data type of the key field and a preset mapping relationship library between data types and UI controls. Based on the mapping relationship library, it matches the corresponding UI controls for the key fields of different data types to obtain the UI control configuration corresponding to each key field. The mapping relationship library includes at least one of the following mapping relationships between string, integer / floating-point number, enumeration, boolean, date / time, file, array type and corresponding UI control:
5. The intelligent data model acquisition system based on a web front-end as described in claim 1, characterized in that, The data acquisition and execution module includes: The data storage and processing unit is connected to the WEB front-end acquisition interface generation module. It is used to receive the optimized optimal data acquisition strategy, the front-end acquisition target data and the data source type, select the appropriate storage method according to the data source type, and perform field type mapping and encrypted storage of the target data. The front-end data input processing unit is connected to the WEB front-end acquisition interface generation module. It is used to receive user input data from the interactive WEB front-end acquisition interface and verify the user input data against preset verification rules. If the verification passes, the user is allowed to input data; otherwise, the user is prompted to correct the input data.
6. The intelligent data model acquisition system based on a web front-end as described in claim 1, characterized in that, It also includes a data model update monitoring and adaptation module, which is connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively. This module is used to detect changes in the data model and, based on the changes, drive the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module to make adaptive adjustments.
7. The intelligent data model acquisition system based on a web front-end as described in claim 6, characterized in that, The data model update monitoring and adaptive module includes: The model change monitoring unit is connected to the data model parsing module and is used to monitor the rule expression output by the data model parsing module and obtain model change notifications. The change analysis unit, connected to the model change monitoring unit, is used to receive the old and new data models, perform a difference comparison analysis on the old and new data models through a tree structure comparison algorithm, identify the change content, and generate a change report. The data acquisition strategy adjustment unit is connected to both the change analysis unit and the data acquisition strategy generation module, and is used to receive the change report and issue an instruction to the data acquisition strategy generation module to update the data acquisition strategy. The interface dynamic update unit is connected to the acquisition strategy adjustment unit and the WEB front-end acquisition interface generation module. It is used to receive updated acquisition strategies and update the interface controls of the WEB front-end acquisition interface generation module using an incremental update method.
8. The intelligent data model acquisition system based on a web front-end as described in claim 1, characterized in that, It also includes a system management and monitoring module, which is connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, for user management, access control, operation monitoring, and log analysis.
9. The intelligent data model acquisition system based on a web front-end as described in claim 8, characterized in that, The system management and monitoring module includes: The user and permission management unit is connected to the WEB front-end acquisition interface generation module, the acquisition strategy generation module, and the data acquisition execution module, respectively, and is used to manage user permissions based on the role-based access control model. The operation status monitoring unit is connected to the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module, respectively, and is used to monitor the operation status of the data model parsing module, the acquisition strategy generation module, the WEB front-end acquisition interface generation module, and the data acquisition execution module through a preset operation status monitoring threshold. The log analysis and reporting unit is connected to the data model parsing module, the collection strategy generation module, the web front-end collection interface generation module, and the data collection execution module, respectively. It is used to analyze the logs generated by the data model parsing module, the collection strategy generation module, the web front-end collection interface generation module, and the data collection execution module, and generate various core reports.
10. A computer program, characterized in that, when the computer program is executed by a processor, it implements the system according to any one of claims 1 to 9.