Data refreshing method and device, computer device, and storage medium

By combining a distributed scheduling engine and a message queue, the problem of low data refresh efficiency caused by missing or erroneous data fields in the business system was solved, achieving efficient data refresh and improved system stability.

CN122309531A · Pending · Publication Date: 2026-06-30 · PING AN HEALTH INSURANCE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
PING AN HEALTH INSURANCE CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-30

Abstract

This invention relates to the field of data processing technology and discloses a data refresh method and apparatus, computer equipment, and storage medium, comprising: obtaining creation time information of a target field in an initial business table from a business system; segmenting the data according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; reading data from at least two data sources through a distributed scheduling engine to obtain target data, and placing the target data into a message queue; in response to detecting that the message queue is not empty, generating target field data for the time sub-intervals based on the target data in the message queue; and refreshing the target field according to the target field data of all time sub-intervals to obtain a target business table. This invention can be applied to business data refresh scenarios in fintech and healthcare, solving the technical problem of low efficiency in reading data from data sources, which in turn leads to low data refresh efficiency.

Description

Technical Field

[0001] This invention relates to the field of data processing technology and can be applied to business data refresh scenarios in fintech and healthcare. In particular, it relates to a data refresh method and apparatus, computer equipment, and storage medium.

Background Technology

[0002] Business systems can provide relevant services to users. For example, financial business systems (such as insurance service rights systems) can provide users with financial services such as insurance application and claims. Similarly, medical business systems (such as medical service rights systems) provide customers with medical-related services.

[0003] Currently, business systems handle very large volumes of data, and data fields can end up missing, incorrect, or inaccurate, which makes it impossible to provide services to customers accurately. The related technology uses a conventional Spring Boot scheduled task to retrieve data from the data source for data refresh, but the data refresh efficiency is poor.

Summary of the Invention

[0004] This invention provides a data refresh method and apparatus, computer equipment, and storage medium to solve the technical problem of low efficiency in reading data from a data source, which in turn leads to low data refresh efficiency.

[0005] Firstly, a data refresh method is provided, the method comprising: obtaining the creation time information of the target field in the initial business table from the business system; segmenting according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; reading data from at least two data sources through a distributed scheduling engine to obtain target data, and placing the target data into a message queue, wherein the creation time of the target data falls within a time sub-interval; in response to detecting that the message queue is not empty, generating target field data for the time sub-interval based on the target data in the message queue; and refreshing the target field based on the target field data of all the time sub-intervals to obtain the target business table.

[0006] Secondly, a data refresh device is provided, the device comprising: a time information acquisition module, used to obtain the creation time information of the target field in the initial business table from the business system; a time segmentation module, used to segment according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; a data reading module, used to read data from at least two data sources through a distributed scheduling engine to obtain target data, and to put the target data into a message queue, wherein the creation time of the target data falls within a time sub-interval; a data consumption module, used to generate target field data for the time sub-interval based on the target data in the message queue in response to detecting that the message queue is not empty; and a data refresh module, used to refresh the target field based on the target field data of all the time sub-intervals to obtain the target business table.

[0007] Thirdly, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above-described data refresh method when executing the computer program.

[0008] Fourthly, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the aforementioned data refresh method.

[0009] In the aforementioned data refresh method, data refresh device, computer equipment, and storage medium, the creation time information of the target field in the initial business table is first obtained from the business system. The creation time information is then segmented to obtain a data sequence time segment table, which includes at least two time sub-intervals. Next, a distributed scheduling engine reads data from at least two data sources to obtain the target data and places the target data into a message queue, where the creation time of the target data falls within a time sub-interval. Then, in response to detecting that the message queue is not empty, target field data for the time sub-interval is generated based on the target data in the message queue. Finally, the target field is refreshed according to the target field data of all time sub-intervals to obtain the target business table. Because the creation time information is divided into multiple time sub-intervals, the target data corresponding to each interval can be read from the data source independently and in parallel. Furthermore, the distributed scheduling engine schedules reading tasks against multiple data sources concurrently, which in theory raises the data reading throughput to several times the single-machine processing capacity. Moreover, the message queue decouples "reading" from "target field data generation", allowing the subsequent stages to be scaled and optimized independently. In summary, this invention improves the efficiency of reading data from the data source, thereby improving the data refresh efficiency.

Attached Figure Description

[0010] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is a schematic diagram of an application environment for a data refresh method according to an embodiment of the present invention; Figure 2 is a flowchart of a data refresh method according to an embodiment of the present invention; Figure 3 is a schematic diagram of a specific implementation of step S10 in Figure 2; Figure 4 is a schematic diagram of a specific implementation of step S20 in Figure 2; Figure 5 is a schematic diagram of a specific implementation of step S30 in Figure 2; Figure 6 is a schematic diagram of a data refresh device according to an embodiment of the present invention; Figure 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention; Figure 8 is another schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Implementation

[0012] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0013] To adapt to business development needs, business systems require architectural upgrades and system replacements, such as the medical service benefits system. Specifically, the medical service benefits system provides health insurance customers with medical-related services, primarily reflected in medical vouchers and performance verification. However, the service benefits system is a later-developed system with significant data aggregation work, including merging large amounts of benefits data from the insurance and service contract sides. The data volume is substantial, and the data is distributed evenly over time. Due to historical reasons, the data in the medical service benefits system may contain missing, incorrect, or inaccurate fields. This can lead to inaccurate vouchers and performance verification errors when providing medical vouchers and performance verification to customers, resulting in poor customer experience and potential financial losses for the company.

[0014] In related technologies, data refresh is used to address situations where data fields are missing, incorrect, or inaccurate. However, this approach uses a relatively traditional Spring Boot task-based table lookup method, which results in very poor data refresh efficiency and excessive database pressure, leading to a large number of slow SQL queries and causing alarms and performance degradation in the business system.

[0015] To address the problems of the existing technologies mentioned above, this invention proposes a data refresh method, a data refresh device, a computer device, and a storage medium. Specifically, this invention proposes a technology for refreshing large batches of data based on a scheduling engine and message queues. This technology utilizes middleware such as a distributed scheduling engine, message queues, and a distributed database to improve data refresh efficiency, enhance the robustness and fault tolerance of the business system, and reduce the incidence of production accidents, slow SQL queries, and system problems. When the business system is a medical service rights system, this invention can also improve the customer's medical experience and reduce the company's loss rate.

[0016] The data refresh method provided in this embodiment of the invention can be applied to, for example, the application environment shown in Figure 1. In this application environment, the client communicates with the server via a network. The client can initiate a refresh request to the server. After receiving the refresh request, the server obtains the creation time range corresponding to the target field of the initial business table in the business system. The server segments the creation time range to obtain a data sequence time segment table, which includes at least two time sub-intervals. The server uses a distributed scheduling engine to read data from at least two data sources using the time sub-intervals to obtain the target data whose creation time is within the time sub-interval, and puts the target data into a message queue. In response to detecting that the message queue is not empty, the server generates a target data table for the time sub-interval based on the target data in the message queue. The server refreshes the target field according to the target data tables of all time sub-intervals to obtain the target business table. In this invention, the creation time range is divided into multiple time sub-intervals, and the target data corresponding to each interval can be read independently and in parallel from the data source. Furthermore, a distributed scheduling engine schedules multiple data sources to concurrently execute reading tasks, theoretically increasing the throughput of data reading and initial filtering to several times the processing capacity of a single machine (limited by network bandwidth, database concurrency capabilities, downstream write capabilities, etc.). Moreover, a message queue decouples the "reading" and "target field data generation," allowing subsequent stages to be independently expanded and optimized. In summary, this invention improves the efficiency of data source reading, thereby improving data refresh efficiency. The client can be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented using a standalone server or a server cluster consisting of multiple servers. The invention will be described in detail below through specific embodiments.

[0017] Referring to Figure 2, which is a flowchart of a data refresh method provided in an embodiment of the present invention, the method includes steps S10 to S50.

[0018] S10: Obtain the creation time information of the target field in the initial business table from the business system.

[0019] The data refresh method provided by this invention can be applied to data refresh engines in various application scenarios. The data refresh engine can be implemented through a server. For example, in the healthcare field, the medical service rights system involves a large amount of data aggregation work, including merging a large amount of rights data from the insurance side and the service contract side. The data scale is large, and the data is evenly distributed over time. Due to historical reasons, the data contains missing, incorrect, or inaccurate fields. When medical vouchers and performance verification are provided to customers, inaccurate vouchers and performance verification errors may occur, leading to poor customer experience and company losses. Therefore, refreshing the data in the medical service rights system through a data refresh engine can not only improve data refresh efficiency but also enhance the robustness and fault tolerance of the business system and reduce the incidence of production accidents, slow SQL queries, and system problems, thereby improving the customer's medical experience and reducing the company's loss rate.

[0020] Business Systems: Application systems that support daily business operations. For example, business systems include e-commerce transaction systems and medical service rights systems. E-commerce Transaction System: The core transaction processing platform supporting e-commerce business, encompassing a complete set of back-end capabilities from user order placement to payment, delivery, and after-sales service. Medical Service Rights System: A comprehensive information system provided to medical institutions, insurers, pharmaceutical companies, and other participants for managing, distributing, and reconciling "medical service rights."

[0021] Initial Business Tables: Core data tables used or generated in the initial stages of a business system, serving as the baseline for subsequent data governance, updates, or migrations. Initial business tables include multiple fields. For example, the rights and benefits rule table in a healthcare service rights system (defining various available healthcare service rights and their triggering rules) includes the following fields: rule_id: Unique identifier for the rights and benefits rule; name: Rule name (e.g., "outpatient drug cost reduction"); category: Rights and benefits category (medical insurance, commercial insurance, drug benefits, etc.); scope: Scope of application (whole hospital, specific department, specific drug catalog, etc.); amount_type: Amount / proportion type (amount deduction, proportion deduction, cap, etc.); amount_value: Amount or proportion value; ceiling: Maximum limit; start_date / end_date: Effective time range; validity_period: Validity period / time granularity; status: Status (enabled, disabled, draft); created_at / updated_at: Creation and modification time; data_source: Data source.

[0022] Target fields: Fields that need to be monitored, refreshed, or synchronized in the initial business table, typically representing important business metrics or system status. For example, target fields in the equity rules table may include equity category, scope of application, amount / proportion type, etc.

[0023] Creation time range: Indicates the upper and lower bounds of the creation time of the target field within a certain time period. For example, the creation time range of field A is from 2025-12-01 00:00:00 to 2026-02-28 23:59:59.

[0024] In one example, the business system has an initial business table named Orders, which contains fields such as order_id, order_created_time, and order_status. It is necessary to retrieve the creation time information of the order_created_time field in order to ultimately refresh the data for that field.

[0025] In some embodiments of the present invention, referring to Figure 3, step S10, which is to obtain the creation time range corresponding to the target field of the initial business table in the business system, may include the following steps: S11: Obtain the initial business table from the business system; S12: Perform field checks on the initial business table to obtain target fields, including fields with missing data and fields with data errors; S13: Retrieve the creation time information of the target field in the initial business table.

[0026] In step S11, the server can analyze the refresh request sent by the client to obtain the business table that the user intends to refresh, i.e., the initial business table. In one example, the refresh request can be generated based on natural language, such as "Please help refresh the data related to the rights and benefits rules in the medical service rights and benefits system"; the server analyzes the refresh request through a large language model to obtain the initial business table as the rights and benefits rules table.

[0027] The Large Language Model (LLM) here refers to a model built using deep learning methods and trained on large-scale text data that can understand and generate natural language. It has the ability to perform reasoning, answering questions, writing text, translating, summarizing, and conversing on various language tasks, and usually has strong versatility and transfer learning capabilities.

[0028] In step S12, a field check means examining and analyzing the fields of the initial business table. For example, quality, completeness, and correctness checks are performed on each field of the Orders table, such as creation time and amount.

[0029] When performing field checks, the following dimensions can be used: missing values, outliers, format errors, out-of-bounds range, uniqueness violations, etc. For example, check if the order_amount field in the Orders table contains null values, negative numbers, or values that do not conform to the business scope. When problematic fields or specific problem types are identified during the check process, they are marked as target fields.

[0030] In one example, the output could include a list of fields with multiple target fields, along with sorting and grouping rules, to ensure traceability (each field has a problem description and evidence). For instance, the field list might include: order_amount (missing) and order_created_time (abnormal date format).
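The field check described above can be sketched as follows. This is only a minimal illustration, assuming the rows of the Orders table are available as plain dictionaries; the function name `find_target_fields`, the problem labels, and the sample rows are hypothetical and are not taken from the patent text.

```python
from datetime import datetime

def find_target_fields(rows, amount_field="order_amount", time_field="order_created_time"):
    """Return {field_name: sorted problem types} for every field that fails a check."""
    problems = {}

    def mark(field, problem):
        problems.setdefault(field, set()).add(problem)

    for row in rows:
        amount = row.get(amount_field)
        if amount is None:
            mark(amount_field, "missing")        # missing value
        elif amount < 0:
            mark(amount_field, "out_of_range")   # negative amounts violate the business range

        created = row.get(time_field)
        if created is None:
            mark(time_field, "missing")
        else:
            try:
                datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
            except ValueError:
                mark(time_field, "format_error")  # abnormal date format

    return {field: sorted(kinds) for field, kinds in problems.items()}

# Hypothetical sample data for illustration only.
sample_rows = [
    {"order_id": 1, "order_amount": 120.0, "order_created_time": "2026-01-01 00:00:05"},
    {"order_id": 2, "order_amount": None, "order_created_time": "2026/01/01 00:01"},
    {"order_id": 3, "order_amount": -5.0, "order_created_time": "2026-01-01 00:02:10"},
]
target_fields = find_target_fields(sample_rows)
```

Each entry of `target_fields` pairs a problematic field with the problem types found, which corresponds to the traceable field list mentioned above.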

[0031] In step S13, the creation time information refers to the time interval during which the target field is first created or first becomes available in the data source. The time of field release or deployment is typically recorded in the field metadata table or system log audit table, and the creation time information of the target field can be obtained by querying these tables. Field metadata table: Field-level metadata, typically containing information such as the field's creation time, last modification time, and availability.

[0032] In one embodiment, step S13 may include: querying the creation time field in the field metadata table according to the target field to obtain the creation time information; if the field metadata table is missing, then attempting to deduce the first appearance time from the version history, logs, and audit tables to obtain the creation time information.
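The lookup with fallback in this embodiment might look like the sketch below. It assumes the field metadata table and the audit log have already been loaded into memory; the structures `field_metadata` and `audit_log` and the function `lookup_creation_time` are hypothetical names introduced only for illustration.

```python
from datetime import datetime

def lookup_creation_time(field, field_metadata, audit_log):
    """Prefer the metadata table; otherwise deduce the first appearance from the audit log."""
    entry = field_metadata.get(field)
    if entry and entry.get("created_at"):
        return entry["created_at"]
    # Fallback: the earliest audit event that mentions the field.
    seen = [event["time"] for event in audit_log if event["field"] == field]
    return min(seen) if seen else None

# Hypothetical inputs for illustration only.
field_metadata = {"order_amount": {"created_at": datetime(2026, 2, 6, 14, 7, 0)}}
audit_log = [
    {"field": "order_status", "time": datetime(2026, 2, 7, 9, 30, 0)},
    {"field": "order_status", "time": datetime(2026, 2, 5, 8, 0, 0)},
]

from_metadata = lookup_creation_time("order_amount", field_metadata, audit_log)
from_audit_log = lookup_creation_time("order_status", field_metadata, audit_log)
not_found = lookup_creation_time("unknown_field", field_metadata, audit_log)
```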

[0033] As can be seen from the above steps S11-S13, the innovative approach of sequentially locating the business table, then the target field, and finally the creation time information during the data refresh process facilitates subsequent steps to access the data source based on the creation time information. Compared with related technologies that access the data source based on the table name / identifier of the business table or the field name of the field, which leads to more complex data reading, this approach can reduce the pressure on the business system and the data source, effectively alleviate the defects of a large number of slow SQL queries, and help improve data reading efficiency.

[0034] In some embodiments of the present invention, step S13, namely obtaining the creation time information of the target field in the initial business table, may include the following steps: S131: Obtain the creation time of the target field in the initial business table to get the field creation time point; S132: Expand toward earlier times from the field creation time point to obtain the execution start time, and expand toward later times from the field creation time point to obtain the execution end time; S133: Merge the execution start time and the execution end time to obtain the creation time information.

[0035] Specifically, the field creation time point can be obtained from the field metadata table or system log audit table in the business system. After the field creation time point is obtained, it can be extended toward earlier times by a preset time step (e.g., 1 minute or 1 hour) to obtain the execution start time, and extended toward later times by the same preset time step to obtain the execution end time. For example, if the field creation time point is 2026-02-06 14:07:00 and the preset time step is 2 minutes, then the execution start time is 2026-02-06 14:05:00 and the execution end time is 2026-02-06 14:09:00. Finally, the execution start and end times are combined to obtain the creation time information.
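The expansion in the example above can be reproduced with a few lines of code. This is a sketch under the assumption that the same step is applied on both sides; the function name `expand_creation_time` is hypothetical.

```python
from datetime import datetime, timedelta

def expand_creation_time(created_at, step=timedelta(minutes=2)):
    """Return (execution start time, execution end time) around a field creation time point."""
    start = created_at - step   # expand toward earlier times
    end = created_at + step     # expand toward later times
    return start, end

start, end = expand_creation_time(datetime(2026, 2, 6, 14, 7, 0))
```

With a 2-minute step, `start` and `end` match the 14:05:00 and 14:09:00 values given in the text.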

[0036] As can be seen from the above steps S131-S133, the creation time information can be obtained by expanding based on the field creation time point, thereby improving the data coverage of subsequent data read from the data source based on the creation time information, avoiding data omission due to different time granularity or precision, and improving the accuracy of data refresh.

[0037] S20: Segment the data based on the creation time information to obtain a data sequence time segment table, which includes at least two time sub-intervals.

[0038] In some embodiments of the present invention, the time granularity of the execution start time is minutes, and the time granularity of the execution end time is minutes. Referring to Figure 4, step S20, which involves segmenting the data based on the creation time information to obtain a data sequence time segment table, may include the following steps: S21: Divide the creation time information into segments by minute to obtain time segments; S22: Pad the start time and end time of each time segment to second granularity to obtain the time sub-intervals.

[0039] In steps S21-S22, time granularity refers to the scale by which the creation time is divided into fixed time units (such as days, hours, minutes). For example, the creation time information is divided into minute granularity, with each minute being a time sub-interval. For example, if the creation time information is from 2026-01-01 00:00:00 to 2026-01-01 00:02:59, after segmentation and padding to the second, the following time sub-intervals can be obtained: 2026-01-01 00:00:00 to 2026-01-01 00:00:59, 2026-01-01 00:01:00 to 2026-01-01 00:01:59, and 2026-01-01 00:02:00 to 2026-01-01 00:02:59.
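The minute-level segmentation and padding to the second can be sketched as follows. The function name `split_into_minutes` is hypothetical, and the sketch assumes the creation time information is given as a pair of `datetime` values.

```python
from datetime import datetime, timedelta

def split_into_minutes(start, end):
    """Split [start, end] into minute-level sub-intervals padded to :00 and :59 seconds."""
    segments = []
    cursor = start.replace(second=0, microsecond=0)
    while cursor <= end:
        # Each sub-interval runs from second 00 to second 59 of the same minute.
        segments.append((cursor, cursor + timedelta(seconds=59)))
        cursor += timedelta(minutes=1)
    return segments

segments = split_into_minutes(
    datetime(2026, 1, 1, 0, 0, 0),
    datetime(2026, 1, 1, 0, 2, 59),
)
```

For the range used in the example above, `segments` contains the three sub-intervals listed in the text.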

[0040] As can be seen from the above steps S21-S22, dividing the creation time information into multiple time sub-intervals for parallel data reading, processing and writing helps to improve data reading efficiency, thereby improving data refresh efficiency.

[0041] After step S21, the data refresh method may further include: determining whether the current time segment overlaps with the previous time segment; if they do not overlap, inserting the time segment into the data sequence time segment table; if they overlap, stopping further segmentation of the creation time information. In this way, detecting overlaps reduces duplicates in subsequent data reads, thereby improving data reading efficiency.
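A possible form of this overlap check is sketched below. It assumes the data sequence time segment table is kept as an in-memory list of `(start, end)` pairs in insertion order; `append_segment` is a hypothetical helper, and a return value of `False` is how this sketch signals that segmentation should stop.

```python
from datetime import datetime

def append_segment(segment_table, segment):
    """Append (start, end) unless it overlaps the previous segment; return True if appended."""
    if segment_table:
        _, previous_end = segment_table[-1]
        if segment[0] <= previous_end:   # new start falls inside the previous segment
            return False
    segment_table.append(segment)
    return True

table = []
first = (datetime(2026, 1, 1, 0, 0, 0), datetime(2026, 1, 1, 0, 0, 59))
second = (datetime(2026, 1, 1, 0, 1, 0), datetime(2026, 1, 1, 0, 1, 59))
overlapping = (datetime(2026, 1, 1, 0, 1, 30), datetime(2026, 1, 1, 0, 2, 29))
results = [append_segment(table, s) for s in (first, second, overlapping)]
```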

[0042] In another embodiment, in step S20, the creation time information can be segmented according to the time granularity as daily or hourly to obtain a data sequence time segment table. For example, 2026-03-01 00:00:00—2026-03-01 23:59:59 is a time sub-interval with a daily granularity.

[0043] In one example, the initial business table is the Orders table, and the target field corresponds to the creation time information, such as from 2026-01-01 00:00:00 to 2026-01-31 23:59:59. The creation time information can be segmented by daily granularity (one time sub-interval per day). Examples of sub-intervals are as follows: 2026-01-01 00:00:00—2026-01-01 23:59:59, 2026-01-02 00:00:00—2026-01-02 23:59:59, etc.

[0044] S30: Read data from at least two data sources through a distributed scheduling engine to obtain the target data, and put the target data into a message queue. The creation time of the target data is located in a time sub-interval.

[0045] Specifically, the Distributed Scheduling Engine (DSE) supports distributed scheduled task scheduling, supports internationally recognized time expressions (accurate to the minute), supports task execution lifecycle management and historical execution record query, and solves the single point of failure problem of traditional scheduled tasks.

[0046] The distributed scheduling engine is configured as shown in Table 1 below. The cron expression triggers the task every 5 minutes, and a load balancing strategy is used. If blocking occurs, newly triggered tasks are discarded so that the earlier task can continue executing. On failure, one retry is performed, and the timeout is 120 minutes.

[0047] Table 1

[0048] The internal logic of the distributed scheduling engine is to read the data sequence time segment table in batches, with a configurable batch size. It retrieves time sub-intervals from the table and uses the execution start and end times of each sub-interval as the query condition on the creation time of the target data, because the creation time of a record is immutable. Each query covers one minute-level time sub-interval and returns the target data corresponding to that sub-interval. The advantages of this approach are that it uses the time index, avoids slow SQL queries, and places minimal pressure on the data source and on the memory of the business system, ensuring the stability of the business system.

[0049] In one embodiment, the data source includes a product center, a policy center, and a customer center. The product center centrally manages the company's product information and its lifecycle. It covers product definitions, attributes, prices, rules, new product launches / delistings, version control, etc., and is related to front-end product display and back-end business rules. The product center can interface with platforms such as the policy center, customer center, claims, underwriting, and risk control. The policy center centrally manages policy and lifecycle-related information. It covers the entire lifecycle from application, underwriting, payment, changes, claims, renewal to policy archiving. The customer center provides an external view and self-service entry point for customers (policyholders, insured persons, beneficiaries, potential customers, etc.). It integrates functions such as customer information, service requests, communication records, billing and payment, and product / policy inquiries.

[0050] In some embodiments of the present invention, referring to Figure 5, step S30, which involves reading data from at least two data sources using a distributed scheduling engine to obtain the target data, may include the following steps: S31: Use the distributed scheduling engine to read data from the product center using time sub-intervals to obtain product center data; S32: Use the distributed scheduling engine to read data from the policy center using time sub-intervals to obtain policy center data; S33: Use the distributed scheduling engine to read data from the customer center using time sub-intervals to obtain customer center data; S34: Perform data fusion based on the product center data, policy center data, and customer center data to obtain the target data.
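Steps S31 to S34 can be illustrated with the sketch below. A local thread pool stands in for the distributed scheduling engine, and the three reader functions return hard-coded records instead of calling real services. The reader names, the record fields, and the choice of `policy_no` as the fusion key are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three data sources.
def read_product_center(interval):
    return [{"policy_no": "P001", "product_name": "plan-a"}]

def read_policy_center(interval):
    return [{"policy_no": "P001", "policy_status": "active"}]

def read_customer_center(interval):
    return [{"policy_no": "P001", "customer_no": "C100"}]

def read_and_fuse(interval):
    """Read one time sub-interval from all sources in parallel, then fuse by policy_no."""
    readers = (read_product_center, read_policy_center, read_customer_center)
    with ThreadPoolExecutor(max_workers=len(readers)) as pool:
        results = list(pool.map(lambda reader: reader(interval), readers))
    fused = {}
    for records in results:
        for record in records:
            fused.setdefault(record["policy_no"], {}).update(record)
    return list(fused.values())

target_data = read_and_fuse(("2026-01-01 00:00:00", "2026-01-01 00:00:59"))
```

In a real deployment the parallelism would come from the scheduling engine dispatching tasks across nodes; the thread pool only shows the shape of the read-then-fuse flow.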

[0051] For example, reading data from a data source for one time sub-interval can be expressed as: `select * from tager_table a where a.created_date>='2025-05-12 15:00:00' and a.created_date<='2025-05-12 15:00:59' and a.is_deleted=0`. This SQL example queries all fields from the table named `tager_table`, aliased as `a`, and retains only records whose creation time (`created_date`) falls within the specified time sub-interval, boundary seconds included, with the additional condition that the `is_deleted` field equals 0, which typically indicates records that have not been deleted (records with a logical deletion flag of 1 are excluded).
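A runnable version of such a time-window query is sketched below using an in-memory SQLite database. The table layout, the index name `idx_created`, and the sample rows are assumptions for illustration; the sketch binds the interval bounds as parameters and uses inclusive bounds so that records created exactly on a boundary second are covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tager_table (id INTEGER, created_date TEXT, is_deleted INTEGER)")
conn.executemany(
    "INSERT INTO tager_table VALUES (?, ?, ?)",
    [
        (1, "2025-05-12 15:00:10", 0),   # inside the window, not deleted
        (2, "2025-05-12 15:00:30", 1),   # inside the window, logically deleted
        (3, "2025-05-12 15:01:05", 0),   # outside the window
    ],
)
# An index on the creation time lets the range condition avoid a full table scan.
conn.execute("CREATE INDEX idx_created ON tager_table (created_date)")

def read_interval(connection, start, end):
    """Return non-deleted rows whose created_date lies in [start, end]."""
    sql = (
        "SELECT id, created_date FROM tager_table a "
        "WHERE a.created_date >= ? AND a.created_date <= ? AND a.is_deleted = 0"
    )
    return connection.execute(sql, (start, end)).fetchall()

rows = read_interval(conn, "2025-05-12 15:00:00", "2025-05-12 15:00:59")
```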

[0052] It should be noted that when the distributed scheduling engine reads data from the data source, it can read data from the data table of the data source based on the time sub-interval and the table identifier (such as the table name) of the initial business table in order to obtain the target data.

[0053] As can be seen from the above scheme, the distributed scheduling engine can be used to read data from multiple data sources in parallel and to use multiple time sub-intervals for each data source in parallel, which greatly improves the data reading efficiency.

[0054] In step S30, after obtaining the target data, the target data is placed into the message queue.

[0055] Message Queues (MQ) are low-latency, high-concurrency, and highly available distributed message middleware built on RocketMQ. MQ can provide distributed application systems with asynchronous decoupling and peak-shaving capabilities, while also featuring massive message backlog, high throughput, and reliable retries.

[0056] The MQ sending and listening configuration is shown in Table 2. The sending method is synchronous sending, and the sending result is obtained in real time. The number of consumption batches is 4, the number of consumption threads is 4, and the maximum number of retries for failure is 3.

[0057] Table 2

[0058] When the distributed scheduling engine obtains the target data, it sends the target data via MQ, and the data format can be JSON.
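A minimal sketch of sending the target data as JSON is given below. Python's standard `queue.Queue` is used here purely as a local stand-in for the MQ; it is not the RocketMQ client, and the function names `publish` and `consume_one` and the record fields are hypothetical.

```python
import json
import queue

def publish(mq, record):
    """Serialize one target-data record to JSON and enqueue it."""
    body = json.dumps(record, ensure_ascii=False, sort_keys=True)
    mq.put(body)
    return body

def consume_one(mq):
    """Dequeue one message and decode its JSON body."""
    return json.loads(mq.get_nowait())
```

Serializing with `ensure_ascii=False` keeps any non-ASCII field values readable in the message body.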

[0059] In some embodiments of the present invention, the data sequence time segment table includes a read flag for each time sub-interval, which is initialized to unprocessed. After the target data is placed into the message queue, the data refresh method may further include: obtaining the creation time of the target data in the message queue as the data creation time, and taking the time sub-interval in which the data creation time falls as the target time sub-interval; and updating the read flag of the target time sub-interval to processed. In this way, when the distributed scheduling engine reads data from the data source based on the time sub-intervals in the data sequence time segment table, it skips the time sub-intervals whose read flag is set to processed, reducing the risk of reading the same data twice.
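The read-flag bookkeeping described above could look roughly like the following sketch. The segment table is modeled as a list of dictionaries with hypothetical keys `start`, `end`, and `read_flag`; the actual table layout is not specified in the text.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def mark_processed(segment_table, data_created):
    """Flag the sub-interval containing data_created as processed."""
    created = datetime.strptime(data_created, FMT)
    for seg in segment_table:
        start = datetime.strptime(seg["start"], FMT)
        end = datetime.strptime(seg["end"], FMT)
        if start <= created <= end:
            seg["read_flag"] = "processed"
            return seg
    return None  # creation time falls outside every sub-interval

def pending_intervals(segment_table):
    """Sub-intervals the scheduler may still read."""
    return [s for s in segment_table if s["read_flag"] == "unprocessed"]
```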

[0060] S40: In response to detecting that the message queue is not empty, target field data for the time sub-interval is generated based on the target data in the message queue. Specifically, the application service of the business system listens to this MQ, so that the data is both produced and consumed by the same service. Business logic is processed in the consumer instances, which smooths peaks and fills valleys in the processing load and avoids the abnormal CPU and memory fluctuations that arise when large-scale business processing occupies service threads and memory.
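The consumption side can be illustrated with the sketch below, which drains a pre-filled local queue with four worker threads (the thread count matches the configuration mentioned earlier). This is an assumption-laden simplification: a real listener would block and wait for new messages, `queue.Queue` again stands in for the MQ, and `build_field_data` is a hypothetical placeholder for the business logic.

```python
import json
import queue
import threading

def build_field_data(message):
    """Hypothetical business rule: keep only the fields to be refreshed."""
    return {
        "policy_no": message["policy_no"],
        "customer_no": message.get("customer_no", ""),
    }

def consume_all(mq, n_threads=4):
    """Drain the queue with n_threads workers and collect target field data."""
    out = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                body = mq.get_nowait()
            except queue.Empty:
                return  # queue is empty: this worker is done
            item = build_field_data(json.loads(body))
            with lock:
                out.append(item)
            mq.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Because the workers only pull messages when they are free, a burst of reads upstream accumulates in the queue rather than in service threads.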

[0061] S50: Refresh the target fields based on the target field data across all time sub-intervals to obtain the target business table. Specifically, after the data in the MQ has been consumed, the business table refresh task is executed. For example, the medical service benefits system needs to call the policy center interface to obtain static policy data, call the product center interface to obtain static service data from the product center, and call the customer center interface to obtain target data such as customer numbers from the customer center. Then, based on the target data, the target field data is reassembled, and the initial business table is updated to complete the data refresh.
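One possible shape of the refresh step is sketched below. The business table and the target field data are modeled as lists of dictionaries, and `policy_no` is an assumed row key; both are illustrative choices rather than details given in the text.

```python
def refresh_table(business_table, field_data, target_fields):
    """Return a refreshed copy of the table; only target fields are overwritten."""
    by_key = {row["policy_no"]: row for row in field_data}
    refreshed = []
    for row in business_table:
        new_row = dict(row)  # leave the initial business table untouched
        patch = by_key.get(row["policy_no"])
        if patch is not None:
            for field in target_fields:
                if field in patch:
                    new_row[field] = patch[field]
        refreshed.append(new_row)
    return refreshed
```

Rows with no matching target field data are carried over unchanged, so a partial refresh does not erase existing values.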

[0062] As can be seen, in the above steps S10-S50, the creation time range is divided into multiple time sub-intervals, and the target data corresponding to each interval can be read independently and in parallel from the data source. Furthermore, a distributed scheduling engine schedules multiple data sources to concurrently execute reading tasks, theoretically increasing the throughput of data reading and initial filtering to several times the processing capacity of a single machine (limited by network bandwidth, database concurrency capabilities, downstream write capabilities, etc.). Moreover, the use of message queues decouples the "reading" and "target field data generation," allowing subsequent stages to be independently expanded and optimized. In summary, this invention improves the efficiency of data source reading, thereby improving data refresh efficiency.

[0063] In some embodiments of the present invention, after step S50, the data refresh method may further include: in response to detecting abnormal data during the data refresh process, obtaining the abnormal record table of the initial business table, wherein the abnormal record table includes the business identifier, retry flag, and retry count of the initial business table; in response to the retry flag being set to allow retries and the retry count being less than a predetermined threshold, performing an abnormal retry on the target field based on the abnormal data and incrementing the retry count by 1; and in response to detecting that the abnormal retry succeeded, updating the retry flag to prohibit retries.

[0064] Specifically, during the execution of a data refresh task, data anomalies may occur due to factors such as external call failures, system exceptions, or service shutdowns and restarts. When they do, the abnormal data is recorded in the abnormal record table. The abnormal record table includes the following information: business identifier, retry flag (whether retries are allowed: N for no, Y for yes), retry count, deletion flag (an audit field; deletion is rare), creation time, creator, update time, and updater.

[0065] In one example, for the acquired abnormal data, the retry flag is first checked to determine whether a retry is allowed (Y). If it is Y, the retry count is checked. If the retry count is less than a predetermined threshold (e.g., 3), the data refresh task is executed again. If the execution succeeds, the retry flag is updated to N. If it fails, the retry count is increased by 1.
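The retry rule of paragraph [0063] (retry only while the flag is Y and the count is below the threshold, increment the count on each attempt, and set the flag to N on success) might be sketched as follows. The record keys `business_id`, `retry_flag`, and `retry_count` and the `refresh` callback are hypothetical names chosen for this illustration.

```python
def retry_abnormal(records, refresh, max_retries=3):
    """Retry eligible abnormal records once each and update their bookkeeping."""
    for rec in records:
        if rec["retry_flag"] != "Y" or rec["retry_count"] >= max_retries:
            continue  # retries disabled or budget exhausted
        rec["retry_count"] += 1
        try:
            ok = bool(refresh(rec["business_id"]))
        except Exception:
            ok = False  # a raised error counts as a failed attempt
        if ok:
            rec["retry_flag"] = "N"  # success: no further retries needed
    return records
```

Running this function on a schedule gives each abnormal record at most `max_retries` attempts before it is left for manual handling.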

[0066] As can be seen from the above scheme, by setting up an abnormal data handling and retry mechanism, the fault tolerance rate of the data refresh task is improved, thereby improving the data refresh success rate.

[0067] When applied to production operations, this invention offers the following beneficial effects: it significantly improves data refresh efficiency, potentially by as much as 10 times. In one scenario, slow SQL alerts decreased from 20 to 0, business system anomalies decreased from 5 to 0, batch execution volume and frequency became controllable, and both anomaly data handling and retry mechanisms were implemented. In summary, this invention achieves comprehensive improvements in various aspects, including data refresh efficiency, system stability during execution, database robustness, and anomaly data handling.

[0068] Furthermore, the method for large-scale data refresh based on a scheduling engine and a message queue provided by this invention enables the processing of data on the order of one billion records, improves data refresh efficiency as well as the robustness and fault tolerance of the business system, and reduces the occurrence of production accidents, slow SQL, and system problems.

[0069] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0070] In one embodiment, a data refresh device is provided, which corresponds one-to-one with the data refresh method described in the above embodiments. As shown in Figure 6, the data refresh device includes a time information acquisition module 101, a time segmentation module 102, a data reading module 103, a data consumption module 104, and a data refresh module 105. Detailed descriptions of each functional module are as follows: the time information acquisition module 101 is used to obtain the creation time information of the target field in the initial business table from the business system; the time segmentation module 102 is used to segment according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; the data reading module 103 is used to read data from at least two data sources through a distributed scheduling engine, obtain target data, and put the target data into a message queue, wherein the creation time of the target data is located within the time sub-interval; the data consumption module 104 is used to generate target field data for the time sub-interval based on the target data in the message queue in response to detecting that the message queue is not empty; and the data refresh module 105 is used to refresh the target field based on the target field data of all the time sub-intervals to obtain the target business table.

[0071] In one embodiment, the time information acquisition module 101 is specifically used for: acquiring the initial business table of the business system; performing field detection on the initial business table to obtain target fields, wherein the target fields include fields with missing data and fields with data errors; and acquiring the creation time information of the target fields in the initial business table.
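Field detection of this kind could be sketched as below. The rule that `None` or an empty string counts as missing, the per-field validator callbacks, and the function name `detect_target_fields` are all assumptions for illustration; the text does not specify how missing or erroneous data is recognized.

```python
def detect_target_fields(rows, validators):
    """Map each problematic field to the kinds of problems found in it."""
    targets = {}
    for row in rows:
        for field, is_valid in validators.items():
            value = row.get(field)
            if value is None or value == "":
                targets.setdefault(field, set()).add("missing")
            elif not is_valid(value):
                targets.setdefault(field, set()).add("error")
    return targets
```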

[0072] In one embodiment, the time information acquisition module 101 is specifically used to: acquire the creation time of the target field in the initial business table to obtain the field creation time point; expand forward based on the field creation time point to obtain the execution start time, and expand backward based on the field creation time point to obtain the execution end time; and merge the execution start time and the execution end time to obtain the creation time information.
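A sketch of the forward and backward expansion follows. It assumes that "forward" means earlier in time and "backward" means later, that the earliest and latest field creation times bound the window, and that the margin is one minute; the size of the expansion and the name `creation_window` are not given in the text and are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def creation_window(created_times, margin_minutes=1):
    """Return (execution start, execution end) at minute granularity."""
    points = [datetime.strptime(t, FMT) for t in created_times]
    # Truncate to the minute, then widen by the assumed margin on both sides.
    start = min(points).replace(second=0) - timedelta(minutes=margin_minutes)
    end = max(points).replace(second=0) + timedelta(minutes=margin_minutes)
    out_fmt = "%Y-%m-%d %H:%M"
    return start.strftime(out_fmt), end.strftime(out_fmt)
```

Widening the window on both sides is a way to reduce the chance that records near the boundary are missed.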

[0073] In one embodiment, the execution start time has a time granularity of minutes, and the execution end time has a time granularity of minutes; the time segmentation module 102 is specifically used to: segment the creation time information according to minutes to obtain time segments; and pad the time granularity of the start time and the time granularity of the end time of the time segment to seconds to obtain the time sub-interval.
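The minute-by-minute segmentation with padding to seconds can be illustrated as follows. The sketch assumes that the end minute is included and that padding means appending `:00` to the start and `:59` to the end of each segment; the function name `split_by_minute` is hypothetical.

```python
from datetime import datetime, timedelta

def split_by_minute(start_minute, end_minute):
    """Split [start_minute, end_minute] into one-minute sub-intervals."""
    fmt = "%Y-%m-%d %H:%M"
    cur = datetime.strptime(start_minute, fmt)
    last = datetime.strptime(end_minute, fmt)
    segments = []
    while cur <= last:
        stamp = cur.strftime(fmt)
        # Pad the minute-level bounds to second granularity.
        segments.append((stamp + ":00", stamp + ":59"))
        cur += timedelta(minutes=1)
    return segments
```

Each resulting pair can serve as the lower and upper bound of one read task.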

[0074] In one embodiment, after refreshing the target field based on the target field data of all the time sub-intervals to obtain the target business table, the data refresh device may further include: an exception retry module, configured to: in response to detecting abnormal data during the data refresh process, obtain an exception record table of the initial business table; wherein, the exception record table includes the business identifier, retry flag, and retry count of the initial business table; in response to the retry flag being set to allow retry and the retry count being less than a predetermined threshold, perform an exception retry on the target field based on the abnormal data and increment the retry count by 1; in response to detecting a successful exception retry, update the retry flag to disable retry.

[0075] In one embodiment, the data sequence time segment table includes a read flag for each time sub-interval, and the read flag is initialized to unprocessed; after the target data is placed into the message queue, the data refresh device may further include: a segment table update module, specifically used to: obtain the creation time of the target data in the message queue, obtain the data creation time, and take the time sub-interval where the data creation time is located as the target time sub-interval; update the read flag of the target time sub-interval to processed.

[0076] In one embodiment, at least two data sources include a product center, a policy center, and a customer center; the data reading module 103 is specifically used for: reading data from the product center using the time sub-interval through a distributed scheduling engine to obtain product center data; reading data from the policy center using the time sub-interval through a distributed scheduling engine to obtain policy center data; reading data from the customer center using the time sub-interval through a distributed scheduling engine to obtain customer center data; and performing data fusion based on the product center data, the policy center data, and the customer center data to obtain the target data.

[0077] This invention provides a data refresh device that divides the creation time range into multiple time sub-intervals, each of which can independently and in parallel read target data from the data source. Furthermore, a distributed scheduling engine schedules multiple data sources to concurrently execute reading tasks, theoretically increasing the throughput of data reading and initial filtering to several times the processing capacity of a single machine. Moreover, a message queue decouples the "reading" and "target field data generation" processes, allowing subsequent stages to be independently expanded and optimized. In summary, this invention improves the efficiency of data source reading, thereby improving data refresh efficiency.

[0078] For specific limitations on the data refresh device, refer to the limitations of the data refresh method described above, which are not repeated here. Each module in the aforementioned data refresh device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in, or be independent of, the processor in the computer device in hardware form, or be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

[0079] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 7. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When the computer program is executed by the processor, it implements the server-side functions or steps of the data refresh method.

[0080] In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Figure 8. The computer device includes a processor, a memory, a network interface, a display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with an external server via a network connection. When the computer program is executed by the processor, it implements the client-side functions or steps of the data refresh method.

[0081] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the following steps: obtaining the creation time information of the target field in the initial business table from the business system; segmenting according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; reading data from at least two data sources through a distributed scheduling engine to obtain target data, and placing the target data into a message queue, wherein the creation time of the target data is located within the time sub-interval; in response to detecting that the message queue is not empty, generating target field data for the time sub-interval based on the target data in the message queue; and refreshing the target field based on the target field data of all the time sub-intervals to obtain the target business table.

[0082] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor: obtaining the creation time information of the target field in the initial business table from the business system; segmenting according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; reading data from at least two data sources through a distributed scheduling engine to obtain target data, and placing the target data into a message queue, wherein the creation time of the target data is located within the time sub-interval; in response to detecting that the message queue is not empty, generating target field data for the time sub-interval based on the target data in the message queue; and refreshing the target field based on the target field data of all the time sub-intervals to obtain the target business table.

[0083] It should be noted that, for the functions or steps that can be implemented by the computer-readable storage medium or computer device described above, reference may be made to the relevant server-side and client-side descriptions in the foregoing method embodiments. To avoid repetition, they are not described one by one here.

[0084] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0085] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

[0086] It should be noted that any AI models, software tools, or components not belonging to this company appearing in the embodiments of this application are merely illustrative examples and do not represent actual use. All user personal information involved in the embodiments of this application has been authorized (with the knowledge and consent) by the relevant parties or has been fully authorized by all parties, and the executing entity may obtain it through various legal and compliant means. The collection, storage, use, processing, transmission, provision, and disclosure of the information, data, and signals involved all comply with relevant laws and regulations and do not violate public order and good morals.

[0087] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A data refresh method, characterized in that the method includes: obtaining the creation time information of a target field in an initial business table from a business system; segmenting according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; reading data from at least two data sources through a distributed scheduling engine to obtain target data, and placing the target data into a message queue, wherein the creation time of the target data is located within the time sub-interval; in response to detecting that the message queue is not empty, generating target field data for the time sub-interval based on the target data in the message queue; and refreshing the target field based on the target field data of all the time sub-intervals to obtain a target business table.

2. The method according to claim 1, characterized in that the step of obtaining the creation time information of the target field in the initial business table from the business system includes: obtaining the initial business table from the business system; performing field detection on the initial business table to obtain target fields, wherein the target fields include fields with missing data and fields with data errors; and obtaining the creation time information of the target fields in the initial business table.

3. The method according to claim 2, characterized in that the step of obtaining the creation time information of the target field in the initial business table includes: obtaining the creation time of the target field in the initial business table to get the field creation time point; expanding forward from the field creation time point to obtain the execution start time, and expanding backward from the field creation time point to obtain the execution end time; and merging the execution start time and the execution end time to obtain the creation time information.

4. The method according to claim 3, characterized in that the execution start time has a time granularity of minutes, and the execution end time has a time granularity of minutes; the step of segmenting according to the creation time information to obtain a data sequence time segment table includes: segmenting the creation time information by minute to obtain time segments; and padding the time granularity of the start time and of the end time of each time segment to seconds to obtain the time sub-interval.

5. The method according to claim 3, characterized in that, after refreshing the target field based on the target field data of all the time sub-intervals to obtain the target business table, the method further includes: in response to detecting abnormal data during the data refresh process, obtaining an abnormal record table of the initial business table, wherein the abnormal record table includes the business identifier, retry flag, and retry count of the initial business table; in response to the retry flag being set to allow retries and the retry count being less than a predetermined threshold, performing an abnormal retry on the target field based on the abnormal data and incrementing the retry count by 1; and in response to detecting that the abnormal retry succeeded, updating the retry flag to prohibit retries.

6. The method according to any one of claims 1 to 5, characterized in that the data sequence time segment table includes a read flag for each of the time sub-intervals, and the read flag is initialized to unprocessed; after placing the target data into the message queue, the method further includes: obtaining the creation time of the target data in the message queue as the data creation time, and taking the time sub-interval in which the data creation time falls as the target time sub-interval; and updating the read flag of the target time sub-interval to processed.

7. The method according to any one of claims 1 to 5, characterized in that the at least two data sources include a product center, a policy center, and a customer center; the step of reading data from at least two data sources through a distributed scheduling engine to obtain the target data includes: reading data from the product center by the time sub-interval through the distributed scheduling engine to obtain product center data; reading data from the policy center by the time sub-interval through the distributed scheduling engine to obtain policy center data; reading data from the customer center by the time sub-interval through the distributed scheduling engine to obtain customer center data; and fusing the product center data, the policy center data, and the customer center data to obtain the target data.

8. A data refresh device, characterized in that the device includes: a time information acquisition module, used to obtain the creation time information of the target field in the initial business table from the business system; a time segmentation module, used to segment according to the creation time information to obtain a data sequence time segment table, wherein the data sequence time segment table includes at least two time sub-intervals; a data reading module, used to read data from at least two data sources through a distributed scheduling engine, obtain target data, and put the target data into a message queue, wherein the creation time of the target data is located within the time sub-interval; a data consumption module, used to generate target field data for the time sub-interval based on the target data in the message queue in response to detecting that the message queue is not empty; and a data refresh module, used to refresh the target field based on the target field data of all the time sub-intervals to obtain the target business table.

9. A computer device, characterized in that the computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the method according to any one of claims 1 to 7.