A service call management method based on Flink and ClickHouse
The service call management method combines Flink and ClickHouse with data processing through Kafka, HBase, and ClickHouse to address the lack of monitoring for internal and external service calls. It applies rule-based limits and automatic alerts to call counts and usage, improving the efficiency and security of service call management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FOCUS TECH
- Filing Date
- 2023-06-29
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies, the lack of unified monitoring and management of internal services and external third-party services leads to the inability to effectively identify callers, count usage, and trigger alarms, resulting in the risk of economic losses and difficulty in implementing circuit breaker mechanisms.
The approach combines Flink and ClickHouse. Flink processes service call information in real time, Kafka is used for data transmission, and HBase and ClickHouse are used for data storage and analysis. Alarms and circuit breakers are implemented based on usage limit rules, and FineReport is used for visualization.
It implements rule-based restrictions on the number and usage of internal and external service calls, automatic alarms and circuit breakers, reduces economic losses, improves the efficiency of service call traffic management, and supports rapid location and resolution of problems.
Figure CN116644138B_ABST
Description
Technical Field
[0001] This invention relates to the field of big data risk control, and in particular to a service call management method based on Flink and ClickHouse.
Background Technology
[0002] Apache Flink is an open-source framework and distributed processing engine for stateful computation over unbounded and bounded data streams, combining high throughput with low latency. It runs in all common cluster environments and performs computations at in-memory speed and at any scale. Its key features include unified batch and stream processing, precise state management, event-time support, and exactly-once state consistency guarantees. Flink can run on various resource management frameworks, including YARN, Mesos, and Kubernetes, and also supports standalone deployment on bare-metal clusters. With high-availability options enabled, it has no single point of failure. Flink has been shown to scale to thousands of cores and terabytes of state while maintaining high throughput and low latency, and many demanding streaming applications worldwide run on it.
[0003] ClickHouse is an open-source columnar database management system (DBMS) for online analytical processing (OLAP). ClickHouse is fast because it processes queries in parallel: even a single query is executed using half of the server's CPU. For the same reason, ClickHouse is not well suited to high-concurrency use cases.
[0004] CN2021113090152 discloses a method and apparatus for efficiently synchronizing real-time data to ClickHouse based on Flink. The method queries the link list of all shards in the ClickHouse cluster, finds the machine corresponding to the corresponding shard based on preset sharding rules, and writes the changed data to the local table of the corresponding shard machine. Existing Flink does not support writing data to local tables according to sharding rules; writing data to a distributed table and then synchronizing it to the local table results in significant performance issues. This method provides a technical means to directly write data to the local table, avoiding the need for writing through a distributed table, thus greatly improving write speed and eliminating data consistency problems.
[0005] Within a company's development department, there are numerous scenarios where internal services call external third-party services or where internal services call each other. For example, the company's translation service (internal service) might call Google Translate (third-party service) or Baidu Translate (third-party service), or internal service A might call internal service B. Without unified monitoring and management, several pain points arise: no record of call sources (it is unclear who is calling the third-party or internal service); no usage details (it is unclear how much is being used); no alerting mechanism (it is unclear when thresholds are exceeded); and no circuit breaker mechanism (calls cannot be cut off when usage exceeds limits). The resulting risks include: economic losses due to delayed detection of excessive third-party usage; difficulty in monitoring and blocking malicious calls; difficulty in statistically analyzing call volume, frequency, and trends, making it impossible to identify and mitigate risks in advance; and the absence of a unified management process and alerting mechanism for callers and callees, which hinders rapid and effective decision-making.
[0006] Therefore, a service call management method is needed that imposes rules and risk control on the number of calls and usage of internal services and external third-party services, automatically alerts or circuit-breaks callers that exceed a threshold, and notifies the relevant managers so that they can quickly and accurately locate the problem and follow up on the solution.
Summary of the Invention
[0007] The technical problem this invention aims to solve is to implement risk control over the frequency and usage of existing internal and external service calls within a company, providing a service call management method based on Flink and ClickHouse. It also utilizes Kafka, a distributed, publish / subscribe-based messaging system.
[0008] To address the aforementioned technical issues, this invention provides a service call management method based on Flink and ClickHouse. Flink processes each service call in real time and determines whether a threshold has been exceeded based on usage limit rules configured by administrators, thus triggering an alert. Simultaneously, data is integrated into ClickHouse and pushed to FineReport for real-time monitoring of call statistics up to the current stage. The method specifically includes the following steps:
[0009] Step 1: Before calling internal or external services, developers first connect to the unified open service platform and then call the required services. The metadata of each call request and its return result is recorded to form call log information, which is then reported to Kafka. Kafka can process all action stream data from consumers on a website. These actions (web browsing, searching, and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation.
[0010] Step 2: Obtain data from the Kafka topic that records call log information, and use this data as the data source for our service call analysis and management. Real-time and offline processing are performed in parallel, with different consumer groups consuming the same batch of data from the topic. Topic is a communication intermediary between message publishers (Pub) and subscribers (Sub). Devices can send and receive messages through Topics, thereby enabling communication between the server and the device.
[0011] Step 3: Consume the call log data from Kafka in real time using Flink, and lightly aggregate the data into intermediate data according to business rules before storing it in HBase. HBase is a distributed, column-oriented open-source database.
[0012] Step 4: Based on the rules set in the usage (data usage traffic) limit rule table, determine whether the alarm or circuit breaker threshold has been exceeded. If it has been exceeded, write a message to the alarm / circuit breaker topic to trigger the alarm or circuit breaker, notify administrators and callers via email, and automatically rate-limit or circuit-break the service.
[0013] Step 5: Create a Kafka engine table in ClickHouse, configure the data source Topic from Step 2, and persist the Kafka source data into a ClickHouse MergeTree engine table through materialized views. The MergeTree family of table engines is the most distinctive storage engine provided by ClickHouse. The overall process is shown in Figure 2;
[0014] Step 6: Based on the meaning of the business rules, use ClickHouse materialized views to pre-aggregate data, use FineReport software, and develop reports in FineReport to visualize the current service call status, so that relevant personnel can clearly observe the number and trend of calls;
[0015] In step 1, it is briefly described where the most important data source (call record flow information) for service call control in this invention comes from.
[0016] In step 2, "real-time processing" refers to processing the Kafka data source using Flink and HBase to implement business logic, sending the call records that trigger alarms / circuit breakers to the alarm / circuit breaker topic, and alerting relevant personnel to handle the issue. "Offline processing" refers to connecting the Kafka data source to ClickHouse, performing pre-aggregation based on business logic using materialized views, and finally displaying the data through FineReport. Both processes use the same data source and operate on two parallel processing lines.
[0017] In step 3, data from multiple topics on different sites is consumed, and the data format is consistent. HBase is used as the intermediate result storage, and the key-value table in HBase is updated every time a Kafka message arrives.
[0018] In step 4, the service call and usage limit rule table is configured by the administrator. It includes multiple analysis dimensions, such as call type dimension, internal resource dimension, third-party service dimension, terminal identifier dimension, period dimension, period offset, etc., effectively covering all currently foreseeable statistical dimensions.
[0019] Step 5 describes the offline processing part, where the data flow is as follows: the call record stream data in Kafka flows into the Kafka engine table in ClickHouse; the data from the previous step is then synchronized to the MergeTree engine table in ClickHouse through a materialized view.
[0020] In step 6, based on step 5, a materialized view is created according to business rules, and the data in the ClickHouse MergeTree engine table is aggregated and pointed to the next-level AggregatingMergeTree engine table; the ClickHouse Dictionary engine is used to construct a dictionary table to perform some ID and NAME conversions; combining the results of steps 6-1 and 6-2, the data is imported into FineReport and a report is generated for display.
[0021] On the one hand, this invention consumes internal and third-party call information data from Kafka (which processes all action stream data in a website) in real time via Flink. According to business rules, the data is lightly aggregated into intermediate data and stored in HBase. Then, based on the rules configured in the usage limit rule table, it determines whether the threshold for alarms or circuit breakers has been exceeded. If it has, a message is written to the Topic to trigger alarms or circuit breakers, and administrators and callers are notified via email or other means. On the other hand, ClickHouse and FineReport visualize the current service call status, allowing relevant personnel to clearly observe the number and trend of calls.
[0022] Beneficial Effects: This invention implements rule-based restrictions and risk control on the number of calls and usage of internal services and external third-party services. It abstracts dimensions such as caller, callee, and resources to form usage rules. Based on these rules, traffic analysis is performed; if the alarm threshold is exceeded, an alert is sent to the administrator. Similarly, if the circuit breaker threshold is exceeded, the system immediately suspends the caller and sends an alert to the administrator. Combined with reporting tools, traffic data is displayed from multiple dimensions. This strengthens service call traffic management, automatically alerting or suspending callers exceeding thresholds. Administrators can view the current call volume and trends at any time, enabling rapid and accurate identification of problems in the event of emergencies and reducing economic losses caused by over-limit calls.
Attached Figure Description
[0023] Figure 1 is a schematic diagram of the method flow of an exemplary embodiment of the present invention.
[0024] Figure 2 is a flowchart of steps 5 and 6.
[0025] Figure 3 is a flowchart for creating the Kafka engine table for ClickHouse in this invention.
Detailed Implementation
[0026] The present invention is further described below with reference to Figure 1:
[0027] As shown in Figure 1, a service call management method based on Flink and ClickHouse is provided. Flink processes each service call in real time, determines whether a threshold is exceeded based on the usage limit rules configured by the administrator, and issues an alert accordingly. The data is then integrated into ClickHouse and pushed to FineReport for real-time monitoring of call statistics up to the current point in time. The method includes the following steps:
[0028] Step 1: Before calling internal or external services, developers first connect to the unified open service platform, then call the required services, and record the metadata of the call request and return result (including: service number, business number, application number, merchant number, developer number, end user identifier, client version, request time, execution duration, end time, usage unit, usage value, request sequence number, etc.), forming a call record log information, which is then reported to Kafka.
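The call record formed in Step 1 can be sketched as a small data structure serialized before reporting. This is a minimal illustration only: the field names, the JSON encoding, and the `to_kafka_payload` helper are assumptions chosen for the example and are not specified by the invention; the actual producer call to Kafka is omitted.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CallRecord:
    """One call log entry. Field names are hypothetical, modeled on the listed metadata."""
    service_id: str
    business_id: str
    app_id: str
    merchant_id: str
    developer_id: str
    end_user_id: str
    client_version: str
    request_time: str   # ISO-8601 timestamp
    duration_ms: int
    end_time: str       # ISO-8601 timestamp
    usage_unit: str
    usage_value: float
    request_seq: str


def to_kafka_payload(record: CallRecord) -> bytes:
    """Serialize the record as UTF-8 JSON (an assumed message encoding)."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


record = CallRecord(
    service_id="svc-translate", business_id="biz-1", app_id="app-1",
    merchant_id="m-1", developer_id="dev-1", end_user_id="u-42",
    client_version="1.0.0", request_time="2023-06-29T10:04:59",
    duration_ms=350, end_time="2023-06-29T10:05:00",
    usage_unit="characters", usage_value=120.0, request_seq="req-0001",
)
payload = to_kafka_payload(record)  # bytes ready to hand to a Kafka producer
```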
[0029] Step 2: Obtain the data from the Topic that records call log information in Kafka, and use it as the data source for our service call analysis and management. Real-time and offline processing are carried out in parallel, and two consumer groups (one Flink and one ClickHouse) consume the data from the same Topic.
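The effect of two consumer groups reading the same Topic can be illustrated with a small in-memory model. This sketch is not Kafka client code: the `TopicLog` class and the group names are hypothetical, and it only shows that each group keeps its own offset and therefore sees every message independently.

```python
from collections import defaultdict


class TopicLog:
    """Append-only message log with one read offset per consumer group."""

    def __init__(self):
        self._messages = []
        self._offsets = defaultdict(int)  # group name -> next index to read

    def publish(self, message):
        self._messages.append(message)

    def poll(self, group, max_records=100):
        """Return unread messages for this group and advance only its offset."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = TopicLog()
for seq in range(3):
    topic.publish({"request_seq": seq})

# The real-time line and the offline line each receive the full stream.
realtime_batch = topic.poll("flink-realtime")
offline_batch = topic.poll("clickhouse-offline")
```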
[0030] Step 3: Consume the call log data in Kafka in real time using Flink (which corresponds one-to-one with the metadata in Step 1), and store the intermediate results in HBase;
[0031] Step 3-1: Consume data from multiple topics across different sites (all with the same data format). Perform a KeyBy operation according to specific business rules (grouping data with the same call type, merchant ID, developer ID, application ID, service ID, third-party service ID, and user terminal identifier into the same partition as a prerequisite for generating the key in the HBase table). Flink's keyBy operator sends data with the same key to the same partition (i.e., a subtask), using hash partitioning.
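The grouping idea behind Step 3-1 can be sketched as follows. This is a simplified model, not Flink's implementation: Flink assigns keys to key groups with its own hash function, whereas this example uses a CRC32 digest modulo the parallelism. The field names and the `partition_for` helper are assumptions for illustration.

```python
import zlib

# Hypothetical field names for the grouping dimensions listed in Step 3-1.
KEY_FIELDS = (
    "call_type", "merchant_id", "developer_id", "app_id",
    "service_id", "third_party_service_id", "terminal_id",
)


def group_key(record: dict) -> tuple:
    """Extract the composite key that identifies one logical group."""
    return tuple(str(record[field]) for field in KEY_FIELDS)


def partition_for(key: tuple, parallelism: int) -> int:
    """Map a key to a subtask index with a stable hash (simplified stand-in)."""
    digest = zlib.crc32("|".join(key).encode("utf-8"))
    return digest % parallelism


base = {
    "call_type": "external", "merchant_id": "m-1", "developer_id": "dev-1",
    "app_id": "app-1", "service_id": "svc-translate",
    "third_party_service_id": "tp-1", "terminal_id": "t-1",
}
first = dict(base, request_seq="req-0001")
second = dict(base, request_seq="req-0002")

# Records that share all key fields always land on the same subtask.
same_partition = partition_for(group_key(first), 4) == partition_for(group_key(second), 4)
```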
[0032] Step 3-2: According to business rules, the data is lightly aggregated into intermediate data and stored in HBase (for each call record, the KeyBy in Step 3-1 plus the hour segment where the call ended will be used as the HBase Key, and the calculation results of the number of calls and usage will be used as the HBase Value). Every time a Kafka message arrives, the KV table in HBase will be updated.
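The light aggregation of Step 3-2 can be sketched with a plain dictionary standing in for the HBase KV table. The row key layout (grouping fields joined with `|` plus a `YYYYMMDDHH` hour segment) and the helper names are assumptions for illustration; the invention only states that the key combines the KeyBy fields with the hour in which the call ended.

```python
from datetime import datetime


def row_key(group_key: tuple, end_time: str) -> str:
    """Build the key: grouping fields plus the hour segment of the call end time."""
    hour = datetime.fromisoformat(end_time).strftime("%Y%m%d%H")
    return "|".join(group_key) + "|" + hour


def apply_record(store: dict, group_key: tuple, end_time: str, usage_value: float) -> None:
    """Update the (call count, usage) value for the record's hourly bucket."""
    key = row_key(group_key, end_time)
    calls, usage = store.get(key, (0, 0.0))
    store[key] = (calls + 1, usage + usage_value)


store = {}  # stand-in for the HBase KV table
key_fields = ("external", "m1", "d1", "a1", "svc-translate")

# Each arriving message updates exactly one hourly bucket.
apply_record(store, key_fields, "2023-06-29T10:05:00", 120.0)
apply_record(store, key_fields, "2023-06-29T10:55:00", 80.0)
apply_record(store, key_fields, "2023-06-29T11:01:00", 50.0)
```

Keeping one value per hour means a later rule check only has to scan a handful of buckets for its period rather than every raw call record.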
[0033] Step 4: Based on the rules in the usage limit rule table, first scan out the data within the data period configured in a certain rule and calculate the cumulative number of calls and usage. Then calculate the limit amount multiplied by the threshold ratio in the limit rule table to obtain the final threshold of this matching rule. Compare the two to determine whether the threshold is exceeded and perform corresponding alarm or circuit breaker processing.
[0034] Step 4-1: Taking into account factors such as call type, internal resources, third-party services, terminal identifier, period, and period offset, determine whether the alarm or circuit breaker threshold has been exceeded.
[0035] Step 4-2: If the limit is exceeded, write the message to the alarm / circuit break topic to trigger an alarm or circuit breaker;
[0036] Step 4-3: Automatically rate-limit or circuit-break the service and notify administrators and users via email or other means.
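The threshold check in Step 4 can be sketched as below. This is an illustration under stated assumptions: the `LimitRule` fields, the use of separate alarm and circuit breaker ratios on one rule, and the shape of the returned message are all hypothetical. What it takes from the description is the arithmetic: sum the buckets in the rule's period, multiply the limit by the threshold ratio, and compare.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LimitRule:
    """Hypothetical rule row: a limit plus ratios for the two actions."""
    rule_id: str
    limit: float
    alarm_ratio: float
    breaker_ratio: float


def evaluate(rule: LimitRule, hourly_usage: List[float]) -> Optional[dict]:
    """Return the message to write to the alarm / circuit breaker topic, or None."""
    cumulative = sum(hourly_usage)  # usage scanned over the rule's period
    if cumulative > rule.limit * rule.breaker_ratio:
        action = "circuit_break"
    elif cumulative > rule.limit * rule.alarm_ratio:
        action = "alarm"
    else:
        return None
    return {"rule_id": rule.rule_id, "action": action, "cumulative": cumulative}


rule = LimitRule("rule-1", limit=10000, alarm_ratio=0.8, breaker_ratio=1.0)
quiet = evaluate(rule, [3000, 2000])          # 5000, below both thresholds
alarm = evaluate(rule, [3000, 4000, 1500])    # 8500, above the alarm threshold
tripped = evaluate(rule, [6000, 4500])        # 10500, above the breaker threshold
```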
[0037] Step 5: Connect to the data source in Step 2 through ClickHouse's Kafka table engine, and persist the Kafka data source data as ClickHouse table data in the MergeTree engine through materialized views.
[0038] Step 5-1: The call record stream data in Kafka is sent to the Kafka engine table in ClickHouse;
[0039] Step 5-2: Synchronize the data from the previous step to the ClickHouse MergeTree engine table using materialized views.
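The three ClickHouse objects involved in Step 5 can be sketched as DDL strings. All table, column, topic, broker, and consumer group names here are hypothetical, and the column set is reduced to a few fields for brevity; the statements are held as Python strings so that a client of choice could execute them in order.

```python
# Kafka engine table: a streaming reader over the call log topic.
KAFKA_QUEUE_DDL = """
CREATE TABLE call_log_queue (
    service_id String,
    developer_id String,
    end_time DateTime,
    usage_value Float64
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'call_log',
         kafka_group_name = 'clickhouse-offline',
         kafka_format = 'JSONEachRow'
"""

# MergeTree table: durable storage for the detail rows.
DETAIL_TABLE_DDL = """
CREATE TABLE call_log_detail (
    service_id String,
    developer_id String,
    end_time DateTime,
    usage_value Float64
) ENGINE = MergeTree
ORDER BY (service_id, end_time)
"""

# Materialized view: moves rows from the Kafka table into the MergeTree table.
MATERIALIZED_VIEW_DDL = """
CREATE MATERIALIZED VIEW call_log_mv TO call_log_detail AS
SELECT service_id, developer_id, end_time, usage_value
FROM call_log_queue
"""

# The target table is created before the view that writes into it.
STATEMENTS = [KAFKA_QUEUE_DDL, DETAIL_TABLE_DDL, MATERIALIZED_VIEW_DDL]
```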
[0040] Step 6: Based on the meaning of the business rules, use ClickHouse materialized views to pre-aggregate data and visualize the current service call status in FineReport, so that relevant personnel can clearly observe the number and trend of calls;
[0041] Step 6-1: Based on business rules, create a materialized view to aggregate the data in the ClickHouse MergeTree engine table and point it to the next level AggregatingMergeTree engine table.
[0042] Step 6-2: Use ClickHouse's Dictionary engine to construct a dictionary table and perform some ID and NAME conversions;
[0043] Step 6-3: Combining the results of Step 6-1 and Step 6-2, import the data into FineReport and create a report for display.
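The data shaping performed in Step 6 can be modeled in a few lines. This sketch does not use ClickHouse: `pre_aggregate` stands in for what a materialized view writing to an AggregatingMergeTree table would compute, and the `names` mapping stands in for the dictionary table used for ID to NAME conversion. The row fields and the hourly grouping are assumptions for illustration.

```python
from collections import defaultdict


def pre_aggregate(rows):
    """Sum call count and usage per (service_id, hour) bucket."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        bucket = (row["service_id"], row["end_time"][:13])  # 'YYYY-MM-DD HH'
        totals[bucket][0] += 1
        totals[bucket][1] += row["usage_value"]
    return {bucket: tuple(value) for bucket, value in totals.items()}


def report_rows(aggregated, names):
    """Replace service IDs with display names; unknown IDs are kept as-is."""
    return sorted(
        (names.get(service_id, service_id), hour, calls, usage)
        for (service_id, hour), (calls, usage) in aggregated.items()
    )


detail = [
    {"service_id": "svc-1", "end_time": "2023-06-29 10:05:00", "usage_value": 10.0},
    {"service_id": "svc-1", "end_time": "2023-06-29 10:40:00", "usage_value": 20.0},
    {"service_id": "svc-9", "end_time": "2023-06-29 11:02:00", "usage_value": 5.0},
]
names = {"svc-1": "Translation"}  # stand-in for the dictionary table

report = report_rows(pre_aggregate(detail), names)
```

The resulting rows are the kind of pre-aggregated, name-resolved data a reporting tool such as FineReport would then chart.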
[0044] This invention discloses a service call management method based on Flink and ClickHouse. Flink processes each service call in real time, determines whether a threshold is exceeded based on usage limit rules configured by administrators, and issues alerts accordingly. Data is also integrated into ClickHouse and pushed to FineReport for real-time monitoring of call statistics up to the current timeline. This significantly enhances traffic control for service calls, automatically alerting or triggering circuit breakers for callers exceeding thresholds. Administrators can view the current call volume and trends at any time, proactively avoiding potential overuse. In the event of unexpected problems, it also allows for quick and accurate problem location and resolution, reducing economic losses caused by overuse.
[0045] An example flow for steps 5 and 6 is shown in Figure 3.
[0046] The above embodiments are not intended to limit the present invention in any way. Any other improvements and applications made to the above embodiments by equivalent transformations shall fall within the protection scope of the present invention.
Claims
1. A service call management method based on Flink and ClickHouse, characterized in that each service call is processed in real time using Flink, whether a threshold is exceeded is determined based on usage limit rules configured by administrators, and an alert is triggered accordingly; simultaneously, the data is integrated into ClickHouse and pushed to FineReport for real-time monitoring of call statistics up to the current point in time. The method includes the following steps:
Step 1: Before calling internal or external services, developers first connect to the unified open service platform, then call the required services, record the metadata of the call requests and return results to form call log information, and report it to Kafka; Kafka processes all action stream data of consumers on the website; this data is handled through log processing and log aggregation due to throughput requirements;
Step 2: Obtain data from the Topic that records call log information in Kafka, and use it as the data source for service call analysis and management; perform real-time and offline processing in parallel, with different consumer groups consuming the same batch of data from the Topic; a Topic is a transmission intermediary between message publishers and subscribers, and devices send and receive messages through Topics, thereby enabling communication between the server and the device. Real-time processing refers to processing and computing the Kafka data source using Flink and HBase to implement business logic, sending the call records that trigger alarms / circuit breakers to the alarm / circuit breaker Topic, and alerting relevant personnel for handling. Offline processing refers to connecting the Kafka data source to ClickHouse, pre-aggregating data according to business logic using materialized views, and finally displaying the results through FineReport. Both use the same data source and run as two parallel processing lines.
Step 3: Consume the call log data from Kafka in real time using Flink, and lightly aggregate the data into intermediate data according to business rules before storing it in HBase; HBase is a distributed, column-oriented open-source database;
Step 4: Based on the rules in the usage (data usage traffic) limit rule table, determine whether the alarm or circuit breaker threshold has been exceeded. If it has, write a message to the alarm / circuit breaker topic to trigger the alarm or circuit breaker, notify the administrators and callers via email, and automatically rate-limit or circuit-break the service.
Step 5: Configure the data source Topic from Step 2 using ClickHouse's Kafka table engine, and persist the Kafka source data into a ClickHouse MergeTree engine table through materialized views; the MergeTree family of table engines is a storage engine provided by ClickHouse;
Step 6: Based on the meaning of the business rules, use ClickHouse materialized views to pre-aggregate data, use FineReport software, and develop reports in FineReport to visualize the current service call status, so that relevant personnel can clearly observe the number and trend of calls.
2. The service call management method based on Flink and ClickHouse according to claim 1, characterized in that, in step 3, Step 3-1: consume data from multiple topics on different sites, and first perform KeyBy operations according to certain business rules; Flink's keyBy operator sends data with the same key to the same partition, using hash partitioning; Step 3-2: According to business rules, the data is lightly aggregated into intermediate data and stored in HBase. For each call record, the KeyBy in Step 3-1 plus the hour segment where the call ended will be used as the HBase Key, and the calculation results of the number of calls and usage will be used as the HBase Value. Every time a Kafka message arrives, the KV table in HBase will be updated.
3. The service call management method based on Flink and ClickHouse according to claim 1, characterized in that, in step 4: Based on the rules in the usage limit rule table, first scan the data within the data period configured in a certain rule and calculate the cumulative number of calls and usage. Then, calculate the limit amount multiplied by the threshold ratio in the limit rule table to obtain the final threshold of this matching rule. Compare the two to determine whether the threshold is exceeded and take corresponding alarm or circuit breaker actions. The service call and usage limit rule table is configured by the administrator and includes multiple analysis dimensions, including call type dimension, internal resource dimension, third-party service dimension, terminal identifier dimension, period dimension, and period offset, covering all currently foreseeable statistical dimensions. Step 4-1: Taking into account the call type dimension, internal resource dimension, third-party service dimension, terminal identifier dimension, period dimension, and period offset, determine whether the alarm or circuit breaker threshold is exceeded. Step 4-2: If the limit is exceeded, write the message to the alarm / circuit breaker topic to trigger an alarm or circuit breaker; Step 4-3: Automatically rate-limit or circuit-break the service and notify administrators and users via email.
4. The service call management method based on Flink and ClickHouse according to claim 1, characterized in that step 5 describes the offline processing part, where the data flow is as follows: the call record stream data in Kafka flows into the Kafka engine table in ClickHouse; the data from the previous step is synchronized to the MergeTree engine table in ClickHouse through materialized views. Step 5-1: The call record stream data in Kafka is sent to the Kafka engine table in ClickHouse; Step 5-2: Synchronize the data from the previous step to the ClickHouse MergeTree engine table using materialized views.
5. The service call management method based on Flink and ClickHouse according to claim 4, characterized in that, In step 6, based on step 5, a materialized view is created according to business rules, and the data in the ClickHouse MergeTree engine table is aggregated and pointed to the next-level AggregatingMergeTree engine table; the ClickHouse Dictionary engine is used to construct a dictionary table to perform some ID and NAME conversions. Step 6-1: Based on business rules, create a materialized view to aggregate the data in the ClickHouse MergeTree engine table and point it to the next level AggregatingMergeTree engine table. Step 6-2: Use ClickHouse's Dictionary engine to construct a dictionary table and perform some ID and NAME conversions; Step 6-3: Combining the results of Step 6-1 and Step 6-2, import the data into FineReport and create a report for display.