Data processing method, apparatus, device, and medium
By using a container orchestrator to adjust the number of data processing containers and the number of partitions, the method removes the need in existing technologies to stop the platform in order to adjust parameters, and lets a streaming data processing platform adjust its data volume and process data efficiently.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA UNITED NETWORK COMM GRP CO LTD
- Filing Date
- 2022-08-15
- Publication Date
- 2026-06-30
AI Technical Summary
Existing data processing methods require stopping the platform and manually adjusting parameters when adjusting the amount of data processed by the streaming data processing platform, resulting in low data processing efficiency.
The container orchestrator adjusts the number of data processing containers and the number of partitions of the source topic so that the two are equal. The producer of the Kafka Stream data processing platform writes data into different partitions, each data processing container obtains and processes the data of its partition and stores the result in the target topic, and the consumer finally stores the data from the target topic in a database. This allows the data processing volume to be adjusted dynamically.
The data volume can be adjusted without stopping the streaming data processing platform, thus improving data processing efficiency.
Figure CN115344587B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of big data, and in particular to a data processing method, apparatus, device, and medium.
Background
[0002] With the development of technology, the amount of data generated in various fields is increasing, and how to analyze and process this data more quickly has become a concern. To address this issue, streaming data processing platforms have emerged.
[0003] In existing technologies, streaming data processing platforms offer high throughput and can process and forward data quickly once it is acquired. However, staff configure the platform's parameters before it runs, which caps the amount of data each run can process. When the data volume is small, the platform must be stopped and its parameters modified to avoid wasting resources; when the data volume is large, it must again be stopped and its parameters modified to improve processing efficiency.
[0004] In summary, when the amount of data processed by a streaming data processing platform needs to be adjusted, existing data processing methods can only stop the platform, manually adjust the corresponding parameters, and then resume data processing, which results in low data processing efficiency.
Summary of the Invention
[0005] This application provides a data processing method, apparatus, device, and medium to address the problem that existing data processing methods, when needing to adjust the amount of data processed by a streaming data processing platform, can only stop the operation of the streaming data processing platform and then manually adjust the corresponding parameters to achieve data processing, resulting in low data processing efficiency.
[0006] Firstly, this application provides a data processing method, including:
[0007] The container orchestrator adjusts the number of data processing containers and the number of partitions of the source topic of the Kafka Stream data processing platform, the source topic being located in the data production container, so that the number of partitions is the same as the number of data processing containers;
[0008] The producers in the Kafka Stream data processing platform write the acquired raw business data into different partitions of the source topic to obtain partition data.
[0009] Each of the data processing containers obtains partition data from different partitions in the source topic;
[0010] The data processing container processes the partitioned data according to the configured Kafka streaming data processing library to obtain target business data, and stores the target business data in the target topic of the Kafka streaming data processing platform.
[0011] In one specific implementation, each of the data processing containers obtains partition data from different partitions of the source topic, including:
[0012] For each data processing container, the data processing container acquires all partition data in the source topic;
[0013] For each partition data entry, if the remainder of the hash value of the partition data divided by the number of partitions is not equal to the partition number corresponding to the data processing container, then the partition data is deleted.
[0014] In one specific implementation, each of the data processing containers obtains partition data from different partitions of the source topic, including:
[0015] For each data processing container, the data processing container obtains the partition data in the partition whose partition number corresponds to that data processing container.
[0016] In one specific implementation, the producer in the Kafka stream data processing platform writes the acquired raw business data into different partitions of the source topic to obtain partition data, including:
[0017] For each piece of raw business data obtained by the producer, the partition number is determined based on the hash value of the raw business data and the number of partitions;
[0018] The original business data is written into the partition corresponding to the partition number in the source topic to obtain the partition data.
[0019] In one specific embodiment, the method further includes:
[0020] The consumer of the Kafka Stream data processing platform acquires the target topic and stores the target business data in the target topic into the database. The consumer is configured in the data storage container.
[0021] Secondly, this application provides a data processing apparatus, comprising:
[0022] An orchestration module, configured to use a container orchestrator to adjust the number of data processing containers and the number of partitions of the source topic of the Kafka Stream data processing platform, the source topic being located in the data production container, so that the number of partitions is the same as the number of data processing containers;
[0023] The storage module is used to write the raw business data obtained by the producer in the Kafka Stream data processing platform into different partitions of the source topic to obtain partition data;
[0024] An acquisition module is used to acquire partition data of different partitions from the source topic through each of the data processing containers;
[0025] The processing module is used to process the partitioned data through the data processing container according to the configured Kafka streaming data processing library to obtain target business data, and store the target business data into the target topic of the Kafka streaming data processing platform.
[0026] In one specific embodiment, the acquisition module is specifically used for:
[0027] For each data processing container, the data processing container acquires all partition data in the source topic;
[0028] For each partition data entry, if the remainder of the hash value of the partition data divided by the number of partitions is not equal to the partition number corresponding to the data processing container, then the partition data is deleted.
[0029] In one specific embodiment, the acquisition module is specifically used for:
[0030] For each data processing container, the data processing container obtains the partition data in the partition whose partition number corresponds to that data processing container.
[0031] In one specific embodiment, the storage module is specifically used for:
[0032] For each piece of raw business data obtained by the producer, the partition number is determined based on the hash value of the raw business data and the number of partitions;
[0033] The original business data is written into the partition corresponding to the partition number in the source topic to obtain the partition data.
[0034] In one specific embodiment, the storage module is further configured to:
[0035] The target topic is obtained by the consumer of the Kafka Stream data processing platform, and the target business data in the target topic is stored in the database. The consumer is configured in the data storage container.
[0036] Thirdly, this application provides an electronic device, comprising:
[0037] a processor, a memory, and a communication interface;
[0038] The memory is used to store the executable instructions of the processor;
[0039] The processor is configured to perform, by executing the executable instructions, the data processing method described in any implementation of the first aspect.
[0040] Fourthly, this application provides a readable storage medium storing a computer program which, when executed by a processor, implements the data processing method described in any implementation of the first aspect.
[0041] In the data processing method, apparatus, device, and medium provided in the embodiments of this application, the number of partitions of the source topic and the number of data processing containers are adjusted so that the two are the same. In the Kafka streaming data processing platform, producers write raw business data into different partitions of the source topic. Each data processing container retrieves the raw business data from its corresponding partition, processes it to obtain the target business data, and stores the result in the target topic. By adjusting the number of source-topic partitions and the number of data processing containers, this solution changes the amount of data the streaming data processing platform processes without stopping the platform, effectively improving data processing efficiency.
Brief Description of the Drawings
[0042] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a flowchart of an embodiment of the data processing method provided in this application;
[0044] Figure 2 is a flowchart of Embodiment 2 of the data processing method provided in this application;
[0045] Figure 3 is a schematic diagram of the overall flow of the data processing method provided in this application;
[0046] Figure 4 is a schematic structural diagram of an embodiment of the data processing apparatus provided in this application;
[0047] Figure 5 is a schematic structural diagram of an electronic device provided in this application.
Detailed Description
[0048] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments made by those skilled in the art under the guidance of these embodiments are within the scope of protection of this application.
[0049] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0050] With the rapid development of technology, the amount of data generated by various industries is increasing rapidly, and all of this data is valuable, so it is necessary to process and analyze it.
[0051] In existing technologies, when using streaming data processing platforms to process data, operators need to set parameters before using the platform to determine the upper limit of the data volume it can handle. However, in reality, the amount of data is not constant. When the data volume is small, to avoid wasting resources, the streaming data processing platform must be stopped and the parameters modified. When the data volume is large, to improve data processing efficiency, the platform must be stopped and the parameters modified again, resulting in low data processing efficiency.
[0052] To address these problems, the inventors found, while researching data processing methods, that streaming data processing platforms can be containerized. Like virtual machines, containers are isolated from one another and can expose services externally, but they consume fewer resources than virtual machines and physical machines: a virtual machine typically simulates a complete hardware and software environment, whereas a container simulates only the software environment. The producers and source topics of the Kafka streaming data processing platform can be configured as services provided by the data production container; the Kafka streaming data processing library can be configured as a service provided by the data processing container; and the consumers of the Kafka streaming data processing platform can be configured as services provided by the data storage container.
[0053] Because each data processing container has a limited data processing capacity, the amount of data the streaming data processing platform can handle can be adjusted by changing the number of data processing containers, which a container orchestrator can do while the Kafka streaming data processing platform is running. Based on a preset data volume adjustment file, the container orchestrator adjusts the number of data processing containers and the number of partitions of the source topic of the Kafka streaming data processing platform, which is located in the data production container. Producers in the data production container then write the acquired raw business data into different partitions of the source topic. Each data processing container acquires the business data from its corresponding partition, processes it to obtain the target business data, and stores the target business data in the target topic. Consumers in the data storage container acquire the target business data from the target topic and store it in a database, effectively improving data processing efficiency. The data processing scheme of this application was designed on the basis of this inventive concept.
[0054] The container orchestrator, data production container, data processing container, and data storage container in the data processing method of this application can run on different devices or on the same device; this application does not limit this, and the choice can be made according to actual circumstances. The following description takes the case in which all four run on the same device as an example.
[0055] In this application, the devices that run the container orchestrator, data production container, data processing container, and data storage container in the data processing method can be servers, computers, terminal devices, etc. This application does not limit them and can be selected according to the actual situation.
[0056] The application scenarios of the data processing method provided in the embodiments of this application will be described below.
[0057] For example, in this application scenario, in order to improve the efficiency of data processing, the Kafka Streaming Data Processing Platform needs to be containerized. The producers and source topics of the Kafka Streaming Data Processing Platform can be configured in the data production container; the Kafka Streaming Data Processing Library can be configured in the data processing container; and the consumers of the Kafka Streaming Data Processing Platform can be configured in the data storage container.
[0058] When the amount of data processed by the Kafka Stream Data Processing Platform needs to be adjusted, the platform does not need to be stopped. Staff can write the adjusted number of data processing containers into the preset data volume adjustment file. The container orchestrator then adjusts the number of data processing containers and the number of partitions of the source topic of the Kafka Stream Data Processing Platform, located in the data production container, according to the number of data processing containers in the preset data volume adjustment file, so that the number of partitions and the number of data processing containers are the same.
[0059] Subsequently, after obtaining the raw business data, the producer in the data production container writes it into different partitions of the source topic to obtain partition data. Each data processing container can then obtain the partition data in its corresponding partition, so that one data processing container processes the business data of one partition, which improves data processing efficiency.
[0060] After processing the business data according to the data processing methods in the Kafka Streaming Data Processing Library, the data processing container obtains the target business data and stores it in the target topic of the Kafka Streaming Data Processing Platform.
[0061] Consumers in the data storage container retrieve the target topic and store the target business data in the target topic into the database.
[0062] It should be noted that the above scenario is only an illustration of an application scenario provided by the embodiments of this application. The embodiments of this application do not limit the actual form of the various devices included in the scenario, nor do they limit the interaction method between devices. In the specific application of the solution, it can be set according to actual needs.
[0063] The technical solution of this application will now be described in detail through specific embodiments. It should be noted that the following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.
[0064] Figure 1 is a flowchart of an embodiment of the data processing method provided in this application. In this embodiment, the container orchestrator adjusts the number of partitions of the source topic and the number of data processing containers. After the producer obtains the raw business data, it stores the data in different partitions of the source topic; the data processing containers obtain the business data from their corresponding partitions and process it to obtain the target business data, which is then stored in the target topic. The method in this embodiment can be implemented in software, hardware, or a combination of both. As shown in Figure 1, the data processing method specifically includes the following steps:
[0065] S101: The container orchestrator adjusts the number of data processing containers and the number of partitions of the source topic of the Kafka Stream data processing platform, located in the data production container, so that the number of partitions is the same as the number of data processing containers.
[0066] When it is necessary to adjust the amount of data that the Kafka Stream Data Processing Platform can handle, there is no need to stop the Kafka Stream Data Processing Platform from running; the preset data volume adjustment file can be modified.
[0067] In this step, once the preset data volume adjustment file has been modified, the container orchestrator, while the Kafka Streaming Data Processing Platform is running, adjusts the number of partitions of the source topic in the data production container and the number of data processing containers according to the file, ensuring that the two numbers are the same. The data production container is configured with the producer and source topic of the Kafka Streaming Data Processing Platform, and the data processing container is configured with the Kafka Streaming Data Processing Library.
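The reconciliation in S101 can be sketched as a small planning function that compares what the orchestrator observes with what the adjustment file asks for. This is a minimal sketch under stated assumptions: the one-key JSON layout of the adjustment file, `ClusterState`, and `plan_actions` are hypothetical names, and actually applying the actions (scaling containers, changing the source topic's partition count) is left to whatever orchestrator and Kafka tooling are in use.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical file contents: the format of the preset data volume adjustment
# file is not specified, so a one-key JSON object is assumed here.
EXAMPLE_FILE = '{"data_processing_containers": 3}'


@dataclass
class ClusterState:
    """Hypothetical snapshot of what the orchestrator currently observes."""
    source_topic_partitions: int
    data_processing_containers: int


def read_target_count(file_text: str) -> int:
    """Parse the desired number of data processing containers from the file."""
    target = int(json.loads(file_text)["data_processing_containers"])
    if target < 1:
        raise ValueError("at least one data processing container is required")
    return target


def plan_actions(state: ClusterState, file_text: str) -> List[str]:
    """List the changes needed so that partitions == containers == target (S101)."""
    target = read_target_count(file_text)
    actions = []
    if state.data_processing_containers != target:
        actions.append(
            f"scale data processing containers "
            f"{state.data_processing_containers} -> {target}"
        )
    if state.source_topic_partitions != target:
        actions.append(
            f"set source topic partitions {state.source_topic_partitions} -> {target}"
        )
    return actions


if __name__ == "__main__":
    current = ClusterState(source_topic_partitions=2, data_processing_containers=2)
    for action in plan_actions(current, EXAMPLE_FILE):
        print(action)
```

Comparing the observed state with the desired state, rather than issuing a fixed command, keeps the step idempotent: running it again while the file is unchanged yields no actions.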
[0068] It should be noted that the preset data volume adjustment file can be modified in several ways. For example, the device running the container orchestrator can predict, from the amount of data already acquired, the amount of data the Kafka Stream data processing platform will need to process next, and then modify the data volume adjustment file according to a correspondence between data volume and the number of data processing containers. Alternatively, staff can modify the file manually. This application does not limit how the preset data volume adjustment file is modified; the method can be chosen according to the actual situation.
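The automatic option in [0068] (predict the next data volume, then map it to a container count) can be illustrated with a short sketch. The prediction model and the correspondence are not specified, so both are assumptions here: a moving average over the last few intervals stands in for the predictor, and a fixed, hypothetical per-container capacity (`RECORDS_PER_CONTAINER`) defines the volume-to-container correspondence.

```python
import json
import math
from typing import Sequence

# Hypothetical capacity: records per interval one data processing container can handle.
RECORDS_PER_CONTAINER = 10_000


def predict_next_volume(recent_volumes: Sequence[int], window: int = 3) -> float:
    """Assumed model: the next interval's volume is the mean of the last `window` intervals."""
    if not recent_volumes:
        raise ValueError("need at least one observed volume")
    tail = list(recent_volumes)[-window:]
    return sum(tail) / len(tail)


def containers_for(volume: float, per_container: int = RECORDS_PER_CONTAINER) -> int:
    """Map a predicted volume to a container count, never fewer than one."""
    return max(1, math.ceil(volume / per_container))


def render_adjustment_file(count: int) -> str:
    """Serialise the count in an assumed one-key JSON format."""
    return json.dumps({"data_processing_containers": count})


if __name__ == "__main__":
    history = [5_000, 12_000, 18_000, 27_000]
    predicted = predict_next_volume(history)
    print(render_adjustment_file(containers_for(predicted)))
```

With the history above, the mean of the last three intervals is 19,000 records, which at the assumed capacity rounds up to two containers. Any other predictor can replace `predict_next_volume` without changing the mapping step.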
[0069] S102: Producers in the Kafka Stream Data Processing Platform write the acquired raw business data into different partitions of the source topic to obtain partition data.
[0070] In this step, after the container orchestrator adjusts the number of partitions and the number of data processing containers, the producers in the data production containers will write the acquired raw business data into different partitions of the source topic, resulting in partitioned data. Because the amount of raw business data acquired by the producers is large, while the capacity of a single partition of the source topic is limited, it is necessary to write the raw business data into different partitions of the source topic.
[0071] It should be noted that the producer can obtain raw business data in two ways: either by sending a data acquisition request to the electronic device that generates the raw business data, and the electronic device then sending the raw business data back to the producer; or by the electronic device that generates the raw business data sending the raw business data to the producer on a timed or quantitative basis. This application does not limit the method by which the producer obtains raw business data; the method can be chosen according to the actual situation.
[0072] S103: Each data processing container obtains partition data for different partitions from the source topic.
[0073] In this step, after the producer writes the raw business data to different partitions of the source topic, to improve data processing efficiency, each data processing container needs to retrieve the partition data from the corresponding partition in the source topic. There is a correspondence between partitions and data processing containers; one partition corresponds to one data processing container. In this way, each data processing container can retrieve the partition data from its corresponding partition for subsequent processing.
[0074] S104: The data processing container processes the partitioned data according to the configured Kafka streaming data processing library to obtain the target business data, and stores the target business data into the target topic of the Kafka streaming data processing platform.
[0075] In this step, after the data processing container obtains the partitioned data, in order to meet the user's data processing needs, it can process the partitioned data according to the data processing methods in the Kafka Streaming Data Processing Library to obtain the target business data, and then store the target business data in the target topic of the Kafka Streaming Data Processing Platform.
[0076] It should be noted that the data processing method can be a data aggregation method, a data mapping method, a data filtering method, or a user-defined data processing method. Users can also set the parameters in the data processing method to meet different data processing needs. This application does not limit the data processing method; it can be selected according to the actual situation.
[0077] It should be noted that, after obtaining the partition data, the data processing container can also divide it into groups and process each group separately, which can further improve data processing efficiency.
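The processing methods named in [0076] and the grouping in [0077] can be pictured with plain-Python analogues. This is a sketch, not the Kafka Streams library: the Kafka Streams DSL offers operators such as `mapValues`, `filter`, `groupByKey`, and `aggregate`, and the functions below only mimic their effect on an in-memory list. The `(key, value)` record shape is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) business record


def map_values(records: List[Record], fn: Callable[[int], int]) -> List[Record]:
    """Data mapping: transform each value, keep the key."""
    return [(key, fn(value)) for key, value in records]


def filter_records(records: List[Record], pred: Callable[[str, int], bool]) -> List[Record]:
    """Data filtering: keep only records that satisfy the predicate."""
    return [(key, value) for key, value in records if pred(key, value)]


def group_by_key(records: List[Record]) -> Dict[str, List[int]]:
    """Grouping: collect values per key so each group can be processed separately."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def aggregate(groups: Dict[str, List[int]], fn: Callable[[List[int]], int]) -> Dict[str, int]:
    """Data aggregation: reduce each group to a single value."""
    return {key: fn(values) for key, values in groups.items()}


if __name__ == "__main__":
    partition = [("a", 5), ("b", -1), ("a", 7), ("b", 4)]
    doubled = map_values(partition, lambda v: v * 2)
    positive = filter_records(doubled, lambda k, v: v > 0)
    print(aggregate(group_by_key(positive), sum))  # {'a': 24, 'b': 8}
```

Each step takes and returns plain data, so a user-defined processing method is simply another function slotted into the same chain.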
[0078] The data processing method provided in this embodiment involves a container orchestrator adjusting the number of partitions and data processing containers. The producer stores the acquired raw business data into different partitions, and each data processing container processes the partition data in its corresponding partition. After obtaining the target business data, it is stored in the target topic. Compared to existing technologies that require stopping the streaming data processing platform and manually adjusting parameters to control the amount of data processed before data processing can resume, this application eliminates the need to stop the streaming data processing platform. The container orchestrator adjusts the number of data processing containers to control the amount of data processed, and multiple data processing containers are used to process the business data, effectively improving data processing efficiency.
[0079] Figure 2 is a flowchart of a second embodiment of the data processing method provided in this application. Building on the above embodiment, this embodiment describes a scenario in which the producer writes the raw business data into different partitions according to the hash value of the raw business data and the number of partitions, and the data processing container retrieves partition data according to the hash value of the partition data, the number of partitions, and the partition number corresponding to the container. As shown in Figure 2, the data processing method specifically includes the following steps:
[0080] S201: For each piece of raw business data obtained by the producer, the partition number is determined based on the hash value of the raw business data and the number of partitions of the source topic.
[0081] S202: Write the original business data into the partition corresponding to the partition number in the source topic to obtain the partition data.
[0082] In the steps described above, after the producer obtains the raw business data, for each piece of raw business data, the remainder when the hash value of the raw business data is divided by the number of partitions is calculated to determine the partition number corresponding to that raw business data. Then, the raw business data is written into the partition corresponding to the partition number in the source topic to obtain the partition data.
[0083] For example, suppose there are three partitions, numbered 0, 1, and 2. If the remainder of the hash value of a piece of raw business data divided by the number of partitions is 0, the data corresponds to partition 0 and is stored there; if the remainder is 1, it is stored in partition 1; and if the remainder is 2, it is stored in partition 2.
[0084] It should be noted that when the original business data is in key-value pair format, the hash value of the original business data can be either the hash value of the key or the hash value of the value. This application does not limit the method for determining the hash value of the original business data; the appropriate method can be chosen based on the actual situation.
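Steps S201 and S202, including the key-or-value choice in [0084], can be sketched as follows. The hash function is an assumption: CRC32 from the standard library is used as a deterministic stand-in (Python's built-in `hash()` is salted per process for strings, so it would not give stable partition numbers). For comparison, Kafka's own Java producer by default hashes the serialized key of keyed records with murmur2; this sketch does not reproduce that.

```python
import zlib
from typing import List, Optional, Tuple

Record = Tuple[Optional[str], str]  # hypothetical (key, value) business record


def stable_hash(text: str) -> int:
    """Deterministic, non-negative hash; CRC32 is a stand-in chosen for this sketch."""
    return zlib.crc32(text.encode("utf-8"))


def partition_for(record: Record, num_partitions: int, use_key: bool = True) -> int:
    """S201: partition number = hash value mod number of partitions.
    Hashes the key when use_key is True and a key is present, otherwise the value."""
    key, value = record
    basis = key if use_key and key is not None else value
    return stable_hash(basis) % num_partitions


def write_to_source_topic(records: List[Record], num_partitions: int,
                          use_key: bool = True) -> List[List[Record]]:
    """S202: append each record to the partition its partition number selects."""
    topic: List[List[Record]] = [[] for _ in range(num_partitions)]
    for record in records:
        topic[partition_for(record, num_partitions, use_key)].append(record)
    return topic


if __name__ == "__main__":
    raw = [("order-1", "42.0"), ("order-2", "13.5"), ("order-3", "7.25")]
    for number, partition in enumerate(write_to_source_topic(raw, 3)):
        print(number, partition)
```

Hashing the key sends every record with the same key to the same partition, which matters when later processing groups by key; hashing the value spreads records without that guarantee.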
[0085] S203: For each data processing container, the data processing container retrieves all partition data from the source topic.
[0086] In this step, after the producer writes the raw business data to different partitions, for each data processing container, in order for the data processing container to obtain the partition data in the corresponding partition, the data processing container first obtains all the partition data in the source topic.
[0087] S204: For each piece of partition data, determine whether the remainder of its hash value divided by the number of partitions equals the partition number corresponding to the data processing container. If it does, execute S205; if it does not, execute S206.
[0088] S205: Retain partition data.
[0089] S206: Delete the partition data.
[0090] In the above steps, after the data processing container obtains all partition data, in order to determine which partition data the data processing container should process, for each partition data, it is necessary to determine whether the remainder of the hash value of the partition data divided by the number of partitions is equal to the partition number corresponding to the data processing container.
[0091] If the remainder of the hash value of the partition data divided by the number of partitions is equal to the partition number corresponding to the data processing container, it means that the partition data belongs to the business data processed by the data processing container and is also the partition data in the corresponding partition of the data processing container, so it is retained.
[0092] If the remainder of the hash value of the partition data divided by the number of partitions is not equal to the partition number corresponding to the data processing container, it means that the partition data does not belong to the business data processed by the data processing container, nor is it the partition data in the corresponding partition of the data processing container, and it should be deleted.
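Steps S203 to S206 amount to a filter that one data processing container applies over the whole source topic. A minimal sketch, assuming the source topic is modelled as a list of partitions and that the hash of a piece of partition data is CRC32 of its key (`record_hash` is a hypothetical stand-in). "Delete" in S206 is interpreted here as discarding the record from the container's local working set, not removing it from the topic.

```python
import zlib
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # hypothetical (key, value) business record


def record_hash(record: Record) -> int:
    """Assumed hash of a piece of partition data: CRC32 of its key (stand-in choice)."""
    return zlib.crc32(record[0].encode("utf-8"))


def acquire_by_filter(source_topic: List[List[Record]], container_partition: int,
                      num_partitions: int,
                      hash_of: Callable[[Record], int] = record_hash) -> List[Record]:
    """S203-S206 for one data processing container."""
    retained: List[Record] = []
    for partition in source_topic:           # S203: read all partition data
        for record in partition:
            # S204: compare hash mod partition count with this container's number
            if hash_of(record) % num_partitions == container_partition:
                retained.append(record)      # S205: retain
            # S206: otherwise the record is simply not kept by this container
    return retained
```

As long as the producer and the containers use the same hash and the same partition count, each container ends up with exactly the records of its own partition. The cost is that every container reads the entire topic, which the partition-number approach in [0095] avoids.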
[0093] The data processing method provided in this embodiment involves the producer determining the partition where the original business data should be written based on the hash value and number of partitions of the original business data; the data processing container then retrieves the partition data based on the hash value, number of partitions, and partition number of the partition data. This ensures that each data processing container retrieves the partition data from its corresponding partition, improving the accuracy of the partition data retrieval by the data processing container.
[0094] The following describes how the data processing container provided in this application obtains partition data based on the partition number.
[0095] The producer may write the raw business data into different partitions according to the hash value of the data and the number of partitions, write it into partitions at random, or write it according to data volume so that each partition holds the same amount of data. In each case every partition has a partition number, so each data processing container can obtain the source topic and read the partition data from the partition whose number corresponds to that container.
[0096] It should be noted that the embodiments of this application do not limit the way the producer writes the original business data to different partitions, and can be selected according to the actual situation.
[0097] The data processing method provided in this embodiment allows the data processing container to obtain the source topic and then retrieve the partition data in the partition according to the corresponding partition number, which effectively improves the efficiency of data retrieval and data processing.
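Partition-number-based retrieval ([0095] to [0097]) can be sketched with an in-memory source topic, modeled as a mapping from partition number to records. This is a hypothetical model, not a Kafka client. It covers two of the write strategies listed in [0095]: random placement, and equal-volume placement (approximated here by round-robin). The function names are illustrative only.

```python
import random
from collections import defaultdict

def write_randomly(records: list[str], num_partitions: int, seed: int = 0) -> dict[int, list[str]]:
    # Random placement; a seeded RNG keeps the sketch reproducible.
    rng = random.Random(seed)
    topic: dict[int, list[str]] = defaultdict(list)
    for r in records:
        topic[rng.randrange(num_partitions)].append(r)
    return topic

def write_balanced(records: list[str], num_partitions: int) -> dict[int, list[str]]:
    # Equal-volume placement via round-robin: partition sizes differ by at most one.
    topic: dict[int, list[str]] = defaultdict(list)
    for i, r in enumerate(records):
        topic[i % num_partitions].append(r)
    return topic

def read_own_partition(source_topic: dict[int, list[str]], partition_number: int) -> list[str]:
    # A container reads only the partition whose number equals its own,
    # so no hash-based filtering step is needed.
    return list(source_topic.get(partition_number, []))

records = [f"r{i}" for i in range(9)]
topic = write_balanced(records, 3)
print([read_own_partition(topic, p) for p in range(3)])
```

Whichever write strategy is used, the containers together read every record exactly once. Each container also skips the records that belong to other containers instead of reading them and discarding them.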
[0098] The following describes how, in the embodiments of this application, a data storage container obtains the target topic and stores the target business data in the target topic into a database.
[0099] After the data processing container stores the target business data into the target topic, in order to improve the efficiency of data storage, the consumer of the Kafka Stream Data Processing Platform can be used to obtain the target topic and store the target business data in the target topic into the database. The consumer is configured in the data storage container.
[0100] Because business data is processed in data processing containers, a data storage container is likewise used to obtain the target topic. This keeps the process consistent and improves data storage efficiency.
[0101] In the data processing method provided in this embodiment, the consumer of the Kafka Stream data processing platform, configured in the data storage container, obtains the target topic and stores the target business data in the target topic into the database, which effectively improves the efficiency of data storage.
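The data storage container's role ([0099] to [0101]) can be sketched as a loop that reads records past its last consumed position and writes them to a database. Several parts of this sketch are assumptions. A Python list of `(key, value)` pairs stands in for the target topic, an integer plays the role of a committed consumer offset, and SQLite stands in for "the database". A real deployment would use a Kafka consumer client. The table and function names are hypothetical.

```python
import sqlite3

def store_new_records(target_topic: list[tuple[str, str]], conn: sqlite3.Connection, offset: int) -> int:
    # Hypothetical data storage container: take only the records after the
    # last consumed position, insert them, and return the new position
    # (analogous to committing a consumer offset after the write succeeds).
    new_records = target_topic[offset:]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target_business_data (record_key TEXT, record_value TEXT)"
    )
    conn.executemany("INSERT INTO target_business_data VALUES (?, ?)", new_records)
    conn.commit()
    return offset + len(new_records)

conn = sqlite3.connect(":memory:")
topic = [("k1", "A"), ("k2", "B")]
offset = store_new_records(topic, conn, 0)
topic.append(("k3", "C"))           # more target business data arrives
offset = store_new_records(topic, conn, offset)
print(offset)  # 3
```

The position advances only after the database commit. If the process fails between the two steps, records are re-read rather than lost, which corresponds to at-least-once delivery.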
[0102] The overall flow of the data processing method provided in the embodiments of this application is described below.
[0103] Figure 3 is a schematic diagram of the overall flow of the data processing method provided in this application. As shown in Figure 3, the container orchestrator adjusts the files according to the preset data volume, setting both the number of data processing containers and the number of partitions to three. The data production container is configured with the producer and the source topic of the Kafka Streaming Data Processing Platform. The data processing container is configured with the Kafka Streaming Data Processing Library. The data storage container is configured with the consumer of the Kafka Streaming Data Processing Platform.
[0104] The producer writes the acquired raw business data into different partitions of the source topic. Each data processing container then retrieves the source topic, processes the corresponding partition data from the source topic to obtain the target business data, and finally stores the target business data into the target topic. The data storage container can then retrieve the target topic, obtain the target business data, and store it in the database.
[0105] The data processing method provided in this embodiment allows the container orchestrator to adjust the number of data processing containers based on a preset data volume. Each data processing container processes the data, thereby adjusting the amount of data processed by the streaming data processing platform and effectively improving the efficiency of data processing.
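The overall flow of Figure 3 can be sketched end to end in one process. This is a simulation under stated assumptions, not the patented system. Hash-based production uses CRC32 as a stand-in hash, and the business logic is a hypothetical placeholder (uppercasing). The "containers" run sequentially in a loop, whereas real containers would run in parallel, and SQLite stands in for the database.

```python
import sqlite3
import zlib
from collections import defaultdict

def produce(raw_records: list[str], num_partitions: int) -> dict[int, list[str]]:
    # Producer in the data production container: hash-based partition assignment.
    source_topic: dict[int, list[str]] = defaultdict(list)
    for rec in raw_records:
        source_topic[zlib.crc32(rec.encode("utf-8")) % num_partitions].append(rec)
    return source_topic

def process(partition_data: list[str]) -> list[str]:
    # Hypothetical business logic standing in for the Kafka Streams processing.
    return [rec.upper() for rec in partition_data]

def run_pipeline(raw_records: list[str], num_containers: int = 3) -> sqlite3.Connection:
    # Orchestrator step: the partition count is set equal to the container count.
    num_partitions = num_containers
    source_topic = produce(raw_records, num_partitions)
    target_topic: list[str] = []
    # Each data processing container handles its own partition.
    for container_id in range(num_containers):
        target_topic.extend(process(source_topic.get(container_id, [])))
    # Data storage container: consume the target topic into a database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target_business_data (value TEXT)")
    conn.executemany("INSERT INTO target_business_data VALUES (?)", [(v,) for v in target_topic])
    conn.commit()
    return conn

conn = run_pipeline(["alpha", "beta", "gamma", "delta"])
print(sorted(row[0] for row in conn.execute("SELECT value FROM target_business_data")))
# ['ALPHA', 'BETA', 'DELTA', 'GAMMA']
```

Calling `run_pipeline` with a different `num_containers` changes how the work is divided, not what the database ends up containing. This mirrors the claim that capacity can be resized by changing only the container and partition counts.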
[0106] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.
[0107] Figure 4 is a schematic diagram of the structure of an embodiment of the data processing apparatus provided in this application. As shown in Figure 4, the data processing apparatus 40 includes:
[0108] The orchestration module 41 is used to adjust the number of data processing containers and the number of partitions of the source topic in the Kafka Stream data processing platform in the data production container through the container orchestrator, and to make the number of partitions the same as the number of data processing containers.
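The orchestration module's task, deriving one count and applying it to both the containers and the source-topic partitions, can be sketched as a small planning function. The patent does not say how the container count is derived from the preset data volume. The ceiling-division rule, the per-second units, and the names `plan_scaling` and `ScalingPlan` are all hypothetical. The sketch does encode one real Kafka constraint: an existing topic's partition count can be increased but not decreased. Scaling down to fewer partitions therefore means creating a new topic, which the plan flags.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ScalingPlan:
    containers: int
    partitions: int
    requires_new_topic: bool  # True when Kafka could not shrink the existing topic

def plan_scaling(preset_volume_per_s: float, capacity_per_container_per_s: float,
                 current_partitions: int) -> ScalingPlan:
    # Hypothetical sizing rule: enough containers to cover the preset volume,
    # with at least one container.
    if preset_volume_per_s < 0 or capacity_per_container_per_s <= 0:
        raise ValueError("volume must be >= 0 and capacity must be > 0")
    containers = max(1, math.ceil(preset_volume_per_s / capacity_per_container_per_s))
    # The partition count always matches the container count, as in [0108].
    return ScalingPlan(
        containers=containers,
        partitions=containers,
        requires_new_topic=containers < current_partitions,
    )

print(plan_scaling(45_000, 10_000, current_partitions=3))
```

When partitions are added to an existing topic, hash-mod assignment sends some records to different partitions than before. Records already written stay where they are, so consumers that depend on a stable mapping should expect this change.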
[0109] Storage module 42 is used to write the acquired raw business data into different partitions of the source topic through the producer in the Kafka stream data processing platform to obtain partition data;
[0110] The acquisition module 43 is used to acquire partition data of different partitions from the source topic through each of the data processing containers;
[0111] The processing module 44 is used to process the partition data through the data processing container according to the configured Kafka streaming data processing library to obtain target business data, and store the target business data into the target topic of the Kafka streaming data processing platform.
[0112] Furthermore, the acquisition module 43 is specifically used for:
[0113] For each data processing container, the data processing container acquires all partition data in the source topic;
[0114] For each partition data entry, if the remainder of the hash value of the partition data divided by the number of partitions is not equal to the partition number corresponding to the data processing container, then the partition data is deleted.
[0115] Furthermore, the acquisition module 43 is specifically used for:
[0116] For each data processing container, the data processing container obtains the partition data in the partition corresponding to the partition number.
[0117] Furthermore, the storage module 42 is specifically used for:
[0118] For each piece of raw business data obtained by the producer, the partition number is determined based on the hash value of the raw business data and the number of partitions;
[0119] The original business data is written into the partition corresponding to the partition number in the source topic to obtain the partition data.
[0120] Furthermore, the storage module 42 is also used for:
[0121] The target topic is obtained by the consumer of the Kafka Stream data processing platform, and the target business data in the target topic is stored in the database. The consumer is configured in the data storage container.
[0122] The data processing device provided in this embodiment is used to execute the technical solutions in any of the foregoing method embodiments. Its implementation principle and technical effect are similar, and will not be described again here.
[0123] Figure 5 is a schematic diagram of the structure of an electronic device provided in this application. As shown in Figure 5, the electronic device 50 includes:
[0124] Processor 51, memory 52, and communication interface 53;
[0125] The memory 52 is used to store the executable instructions of the processor 51;
[0126] The processor 51 is configured to execute the technical solution of the electronic device in any of the foregoing method embodiments by executing the executable instructions.
[0127] Optionally, the memory 52 can be either standalone or integrated with the processor 51.
[0128] Optionally, when the memory 52 is a device independent of the processor 51, the electronic device 50 may further include:
[0129] a bus 54. The memory 52 and the communication interface 53 are connected to the processor 51 through the bus 54 and communicate with one another. The communication interface 53 is used to communicate with other devices.
[0130] Optionally, the communication interface 53 can be implemented using a transceiver. The communication interface is used to enable communication between the database access device and other devices (e.g., clients, read-write databases, and read-only databases). The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk drive.
[0131] Bus 54 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, only one thick line is used in the diagram, but this does not indicate that there is only one bus or one type of bus.
[0132] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0133] The electronic device is used to execute the technical solutions in any of the foregoing method embodiments. Its implementation principle and technical effect are similar, and will not be described again here.
[0134] This application also provides a readable storage medium storing a computer program thereon, which, when executed by a processor, implements the technical solutions provided in any of the foregoing method embodiments.
[0135] This application also provides a computer program product, including a computer program, which, when executed by a processor, is used to implement the technical solutions provided in any of the foregoing method embodiments.
[0136] Those skilled in the art will understand that all or part of the steps of the above-described method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above-described method embodiments; and the aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
[0137] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.
Claims
1. A data processing method, characterized in that it comprises: the container orchestrator adjusts the number of data processing containers and the number of partitions of the source topic of the Kafka Stream data processing platform in the data production container, and makes the number of partitions the same as the number of data processing containers; the producer of the Kafka Stream data processing platform writes the acquired raw business data into different partitions of the source topic to obtain partition data; each of the data processing containers obtains partition data from different partitions of the source topic; the data processing container processes the partition data according to the configured Kafka streaming data processing library to obtain target business data, and stores the target business data into the target topic of the Kafka streaming data processing platform; wherein each of the data processing containers obtaining partition data from different partitions of the source topic comprises: for each data processing container, the data processing container acquires all partition data in the source topic; for each piece of partition data, if the remainder of the hash value of the partition data divided by the number of partitions is equal to the partition number corresponding to the data processing container, the partition data is retained, and if the remainder is not equal to that partition number, the partition data is deleted; for each data processing container, the data processing container obtains the partition data in the partition corresponding to its partition number.
2. The method according to claim 1, characterized in that the producer of the Kafka Stream data processing platform writing the acquired raw business data into different partitions of the source topic to obtain partition data comprises: for each piece of raw business data obtained by the producer, determining the partition number based on the hash value of the raw business data and the number of partitions; and writing the raw business data into the partition corresponding to that partition number in the source topic to obtain the partition data.
3. The method according to claim 2, characterized in that the method further comprises: the consumer of the Kafka Stream data processing platform acquires the target topic and stores the target business data in the target topic into the database, wherein the consumer is configured in the data storage container.
4. A data processing apparatus, characterized in that it comprises: an orchestration module, configured to adjust, through a container orchestrator, the number of data processing containers and the number of partitions of the source topic of the Kafka Stream data processing platform in the data production container, and to make the number of partitions the same as the number of data processing containers; a storage module, configured to write the raw business data obtained by the producer of the Kafka Stream data processing platform into different partitions of the source topic to obtain partition data; an acquisition module, configured to acquire, through each of the data processing containers, partition data of different partitions from the source topic; and a processing module, configured to process the partition data through the data processing container according to the configured Kafka streaming data processing library to obtain target business data, and to store the target business data into the target topic of the Kafka streaming data processing platform; wherein the acquisition module is specifically configured to: for each data processing container, acquire all partition data in the source topic; for each piece of partition data, retain the partition data if the remainder of the hash value of the partition data divided by the number of partitions is equal to the partition number corresponding to the data processing container, and delete the partition data if the remainder is not equal to that partition number; for each data processing container, acquire, according to the partition number corresponding to the data processing container, the partition data in the partition corresponding to that partition number.
5. The apparatus according to claim 4, characterized in that the storage module is specifically configured to: for each piece of raw business data obtained by the producer, determine the partition number based on the hash value of the raw business data and the number of partitions; and write the raw business data into the partition corresponding to that partition number in the source topic to obtain the partition data.
6. The apparatus according to claim 5, characterized in that the storage module is further configured to: obtain the target topic through the consumer of the Kafka Stream data processing platform, and store the target business data in the target topic into the database, wherein the consumer is configured in the data storage container.
7. An electronic device, characterized in that it comprises: a processor, a memory, and a communication interface; the memory is used to store executable instructions of the processor; and the processor is configured to execute the data processing method according to any one of claims 1 to 3 by executing the executable instructions.
8. A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the data processing method according to any one of claims 1 to 3.
9. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the data processing method according to any one of claims 1 to 3.