Method and device for writing large data volumes to Doris

By using Spark stream processing components and Doris's HTTP interface in a vehicle-to-everything (V2X) big data analytics platform, combined with Kafka message queues, a high-performance data writing mode was designed. This solved the problem of real-time writing of large amounts of data in Doris, achieving low-latency and high-performance data processing.

CN122309594A · Pending · Publication Date: 2026-06-30 · CHERY AUTOMOBILE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHERY AUTOMOBILE CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Traditional Doris writing methods suffer from poor real-time write performance and high latency in vehicle-to-everything (V2X) big data analytics platforms, failing to meet real-time processing requirements within one second.

Method used

The consumption batch and data volume of the target message queue are determined, and the Spark stream processing component consumes the queue data, which is then encapsulated into a resilient distributed dataset (RDD). Doris message header information is configured, and the HTTP interface is called to assemble the data and write it to Doris using the StreamLoad method.

Benefits of technology

It achieves low-latency and high-performance data writing, and can efficiently import hundreds of millions of real-time data generated every day into Doris, meeting the real-time writing requirements of the vehicle-to-everything (V2X) big data analysis platform.



Abstract

This application relates to a method and apparatus for writing large amounts of data to Doris. The method includes: determining the consumption batch and consumption data volume of a target message queue; consuming queue data of the target message queue using a preset Spark stream processing component based on the consumption batch and consumption data volume to obtain corresponding consumption data; encapsulating the consumption data into a resilient distributed dataset (RDD) of the Spark framework; assembling the distributed data in the RDD into HTTP request data strings; configuring preset Doris message header information; calling the Doris HTTP interface to assemble the message header information and HTTP request data strings to obtain corresponding assembled set data; and writing the assembled set data into the corresponding Doris using a preset StreamLoad method. This solves the problem that existing vehicle network big data analysis platforms have difficulty writing large amounts of data to Doris in real time.

Description

Technical Field

[0001] This application relates to the field of intelligent analysis platform technology for vehicle networking, and in particular to a method and apparatus for writing large amounts of data to Doris.

Background Technology

[0002] Traditional methods for writing to Doris mainly include JDBC and StreamLoad. JDBC has poor performance, while StreamLoad performs better. However, StreamLoad typically imports data from CSV files or through manual operation. Furthermore, Spark itself processes real-time data in micro-batch mode rather than row by row. As a result, the entire processing flow has second-level latency (2-5 seconds), which cannot meet the requirement of real-time processing within 1 second and urgently needs to be addressed.

Summary of the Invention

[0003] This application provides a method and apparatus for writing large amounts of data to Doris, to solve the problem that existing vehicle network big data analysis platforms have difficulty writing large amounts of data to Doris in real time.

[0004] The first aspect of this application provides a method for writing large amounts of data to Doris, including the following steps: determining the consumption batch and consumption data volume of a target message queue, and based on the consumption batch and consumption data volume, consuming the queue data of the target message queue through a preset Spark stream processing component to obtain corresponding consumption data; encapsulating the consumption data into an elastic distributed data set of the Spark framework, and assembling the distributed data in the elastic distributed data set into HTTP request data strings respectively; configuring preset Doris message header information, and calling the Doris HTTP interface to assemble the message header information and the HTTP request data string to obtain corresponding assembled set data, and writing the assembled set data into the corresponding Doris through a preset StreamLoad method.

[0005] Optionally, in one embodiment of this application, determining the consumption batch and consumption data volume of the target message queue, and consuming the queue data of the target message queue through a preset Spark stream processing component based on the consumption batch and the consumption data volume to obtain the corresponding consumption data, includes: obtaining the vehicle network data of the current vehicle, and reporting the vehicle network data to the target message queue through an onboard TSP, and obtaining the data volume information of the vehicle network data in the target message queue to determine the consumption batch and the consumption data volume based on the data volume information; consuming the queue data in the target message queue based on the Spark stream processing component, the consumption batch, and the consumption data volume to obtain the consumption data.

[0006] Optionally, in one embodiment of this application, assembling the distributed data in the elastic distributed data set into HTTP request data strings includes: converting the consumed data into an elastic distributed data set, parsing the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and converting and assembling the parsed data corresponding to all partition data to generate the HTTP request data string corresponding to the elastic distributed data set.

[0007] Optionally, in one embodiment of this application, configuring preset Doris message header information and calling the Doris HTTP interface to assemble the message header information and the HTTP request data string to obtain the corresponding assembled set data includes: configuring message header information corresponding to each Doris in the current Doris cluster based on the Doris database address and authentication requirements of the current Doris cluster, wherein the message header information includes the HTTP protocol request to be requested and the authentication information to be requested; and calling the Doris HTTP interface to assemble the Doris address, the HTTP protocol request to be requested, the authentication information to be requested, and the HTTP request data string to generate the assembled set data.

[0008] A second aspect of this application provides a large-scale Doris writing device, comprising: a consumption module, configured to determine the consumption batch and consumption data volume of a target message queue, and based on the consumption batch and consumption data volume, consume queue data of the target message queue through a preset Spark stream processing component to obtain corresponding consumption data; an encapsulation module, configured to encapsulate the consumption data into an elastic distributed data set of the Spark framework, and assemble the distributed data in the elastic distributed data set into HTTP request data strings respectively; and a writing module, configured to configure preset Doris message header information, and call the Doris HTTP interface to assemble the message header information and the HTTP request data string to obtain corresponding assembled set data, and write the assembled set data into the corresponding Doris through a preset StreamLoad method.

[0009] Optionally, in one embodiment of this application, the consumption module includes: a first acquisition unit, configured to acquire vehicle network data of the current vehicle, and report the vehicle network data to the target message queue via the vehicle-mounted TSP, and acquire data volume information of the vehicle network data in the target message queue, so as to determine the consumption batch and the consumption data volume based on the data volume information; and a second acquisition unit, configured to consume queue data in the target message queue based on the Spark stream processing component, the consumption batch, and the consumption data volume, so as to acquire the consumption data.

[0010] Optionally, in one embodiment of this application, the encapsulation module includes: a conversion unit, configured to convert the consumed data into an elastic distributed data set, and to perform a parsing operation on the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and to perform conversion and assembly operations on the parsed data corresponding to all partition data to generate an HTTP request data string corresponding to the elastic distributed data set.

[0011] Optionally, in one embodiment of this application, the writing module includes: a configuration unit, configured to configure message header information corresponding to each Doris in the current Doris cluster based on the Doris database address and authentication requirements of the current Doris cluster, wherein the message header information includes a pending HTTP protocol request and pending authentication information; and a generation unit, configured to call the Doris HTTP interface to concatenate the Doris address, the pending HTTP protocol request, the pending authentication information, and the HTTP request data string to generate the assembled set data.

[0012] A third aspect of this application provides a vehicle, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the large data volume Doris writing method as described in the above embodiments.

[0013] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described large data volume Doris writing method.

[0014] A fifth aspect of this application provides a computer program product, including a computer program that is executed to implement the above-described large data volume Doris writing method.

[0015] Therefore, the embodiments of this application have the following beneficial effects: The embodiments of this application determine the consumption batch and consumption data volume of the target message queue, and based on the consumption batch and consumption data volume, consume the queue data of the target message queue through a preset Spark stream processing component to obtain the corresponding consumption data; encapsulate the consumption data into an elastic distributed data set of the Spark framework, and assemble the distributed data in the elastic distributed data set into HTTP request data strings; configure preset Doris message header information, and call the Doris HTTP interface to assemble the message header information and HTTP request data strings to obtain the corresponding assembled set data, and write the assembled set data into the corresponding Doris through a preset StreamLoad method. This application, based on the Spark Streaming framework combined with the Kafka message queue, designs a high-performance data writing mode to import hundreds of millions of real-time data generated daily into Doris, thereby providing users with low-latency and high-performance data. This solves the problem of difficulty in real-time writing of large amounts of data to Doris in existing vehicle network big data analysis platforms.

[0016] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Attached Figure Description

[0017] The above and/or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein: Figure 1 is a flowchart of a large data volume Doris writing method according to an embodiment of this application; Figure 2 is a schematic diagram of the Spark Streaming write process to Doris provided in one embodiment of this application; Figure 3 is a schematic diagram of the process of writing distributed data by calling the HTTP interface of Doris, as provided in one embodiment of this application; Figure 4 is a schematic diagram of the actual operation of Spark Streaming writing to Doris in a project, as provided in one embodiment of this application; Figure 5 is an example diagram of a large data volume Doris writing device according to an embodiment of this application; Figure 6 is a schematic diagram of the vehicle structure provided in an embodiment of this application.

[0018] In the figures, 10 is a large data volume Doris writing device; 100 is a consumption module; 200 is an encapsulation module; 300 is a writing module; 601 is a memory; 602 is a processor; and 603 is a communication interface.

Detailed Implementation

[0019] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.

[0020] The following describes a method and apparatus for writing large amounts of data to Doris, based on embodiments of this application, with reference to the accompanying drawings. Addressing the problems mentioned in the background section, this application provides a method for writing large amounts of data to Doris. In this method, the consumption batch and consumption data volume of a target message queue are determined. Based on the consumption batch and consumption data volume, queue data of the target message queue is consumed using a preset Spark stream processing component to obtain corresponding consumption data. The consumption data is encapsulated into an elastic distributed data set of the Spark framework, and the distributed data in the elastic distributed data set are assembled into HTTP request data strings. Preset Doris message header information is configured, and the Doris HTTP interface is called to assemble the message header information and HTTP request data strings to obtain corresponding assembled set data. The assembled set data is then written to the corresponding Doris instance using a preset StreamLoad method. This application, based on the Spark Streaming framework combined with the Kafka message queue, designs a high-performance data writing mode to import hundreds of millions of real-time data generated daily into Doris, thereby providing users with low-latency and high-performance data. This solves the problem of difficulty in real-time writing large amounts of data to Doris in existing vehicle network big data analysis platforms.

[0021] Specifically, Figure 1 is a flowchart of a method for writing large amounts of data to Doris, as provided in an embodiment of this application.

[0022] As shown in Figure 1, this Doris writing method for large data volumes includes the following steps: In step S101, the consumption batch and consumption data volume of the target message queue are determined, and based on the consumption batch and consumption data volume, the queue data of the target message queue is consumed through the preset Spark stream processing component to obtain the corresponding consumption data.

[0023] In this embodiment, the consumption batch and single consumption data volume of the target message queue can be determined first. Based on the batch and data volume configuration, the preset Spark stream processing component is called to perform streaming consumption of the queue data to output the corresponding structured consumption data.

[0024] Therefore, the embodiments of this application improve the efficiency and controllability of data processing by setting the consumption batch and data volume of the message queue and combining it with the Spark stream processing component to achieve precise consumption of queue data.

[0025] Optionally, in one embodiment of this application, the consumption batch and consumption data volume of the target message queue are determined, and based on the consumption batch and consumption data volume, the queue data of the target message queue is consumed through a preset Spark stream processing component to obtain the corresponding consumption data. This includes: obtaining the vehicle network data of the current vehicle, and reporting the vehicle network data to the target message queue through the vehicle-mounted TSP, and obtaining the data volume information of the vehicle network data in the target message queue to determine the consumption batch and consumption data volume based on the data volume information; consuming the queue data in the target message queue based on the Spark stream processing component, the consumption batch, and the consumption data volume to obtain the consumption data.

[0026] It should be noted that, in the embodiments of this application, the reporting of data from the vehicle to Kafka is completed automatically by the Tbox. Kafka is an open-source distributed stream processing platform maintained by the Apache Software Foundation, mainly used to build real-time data pipelines and streaming applications. Its core design goals are high throughput, low latency, and support for data persistence and horizontal scaling. A topic in Kafka is similar to a table in a database or a queue in a message queue: producers send data to a topic, and consumers read data from a topic. A topic can have multiple consumers and multiple producers, forming independent message pipelines (message queues).

[0027] Currently, the data volume can reach 20,000+ records per second, and over 1 billion records per day. Data from Kafka is consumed by leveraging the performance of the Spark Streaming distributed framework. The entire Spark Streaming system acts as a consumer group, and the consumption frequency and number of data entries can be set according to the actual data volume. Corresponding machine parameters also need to be configured: since the consumed data is initially stored in memory, more machines, more memory, and more CPU cores give stronger processing power, allowing more data to be processed within a given time. Currently, with 4 servers each configured with 4 cores and 8GB of RAM, the system can consume 20,000+ data entries per second.

[0028] Therefore, the embodiments of this application automatically report vehicle data to the Kafka platform through Tbox. Relying on the high throughput characteristics of the Spark Streaming distributed framework, the consumption frequency, number of data entries and machine parameters can be flexibly configured to achieve efficient and stable processing of more than 20,000 vehicle data entries per second and more than 1 billion vehicle data entries per day.
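As a rough check on the figures above, the following Python sketch works through the arithmetic that links the per-second rate, the daily volume, and the per-batch record count. The batch interval, the Kafka partition count, and the headroom factor are illustrative assumptions that do not come from the application. The result of `max_rate_per_partition` is the kind of value one might consider for Spark Streaming's `spark.streaming.kafka.maxRatePerPartition` setting, which caps records per second per Kafka partition.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(records_per_second: int) -> int:
    """Records produced in one day at a constant per-second rate."""
    return records_per_second * SECONDS_PER_DAY


def records_per_batch(records_per_second: int, batch_interval_s: float) -> int:
    """Records one micro-batch must absorb for a given batch interval."""
    return math.ceil(records_per_second * batch_interval_s)


def max_rate_per_partition(records_per_second: int,
                           kafka_partitions: int,
                           headroom: float = 1.5) -> int:
    """Per-partition rate cap with some headroom for bursts (assumed factor)."""
    return math.ceil(records_per_second * headroom / kafka_partitions)


if __name__ == "__main__":
    rate = 20_000            # records per second, from the text
    interval = 0.5           # seconds; assumed batch interval
    partitions = 8           # assumed Kafka partition count
    print(daily_volume(rate))
    print(records_per_batch(rate, interval))
    print(max_rate_per_partition(rate, partitions))
```

At 20,000 records per second the daily total is 1,728,000,000, which is consistent with the "over 1 billion per day" figure quoted in the text.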

[0029] In step S102, the consumed data is encapsulated into a resilient distributed dataset (RDD) of the Spark framework, and the distributed data in the RDD is assembled into HTTP request data strings.

[0030] Furthermore, in this embodiment of the application, the data obtained from streaming consumption can be encapsulated into a resilient distributed dataset of the Spark framework, and then the distributed data shards in the dataset can be structured and assembled into HTTP request data strings that conform to the interface specification.

[0031] Therefore, the embodiments of this application encapsulate the consumed data into a resilient distributed dataset and assemble it into HTTP request data strings suited to network transmission, leveraging Spark's distributed characteristics to improve data processing efficiency.

[0032] Optionally, in one embodiment of this application, the distributed data in the elastic distributed data set is assembled into an HTTP request data string, including: converting the consumed data into an elastic distributed data set, parsing the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and converting and assembling the parsed data corresponding to all partition data to generate an HTTP request data string corresponding to the elastic distributed data set.

[0033] In actual execution, after acquiring the corresponding consumption data, the embodiments of this application can convert all the consumption data into a distributed RDD (Resilient Distributed Dataset), which is the core and most basic data abstraction of Spark. It can be understood as an immutable, partitionable collection of elements that can be computed in parallel, and it is distributed and stored on different cluster nodes.

[0034] The name RDD reflects three core characteristics. Resilient refers to fault tolerance: RDDs can automatically recover from node failures. This is achieved by recording lineage rather than by replicating data, which makes recovery highly efficient. Distributed means that the data is partitioned and stored in the memory of multiple machines, enabling parallel computation. Dataset means a collection of data, which can be read from HDFS or the local file system, or transformed directly from an in-memory collection. RDD data is not directly readable and must be accessed through Spark's API, which allows for operations such as cleaning and transformation. RDD data is divided into multiple partitions, which are distributed across different nodes in the cluster. Partitions are the basic unit of parallel computation in Spark, and each partition is processed by one computation task. Computation tasks can be distributed across one or more servers.

[0035] Subsequently, embodiments of this application can assemble data from a distributed RDD dataset: 1. Parse the data in each partition of the distributed RDD dataset.

[0036] 2. For each partition in each RDD, parse the data entries one by one, appending a comma to the end of each entry. After all entries have been concatenated, enclose the concatenated entries in square brackets to ensure that all data in the current RDD is correctly converted into an array.

[0037] The above data assembly operations can be completed on a big data cluster, running simultaneously on multiple machines and transforming data from multiple RDDs and partitions at the same time. For example, there are five distributed datasets: RDD1, RDD2, RDD3, RDD4, and RDD5. Each RDD contains a number of data entries (determined by the actual amount of data consumed and the frequency of data retrieval). RDD1 contains the data (1,2,3), RDD2 contains (4,5,6), RDD3 contains (7,8,9), RDD4 contains (10,11,12), and RDD5 contains (13,14,15).

[0038] Furthermore, in this embodiment of the application, the data of each partition in each RDD can be processed by the program to further transform and assemble the entire data into a usable array (i.e., an HTTP request data string), as shown in the following formula:

[0039] Therefore, the embodiments of this application transform consumer data into a fault-tolerant distributed RDD dataset, and rely on the multi-node parallel parsing of each partition's data and standardized assembly into an array-formatted request data string, thereby fully leveraging Spark's distributed parallel computing power to significantly improve the efficiency of massive data conversion and assembly, while taking into account both the stability and efficiency of data processing.
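The assembly step described above can be sketched in plain Python, without a Spark cluster. This is a minimal illustration and not the applicant's implementation: the function name `assemble_array` is hypothetical, and dropping the final comma before adding the brackets is an assumption made so that the result is valid JSON. The sample data reuses the RDD1 to RDD5 example from the text.

```python
import json
from typing import Any, Dict, Iterable, List


def assemble_array(entries: Iterable[Any]) -> str:
    """Serialize each entry, append a comma after it, then wrap in brackets.

    The trailing comma is dropped before bracketing (an assumption, needed
    so that the string parses as a JSON array).
    """
    parts: List[str] = []
    for entry in entries:
        parts.append(json.dumps(entry, separators=(",", ":")) + ",")
    body = "".join(parts)
    if body.endswith(","):
        body = body[:-1]
    return "[" + body + "]"


if __name__ == "__main__":
    # Example data from the text: five RDDs with three entries each.
    rdds: Dict[str, List[int]] = {
        "RDD1": [1, 2, 3],
        "RDD2": [4, 5, 6],
        "RDD3": [7, 8, 9],
        "RDD4": [10, 11, 12],
        "RDD5": [13, 14, 15],
    }
    for name, entries in rdds.items():
        print(name, assemble_array(entries))
```

In a real job the same function would run inside each Spark task, so that every partition produces its own request body in parallel.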

[0040] In step S103, the preset Doris message header information is configured, and the Doris HTTP interface is called to assemble the message header information and HTTP request data string to obtain the corresponding assembled set data. The assembled set data is then written to the corresponding Doris using the preset StreamLoad method.

[0041] After that, as shown in Figure 2, this embodiment of the application can configure the preset message header information corresponding to Doris, call the HTTP interface of Doris, and concatenate the message header information with the assembled HTTP request data string to generate the corresponding assembled set data; subsequently, this embodiment of the application can use the StreamLoad loading method to write the assembled set data into the target Doris database in batches.

[0042] Therefore, the embodiments of this application generate a compliant data set by configuring the Doris message header and concatenating the request data string, and write it to Doris in batches using the StreamLoad method, thereby achieving efficient storage of massive amounts of data, improving the stability and throughput efficiency of data writing, and adapting to the needs of big data batch storage scenarios.

[0043] Optionally, in one embodiment of this application, pre-defined Doris message header information is configured, and the Doris HTTP interface is called to assemble the message header information and the HTTP request data string to obtain the corresponding assembled set data. This includes: configuring the message header information corresponding to each Doris in the current Doris cluster based on the Doris database address and authentication requirements of the current Doris cluster, wherein the message header information includes the HTTP protocol request to be requested and the authentication information to be requested; and calling the Doris HTTP interface to assemble the Doris address, the HTTP protocol request to be requested, the authentication information to be requested, and the HTTP request data string to generate the assembled set data.

[0044] As one possible approach, as shown in Figure 3, this embodiment first requires configuring the message header information for writing to Doris. By default, the values of three parameters are configured first: (Expect:100-continue, format:json, strip_outer_array:true). Based on the Doris database address and authentication requirements of the current cluster, the required HTTP protocol request and authentication information (Doris username and password) are configured. The Doris request address is then called; it is built by concatenating the Doris address + "/api/" + Doris database name + "/" + Doris table name + "/_stream_load". The assembled set data (i.e., the HTTP request data string) is passed as the message body, completing the StreamLoad call and importing the data into Doris. This step is also carried out on a distributed cluster, with multiple nodes and multiple tasks running simultaneously to import data into the Doris cluster. Furthermore, the actual operation of Spark Streaming writing to Doris in the project is shown in Figure 4.
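The header configuration and address concatenation described above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the host, port, database, table, and credentials are placeholders; HTTP Basic authentication and the PUT method are assumed to be what the cluster expects; and the request is only built, not sent.

```python
import base64
from typing import Dict
from urllib.request import Request


def stream_load_url(doris_address: str, database: str, table: str) -> str:
    """Concatenate address + "/api/" + database + "/" + table + "/_stream_load"."""
    return doris_address.rstrip("/") + "/api/" + database + "/" + table + "/_stream_load"


def stream_load_headers(user: str, password: str) -> Dict[str, str]:
    """The three default headers from the text plus HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {
        "Expect": "100-continue",
        "format": "json",
        "strip_outer_array": "true",
        "Authorization": "Basic " + token,  # assumed authentication scheme
    }


def build_request(doris_address: str, database: str, table: str,
                  user: str, password: str, payload: str) -> Request:
    """Build (but do not send) a request whose body is the assembled array."""
    return Request(
        url=stream_load_url(doris_address, database, table),
        data=payload.encode("utf-8"),
        headers=stream_load_headers(user, password),
        method="PUT",  # assumed HTTP method
    )


if __name__ == "__main__":
    # All values below are placeholders, not taken from the application.
    req = build_request("http://doris-fe.example:8030", "v2x", "vehicle_signal",
                        "user", "secret", '[{"vin":"A","speed":1}]')
    print(req.get_method(), req.full_url)
```

Keeping URL and header construction in small pure functions makes them easy to check in isolation before any data is actually sent to a cluster.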

[0045] The large-volume Doris writing method proposed in this application consumes data from the message queue Kafka in real time using Spark Streaming; sets the consumption batch and data volume according to the data volume and real-time requirements; encapsulates the consumed data into Spark RDDs (distributed datasets); utilizes the distributed capabilities of the big data platform, encapsulates the RDDs into HTTP message bodies to call the request interface; and calls the Doris HTTP interface to write the distributed data to Doris using the StreamLoad method. This application consumes data from the message queue Kafka in real time using Spark Streaming, utilizes the distributed framework and distributed computing capabilities of the big data platform, encapsulates the real-time reported vehicle network data into a distributed dataset, synchronously calls the Doris HTTP interface, and writes hundreds of millions of data points to Doris using the StreamLoad method, thereby providing automakers with instant query and analysis services. This solves the problem of real-time writing of large volumes of data to Doris in vehicle network big data analysis platforms.

[0046] Secondly, a large data volume Doris writing device according to an embodiment of this application is described with reference to the accompanying drawings.

[0047] Figure 5 is a block diagram of a large data volume Doris writing device according to an embodiment of this application.

[0048] As shown in Figure 5, the large data volume Doris writing device 10 includes: a consumption module 100, an encapsulation module 200, and a writing module 300.

[0049] The consumption module 100 is used to determine the consumption batch and consumption data volume of the target message queue, and based on the consumption batch and consumption data volume, consumes the queue data of the target message queue through a preset Spark stream processing component to obtain the corresponding consumption data.

[0050] The encapsulation module 200 is used to encapsulate the consumed data into a resilient distributed dataset (RDD) of the Spark framework, and to assemble the distributed data in the RDD into HTTP request data strings.

[0051] The writing module 300 is used to configure the preset Doris message header information, and call the Doris HTTP interface to assemble the message header information and HTTP request data string to obtain the corresponding assembled set data, and write the assembled set data into the corresponding Doris through the preset StreamLoad method.

[0052] Optionally, in one embodiment of this application, the consumption module 100 includes: a first acquisition unit and a second acquisition unit.

[0053] The first acquisition unit is used to acquire the vehicle network data of the current vehicle, and report the vehicle network data to the target message queue through the vehicle-mounted TSP. It also acquires the data volume information of the vehicle network data in the target message queue, so as to determine the consumption batch and consumption data volume based on the data volume information.

[0054] The second acquisition unit is used to consume queue data in the target message queue based on the Spark stream processing component, consumption batch, and consumption data volume to obtain the consumption data.

[0055] Optionally, in one embodiment of this application, the encapsulation module 200 includes: a conversion unit, configured to convert the consumed data into an elastic distributed data set, and to perform parsing operations on the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and to perform conversion and assembly operations on the parsed data corresponding to all partition data to generate an HTTP request data string corresponding to the elastic distributed data set.

[0056] Optionally, in one embodiment of this application, the writing module 300 includes a configuration unit and a generation unit.

[0057] The configuration unit is used to configure the message header information for each Doris instance in the current Doris cluster based on the Doris database address and authentication requirements of the current Doris cluster. The message header information includes the HTTP protocol request to be requested and the authentication information to be requested.

[0058] The generation unit is used to call the Doris HTTP interface to concatenate the Doris address, the HTTP protocol request to be issued, the authentication information to be submitted, and the HTTP request data string to generate the assembled set data.
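The concatenation performed by the generation unit can be sketched as building a single HTTP PUT request from the address, the headers, and the data string. The function name `build_stream_load_request`, the host name, the database and table names, and the port are hypothetical examples. The URL path `/api/{db}/{table}/_stream_load` is the Doris Stream Load endpoint. The sketch only assembles the request and deliberately does not send it.

```python
import urllib.request
from typing import Dict


def build_stream_load_request(host: str, http_port: int, database: str,
                              table: str, headers: Dict[str, str],
                              payload: str) -> urllib.request.Request:
    """Combine address, headers, and data string into one PUT request."""
    url = f"http://{host}:{http_port}/api/{database}/{table}/_stream_load"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),  # request body must be bytes
        headers=dict(headers),         # copy so the caller's dict is untouched
        method="PUT",
    )
```

When the request is actually sent to a frontend node, Doris typically answers with a redirect to a backend node, so the HTTP client used for sending needs to preserve the method and body across that redirect.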

[0059] It should be noted that the foregoing explanation of the Doris writing method embodiment for large data volumes also applies to the Doris writing device for large data volumes in this embodiment, and will not be repeated here.

[0060] The Doris writing device for large data volumes proposed in this application includes: a consumption module 100, used to determine the consumption batch and consumption data volume of the target message queue and, based on the consumption batch and consumption data volume, to consume the queue data of the target message queue through a preset Spark stream processing component to obtain the corresponding consumption data; an encapsulation module 200, used to encapsulate the consumption data into an elastic distributed data set of the Spark framework and to assemble the distributed data in the elastic distributed data set into HTTP request data strings; and a writing module 300, used to configure preset Doris message header information, to call the Doris HTTP interface to assemble the message header information and HTTP request data strings to obtain the corresponding assembled set data, and to write the assembled set data into the corresponding Doris through a preset StreamLoad method. Based on the Spark Streaming framework combined with a Kafka message queue, this application designs a high-performance data writing mode that imports the hundreds of millions of real-time records generated daily into Doris, thereby providing users with low-latency and high-performance data writing.

[0061] Figure 6 is a schematic diagram of the structure of a vehicle provided in an embodiment of this application. The vehicle may include a memory 601, a processor 602, and a computer program stored on the memory 601 and capable of running on the processor 602.

[0062] When the processor 602 executes the program, it implements the large data volume Doris writing method provided in the above embodiments.

[0063] Furthermore, the vehicle also includes a communication interface 603, which is used for communication between the memory 601 and the processor 602.

[0064] The memory 601 is used to store computer programs that can run on the processor 602.

[0065] The memory 601 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.

[0066] If the memory 601, processor 602, and communication interface 603 are implemented independently, then the communication interface 603, memory 601, and processor 602 can be interconnected via a bus to complete communication between them. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus is represented in Figure 6 by a single thick line, but this does not mean that there is only one bus or one type of bus.

[0067] Optionally, in a specific implementation, if the memory 601, processor 602, and communication interface 603 are integrated on a single chip, then the memory 601, processor 602, and communication interface 603 can communicate with each other through an internal interface.

[0068] The processor 602 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application.

[0069] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described large data volume Doris writing method.

[0070] This application also provides a computer program product, including a computer program, which, when executed, is used to implement the above-described large data volume Doris writing method.

[0071] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0072] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0073] Any process or method described in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or N executable instructions for implementing custom logic functions or processes, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of this application pertain.

[0074] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.

[0075] It should be understood that the various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0076] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.

[0077] Furthermore, the functional units in the various embodiments of this application can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

[0078] The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.

Claims

1. A method for writing large data volumes to Doris, characterized by comprising the following steps: determining the consumption batch and consumption data volume of a target message queue, and, based on the consumption batch and consumption data volume, consuming the queue data of the target message queue through a preset Spark stream processing component to obtain the corresponding consumption data; encapsulating the consumption data into an elastic distributed data set of the Spark framework, and assembling the distributed data in the elastic distributed data set into HTTP request data strings respectively; and configuring preset Doris message header information, calling the Doris HTTP interface to assemble the message header information and the HTTP request data strings to obtain the corresponding assembled set data, and writing the assembled set data into the corresponding Doris through a preset StreamLoad method.

2. The method according to claim 1, characterized in that the step of determining the consumption batch and consumption data volume of the target message queue, and consuming the queue data of the target message queue through a preset Spark stream processing component based on the consumption batch and consumption data volume to obtain the corresponding consumption data, includes: obtaining the vehicle network data of the current vehicle, reporting the vehicle network data to the target message queue through the vehicle-mounted TSP, and obtaining the data volume information of the vehicle network data in the target message queue, so as to determine the consumption batch and the consumption data volume based on the data volume information; and consuming the queue data in the target message queue based on the Spark stream processing component, the consumption batch, and the consumption data volume to obtain the consumption data.

3. The method according to claim 2, characterized in that the step of assembling the distributed data in the elastic distributed data set into HTTP request data strings includes: converting the consumption data into an elastic distributed data set, parsing the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and performing conversion and assembly operations on the parsed data corresponding to all partition data to generate the HTTP request data string corresponding to the elastic distributed data set.

4. The method according to claim 3, characterized in that the step of configuring the preset Doris message header information and calling the Doris HTTP interface to assemble the message header information and the HTTP request data string to obtain the corresponding assembled set data includes: based on the Doris database address and authentication requirements of the current Doris cluster, configuring the message header information corresponding to each Doris in the current Doris cluster, wherein the message header information includes the HTTP protocol request to be issued and the authentication information to be submitted; and calling the Doris HTTP interface to concatenate the Doris address, the HTTP protocol request to be issued, the authentication information to be submitted, and the HTTP request data string to generate the assembled set data.

5. A Doris writing device for large data volumes, characterized by comprising: a consumption module, used to determine the consumption batch and consumption data volume of a target message queue, and, based on the consumption batch and consumption data volume, to consume the queue data of the target message queue through a preset Spark stream processing component to obtain the corresponding consumption data; an encapsulation module, used to encapsulate the consumption data into an elastic distributed data set of the Spark framework, and to assemble the distributed data in the elastic distributed data set into HTTP request data strings respectively; and a writing module, used to configure preset Doris message header information, to call the Doris HTTP interface to assemble the message header information and the HTTP request data strings to obtain the corresponding assembled set data, and to write the assembled set data into the corresponding Doris through a preset StreamLoad method.

6. The device according to claim 5, characterized in that the consumption module includes: a first acquisition unit, used to acquire the vehicle network data of the current vehicle, to report the vehicle network data to the target message queue through the vehicle-mounted TSP, and to acquire the data volume information of the vehicle network data in the target message queue, so as to determine the consumption batch and the consumption data volume based on the data volume information; and a second acquisition unit, used to consume the queue data in the target message queue based on the Spark stream processing component, the consumption batch, and the consumption data volume, so as to obtain the consumption data.

7. The device according to claim 6, characterized in that the encapsulation module includes: a conversion unit, used to convert the consumption data into an elastic distributed data set, to parse the partition data of each partition in the elastic distributed data set to obtain the parsed data corresponding to each partition, and to perform conversion and assembly operations on the parsed data corresponding to all partition data to generate the HTTP request data string corresponding to the elastic distributed data set.

8. A vehicle, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method for writing large data volumes to Doris according to any one of claims 1-4.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method for writing large data volumes to Doris according to any one of claims 1-4.

10. A computer program product, comprising a computer program, characterized in that the computer program, when executed, implements the method for writing large data volumes to Doris according to any one of claims 1-4.