A data compression method, device, electronic equipment and storage medium
By employing a columnar architecture and adaptive compression algorithm to segment and compress vehicle data in vehicle storage, the problems of storage redundancy and insufficient compression efficiency in vehicle storage are solved, achieving efficient data compression and reduced storage space.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU AUTOMOBILE GROUP CO LTD
- Filing Date
- 2026-02-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for in-vehicle storage suffer from storage redundancy and insufficient compression efficiency. In particular, when storing in-vehicle data, they cannot effectively utilize the characteristic differences of the collected data, resulting in wasted storage space and low compression efficiency.
By dividing the collected data into blocks and storing them in a columnar manner according to the collection time, and adaptively selecting compression algorithms according to the data type, the columns containing the data are dynamically compressed. The columnar architecture is used to store the data, and the compression is performed by combining the difference storage algorithm and the short code mapping algorithm.
It improves data compression efficiency, reduces storage space, lowers data upload bandwidth, and ensures efficient data querying.
Figure CN122247431A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of communication technology, and more specifically, to a data compression method, apparatus, electronic device, and storage medium.
Background Technology
[0002] In the wave of intelligent vehicle development, in-vehicle storage is facing unprecedented opportunities. With the deep integration of artificial intelligence and internet technology, the demand for in-vehicle storage in intelligent vehicles is growing rapidly; at the same time, the enhanced connectivity of vehicles is leading to a dramatic increase in the amount of data generated by vehicle networks. Therefore, how to efficiently compress in-vehicle storage data within limited space and reduce storage footprint has become a major technological challenge.
Summary of the Invention
[0003] In view of this, embodiments of this application propose a data compression method, apparatus, electronic device, and storage medium to address the above-mentioned problems.
[0004] In a first aspect, embodiments of this application provide a data compression method, the method comprising: acquiring raw collected data to be compressed and the acquisition time corresponding to the raw collected data; acquiring a file block to be compressed based on the raw collected data and the acquisition time corresponding to the raw collected data, wherein the file block to be compressed stores data in a columnar architecture, and the file block to be compressed includes a column containing the raw collected data and a column containing the acquisition time; compressing the column containing the raw collected data in the file block to be compressed according to a compression algorithm corresponding to the data type of the raw collected data, to obtain a compressed file block, wherein the data type includes numeric and non-numeric types.
[0005] In a second aspect, embodiments of this application provide a data compression apparatus, comprising: a data acquisition module, a data segmentation module, and a data column compression module. The data acquisition module is used to acquire raw acquired data to be compressed and the acquisition time corresponding to the raw acquired data; the data segmentation module is used to acquire file blocks to be compressed based on the raw acquired data and the acquisition time corresponding to the raw acquired data, wherein the file blocks to be compressed use a columnar architecture to store data, and the file blocks to be compressed include the column containing the raw acquired data and the column containing the acquisition time; the data column compression module is used to compress the column containing the raw acquired data in the file blocks to be compressed according to a compression algorithm corresponding to the data type of the raw acquired data, to obtain compressed file blocks, wherein the data type includes numeric and non-numeric types.
[0006] In a third aspect, embodiments of this application provide an electronic device, including a memory and a processor, wherein the memory is coupled to the processor, the memory stores instructions, and when the instructions are executed by the processor, the processor executes the data compression method provided in the first aspect above.
[0007] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing program code, which can be invoked by a processor to execute the data compression method provided in the first aspect above.
[0008] In this application's solution, the original acquired data to be compressed and the corresponding acquisition time are obtained. Based on the original acquired data and the corresponding acquisition time, a file block to be compressed is obtained. The file block to be compressed uses a columnar architecture to store data, and includes a column containing the original acquired data and a column containing the acquisition time. According to the compression algorithm corresponding to the data type of the original acquired data, the column containing the original acquired data in the file block to be compressed is compressed to obtain the compressed file block. The data type includes numeric and non-numeric types. Thus, the acquired data is divided into blocks and stored in a columnar manner according to the acquisition time corresponding to the acquired data, and the compression algorithm corresponding to the column containing the data is adaptively selected according to the data type of the acquired data, and the column containing the data is dynamically compressed, which improves the efficiency of data compression and reduces the data storage space.
Attached Figure Description
[0009] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0010] Figure 1 shows a schematic flowchart of a data compression method provided in an embodiment of this application; Figure 2 shows a schematic diagram of a gateway clock synchronization process according to an embodiment of this application; Figure 3 shows a schematic flowchart of a data compression method provided in an embodiment of this application; Figure 4 shows a schematic diagram of multiple file blocks to be compressed stored in a columnar manner according to an embodiment of this application; Figure 5 shows a block diagram of a data compression apparatus according to an embodiment of this application; Figure 6 shows a block diagram of an electronic device for performing a data compression method according to an embodiment of this application.
Detailed Implementation
[0011] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0012] The implementation details of the technical solutions in the embodiments of this application are described in detail below: Currently, in-vehicle electronic systems typically add timestamps to all data for full recording when storing data, leading to wasted storage space. Furthermore, when compressing stored data, they often employ a single compression algorithm, which struggles to adapt to the characteristics of different signals, resulting in low compression efficiency. Therefore, existing technologies for in-vehicle data storage generally suffer from storage redundancy and insufficient compression efficiency.
[0013] To address the aforementioned problems, the inventors, through extensive research, have developed a data compression method, apparatus, electronic device, and storage medium as provided in the embodiments of this application. By dividing the collected data into blocks and storing them column-wise according to the collection time, and adaptively selecting the compression algorithm corresponding to the column containing the data based on the data type, the data column is dynamically compressed, thereby improving data compression efficiency and reducing data storage space. The specific data compression method will be described in detail in subsequent embodiments.
[0014] The embodiments involved in this application will now be described with reference to the accompanying drawings.
[0015] Please refer to Figure 1, which shows a schematic flowchart of a data compression method according to an embodiment of this application. In a specific embodiment, this data compression method can be applied to, for example, the data compression apparatus 200 shown in Figure 5 and the electronic device 100 equipped with the data compression apparatus 200 shown in Figure 6. The following uses an electronic device as an example to illustrate the specific process of this embodiment. It is understood that the electronic device used in this embodiment may include vehicles, in-vehicle terminals, computers, etc., and is not limited thereto. The process shown in Figure 1 is described in detail below. The data compression method may specifically include the following steps:
Step S110: Obtain the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data.
[0016] In some implementations, the electronic device may have a pre-configured configuration file. This configuration file may include a mapping relationship between signal names and IDs, a file generation cycle for data blocks (e.g., 30 seconds), a data upload cycle (e.g., 3 minutes), and compression algorithms corresponding to different data types. This configuration file can be obtained by the electronic device from an associated cloud or other electronic device, or it can be automatically generated by the electronic device based on user-input orchestration instructions. Signal names include, but are not limited to, temperature, vehicle speed, battery level, gear position, and fault codes. The file generation cycle can be set by the user or obtained from third-party experimental data; no limitation is imposed here. For example, file generation cycles may be 30s, 25s, or 35s. Similarly, the data upload cycle can be set by the user or obtained from third-party experimental data; no limitation is imposed here. For example, data upload cycles may be 3 minutes, 2.5 minutes, or 3.5 minutes.
[0017] The electronic device can collect data based on the configuration file to obtain raw data. Specifically, it can collect data corresponding to a signal name based on the mapping between signal names and IDs in the configuration file, and then associate and store the corresponding data with the ID based on the mapping, thus obtaining raw data and reducing data storage requirements. Optionally, the electronic device can also receive a configuration file from the cloud, and based on the mapping between signal names and IDs included in the configuration file, determine the signals to be collected, collect the data corresponding to the signals, and associate and store the corresponding data with the ID based on the mapping, obtaining raw data. This raw data can also be fed back to the cloud, thereby reducing data volume and upload bandwidth.
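The configuration file and the name-to-ID association described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the key names, the ID values, and the helper `to_raw_records` are all hypothetical, and the pairing of algorithm names with data types is an assumption.

```python
# Hypothetical configuration; every key name and ID value is an illustrative assumption.
CONFIG = {
    "signal_ids": {
        "temperature": 1,
        "vehicle_speed": 2,
        "battery_level": 3,
        "gear_position": 4,
        "fault_code": 5,
    },
    "file_generation_cycle_s": 30,   # one file block every 30 seconds
    "data_upload_cycle_s": 180,      # upload every 3 minutes
    # Assumed pairing of data type to algorithm name.
    "compression": {"numeric": "difference_storage",
                    "non_numeric": "short_code_mapping"},
}


def to_raw_records(samples, config=CONFIG):
    """Store each sample under its compact ID; drop signals the config does not list."""
    ids = config["signal_ids"]
    return [(ids[name], value) for name, value in samples if name in ids]


samples = [("vehicle_speed", 61.5), ("temperature", 22.0), ("cabin_noise", 40)]
raw = to_raw_records(samples)
```

Storing the short ID instead of the signal name is what reduces the per-record footprint; the names themselves only need to live once, in the configuration.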
[0018] In some implementations, considering the different signal sources of the original acquired data, the clocks corresponding to each signal source are synchronized independently, resulting in time deviations in the original acquired data. Therefore, the electronic device can use a unified clock module to add an acquisition time to all the original acquired data. That is, the clocks of signal A from module A and signal B from module B are synchronized to obtain the original acquired data to be compressed and its corresponding acquisition time, avoiding deviations caused by the synchronization of different modules.
[0019] For example, please refer to Figure 2, which shows a schematic diagram of a gateway clock synchronization process provided in one embodiment. In this embodiment, the electronic device can be a vehicle. The vehicle can obtain raw data from the vehicle's bus and then pass the raw data through a unified clock module (the gateway forwards the corresponding clock), which adds a collection time before the data continues to flow downstream. Thus, the vehicle's acquisition module obtains the raw data to be compressed and the corresponding collection time.
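The idea of stamping every signal from one clock can be sketched as follows. This is a minimal sketch under stated assumptions: `GatewayClock` and `stamp` are hypothetical names, and the clock is a deterministic counter standing in for whatever synchronized time source the gateway actually forwards.

```python
import itertools


class GatewayClock:
    """Hypothetical single time source; a deterministic counter used only for illustration."""

    def __init__(self, start_ms, step_ms):
        self._ticks = itertools.count(start_ms, step_ms)

    def now_ms(self):
        return next(self._ticks)


def stamp(frames, clock):
    """Attach the gateway time to every frame, ignoring any per-module clock."""
    return [(clock.now_ms(), source, value) for source, value in frames]


clock = GatewayClock(start_ms=1757658075000, step_ms=10)
stamped = stamp([("module_A", 61.5), ("module_B", 22.0)], clock)
```

Because both modules are stamped by the same clock, their acquisition times are directly comparable, which the later relative-time encoding relies on.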
[0020] Step S120: Based on the original collected data and the collection time corresponding to the original collected data, obtain a file block to be compressed, wherein the file block to be compressed uses a columnar architecture to store data, and the file block to be compressed includes the column where the original collected data is located and the column where the collection time is located.
[0021] In some implementations, after the electronic device obtains the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data, it can write the raw acquisition data and the acquisition time corresponding to the raw acquisition data into the storage unit of the electronic device.
[0022] For the raw collected data and the corresponding collection time in the storage unit, the electronic device can perform data block storage, data block compression, archiving and recompression of compressed file blocks, and uploading to the cloud or an associated electronic device.
[0023] In some implementations, during the data block storage process, the electronic device can obtain the file block to be compressed based on the original acquired data to be compressed and the corresponding acquisition time. The file block to be compressed can use a columnar architecture to store the data, and may include a column containing the original acquired data and a column containing the acquisition time. The column containing the original acquired data stores the specific value corresponding to the original acquired data, and the column containing the acquisition time stores the acquisition time corresponding to the original acquired data.
[0024] Optionally, the electronic device can divide the original acquired data into blocks based on the acquisition time and preset size, obtaining file blocks of preset size to be compressed. Specifically, the electronic device can form a file block to be compressed based on the chronological order of the acquisition times corresponding to the original acquired data, provided that the amount of original acquired data reaches the preset size.
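The size-based division into columnar blocks can be sketched as follows. This is a minimal sketch, assuming that the preset size is a record count and that records are `(time_ms, value)` pairs; the function name `blocks_by_size` and the column names are hypothetical.

```python
def blocks_by_size(records, preset_size):
    """Order records by acquisition time and emit one columnar block per full group.

    Returns (blocks, pending), where pending holds records that do not yet
    fill a block of preset_size.
    """
    ordered = sorted(records, key=lambda r: r[0])
    blocks = []
    for i in range(0, len(ordered) - preset_size + 1, preset_size):
        chunk = ordered[i:i + preset_size]
        blocks.append({
            "acquisition_time": [t for t, _ in chunk],  # time column
            "value": [v for _, v in chunk],             # data column
        })
    pending = ordered[len(blocks) * preset_size:]
    return blocks, pending


records = [(1020, 61.9), (1000, 61.5), (1010, 61.7), (1030, 62.0), (1040, 62.4)]
blocks, pending = blocks_by_size(records, preset_size=2)
```

Keeping times and values in separate columns is what later allows each column to be compressed with an algorithm suited to its content.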
[0025] The electronic device can create a columnar storage compressed file based on the original acquired data stored in the storage unit and the acquisition time corresponding to the original acquired data. It can also write the original acquired data as file metadata of this columnar storage compressed file into the configuration file information corresponding to the columnar storage compressed file. Furthermore, it can form at least one file block to be compressed based on the file generation cycle corresponding to the configuration file and the first acquisition time corresponding to the original acquired data. For example, if the file generation cycle is 30 seconds, the electronic device can form a file block to be compressed from the original acquired data within the next 30 seconds based on the first acquisition time corresponding to the original acquired data. This file block to be compressed can use a columnar architecture to store data, and it can include the column containing the original acquired data and the column containing the acquisition time.
[0026] Step S130: According to the compression algorithm corresponding to the data type of the original collected data, compress the column containing the original collected data in the file block to be compressed to obtain the compressed file block, wherein the data type includes numeric and non-numeric types.
[0027] In some implementations, after obtaining a file block to be compressed, the electronic device can compress the columns containing the original acquired data within the file block according to the data type of the original acquired data included in the file block, thereby obtaining a compressed file block. The original acquired data can be of numerical or non-numerical data type. Numerical original acquired data may include vehicle speed, temperature, linearly changing physical quantities, etc., while non-numerical original acquired data may include enumeration type, Boolean type, status words, text labels, and event codes, etc. Enumeration type may include vehicle gear position, fault codes, etc., while Boolean type may include yes / no, on / off lights, etc.
[0028] The electronic device can compress the column containing the original acquired data in the file block to be compressed according to the compression algorithm corresponding to the different data types included in the configuration file, and obtain the compressed file block. Thus, the compression algorithm is adaptively selected based on the data type of the signal to perform column-based dynamic compression of the column containing the original acquired data, thereby improving the efficiency of data compression.
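The per-type selection described above amounts to a lookup from data type to algorithm. The sketch below shows only that dispatch step, assuming that each signal's data type is declared in the configuration; the two registered callables are placeholders that merely tag the column with an algorithm name, and all identifiers are hypothetical.

```python
def compress_block(block, signal_types, algorithms):
    """Compress each data column with the algorithm registered for its data type."""
    compressed = {}
    for name, column in block.items():
        data_type = signal_types[name]          # "numeric" or "non_numeric"
        compressed[name] = algorithms[data_type](column)
    return compressed


# Placeholder algorithms: they only label the column, they do not compress it.
algorithms = {
    "numeric": lambda column: ("difference_storage", column),
    "non_numeric": lambda column: ("short_code_mapping", column),
}
signal_types = {"vehicle_speed": "numeric", "gear_position": "non_numeric"}
block = {"vehicle_speed": [60, 61, 63], "gear_position": ["P", "D", "D"]}

result = compress_block(block, signal_types, algorithms)
```

Registering algorithms in a table keeps the selection logic unchanged when a new data type or algorithm is added to the configuration.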
[0029] In some implementations, after obtaining the compressed file block, the electronic device can write the configuration file information into the compressed file block. This allows for subsequent decompression and data parsing of the compressed file block based on the configuration file during its subsequent circulation. For example, the electronic device can write the configuration file information into the compressed file block and then upload the compressed file block to the cloud. Accordingly, the cloud can decompress and parse the compressed file block based on the configuration file it carries.
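One way to carry the configuration inside a compressed file block is a length-prefixed header. The text does not specify a container layout, so the sketch below is purely an assumption: a 4-byte big-endian length, a JSON-encoded configuration, then the compressed payload. The function names are hypothetical.

```python
import json
import struct


def attach_config(config, payload):
    """Prefix the payload with a length-delimited JSON copy of the configuration."""
    header = json.dumps(config, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload


def split_config(blob):
    """Recover (config, payload) from a blob produced by attach_config."""
    (header_len,) = struct.unpack(">I", blob[:4])
    config = json.loads(blob[4:4 + header_len].decode("utf-8"))
    return config, blob[4 + header_len:]


config = {"signal_ids": {"vehicle_speed": 2}, "file_generation_cycle_s": 30}
blob = attach_config(config, b"\x01\x02\x03")
restored_config, restored_payload = split_config(blob)
```

With the configuration travelling in the block, the receiving side can decompress and parse without a separate lookup.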
[0030] In some implementations, after obtaining compressed file blocks, the electronic device can archive and recompress the compressed file blocks based on the data upload cycle included in the configuration file, and upload them to the cloud or an associated electronic device connected to the electronic device. Optionally, the electronic device may also have a preset number of compressed file blocks; in this case, after obtaining the preset number of compressed file blocks, the electronic device can archive and recompress the preset number of compressed file blocks and upload them to the cloud or an electronic device connected to the electronic device. This dynamically compresses the data column of the file blocks according to the data type, improving data compression efficiency while reducing data storage space and lowering data upload bandwidth.
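The archive-and-recompress step can be sketched with a standard archive format. The text does not name one, so the use of a gzip-compressed tar archive here is an assumption, as are the function names and the sample block names.

```python
import io
import tarfile


def archive_blocks(named_blocks):
    """Pack already-compressed blocks into one gzip-compressed tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in named_blocks:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_archive(data):
    """Return the member names of an archive produced by archive_blocks."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return tar.getnames()


archive = archive_blocks([("1757658075000", b"block-0"), ("file_1", b"block-1")])
names = list_archive(archive)
```

The resulting bytes are what would be uploaded once per data upload cycle, or once the preset number of blocks has accumulated.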
[0031] An embodiment of this application provides a data compression method that obtains the original acquired data to be compressed and the corresponding acquisition time; based on the original acquired data and the corresponding acquisition time, obtains a file block to be compressed, wherein the file block to be compressed uses a columnar architecture to store data, and the file block to be compressed includes a column containing the original acquired data and a column containing the acquisition time; according to the compression algorithm corresponding to the data type of the original acquired data, the column containing the original acquired data in the file block to be compressed is compressed to obtain a compressed file block, wherein the data type includes numeric and non-numeric types, thereby dividing the acquired data into blocks for columnar storage according to the acquisition time corresponding to the acquired data, and adaptively selecting the compression algorithm corresponding to the column containing the data according to the data type of the acquired data, dynamically compressing the column containing the data, thereby improving the efficiency of data compression and reducing the data storage space.
[0032] Please refer to Figure 3, which shows a schematic flowchart of a data compression method according to an embodiment of this application. This method is applied to the aforementioned electronic device. The process shown in Figure 3 is described in detail below. The data compression method may specifically include the following steps:
Step S210: Obtain the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data.
[0033] For a detailed description of step S210, please refer to the previous description of step S110, which will not be repeated here.
[0034] Step S220: Based on the file generation cycle and the acquisition time corresponding to the original acquisition data, the original acquisition data is divided into blocks to obtain multiple file blocks to be compressed. The file blocks to be compressed use a columnar architecture to store data, and the file blocks to be compressed include the column where the original acquisition data is located and the column where the acquisition time is located.
[0035] In some implementations, the electronic device may have a pre-set file generation cycle. This cycle can be set by the user or obtained from third-party experimental data, and is not limited here. For example, the file generation cycle may be 30s, 25s, 35s, etc. The electronic device can divide the original collected data into blocks based on the file generation cycle and the corresponding acquisition time, obtaining multiple file blocks to be compressed. For example, the electronic device can record the first acquisition time corresponding to the original collected data and use this first acquisition time as the start time. A file block to be compressed is formed based on the original collected data acquired within the file generation cycle starting from this start time. This file block to be compressed can use a columnar architecture to store data, and may include a column containing the original collected data and a column containing the acquisition time.
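The cycle-based division can be sketched as follows. This is a minimal sketch, assuming millisecond timestamps and `(time_ms, value)` records; `blocks_by_cycle` is a hypothetical name. Note one simplification: a cycle in which no data arrives produces no block here, which the text does not address.

```python
def blocks_by_cycle(records, cycle_ms):
    """Group records into consecutive windows of cycle_ms, starting at the first acquisition time."""
    ordered = sorted(records, key=lambda r: r[0])
    if not ordered:
        return []
    start = ordered[0][0]                      # first acquisition time = start time
    windows = {}
    for t, v in ordered:
        index = (t - start) // cycle_ms        # which cycle this record falls in
        block = windows.setdefault(index, {"acquisition_time": [], "value": []})
        block["acquisition_time"].append(t)
        block["value"].append(v)
    return [windows[i] for i in sorted(windows)]


records = [(0, 10), (29_999, 11), (30_000, 12), (65_000, 13)]
blocks = blocks_by_cycle(records, cycle_ms=30_000)
```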
[0036] In some implementations, the electronic device may also have a pre-set data upload period. This data upload period can be set by the user or obtained from third-party experimental data, and is not limited here. For example, the data upload period may be 3 minutes, 2.5 minutes, 3.5 minutes, etc. The data upload period can be longer than the file generation period. Within this data upload period, the electronic device can divide the original collected data into blocks according to the file generation period and the corresponding collection time, obtaining multiple file blocks to be compressed. These multiple file blocks to be compressed within the data upload period are then archived, packaged, compressed, and uploaded to the cloud or a related electronic device.
[0037] Step S230: Number the other file blocks to be compressed in the plurality of file blocks to be compressed according to the block division order to obtain the file sequence number corresponding to the other file blocks to be compressed, wherein the other file blocks to be compressed are the file blocks to be compressed other than the first file block to be compressed in the plurality of file blocks to be compressed.
[0038] In some implementations, after the electronic device obtains multiple file blocks to be compressed, it can number the other file blocks to be compressed according to their block order to obtain their corresponding file numbers. These other file blocks to be compressed are those excluding the first file block. The first file block to be compressed can be the earliest formed file block among the multiple file blocks, or it can be the earliest formed file block within a data upload cycle.
[0039] For example, an electronic device can number multiple file blocks to be compressed within a data upload cycle, excluding the first file block to be compressed, according to the order in which the file blocks to be compressed are formed, and obtain the file sequence number corresponding to each of the multiple file blocks to be compressed within the data upload cycle, excluding the first file block to be compressed, so as to quickly filter the file blocks to be compressed and the original collected data based on the file sequence number.
[0040] Step S240: Based on the first file time corresponding to the first file block to be compressed and the file sequence number corresponding to the other file blocks to be compressed, store the multiple file blocks to be compressed in a columnar manner, wherein the first file time is the acquisition time of the first data collected in the first file block to be compressed.
[0041] In some implementations, during the process of obtaining the first file block among multiple file blocks to be compressed, the electronic device can obtain the first file time corresponding to the first file block to be compressed. The first file time can be the acquisition time of the earliest acquired raw data included in the first file block to be compressed.
[0042] In some implementations, the electronic device can, after obtaining the first file time corresponding to the first file block to be compressed and the file sequence number corresponding to the other file blocks to be compressed, store the multiple file blocks to be compressed in a columnar manner according to the first file time corresponding to the first file block to be compressed and the file sequence number corresponding to the other file blocks to be compressed.
[0043] For example, please refer to Figure 4, which shows a schematic diagram of columnar storage of multiple file blocks to be compressed according to an embodiment of this application. The data upload cycle can be 3 minutes, and the file generation cycle can be 30 seconds. The electronic device can generate one file block to be compressed every 30 seconds within the data upload cycle, obtaining multiple file blocks to be compressed corresponding to that data upload cycle. The electronic device can obtain the acquisition time of the earliest collected raw data in the data upload cycle as the first file time (e.g., 1757658075000), and can write this first file time into the filename of the first file block to be compressed among the multiple file blocks to be compressed included in the data upload cycle. The electronic device can number the other file blocks to be compressed according to the block division order, obtaining the file sequence numbers corresponding to the other file blocks to be compressed (e.g., file_1, file_2, ...). Based on the first file time corresponding to the first file block to be compressed and the file sequence numbers corresponding to the other file blocks to be compressed, the multiple file blocks to be compressed can be stored columnarly for subsequent archiving, recompression, and uploading to the cloud or associated electronic devices. In other words, a columnar storage compressed file is created and the first file time is recorded and written into the filename; a file block is then formed every 30 seconds, for which only the file block number is recorded and not the time, thus reducing the redundancy of data storage.
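The naming scheme in this example can be sketched as follows. The helper names are hypothetical, and `block_start_ms` rests on an assumption the text implies but does not state: that blocks are formed back to back, so block n begins n file generation cycles after the first file time.

```python
def name_blocks(blocks):
    """Name the first block by its earliest acquisition time and the rest by sequence number."""
    names = []
    for index, block in enumerate(blocks):
        if index == 0:
            names.append(str(min(block["acquisition_time"])))  # first file time
        else:
            names.append(f"file_{index}")                      # sequence number only
    return names


def block_start_ms(first_file_time_ms, sequence_number, cycle_ms):
    """Nominal start time of a numbered block, assuming no cycle is skipped."""
    return first_file_time_ms + sequence_number * cycle_ms


blocks = [
    {"acquisition_time": [1757658075000, 1757658075010]},
    {"acquisition_time": [1757658105000]},
    {"acquisition_time": [1757658135000]},
]
names = name_blocks(blocks)
start_of_file_2 = block_start_ms(1757658075000, 2, 30_000)
```

Only one absolute timestamp is stored per upload cycle; every other block is located by arithmetic on its sequence number.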
[0044] Step S250: Based on the acquisition time corresponding to the original acquisition data and the difference between the acquisition time corresponding to the original acquisition data and the first acquisition time, compress the column containing the acquisition time in the file block to be compressed, wherein the first acquisition time is the first acquisition time corresponding to the original acquisition data.
[0045] In some implementations, the file block to be compressed obtained by the electronic device includes a column for the acquisition time. This column can be used to record the acquisition time corresponding to the original acquisition data included in the file block to be compressed. Specifically, during the creation of the column for the acquisition time corresponding to the file block to be compressed, the electronic device can compress the column based on the acquisition time corresponding to the original acquisition data included in the file block, and the difference between the acquisition time of the original acquisition data and the first acquisition time. The first acquisition time is the initial acquisition time corresponding to the original acquisition data. The first acquisition time can be the same as the initial file time.
[0046] The electronic device can convert the acquisition time corresponding to the original acquisition data included in the file block to be compressed into the difference between the acquisition time corresponding to the original acquisition data and the first acquisition time, and store it in the acquisition time column of the file block to be compressed, thereby compressing the time column data corresponding to the file block to be compressed and reducing the data storage space. For example, please refer to Table 1, which shows the column of the original acquisition data corresponding to the compressed file block provided in an embodiment of this application. In Table 1, the differences between the acquisition times of signal A and the first acquisition time (signal A differential time) are 10, 20, and 30, respectively; the differences between the acquisition times of signal B and the first acquisition time (signal B differential time) are 18, 28, and 38, respectively. The acquisition time column stores only the first acquisition time and the differences between each acquisition time and the first acquisition time.
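The conversion from absolute acquisition times to differences can be sketched as follows, using the differential values quoted above. The absolute timestamps and the millisecond unit are assumptions made for illustration, and the function names are hypothetical.

```python
def to_offsets(times, first_time):
    """Replace each acquisition time with its difference from the first acquisition time."""
    return [t - first_time for t in times]


def from_offsets(offsets, first_time):
    """Restore absolute acquisition times from stored differences."""
    return [first_time + d for d in offsets]


first_time = 1757658075000                                   # assumed first acquisition time
signal_a_times = [first_time + 10, first_time + 20, first_time + 30]
signal_b_times = [first_time + 18, first_time + 28, first_time + 38]

signal_a_offsets = to_offsets(signal_a_times, first_time)    # small integers
signal_b_offsets = to_offsets(signal_b_times, first_time)
```

Each stored value shrinks from a 13-digit timestamp to a two-digit difference, while the first acquisition time is kept once.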
[0047] Table 1
In some implementations, the electronic device can also perform second-order differential encoding on the differences between the acquisition times corresponding to the original acquisition data included in the file block to be compressed and the first acquisition time, and store the encoded differences in the acquisition time column of the file block to be compressed to obtain the compressed acquisition time column. The acquisition time column can be bit-packed and written into the file block to be compressed, so that the compressed file block stores only the differences between the acquisition times of the original acquisition data and the first acquisition time, thereby reducing the amount of data stored.
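Second-order differential encoding followed by bit-packing can be sketched as follows. This is one reading of the text, not a confirmed implementation: "second-order differential" is interpreted as delta-of-delta, signed values are mapped to unsigned ones with zigzag coding (an added assumption), and bit-packing uses a single fixed width per column. All function names are hypothetical and the input is assumed to be non-empty.

```python
def delta2_encode(offsets):
    """Delta-of-delta: store how much each step differs from the previous step."""
    codes, prev, prev_delta = [], 0, 0
    for x in offsets:
        delta = x - prev
        codes.append(delta - prev_delta)
        prev, prev_delta = x, delta
    return codes


def delta2_decode(codes):
    """Inverse of delta2_encode."""
    offsets, prev, prev_delta = [], 0, 0
    for c in codes:
        prev_delta += c
        prev += prev_delta
        offsets.append(prev)
    return offsets


def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z):
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def bit_pack(values):
    """Pack non-negative integers at one fixed bit width; returns (width, count, bytes)."""
    width = max(1, max(v.bit_length() for v in values))
    acc = 0
    for v in values:
        acc = (acc << width) | v
    return width, len(values), acc.to_bytes((width * len(values) + 7) // 8, "big")


def bit_unpack(width, count, data):
    acc = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]


offsets = [10, 20, 30, 40, 52]            # differences from the first acquisition time
codes = delta2_encode(offsets)
width, count, packed = bit_pack([zigzag(c) for c in codes])
restored = delta2_decode([unzigzag(z) for z in bit_unpack(width, count, packed)])
```

For a signal sampled at a steady period, most delta-of-delta codes are zero, so the packed column needs only a few bits per entry.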
[0048] It can be understood that, in this process, the electronic device can divide the raw collected data into blocks for storage. Each file generation cycle can produce one file block to be compressed, and the multiple file blocks to be compressed within each data upload cycle can be archived and recompressed. Within a single data upload cycle, the electronic device can record only the first file time and write it into the filename of the first file block to be compressed. Subsequent file blocks do not need their times recorded; only the file sequence number needs to be recorded, saving data storage space. For the time column of a file block to be compressed, the electronic device can use the difference between the collection time of the raw collected data included in the block and the first file time as the stored value, writing it into the collection time column of the block. The time column data can then be compressed using second-order differential encoding and bit-packing before being written into the file block to be compressed. In this way, the first file time is recorded, and the relative time between each raw data acquisition time and the first file time is used as the stored value. The file blocks that follow the first file block are associated with their times through inter-block sequential numbering. Combining the first file time with the relative time chain reduces the data storage size, and second-order differential encoding of the relative time chain reduces it further. Additionally, the acquisition time column of each file block and the file sequence number can be used to quickly filter compressed file blocks and data, improving data retrieval efficiency.
[0049] Step S260: If the data type of the original collected data is numerical, then according to the difference storage algorithm, compress the column containing the original collected data in the file block to be compressed to obtain the compressed file block.
[0050] In some implementations, after obtaining the file block to be compressed, the electronic device can compress the original acquired data corresponding to the file block to be compressed according to the data type of the original acquired data included in the file block to obtain the compressed file block.
[0051] The electronic device can be pre-set with compression algorithms corresponding to different data types. Based on this, the electronic device can adaptively select the compression algorithm according to the data type of the original collected data, and dynamically compress the column containing the original collected data, thereby improving the efficiency of data compression.
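As a rough sketch of this adaptive selection, a lookup table can map each data type to its pre-set compression routine. The names `classify`, `COMPRESSORS`, and the two placeholder routines are hypothetical; the placeholders only tag the column with the algorithm name and do not compress anything.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tagged = Tuple[str, List[object]]


def classify(column: Sequence[object]) -> str:
    """Return 'numeric' or 'non_numeric' for a column of raw values.

    Booleans are treated as non-numeric here (an assumption of this sketch),
    even though bool is a subclass of int in Python.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )
    return "numeric" if numeric else "non_numeric"


def difference_storage_placeholder(column: Sequence[object]) -> Tagged:
    # Placeholder: a real routine would store the first value plus differences.
    return ("difference_storage", list(column))


def short_code_placeholder(column: Sequence[object]) -> Tagged:
    # Placeholder: a real routine would map each value to a short code.
    return ("short_code_mapping", list(column))


# Pre-set mapping from data type to compression routine.
COMPRESSORS: Dict[str, Callable[[Sequence[object]], Tagged]] = {
    "numeric": difference_storage_placeholder,
    "non_numeric": short_code_placeholder,
}


def compress_column(column: Sequence[object]) -> Tagged:
    """Select the routine by data type and apply it to the column."""
    return COMPRESSORS[classify(column)](column)
```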
[0052] As a feasible approach, the compression algorithm corresponding to numerical data is the differential storage algorithm. Specifically, if the original data collected in the file block to be compressed is numerical, the electronic device can use the differential storage algorithm to compress the columns containing the original data in that file block to obtain the compressed file block. Numerical data can include slowly changing signals such as vehicle speed and temperature.
[0053] In some implementations, during the process of compressing the column containing the original collected data based on the difference storage algorithm, the electronic device can obtain the first collection value corresponding to the original collected data in the file block to be compressed, and can compress the column containing the original collected data in the file block to be compressed based on the first collection value and the difference between the non-first collection value and the first collection value of the original collected data in the file block to be compressed, thereby obtaining the compressed file block.
[0054] The electronic device can convert the non-first acquisition value of the original acquisition data into the difference between the non-first acquisition value and the first acquisition value, and store it in the column of the original acquisition data in the file block to be compressed, thus obtaining the compressed file block.
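A minimal sketch of this difference storage, assuming integer-scaled signal values (for example, vehicle speed in units of 0.01 km/h) and hypothetical function names:

```python
from typing import List, Tuple


def diff_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value; store every later value as (value - first)."""
    if not values:
        raise ValueError("column must contain at least one value")
    first = values[0]
    return first, [v - first for v in values[1:]]


def diff_decode(first: int, diffs: List[int]) -> List[int]:
    """Rebuild the original column from the first value and the differences."""
    return [first] + [first + d for d in diffs]
```

For a slowly changing signal the differences stay small, so they can be stored in fewer bits than the full values.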
[0055] In some implementations, if the original acquired data is numerical, the electronic device can further compress consecutive identical values within the column containing the original acquired data using a run-length encoding compression algorithm to obtain a compressed file block. These consecutive identical values within the column can be either the values in the original acquired data before the electronic device applies the difference storage algorithm for compression, or the values in the column after the electronic device applies the difference storage algorithm for compression. Thus, by performing difference storage compression on the numerical original acquired data and then combining it with run-length encoding to compress consecutive identical values, the data storage space is reduced.
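Run-length encoding of consecutive identical values can be sketched as below. The (value, count) pair layout is an assumption of this example; the application does not specify an on-disk format.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse each run of identical consecutive values into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]
```

Applied after difference storage, a signal that holds steady for a while yields a long run of identical differences, which collapses into a single pair.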
[0056] Step S270: If the data type of the original collected data is non-numerical, then according to the short code mapping algorithm, compress the column containing the original collected data in the file block to be compressed to obtain the compressed file block.
[0057] In some implementations, the electronic device may have a pre-configured compression algorithm for non-numerical data types, which is a short code mapping algorithm. Based on this, during the compression process of the original collected data column containing the file block to be compressed, the electronic device can compress non-numerical data according to the short code mapping algorithm to obtain the compressed file block.
[0058] As a feasible approach, during the compression of non-numerical data using the short code mapping algorithm, the electronic device can convert the data in the column containing the original collected data within the file block to be compressed into short codes based on a preset mapping relationship, thus obtaining the compressed file block. The preset mapping relationship can include a mapping between data and short codes, which can be a one-to-one relationship or a many-to-one relationship, without limitation. This converts non-numerical data into short codes, improving data compression efficiency and reducing data storage space.
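The short code conversion can be illustrated with a small sketch. The gear-position table `GEAR_CODES` is a hypothetical preset mapping, and the sketch assumes a one-to-one mapping with codes in the range 0 to 255 so that each value fits in one byte and can be decoded again.

```python
from typing import Dict, List

# Hypothetical preset mapping between gear positions and one-byte short codes.
GEAR_CODES: Dict[str, int] = {"P": 0, "R": 1, "N": 2, "D": 3}


def encode_short_codes(values: List[str], mapping: Dict[str, int]) -> bytes:
    """Replace each value with its short code, one byte per value."""
    return bytes(mapping[v] for v in values)


def decode_short_codes(payload: bytes, mapping: Dict[str, int]) -> List[str]:
    """Invert the mapping; this requires the mapping to be one-to-one."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[b] for b in payload]
```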
[0059] For example, the data compression method provided in this application can be applied to vehicle-mounted edge computing scenarios. The electronic device can be a vehicle; developers can set a configuration file for data collection by the vehicle on the vehicle itself or in the cloud communicating with the vehicle. Based on this, the vehicle can have a pre-set configuration file or obtain a configuration file distributed from the cloud. The configuration file can include a mapping relationship between signal name matrices and IDs, file generation period (e.g., 30 seconds), data upload period (e.g., 3 minutes), and compression algorithms corresponding to different signal data types.
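For illustration only, such a configuration could be modeled as below. The class name, field names, signal names, and algorithm labels are hypothetical; only the 30-second and 3-minute example periods come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CollectionConfig:
    # Mapping between signal names and IDs (example entries are invented).
    signal_ids: Dict[str, int]
    # File generation period and data upload period, in seconds.
    file_generation_period_s: int = 30
    data_upload_period_s: int = 180
    # Compression algorithm label per signal data type.
    algorithm_by_type: Dict[str, str] = field(
        default_factory=lambda: {
            "numeric": "difference_storage",
            "non_numeric": "short_code_mapping",
        }
    )

    def blocks_per_upload(self) -> int:
        """Number of file blocks produced within one data upload period."""
        return self.data_upload_period_s // self.file_generation_period_s


config = CollectionConfig(signal_ids={"vehicle_speed": 1, "gear_position": 2})
```

With the example periods, one upload cycle covers six file blocks to be compressed.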
[0060] The vehicle can collect all data based on the configuration file, add a collection time stamp to a unified module, obtain the raw collected data to be compressed along with its corresponding collection time, and continue processing. Alternatively, the vehicle can collect data based on a configuration file distributed from the cloud, add a collection time stamp to the unified module, obtain the raw collected data to be compressed along with its corresponding collection time, and continue processing. The vehicle can write the obtained raw collected data to be compressed along with its corresponding collection time into the vehicle's storage module. The vehicle can create columnar compressed files based on the data stored in the vehicle's storage module; for example, file metadata is written to the configuration file information, the first file's time is recorded, and the filename is written. Then, every 30 seconds (the file generation cycle), a file block to be compressed is formed without recording the time, only the file sequence number. This generates data file blocks at a fixed period (e.g., 30 seconds), recording a complete timestamp only in the first file block, with subsequent file blocks using a relative time base mechanism.
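The block division and relative time base described here can be sketched as follows. The function names, the millisecond unit, the dictionary-based columnar layout, and numbering the first block as 0 are assumptions of this example, not details taken from the application.

```python
from typing import Dict, List, Tuple

Block = Dict[str, List[float]]


def split_into_blocks(
    samples: List[Tuple[int, float]], period_ms: int = 30_000
) -> Tuple[int, Dict[int, Block]]:
    """Group time-ordered (time_ms, value) samples into columnar blocks.

    Only the first file time is kept as an absolute timestamp. Each block is
    identified by a sequence number and stores times relative to the first
    file time.
    """
    if not samples:
        raise ValueError("no samples to split")
    first_file_time = samples[0][0]
    blocks: Dict[int, Block] = {}
    for time_ms, value in samples:
        seq = (time_ms - first_file_time) // period_ms
        block = blocks.setdefault(seq, {"time": [], "value": []})
        block["time"].append(time_ms - first_file_time)
        block["value"].append(value)
    return first_file_time, blocks


def block_start_time(first_file_time: int, seq: int, period_ms: int = 30_000) -> int:
    """Recover a block's absolute start time from its sequence number."""
    return first_file_time + seq * period_ms
```

Because a block's start time follows from the first file time and its sequence number, a query for a time range can select candidate blocks without opening them.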
[0061] The vehicle can process and compress data within each block of files to be compressed. Specifically, it can process and compress data within a 30-second timeframe. The vehicle can use the relative time calculated from the acquisition time of the original data and the first file time as the storage value, writing it into the acquisition time column. It can then use second-order differential encoding and bit-packing compression on this time column data and write it into the compressed file block. This allows for the storage of vehicle signal data in blocks (e.g., one block to be compressed every 30 seconds), recording the first file time, using the relative time between the original data acquisition time and the first file time as the storage value of the acquisition time column, and associating the time of the file blocks to be compressed using inter-block sequential numbering. By using the first file time and the relative time chain, the data storage size is reduced. Furthermore, second-order differential encoding of the relative time chain improves compression efficiency.
[0062] The vehicle can also compress the data in the column containing the original acquired data corresponding to the data type of the file block to be compressed, according to the compression algorithm corresponding to the data type, to obtain the compressed file block. Thus, the data adopts a columnar storage architecture, with the time column using second-order difference, and other signal columns dynamically selecting compression algorithms based on data type, improving data compression efficiency. Specifically, for numerical data (such as slowly changing signals like vehicle speed and temperature), the vehicle can store the initial acquired value of the file block to be compressed, as well as the difference between the initial and non-initial acquired values, in the column containing the original acquired data. It can also combine run-length encoding to compress consecutive identical values. For non-numerical data (such as gear positions, fault codes, and other enumerated or Boolean types), the vehicle can map the data to short codes for storage.
[0063] The vehicle can periodically archive and recompress compressed file blocks within a data upload cycle before uploading them to the cloud; it can also periodically archive and recompress compressed file blocks within a data upload cycle in response to data upload commands issued by the cloud, and it can stop uploading data to the cloud in response to data stop upload commands issued by the cloud. For example, every 3 minutes, compressed file blocks are archived, recompressed, and uploaded to the cloud. This improves compression efficiency, reduces the data packets transmitted by the vehicle, and lowers the bandwidth for data upload. The cloud can receive the data sent by the vehicle and perform decompression, parsing, and display processing, reducing data storage space while ensuring efficient data retrieval.
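The archive-and-recompress step can be sketched as below. The application does not name the archive format or the second-stage algorithm, so the length-prefixed framing and the use of zlib (DEFLATE) from the Python standard library are assumptions made purely for illustration.

```python
import struct
import zlib
from typing import List


def archive_blocks(blocks: List[bytes]) -> bytes:
    """Concatenate compressed file blocks with 4-byte length prefixes,
    then recompress the whole archive once before upload."""
    body = b"".join(struct.pack(">I", len(b)) + b for b in blocks)
    return zlib.compress(body, 9)


def unarchive_blocks(archive: bytes) -> List[bytes]:
    """Cloud-side inverse: decompress, then split on the length prefixes."""
    body = zlib.decompress(archive)
    blocks, pos = [], 0
    while pos < len(body):
        (length,) = struct.unpack_from(">I", body, pos)
        pos += 4
        blocks.append(body[pos:pos + length])
        pos += length
    return blocks
```

Recompressing the archive as a whole lets the second stage exploit redundancy across the blocks of one upload cycle, which per-block compression cannot see.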
[0064] Compared with the data compression method shown in Figure 1, the data compression method provided in this embodiment of the application can also, if the original collected data is of numerical type, compress the column containing the original collected data in the file block to be compressed according to the difference storage algorithm to obtain the compressed file block. This reduces the amount of data stored and the storage space required, thereby improving the efficiency of data compression.
[0065] Meanwhile, if the data type of the original collected data is non-numerical, this embodiment can also compress the column containing the original collected data in the file block to be compressed according to the short code mapping algorithm to obtain the compressed file block, thereby performing short code storage for non-numerical data, reducing the amount of data stored and the space for data storage, and improving the efficiency of data compression.
[0066] In addition, this embodiment can also divide the original collected data into blocks according to the file generation cycle and the collection time corresponding to the original collected data to obtain multiple file blocks to be compressed; the other file blocks to be compressed in the multiple file blocks to be compressed are numbered according to the block division order to obtain the file sequence number corresponding to the other file blocks to be compressed, wherein the other file blocks to be compressed are the file blocks to be compressed other than the first file block to be compressed; according to the first file time corresponding to the first file block to be compressed and the file sequence number corresponding to the other file blocks to be compressed, the multiple file blocks to be compressed are stored in a columnar manner, wherein the first file time is the collection time of the first collected data in the first file block to be compressed, so that the file blocks are associated with time by inter-block sequential numbering, which reduces storage space while ensuring the efficiency of data query.
[0067] Meanwhile, this embodiment can also compress the column containing the acquisition time in the file block to be compressed based on the acquisition time corresponding to the original acquisition data and the difference between the acquisition time corresponding to the original acquisition data and the first acquisition time. The first acquisition time is the first acquisition time corresponding to the original acquisition data. Thus, the acquisition time is stored by the first acquisition time plus the relative time chain, which reduces the size of the data storage and improves the compression efficiency.
[0068] Please refer to Figure 5, which shows a block diagram of a data compression apparatus 200 according to an embodiment of this application. The data compression apparatus 200 is applied to the aforementioned electronic device, and the block diagram shown in Figure 5 is described in detail below. The data compression apparatus 200 may include: a data acquisition module 210, a data segmentation module 220, and a data column compression module 230, wherein: The data acquisition module 210 is used to acquire the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data.
[0069] The data segmentation module 220 is used to obtain a file block to be compressed based on the original collected data and the collection time corresponding to the original collected data. The file block to be compressed uses a columnar architecture to store data, and the file block to be compressed includes the column where the original collected data is located and the column where the collection time is located.
[0070] The data column compression module 230 is used to compress the column containing the original collected data in the file block to be compressed according to the compression algorithm corresponding to the data type of the original collected data, so as to obtain the compressed file block. The data type includes numeric and non-numeric types.
[0071] Furthermore, the data column compression module 230 may include a first data column compression unit and a second data column compression unit, wherein: The first data column compression unit is used to compress the column containing the original collected data in the file block to be compressed according to the difference storage algorithm if the data type of the original collected data is numerical, so as to obtain the compressed file block.
[0072] The second data column compression unit is used to compress the column containing the original collected data in the file block to be compressed according to the short code mapping algorithm if the data type of the original collected data is non-numerical, so as to obtain the compressed file block.
[0073] Furthermore, the first data column compression unit may include an initial acquisition value acquisition unit and a difference storage unit, wherein: The initial acquisition value acquisition unit is used to acquire the initial acquisition value corresponding to the original acquisition data in the file block to be compressed.
[0074] The difference storage unit is used to compress the column containing the original collected data in the file block to be compressed based on the initial acquisition value and the difference between the non-initial acquisition value and the initial acquisition value of the original acquired data in the file block to be compressed, thereby obtaining the compressed file block.
[0075] Furthermore, if the data type of the original collected data is numerical, the data column compression module 230 may include a third data column compression unit, wherein: The third data column compression unit is used to compress, based on the run-length encoding compression algorithm, the consecutive identical values in the column containing the original collected data if that column contains consecutive identical values, so as to obtain the compressed file block.
[0076] Furthermore, the second data column compression unit may include a short code mapping unit, wherein: The short code mapping unit is used to convert the data in the column containing the original collected data in the file block to be compressed into short codes according to a preset mapping relationship, thereby obtaining the compressed file block. The preset mapping relationship includes a mapping relationship between data and short codes.
[0077] Furthermore, the data segmentation module 220 may include: a file block acquisition unit, a file block numbering unit, and a file block columnar storage unit, wherein: The file block acquisition unit is used to divide the original acquired data into blocks according to the file generation cycle and the acquisition time corresponding to the original acquired data, and obtain multiple file blocks to be compressed.
[0078] The file block numbering unit is used to number other file blocks to be compressed in the plurality of file blocks to be compressed according to the block division order, so as to obtain the file sequence number corresponding to the other file blocks to be compressed, wherein the other file blocks to be compressed are the file blocks to be compressed other than the first file block to be compressed in the plurality of file blocks to be compressed.
[0079] The file block columnar storage unit is used to store multiple file blocks to be compressed in a columnar manner according to the first file time corresponding to the first file block to be compressed and the file sequence number corresponding to the other file blocks to be compressed, wherein the first file time is the acquisition time of the first data collected in the first file block to be compressed.
[0080] Furthermore, the data compression device 200 may further include: a compression unit for the column containing the acquisition time, wherein: The compression unit for the column containing the acquisition time is used to compress the column containing the acquisition time in the file block to be compressed based on the acquisition time corresponding to the original acquisition data and the difference between the acquisition time corresponding to the original acquisition data and the first acquisition time, wherein the first acquisition time is the first acquisition time corresponding to the original acquisition data.
[0081] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described device and module can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0082] In the several embodiments provided in this application, the coupling between modules can be electrical, mechanical, or other forms of coupling.
[0083] Furthermore, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module. The integrated modules described above can be implemented in hardware or as software functional modules.
[0084] Please refer to Figure 6, which shows a structural block diagram of an electronic device according to an embodiment of this application. The electronic device 100 can be a vehicle, in-vehicle terminal, server, computer, or other device with processing capabilities. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs. The one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110. The one or more application programs are configured to perform the methods described in the foregoing method embodiments.
[0085] The processor 110 may include one or more processing cores. The processor 110 connects to various parts of the electronic device 100 via various interfaces and lines, and performs various functions and processes data of the electronic device 100 by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content to be displayed; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 110 and may instead be implemented separately through a communication chip.
[0086] The memory 120 may include random access memory (RAM) or read-only memory (ROM). The memory 120 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, sound playback functionality, image playback functionality, etc.), and instructions for implementing the various method embodiments described above. The data storage area may store data created by the electronic device 100 during use (such as phonebook data, audio and video data, chat log data, etc.).
[0087] In this embodiment, a computer-readable medium stores program code, which can be called by a processor to execute the methods described in the above method embodiments.
[0088] Computer-readable storage media can be electronic storage devices such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, or ROM. Optionally, computer-readable storage media includes non-transitory computer-readable storage medium. The computer-readable storage medium has storage space for program code that performs any of the method steps described above. This program code can be read from or written to one or more computer program products. The program code can be compressed, for example, in a suitable form.
[0089] In this application, "multiple" refers to two or more.
[0090] In this application, unless otherwise expressly defined, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection between two components. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.
[0091] The terms “first,” “second,” “third,” “fourth,” etc., in this application (if present) are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0092] In this application, the term "and / or" is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, in this application, the character " / " generally indicates that the preceding and following related objects have an "or" relationship.
[0093] Unless otherwise specified, the steps in this application may be performed in the stated sequence or out of order. For example, if the method includes steps A and B, the method may perform step A and then step B, or it may perform step B and then step A. Likewise, if the method also includes step C, step C may be added to the method at any position. For example, the method may perform steps A, B, and C in that order, or steps A, C, and B, or steps C, A, and B, and so on.
[0094] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A data compression method, characterized in that the method includes: obtaining the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data; obtaining a file block to be compressed based on the original collected data and the collection time corresponding to the original collected data, wherein the file block to be compressed uses a columnar architecture to store data, and the file block to be compressed includes the column where the original collected data is located and the column where the collection time is located; and compressing the column containing the original collected data in the file block to be compressed according to the compression algorithm corresponding to the data type of the original collected data, to obtain the compressed file block, wherein the data type includes numeric and non-numeric types.
2. The method of claim 1, wherein the step of compressing the column containing the original collected data in the file block to be compressed, according to the compression algorithm corresponding to the data type of the original collected data, to obtain the compressed file block includes: if the original collected data is of numerical type, compressing the column containing the original collected data in the file block to be compressed according to the difference storage algorithm to obtain the compressed file block; or, if the data type of the original collected data is non-numerical, compressing the column containing the original collected data in the file block to be compressed according to the short code mapping algorithm to obtain the compressed file block.
3. The method of claim 2, wherein the step of compressing the column containing the original collected data in the file block to be compressed according to the difference storage algorithm to obtain the compressed file block includes: obtaining the initial acquisition value corresponding to the original acquisition data in the file block to be compressed; and compressing the column containing the original acquisition data in the file block to be compressed, based on the initial acquisition value and the difference between the non-initial acquisition value and the initial acquisition value of the original acquisition data in the file block to be compressed, to obtain the compressed file block.
4. The method of claim 3, wherein, if the original collected data is of numeric type, the step of compressing the column containing the original collected data in the file block to be compressed according to the compression algorithm corresponding to the data type of the original collected data to obtain the compressed file block includes: if the column containing the original collected data contains consecutive identical values, compressing the consecutive identical values in that column based on the run-length encoding compression algorithm to obtain the compressed file block.
5. The method of claim 2, wherein the step of compressing the column containing the original collected data in the file block to be compressed according to the short code mapping algorithm to obtain the compressed file block includes: converting, according to a preset mapping relationship, the data in the column containing the original collected data in the file block to be compressed into short codes to obtain the compressed file block, wherein the preset mapping relationship includes a mapping relationship between data and short codes.
6. The method according to any one of claims 1 to 5, characterized in that the step of obtaining the file block to be compressed based on the original collected data and the corresponding collection time includes: dividing the original collected data into blocks, based on the file generation cycle and the collection time corresponding to the original collected data, to obtain multiple file blocks to be compressed; numbering the other file blocks to be compressed among the multiple file blocks to be compressed according to the block division order, to obtain the file sequence numbers corresponding to the other file blocks to be compressed, wherein the other file blocks to be compressed are the file blocks to be compressed other than the first file block to be compressed among the multiple file blocks to be compressed; and storing the multiple file blocks to be compressed in a columnar manner based on the first file time corresponding to the first file block to be compressed and the file sequence numbers corresponding to the other file blocks to be compressed, wherein the first file time is the acquisition time of the first data collected in the first file block to be compressed.
7. The method according to any one of claims 1 to 5, characterized in that the method further includes: compressing the column containing the acquisition time in the file block to be compressed based on the acquisition time corresponding to the original acquisition data and the difference between the acquisition time corresponding to the original acquisition data and the first acquisition time, wherein the first acquisition time is the first acquisition time corresponding to the original acquisition data.
8. A data compression device, characterized in that the device includes: a data acquisition module, used to acquire the raw acquisition data to be compressed and the acquisition time corresponding to the raw acquisition data; a data segmentation module, used to obtain a file block to be compressed based on the original collected data and the collection time corresponding to the original collected data, wherein the file block to be compressed uses a columnar architecture to store data, and the file block to be compressed includes the column where the original collected data is located and the column where the collection time is located; and a data column compression module, used to compress the column containing the original collected data in the file block to be compressed according to the compression algorithm corresponding to the data type of the original collected data, so as to obtain the compressed file block, wherein the data type includes numeric and non-numeric types.
9. An electronic device, comprising: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method as described in any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains program code that can be invoked by a processor to execute the method as described in any one of claims 1-7.