Methods, apparatus, devices, and readable media for data compression and decompression.
A language model-based data compression and decompression method uses prompt words to generate vectorized representations of data. It addresses the insufficient processing efficiency and quality of existing technologies and achieves fast, efficient data compression and decompression.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BEIJING YOUZHUJU NETWORK TECH CO LTD
- Filing Date: 2024-02-01
- Publication Date: 2026-06-30
AI Technical Summary
Existing data compression and decompression technologies are inadequate in terms of efficiency and quality, making it difficult to achieve fast and high-quality data processing.
A language model-based approach uses prompt words to generate and parse vectorized representations of data, and compresses and decompresses the data with target models constructed from a language model.
It enables fast and high-quality data compression and decompression, improving data storage and transmission efficiency.
Description
Technical Field
[0001] The exemplary embodiments disclosed herein generally relate to the field of computers, and in particular to methods, apparatus, devices, and computer-readable storage media for data compression and decompression.
Background
[0002] With the rapid development of information technology, data compression technology (also known as information compression technology) has permeated all aspects of life and industry and has brought considerable convenience. Data compression technology improves storage efficiency and transmission speed by reducing data size. Data compression and decompression have important applications in many scenarios.
Summary of the Invention
[0003] In a first aspect of this disclosure, a method for data compression is provided. The method includes: generating a first input sequence for a first target model based on a first prompt word and target data to be compressed, the first target model being constructed based on a language model, and the first prompt word instructing the first target model to perform a data compression task; obtaining a first output sequence of the first target model by providing the first input sequence to the first target model; and extracting a compressed representation of the target data from the first output sequence, the compressed representation being a vectorized representation of the target data.
[0004] In a second aspect of this disclosure, a method for data decompression is provided. The method includes: obtaining a compressed representation of target data, the compressed representation being a vectorized representation of the target data; generating a second input sequence for a second target model based on a second prompt word and the compressed representation, the second target model being constructed based on a language model, and the second prompt word instructing the second target model to perform a data decompression task; obtaining a second output sequence of the second target model by providing the second input sequence to the second target model; and determining the decompressed target data from the second output sequence.
[0005] In a third aspect of this disclosure, an apparatus for data compression is provided. The apparatus includes: a first input generation module configured to generate a first input sequence for a first target model based on a first prompt word and target data to be compressed, the first target model being constructed based on a language model, the first prompt word instructing the first target model to perform a data compression task; a first output acquisition module configured to obtain a first output sequence of the first target model by providing the first input sequence to the first target model; and a compressed representation extraction module configured to extract a compressed representation of the target data from the first output sequence, the compressed representation being a vectorized representation of the target data.
[0006] In a fourth aspect of this disclosure, an apparatus for data decompression is provided. The apparatus includes: a compressed representation acquisition module configured to acquire a compressed representation of target data, the compressed representation being a vectorized representation of the target data; a second input generation module configured to generate a second input sequence for a second target model based on a second prompt word and the compressed representation, the second target model being constructed based on a language model, and the second prompt word instructing the second target model to perform a data decompression task; a second output acquisition module configured to obtain a second output sequence of the second target model by providing the second input sequence to the second target model; and a target data determination module configured to determine the decompressed target data from the second output sequence.
[0007] In a fifth aspect of this disclosure, an electronic device is provided. The device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When executed by the at least one processing unit, the instructions cause the device to perform either the method of the first aspect or the method of the second aspect.
[0008] In a sixth aspect of this disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program that can be executed by a processor to implement the method of the first aspect or the method of the second aspect.
[0009] It should be understood that the content described in the Summary of the Invention section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Brief Description of the Drawings
[0010] The above and other features, advantages, and aspects of various implementations of this disclosure will become more apparent in the following detailed description, taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:
[0011] Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
[0012] Figure 2 shows a schematic diagram of an example framework for data compression and decompression according to some embodiments of the present disclosure;
[0013] Figure 3 shows a schematic diagram of an example framework for data compression and decompression according to other embodiments of the present disclosure;
[0014] Figure 4 shows a flowchart of a process for data compression according to some embodiments of the present disclosure;
[0015] Figure 5 shows a flowchart of a process for data decompression according to some embodiments of the present disclosure;
[0016] Figure 6 shows a schematic structural block diagram of an apparatus for data compression according to some embodiments of the present disclosure;
[0017] Figure 7 shows a schematic structural block diagram of an apparatus for data decompression according to some embodiments of the present disclosure; and
[0018] Figure 8 shows a block diagram of an electronic device in which one or more embodiments of the present disclosure may be implemented.
Detailed Description
[0019] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. Although some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0020] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also be included below.
[0021] In this document, unless explicitly stated otherwise, performing a step in response to A does not mean that the step is performed immediately after A, but may include one or more intermediate steps.
[0022] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition, use, storage or deletion of the data) shall comply with the requirements of relevant laws, regulations and related provisions.
[0023] It is understood that, before the technical solutions disclosed in the various embodiments of this disclosure are used, relevant users should be informed, through appropriate means and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the information involved in this disclosure, and their authorization should be obtained. Here, relevant users may include any type of rights holder, such as individuals, enterprises, and groups.
[0024] For example, in response to receiving an active request from a user, a prompt message is sent to the relevant user to clearly inform the user that the requested operation will require obtaining and using the user's information. Based on the prompt message, the relevant user can then choose whether to provide information to the software or hardware (such as an electronic device, application, server, or storage medium) that performs the operations of the technical solution disclosed herein.
[0025] As an optional but non-limiting implementation, in response to a user's active request, the prompt message can be sent to the user in, for example, a pop-up window, in which the prompt message can be presented as text. The pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide information to the electronic device.
[0026] It is understood that the above notification and user authorization process is merely illustrative and does not limit the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure. The activation of digital assistant-related functions, the acquisition of data, the processing and storage of data, and so on in the embodiments of this disclosure all require prior authorization from the user and other rights holders associated with the user, and shall comply with relevant laws and regulations and with the agreements and rules between the rights holders.
[0027] As used herein, the term "model" refers to a model that learns the relationship between inputs and outputs from training data so that, after training, it can generate a corresponding output for a given input. Model generation can be based on machine learning techniques. Deep learning is a class of machine learning algorithms that processes inputs and provides corresponding outputs using multiple layers of processing units. A neural network model is an example of a deep learning-based model. Herein, "model" may also be referred to as a "machine learning model," "learning model," "machine learning network," or "learning network," and these terms are used interchangeably.
[0028] Figure 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. Environment 100 involves a compression device 110 and a decompression device 120.
[0029] In environment 100, during the data compression stage, target data 102 to be compressed is provided to compression device 110. Compression device 110 can use any appropriate data compression method to perform a data compression task on target data 102 to generate compressed data 112. During the data decompression stage, compressed data 112 is provided to decompression device 120. Decompression device 120 can likewise use any appropriate data decompression method to perform a data decompression task on compressed data 112 to generate decompressed data 122. Decompressed data 122 is the data obtained after target data 102 has undergone data compression and data decompression. The difference between decompressed data 122 and target data 102 should be less than a difference threshold: ideally, decompressed data 122 is identical to target data 102, and otherwise the error should be within an acceptable range.
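The acceptance condition in [0029] can be expressed as a small round-trip check. This is a minimal sketch, not part of the disclosure: `round_trip_within_threshold` and `byte_mismatch_ratio` are hypothetical helpers, the byte-level mismatch ratio is an assumed difference measure (the text does not define one), and Python's standard `zlib` module merely stands in for "any appropriate" compression and decompression method.

```python
import zlib
from typing import Callable


def byte_mismatch_ratio(a: bytes, b: bytes) -> float:
    """Assumed difference measure: fraction of differing positions, counting a length gap as mismatches."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def round_trip_within_threshold(
    target: bytes,
    compress: Callable[[bytes], bytes],
    decompress: Callable[[bytes], bytes],
    threshold: float,
) -> bool:
    """Compress, decompress, and test the difference against the threshold (hypothetical helper)."""
    compressed = compress(target)        # role of compression device 110
    restored = decompress(compressed)    # role of decompression device 120
    return byte_mismatch_ratio(target, restored) < threshold


if __name__ == "__main__":
    data = b"example target data " * 10
    print(round_trip_within_threshold(data, zlib.compress, zlib.decompress, 0.01))  # prints True
```

Because `zlib` is lossless, the check passes for any positive threshold; a lossy method would need a threshold chosen to match its acceptable error.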
[0030] In some applications, compression device 110 and decompression device 120 can be different devices. For example, compression device 110 can be used to generate compressed data 112 and send the compressed data 112 to decompression device 120. Decompression device 120 is used to receive compressed data 112 and perform data decompression on the compressed data 112 to generate decompressed data 122. In some applications, compression device 110 and decompression device 120 can also be the same device. In this case, the device can, for example, perform data decompression on compressed data 112 stored locally.
[0031] Both compression device 110 and decompression device 120 can be any type of computing-capable device, including terminal devices or server devices. Terminal devices can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. Server devices may include, for example, computing systems / servers, such as mainframes, edge computing nodes, computing devices in cloud environments, and so on.
[0032] It should be understood that the structure and function of environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure.
[0033] As mentioned above, data compression technology is widely used. Information compression technology is divided into two categories: lossy compression and lossless compression. Lossless compression (such as ZIP) can completely restore the original data and is often used where data integrity must be maintained (such as text file compression). Lossy compression (such as JPEG) discards some data to achieve a higher compression ratio and can be used for images, audio, and video. Because of the limitations of human perception, the parts removed in lossy compression generally do not significantly affect the perceived result. The core idea of information compression is to remove redundancy from data, which can greatly improve the utilization efficiency of storage and transmission resources. Traditionally, data compression is achieved with data compression algorithms such as Huffman coding, dictionary coding, transform coding, and quantization.
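To illustrate the lossless/lossy distinction, the sketch below uses Python's standard `zlib` module (DEFLATE, which combines LZ77 dictionary coding with Huffman coding) for the lossless case and a simple uniform quantizer for the lossy case. The quantizer, the sample values, and the step size are illustrative assumptions; this is not a JPEG implementation.

```python
import zlib


def quantize(samples: list[float], step: float) -> list[int]:
    """Lossy step: map each sample to an integer bin index, discarding detail below the step size."""
    return [round(s / step) for s in samples]


def dequantize(indices: list[int], step: float) -> list[float]:
    """Reconstruct approximate samples from bin indices."""
    return [i * step for i in indices]


# Lossless: redundant text shrinks and is restored exactly.
text = ("lossless compression restores the original exactly. " * 20).encode("utf-8")
packed = zlib.compress(text)
assert zlib.decompress(packed) == text
print(len(text), "->", len(packed), "bytes")

# Lossy: the reconstruction differs, but the error stays within half a quantization step.
signal = [0.12, 0.49, 0.51, 0.98, -0.33]
step = 0.25
restored = dequantize(quantize(signal, step), step)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
assert max_error <= step / 2 + 1e-12
print(restored, max_error)
```

A larger `step` discards more information per sample, trading reconstruction error for fewer distinct values to store, which is the basic trade-off behind lossy compression.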
[0034] With the rapid development of artificial intelligence, the application of language models (LMs) is becoming increasingly widespread. Language models are important models in natural language processing scenarios. Traditionally, language models are used to implement dialogue systems between humans and machines, and between machines themselves. There is a growing expectation to combine language models with data compression techniques, that is, to leverage language models to achieve data compression and decompression.
[0035] In view of this, embodiments of the present disclosure provide an improved method for data compression and decompression. A method for data compression includes: generating a first input sequence for a first target model based on a first prompt word and target data to be compressed, the first target model being constructed based on a language model, the first prompt word instructing the first target model to perform a data compression task; obtaining a first output sequence of the first target model by providing the first input sequence to the first target model; and extracting a compressed representation of the target data from the first output sequence, the compressed representation being a vectorized representation of the target data.
[0036] A method for data decompression includes: obtaining a compressed representation of target data, the compressed representation being a vectorized representation of the target data; generating a second input sequence for a second target model based on a second prompt word and the compressed representation, the second target model being constructed based on a language model, and the second prompt word instructing the second target model to perform a data decompression task; obtaining a second output sequence of the second target model by providing the second input sequence to the second target model; and determining the decompressed target data from the second output sequence.
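The two flows summarized in [0035] and [0036] reduce to a few lines of glue code around the two models. This is a minimal sketch under stated assumptions: `compress`, `decompress`, `first_model`, and `second_model` are hypothetical names, the first target model is assumed to return one output vector per input position, and the second target model is assumed to return generated text that ends with the decompressed data after a marker such as "plaintext as follows:" (the example output format given for Figure 2).

```python
from typing import Callable, Sequence

Vector = list[float]
EMB_TOKEN = "[emb_token]"  # predetermined symbol; its output position holds the compressed representation


def compress(
    target_tokens: Sequence[str],
    first_prompt_tokens: Sequence[str],
    first_model: Callable[[list[str]], list[Vector]],  # hypothetical: one output vector per input position
) -> Vector:
    """Compression flow: build the input, run the first target model, extract the representation."""
    # Step 1: target data first, then the first prompt word, then the predetermined symbol.
    first_input = [*target_tokens, *first_prompt_tokens, EMB_TOKEN]
    # Step 2: obtain the first output sequence.
    first_output = first_model(first_input)
    # Step 3: the predetermined symbol is last, so read the vector at the last position.
    return first_output[len(first_input) - 1]


def decompress(
    compressed: Vector,
    second_prompt_tokens: Sequence[str],
    second_model: Callable[[list[Vector], list[str]], str],  # hypothetical: returns generated text
    marker: str = "plaintext as follows:",
) -> str:
    """Decompression flow: feed the representation and prompt, then parse the generated text."""
    # The compressed representation precedes the second prompt word in the second input.
    second_output = second_model([compressed], list(second_prompt_tokens))
    # Take the text after the last marker, i.e. the end of the second output sequence.
    _, found, tail = second_output.rpartition(marker)
    return tail.strip() if found else second_output.strip()
```

Any callables with these shapes can be plugged in, so fake models are enough to exercise the control flow without loading a real language model.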
[0037] In this way, data compression and decompression can be achieved by inputting corresponding prompt words into a language model. This allows for convenient and fast data compression and decompression while improving the quality of the compression and decompression processes.
[0038] Some exemplary embodiments of this disclosure are described below with reference to the accompanying drawings.
[0039] In embodiments of this disclosure, the compression device generates a first input sequence for a first target model based on a first prompt word and target data to be compressed. The compression device obtains a first output sequence of the first target model by providing the first input sequence to the first target model. The compression device then extracts a compressed representation of the target data from the first output sequence; this compressed representation is a vectorized representation of the target data. The first target model can be a model built based on a language model, such as a language model (LM) or a large language model (LLM). The target data to be compressed can be data of any modality, such as text, image, video, or speech.
[0040] Reference is now made to Figure 2, which is described using an example in which the target data includes text-modal data. Figure 2 shows a schematic diagram of an example framework 200 for data compression and decompression according to some embodiments of the present disclosure. The example framework 200 includes a compression device 110 and a decompression device 120.
[0041] In some embodiments of this disclosure, both the compression device 110 and the decompression device 120 can utilize models to perform data compression and decompression. The model can be a model installed locally on the compression device 110 or the decompression device 120, or it can be a model installed on another device (e.g., on a remote device). The models used by the compression device 110 and the decompression device 120 can be of the same type but with different functions (e.g., both being CNNs), or they can be models of different types.
[0042] As shown in Figure 2, compression device 110 can perform data compression tasks using the first target model 210, and decompression device 120 can perform data decompression tasks using the second target model 220. Both the first target model 210 and the second target model 220 can be models constructed based on language models.
[0043] As shown in Figure 2, the compression device 110 can acquire the target data to be compressed (e.g., text 201) and a first prompt word. The first prompt word could be, for example, "Please represent the preceding text as a feature vector: [emb_token]". Based on the first prompt word and the target data to be compressed, the compression device 110 can generate a first input sequence 202 for the first target model 210. The first input sequence 202 is provided to the first target model 210.
[0044] The first prompt word instructs the first target model to perform the data compression task. For example, the first prompt word could be "Please represent the preceding data as a feature vector." In some embodiments, the first prompt word can also indicate the type of target data and / or the modality of the target data. For example, if the target data is text-modal data, the first prompt word could be "Please represent the preceding text as a feature vector," where "text" indicates that the target data is text-modal. As another example, if the target data is image-modal data, the first prompt word could be "Please represent the preceding visual information as a feature vector," where "visual information" indicates that the target data is image-modal. As yet another example, the first prompt word could be "Please represent the preceding news summary (or article title) as a feature vector," where "news summary (or article title)" indicates the type of target data. Thus, by explicitly indicating the data type to be compressed in the first prompt word, the first target model can be guided to better understand the modality and semantics of the data to be compressed, thereby generating a more accurate vectorized feature representation of the data to be compressed as a compressed representation.
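The prompt variants in [0044] can be generated from a small template. The sketch below assumes a hypothetical `build_first_prompt` helper and a modality-to-phrase table built from the examples in the text; the exact wording, the table keys, and the rule that a specific data type takes precedence over the modality are illustrative assumptions, since the disclosure does not prescribe a fixed template.

```python
from typing import Optional

# Phrases taken from the examples in the description; the mapping itself is an assumption.
MODALITY_PHRASES = {
    "text": "text",
    "image": "visual information",
}


def build_first_prompt(modality: Optional[str] = None, data_type: Optional[str] = None) -> str:
    """Build a first prompt word that optionally names the data type or modality (hypothetical helper)."""
    if data_type is not None:
        subject = data_type                              # e.g. "news summary" or "article title"
    elif modality is not None:
        subject = MODALITY_PHRASES.get(modality, "data")
    else:
        subject = "data"                                 # generic prompt when nothing is known
    return f"Please represent the preceding {subject} as a feature vector"
```

For example, `build_first_prompt("image")` yields "Please represent the preceding visual information as a feature vector", matching the image-modality example above.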
[0045] In some embodiments, because the attention mechanism of the language model focuses more on the earlier part of the first input sequence than on the later part, the target data to be compressed can be placed before the first prompt word in the first input sequence. For example, the first input sequence can take the form "target data to be compressed + first prompt word". Placing the data to be compressed before the prompt word thus allows the language model to better focus on the data to be compressed.
[0046] In some embodiments, the first input sequence may further include a predetermined symbol corresponding to the compressed representation of the target data. This predetermined symbol can be any suitable symbol; it may be pre-configured by the relevant user or determined by the compression device itself. For example, the predetermined symbol may be "[emb_token]". The predetermined symbol may be included in the first prompt word. In this case, the first prompt word may be, for example, "Please represent the preceding sentence as a feature vector: [emb_token]", and the first input sequence may be, for example, "text to be compressed + Please represent the preceding sentence as a feature vector: [emb_token]". The predetermined symbol may also be included directly in the first input sequence. In this case, the first input sequence may be, for example, "target data to be compressed + first prompt word + [emb_token]". In either case, the predetermined symbol is fixed at the end of the first input sequence. The compression device may, for example, extract the compressed representation of the target data from the position corresponding to the predetermined symbol in the first output sequence, such as by taking the feature at the [emb_token] position as the compressed representation. That is, the compression device extracts the compressed representation of the target data from the end of the first output sequence.
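The layout described in [0045] and [0046] (target data first, then the first prompt word, with the predetermined symbol fixed at the end) and the position-based extraction can be sketched as follows. This assumes token-level lists and a first output sequence given as one hidden vector per input position; `build_first_input` and `extract_at_symbol` are hypothetical names, and the validation rules are illustrative.

```python
EMB_TOKEN = "[emb_token]"  # predetermined symbol from the description


def build_first_input(data_tokens, prompt_tokens, symbol=EMB_TOKEN, symbol_in_prompt=False):
    """Data before the prompt (ordering from [0045]); the predetermined symbol must end the sequence."""
    sequence = [*data_tokens, *prompt_tokens]
    if not symbol_in_prompt:
        sequence.append(symbol)  # symbol placed directly in the input sequence
    if sequence[-1] != symbol:
        raise ValueError("the predetermined symbol must be the last element of the input sequence")
    return sequence


def extract_at_symbol(input_tokens, output_vectors, symbol=EMB_TOKEN):
    """Return the output vector aligned with the final, predetermined-symbol position."""
    if len(input_tokens) != len(output_vectors):
        raise ValueError("expected one output vector per input position")
    if input_tokens[-1] != symbol:
        raise ValueError("predetermined symbol not found at the end of the input")
    return output_vectors[-1]
```

Both variants in [0046] (symbol inside the prompt word, or appended to the input sequence) end up with the symbol in the last position, which is why extraction can simply read the final output vector.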
[0047] In some embodiments, the first target model 210 can tokenize and vectorize the first input sequence 202 to obtain a vectorized result. For example, as shown in Figure 2, the first target model 210 can vectorize the first input sequence 202 to obtain a vectorized result 203 of the text (including text 201 and the first prompt word) and a vectorized result 204 of the predetermined symbol. The first target model 210 can, for example, generate a first output sequence based on the vectorized results. The first output sequence can be, for example, "feature vector: emb", where emb is the compressed representation of the target data (e.g., text 201).
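For the predetermined symbol to map to a single position (and hence a single entry in vectorized result 204), tokenization must keep it intact. The sketch below uses a hypothetical word-and-punctuation tokenizer and a toy growing vocabulary; real language models use subword tokenizers in which such symbols are typically registered as special tokens, which this sketch only imitates.

```python
import re

SPECIAL_TOKENS = ["[emb_token]"]
# Try the special tokens first, otherwise match a word or a single punctuation mark.
TOKEN_PATTERN = re.compile(
    "|".join(re.escape(t) for t in SPECIAL_TOKENS) + r"|\w+|[^\w\s]"
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens while keeping each special token as one unit."""
    return TOKEN_PATTERN.findall(text)


def to_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to integer ids, adding unseen tokens to the toy vocabulary."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]
```

With this pattern, "... as a feature vector: [emb_token]" ends in the tokens `":"` and `"[emb_token]"`, so the symbol occupies exactly one position whose output can later be read as the compressed representation.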
[0048] Compression device 110 can obtain a first output sequence from the first target model 210 and extract a compressed representation 212 of text 201 from the first output sequence. For example, compression device 110 can extract the compressed representation 212 from the position (i.e., the end) corresponding to the predetermined symbol "[emb_token]" in the first output sequence. The compressed representation 212 is a vectorized representation of the target data 102.
[0049] The vectorized representation output by the first target model typically has a predetermined dimension. In some embodiments, in a data compression task, the first target model can be configured to output a single vectorized representation by default as the compressed representation of the target data. In some embodiments, to enable flexible compression of large target data (e.g., videos, long texts, or high-definition images), the first prompt word can also indicate the number of vectorized representations of the predetermined dimension to be output. This number can be any positive integer, so the compressed representation can include one or more vectorized representations of the predetermined dimension. For example, if a video is to be compressed into K vectorized representations (i.e., the number of vectorized representations of the predetermined dimension is K), the first prompt word can be constructed as "Please represent the preceding video visual information as feature vectors: [emb_token1], [emb_token2], ..., [emb_tokenK]". Accordingly, K compressed representations can be extracted from the K positions corresponding to the K [emb_token] symbols in the first output sequence. The larger K is, the smaller the compression ratio; when K is 1, the compression ratio is at its maximum. The number of compressed representations in the final compression result can therefore be specified, making the compression ratio of the data compression process flexible and configurable.
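The K-token prompt of [0049] and its effect on the compression ratio can be sketched as follows. Assumptions: the compression ratio is taken as original size divided by compressed size, each vectorized representation of dimension `dim` is stored as 4-byte float32 values, and `build_k_token_prompt` and `compression_ratio` are hypothetical names.

```python
def build_k_token_prompt(subject: str, k: int) -> str:
    """First prompt word requesting k vectorized representations of the predetermined dimension."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    slots = ", ".join(f"[emb_token{i}]" for i in range(1, k + 1))
    return f"Please represent the preceding {subject} as feature vectors: {slots}"


def compression_ratio(original_bytes: int, k: int, dim: int, bytes_per_value: int = 4) -> float:
    """Original size divided by the size of k vectors of dimension dim (float32 assumed)."""
    return original_bytes / (k * dim * bytes_per_value)
```

For instance, 1,000,000 bytes of original data compressed into a single 4096-dimensional float32 vector (16,384 bytes) gives a ratio of roughly 61, and requesting K = 8 vectors divides that ratio by 8, consistent with the statement that K = 1 yields the maximum ratio.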
[0050] The compressed representation 212 of the target data can be stored and transmitted. When decompression is required, the compressed representation 212 can be provided to the decompression device 120. The decompression device 120 can also obtain a second prompt word, for example, "Please restore the preceding feature vector to plaintext". Based on the second prompt word and the compressed representation 212, the decompression device 120 can generate a second input sequence 205 for the second target model 220. The second input sequence 205 is provided to the second target model 220.
[0051] In embodiments of this disclosure, the decompression device acquires a compressed representation of the target data, which is a vectorized representation of the target data. For example, the decompression device can acquire the compressed representation of the target data provided by a compression device. Based on a second prompt word and the compressed representation, the decompression device generates a second input sequence for a second target model. By providing the second input sequence to the second target model, the decompression device obtains a second output sequence of the second target model and determines the decompressed target data from the second output sequence. The second target model can also be a model built based on a language model, such as a language model (LM) or a large language model (LLM). It should be noted that although the first target model and the second target model can be of the same type, their model parameters may be the same or different, since the two models perform different tasks. The decompressed target data can likewise be data of any modality, such as text, image, video, or speech, and all data in the target data share the same modality.
[0052] Similar to the first prompt word, the second prompt word instructs the second target model to perform the data decompression task. In some embodiments, the second prompt word may also indicate the type and / or the modality of the target data to be decompressed. For example, the second prompt word may be "Please restore the preceding feature vectors to text," "Please restore the preceding feature vectors to images," or "Please restore the preceding feature vectors to videos," where "text," "images," and "videos" indicate that the target data is of text, image, and video modality, respectively. As another example, the second prompt word may be "Please restore the preceding feature vectors to news summaries (or article titles)," where "news summaries (or article titles)" indicates the type of target data to be decompressed. By explicitly indicating the data type to be decompressed in the second prompt word, the second target model can be guided to better understand the modality and semantics of the target data, enabling more accurate decompression of the target data from the compressed representation.
[0053] In some embodiments, because the attention mechanism of the language model focuses more on the earlier part of the second input sequence than on the later part, the compressed representation can be placed before the second prompt word in the second input sequence. For example, the second input sequence can take the form "compressed representation + second prompt word". Placing the compressed representation to be decompressed before the prompt word thus allows the model to better focus on it.
[0054] In some embodiments, the second target model 220 can vectorize the second input sequence 205 to obtain a vectorized result. Because the compressed representation 212 is itself a vectorized representation of the target data 102, the second target model 220 need not vectorize the compressed representation 212. As shown in Figure 2, the second target model 220 can vectorize the second prompt word in the second input sequence 205 to obtain a vectorized result 206. The second target model 220 can, for example, generate a second output sequence based on the compressed representation 212 and the vectorized result 206. The second output sequence can be, for example, "plaintext as follows: XXXX", where "XXXX" is the decompressed text 222.
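Because the compressed representation 212 is already a vector, only the second prompt word needs to be embedded, and the two parts are concatenated with the compressed representation first. The sketch below assumes a hypothetical toy embedding (`embed_token`, seeded per token so it is deterministic), an assumed hidden size `HIDDEN_DIM`, and a requirement that the compressed vectors share that hidden size; in a real model the prompt embeddings would come from the model's own input embedding layer.

```python
import random

HIDDEN_DIM = 4  # assumed hidden size of the second target model


def embed_token(token: str) -> list[float]:
    """Toy deterministic embedding: the same token always maps to the same vector."""
    rng = random.Random(token)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]


def build_second_input(compressed: list[list[float]], prompt_tokens: list[str]) -> list[list[float]]:
    """Second input: compressed vectors first (used as-is), then the embedded second prompt word."""
    for vec in compressed:
        if len(vec) != HIDDEN_DIM:
            raise ValueError("compressed representation must match the hidden dimension")
    prompt_vectors = [embed_token(tok) for tok in prompt_tokens]
    return [*compressed, *prompt_vectors]
```

Skipping re-embedding of the compressed vectors is what allows the compressed representation to be consumed directly at the input layer rather than being converted back into tokens.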
[0055] The decompression device 120 can obtain the second output sequence from the second target model 220 and extract the decompressed text 222 from the second output sequence. For example, the decompression device 120 can also extract the decompressed text 222 from the end of the second output sequence.
[0056] In some embodiments, the first target model 210 and the second target model 220 may be jointly trained. The first target model 210 and the second target model 220 may be deployed in the same electronic device for training, and after training, they may be deployed in the compression device 110 and the decompression device 120, respectively, to perform data compression and data decompression tasks.
[0057] The first target model 210 and the second target model 220 can be trained using supervised fine-tuning (SFT) to continuously reduce or minimize the error between the training text input to the model and the decompressed text output. During the training phase, the difference between the training text to be compressed and the decompressed training text can be obtained, and the first target model 210 and the second target model 220 can be trained by reducing this difference. In response to this difference being less than a predetermined threshold, the similarity between the training text to be compressed and the decompressed text is determined to be high, and the training of the first target model 210 and the second target model 220 is determined to be complete. After training, the model parameters of the first target model 210 and the second target model 220 are fixed, and the difference between the training text to be compressed and the decompressed text should be sufficiently small.
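The threshold-based stopping rule in [0057] can be illustrated with a deliberately tiny stand-in for the two models: a scalar "compressor" weight and a scalar "decompressor" weight trained jointly by gradient descent on mean squared reconstruction error. This is not supervised fine-tuning of a language model; it only shows the joint update and the "difference below a predetermined threshold" check. All names and hyperparameters are hypothetical.

```python
from typing import List, Tuple


def train_until_threshold(
    data: List[float],
    threshold: float = 1e-6,
    lr: float = 0.05,
    max_steps: int = 10_000,
) -> Tuple[float, float, float, int]:
    """Jointly train a toy compressor/decompressor pair until the difference is small.

    Toy stand-in: x_hat = w_dec * (w_enc * x).
    Returns (w_enc, w_dec, last_loss, steps_taken).
    """
    w_enc, w_dec = 0.5, 0.5
    n = len(data)
    loss = float("inf")
    for step in range(max_steps):
        # Mean squared difference between training data and its reconstruction.
        loss = sum((w_dec * w_enc * x - x) ** 2 for x in data) / n
        if loss < threshold:
            # Difference below the predetermined threshold: training is complete.
            return w_enc, w_dec, loss, step
        # Derivative of the loss w.r.t. the product w_enc * w_dec.
        g = sum(2 * (w_dec * w_enc * x - x) * x for x in data) / n
        # Chain rule gives each weight's gradient; both are updated together.
        grad_enc, grad_dec = g * w_dec, g * w_enc
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, loss, max_steps
```

After training, the two weights are frozen, mirroring the statement that the model parameters are fixed once the difference is sufficiently small.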
[0058] The above describes an example in which the target data includes text-modality data; the following describes an example in which the target data includes non-text-modality data (e.g., image, video, or audio data). In some embodiments, when the target data includes non-text-modality data, the compression device may use a feature encoder corresponding to the modality of the target data to encode at least one feature representation from the target data. The compression device may then generate the first input sequence for the first target model based on the at least one feature representation and the first prompt word.
[0059] Accordingly, when the target data includes non-textual modal data, the decompression device can extract the decompressed feature representation of the target data from the second output sequence. The decompression device can then use a feature decoder corresponding to the modality of the target data to decode the target data from the decompressed feature representation.
[0060] An example in which the target data includes image-modality data is described below with reference to Figure 3. Figure 3 shows a schematic diagram of an example framework 300 for data compression and decompression according to other embodiments of the present disclosure. The example framework 300 also includes the compression device 110 and the decompression device 120, which can likewise perform data compression and data decompression tasks using the first target model 210 and the second target model 220, respectively.
[0061] As shown in Figure 3, the compression device 110 can acquire the target data to be compressed (e.g., image 301) and a first prompt word, such as "Please represent the preceding visual information as a feature vector." Since the first target model 210 is a language model, it cannot directly process non-text-modality target data 301 (e.g., images, videos, or audio). The compression device 110 can therefore provide the target data 301 to a trained feature encoder 310 and use the feature encoder 310 to obtain a feature representation 311. For example, if the modality of the target data 301 is the image or video modality, the corresponding feature encoder 310 can be an image encoder; for video-modality target data 301, the image encoder can extract a feature representation for each video frame. The feature encoder 310 can be, for example, a Vision Transformer (ViT). The feature representation 311 includes at least one image feature representation (e.g., the 196 image feature representations output by the ViT). If the modality of the target data is the audio modality, the corresponding feature encoder can be an audio encoder.
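The figure of 196 image feature representations is consistent with a common ViT configuration that splits a 224×224 image into 16×16 patches, giving (224/16)² = 14² = 196 patch tokens (excluding the class token). The input size and patch size are assumptions, since the disclosure does not state them; the sketch also counts per-frame features for video, since each frame is encoded separately.

```python
def vit_num_patches(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of patch tokens a ViT produces for a square image (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    patches_per_side = image_size // patch_size  # 224 // 16 == 14
    return patches_per_side * patches_per_side   # 14 * 14 == 196


def video_num_features(num_frames: int, image_size: int = 224, patch_size: int = 16) -> int:
    """Each video frame is encoded separately by the image encoder ([0061])."""
    return num_frames * vit_num_patches(image_size, patch_size)
```

Under these assumptions, an 8-frame clip would yield 8 × 196 = 1,568 feature representations before compression.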
[0062] The compression device 110 can generate a first input sequence 312 for the first target model 210 based on the first prompt word and the feature representation 311, and provide the first input sequence 312 to the first target model 210. As with text, the first target model 210 can vectorize the first input sequence 312 to obtain a vectorized result, in which the vectorized result 313 of the predetermined symbol is located at the end. The first target model 210 can, for example, generate a first output sequence based on the vectorized result. The first output sequence can be, for example, "feature vector: emb", where emb is the compressed representation of the target data 301.
[0063] The compression device 110 can obtain the first output sequence from the first target model 210 and extract a compressed representation 314 of the non-text-modality target data 301 from it. For example, the compression device 110 can extract the compressed representation 314 from the position of the first output sequence (i.e., the end) that corresponds to the predetermined symbol "[emb_token]" in the first input sequence 312. The compressed representation 314 is a vectorized representation of the feature representation 311. The compressed representation 314 is provided to the decompression device 120, which can also obtain a second prompt word, such as "Please restore the preceding feature vectors to visual information". The decompression device 120 can generate a second input sequence 315 for the second target model 220 based on the second prompt word and the compressed representation 314, and provide the second input sequence 315 to the second target model 220.
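Extracting the compressed representation at the position of the predetermined symbol can be sketched as below, assuming the model returns one output vector per input position (as the final hidden states of a decoder-only language model do). The function name and toy inputs are hypothetical; `[emb_token]` is the symbol named in [0063].

```python
from typing import List, Sequence

EMB_TOKEN = "[emb_token]"  # predetermined symbol named in [0063]


def extract_at_symbol(
    input_tokens: Sequence[str],
    hidden_states: Sequence[List[float]],
    symbol: str = EMB_TOKEN,
) -> List[List[float]]:
    """Return the output vectors aligned with every occurrence of `symbol`."""
    if len(input_tokens) != len(hidden_states):
        raise ValueError("expected one output vector per input position")
    positions = [i for i, tok in enumerate(input_tokens) if tok == symbol]
    if not positions:
        raise ValueError(f"{symbol} not found in the input sequence")
    # The vectors at these positions form the compressed representation.
    return [list(hidden_states[i]) for i in positions]
```

Because the symbol is appended after the prompt word, its position is the end of the sequence, matching the example in [0063].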
[0064] In some embodiments, the second target model 220 can vectorize the second input sequence 315 to obtain a vectorized result. Since the compressed representation 314 is already a vectorized representation, the second target model 220 need not vectorize it again. As shown in Figure 3, the second target model 220 can vectorize the second prompt word in the second input sequence 315 to obtain a vectorized result, and can, for example, generate a second output sequence based on the compressed representation 314 and that vectorized result. The second output sequence can be, for example, "Visual information is as follows: XXXX", where "XXXX" is the decompressed feature representation 316, which can include at least one image feature representation.
[0065] The decompression device 120 can obtain the second output sequence from the second target model 220 and extract the decompressed feature representation 316 from it, for example from the end of the second output sequence. The decompression device 120 can provide the decompressed feature representation 316 to the trained feature decoder 320 and use the feature decoder 320 to decode the target data 321 (e.g., a decoded image, video, or audio) from the decompressed feature representation 316. The choice of feature decoder 320 depends on the modality of the target data to be decompressed. For example, if the modality of the target data 301 is the image or video modality, the corresponding feature decoder 320 can be an image decoder; if it is the audio modality, the corresponding feature decoder can be an audio decoder.
[0066] During the training phase, the first target model 210 and the second target model 220 can be trained by continuously reducing or minimizing the difference between the feature representation before compression and the feature representation after decompression. When this difference is sufficiently small, the similarity between the feature representation 311 to be compressed and the decompressed feature representation 316 is high, and the training of the first target model 210 and the second target model 220 can be determined to be complete. In some embodiments, since the feature encoder 310 and the feature decoder 320 are already trained, the first target model 210 and the second target model 220 can also be trained by reducing or minimizing the difference between the training data to be compressed and the decoded target data, while the parameters of the feature encoder 310 and the feature decoder 320 are kept fixed. When this difference is small or minimized, the similarity between the data to be compressed and the decoded data is high, and training can be determined to be complete. Thus, at the end of training, both the difference between the feature representation to be compressed and the decompressed feature representation, and the difference between the input data and the output data, should be sufficiently small.
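The two training signals in [0066] can be written as two mean squared differences, sketched below with plain Python callables standing in for the models. The frozen encoder and decoder are simply not updated by whatever optimizer consumes these losses. Mean squared error is an assumed choice of difference measure, since the disclosure does not name one, and all function names are hypothetical.

```python
from typing import Callable, Dict, List

Vec = List[float]


def mse(a: Vec, b: Vec) -> float:
    """Mean squared difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_losses(
    data: Vec,
    encoder: Callable[[Vec], Vec],     # frozen, already-trained feature encoder
    compress: Callable[[Vec], Vec],    # first target model (trainable)
    decompress: Callable[[Vec], Vec],  # second target model (trainable)
    decoder: Callable[[Vec], Vec],     # frozen, already-trained feature decoder
) -> Dict[str, float]:
    """Compute the feature-level and data-level differences described in [0066]."""
    features = encoder(data)
    restored_features = decompress(compress(features))
    restored_data = decoder(restored_features)
    return {
        # Feature representation before compression vs. after decompression.
        "feature_loss": mse(features, restored_features),
        # Input data vs. data decoded by the frozen feature decoder.
        "data_loss": mse(data, restored_data),
    }
```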
[0067] In summary, in the embodiments of this disclosure, data compression and decompression can be achieved by inputting corresponding prompt words into a language model. This allows for convenient and quick data compression and decompression while improving the quality of data compression and decompression.
[0068] Figure 4 shows a flowchart of a data compression process 400 according to some embodiments of the present disclosure. Process 400 can be implemented at the compression device 110 and is described below with reference to Figure 1.
[0069] In box 410, compression device 110 generates a first input sequence for a first target model based on a first prompt word and the target data to be compressed. The first target model is constructed based on a language model, and the first prompt word instructs the first target model to perform a data compression task.
[0070] In box 420, compression device 110 obtains a first output sequence of the first target model by providing the first input sequence to the first target model.
[0071] In box 430, compression device 110 extracts a compressed representation of the target data from the first output sequence, the compressed representation being a vectorized representation of the target data.
[0072] In some embodiments, the first prompt word also indicates at least one of the following: the type of target data, the modality of the target data.
[0073] In some embodiments, in the first input sequence, the target data is located before the first prompt word.
[0074] In some embodiments, the first input sequence further includes a predetermined symbol corresponding to the compressed representation of the target data, and wherein extracting the compressed representation of the target data from the first output sequence includes: extracting the compressed representation of the target data from the position in the first output sequence corresponding to the predetermined symbol.
[0075] In some embodiments, the target data includes one of the following: text, image, video, and voice.
[0076] In some embodiments, the target data includes non-textual modal data, and generating a first input sequence for a first target model includes: encoding at least one feature representation from the target data using a feature encoder corresponding to the modality of the target data; and generating a first input sequence for the first target model based on the at least one feature representation and a first prompt word.
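A minimal sketch of this non-text branch: choose the feature encoder by modality, encode the target data, and place the resulting feature representations before the first prompt word. The encoder registry and function names are hypothetical, and the sketch assumes the encoder's output vectors already match the language model's hidden size (in practice a projection layer is often needed; the disclosure does not say).

```python
from typing import Callable, Dict, List

Vector = List[float]
FeatureEncoder = Callable[[object], List[Vector]]


def select_encoder(modality: str, encoders: Dict[str, FeatureEncoder]) -> FeatureEncoder:
    """Pick the feature encoder matching the target data's modality."""
    # Per [0061], video frames are encoded with the image encoder.
    key = "image" if modality == "video" else modality
    if key not in encoders:
        raise ValueError(f"no feature encoder for modality {modality!r}")
    return encoders[key]


def build_first_input(
    target: object,
    modality: str,
    prompt_embeddings: List[Vector],
    encoders: Dict[str, FeatureEncoder],
) -> List[Vector]:
    """Encode non-text target data, then append the first prompt word's embeddings."""
    features = select_encoder(modality, encoders)(target)
    # Feature representations first, then the prompt word (cf. [0073]).
    return list(features) + list(prompt_embeddings)
```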
[0077] In some embodiments, the compressed representation includes at least one vectorized representation of a predetermined dimension, and the first prompt word indicates the number of such vectorized representations to be output.
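To illustrate [0077], the sketch below builds a first input sequence whose prompt states how many vectors to output and appends that many predetermined symbols, then computes the storage of the resulting compressed representation. The prompt wording, the choice of 2 bytes per value (fp16), and the helper names are assumptions.

```python
from typing import List

EMB_TOKEN = "[emb_token]"  # predetermined symbol, as in [0063]


def build_first_input_with_count(text_tokens: List[str], k: int) -> List[str]:
    """Target data, then a prompt stating how many vectors to output, then k symbols."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Hypothetical prompt wording; the disclosure does not fix the phrasing.
    prompt = f"Please represent the preceding text as {k} feature vectors".split()
    return list(text_tokens) + prompt + [EMB_TOKEN] * k


def compressed_size_bytes(k: int, dim: int, bytes_per_value: int = 2) -> int:
    """Storage for k vectors of the predetermined dimension (2 bytes/value for fp16)."""
    return k * dim * bytes_per_value
```

For instance, four vectors of dimension 4096 stored in fp16 occupy 4 × 4096 × 2 = 32,768 bytes. That size depends only on k and the dimension, not on the length of the target data, so for a fixed k the achieved compression ratio grows with the input length.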
[0078] Figure 5 shows a flowchart of a data decompression process 500 according to some embodiments of the present disclosure. Process 500 can be implemented at the decompression device 120 and is described below with reference to Figure 1.
[0079] In box 510, the decompression device 120 obtains the compressed representation of the target data, which is a vectorized representation of the target data.
[0080] In box 520, decompression device 120 generates a second input sequence for a second target model based on a second prompt word and the compressed representation. The second target model is constructed based on a language model, and the second prompt word instructs the second target model to perform a data decompression task.
[0081] In box 530, the decompression device 120 obtains the second output sequence of the second target model by providing the second input sequence to the second target model.
[0082] In box 540, decompression device 120 determines the decompressed target data from the second output sequence.
[0083] In some embodiments, the second prompt word also indicates at least one of the following: the type of the target data to be decompressed, and the modality of the target data to be decompressed.
[0084] In some embodiments, in the second input sequence, the compressed representation is positioned before the second prompt word.
[0085] In some embodiments, the target data includes one of the following: text, image, video, and voice.
[0086] In some embodiments, the target data includes non-textual modal data, and determining the decompressed target data from the second output sequence includes: extracting the decompressed feature representation of the target data from the second output sequence; and decoding the target data from the decompressed feature representation using a feature decoder corresponding to the modality of the target data.
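The non-text decompression path can be sketched as: take the trailing feature vectors of the second output sequence as the decompressed feature representation, then hand them to the decoder for the target modality (an image decoder for both image and video, per [0065]). The decoder registry, the `num_features` argument, and the assumption that the features occupy the last positions of the output are illustrative.

```python
from typing import Callable, Dict, List

Vector = List[float]
FeatureDecoder = Callable[[List[Vector]], object]


def decode_non_text(
    second_output: List[Vector],
    num_features: int,
    modality: str,
    decoders: Dict[str, FeatureDecoder],
) -> object:
    """Extract the decompressed feature representation and decode it."""
    if not 0 < num_features <= len(second_output):
        raise ValueError("num_features must be between 1 and the output length")
    # Decompressed feature representation: the last `num_features` output vectors.
    features = second_output[-num_features:]
    # Image and video both use the image decoder ([0065]).
    key = "image" if modality == "video" else modality
    if key not in decoders:
        raise ValueError(f"no feature decoder for modality {modality!r}")
    return decoders[key](features)
```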
[0087] Embodiments of this disclosure also provide corresponding apparatus for implementing the above methods or processes.
[0088] Figure 6 shows a schematic structural block diagram of an apparatus 600 for data compression according to certain embodiments of the present disclosure. The apparatus 600 may be implemented as or included in the compression device 110. The modules/components in the apparatus 600 may be implemented by hardware, software, firmware, or any combination thereof.
[0089] As shown in Figure 6, the apparatus 600 includes a first input generation module 610 configured to generate a first input sequence for a first target model based on a first prompt word and target data to be compressed, where the first target model is constructed based on a language model and the first prompt word instructs the first target model to perform a data compression task. The apparatus 600 also includes a first output acquisition module 620 configured to obtain a first output sequence of the first target model by providing the first input sequence to the first target model. The apparatus 600 further includes a compressed representation extraction module 630 configured to extract a compressed representation of the target data from the first output sequence, the compressed representation being a vectorized representation of the target data.
[0090] In some embodiments, the first prompt word also indicates at least one of the following: the type of target data, the modality of the target data.
[0091] In some embodiments, in the first input sequence, the target data is located before the first prompt word.
[0092] In some embodiments, the first input sequence further includes a predetermined symbol corresponding to the compressed representation of the target data, and the compressed representation extraction module 630 may be specifically configured to extract the compressed representation of the target data from the position corresponding to the predetermined symbol in the first output sequence.
[0093] In some embodiments, the target data includes one of the following: text, image, video, and voice.
[0094] In some embodiments, the target data includes non-textual modal data, and the first input generation module 610 includes: an encoding module configured to encode at least one feature representation from the target data using a feature encoder corresponding to the modality of the target data; and a generation module configured to generate a first input sequence for a first target model based on the at least one feature representation and a first prompt word.
[0095] In some embodiments, the compressed representation includes at least one vectorized representation of a predetermined dimension, and the first prompt word indicates the number of such vectorized representations to be output.
[0096] Figure 7 shows a schematic structural block diagram of an apparatus 700 for data decompression according to certain embodiments of the present disclosure. The apparatus 700 may be implemented as or included in the decompression device 120. The modules/components in the apparatus 700 may be implemented by hardware, software, firmware, or any combination thereof.
[0097] As shown in Figure 7, the apparatus 700 includes a compressed representation acquisition module 710 configured to acquire a compressed representation of the target data, the compressed representation being a vectorized representation of the target data. The apparatus 700 also includes a second input generation module 720 configured to generate a second input sequence for a second target model based on a second prompt word and the compressed representation, where the second target model is constructed based on a language model and the second prompt word instructs the second target model to perform a data decompression task. The apparatus 700 also includes a second output acquisition module 730 configured to obtain a second output sequence of the second target model by providing the second input sequence to the second target model. The apparatus 700 further includes a target data determination module 740 configured to determine the decompressed target data from the second output sequence.
[0098] In some embodiments, the second prompt word also indicates at least one of the following: the type of the target data to be decompressed, and the modality of the target data to be decompressed.
[0099] In some embodiments, in the second input sequence, the compressed representation is positioned before the second prompt word.
[0100] In some embodiments, the target data includes one of the following: text, image, video, and voice.
[0101] In some embodiments, the target data includes non-textual modal data, and the target data determination module 740 includes: an extraction module configured to extract a decompressed feature representation of the target data from a second output sequence; and a decoding module configured to decode the target data from the decompressed feature representation using a feature decoder corresponding to the modality of the target data.
[0102] The units and / or modules included in apparatus 600 and apparatus 700 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some embodiments, one or more units and / or modules can be implemented using software and / or firmware, such as machine-executable instructions stored on a storage medium. In addition to or as an alternative to machine-executable instructions, some or all of the units and / or modules in these apparatuses can be implemented at least partially by one or more hardware logic components. By way of example and not limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.
[0103] Figure 8 shows a block diagram of an electronic device 800 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 800 shown in Figure 8 is merely exemplary and should not be construed as limiting the functionality or scope of the embodiments described herein. The electronic device 800 shown in Figure 8 can be used to implement the compression device 110 and/or the decompression device 120 of Figure 1.
[0104] As shown in Figure 8, electronic device 800 takes the form of a general-purpose electronic device. Components of electronic device 800 may include, but are not limited to, one or more processors or processing units 810, memory 820, storage device 830, one or more communication units 840, one or more input devices 850, and one or more output devices 860. Processing unit 810 may be a physical or virtual processor and is capable of performing various processes according to programs stored in memory 820. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of electronic device 800.
[0105] Electronic device 800 typically includes multiple computer storage media. Such media can be any available media accessible to electronic device 800, including but not limited to volatile and non-volatile media, removable and non-removable media. Memory 820 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 830 can be removable or non-removable media and can include machine-readable media, such as flash drives, disks, or any other media capable of storing information and / or data and accessible within electronic device 800.
[0106] Electronic device 800 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in Figure 8, a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk can be provided. In these cases, each drive can be connected to a bus (not shown) via one or more data media interfaces. Memory 820 may include a computer program product 825 having one or more program modules configured to perform various methods or actions of various embodiments of this disclosure.
[0107] The communication unit 840 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 800 can be implemented using a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, the electronic device 800 can operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
[0108] Input device 850 can be one or more input devices, such as a mouse, keyboard, or trackball. Output device 860 can be one or more output devices, such as a monitor, speaker, or printer. Electronic device 800 can also communicate as needed, via communication unit 840, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable user interaction with electronic device 800, or with any device (e.g., a network card or modem) that enables electronic device 800 to communicate with one or more other electronic devices. Such communication can be performed via an input/output (I/O) interface (not shown).
[0109] According to an exemplary implementation of this disclosure, a computer-readable storage medium is provided that stores computer-executable instructions thereon, wherein the computer-executable instructions are executed by a processor to implement the methods described above. According to an exemplary implementation of this disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, which are executed by a processor to implement the methods described above.
[0110] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatuses, devices, and computer program products implemented according to this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0111] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0112] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions that execute on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more blocks of a flowchart and / or block diagram.
[0113] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0114] Various implementations of this disclosure have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles, practical applications, or improvements to technology in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.
Claims
1. A method for data compression, comprising: generating, based on a first prompt word and target data to be compressed, a first input sequence for a first target model, the first target model being constructed based on a language model, and the first prompt word instructing the first target model to perform a data compression task; obtaining a first output sequence of the first target model by providing the first input sequence to the first target model; and extracting a compressed representation of the target data from the first output sequence, wherein the compressed representation is a vectorized representation of the target data, and the compressed representation is provided to a second target model configured to perform a data decompression task based on the compressed representation to recover the target data.
2. The method of claim 1, wherein the first prompt word further indicates at least one of the following: the type of the target data, the modality of the target data.
3. The method of claim 1, wherein in the first input sequence, the target data is located before the first prompt word.
4. The method of claim 1, wherein the first input sequence further includes a predetermined symbol corresponding to the compressed representation of the target data, and wherein extracting the compressed representation of the target data from the first output sequence includes: extracting the compressed representation of the target data from the position corresponding to the predetermined symbol in the first output sequence.
5. The method according to claim 1, wherein the target data includes one of the following: text, image, video, and voice.
6. The method according to any one of claims 1 to 5, wherein the target data includes non-textual modal data, and generating the first input sequence for the first target model comprises: encoding at least one feature representation from the target data using a feature encoder corresponding to the modality of the target data; and generating the first input sequence for the first target model based on the at least one feature representation and the first prompt word.
7. The method of claim 1, wherein the compressed representation comprises a vectorized representation of at least one predetermined dimension, and wherein the first prompt indicates the number of vectorized representations of the predetermined dimension to be output.
8. A method for data decompression, comprising: obtaining a compressed representation of target data, wherein the compressed representation is a vectorized representation of the target data and is obtained by a first target model performing a data compression task on the target data; generating, based on a second prompt word and the compressed representation, a second input sequence for a second target model, the second target model being constructed based on a language model, and the second prompt word instructing the second target model to perform a data decompression task; obtaining a second output sequence of the second target model by providing the second input sequence to the second target model; and determining the decompressed target data from the second output sequence.
9. The method of claim 8, wherein the second prompt word further indicates at least one of the following: the type of the target data to be decompressed, and the modality of the target data to be decompressed.
10. The method of claim 8, wherein in the second input sequence, the compressed representation is located before the second prompt word.
11. The method of claim 8, wherein the target data includes one of the following: text, image, video, and voice.
12. The method according to any one of claims 8 to 11, wherein the target data comprises non-textual modal data, and determining the decompressed target data from the second output sequence comprises: extracting the decompressed feature representation of the target data from the second output sequence; and decoding the target data from the decompressed feature representation using a feature decoder corresponding to the modality of the target data.
13. An apparatus for data compression, comprising: a first input generation module configured to generate a first input sequence for a first target model based on a first prompt word and target data to be compressed, the first target model being constructed based on a language model, and the first prompt word instructing the first target model to perform a data compression task; a first output acquisition module configured to obtain a first output sequence of the first target model by providing the first input sequence to the first target model; and a compressed representation extraction module configured to extract a compressed representation of the target data from the first output sequence, wherein the compressed representation is a vectorized representation of the target data and is provided to a second target model, the second target model being configured to perform a data decompression task based on the compressed representation to recover the target data.
14. An apparatus for data decompression, comprising: a compressed representation acquisition module configured to acquire a compressed representation of target data, wherein the compressed representation is a vectorized representation of the target data and is obtained by a first target model performing a data compression task on the target data; a second input generation module configured to generate a second input sequence for a second target model based on a second prompt word and the compressed representation, the second target model being constructed based on a language model, and the second prompt word instructing the second target model to perform a data decompression task; a second output acquisition module configured to obtain a second output sequence of the second target model by providing the second input sequence to the second target model; and a target data determination module configured to determine the decompressed target data from the second output sequence.
15. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the electronic device to perform the method according to any one of claims 1 to 7 or the method according to any one of claims 8 to 12.
16. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to implement the method according to any one of claims 1 to 7 or the method according to any one of claims 8 to 12.
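The sequence layouts recited in claims 1, 7, 8 and 10 can be illustrated with a minimal Python sketch, offered under clearly stated assumptions. `Prompt`, `ToyTargetModel`, `compress` and `decompress` are hypothetical names, not part of the disclosure. The toy model is not a language model and performs no real compression: it merely packs character codes into fixed-size vectors. This is just enough to exercise the claimed plumbing end to end, namely a prompt word placed before the data when compressing, a prompt word that fixes the number and dimension of the output vectors, and compressed vectors placed before the prompt word when decompressing.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Vector = Tuple[float, ...]
Token = Union[str, Vector]  # a text token or a vector slot


@dataclass
class Prompt:
    """Hypothetical structured prompt word: task plus output-shape hints."""
    task: str             # "compress" or "decompress"
    num_vectors: int = 0  # how many vectors to output (cf. claim 7)
    dim: int = 0          # the predetermined vector dimension (cf. claim 7)


class ToyTargetModel:
    """Stand-in for a language-model-based target model (NOT a real LM)."""

    def generate(self, seq: Sequence[Union[Prompt, Token]]) -> List[Token]:
        prompt = next(t for t in seq if isinstance(t, Prompt))
        if prompt.task == "compress":
            # Expected layout: [prompt word, data tokens...]
            codes = [float(ord(t)) for t in seq if isinstance(t, str)]
            capacity = prompt.num_vectors * prompt.dim
            if len(codes) > capacity:
                raise ValueError("data does not fit in the requested vectors")
            # Zero padding; assumes the data contains no NUL characters.
            codes += [0.0] * (capacity - len(codes))
            vectors = [tuple(codes[i:i + prompt.dim])
                       for i in range(0, capacity, prompt.dim)]
            return ["<vec>", *vectors, "</vec>"]
        if prompt.task == "decompress":
            # Expected layout (cf. claim 10): [vectors..., prompt word]
            chars = [chr(int(c)) for t in seq if isinstance(t, tuple)
                     for c in t if c != 0.0]  # drop the zero padding
            return [*chars, "<eos>"]
        raise ValueError(f"unknown task {prompt.task!r}")


def compress(model: ToyTargetModel, data: str,
             num_vectors: int, dim: int) -> List[Vector]:
    first_prompt = Prompt("compress", num_vectors, dim)
    first_input = [first_prompt, *data]      # prompt word, then the data
    first_output = model.generate(first_input)
    # Extract the vectorized (compressed) representation from the output.
    return [t for t in first_output if isinstance(t, tuple)]


def decompress(model: ToyTargetModel, compressed: List[Vector]) -> str:
    second_prompt = Prompt("decompress")
    second_input = [*compressed, second_prompt]  # vectors before the prompt
    second_output = model.generate(second_input)
    # Determine the decompressed data from the second output sequence.
    return "".join(t for t in second_output if t != "<eos>")
```

For example, `compress(ToyTargetModel(), "hello", num_vectors=2, dim=4)` yields two 4-dimensional vectors, and passing them to `decompress` returns `"hello"`. Replacing the toy model with an actual language model would only change `generate`; the extraction steps, which keep the vector slots of the output sequence, are where the compressed representation would be read off.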
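The two-step decoding that claim 12 recites for non-text modalities can be sketched as follows. The decoder registry, the `decode_image` and `decode_voice` functions, and the assumption that the feature values are raw pixel values or audio samples are all hypothetical stand-ins chosen for illustration. The claim itself only requires that a feature decoder matching the modality be applied to the feature representation extracted from the second output sequence.

```python
from typing import Callable, Dict, List, Sequence

FeatureDecoder = Callable[[Sequence[float]], object]


def decode_image(features: Sequence[float], width: int = 2) -> List[List[int]]:
    """Hypothetical image decoder: features are treated as pixel values."""
    pixels = [max(0, min(255, round(f))) for f in features]  # clip to 0..255
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def decode_voice(features: Sequence[float]) -> List[float]:
    """Hypothetical voice decoder: features are treated as audio samples."""
    return [max(-1.0, min(1.0, f)) for f in features]  # clip to [-1, 1]


# Hypothetical registry: one feature decoder per non-text modality.
FEATURE_DECODERS: Dict[str, FeatureDecoder] = {
    "image": decode_image,
    "voice": decode_voice,
}


def determine_target_data(second_output: Sequence[object], modality: str):
    # Step 1: extract the decompressed feature representation (vector slots).
    features = [x for tok in second_output if isinstance(tok, tuple)
                for x in tok]
    # Step 2: decode it with the feature decoder matching the modality.
    try:
        decoder = FEATURE_DECODERS[modality]
    except KeyError:
        raise ValueError(f"no feature decoder for modality {modality!r}") from None
    return decoder(features)
```

Keying the decoders by modality keeps the language-model side modality-agnostic: the second target model only emits feature vectors, and supporting a further modality such as video would mean registering another decoder rather than changing the decompression flow.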