Method and apparatus for oblivious transfer using a trusted environment

By dividing data objects into data blocks and distributing them to multiple data buckets, and using encoded information and data compression techniques, the memory limitation problem in TEE is solved, achieving low-cost data privacy protection and secure distribution.

CN115244524B · Active · Publication Date: 2026-06-12 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2020-07-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, when using a Trusted Execution Environment (TEE), the total size of the distributed data objects (DOs) is much larger than the available memory in the TEE, which makes practical implementation difficult and fails to effectively protect data privacy and achieve secure distribution.

Method used

The data object is divided into multiple data blocks and allocated to multiple data buckets. The corresponding data blocks in the data stream are identified using encoding information. The data is transmitted and decoded through a trusted environment. Data compression and reordering are combined to reduce computation and storage costs.

Benefits of technology

It achieves secure transmission that effectively protects data privacy with low computational and storage costs, reduces the data stream size for each query, and improves the efficiency and feasibility of data transmission.

✦ Generated by Eureka AI based on patent content.


Abstract

Methods and apparatus for oblivious transfer using a trusted intermediary environment are described. A requested data object is identified using a data object identifier. The requested data object is stored as a plurality of corresponding data blocks on a plurality of data buckets. The data object identifier is encoded with information identifying each data block of the plurality of corresponding data blocks within each respective data bucket. A trusted intermediary environment receives a data stream including the data blocks stored in an assigned data bucket. Using the encoded information from the data object identifier, the trusted intermediary environment determines which data block of the data stream is the corresponding data block streamed from the assigned data bucket.

Description

[0001] Cross-reference to related applications

[0002] This application claims priority to U.S. Patent Application No. 16/816,120, filed March 11, 2020, entitled “METHODS AND APPARATUSES FOR OBLIVIOUS TRANSFER USING TRUSTED ENVIRONMENT”, the contents of which are incorporated herein by reference in their entirety.

Technical Field

[0003] The present invention relates to methods and apparatus for securely transmitting electronic data, such as oblivious transfer of data, which may include the use of a trusted environment, such as a trusted execution environment.

Background Technology

[0004] Content distributed over a network (such as the Internet) can be provided from third-party servers that are neither operated and controlled by the data publisher nor by the intended recipient (such as a consumer). For example, data objects (DOs) including video can be distributed to on-demand video Internet resource customers with the help of content delivery network (CDN) services or cloud hosting providers.

[0005] Because third-party servers service the actual content requests, the server operators can see the specific DOs requested by the recipients. This means that such distribution cannot guarantee the privacy of the recipients' data consumption. In cases where the recipient pool is publicly accessible, such as in on-demand video internet resources, encrypting the data before distribution cannot solve this problem.

[0006] In academic literature, the issue of data privacy from server operators is more generally referred to as oblivious transfer (OT) or private information retrieval (PIR).

[0007] One approach to addressing the OT problem is to add a trusted environment, such as trusted execution environment (TEE) hardware, to a third-party server. Computational operations performed within the TEE are protected from the server operator. Therefore, use of the TEE can hide from the server operator the specific DO requested in each content request. However, this approach suffers from the problem that the total size of the distributed DOs is often far greater than the available memory within the TEE, hindering practical implementation. It would be desirable to provide a way to improve the use of TEEs to achieve secure and private distribution of DOs.

Summary of the Invention

[0008] The various examples described herein provide methods and apparatuses that help protect data privacy during content transfers from third-party servers. Trusted environments, such as TEEs, are used. The disclosed examples can be implemented practically because the computational and memory costs are relatively lower than other methods.

[0009] In some aspects, the present invention describes a method for a trusted environment within a server. The method includes: receiving a data object identifier identifying a requested data object in the trusted environment, the requested data object being stored as a plurality of corresponding data blocks in corresponding data buckets, the data object identifier being encoded with information identifying each of the plurality of corresponding data blocks within the corresponding data buckets; wherein an allocated data bucket has been assigned to the trusted environment, and a data stream corresponding to the allocated data bucket is received in the trusted environment, the data stream including all data blocks stored in the allocated data buckets; using the encoded information to determine which data block in the data stream is the corresponding data block streamed from the allocated data buckets; and sending the corresponding data block from the trusted environment to the server.

[0010] In some examples, the method further includes receiving the data stream from the server into the trusted environment.

[0011] In some examples, the encoded information identifying the plurality of corresponding data blocks may be multiple fingerprint values, each fingerprint value uniquely identifying the corresponding data block within the corresponding data bucket.

[0012] In some examples, determining the corresponding data block may include: determining a correct fingerprint value from the data object identifier, the correct fingerprint value identifying the corresponding data block of the allocated data bucket; determining the fingerprint value for each data block in the data stream; and comparing the fingerprint value of each data block in the data stream with the correct fingerprint value to determine the corresponding data block.
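The fingerprint comparison described in [0012] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `fingerprint` function (a truncated SHA-256 digest), the 4-byte fingerprint length, and the modeling of the data object identifier as a list holding one fingerprint per bucket are all hypothetical choices made for the example.

```python
import hashlib
from typing import Iterable, Optional, Sequence

FP_BYTES = 4  # assumed length; must be long enough to be unique within a bucket


def fingerprint(block: bytes) -> bytes:
    """Hypothetical fingerprint: a truncated SHA-256 digest of the block."""
    return hashlib.sha256(block).digest()[:FP_BYTES]


def select_block(do_id: Sequence[bytes], bucket_index: int,
                 stream: Iterable[bytes]) -> Optional[bytes]:
    """Return the streamed block whose fingerprint matches the one that the
    identifier encodes for the allocated bucket, or None if there is no match."""
    correct_fp = do_id[bucket_index]   # the "correct fingerprint value"
    selected = None
    for block in stream:               # scan the whole stream, no early exit
        if fingerprint(block) == correct_fp:
            selected = block
    return selected


bucket = [b"alpha", b"bravo", b"charlie"]
do_id = [fingerprint(b"not-in-bucket"), fingerprint(b"bravo")]
match = select_block(do_id, 1, iter(bucket))
```

The loop deliberately reads every block rather than stopping at the match, so the amount of data consumed from the stream is the same for every query.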

[0013] In some examples, the data object identifier may also be encoded with information used to sort the corresponding data blocks to recover the requested data object.

[0014] In some examples, determining which data block among the data blocks is the corresponding data block may include: using the information used to sort the corresponding data blocks to determine the information identifying the corresponding data block of the allocated data bucket.

[0015] In some examples, the encoded information used to sort the corresponding data blocks may be a permutation token.

[0016] In some examples, the method may further include: receiving the data object identifier as an encrypted data object identifier in the trusted environment; decrypting the data object identifier using the private key of the trusted environment; and sending the corresponding data block as an encrypted corresponding data block from the trusted environment to the server.

[0017] In some examples, the method may further include performing the steps of receiving the data object identifier, determining the corresponding data block, and sending the corresponding data block at least partially in parallel for a first received data object identifier and a second received data object identifier.

[0018] In some examples, the data stream may include compressed data, and the method may further include decompressing the compressed data.

[0019] In some aspects, the present invention describes a method for a publisher server. The method includes: for each given data object among a plurality of data objects: dividing the given data object into a plurality of corresponding data blocks; assigning each of the plurality of corresponding data blocks to a corresponding data bucket among a plurality of data buckets based on the similarity of each corresponding data block to any other data blocks assigned to the corresponding data bucket; wherein all data blocks of all data objects are assigned to one of the plurality of data buckets; for each given data object, generating a data object identifier for retrieving the given data object from the plurality of data buckets, the data object identifier being encoded with information identifying each of the plurality of corresponding data blocks within the corresponding data bucket, the data object identifier also being encoded with information for sorting the corresponding data blocks to recover the given data object; compressing each data bucket into a corresponding compressed dataset; and publishing the compressed datasets and the generated data object identifiers.

[0020] In some examples, the data blocks assigned to each corresponding data bucket can be sorted based on similarity within each corresponding data bucket.

[0021] In some examples, generating the data object identifier may include: for each given data bucket, determining a fingerprint value for each data block assigned to the given data bucket, the fingerprint value uniquely identifying each corresponding data block within the given data bucket; and for each given data object, encoding the fingerprint value of each corresponding data block in the data object identifier.

[0022] In some examples, generating the data object identifier may include: for each given data object: determining a bucket order, assigning each corresponding data block to a corresponding data bucket according to the bucket order; generating a permutation token representing the bucket order; and encoding the permutation token in the data object identifier.

[0023] In some examples, each data bucket may have been allocated a single corresponding data block from each data object.

[0024] In some examples, each data bucket can be compressed using adaptive statistical coding compression.
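The publisher-side steps in [0019] to [0024] can be sketched as follows. This is an illustrative sketch only. Several parts are assumptions not taken from the text: the bucket order for each object is supplied by the caller (the similarity measure that would choose it is not specified here), the fingerprint is a truncated SHA-256 digest, the identifier is modeled as a `(permutation token, fingerprints)` tuple, and `zlib` stands in for the adaptive statistical coding mentioned in [0024].

```python
import hashlib
import zlib
from typing import Dict, List, Sequence, Tuple

Identifier = Tuple[Tuple[int, ...], Tuple[bytes, ...]]


def fingerprint(block: bytes, n: int = 4) -> bytes:
    """Hypothetical fingerprint: truncated SHA-256 digest."""
    return hashlib.sha256(block).digest()[:n]


def split(do: bytes, k: int) -> List[bytes]:
    """Divide a data object into k consecutive blocks (assumes len(do) >= k)."""
    size = -(-len(do) // k)  # ceiling division
    return [do[i * size:(i + 1) * size] for i in range(k)]


def publish(objects: Dict[str, bytes], orders: Dict[str, Sequence[int]], k: int):
    """Assign one block of every object to each of k buckets.

    orders[name][i] is the bucket that receives block i of that object.
    """
    buckets: List[List[bytes]] = [[] for _ in range(k)]
    ids: Dict[str, Identifier] = {}
    for name, do in objects.items():
        order = orders[name]
        fps = [b""] * k
        for i, block in enumerate(split(do, k)):
            buckets[order[i]].append(block)
            fps[order[i]] = fingerprint(block)  # fingerprint indexed by bucket
        ids[name] = (tuple(order), tuple(fps))  # permutation token + fingerprints
    # Length-prefix each block so boundaries survive compression.
    compressed = [
        zlib.compress(b"".join(len(b).to_bytes(4, "big") + b for b in bucket))
        for bucket in buckets
    ]
    return ids, buckets, compressed


objects = {"a": b"AAAABBBBCCCC", "b": b"ddddeeeeffff"}
orders = {"a": [0, 1, 2], "b": [2, 0, 1]}
ids, buckets, compressed = publish(objects, orders, 3)
```

In this sketch each bucket ends up with exactly one block per data object, matching the arrangement described in [0023].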

[0025] In some aspects, the present invention describes a method for an electronic device. The method includes: receiving a plurality of data blocks in response to a query for a data object; determining information from a data object identifier associated with the data object for sorting the data blocks to recover the data object; and reordering the data blocks to recover the data object.

[0026] In some examples, the method further includes sending a query for the data object, the query including the data object identifier.

[0027] In some examples, the data object identifier sent in the query can be encoded with information identifying each of the plurality of data blocks within the corresponding data bucket, and the data object identifier can also be encoded with information used to sort the data blocks.

[0028] In some examples, the information used to determine the sorting of the data blocks may include a permutation token encoded in the data object identifier.
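The reordering step performed by the electronic device can be illustrated with a short sketch. The representation of the permutation token as a sequence in which entry `i` names the bucket that supplied the `i`-th block of the object is an assumption made for this example; the text above does not fix an encoding.

```python
from typing import Dict, Sequence


def recover(blocks_by_bucket: Dict[int, bytes], token: Sequence[int]) -> bytes:
    """Reassemble a data object from one block per bucket.

    token[i] is assumed to be the bucket that served block i of the object.
    """
    return b"".join(blocks_by_bucket[bucket] for bucket in token)


# Blocks may arrive in any order; they are keyed by the bucket they came from.
received = {1: b"ffff", 2: b"dddd", 0: b"eeee"}
data_object = recover(received, (2, 0, 1))
```

Because the blocks are keyed by bucket, the arrival order of the responses does not matter; only the token determines the final order.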

[0029] In some examples, the method may further include: receiving the plurality of data blocks as a plurality of encrypted data blocks; and decrypting the plurality of encrypted data blocks using a private key.

Attached Figure Description

[0030] Reference is now made, by way of example, to the accompanying drawings, which illustrate exemplary embodiments of this application, in which:

[0031] Figure 1 is a block diagram of an exemplary simplified system that can implement the examples disclosed herein;

[0032] Figure 2 is a block diagram of an exemplary server that can be used to implement the examples described herein;

[0033] Figure 3 is a block diagram of an exemplary electronic device that can be used to implement the examples described herein;

[0034] Figure 4 is a signaling diagram representing an existing content distribution scheme;

[0035] Figure 5 is a flowchart of an exemplary method for generating data object identifiers according to the examples described herein;

[0036] Figures 6A to 6E are schematic diagrams of exemplary steps implementing the method of Figure 5;

[0037] Figure 7 is a signaling diagram representing an example of content distribution according to the examples described herein;

[0038] Figure 8 is a flowchart of an exemplary method executed by the server in the context of the operations of Figure 7;

[0039] Figure 9 is a flowchart of an exemplary method executed by the TEE in the context of the operations of Figure 7;

[0040] Figure 10 is a flowchart of an exemplary method executed by a client electronic device in the context of the operations of Figure 7.

[0041] The same reference numerals may be used to denote the same components in different figures.

Detailed Implementation

[0042] The examples disclosed herein describe methods and apparatus that facilitate the transfer of content from a server while keeping the requested data private from the server's operator, using a trusted, secure intermediary environment. The disclosed methods can be implemented with relatively low computational and memory costs per service request, making this approach economically viable for commercialization. To aid in understanding the invention, Figures 1 to 3 are described first.

[0043] Figure 1 illustrates an exemplary system 100 including network 105. For ease of understanding, system 100 is simplified in this example; in general, system 100 may include more entities and components than those shown in Figure 1. Network 105 can be any form of network (e.g., intranet, internet, P2P network, WAN, and / or LAN) and can be a public network. System 100 can be used to distribute content published by publisher 108 and stored on a third-party data center (server 110 in this example). Publisher 108 can be a publisher server, referred to herein simply as publisher 108. Content is distributed from server 110 to electronic device (ED) 150 via wireless communication over network 105. Although not shown in Figure 1, communication between publisher 108 and server 110 can also be via network 105 or another network (which can be a private or public network), or via a wired connection. As further described below, a trusted secure environment (e.g., a trusted execution environment (TEE) 130) exists within server 110. Although Figure 1 shows a single TEE 130 within server 110, it should be understood that there can be multiple TEEs 130 within a server.

[0044] Figure 2 is a simplified example block diagram of server 110. Other examples suitable for implementing the embodiments described in this invention may be used, and these examples may include components different from those described below. Although Figure 2 shows a single instance of each component, multiple instances of each component may exist in server 110.

[0045] Server 110 may include one or more processing devices 114, such as processors, microprocessors, digital signal processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), dedicated logic circuits, dedicated artificial intelligence processor units, or combinations thereof. Server 110 may also include one or more optional input / output (I / O) interfaces 116, which may support connection to one or more optional input devices 118 and / or optional output devices 120.

[0046] In the example shown, one or more input devices 118 (e.g., keyboard, mouse, microphone, touchscreen, and / or keypad) and one or more output devices 120 (e.g., monitor, speaker, and / or printer) are shown as optional and external to server 110. In other examples, there may be no input devices 118 and output devices 120, in which case I / O interface 116 may not be required.

[0047] Server 110 may include one or more network interfaces 122 for wired or wireless communication with other entities or nodes in network 105, ED 150, or system 100. The one or more network interfaces 122 may include wired links (e.g., Ethernet cables) and / or wireless links (e.g., one or more antennas) for intra-network and / or inter-network communication.

[0048] Server 110 may also include one or more storage units 124, which may include high-capacity storage units such as solid-state drives, hard disk drives, disk drives, and / or optical disk drives. The one or more storage units 124 may store one or more data objects (DOs) 126, which may be requested and sent to ED 150, as further described below. DOs 126 may include content with a large amount of data, such as video files. DOs 126 stored by third-party server 110 may be published by publishers other than the operator of server 110. DOs 126 may be encrypted, and the content of DOs 126 may be unknown to the operator of server 110. DOs 126 may be stored in compressed form, as further described below.

[0049] Server 110 may include one or more memories 128, which may include volatile or non-volatile memories (e.g., flash memory, random access memory (RAM), and / or read-only memory (ROM)). One or more non-transitory memories 128 may store instructions executable by one or more processing devices 114, for example, to perform the examples described herein. One or more memories 128 may include other software instructions, such as software instructions for implementing an operating system and other applications / functions. In some examples, one or more memories 128 may include software instructions executable by processing device 114 to retrieve and send one or more DOs 126 in response to a request from ED 150, as further described below. In some examples, additionally or alternatively, server 110 may execute instructions from external memory (e.g., an external drive connected to server 110 via wired or wireless communication), or executable instructions may be provided by transient or non-transitory computer-readable media. Examples of non-transitory computer-readable media include RAM, ROM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, CD-ROM, or other portable storage devices.

[0050] Server 110 also includes a trusted secure environment, such as TEE 130, in which operations can be performed that are obfuscated or hidden from the operator of server 110. While TEE 130 is shown as an example of a hardware-supported trusted environment within server 110, trusted environments can be implemented in other ways. Whether implemented using TEE 130 or otherwise, a trusted environment protects the privacy and security of data and operations performed within it. The trusted environment has security features (typically physical security features) that provide tamper resistance, thereby protecting the integrity of stored data, encryption keys, and instructions executed within the trusted environment. The trusted environment provides a secure intermediate environment, trusted by ED 150, for privately and securely transferring data from server 110. In some examples, the trusted environment can be any secure environment that serves as an intermediary for data transfer from server 110 to ED 150, and can include trusted environments provided by hardware external to server 110. For example, the trusted environment can be within ED 150 or within a third network entity, etc. For simplicity, this invention will refer to TEE 130; however, it should be understood that a trusted environment can be implemented in other ways. For example, a trusted environment can also be implemented using a hardware security module (HSM) or a trusted platform module (TPM), etc.

[0051] It should be noted that, because TEE 130 is protected from the operator of server 110, TEE 130 can be described and represented herein as a component separate from the overall server 110, even though TEE 130 may be physically implemented within server 110. In examples where TEE 130 is located within server 110, signals and data can be communicated between TEE 130 and the overall server 110 (i.e., the server environment outside of TEE 130). Therefore, although in some examples TEE 130 is physically part of server 110, TEE 130 can be considered as receiving electronic signals and data from server 110 and sending electronic signals and data to server 110. For example, such communication between server 110 and TEE 130 within server 110 can involve internal communication between the physical chips of server 110, among other possible implementations.

[0052] Server 110 may also include bus 132, providing communication between components of server 110, including those described above. Bus 132 may be any suitable bus architecture, such as a memory bus, peripheral bus, or video bus.

[0053] Figure 3 is a simplified example block diagram of ED 150. Other examples suitable for implementing the embodiments described in this invention may be used, and these examples may include components different from those described below. Although Figure 3 shows a single instance of each component, multiple instances of each component may exist in ED 150.

[0054] Each ED 150 can be any suitable end-user equipment for wireless operation and can include (or may be referred to as): user equipment (UE), wireless transmit / receive unit (WTRU), mobile station, fixed or mobile subscriber unit, cellular phone, station (STA), personal digital assistant (PDA), smartphone, laptop, computer, tablet, wireless sensor, smart device or consumer electronics device, etc.

[0055] As shown in Figure 3, ED 150 includes at least one processing device 154. Processing device 154 implements various processing operations of ED 150. For example, processing device 154 may perform signal encoding, data processing, power control, input / output processing, or any other function that enables ED 150 to operate within system 100. Processing device 154 may also be used to implement some or all of the functions and / or embodiments described herein. For example, processing device 154 may be a processor, microprocessor, digital signal processor, application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), dedicated logic circuit, dedicated artificial intelligence processor unit, or a combination thereof.

[0056] ED 150 may also include one or more optional input / output (I / O) interfaces 156, which may support connection to one or more optional input devices 158 and / or optional output devices 160. For example, one or more input / output devices 158, 160 support user interaction. Each input / output device 158, 160 includes any suitable structure for providing or receiving information from the user, such as a speaker, microphone, keypad, keyboard, display, or touchscreen, etc. In some examples, a single device may provide both input and output capabilities, such as a touchscreen.

[0057] ED 150 also includes one or more network interfaces 162 to support communication within system 100. The one or more network interfaces 162 include any suitable structures for generating signals for wireless or wired transmission and / or for processing wirelessly or wired received signals. The one or more network interfaces 162 may include wired links (e.g., Ethernet cables) and / or wireless links (e.g., one or more antennas) for intra-network and / or inter-network communication. In some examples, the one or more network interfaces 162 may include separate transmitter and receiver components; in other examples, the one or more network interfaces 162 may include transceiver components that combine the functions of a transmitter and a receiver.

[0058] ED 150 may also include one or more storage units 164, which may include mass storage units such as solid-state drives, hard disk drives, disk drives, and / or optical disk drives. ED 150 may also include one or more memories 168, which may include volatile or non-volatile memories (e.g., flash memory, random access memory (RAM), and / or read-only memory (ROM)). The one or more non-transient memories 168 may store instructions executed by the one or more processing devices 154, for example, to perform the examples described in this invention. The one or more memories 168 may include other software instructions, such as software instructions for implementing operating systems and other applications / functions. In some examples, the one or more memories 168 may include software instructions executable by the processing device 154 to request and retrieve (e.g., decrypt) one or more DOs 126 from the server 110, as further described below.

[0059] In some examples, ED 150 may additionally or alternatively execute instructions from external memory (e.g., an external drive connected to ED 150 by wire or wirelessly), or the executable instructions may be provided by transient or non-transient computer-readable media. Examples of non-transient computer-readable media include RAM, ROM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, CD-ROM, or other portable storage devices.

[0060] As described above, the problem of keeping data private from the operator of server 110 when content is requested by ED 150 is referred to in the art as oblivious transfer (OT). By using OT, ED 150 can request a specific DO within the dataset stored by server 110, and can receive the correct response to the request in such a way that the operator of server 110 is unaware of which DO in the dataset was requested and sent.

[0061] Solutions to the OT problem have been proposed, for example, by Smith et al. (S. W. Smith and D. Safford, "Practical server privacy with secure coprocessors," IBM Systems Journal, Vol. 40, No. 3, 2001). To help understand the shortcomings of existing solutions, a simple example is presented below.

[0062] Figure 4 is a signaling diagram illustrating a simple, existing scheme. In this example, multiple DOs are published as a dataset (e.g., published by publisher 108). The dataset is then stored on insecure storage within server 110 (e.g., storage unit 124 of Figure 2). Within server 110, one or more TEEs 130 can perform secure computations and store protected private keys signed by the certificate authority (CA) of the TEE's manufacturer. The operator of server 110 is considered honest-but-curious (HBC). That is, server 110 performs the actions required to service a request from ED 150, and the aim is that doing so reveals no information to the operator about which DO in the dataset was requested by ED 150. While this invention describes some examples with reference to an HBC operator, it should be understood that the examples described herein can also be used in situations where the operator of server 110 is not necessarily considered HBC, or where a dishonest party exists. For example, the examples described herein can also be used in situations where a dishonest party (e.g., an unauthorized party) attempts to obtain information about which DO has been requested through unauthorized access to server 110. It is assumed that the operator of server 110 has information about the content stored on server 110 and the content sent from insecure storage, but not about the operations performed within TEE 130 of server 110.

[0063] A request for a specific DO within a dataset can begin with a handshake between ED 150 and server 110. An exemplary handshake process is described first. ED 150 sends a handshake request to server 110, which includes an identifier (ID) of the dataset containing the DO of interest, and a public key (ReqPU) for the request. Server 110 responds with a handshake message containing the public key (TEE PU) of the TEE 130 that has been assigned to handle the request, and a digital signature of the TEE PU. ED 150 verifies the digital signature using the public key provided by the CA of TEE 130. Only genuine TEE PUs are signed by a CA. The corresponding private key (TEE PR) of TEE 130 is kept secure and can only be used within TEE 130. Therefore, ED 150 is assured that any data encrypted with the TEE PU can only be decrypted using the certified TEE PR within TEE 130. If ED 150 successfully verifies the digital signature, ED 150 can proceed with requesting the DO of interest. If the digital signature cannot be verified, ED 150 can abort the request.

[0064] Assuming the handshake process is successful, at 402, ED 150 uses TEE PU to encrypt the ID of the requested DO and sends the query to server 110.

[0065] Server 110 is unable to decrypt and read the query, and at 404 forwards the still-encrypted query to TEE 130. The query is decrypted within TEE 130, making the ID of the requested DO known within TEE 130.

[0066] At 406, server 110 sends (e.g., streams) all DOs in the dataset (identified during the handshake) from insecure data storage to TEE 130. TEE 130 receives each DO but only stores (e.g., in a local buffer) the DO that corresponds to the request ID and discards all other DOs.

[0067] After 406, TEE 130 encrypts the requested DO with ReqPU (which can be stored in TEE 130's local buffer), and at 408, sends the encrypted DO to server 110 (i.e., sends the encrypted DO from TEE 130's protected environment to server 110's general environment).

[0068] At 410, server 110 sends the encrypted DO in a response message to ED 150. Since server 110 cannot decrypt the encrypted DO, it does not know which DO has been sent to ED 150. ED 150 receives the encrypted DO and decrypts it using its private key.

[0069] In the above scheme, OT is achieved. The requested DO is delivered without server 110 obtaining any information about which DO in the dataset was requested, thus ensuring the privacy of ED 150. However, the problem is that all DOs in the dataset must be sent from insecure data storage to TEE 130 to serve a single query. This can consume significant resources, especially when the DOs are large (e.g., large video files).
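The cost of the existing scheme can be made concrete with a small sketch of the scan performed inside the TEE at step 406. This is a simplified model under stated assumptions: encryption and the handshake are omitted, and the names `tee_scan` and the `(id, payload)` stream format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def tee_scan(requested_id: str,
             stream: Iterable[Tuple[str, bytes]]) -> Tuple[Optional[bytes], int]:
    """Model of the existing scheme: every DO in the dataset is streamed into
    the TEE, which keeps only the requested one and discards the rest.

    Returns the kept DO and the total number of bytes that had to be streamed.
    """
    kept = None
    streamed = 0
    for do_id, payload in stream:
        streamed += len(payload)       # every DO crosses into the TEE
        if do_id == requested_id:
            kept = payload             # only this one is buffered
    return kept, streamed


dataset = [("a", b"x" * 10), ("b", b"y" * 20), ("c", b"z" * 30)]
kept, streamed = tee_scan("b", dataset)
```

In this toy dataset the TEE reads all 60 bytes to return a 20-byte object; the streamed volume grows with the size of the whole dataset rather than with the size of the requested DO.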

[0070] This invention describes examples that help reduce the amount of data streamed from server 110 to TEE 130 for each query. In the examples described below, multiple TEEs 130 exist within server 110. The dataset is streamed from multiple buckets, each bucket containing only a portion of the data in the dataset, and each bucket is streamed to a corresponding different TEE 130. TEEs 130 can also pool queries, meaning that multiple queries related to the same dataset can be processed simultaneously for each data stream. In this invention, the term "bucket" or "data bucket" is used to refer to a buffer or other storage from which the stored data can be streamed. In particular, data (e.g., a single DO) can be divided into separate parts (or "blocks") and stored across multiple buckets, as further described below. In this invention, the term "block" or "data block" is used to refer to a portion or fragment of data that is combined with other blocks to reconstruct the DO. A data block itself may not provide any meaningful content. Data blocks may include formatting (e.g., headers) to make the data blocks identifiable and manageable.
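Query pooling can be illustrated with a sketch in which one pass over a bucket's stream serves several queries at once. The fingerprint function (truncated SHA-256), the query labels, and the `serve_pool` name are assumptions made for this example rather than details from the text.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def fingerprint(block: bytes, n: int = 4) -> bytes:
    """Hypothetical fingerprint: truncated SHA-256 digest."""
    return hashlib.sha256(block).digest()[:n]


def serve_pool(wanted: Dict[str, bytes],
               stream: Iterable[bytes]) -> Dict[str, Optional[bytes]]:
    """Serve several pooled queries with a single pass over one bucket stream.

    wanted maps a query label to the fingerprint that query needs from
    this bucket; the result maps each label to the matching block.
    """
    by_fp: Dict[bytes, List[str]] = {}
    for query, fp in wanted.items():
        by_fp.setdefault(fp, []).append(query)
    results: Dict[str, Optional[bytes]] = {query: None for query in wanted}
    for block in stream:                       # the stream is read only once
        for query in by_fp.get(fingerprint(block), []):
            results[query] = block
    return results


bucket = [b"alpha", b"bravo", b"charlie"]
wanted = {"q1": fingerprint(b"bravo"),
          "q2": fingerprint(b"alpha"),
          "q3": fingerprint(b"bravo")}
results = serve_pool(wanted, iter(bucket))
```

Passing `iter(bucket)` makes the single-pass property explicit: the iterator is exhausted after one traversal, yet all three queries are answered.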

[0071] Existing solutions can use multiple buckets to stream data in parallel to multiple TEEs, but the datasets are divided into data streams in a strict manner, which does not allow similar dataset fragments to be grouped together. Existing solutions require data in the streams to be in a specific order so that the TEEs can match the streaming data blocks with queries.

[0072] To further reduce the size of the data stream per TEE, the examples described herein can take into account the statistical properties of the dataset and reorder the data in the stream to enable data compression. This can reduce computational and memory costs compared to existing solutions.

[0073] This invention also describes exemplary techniques that enable each TEE to identify the appropriate data block to satisfy a query and enable the receiving ED to recover the requested DO from the data block. The examples provided herein describe a technique for generating unique identifiers (also known as “fingerprints” or “fingerprint-based naming”) that enables the DO to be correctly recovered from data blocks that have been merged and reordered in some way. By enabling the reordering and combination of data blocks, a higher compression ratio can be achieved for each data stream.

[0074] Referring again to Figure 1, publisher 108 can (e.g., by executing instructions in a publisher software program) acquire a dataset with multiple DOs (the DOs may be of equal or unequal size) and divide the dataset into multiple buckets. The dataset can be divided into data blocks and placed in different buckets in such a way as to achieve higher compression, as further described below. Publisher 108 can further compress the data within each bucket. The dataset, divided into multiple buckets and compressed, can be sent by publisher 108 to be stored on a third-party server 110. Server 110 can (e.g., by executing instructions in a carrier server program) store the dataset in the format provided by publisher 108 and listen for data queries from ED 150. TEE 130 within server 110 communicates with the carrier server to satisfy data queries. ED 150 (e.g., by executing instructions in a client program) sends a DO query to server 110 and, upon receiving a response to the query, reassembles the requested DO from the received data blocks.

[0075] While the invention may be described as examples performed by different entities in system 100, it should be understood that aspects of the invention may be performed entirely by a single entity.

[0076] Figure 5 is a flowchart of an exemplary method 500, which can be performed by publisher 108, for formatting a dataset into data blocks and generating identifiers for the data blocks. Although the operations are described below as being performed by publisher 108, it should be understood that any or all of these operations can be performed by another entity in system 100, such as third-party server 110.

[0077] Method 500 is described with additional reference to Figures 6A to 6E, which show an exemplary instance of method 500 performed on an exemplary dataset.

[0078] At 502, publisher 108 divides the DOs in the dataset into data blocks. These data blocks can have equal or unequal sizes, as described below.

[0079] Figure 6A shows a simple exemplary dataset 600. In this example, dataset 600 consists of five DOs 605, which are ASCII strings of animal names: "ZEBRA", "CAT", "DOG", "BIRD", and "PIG". Publisher 108 divides each DO 605 into a set of corresponding data blocks 610 (i.e., data blocks 610 that correspond to, are related to, or are derived from DO 605). In the simple example shown, each data block 610 has the same size (e.g., one byte). In general, each data block 610 can have any suitable size, such as 128Kb (or it can be larger or smaller). In some examples, the sizes of data blocks 610 can be unequal. Alternatively, one or more of the data blocks 610 can be padded with dummy data (e.g., with random data or zeros) to ensure that data blocks 610 have the same size and that each DO 605 is divided into the same number of data blocks 610. Having equal sizes and numbers of data blocks 610 corresponding to each DO 605 helps ensure data privacy, as statistics about data size cannot be used to uniquely identify any DO 605. However, publisher 108 does not necessarily need to ensure that the size or number of data blocks 610 for each DO 605 is equal. In some examples, a similar padding technique could be performed at the TEE when responding to a query, rather than at publisher 108, to ensure that data blocks 610 sent from the TEE to the general environment of the server are statistically indistinguishable.
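The division and padding at 502 can be sketched as follows. This is a minimal illustration only, assuming one-byte data blocks and a space character as the padding byte (matching the figures); the function name `split_into_blocks` and its parameters are hypothetical and do not appear in the source.

```python
def split_into_blocks(data: bytes, block_size: int, num_blocks: int,
                      pad: bytes = b" ") -> list[bytes]:
    """Split a DO into exactly num_blocks blocks of block_size bytes.

    The tail is padded so every DO yields the same number of equal-size
    blocks. The padding byte is an assumption; the source also allows
    random data or zeros.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    total = block_size * num_blocks
    if len(data) > total:
        raise ValueError("data object does not fit in the requested blocks")
    padded = data + pad * (total - len(data))
    return [padded[i:i + block_size] for i in range(0, total, block_size)]


dataset = [b"ZEBRA", b"CAT", b"DOG", b"BIRD", b"PIG"]
# With one-byte blocks, the longest DO determines the block count (5 here).
num_blocks = max(len(d) for d in dataset)
blocks = {d: split_into_blocks(d, 1, num_blocks) for d in dataset}
```

With this padding, every DO in the exemplary dataset yields five one-byte blocks, so block counts and sizes alone do not distinguish "CAT" from "ZEBRA".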

[0080] At 504, each data block 610 is placed into a bucket 615 (see Figure 6B). Each data block 610 is assigned to a specific bucket 615. In some examples, data blocks 610 may be assigned to buckets 615 such that each bucket 615 has the same number of blocks 610 from each DO 605 (e.g., exactly one block 610). As mentioned above, one or more data blocks 610 may include empty data or be padded with random data or zeros. This ensures that each bucket 615 has the same number of data blocks 610, and that each data block 610 is of equal size. This, in turn, helps to ensure data privacy, because statistics about how data blocks 610 are distributed among buckets 615 cannot be used to uniquely identify any data block 610 from any DO 605. However, in some examples, different buckets 615 may have different numbers of data blocks 610, and various techniques can nevertheless be used to ensure data privacy, as described below.

[0081] While the present invention describes examples in which each DO 605 is divided into corresponding data blocks 610 of equal size, and each bucket 615 is allocated the same number of blocks 610, it should be understood that this is not intended to be limiting. For example, DO 605 may be divided into corresponding data blocks 610 of different sizes, DO 605 may be divided into different numbers of corresponding data blocks 610, bucket 615 may be allocated different numbers of data blocks 610, and variations thereof may exist. Even when such variations exist in the size and / or number of data blocks 610 within bucket 615, various techniques can be used to ensure data privacy. For example, if the data blocks 610 are not of equal size, padding (e.g., with random data or zeros) can be used to make the data blocks 610 appear to be the same size from the perspective of an external observer (e.g., the server operator). In another example, if unequal numbers of data blocks 610 are allocated to different buckets 615, a given bucket 615 may not include data blocks 610 from a particular DO 605. In this scenario, virtual data blocks (e.g., random data blocks) that appear to be data blocks (e.g., have the same size) can be allocated to a given bucket 615. As described above, this technique of hiding distinguishing information can be performed by the publisher 108 when partitioning and allocating data blocks 610 among buckets 615, or by the TEE when responding to a query.

[0082] It should be noted that blocks 610 are not placed in buckets 615 arbitrarily, nor are they simply assigned to buckets 615 in order or according to the order of blocks 610 in each DO 605. Instead, blocks 610 are assigned to the corresponding buckets 615 and ordered within each bucket 615 in such a way that more similar blocks 610 are grouped together and ordered to be more closely together within the same bucket 615.

[0083] For example, in Figure 6B, data block "A" from DO "ZEBRA" is assigned to bucket #4 and positioned within bucket #4 close to data block "A" from DO "CAT". Note that data blocks "C", "A", "T", and the two " " (space character) blocks are assigned to buckets #0 through #4 in a different order than the data blocks of "ZEBRA". That is, the first data block 610 of each DO 605 is not necessarily assigned to the same "first" bucket 615, and likewise for the second data block 610, and so on. In other words, the "bucket order" used to assign data blocks 610 from DO 605 to buckets 615 can differ from the original order in which data blocks 610 are arranged within DO 605. This differs from assigning each data block 610 to a bucket 615 based on the order in which it appears in the original DO 605. By using the similarity of data blocks 610 to assign and order blocks 610 within buckets 615, better data compression can be achieved.

[0084] The similarity of data blocks 610 serves as the basis for allocating bucket 615 and sorting blocks 610 within bucket 615. The similarity of two data blocks 610 can be determined using any suitable computation. One example is to compute the Kullback-Leibler (KL) divergence, which compares the probability distribution of tokens for unassigned candidate blocks 610 with those for one or more blocks 610 already assigned to a given bucket 615.

[0085] An example calculation that can be performed to compute the KL divergence is:

[0086] $M = \sum_x Q(x) \log \frac{Q(x)}{S(x)} + \sum_x P(x) \log \frac{P(x)}{S(x)}$

[0087] where M represents the mutual information measure of similarity, Q(x) is the token probability distribution function for a given bucket 615, P(x) is the token probability distribution function for an unallocated candidate block 610, and S(x) is the joint probability distribution function of bucket 615 and unallocated block 610. The first term can be written as KL(Q(x)||S(x)), and the second term can be written as KL(P(x)||S(x)). For example, if candidate data block 610 is the same size as the given bucket 615:

[0088] $S(x) = \frac{1}{2}\left(Q(x) + P(x)\right)$

[0089] Therefore, M can be rewritten as:

[0090] $M = \sum_x Q(x) \log \frac{2Q(x)}{Q(x) + P(x)} + \sum_x P(x) \log \frac{2P(x)}{Q(x) + P(x)}$

[0091] where the first term can be written as KL(Q(x)||1/2(Q(x)+P(x))), and the second term can be written as KL(P(x)||1/2(Q(x)+P(x))).

[0092] Therefore, another way to consider how to place data block 610 in data bucket 615 is to group block 610 into bucket 615 in such a way that the total cross-entropy of block 610 within each bucket 615 is minimized.
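The two KL terms named above, KL(Q(x)||S(x)) and KL(P(x)||S(x)), can be evaluated numerically as in the following sketch. It rests on two assumptions that are not confirmed by the source: that M is the plain sum of the two terms, and that S(x) is the equal-weight mixture 1/2(Q(x)+P(x)) given for the equal-size case. One-byte tokens are also assumed, and the function names are hypothetical.

```python
import math
from collections import Counter


def distribution(data: bytes) -> dict[int, float]:
    """Empirical probability of each one-byte token in data."""
    counts = Counter(data)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def kl(a: dict[int, float], s: dict[int, float]) -> float:
    """KL(a || s) in bits; s must be non-zero wherever a is non-zero."""
    return sum(pa * math.log2(pa / s[x]) for x, pa in a.items() if pa > 0)


def similarity_m(bucket: bytes, block: bytes) -> float:
    """Assumed form: M = KL(Q || S) + KL(P || S), with S = (Q + P) / 2."""
    q = distribution(bucket)   # Q(x): tokens already in the bucket
    p = distribution(block)    # P(x): tokens of the candidate block
    s = {x: 0.5 * (q.get(x, 0.0) + p.get(x, 0.0)) for x in set(q) | set(p)}
    return kl(q, s) + kl(p, s)
```

Under these assumptions, M is 0 when the bucket and the candidate block have identical token distributions and grows as the distributions diverge, so a smaller M indicates a better candidate for the bucket.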

[0093] The following describes an exemplary algorithm that uses KL as a measure of the similarity between a given unassigned data block C and a given bucket B.

[0094] Assume the token size is fixed, for example, one byte.

[0095] Parse data block C to count the occurrences of each unique token that the block contains. For example, if the block contains five letters "L", then N_C("L") = 5. The total number of tokens in the block is denoted T_C. For example, if the block size is 14 bytes and the token size is fixed at 2 bytes, then:

[0096] T_C = 14 / 2 = 7

[0097] The same applies to bucket B. If bucket B contains 17 "L" tokens, this is denoted N_B("L") = 17. T_B denotes the total number of tokens in bucket B.

[0098] The pseudocode for the exemplary algorithm is as follows:

[0099] KL = 0

[0100] For each unique token x in C:

[0101] KL = KL + N_C(x) * log[N_C(x) + N_B(x)]

[0102] Return KL

[0103] It should be noted that this calculation is just an example. Any suitable metric can be used to measure similarity (e.g., mutual information or cross-entropy), or to reflect the degree to which two sets of data can be compressed together.
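The pseudocode above can be transcribed directly into runnable form, as in the following sketch. It follows the accumulation literally (token count in the block multiplied by the logarithm of the combined count); the function name `similarity_score`, the use of the natural logarithm, and the byte-string inputs are assumptions made for illustration.

```python
import math
from collections import Counter


def similarity_score(block: bytes, bucket: bytes, token_size: int = 1) -> float:
    """Literal transcription of the pseudocode for block C and bucket B."""

    def tokens(data: bytes) -> list[bytes]:
        # Fixed-size tokens, e.g. 14 bytes with 2-byte tokens gives 7 tokens.
        return [data[i:i + token_size] for i in range(0, len(data), token_size)]

    n_c = Counter(tokens(block))    # N_C(x): token counts in the block
    n_b = Counter(tokens(bucket))   # N_B(x): token counts in the bucket
    score = 0.0
    for x, count in n_c.items():
        # Counter returns 0 for tokens absent from the bucket.
        score += count * math.log(count + n_b[x])
    return score
```

With this score, a block whose tokens already occur often in the bucket scores higher; for instance, block "BB" scores higher against a bucket holding "BR" than against one holding "RA".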

[0104] In the example of Figure 6B, since each data block 610 consists of a single ASCII character, placing blocks 610 based on similarity is relatively simple: the ASCII character in one data block 610 is either the same as or different from the ASCII character in another data block 610. In a more complex example, each data block 610 may consist of two ASCII characters. For example, based on similarity calculations, data block "BB" may be placed together with data block "BR" in a given bucket 615 and placed close together, while data blocks "BB" and "RA" may be placed farther apart in a given bucket 615, or may be assigned to different buckets 615.

[0105] In step 506, after all data blocks 610 have been placed in bucket 615, publisher 108 generates a DO identifier for retrieving each DO from the corresponding data block 610 in bucket 615. This can be performed using steps 508-512.

[0106] In step 508, a unique identifier (referred to herein as a fingerprint value) is generated to identify each data block 610 within each data bucket 615. This can be performed using a fingerprint function. An exemplary fingerprint function for a given bucket 615 is as follows:

[0107] F(x) = x mod p

[0108] Here, x represents the value of data block 610, mod is the modulo operator, and p is a prime number. The value p can be called the characteristic value of a given bucket 615. For example, the value p of any given bucket 615 can be determined probabilistically or through trial and error. It should be noted that p should be greater than the number of unique data blocks 610 in bucket 615 so that unique data blocks 610 are uniquely distinguishable. It should also be noted that different buckets 615 can have the same characteristic value p.

[0109] To implement the exemplary fingerprint function described above, the smallest possible value p is sought for a given bucket 615 such that a distinct fingerprint value F(x) is obtained for each unique data block 610 within the bucket. Choosing the smallest possible p keeps the fingerprint values small, thereby reducing the number of bits required to represent each fingerprint value. In general, the minimum number of bits required to reference a data block 610 within a given bucket 615 using a fingerprint value is the number of binary bits required to represent the characteristic value p, which can be calculated as log2 of p rounded up to an integer.

[0110] This number of bits can be called the fingerprint size, or c, and can be represented as:

[0111] c = roundup(log2(maximum fingerprint in bucket))

[0112] Figure 6C shows an exemplary implementation of the fingerprint function F(x) described above. In this example, 3 can be used as the characteristic value p to uniquely identify the data blocks 610 in bucket #0. In other words: F(x) = x mod 3.

[0113] x is the numerical representation of each data block 610. In this example, x is the numerical equivalent of the ASCII character included in each data block 610. For example, the numerical value of the ASCII character "Z" is 90. Therefore, the fingerprint value of the data block 610 including "Z" is:

[0114] F(90) = 90 mod 3 = 0

[0115] Furthermore, the characteristic value p = 3 leads to:

[0116] c = roundup(log2(3)) = 2
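The trial-and-error search for p and the resulting fingerprint size c can be sketched as follows, using the contents of bucket #0 ("Z" and the space character) as input. This is one possible search under the stated fingerprint function F(x) = x mod p; the function names are hypothetical.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small values of p."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def smallest_characteristic_value(block_values: set[int]) -> int:
    """Smallest prime p giving a distinct x mod p for every unique block value."""
    p = 2
    while True:
        if is_prime(p) and len({x % p for x in block_values}) == len(block_values):
            return p
        p += 1


def fingerprint_size(p: int) -> int:
    """Bits needed for values 0..p-1; equals roundup(log2(p)) for p >= 2."""
    return (p - 1).bit_length()


bucket0 = {ord("Z"), ord(" ")}              # unique block values in bucket #0
p = smallest_characteristic_value(bucket0)  # p = 2 collides (both even), p = 3 works
c = fingerprint_size(p)
fingerprints = {chr(x): x % p for x in bucket0}
```

The search always terminates, because any prime larger than the largest block value maps distinct values to distinct remainders.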

[0117] Similarly, the corresponding characteristic value p and fingerprint size c of each bucket 615 can be used to calculate the fingerprint value for each data block 610 in that bucket 615. Figure 6C shows the fingerprint values 620. It can be noted that data blocks 610 containing the same data (e.g., the same ASCII characters) have the same fingerprint value 620. Furthermore, it can be noted that a fingerprint value 620 uniquely identifies a data block 610 only within a given bucket 615 (e.g., "Z" in bucket #0 and "E" in bucket #1 have the same fingerprint value 0).

[0118] The above describes an exemplary fingerprint function. Different fingerprint functions can be used. Generally, a fingerprint function should be one that, when applied to each data block 610, generates a unique value for each unique data block 610 in a given bucket 615. Furthermore, a fingerprint function can be chosen such that the generated fingerprint value 620 is relatively small to reduce the number of bits required to represent the fingerprint value 620. It should be understood that, depending on the fingerprint function used, the bucket feature value p and the fingerprint size c can have different meanings, the bucket can be represented in different ways, or there can be no bucket feature value.

[0119] Using a fingerprint function, a unique identifier (referred to herein as fingerprint value 620) is generated to identify each data block 610 within each data bucket 615. By referencing a specific bucket 615 and a specific fingerprint value 620, a specific desired data block 610 (e.g., to be stored and transmitted, as further described below) can be identified.

[0120] At 510, the fingerprint values 620 corresponding to each data block 610 of DO 605 are combined. The combined fingerprint values 620 can be represented by simply concatenating the bits of the fingerprint values 620 in the order in which the corresponding data blocks 610 appear in DO 605. Alternatively, the fingerprint values 620 can be concatenated according to the order of the buckets 615.

[0121] Referring again to Figure 6C, DO 605 "ZEBRA", for example, can be identified by combining the fingerprint values 620 of the corresponding data blocks 610 "Z", "E", "B", "R" and "A" in each data bucket 615, as shown in the table below:

[0122]
Data block     Z       E       B         R       A
Fingerprint    0b00    0b00    0b0000    0b01    0b010
Bucket         #0      #1      #2        #3      #4

[0123] Here, the symbol "0b" represents a binary integer. The bucket 615 of each data block 610 is indicated in parentheses.

[0124] In example “ZEBRA”, the corresponding data block 610 is placed into bucket 615 in bucket order that matches the original order in which data block 610 appears in DO 605:

[0125]
Data block        Z    E    B    R    A
Original order    0    1    2    3    4
Bucket order      0    1    2    3    4

[0126] The index number starts from 0.

[0127] However, this is not necessarily the case. For example, DO 605 "CAT" has corresponding data blocks "C", "A", and "T", as well as two empty blocks, which are placed in buckets 615 in a bucket order different from the original order, as shown in the tables below:

[0128]
Data block     C         A        T       (empty)   (empty)
Fingerprint    0b0001    0b010    0b00    0b10      0b010
Bucket         #2        #4       #3      #0        #1

[0129]
Data block        C    A    T    (empty)   (empty)
Original order    0    1    2    3         4
Bucket order      2    4    3    0         1

[0130] Since data block 610 is placed into data bucket 615 based on similarity rather than order in step 504, a given DO 605 may not be recoverable simply by sequentially retrieving data block 610 from data bucket 615.

[0131] In step 512, a permutation token is generated. The permutation token provides information for reordering the corresponding data blocks 610 retrieved from each bucket 615 in order to recover the required DO 605. In the "CAT" example above, the permutation token indicates that the corresponding data block 610 retrieved from bucket #2 should be first, followed by the corresponding data block 610 retrieved from bucket #4, and so on.

[0132] In one example, the permutation token could be based on the Lehmer permutation coding algorithm. Lehmer codes are a technique for encoding specific permutations in a sequence of numbers. More broadly, Lehmer codes are a method of representing choices from a sequence of n numbers, where each choice reduces the number of remaining numbers available for subsequent choices.

[0133] Figure 6D provides an example illustrating this technique. In this example, DO 605 "CAT" is recovered by retrieving the corresponding data blocks 610 from the corresponding buckets 615 according to the bucket order: [#2,#4,#3,#0,#1]. After retrieving the first corresponding data block 610 "C" from bucket #2, bucket #2 is no longer available (as described in step 504 above, each bucket 615 contains only one corresponding data block 610 from each DO 605), which is indicated in Figure 6D by shading. Furthermore, the remaining buckets 615 can be renumbered as buckets #0-#3. Similarly, the corresponding data block 610 "A" is then retrieved from bucket #4, which has been renumbered as #3, and that bucket is subsequently no longer available (as indicated by shading). This process continues until all corresponding data blocks 610 have been retrieved from the appropriate buckets 615. Based on the order in which each corresponding data block 610 is retrieved from buckets 615, and taking into account the renumbering of buckets 615 at each step, the permutation can be represented by the following sequence: [2,3,2,0,0].

[0134] From this sequence, a permutation token T can be generated by weighting each element of the sequence by a factorial and summing, as shown below:

[0135] T = 2*4! + 3*3! + 2*2! + 0*1! = 70

[0136] This can be represented by the binary integer 0b1000110.

[0137] It may be noted that, in this example with five buckets, the value of T is bounded below by 0 and is strictly less than 5! = 120. Because of this upper bound, the permutation token can be represented using a number of bits equal to log2((total number of buckets)!) rounded up to an integer. In this example, the number of bits required to represent the permutation token is roundup(log2(5!)) = 7.
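The Lehmer-code construction of the token for "CAT" can be checked with the following sketch, which encodes the bucket order [2,4,3,0,1] and decodes the token back again. The function names are hypothetical, and the decoder assumes buckets are numbered 0 to n-1.

```python
from math import factorial


def lehmer_code(bucket_order: list[int]) -> list[int]:
    """Index of each bucket among the buckets still unused at that step."""
    remaining = sorted(bucket_order)
    code = []
    for bucket in bucket_order:
        idx = remaining.index(bucket)
        code.append(idx)
        remaining.pop(idx)       # a used bucket is no longer available
    return code


def permutation_token(bucket_order: list[int]) -> int:
    """Weight the i-th Lehmer digit by (n - 1 - i)! and sum."""
    n = len(bucket_order)
    return sum(d * factorial(n - 1 - i)
               for i, d in enumerate(lehmer_code(bucket_order)))


def decode_token(token: int, n: int) -> list[int]:
    """Recover the bucket order from a token (buckets numbered 0..n-1)."""
    remaining = list(range(n))
    order = []
    for i in range(n):
        digit, token = divmod(token, factorial(n - 1 - i))
        order.append(remaining.pop(digit))
    return order


cat_order = [2, 4, 3, 0, 1]
token = permutation_token(cat_order)
token_bits = (factorial(len(cat_order)) - 1).bit_length()  # bits for 0..n!-1
```

For "ZEBRA", whose bucket order matches its original order, the same functions yield a token of 0.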

[0138] Other methods can be used to generate permutation tokens. Typically, a permutation token can be any set of bits indicating the order in which data blocks retrieved from different buckets should be ordered. For example, a permutation token can be any suitable way to represent the bucket order, or any suitable way to represent the difference between the bucket order and the original order. Permutation tokens can be generated in the manner described above because the number of bits required to represent the permutation token can be relatively small.

[0139] Although the Lehmer code is described as an example, it is not intended to be limiting. Any technique can be used to distinguish the order (or permutation) of the corresponding data blocks 610. The permutation token can be encoded using a number of bits equal to log2((total number of buckets)!) rounded up to an integer.

[0140] After generating the data block fingerprint values and a permutation token for each DO 605, a DO identifier can be generated for each DO 605 by combining the permutation token with the fingerprint values of the corresponding data blocks 610. For example, the permutation token can be concatenated with the fingerprint values, as shown in the following examples:

[0141]
Permutation token    Z       E       B         R       A
0b0000000            0b00    0b00    0b0000    0b01    0b010

[0142]
Permutation token    C         A        T       (empty)   (empty)
0b1000110            0b0001    0b010    0b00    0b10      0b010

[0143] Thus, DO "ZEBRA" can be represented by the identifier 0b00000000000000001010, which, padded with trailing zero bits to a whole number of bytes, converts to the hexadecimal value 0x0000A0; DO "CAT" can be represented by the identifier 0b100011000010100010010, which likewise converts to the hexadecimal value 0x8C2890 (where the prefix 0x denotes a hexadecimal value). Any other number base can be used to represent identifiers.
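The concatenation of the permutation token bits and the fingerprint value bits can be reproduced with the following sketch. The bit strings are copied from the "ZEBRA" and "CAT" examples above; the padding with trailing zero bits to a byte boundary is an assumption inferred from the stated hexadecimal values, and the function name is hypothetical.

```python
def do_identifier_hex(token_bits: str, fingerprint_bits: list[str]) -> str:
    """Concatenate token and fingerprint bits, then render as hexadecimal.

    Assumption: the bit string is right-padded with zeros to a whole
    number of bytes before conversion.
    """
    bits = token_bits + "".join(fingerprint_bits)
    bits += "0" * (-len(bits) % 8)          # pad to a byte boundary
    width = len(bits) // 4                  # hex digits to print
    return f"0x{int(bits, 2):0{width}X}"


zebra_id = do_identifier_hex("0000000", ["00", "00", "0000", "01", "010"])
cat_id = do_identifier_hex("1000110", ["0001", "010", "00", "10", "010"])
```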

[0144] The example above generates a DO identifier, where first there is a set of permutation token bits (representing the permutation token), followed by multiple sets of fingerprint value bits, each fingerprint value bit representing a fingerprint value corresponding to the corresponding data block of the DO. The order in which these bit sets are arranged in the DO identifier may differ from that shown above. For example, the permutation token bit set may be placed at the end of the DO identifier instead of at the beginning. In another example, the fingerprint value bit sets may be arranged based on bucket order (e.g., the fingerprint value bit for the corresponding data block in bucket #0 is followed by the fingerprint value bit for the corresponding data block in bucket #1, and so on), rather than based on the order of the corresponding data blocks in the DO. Typically, the configuration of the bits in the DO identifier can be predefined and known to publisher 108, TEE 130, and ED 150 (e.g., according to a known standard, or transmitted by publisher 108) to enable TEE 130 to parse the DO identifier and ED 150 to recover the DO, as further described below.

[0145] Publisher 108 can create references (e.g., lookup tables) to associate each DO 605 with a corresponding DO identifier. This information can be published by publisher 108 along with the compressed data (described below) to allow for proper recovery of each DO 605.

[0146] In step 514, each data bucket 615 is compressed. As described above, data blocks 610 are placed into their corresponding buckets 615 based on data similarity. Furthermore, the data blocks 610 within a given bucket 615 are sorted so that more similar data blocks 610 are placed closer together. This reordering of data blocks 610 aims to achieve the highest local similarity or the lowest cross-entropy.

[0147] For example, in Figure 6E, data blocks 610 that each contain two ASCII characters can be reordered so that more similar data blocks 610 are placed closer together. For example, data blocks 610 "BB" and "BR" are placed closer together, and data blocks 610 "RA" and "CA" are also placed closer together.

[0148] In some examples, data blocks 610 may be sorted by similarity within a given data bucket 615 as part of step 504. For example, unassigned data blocks 610 may be placed into a given bucket 615 and sorted based on similarity among other data blocks 610 already assigned to bucket 615. In some examples, after all data blocks 610 have been placed into bucket 615, the data blocks 610 within a given bucket 615 may be sorted by similarity. In still other examples, the data blocks 610 may be sorted within a given bucket 615 in step 514 as part of or just before compression. It should be noted that the order of data blocks 610 within data bucket 615 does not affect the generation of the DO identifier described above.

[0149] Grouping and sorting the data blocks 610 in bucket 615 in the manner described above can be used to achieve a higher compression ratio. Each bucket 615 includes data blocks 610 with relatively high mutual information (generally considered to have high similarity), and within each bucket 615, the blocks 610 are sorted in such a way that blocks 610 with high mutual information tend to be closer to each other. These characteristics are expected to improve the compression ratio when bucket 615 is compressed using a suitable adaptive statistical coding compression algorithm.

[0150] In some examples, compression of data bucket 615 can result in fewer data blocks 610 than were originally allocated to bucket 615. For instance, if bucket 615 contains two identical data blocks 610, then only one instance of data block 610 may be retained during compression. For example, bucket #0 in Figure 6C includes one "Z" data block 610 instance and four " " (space character) data block 610 instances. The compressed data of bucket #0, when decompressed, may include one "Z" instance and one " " (space character) instance. Since the fingerprint value 620 is used to identify data block 610 within a given bucket 615, and identical data blocks 610 have the same fingerprint value 620, it is not necessary to retain multiple instances of the same data block 610 in the compressed data. This can be used not only to improve compression but also, as further described below, to reduce the amount of data that must be streamed from server 110 to TEE 130.
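The retention of a single instance per fingerprint value can be illustrated with the following sketch for bucket #0. It assumes the exemplary fingerprint function F(x) = x mod p with p = 3, and it interprets each block as a big-endian integer (for one-byte blocks this is the ASCII code); the function name is hypothetical.

```python
def deduplicate_bucket(blocks: list[bytes], p: int) -> dict[int, bytes]:
    """Keep one block per fingerprint value F(x) = x mod p."""
    unique: dict[int, bytes] = {}
    for block in blocks:
        fingerprint = int.from_bytes(block, "big") % p
        # Identical blocks share a fingerprint, so the first one suffices.
        unique.setdefault(fingerprint, block)
    return unique


bucket0_blocks = [b"Z", b" ", b" ", b" ", b" "]
retained = deduplicate_bucket(bucket0_blocks, p=3)
```

The five blocks originally placed in bucket #0 reduce to two retained blocks, keyed by fingerprint values 0 and 2.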

[0151] In step 516, publisher 108 publishes the compressed data of dataset 600, for example, by sending the compressed data for each bucket 615 to third-party server 110. Publisher 108 may publish the dataset ID along with the compressed data. Publisher 108 may also provide information about buckets 615, which will be used by server 110 (specifically, TEE 130 within server 110) in response to queries for DO 605 in the dataset. For example, such information about buckets 615 may include the compression algorithm used, the characteristic value p for each bucket 615, the fingerprint function used to generate fingerprint values for data blocks 610 within bucket 615, and optionally the number of bits c encoding the fingerprint value for each bucket 615 (if c is not provided by publisher 108, c may be calculated based on the characteristic value p, as described above). The way publisher 108 generates DO identifiers may be predefined and known to TEE 130, enabling TEE 130 to extract relevant information from the DO identifiers, as further described below.

[0152] Publisher 108 may also provide ED 150 with a dataset ID (e.g., published on a third-party server 110 and / or other publicly available databases), which ED 150 can use in queries against server 110. Publisher 108 also provides available information about DO identifiers, such as in the form of lookup tables or other references, which allows the DO identifiers of desired DOs in the dataset to be determined. In some examples, publisher 108 may not need to provide server 110 with any information referencing the DO identifier of a specific DO 605 in the published dataset 600.

[0153] After executing the exemplary method 500, the dataset 600 becomes available on the third-party server 110 so that it can be queried by one or more ED 150s and distributed to one or more ED 150s.

[0154] Figure 7 is a signaling diagram illustrating an example of content distribution in accordance with examples described herein. The operations of Figure 7 can occur after the dataset has been compressed and published, as described above with reference to Figure 5. The example of Figure 7 involves operations of server 110, multiple TEEs 130 within server 110, and ED 150. Server 110 stores the dataset on insecure storage within server 110 (e.g., storage unit 124 of Figure 2). The dataset is stored at server 110 as compressed buckets that can be streamed, where the compressed buckets have been generated using, for example, method 500. The number of TEEs 130 within server 110 is at least equal to the number of buckets storing the dataset; the number of TEEs 130 can be greater. Each TEE 130 has a pair of public and private keys that has been digitally signed by the CA of the TEE's manufacturer. The private key is protected by hardware features, and the digital signature can be verified to prove that the key is indeed a protected TEE key. The operator of server 110 is considered to be HBC. It is assumed that the operator of server 110 has information about the content stored on server 110 and the content sent from insecure storage, but not information about the operations performed using TEEs 130 within server 110.

[0155] A request for a specific DO within the dataset can begin with a handshake between ED 150 and server 110. The handshake is similar to that described above with reference to Figure 4 and is described only briefly here. ED 150 sends a handshake request to server 110, including the ID of the dataset containing the DO of interest and the requester's public key (ReqPU). Server 110 responds with a handshake message that includes the public key (TEE PU) of TEE 130 and the digital signature of the TEE PU. ED 150 verifies the digital signature using the public key provided by the CA of TEE 130. If ED 150 successfully verifies the digital signature, ED 150 can continue to request the DO of interest. If the digital signature is not verified, ED 150 can abort the request.

[0156] After verifying the digital signature, ED 150 can proceed to send the query to server 110. In step 702, ED 150 sends the query to server 110. The query includes an encrypted DO ID (e.g., encrypted with a TEE PU). It should be noted that the DO ID included in the query in step 702 is encoded with information identifying the corresponding data block of the DO stored in each bucket (e.g., a fingerprint value as described above), and the DO ID is also encoded with information used to sort the corresponding data blocks to recover the DO (e.g., a permutation token as described above). Specifically, the DO ID can be an identifier generated using exemplary method 500.

[0157] Upon receiving a query, server 110 allocates TEEs 130 to process the query. Specifically, server 110 has information about which dataset (but not which DO within the dataset) is being queried and about the number of buckets storing the dataset's DOs. Therefore, server 110 allocates one TEE 130 to process each bucket associated with the dataset, such that the total number of allocated TEEs 130 equals the total number of buckets associated with the dataset. Each TEE 130 is also provided with information about the characteristic value p of the bucket to which it is allocated. For example, the characteristic value p of each bucket may be known to server 110 (e.g., published by the publisher), and server 110 may provide the appropriate information to each TEE 130.

[0158] At 704, the encrypted query, including the DO ID, is forwarded to each TEE 130. It may be noted that the same encrypted query is forwarded to every TEE 130 because server 110 does not know which TEE 130 will be responsible for retrieving which data block from its assigned bucket.

[0159] At 706, each TEE 130 decrypts the query (using a TEE private key (TEE PR)). Each TEE 130 can parse the DO ID based on its known configuration. For example, each TEE 130 can identify which bits in the DO ID correspond to the set of fingerprint value bits associated with its assigned bucket. Each TEE 130 thereby determines, from the DO ID, the correct fingerprint value for the corresponding data block in its assigned bucket.

[0160] At 708, server 110 streams the compressed data from each bucket to the TEE 130 assigned to that bucket.

[0161] Each TEE 130 receives a compressed data stream from server 110 for its assigned bucket. For simplicity, the operation of a single given TEE 130 assigned to a given bucket will now be described. It should be understood that the following operations are performed at each TEE 130 assigned to each corresponding bucket. At 710, TEE 130 decodes (or decompresses) the compressed data stream (e.g., using the token frequency table described above) to recover the data blocks stored in the assigned bucket. TEE 130 calculates a fingerprint value for each data block (e.g., using a fingerprint function) and compares each calculated fingerprint value with the correct fingerprint value previously determined for the corresponding data block. TEE 130 stores (e.g., caches in a buffer) only the data block whose calculated fingerprint value matches the correct fingerprint value.
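The decode-and-match operation at 710 can be sketched as a streaming filter. The sketch below makes several assumptions that are not taken from the description: zlib stands in for the token-frequency-table decoder, data blocks have a fixed size `BLOCK_SIZE`, and the fingerprint function is a truncated SHA-256 digest. All function names are hypothetical.

```python
import hashlib
import zlib
from typing import Iterable, Iterator, Optional

BLOCK_SIZE = 8  # hypothetical fixed data block size, in bytes
FP_BYTES = 8    # hypothetical fingerprint width, in bytes


def fingerprint(block: bytes) -> int:
    """Stand-in fingerprint function: leading bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(block).digest()[:FP_BYTES], "big")


def _decompressed(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Incrementally decompress the stream (zlib replaces the patent's decoder)."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        yield decomp.decompress(chunk)
    yield decomp.flush()


def iter_blocks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield fixed-size data blocks without holding the whole bucket in memory."""
    pending = b""
    for piece in _decompressed(chunks):
        pending += piece
        while len(pending) >= BLOCK_SIZE:
            block, pending = pending[:BLOCK_SIZE], pending[BLOCK_SIZE:]
            yield block


def select_block(chunks: Iterable[bytes], wanted_fp: int) -> Optional[bytes]:
    """Keep only the block whose fingerprint matches; discard all others."""
    match = None
    for block in iter_blocks(chunks):
        # No early exit: the whole stream is consumed before a result exists.
        if match is None and fingerprint(block) == wanted_fp:
            match = block
    return match
```

Because only `pending` and at most one matching block are held at any time, memory use in this sketch is bounded by the block size rather than by the bucket size, which is the property the streaming design relies on.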

[0162] After each TEE 130 has decoded its data stream and stored its corresponding data block, all data blocks corresponding to the requested DO are held in the TEEs 130 (e.g., cached in buffers). At 712, each TEE 130 encrypts its correct data block (e.g., using the public key associated with the request (ReqPU), provided by ED 150 during the handshake process). At 714, each TEE 130 sends its encrypted data block to server 110 (i.e., from the protected environment of that TEE 130 to the general environment of server 110).

[0163] Server 110 collects responses from all TEEs 130 that have been assigned to buckets. At 716, server 110 sends encrypted data blocks to ED 150. The encrypted data blocks may include information identifying which data block originated from which bucket (e.g., by labeling the encrypted data blocks with the corresponding bucket number). This information can be used by ED 150 to reorder the data blocks at 720 below. Alternatively, server 110's transmissions may implicitly indicate which encrypted data block originated from which bucket, for example, by sorting the encrypted data blocks in transmission according to the bucket's numerical order, rather than explicitly providing this information.

[0164] After receiving the encrypted data blocks, ED 150 decrypts the data blocks (e.g., using the private key associated with the request (ReqPR)) at 718 and recovers the DO by reordering the corresponding data blocks using the permutation token that is part of the DO ID at 720. As mentioned above, the DO ID includes a set of permutation token bits at a specific position (e.g., at the beginning) in the DO ID. ED 150 knows how the DO ID is configured, so ED 150 can parse the DO ID to extract the permutation token. Although 718 and 720 are shown in a specific order in Figure 7, it should be understood that in some examples 720 can be executed before 718. That is, ED 150 can first use the permutation token to restore the correct order of the data blocks in the DO, and then use ReqPR to decrypt the data blocks to recover the DO.
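The reordering at 720 can be sketched as follows. The description does not state how the permutation token encodes the bucket order; the sketch assumes, purely for illustration, that the token is the index of the permutation in the factorial number system (a Lehmer code) and that the i-th entry of the decoded permutation is the bucket holding the i-th data block of the DO. The function names are hypothetical.

```python
import math


def encode_permutation(perm: list[int]) -> int:
    """Encode a bucket order as an integer index (Lehmer code, assumed format)."""
    items = list(range(len(perm)))
    token = 0
    for i, bucket in enumerate(perm):
        idx = items.index(bucket)
        items.pop(idx)
        token += idx * math.factorial(len(perm) - 1 - i)
    return token


def decode_permutation(token: int, n: int) -> list[int]:
    """Decode the integer index back into the bucket order."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        idx, token = divmod(token, math.factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def recover_do(token: int, blocks_by_bucket: dict[int, bytes]) -> bytes:
    """Concatenate decrypted blocks in the order given by the permutation token."""
    order = decode_permutation(token, len(blocks_by_bucket))
    return b"".join(blocks_by_bucket[bucket] for bucket in order)
```

Here `blocks_by_bucket` corresponds to the bucket labels that server 110 attaches (explicitly or by transmission order) to the returned data blocks. The same `order` could equally be applied to still-encrypted blocks when 720 is executed before 718.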

[0165] Figure 7 illustrates the operations and interactions between server 110, its TEEs 130, and ED 150 for ease of understanding. In some aspects, the invention can be embodied in operations performed at any one of these entities (for the purposes of this description, TEE 130 is considered a separate entity from server 110, although TEE 130 can be a physical part of server 110). Flowcharts describing the operations at server 110, at a given TEE 130 of server 110, and at ED 150 are now presented.

[0166] Figure 8 is a flowchart illustrating an exemplary method 800 that can be performed by server 110 as part of the operations of Figure 7. In particular, method 800 can be executed by a carrier server program running on server 110, outside of TEE 130. Details of some steps were provided in the description of Figure 7 and are not repeated here.

[0167] In the flowchart shown, it is assumed that server 110 has received and stored compressed data published by publisher 108, and has received the dataset ID from ED 150 (e.g., during the handshake process). In some examples, the steps of receiving compressed data from publisher 108 and / or receiving the dataset ID from ED 150 can be considered as part of method 800.

[0168] At 802, server 110 receives a query from ED 150 (e.g., similar to 702 of Figure 7). The query includes an encrypted DO ID (e.g., encrypted with TEE PU). It should be noted that the DO ID is encoded with information identifying the corresponding data block of the DO stored in each bucket (e.g., a fingerprint value as described above), and the DO ID is also encoded with information used to sort the corresponding data blocks in order to recover the DO (e.g., a permutation token as described above).

[0169] At 804, server 110 assigns a TEE 130 to each data bucket containing the compressed dataset. Server 110 forwards the query, with the DO ID still encrypted, to each TEE 130 (e.g., similar to 704 of Figure 7).

[0170] At 806, server 110 streams compressed data from each bucket to the TEE 130 assigned to that bucket (e.g., similar to 708 of Figure 7). It should be noted that the compressed data from the buckets can be streamed in parallel.

[0171] At 808, server 110 receives an encrypted data block from each TEE 130 (e.g., similar to 714 of Figure 7). Encrypted data blocks can be received in parallel from all TEEs 130. An encrypted data block may be received from a TEE 130 only after all data has been streamed at 806. This helps protect data privacy by ensuring that server 110 cannot infer which part of the data stream corresponds to an encrypted data block. Server 110 collects the encrypted data blocks from all TEEs 130 allocated at 804.
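The server-side flow of 804 to 808 can be sketched with one worker per bucket. This is a simplified stand-in rather than an implementation: threads take the place of TEEs 130, the query is an opaque dictionary that only the worker reads (in the described system it would be encrypted with TEE PU), block equality replaces the fingerprint comparison, and the returned block is not encrypted. The names `tee_worker` and `serve_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def tee_worker(bucket_no, stream, opaque_query):
    """Stand-in for one TEE: consume the whole stream, then return one block."""
    wanted = opaque_query[bucket_no]  # stands in for decrypting and parsing the DO ID
    kept = None
    for block in stream:
        if block == wanted:  # placeholder for the fingerprint comparison
            kept = block
    # The result only becomes available once the stream has been exhausted.
    return bucket_no, kept


def serve_query(buckets, opaque_query):
    """Assign one worker per bucket, stream in parallel, then collect the results."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        futures = [
            pool.submit(tee_worker, bucket_no, iter(bucket), opaque_query)
            for bucket_no, bucket in enumerate(buckets)
        ]
        results = [future.result() for future in futures]
    # Returning blocks in bucket-number order implicitly labels their origin.
    return [block for _, block in sorted(results, key=lambda item: item[0])]
```

Since each worker returns only after iterating over its entire stream, the timing of the result in this sketch does not depend on where the matching block sat in the bucket, which mirrors the privacy rationale given for 808.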

[0172] At 810, server 110 sends the still-encrypted data blocks to ED 150 (e.g., similar to 716 of Figure 7). The transmission may be in the form of a response to the query received at 802.

[0173] It should be noted that, throughout the operations of Figure 7 and method 800, server 110 may not be aware of the format or configuration of the DO ID. In particular, server 110 may not be aware that data blocks have been placed into buckets based on similarity, or that the DO ID includes a permutation token.

[0174] Figure 9 is a flowchart illustrating an exemplary method 900 that can be performed by a given TEE 130 within server 110 as part of the operations of Figure 7. Specifically, method 900 can be executed by a TEE program in TEE 130 and is protected from inspection by server 110. Details of some steps were provided in the description of Figure 7 and are not repeated here.

[0175] In the flowchart shown, it is assumed that TEE 130 has been assigned a specific bucket and has been informed of its assigned bucket number (or other identifier). In some examples, the step of receiving the assigned bucket identifier may be included in method 900.

[0176] At 902, TEE 130 receives the encrypted DO ID from server 110 (e.g., similar to 704 of Figure 7). TEE 130 decrypts the DO ID (e.g., using its private key). As described above, the DO ID is encoded with information identifying the corresponding data block of the DO stored in each bucket (e.g., the fingerprint value as described above), and the DO ID is also encoded with information used to sort the corresponding data blocks in order to recover the DO (e.g., the permutation token as described above).

[0177] At 904, TEE 130 determines, from the decrypted DO ID, the correct fingerprint value of the data block belonging to its assigned data bucket (e.g., similar to 706 of Figure 7). For example, TEE 130 can parse the DO ID according to its known format in order to determine the set of fingerprint value bits (and therefore the fingerprint value) belonging to its assigned bucket. It can be noted that, depending on the format of the DO ID, TEE 130 may need to use the permutation token encoded in the DO ID to determine which fingerprint value bits encoded in the DO ID correspond to its assigned bucket.

[0178] At 906, TEE 130 receives from server 110 a data stream including the data blocks belonging to its assigned bucket (e.g., similar to 708 of Figure 7). The data stream can be compressed data, in which case TEE 130 decodes (or decompresses) the streamed data.

[0179] At 908, TEE 130 calculates the fingerprint value of each data block in the data stream and identifies the specific data block whose fingerprint value matches the correct fingerprint value determined at 904 (e.g., similar to 710 of Figure 7). The identified data block is cached in a buffer of TEE 130. Other data blocks may be discarded.

[0180] At 910, TEE 130 encrypts the data block (e.g., using the public key provided by ED 150) and sends the encrypted data block from TEE 130 to server 110 (e.g., similar to 712 and 714 of Figure 7).

[0181] Figure 10 is a flowchart illustrating an exemplary method 1000 that can be performed by ED 150 as part of the operations of Figure 7. In particular, method 1000 can be executed by a client program running on ED 150. Details of some steps were provided in the description of Figure 7 and are not repeated here.

[0182] In the flowchart shown, it is assumed that ED 150 has verified TEE 130 (e.g., using a handshake process) and has provided the dataset ID to server 110.

[0183] At 1002, ED 150 sends a DO query to server 110 (e.g., similar to 702 of Figure 7). The query includes an encrypted DO ID (e.g., encrypted with TEE PU). It should be noted that the DO ID is encoded with information identifying the corresponding data block of the DO stored in each bucket (e.g., a fingerprint value as described above), and the DO ID is also encoded with information used to sort the corresponding data blocks in order to recover the DO (e.g., a permutation token as described above).

[0184] At 1004, ED 150 receives encrypted data blocks from server 110 in response to the query (e.g., similar to 716 of Figure 7). The data blocks can be encrypted using the public key provided by ED 150. ED 150 can decrypt the data blocks using its private key (e.g., similar to 718 of Figure 7).

[0185] At 1006, ED 150 parses the DO ID to determine the permutation token. The permutation token provides information that enables ED 150 to reorder the data blocks into the correct order to recover the requested DO.

[0186] At 1008, ED 150 reorders the decrypted data blocks to recover the requested DO (e.g., similar to 720 of Figure 7). It should be noted that in some examples, ED 150 can reorder the encrypted data blocks and then decrypt them after reordering.

[0187] While methods 800 and 900 have been described above with respect to a single DO query from one ED 150, in some examples, server 110 and TEE 130 can use similar steps to serve multiple DO queries from one or more EDs 150 in parallel. Similarly, although method 1000 has been described above with respect to a single DO query from ED 150, in some examples, ED 150 can use similar steps to query more than one DO from one or more datasets in parallel. Other variations of this type can be employed. Furthermore, in some examples, a TEE 130 can be assigned multiple buckets (e.g., from the same dataset) for parallel processing.

[0188] While this invention references TEE, it should be understood that examples can be implemented using other forms of trusted intermediate environments, including any trusted environment within a server, within an ED, or at some other network entity.

[0189] This invention describes querying a single DO using a single DO ID, wherein the DO ID encodes information for identifying the corresponding data block within a data bucket and information for reordering the corresponding data blocks to recover the DO. In some examples, the information for identifying the corresponding data blocks (e.g., a fingerprint value for each data block within each corresponding bucket) and the information for reordering the corresponding data blocks (e.g., a permutation token) can be encoded in separate pieces of data. For example, if the TEE does not require the permutation token (e.g., the TEE is able to identify the correct fingerprint value for its assigned bucket without needing to know the permutation token), the permutation token may be omitted from the DO ID sent in the query. Instead, the ED can retain the permutation token locally for reordering the data blocks.

[0190] In the various examples described herein, the present invention enables more efficient data stream grouping that takes into account statistical properties regarding the compressibility of data objects. This can lead to more efficient and / or more effective data compression.

[0191] The present invention also describes examples for generating data object identifiers using a fingerprint-based scheme. Data object identifiers generated according to the examples of the present invention can provide information to allow for the recovery of data objects, regardless of how data blocks are merged and / or reordered between data buckets. Because the data object identifier encodes information for identifying corresponding data blocks between data buckets and information for reordering data blocks, this makes it possible to reshape the data stream to achieve a higher compression ratio.

[0192] While this invention describes examples within the context of a content-delivery network (CDN), it should be understood that the invention can be applied to other fields. The examples described herein can be used to implement different open network architectures (e.g., for designing network protocols), and so on. For example, the invention can be used to protect data privacy where content IDs may be exposed during network packet transmission.

[0193] Furthermore, although the present invention describes examples in a static data context, the examples described herein can be extended to server-side dynamic request evaluation with protection from the server operator.

[0194] Although the present invention describes methods and processes using steps in a certain order, one or more steps of the methods and processes may be omitted or modified as appropriate. One or more steps may be performed in an order other than that described, as appropriate.

[0195] Although the invention has been described, at least in part, in terms of methods, those skilled in the art will understand that the invention also relates to the various components for performing at least some aspects and features of the described methods, whether by means of hardware components, software, or any combination of the two. Accordingly, the technical solutions of the invention can be embodied in the form of a software product. Suitable software products can be stored in pre-recorded storage devices or other similar non-volatile or non-transitory computer-readable media, such as DVDs, CD-ROMs, USB flash drives, removable hard drives, or other storage media. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, server, or network device) to perform the method examples disclosed herein.

[0196] The invention may be embodied in other specific forms without departing from the spirit of the claims. The exemplary embodiments described are to be regarded in all respects as illustrative rather than restrictive. Selected features from one or more of the foregoing embodiments may be combined to create alternative embodiments not explicitly described, and it is understood that features suitable for such combinations are within the scope of the invention.

[0197] All values and sub-ranges within disclosed ranges are also disclosed. Furthermore, while the systems, devices, and processes disclosed and illustrated herein may include a specific number of elements / components, these systems, devices, and components may be modified to include more or fewer such elements / components. For example, while any element / component disclosed may be referenced as a single quantity, embodiments disclosed herein may be modified to include multiple such elements / components. The subject matter described herein is intended to cover and include all suitable technical modifications.

Claims

1. A method for a trusted environment within a server, characterized in that it comprises: receiving, in the trusted environment, a data object identifier that identifies a requested data object, wherein the requested data object is stored as a plurality of corresponding data blocks in a plurality of corresponding data buckets, and the data object identifier is encoded with information identifying each of the plurality of corresponding data blocks within its corresponding data bucket; receiving, in the trusted environment, a data stream corresponding to an allocated data bucket that has been allocated to the trusted environment, the data stream including all data blocks stored in the allocated data bucket; using the information identifying each of the plurality of corresponding data blocks, determining which data block in the data stream is the corresponding data block received from the allocated data bucket; and sending the corresponding data block from the trusted environment to the server; wherein the information identifying the plurality of corresponding data blocks consists of a plurality of fingerprint values, each fingerprint value uniquely identifying the corresponding data block within a corresponding data bucket; and wherein determining the corresponding data block includes: determining a correct fingerprint value from the data object identifier, the correct fingerprint value identifying the corresponding data block of the allocated data bucket; determining a fingerprint value for each data block in the data stream; and comparing the fingerprint value of each data block in the data stream with the correct fingerprint value to determine the corresponding data block.

2. The method according to claim 1, characterized in that it further comprises: receiving the data stream from the server into the trusted environment.

3. The method according to claim 1 or 2, characterized in that the data object identifier is also encoded with information for sorting the corresponding data blocks to recover the requested data object.

4. The method according to claim 3, characterized in that determining which data block among the data blocks is the corresponding data block includes: using the information for sorting the corresponding data blocks to determine the information identifying the corresponding data block of the allocated data bucket.

5. The method according to claim 3, characterized in that the information for sorting the corresponding data blocks is a permutation token.

6. The method according to any one of claims 1-2 and 4-5, characterized in that it further comprises: receiving the data object identifier in the trusted environment as an encrypted data object identifier; decrypting the data object identifier using the private key of the trusted environment; and sending the corresponding data block from the trusted environment to the server as an encrypted corresponding data block.

7. The method according to any one of claims 1-2 and 4-5, characterized in that it further comprises: performing the steps of receiving the data object identifier, determining the corresponding data block, and sending the corresponding data block at least partially in parallel for a first received data object identifier and a second received data object identifier.

8. The method according to any one of claims 1-2 and 4-5, characterized in that the data stream includes compressed data, and the method further includes decompressing the compressed data.

9. A method for a publishing server, characterized in that it comprises: for each given data object among a plurality of data objects: dividing the given data object into a plurality of corresponding data blocks; and allocating each of the plurality of corresponding data blocks to a corresponding data bucket among a plurality of data buckets based on the similarity between each corresponding data block and any other data block allocated to the corresponding data bucket; wherein all data blocks of all data objects are allocated to one of the plurality of data buckets; for each given data object, generating a data object identifier for recovering the given data object from the plurality of data buckets, the data object identifier being encoded with information identifying each of the plurality of corresponding data blocks within the corresponding data bucket; wherein generating the data object identifier includes: for each given data bucket, determining a fingerprint value for each data block allocated to the given data bucket, the fingerprint value uniquely identifying each corresponding data block within the given data bucket; and for each given data object, encoding the fingerprint value of each corresponding data block in the data object identifier; wherein, for each given data object, the data object identifier is also encoded with information for sorting the corresponding data blocks to recover the requested data object, and generating the data object identifier further includes: determining a bucket order, each corresponding data block being allocated to a corresponding data bucket according to the bucket order; generating a permutation token representing the bucket order; and encoding the permutation token in the data object identifier; compressing each data bucket into a corresponding compressed dataset; and publishing the compressed datasets and the generated data object identifiers.

10. The method according to claim 9, characterized in that the data blocks assigned to each corresponding data bucket are sorted according to similarity within each corresponding data bucket.

11. The method according to claim 9 or 10, characterized in that each data bucket has been allocated a single corresponding data block from each data object.

12. The method according to claim 9 or 10, characterized in that each data bucket is compressed using adaptive statistical coding compression.

13. A method for use in an electronic device, characterized in that it comprises: sending a query for a data object, the query including a data object identifier, wherein the data object identifier sent in the query is encoded with information identifying each of a plurality of data blocks within a corresponding data bucket, and the data object identifier is also encoded with information for sorting the data blocks; receiving the plurality of data blocks in response to the query for the data object; determining, from the data object identifier associated with the data object, the information for sorting the data blocks to recover the data object, the information for sorting the data blocks including a permutation token encoded in the data object identifier; and reordering the data blocks to recover the data object.

14. The method according to claim 13, characterized in that it further comprises: receiving the plurality of data blocks as a plurality of encrypted data blocks; and decrypting the plurality of encrypted data blocks using a private key.