A method for supporting correlation analysis and active recommendation of ciphertext storage data
By constructing an undirected graph of labels and a global inverted index for encrypted files, combined with public-key encryption technology, the problem of inaccurate user queries in encrypted text retrieval is solved. This enables encrypted text association analysis and recommendation in an untrusted cloud server environment, ensuring data security and query efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP
- Filing Date
- 2023-12-12
- Publication Date
- 2026-06-26
AI Technical Summary
Existing encrypted retrieval technologies cannot perform encrypted correlation analysis and recommendations in an untrusted cloud server environment, resulting in inaccurate and inconvenient user queries.
By constructing an undirected graph of labels for encrypted files, recommendations are made based on label correlation, and search results are fed back using a global inverted index priority queue, combined with public-key encryption technology to ensure data security.
Without disclosing sensitive data, it enables association analysis and retrieval recommendation of encrypted keywords, improving the convenience and accuracy of user queries.
Figure CN117668367B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data search technology, and more specifically to a method for supporting association analysis and proactive recommendation of ciphertext storage data.
Background Technology
[0002] Association analysis in plaintext data search is a practical analytical technique for discovering objective patterns in large amounts of data, as in shopping basket analysis. By analyzing customers' purchasing and query habits, user models are built, marketing strategies are formulated, and personalized services are provided. Association analysis analyzes user data implicitly, whereas a search system filters information explicitly by requiring users to supply keywords; both aim to solve the problem of information overload. When users cannot fully express their needs as keywords, a search system cannot deliver its full value. Recommendation technology can compensate for this shortcoming, but it is usually not independent and relies on search technology, as in Amazon's e-commerce recommendations, Youku's movie recommendations, Douban's music recommendations, and Toutiao's news recommendations. Recommendation algorithms for plaintext search vary greatly, but they share the same goal: to provide better recommendations for users while maximizing the benefit to information owners.
[0003] Searchable encryption technology supports direct querying of encrypted stored data and is valued by both industry and academia. It has developed into a variety of encrypted retrieval applications. However, because a ciphertext is queried with a ciphertext, searchable encryption can only achieve exact matching, whereas in real-world scenarios users may not remember keywords completely or accurately. Furthermore, encrypted search servers cannot perform ciphertext association analysis. In other words, deterministically encrypted ciphertext indexes cannot provide search result recommendations as ordinary search engines do, which limits the efficiency of search services.
Summary of the Invention
[0004] The security requirements of searchable encryption stipulate that a search must not disclose any information other than the search pattern and the access pattern. The "search pattern" is information extracted from the user's search data, such as whether multiple searches correspond to the same phrase (in encrypted form). The "access pattern" is which keywords (in encrypted form) are contained in which file (identified by its unique identifier). Under the searchable encryption model, these two patterns are the information that is allowed to be disclosed after a user performs a search. This invention aims to provide a method for supporting association analysis and proactive recommendation of ciphertext storage data that works within the security premise of the search and access patterns, balancing the security of user-stored data against making full use of the search service.
[0005] This invention provides a method for supporting association analysis and proactive recommendation of ciphertext storage data, comprising the following steps:
[0006] S1, the uploading user uploads the encrypted file and the encrypted index of the encrypted file to the cloud server. The encrypted index contains the ciphertext of each keyword in the encrypted file, and each keyword ciphertext is regarded as a tag of the encrypted file.
[0007] S2, the cloud server constructs an undirected tag graph using the tags of the encrypted files;
[0008] S3, the search user submits search tags to the cloud server; the cloud server analyzes the undirected tag graph based on the search tags and extracts recommended tags with high relevance;
[0009] S4, the encrypted files corresponding to the search tags and the recommended tags are retrieved.
[0010] Furthermore, in step S2, the cloud server constructing an undirected tag graph using the tags of the encrypted files includes:
[0011] When the cloud server receives an encrypted file, it combines the n tags carried by the encrypted file pairwise into <tagi, tagj> pairs, so that a single encrypted file has n(n-1)/2 <tagi, tagj> pairs;
[0012] The cloud server uses the n(n-1)/2 <tagi, tagj> pairs of the encrypted file to update the undirected tag graph that represents the associations between different tags; the nodes in the undirected tag graph are tags, two connected tags form an edge, each edge represents a <tagi, tagj> pair, and the edge weight represents the frequency with which that pair occurs.
[0013] Furthermore, updating the undirected tag graph that represents the associations between different tags includes:
[0014] If two tags tagi and tagj of the uploaded encrypted file form a pair <tagi, tagj>, the cloud server determines whether an edge representing the pair <tagi, tagj> already exists in the current undirected tag graph. If no such edge exists, it creates a new edge joining the two nodes tagi and tagj to represent the pair; if such an edge already exists, it increments the weight of that edge by 1.
[0015] Furthermore, in step S3, extracting the recommended tags with high relevance includes:
[0016] In the undirected tag graph, querying all tags adjacent to the search tag;
[0017] Taking as recommended tags the adjacent tags whose edges to the search tag have the higher weights.
[0018] Furthermore, when a user submits a query request for several search tags, the cloud server first traverses the tag graph to find all tags adjacent to the search tag, sorts the adjacent tags according to the edge weight between the search tag and the adjacent tags, analyzes the tags with high relevance to the submitted search tag, and selects the tags with the highest weight as recommended tags.
[0019] Furthermore, in step S4, the encrypted files corresponding to the search tags and recommendation tags are retrieved, including:
[0020] The cloud server maintains a global inverted index in which the tag is the key and the filename or file pointer of the encrypted file is the value. For each tag, the queue of filenames or file pointers is ordered as a max-heap according to the frequency of association between the tag and the encrypted files.
[0021] The cloud server queries the global inverted index for search tags and recommendation tags, reads the file priority queue of these tags, outputs several items at the head of the queue, and completes the search result feedback.
[0022] Furthermore, when a user uploads an encrypted file, the tags are encrypted using the public key pk that the cloud server sent to the user in advance; when a user searches using a search tag, the search tag trapdoor value is uploaded to the cloud server; the cloud server executes the Test(pk, Cw, Tw') algorithm to perform keyword matching, where pk is the search user's public key, Cw is the keyword ciphertext index, and Tw' is the search tag trapdoor value generated by the search user and sent to the cloud server.
[0023] In summary, due to the adoption of the above technical solution, the beneficial effects of the present invention are:
[0024] This invention supports recommending keywords or files with high relevance to the user's search terms during the search process; it enables users to perform association analysis and retrieval recommendations of encrypted keywords in an untrusted cloud server environment without disclosing sensitive data, thereby expanding the practicality of encrypted retrieval technology and improving the convenience and accuracy of user queries.
Attached Figure Description
[0025] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings in the embodiments will be briefly described below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 is a schematic diagram illustrating the principle of the method for supporting association analysis and proactive recommendation of ciphertext storage data in an embodiment of the present invention.
[0027] Figure 2 is a schematic diagram of the undirected tag graph in an embodiment of the present invention.
[0028] Figure 3 is a schematic diagram of the global inverted index in an embodiment of the present invention.
Detailed Implementation
[0029] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0030] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.
[0031] Example
[0032] This embodiment proposes a method for supporting association analysis and proactive recommendation of encrypted storage data. It is designed for encrypted cloud storage scenarios, enabling cloud servers to perform association analysis on users' encrypted data and recommend highly relevant encrypted files to users when they search for encrypted files based on the association analysis results.
[0033] As shown in Figure 1, the method for supporting association analysis and proactive recommendation of ciphertext storage data includes the following steps:
[0034] S1, the uploading user uploads the encrypted file and its encrypted index to the cloud server. The encrypted index contains the ciphertext of each keyword in the encrypted file, and each keyword ciphertext is regarded as a tag of the encrypted file. Assuming that an encrypted file has n keyword ciphertexts, from the perspective of the cloud server, the encrypted file has n tags.
[0035] S2, the cloud server constructs an undirected tag graph using the tags of the encrypted files; the details are as follows:
[0036] When the cloud server receives an encrypted file, it combines the n tags carried by the encrypted file pairwise into <tagi, tagj> pairs, so that a single encrypted file has n(n-1)/2 <tagi, tagj> pairs.
[0037] The cloud server uses the n(n-1)/2 <tagi, tagj> pairs of the encrypted file to update the undirected tag graph that represents the associations between different tags, as shown in Figure 2. In the undirected tag graph, the nodes are tags, two connected tags form an edge, each edge represents a <tagi, tagj> pair, and the edge weight represents the frequency with which that pair occurs. For example, when an encrypted file named file1 is uploaded to the cloud server, it carries two ciphertext keywords. From the cloud server's perspective, these two ciphertext keywords correspond to two tags, tag1 and tag2. The cloud server first determines whether an edge representing the pair <tag1, tag2> exists in the current undirected tag graph. If no such edge exists, it creates a new edge joining the two nodes tag1 and tag2 to represent the pair <tag1, tag2>; if such an edge already exists, it increments the weight of that edge by 1.
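The graph update in step S2 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name TagGraph, the method names, and the use of plain strings in place of keyword ciphertexts are hypothetical, and it assumes that a newly created edge starts with weight 1 and that duplicate tags within one file are counted once.

```python
from collections import defaultdict
from itertools import combinations


class TagGraph:
    """Undirected, weighted co-occurrence graph over opaque tag values."""

    def __init__(self):
        # adjacency[a][b] == adjacency[b][a] == weight of the edge {a, b}
        self.adjacency = defaultdict(dict)

    def add_file(self, tags):
        """Update the graph with the n(n-1)/2 tag pairs of one uploaded file."""
        unique_tags = sorted(set(tags))  # assumption: duplicates count once
        for tag_i, tag_j in combinations(unique_tags, 2):
            # Assumption: a missing edge is created with weight 1;
            # an existing edge has its weight incremented by 1.
            new_weight = self.adjacency[tag_i].get(tag_j, 0) + 1
            self.adjacency[tag_i][tag_j] = new_weight
            self.adjacency[tag_j][tag_i] = new_weight

    def weight(self, tag_i, tag_j):
        """Return the edge weight, or 0 if the two tags never co-occurred."""
        return self.adjacency.get(tag_i, {}).get(tag_j, 0)


graph = TagGraph()
graph.add_file(["tag1", "tag2"])          # file1: creates the edge {tag1, tag2}
graph.add_file(["tag1", "tag2", "tag3"])  # a second file contributing 3 pairs
```

After these two uploads the edge between tag1 and tag2 has weight 2, while the edges involving tag3 have weight 1.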
[0038] S3, the search user submits search tags to the cloud server; the cloud server analyzes the undirected tag graph based on the search tags and extracts recommended tags with high relevance (i.e., among all tags adjacent to the search tag, the tag with the higher edge weight).
[0039] Specifically, when a user submits a query request for several search tags, the cloud server first traverses the tag graph to find all tags adjacent to the search tag, sorts the adjacent tags according to the edge weight between the search tag and the adjacent tags, analyzes the tags with high relevance to the submitted search tag, and selects the tags with the highest weight as recommended tags.
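The tag extraction in step S3 might look like the following Python sketch. The function name recommend_tags, the top_k parameter, and the sample weights are hypothetical. The text does not say how weights are combined when several search tags share a neighbour, so the sketch assumes the largest edge weight is kept, and it assumes the search tags themselves are not recommended.

```python
def recommend_tags(adjacency, search_tags, top_k=2):
    """Return up to top_k tags adjacent to the search tags, heaviest edge first."""
    scores = {}
    for tag in search_tags:
        for neighbour, weight in adjacency.get(tag, {}).items():
            if neighbour in search_tags:
                continue  # assumption: do not recommend a tag already searched
            # Assumption: with several search tags, keep the largest edge weight.
            scores[neighbour] = max(scores.get(neighbour, 0), weight)
    # Sort by weight, descending; ties are broken by tag value for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Hypothetical undirected tag graph stored as a symmetric adjacency mapping.
adjacency = {
    "tag1": {"tag2": 5, "tag3": 2, "tag4": 1},
    "tag2": {"tag1": 5, "tag3": 4},
    "tag3": {"tag1": 2, "tag2": 4},
    "tag4": {"tag1": 1},
}
recommended = recommend_tags(adjacency, ["tag1"], top_k=2)
```

With these sample weights, a search for tag1 recommends tag2 and then tag3, because their edges to tag1 carry the highest weights.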
[0040] S4, the encrypted files corresponding to the search tags and the recommended tags are retrieved.
[0041] As shown in Figure 3, the cloud server maintains a global inverted index that uses tags as keys and encrypted filenames (or file pointers) as values. For each tag, the queue of encrypted filenames (or file pointers) is ordered as a max-heap according to the frequency of association between the tag and the encrypted files. That is, every entry in a tag's queue is greater than or equal to its left and right children, satisfying arr[i] >= arr[2i+1] && arr[i] >= arr[2i+2]. For example, when an encrypted file file1 uploaded by a user carries the tags tag1 and tag2, the cloud server finds the positions of tag1 and tag2 in the global inverted index and updates the priority queue composed of filenames (or file pointers) so that the encrypted file with the highest frequency of association with the corresponding tag is placed at the head of the queue.
[0042] The cloud server queries the global inverted index for search tags and recommendation tags, reads the file priority queue of these tags, outputs several items at the head of the queue, and completes the search result feedback.
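The index and queue behaviour of step S4 can be sketched in Python as follows. The names GlobalInvertedIndex, add, top, and search are hypothetical. The text does not define how the tag-to-file association frequency is measured, so the sketch assumes it is the number of times a (tag, file) association has been recorded. Python's heapq module is a min-heap, so counts are negated to obtain max-heap ordering, and outdated heap entries are skipped lazily at read time.

```python
import heapq
from collections import defaultdict


class GlobalInvertedIndex:
    """Tag -> priority queue of file identifiers, highest frequency first."""

    def __init__(self):
        self.counts = defaultdict(dict)  # tag -> {file_id: frequency}
        self.heaps = defaultdict(list)   # tag -> heap of (-frequency, file_id)

    def add(self, tag, file_id):
        """Record one more association between a tag and a file."""
        count = self.counts[tag].get(file_id, 0) + 1
        self.counts[tag][file_id] = count
        heapq.heappush(self.heaps[tag], (-count, file_id))

    def top(self, tag, k):
        """Return up to k file identifiers from the head of the tag's queue."""
        if tag not in self.heaps:
            return []
        heap = list(self.heaps[tag])  # a copy of a heap list is still a heap
        results = []
        while heap and len(results) < k:
            neg_count, file_id = heapq.heappop(heap)
            # Skip stale entries left behind by earlier, lower counts.
            if self.counts[tag][file_id] == -neg_count:
                results.append(file_id)
        return results


def search(index, search_tags, recommended_tags, k=2):
    """Collect head-of-queue files for search tags, then recommended tags."""
    results = []
    for tag in list(search_tags) + list(recommended_tags):
        for file_id in index.top(tag, k):
            if file_id not in results:
                results.append(file_id)
    return results


index = GlobalInvertedIndex()
index.add("tag1", "file1")
index.add("tag2", "file1")
index.add("tag1", "file2")
index.add("tag1", "file2")
index.add("tag2", "file3")
files = search(index, ["tag1"], ["tag2"], k=2)
```

Lazy invalidation keeps each update at O(log n) at the cost of some stale entries in memory; an implementation that needs an exact in-place heap would instead track each file's position and sift it up on update.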
[0043] To ensure data storage security, when a user uploads an encrypted file, the tags are encrypted using the public key pk sent in advance by the cloud server. When a user searches using a search tag, the trapdoor value of the search tag is uploaded to the cloud server, which then executes the Test(pk, Cw, Tw') algorithm to perform keyword matching, where pk is the search user's public key, Cw is the keyword ciphertext index, and Tw' is the search tag trapdoor value generated by the search user and sent to the cloud server. This follows the PEKS algorithm flow and ensures the security of user information.
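For orientation, the general PEKS (public-key encryption with keyword search) flow that the Test(pk, Cw, Tw') call belongs to can be summarized as below. The symbols sk (the private key paired with pk), w and w' (the plaintext keywords behind the ciphertext and the trapdoor), and the security parameter λ are notational assumptions that do not appear in the text; this shows the standard scheme interface, not a construction specific to this patent.

```latex
\begin{aligned}
(pk, sk) &\leftarrow \mathsf{KeyGen}(1^{\lambda}) \\
C_w &\leftarrow \mathsf{PEKS}(pk, w) \\
T_{w'} &\leftarrow \mathsf{Trapdoor}(sk, w') \\
\mathsf{Test}(pk, C_w, T_{w'}) &=
\begin{cases}
1 & \text{if } w = w' \\
0 & \text{otherwise, except with negligible probability}
\end{cases}
\end{aligned}
```

The server running Test learns only whether the ciphertext and the trapdoor correspond to the same keyword, which is consistent with the search-pattern and access-pattern leakage described earlier.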
[0044] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for supporting association analysis and proactive recommendation of ciphertext storage data, characterized in that it includes the following steps: S1, the uploading user uploads the encrypted file and the encrypted index of the encrypted file to the cloud server, where the encrypted index contains the ciphertext of each keyword in the encrypted file, and each keyword ciphertext is regarded as a tag of the encrypted file; S2, the cloud server constructs an undirected tag graph using the tags of the encrypted files; S3, the search user submits search tags to the cloud server, and the cloud server analyzes the undirected tag graph based on the search tags and extracts recommended tags with high relevance; S4, the encrypted files corresponding to the search tags and the recommended tags are retrieved; wherein, in step S2, the cloud server constructing an undirected tag graph using the tags of the encrypted files includes: when the cloud server receives an encrypted file, it combines the n tags carried by the encrypted file pairwise into <tagi, tagj> pairs, so that a single encrypted file has n(n-1)/2 <tagi, tagj> pairs; the cloud server uses the n(n-1)/2 <tagi, tagj> pairs of the encrypted file to update the undirected tag graph that represents the associations between different tags; the nodes in the undirected tag graph are tags, two connected tags form an edge, each edge represents a <tagi, tagj> pair, and the edge weight represents the frequency with which that pair occurs.
2. The method for supporting association analysis and proactive recommendation of ciphertext storage data according to claim 1, characterized in that updating the undirected tag graph that represents the associations between different tags includes: if two tags tagi and tagj of the uploaded encrypted file form a pair <tagi, tagj>, the cloud server determines whether an edge representing the pair <tagi, tagj> already exists in the current undirected tag graph; if no such edge exists, it creates a new edge joining the two nodes tagi and tagj to represent the pair <tagi, tagj>; if such an edge already exists, it increments the weight of that edge by 1.
3. The method for supporting association analysis and proactive recommendation of ciphertext storage data according to claim 1, characterized in that, in step S3, extracting the recommended tags with high relevance includes: in the undirected tag graph, querying all tags adjacent to the search tag; and taking as recommended tags the adjacent tags whose edges to the search tag have the higher weights.
4. The method for supporting association analysis and proactive recommendation of ciphertext storage data according to claim 3, characterized in that, when a user submits a query request with several search tags, the cloud server first traverses the tag graph to find all tags adjacent to the search tag, sorts the adjacent tags according to the edge weight between the search tag and the adjacent tags, analyzes the tags with high relevance to the submitted search tag, and selects the tags with the highest weight as recommended tags.
5. The method for supporting association analysis and proactive recommendation of ciphertext storage data according to claim 1, characterized in that, in step S4, retrieving the encrypted files corresponding to the search tags and the recommended tags includes: the cloud server maintains a global inverted index in which the tag is the key and the filename or file pointer of the encrypted file is the value, and for each tag the queue of filenames or file pointers is ordered as a max-heap according to the frequency of association between the tag and the encrypted files; the cloud server queries the global inverted index for the search tags and the recommended tags, reads the file priority queues of these tags, outputs several items at the head of each queue, and completes the search result feedback.
6. The method for supporting association analysis and proactive recommendation of ciphertext storage data according to claim 1, characterized in that, when the uploading user uploads an encrypted file, the tags are encrypted using the public key pk that the cloud server sent to the user in advance; when a user searches using a search tag, the search tag trapdoor value is uploaded to the cloud server; the cloud server executes the Test(pk, Cw, Tw') algorithm to perform keyword matching, where pk is the search user's public key, Cw is the keyword ciphertext index, and Tw' is the search tag trapdoor value generated by the search user and sent to the cloud server.