A multi-dimensional missing data combination preference skyline query method for intelligent recommendation

By optimizing Skyline queries for multidimensional missing data using weighted classification trees and WCT-R*-Tree index structures, the problems of high computational cost and long response time of traditional algorithms for multidimensional missing data are solved, and efficient and accurate combined preference Skyline queries are achieved.

CN122240695A · Pending · Publication Date: 2026-06-19 · HARBIN UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HARBIN UNIV OF SCI & TECH
Filing Date
2026-04-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing Skyline query algorithm suffers from high computational overhead and long response time when faced with multidimensional missing data, and is prone to memory overflow under combined preferences, making it difficult to meet the responsiveness and decision-making requirements of intelligent recommendation systems.

Method used

The dataset is classified with a weighted classification tree (WT-MC), the resulting buckets are optimized and merged, and a WCT-R*-Tree index structure is constructed over them. Preference Skyline query and pruning on this index, followed by incremental generation and dominance processing of combined preference Skylines, reduces the number of tuple comparisons.

Benefits of technology

It effectively reduces the invalid search space, lowers computational overhead, improves query efficiency and accuracy, and meets the complex preference requirements under multi-objective decision-making.


Abstract

This invention discloses a Skyline query method for multidimensional missing data combined with preferences for intelligent recommendation. The invention utilizes a weighted classification tree for weighted classification preprocessing of multidimensional missing data, classifying data with similar missing patterns, and combines bucket optimization and incremental generation strategies to achieve efficient querying. The method includes: performing weighted preprocessing and partitioning of data using a multidimensional missing data classification algorithm based on a weighted classification tree to form a set of leaf node buckets; optimizing and merging the classified data into buckets to construct a WCT-R*-Tree index; performing preference Skyline querying and pruning based on the index structure; and performing bidirectional dominance checks using a combined preference Skyline generation and dominance algorithm to output a result set. This invention can significantly reduce the number of tuple comparisons, lower computational overhead, and improve pruning efficiency, solving the problems of low efficiency and insufficient accuracy of existing methods when processing multidimensional missing data and complex combined preferences. It can be widely applied in the fields of data query and intelligent recommendation technology, such as intelligent housing recommendation and product matching.

Description

[0001] This invention relates to the field of data processing and intelligent recommendation technology, specifically a Skyline query method for multidimensional missing data combination preferences for intelligent recommendation.

Background Technology

[0002] Skyline queries are used in recommender systems to solve multi-objective decision-making problems. A Skyline query finds the optimal candidates in a multi-attribute dataset: the set of data objects (also called records or data points) that are not dominated by any other object, where one object dominates another if it is no worse on every attribute and strictly better on at least one. In practical applications, Skyline queries are used in fields such as data mining, decision support, and multi-objective optimization. They help users filter the most valuable or promising choices from large amounts of data, providing reference and insights for decision-making.
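As a minimal illustration of the dominance test and a naive Skyline on complete data, the following Python sketch may help. The listing values and the convention that smaller is better on every attribute are assumptions made for the example, not part of the patent text.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumes smaller values are preferred on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive quadratic Skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical rental listings as (rent, distance); both are "smaller is better".
listings = [(1200, 5.0), (1500, 2.0), (1600, 6.0), (1000, 8.0)]
result = skyline(listings)  # (1600, 6.0) is dominated by (1200, 5.0)
```

The quadratic comparison count of this baseline is the cost that the method described below tries to reduce.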

[0003] With the development of big data, data in fields such as intelligent transportation and financial risk control often exhibits multidimensional characteristics and contains missing values. Simultaneously, users' decision-making needs are no longer limited to single-dimensional comparisons but involve complex combinations of preferences. Most existing Skyline algorithms are designed for complete data; when faced with missing data, simple imputation introduces bias, while direct discarding loses information. Furthermore, under combined preferences, traditional algorithms such as BNL have enormous computational overhead and insufficient pruning efficiency, resulting in uncontrollable query result sets or excessively long response times. Especially in extremely high-dimensional missing environments, traditional algorithms are prone to causing exponential growth in the combined candidate set, leading not only to excessively long response times but also to severe memory overflows or query timeouts, severely restricting the responsiveness and decision-making capabilities of intelligent recommendation systems.

Summary of the Invention

[0004] To address the aforementioned issues, the present invention aims to provide a Skyline query method for multidimensional missing data combination preferences for intelligent recommendation, which enriches the Skyline query result set, reduces the number of tuple comparisons under missing data, improves efficiency, and solves the problems of low accuracy and high overhead of traditional Skyline queries in incomplete data environments.

[0005] The technical solution of this invention is: a Skyline query method for multi-dimensional missing data combination preferences for intelligent recommendation, comprising the following steps:

[0006] Step 1: Given a multidimensional missing data set O and user preference information, classify the dataset O using the multidimensional missing data classification algorithm based on weighted classification tree (WT-MC) to form a set of leaf node buckets B with similar missing patterns and attribute features;

[0007] Step 2: Perform bucket optimization and merging (CD-OM) based on the leaf node bucket set B to construct the pre-structure of the WCT-R*-Tree index;

[0008] Step 3: Construct a WCT-R*-Tree index structure based on the optimized bucket set, perform a preference Skyline query and pruning based on the weighted classification tree (WT-PSP), and output the candidate point set Candidates;

[0009] Step 4: Perform Combined Preference Skyline Incremental Generation and Domination (CPS-IDD) on the candidate point set Candidates, and output the final combined preference Skyline query result set.

[0010] The step of classifying dataset O using the Weighted Classification Tree-Based Multidimensional Missing Data Classification (WT-MC) algorithm specifically includes:

[0011] Define a missing dataset O and a set of dimensions V; define a dominance rating function Score, whereby the dominance rating is high if an attribute dimension has a high splitting priority and low dispersion; for tuples missing the value of a splitting dimension, do not discard them directly, but assign them a calculated weight W(d,x) based on the weight of their parent node and divide them into the corresponding subspace; traverse dataset O to construct a weighted classification tree, and group data with similar missing patterns into the same leaf node buckets to form an initial bucket set B.

[0012] The WCT-R*-Tree index structure step specifically includes:

[0013] The data distribution within a bucket is determined by the bucket dispersion D(B), and buckets with excessive dispersion are locally split. Skyline candidate pairs are constructed using the bucket similarity Sim(Bi, Bk) and the bucket dominance relationship. Similar buckets with low dispersion after merging are merged to generate an optimized bucket set B'. A two-level index structure is constructed based on the optimized bucket set B'. The upper level uses an improved R*-tree to index the minimum bounding rectangle of the buckets, and the lower level constructs a weighted tree based on the mixed preference weights of the buckets within each bucket. This forms the WCT-R*-Tree index, ensuring that uniform pruning is performed during the query process using the optimistic upper bound of the preference.

[0014] The aforementioned Combined Preference Skyline Incremental Generation and Domination Processing (CPS-IDD) step specifically includes:

[0015] Given a pruned set of candidate points (Candidates), determine the dominance relationship between newly generated combinations and old combinations in the result set. Iterate through each new point in Candidates, and use it together with points from the historical processed point set to generate new combinations of size k that include the new point. Then perform a two-way dominance judgment between each new combination and the old combinations:

[0016] 1. Forward check: If the result set contains an old combination that dominates the new combination on the aggregation preference function, discard the new combination; it is not included in the result set.

[0017] 2. Reverse check: If the new combination dominates old combinations in the result set on the aggregation preference function, remove the dominated old combinations from the result set and add the new combination to the result set. Finally, the updated result set is returned.

[0018] The beneficial effects of this invention are as follows: For multidimensional missing data, this invention proposes a weighted classification tree preprocessing mechanism, achieving accurate data partitioning without discarding missing tuples. By establishing an effective WCT-R*-Tree index structure and a preference-optimistic upper bound pruning strategy, the invalid search space is significantly reduced. Through the proposed combined preference incremental generation and bidirectional dominance algorithm, redundant calculations of full permutations are avoided, significantly reducing computational overhead and improving query efficiency and accuracy, making the query results more consistent with the complex preference needs of users under multi-objective decision-making.

Attached Figure Description

[0019] Figure 1 is a flowchart of the steps of the Skyline query method for multidimensional missing data combination preferences for intelligent recommendation, as proposed in this invention.

[0020] Figure 2 is a schematic diagram of the construction of a weighted classification tree (WCT) in a specific embodiment of the present invention.

[0021] Figure 3 is a schematic diagram of the WCT-R*-Tree index structure in a specific embodiment of the present invention.

Detailed Implementation

[0022] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments. The step numbers in the following embodiments are only for ease of explanation and do not limit the order of the steps. The execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.

[0023] Referring to Figure 1 and Figure 2, this invention provides a Skyline query method for multi-dimensional missing data combination preferences in intelligent recommendation. Consider a specific intelligent recommendation scenario, such as intelligent urban rental recommendation, in which the system contains a housing dataset with three dimensions (rent, distance, and area) and some housing records are missing the distance or area value. To recommend a set of housing options that meets the user's personalized combination preferences while avoiding the memory overflow caused by exhaustive comparison, the method includes the following steps:

[0024] Step 1: Given a multidimensional missing data set O and user preference information, classify dataset O using a weighted classification tree-based multidimensional missing data classification algorithm (WT-MC) to form a set of leaf node buckets B with similar missing patterns and attribute features. Define the dominance rating function Score with a balance coefficient λ ∈ [0,1], where IG is the split priority, MR is the dispersion, one term of the function reflects the completeness of the attribute, and λ balances the weights of the two components in the score. The weighted attribute value for each dimension d is calculated, and the attribute with the largest FinalScore is selected as the optimal split dimension. For tuples with a missing value in this dimension, a weight value W(d,x) is assigned based on the parent node weight, and the tuple is then assigned to a child node branch.

[0025] The specific steps of the multidimensional missing data classification algorithm based on weighted classification trees (WT-MC), as shown in Figure 2, are as follows:

[0026] Input: Incomplete dataset O and the bucket size.

[0027] Output: Set of leaf node buckets B.

[0028] 1. Initialize the root node, which contains all data O.

[0029] 2. Calculate the initial split priority for all candidate attributes.

[0030] 3. Traverse the node queue. For each node, if its data size is less than the bucket size or its splitting yield is low, mark it as a leaf node bucket.

[0031] 4. Otherwise, calculate the dominance score for each dimension. For tuples with missing values, calculate the current weight using the parent node's weight; aggregate to obtain the average attribute weight.

[0032] 5. Calculate the final split priority FinalScore and select the dimension corresponding to the maximum value as the split dimension.

[0033] 6. Divide the data into child nodes according to the split dimension (missing data is divided according to weight).

[0034] 7. Repeat the above steps until the queue is empty, and output the set B of all leaf node buckets.
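The queue-driven splitting in steps 1 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented algorithm: the scoring function `final_score` is a hypothetical stand-in (completeness of the dimension plus an inverse-variance term) because the exact Score formula is not reproduced in this text, the split point is the mean of the observed values, and a tuple missing the split dimension is sent to both children with its weight divided in proportion to the observed weight on each side.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], ...]


@dataclass
class Weighted:
    row: Row
    weight: float  # fractional share of the tuple held by this node


def observed(node: List[Weighted], d: int) -> List[Weighted]:
    return [it for it in node if it.row[d] is not None]


def final_score(node: List[Weighted], d: int, lam: float) -> float:
    """Hypothetical stand-in score: rewards completeness and low dispersion."""
    obs = observed(node, d)
    completeness = sum(it.weight for it in obs) / sum(it.weight for it in node)
    spread = pvariance([it.row[d] for it in obs])
    return lam * completeness + (1 - lam) / (1 + spread)


def build_buckets(rows: List[Row], min_size: int, lam: float = 0.5) -> List[List[Weighted]]:
    queue = [[Weighted(r, 1.0) for r in rows]]  # root node holds all data
    buckets: List[List[Weighted]] = []
    while queue:
        node = queue.pop()
        if len(node) < min_size:  # small node becomes a leaf bucket
            buckets.append(node)
            continue
        # A dimension is splittable only if it has two distinct observed values.
        dims = [d for d in range(len(node[0].row))
                if len({it.row[d] for it in observed(node, d)}) >= 2]
        if not dims:  # stand-in for "splitting yield is low"
            buckets.append(node)
            continue
        d = max(dims, key=lambda dim: final_score(node, dim, lam))
        obs = observed(node, d)
        pivot = mean(it.row[d] for it in obs)
        left = [it for it in obs if it.row[d] <= pivot]
        right = [it for it in obs if it.row[d] > pivot]
        if not left or not right:  # guard against degenerate splits
            buckets.append(node)
            continue
        left_share = sum(it.weight for it in left) / sum(it.weight for it in obs)
        for it in node:
            if it.row[d] is None:  # keep the tuple, divide its weight
                left.append(Weighted(it.row, it.weight * left_share))
                right.append(Weighted(it.row, it.weight * (1 - left_share)))
        queue.extend([left, right])
    return buckets


# Hypothetical listings as (rent, distance, area); None marks a missing value.
rows = [(1200, 5.0, 60), (1500, None, 45), (900, 8.0, None),
        (1600, 2.0, 80), (1100, None, None), (1300, 4.0, 55)]
buckets = build_buckets(rows, min_size=3)
total_weight = sum(it.weight for b in buckets for it in b)
```

Because a missing tuple's weight is only divided and never dropped, the total weight across all leaf buckets stays equal to the number of input tuples, which is one way to check that no tuple was discarded.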

[0035] Step 2: Bucket optimization and merging based on classification data (CD-OM), which involves local splitting of buckets and merging of similar buckets.

[0036] 1. Bucket Optimization (Local Splitting): Calculate the weighted dispersion D(B) within the bucket. If D(B) is greater than the set value and the amount of data in the bucket is sufficient, then the bucket is split again to refine the data space and reduce the variability of the data within the bucket.

[0037] 2. Bucket Merging: Calculate the similarity between buckets, Sim(Bi, Bk). If the center vectors of two buckets are close in distance, and the increased dispersion of the merged bucket is not significant (meeting the Skyline selection criteria), then merge these two buckets into a new bucket. This step effectively reduces fragmented buckets and lowers the complexity of index construction.
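The split and merge tests above can be sketched as follows. The definitions used here are assumptions for illustration only: D(B) is taken as the mean per-dimension variance of the observed values, and Sim(Bi, Bk) is approximated by the Euclidean distance between bucket centers over the dimensions observed in both buckets. The thresholds and the assumption that values are normalized to [0, 1] are likewise hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], ...]


def column(bucket: List[Row], d: int) -> List[float]:
    """Observed (non-missing) values of dimension d in the bucket."""
    return [r[d] for r in bucket if r[d] is not None]


def center(bucket: List[Row]) -> Tuple[Optional[float], ...]:
    cols = [column(bucket, d) for d in range(len(bucket[0]))]
    return tuple(mean(c) if c else None for c in cols)


def dispersion(bucket: List[Row]) -> float:
    """Assumed D(B): mean variance over dimensions that have observed values."""
    cols = [column(bucket, d) for d in range(len(bucket[0]))]
    variances = [pvariance(c) for c in cols if c]
    return mean(variances) if variances else 0.0


def needs_split(bucket: List[Row], max_dispersion: float, min_size: int) -> bool:
    return len(bucket) >= min_size and dispersion(bucket) > max_dispersion


def should_merge(a: List[Row], b: List[Row], max_center_dist: float, max_gain: float) -> bool:
    pairs = [(x, y) for x, y in zip(center(a), center(b))
             if x is not None and y is not None]
    if not pairs:  # no shared observed dimension, cannot compare centers
        return False
    distance = sqrt(sum((x - y) ** 2 for x, y in pairs))
    gain = dispersion(a + b) - max(dispersion(a), dispersion(b))
    return distance <= max_center_dist and gain <= max_gain


# Hypothetical normalized buckets with two dimensions each.
a = [(0.2, 0.3), (0.25, None)]
b = [(0.22, 0.35), (0.3, 0.3)]
c = [(0.9, 0.8), (0.85, 0.9)]
merge_ab = should_merge(a, b, max_center_dist=0.1, max_gain=0.01)
merge_ac = should_merge(a, c, max_center_dist=0.1, max_gain=0.01)
split_ac = needs_split(a + c, max_dispersion=0.05, min_size=4)
```

Here buckets a and b are close and merging them does not increase dispersion, while a bucket formed from a and c would be spread widely enough to be split again.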

[0038] Step 3: Using the WCT-R*-Tree index structure shown in Figure 3, perform the preference Skyline query and pruning based on the weighted classification tree (WT-PSP), which specifically includes:

[0039] 1. Construct a two-level index: The upper level performs an R*-tree index on the minimum bounding rectangle of the optimized bucket B'; the lower level constructs a weighted tree within each bucket.

[0040] 2. Pruning Strategy: Pruning is performed using the optimistic upper bound of preferences. The optimal preference point of each bucket is defined; for missing dimensions, the globally optimal value for that dimension is taken so that no bucket is pruned by mistake. If a currently known Skyline point dominates the optimal preference point of a bucket, then the bucket and all its child nodes are safely pruned. Using a priority queue, traverse the index and output the set of all unpruned candidate points, Candidates.
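A minimal sketch of the optimistic-bound test is given below. It assumes that smaller is better on every dimension, that the known Skyline points are complete, and that a bucket's optimal preference point takes the per-dimension minimum, falling back to the global optimum whenever any tuple in the bucket is missing that dimension. The R*-tree traversal and the priority queue are omitted; `optimistic_point` and `can_prune` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[float], ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Smaller is better: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def optimistic_point(bucket: List[Row], global_best: Sequence[float]) -> Tuple[float, ...]:
    """Best value the bucket could possibly offer on each dimension."""
    point = []
    for d, best in enumerate(global_best):
        values = [row[d] for row in bucket]
        if any(v is None for v in values):
            point.append(best)  # a missing value might hide the global optimum
        else:
            point.append(min(values))
    return tuple(point)


def can_prune(bucket: List[Row], skyline_points: List[Sequence[float]],
              global_best: Sequence[float]) -> bool:
    """Prune only if a known Skyline point dominates the optimistic point."""
    corner = optimistic_point(bucket, global_best)
    return any(dominates(s, corner) for s in skyline_points)


# Hypothetical normalized data: global optimum is 0.0 on both dimensions.
global_best = (0.0, 0.0)
known_skyline = [(0.2, 0.2)]
complete_bucket = [(0.5, 0.6), (0.4, 0.7)]
partial_bucket = [(0.5, None), (0.4, 0.7)]
prune_complete = can_prune(complete_bucket, known_skyline, global_best)
prune_partial = can_prune(partial_bucket, known_skyline, global_best)
```

The bucket with a missing value survives because its optimistic point is (0.4, 0.0), which the known Skyline point cannot dominate. This is the conservative behaviour that the text describes as avoiding unintended pruning.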

[0041] Step 4: The combined preference Skyline incremental generation and domination processing (CPS-IDD) handles the dominance relationship between newly generated combinations and old combinations in the current result set. The specific steps of the algorithm are as follows:

[0042] Input: Candidates set, combination size k, user preference u.

[0043] Output: Skyline query result set for combined preferences.

[0044] 1. Create an empty result set, which stores the Skyline combinations found so far.

[0045] 2. Process data points one by one: iterate through each new point in Candidates.

[0046] 3. Use the new point together with points from the historical processed point set to generate all new combinations of size k that include the new point.

[0047] 4. Forward check: check whether the new combination is dominated by an existing combination in the result set. If it is dominated, it is not added to the result set.

[0048] 5. Reverse check: if the new combination is not dominated, further check whether it dominates any old combinations in the result set. If so, remove those old combinations from the result set.

[0049] 6. Add to the result set: add each new combination that passed the checks to the result set.

[0050] 7. Add the new point to the historical processed point set.

[0051] 8. When the loop ends, return the final query result set.
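The incremental loop above can be sketched in Python as follows. The aggregation preference function is not specified in this text, so the sketch assumes a per-dimension sum over the members of a combination, with smaller aggregated values preferred; the user preference u and missing values are left out for brevity. The names `aggregate` and `combined_skyline` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Combo = Tuple[Point, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Smaller is better: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def aggregate(combo: Combo) -> Point:
    """Assumed aggregation preference function: per-dimension sum."""
    return tuple(sum(col) for col in zip(*combo))


def combined_skyline(candidates: List[Point], k: int) -> List[Combo]:
    result: List[Combo] = []   # non-dominated combinations found so far
    history: List[Point] = []  # points already processed
    for p in candidates:
        # Every new combination contains p plus k - 1 earlier points,
        # so no combination is ever generated twice.
        for rest in combinations(history, k - 1):
            new = rest + (p,)
            score = aggregate(new)
            # Forward check: drop the new combination if an old one dominates it.
            if any(dominates(aggregate(old), score) for old in result):
                continue
            # Reverse check: evict old combinations the new one dominates.
            result = [old for old in result if not dominates(score, aggregate(old))]
            result.append(new)
        history.append(p)
    return result


# Hypothetical candidate points with two "smaller is better" attributes.
candidates = [(3, 4), (4, 1), (1, 4), (2, 2)]
result = combined_skyline(candidates, k=2)
scores = sorted(aggregate(c) for c in result)
```

In this run the pair ((3, 4), (4, 1)) enters the result first and is later evicted by the reverse check when ((4, 1), (1, 4)) arrives, while ((3, 4), (2, 2)) is rejected by the forward check. Only combinations containing the newest point are generated at each step, which is what avoids enumerating all size-k combinations up front.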

[0052] The above is a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the embodiments described. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A Skyline query method for multidimensional missing data combination preferences for intelligent recommendation, characterized in that: a missing dataset O, a set of dimensions V, and user preference weights w are defined for intelligent recommendation scenarios; if a data tuple has a value of NULL in a certain dimension, it is defined as missing data; and for a given multidimensional missing dataset O and a target number of combinations k, the specific operation steps include:
Step 1: Classify the missing dataset O using the Weighted Classification Tree-based Multidimensional Missing Data Classification Algorithm (WT-MC), gather objects with similar missing patterns and attribute characteristics, divide the objects in dataset O into different leaf node bucket sets B, and integrate the objects in the buckets into a whole for processing to form an initial bucket set with classification characteristics;
Step 2: Based on the initial bucket set generated in Step 1, perform bucket optimization and merging (CD-OM): perform local splitting on buckets with high dispersion and merging on buckets with high similarity to construct an optimized bucket set B', and add the optimized buckets as a whole to the index structure for unified processing;
Step 3: Construct the WCT-R*-Tree index structure based on the optimized bucket set B', and perform preference Skyline query and pruning based on weighted classification tree (WT-PSP) to filter out the undominated candidate point set Candidates using the upper bound of preference optimism;
Step 4: Perform Combined Preference Skyline Incremental Generation and Domination Processing (CPS-IDD) on the candidate point set Candidates; during the combined generation process, perform bidirectional domination checks and output the final combined preference Skyline query result set, which is then applied to an intelligent recommendation system for display or decision-making.

2. The method according to claim 1, characterized in that the specific steps of the weighted classification tree-based multidimensional missing data classification algorithm (WT-MC) are as follows:
Step 1: Given the missing dataset O and the bucket capacity value, initialize the root node and calculate the split priority IG and dispersion MR for all candidate attributes;
Step 2: Define the dominance rating function and calculate the final split priority FinalScore(d) by combining it with the node attribute weights, where d is a candidate attribute dimension, IG is the initial splitting priority, MR is the dispersion, and λ is the preset weight balance coefficient with λ ∈ [0,1];
Step 3: Select the attribute with the largest FinalScore(d) as the optimal split dimension for the current node;
Step 4: For a tuple in the current node, if the data is complete in the split dimension, it is divided into the corresponding child node based on the value; if the data is missing in the split dimension, the tuple is not discarded, but is assigned a weight value W(d,x) based on the parent node weight and divided into all child node branches;
Step 5: Repeat the splitting step until the number of tuples in a node is less than the bucket capacity or the splitting reward is lower than the set value; then mark the node as a leaf node bucket and generate a leaf node bucket set B.

3. The method according to claim 1, characterized in that the Combined Preference Skyline Incremental Generation and Domination Processing (CPS-IDD) step specifically includes: for the selected candidate point set Candidates, initializing the result set as empty and maintaining the set of historical processed points. The incremental generation and dominance processing of combination preferences handles the dominance relationship between newly generated combinations and old combinations in the current result set, ultimately generating a non-dominated combination preference result set. The dominance judgment between new and old combinations involves iterating through each new point in Candidates and using it together with points from the historical processed point set to generate new combinations of size k that include the new point. Forward dominance judgment: if the result set contains an old combination that dominates the new combination on the aggregation preference function, discard the new combination. Reverse dominance judgment: if the new combination dominates old combinations in the result set on the aggregation preference function, remove the dominated old combinations from the result set and add the new combination to the result set.

4. The Skyline query method for multidimensional missing data combination preferences for intelligent recommendation as described in claim 1, characterized in that, The WCT-R*-Tree index structure step specifically includes: constructing a two-layer index structure. The upper layer uses an improved R*-tree to index the minimum bounding rectangle (MBR) of each bucket in the optimized bucket set B'. The lower layer constructs a weighted tree within each bucket based on the mixed preference weights of the buckets, defines the upper bound of the bucket's preference optimism, and for missing dimensions, fills them with the theoretically optimal value of the dimension in the global range according to the attribute preference direction of that dimension. This constructs the WCT-R*-Tree index, ensuring that during the query process, if the upper bound of the preference optimism of an index node is dominated by the current Skyline point, then all objects in that node and its subtree are pruned uniformly.