Method and system for intelligent clothing matching recommendation and virtual try-on based on user profiles
By mapping user profiles and clothing item features in multiple dimensions and constructing a three-dimensional model, the matching degree of clothing is quantitatively evaluated, which solves the problem of insufficient modeling of real user features in virtual try-on technology and achieves highly accurate clothing matching recommendations and virtual try-on effects.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- QINSILK COM
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-16
AI Technical Summary
Existing virtual try-on technology lacks accurate modeling of users' real body characteristics, resulting in a large difference between the generated virtual try-on effect and the real effect. Furthermore, it lacks quantitative analysis of the matching degree between clothing and users' bodies, which affects user trust and system usability.
By extracting multi-dimensional features and semantically mapping user profile data, user representation vectors are generated, and a relationship graph structure of clothing item representation vectors is constructed. Combined with the user's three-dimensional body model, geometric transformation and texture mapping are performed to quantitatively evaluate the coverage, edge fit, and local deformation rate of clothing items, generating accurate recommended matching schemes.
It improves the accuracy and personalization of clothing matching recommendations, generates matching schemes that meet users' individual needs, and enhances user experience and system reliability.
Figure CN122222698A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and system for intelligent clothing matching recommendation and virtual try-on based on user profiles.
Background Technology
[0002] With the rapid development of e-commerce and the widespread application of artificial intelligence technology, personalized clothing recommendations and virtual try-on technologies based on user profiles have become important development directions in the modern fashion retail industry. Clothing recommendation systems aim to provide users with clothing matching solutions that meet their individual needs based on their personal characteristics, preferences, and historical purchasing behavior. Meanwhile, virtual try-on technology, through computer vision and image processing techniques, allows users to digitally preview how clothing would look on them without actually wearing it, greatly enhancing the user's shopping experience.
[0003] Most existing virtual try-on technologies rely on pre-set standard human body models, lacking the ability to accurately model the user's real body characteristics. This results in a significant discrepancy between the generated virtual try-on effect and the real-world effect. These systems are not adaptable enough to users with different body types and complex clothing styles, especially when dealing with the physical properties of clothing such as wrinkles and drape, affecting users' trust in the virtual try-on results. Existing technologies lack scientific evaluation mechanisms for virtual try-on effects and cannot quantitatively analyze the fit between clothing and the user's body. Traditional systems often rely on simple visual evaluations or subjective feedback, lacking objective quantitative analysis of key indicators such as clothing coverage and fit. This makes it difficult to provide users with truly valuable styling suggestions and try-on feedback, reducing the system's practicality and reliability.
Summary of the Invention
[0004] This invention provides a method and system for intelligent clothing matching recommendation and virtual try-on based on user profiles, which address the above problems in the prior art.
[0005] A first aspect of this invention provides a method for intelligent clothing matching recommendation and virtual try-on based on user profiles, comprising: Multi-dimensional feature extraction and semantic mapping are performed on user profile data to generate user representation vectors. Feature encoding and semantic space mapping are performed on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vectors. Based on the semantic similarity between the user representation vector and the clothing item representation vector, a relationship graph structure between clothing items is constructed. Graph traversal and subgraph extraction operations are performed on the relationship graph structure to generate an initial matching scheme that satisfies connectivity constraints and compatibility constraints. Semantic segmentation and body node detection are performed on user body image data. By extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, a three-dimensional model of the user's body is constructed. Two-dimensional image data of each clothing item in the initial matching scheme are obtained and geometric transformation and texture mapping are performed. The clothing items are mapped to the corresponding body parts of the user's three-dimensional body model. The mapped clothing item images are deformed and adjusted to generate virtual try-on images. Based on the virtual try-on images, quantitative evaluation results are obtained by calculating the coverage, edge fit, and local deformation rate of clothing items on the user's three-dimensional body model. Based on the quantitative evaluation results, the initial matching schemes are screened and sorted to obtain recommended matching schemes.
[0006] Performing multi-dimensional feature extraction and semantic mapping on user profile data to generate user representation vectors, and performing feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vectors, includes: By constructing a multi-granularity feature hierarchy structure for user profile data, the user profile data is decomposed into atomic-level feature units and combined-level feature clusters, and feature encoding is performed on each, generating a set of user sub-representation vectors with a hierarchical structure. Cross-level feature interaction operations are performed on the user sub-representation vector set. By establishing attention weight matrices between different granularity levels, the dependency relationship between atomic-level features and combined-level features is captured, and the user representation vector that integrates multi-granularity feature information is generated. The attribute feature data of individual clothing items are nonlinearly transformed in multiple independent feature subspaces to generate multiple viewpoint representation vectors. A semantic space alignment operation is performed on the multiple viewpoint representation vectors and the user representation vector. By constructing a bidirectional mapping relationship between the user representation vector and each viewpoint representation vector, the multiple viewpoint representation vectors are projected onto the semantic space where the user representation vector is located. The projected viewpoint representation vectors are then fused under consistency constraints to generate the clothing item representation vector that is in the same semantic space as the user representation vector.
[0007] Based on the semantic similarity between the user representation vector and the clothing item representation vector, a relationship graph structure for the clothing items is constructed. Graph traversal and subgraph extraction operations are performed on the relationship graph structure to generate initial matching schemes that satisfy connectivity and compatibility constraints, including: A multi-dimensional similarity metric matrix is constructed by calculating the distance metric between the representation vectors of the clothing items. An adaptive threshold segmentation strategy is set for each dimension similarity metric in the multi-dimensional similarity metric matrix. Based on the adaptive threshold segmentation strategy, clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension are selected. The clothing item pairing relationships are transformed into edge connections of a graph structure to construct the relationship graph structure between clothing items. Calculate the matching score between the user representation vector and the representation vector of each clothing item in the relationship graph structure, and select a preset number of clothing item nodes with the highest matching scores as the starting node set for graph traversal. Starting from each starting node in the set of starting nodes, a depth-first traversal is performed. During the traversal, the edge weights between adjacent clothing item nodes on the path are accumulated and a penalty factor for the path length is introduced to obtain a cumulative compatibility evaluation value. When the cumulative compatibility evaluation value is lower than a preset compatibility threshold or the traversal depth exceeds the preset maximum number of matching items, the traversal is terminated and the traversal path is extracted as a candidate matching subgraph. 
The candidate matching subgraphs are subjected to structural integrity verification, and the candidate matching subgraphs that pass the structural integrity verification are transformed into the initial matching scheme.
[0008] An adaptive threshold segmentation strategy is set for each dimension similarity measure value in the multi-dimensional similarity measure matrix. Based on the adaptive threshold segmentation strategy, clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension are selected, including: The frequency distribution of similarity measure values for each semantic dimension in the multi-dimensional similarity measure matrix is statistically analyzed. By performing multi-peak detection on the frequency distribution, the clustering pattern of the clothing item representation vector in each semantic dimension is identified. Based on the peak position and inter-peak interval of the clustering pattern, the boundary of the main peak region in the frequency distribution is determined and used as a strong compatibility threshold, and the transition boundary between the secondary peak region and the main peak region is determined and used as a weak compatibility threshold, thus obtaining the adaptive threshold segmentation strategy of the semantic dimension. Traverse the clothing item pairings in the multi-dimensional similarity measurement matrix, and compare the similarity measurement values of the clothing item pairings in each semantic dimension with the strong compatibility threshold and the weak compatibility threshold respectively based on the adaptive threshold segmentation strategy. Record the judgment results of the clothing item pairings satisfying the strong compatibility threshold and the weak compatibility threshold in each semantic dimension, and construct a cross-dimensional compatibility judgment matrix. Based on the cross-dimensional compatibility judgment matrix, a multi-level filtering strategy is executed. 
The multi-level filtering strategy prioritizes retaining clothing item pairs that meet strong compatibility threshold constraints in at least one semantic dimension, and secondarily retains clothing item pairs that meet weak compatibility threshold constraints in multiple semantic dimensions, thereby filtering out clothing item pairing relationships that satisfy the multi-level filtering strategy.
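The multi-level filtering strategy can be illustrated with a minimal Python sketch. It assumes the strong and weak compatibility thresholds have already been derived for each semantic dimension; the function name `multi_level_filter`, the parameter `min_weak_dims` (an interpretation of "multiple semantic dimensions" as at least two), and the sample values are hypothetical and do not come from the original text.

```python
import numpy as np

def multi_level_filter(sim, strong_thr, weak_thr, min_weak_dims=2):
    """sim: (n_pairs, n_dims) similarity values; thresholds: (n_dims,)."""
    strong = sim >= strong_thr        # strong-compatibility judgment per dimension
    weak = sim >= weak_thr            # weak-compatibility judgment per dimension
    # Level 1: strong compatibility in at least one semantic dimension.
    level1 = strong.any(axis=1)
    # Level 2: not level 1, but weak compatibility in several dimensions.
    level2 = ~level1 & (weak.sum(axis=1) >= min_weak_dims)
    return level1, level2

# Four hypothetical item pairs scored on three semantic dimensions.
sim = np.array([[0.9, 0.2, 0.1],
                [0.5, 0.6, 0.1],
                [0.5, 0.1, 0.1],
                [0.1, 0.1, 0.1]])
strong_thr = np.array([0.8, 0.8, 0.8])
weak_thr = np.array([0.4, 0.4, 0.4])

level1, level2 = multi_level_filter(sim, strong_thr, weak_thr)
kept = np.flatnonzero(level1 | level2)   # indices of retained pairings
```

Here the first pair is retained at the first level and the second pair at the second level; the remaining two pairs are filtered out.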
[0009] Semantic segmentation and body node detection are performed on user body image data. By extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, a 3D model of the user's body is constructed, including: Multi-scale feature extraction is performed on the user's body image data, and semantic segmentation operations are performed at different spatial resolution levels of the multi-scale features to obtain part segmentation results at different resolution levels. The boundary refinement process is performed on the part segmentation results of different resolution levels through a cross-level feature fusion mechanism. The boundary refinement process is performed by fusing and correcting the boundary detail information of different levels with semantic consistency information to obtain the part segmentation mask of the user's body. Body node detection is performed on the user's body image data to obtain the coordinates of the user's skeletal nodes. Based on the part segmentation mask, spatial constraint verification is performed on the skeletal node coordinates to determine whether the skeletal node coordinates are located within the region of the corresponding body part in the part segmentation mask. When the coordinates of the bone node are outside the area of the corresponding body part, the coordinates of the bone node are corrected according to the area boundary and centroid position in the part segmentation mask to obtain the corrected bone node coordinates. Based on the part segmentation mask and the corrected skeletal node coordinates, a three-dimensional model of the user's body is established.
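The spatial constraint verification and correction of skeletal node coordinates can be sketched as follows. The text only states that the correction uses the region boundary and centroid position; the specific rule below (walk from the node toward the region centroid and stop at the first in-mask pixel, falling back to the nearest mask pixel) is an assumption made for illustration, and `correct_joint` is a hypothetical name.

```python
import numpy as np

def correct_joint(joint, mask, steps=100):
    """joint: (row, col); mask: 2-D boolean mask of the corresponding body part."""
    h, w = mask.shape
    r, c = int(round(joint[0])), int(round(joint[1]))
    if 0 <= r < h and 0 <= c < w and mask[r, c]:
        return (r, c)                           # node already inside its region
    pixels = np.argwhere(mask)                  # (N, 2) coordinates of region pixels
    centroid = pixels.mean(axis=0)
    start = np.asarray(joint, dtype=float)
    # Walk toward the centroid; the first in-mask point lies near the boundary.
    for t in np.linspace(0.0, 1.0, steps):
        p = np.round((1 - t) * start + t * centroid).astype(int)
        if 0 <= p[0] < h and 0 <= p[1] < w and mask[p[0], p[1]]:
            return (int(p[0]), int(p[1]))
    # Fallback for regions whose centroid lies outside the mask.
    nearest = pixels[np.linalg.norm(pixels - start, axis=1).argmin()]
    return (int(nearest[0]), int(nearest[1]))

mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 3:7] = True                           # hypothetical body-part region
fixed = correct_joint((4, 0), mask)             # node detected outside the region
unchanged = correct_joint((5, 5), mask)         # node already inside
```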
[0010] Obtaining two-dimensional image data of each clothing item in the initial matching scheme, performing geometric transformation and texture mapping, mapping the clothing items to the corresponding body parts of the user's three-dimensional body model, and deforming and adjusting the mapped clothing item images to generate virtual try-on images, includes: Two-dimensional image data of each clothing item in the initial matching scheme are obtained, foreground segmentation is performed on the two-dimensional image data, the outline boundary of the clothing item is extracted, and the perspective transformation matrix and projection transformation matrix are calculated based on the three-dimensional geometric information of the corresponding body parts in the user's three-dimensional body model. The perspective transformation matrix and the projection transformation matrix are sequentially applied to the two-dimensional image data of the clothing item to perform geometric transformation, thereby obtaining a transformed clothing item image that matches the spatial posture of the corresponding body part. The surface of the corresponding body part in the user's three-dimensional body model is meshed and the surface texture coordinates of the corresponding body part are extracted. The outline boundary of the geometrically transformed clothing item is aligned and matched with the outline boundary of the surface mesh to establish a mapping relationship between the pixel coordinates of the transformed clothing item image and the surface texture coordinates. Based on the mapping relationship, the texture information of the transformed clothing item image is mapped to the surface of the corresponding body part of the user's three-dimensional body model to obtain the mapped clothing item image. 
Based on the local curvature changes of corresponding body parts in the user's 3D body model, the local area of the mapped clothing item image is stretched or compressed, and then rendered and synthesized with the user's 3D body model to generate the virtual try-on image.
[0011] Based on the virtual try-on images, quantitative evaluation results are obtained by calculating the coverage, edge fit, and local deformation rate of clothing items on the user's 3D body model. Based on these quantitative evaluation results, the initial outfit combinations are filtered and ranked to obtain recommended outfit combinations, including: Extract the actual coverage area of the clothing item on the user's three-dimensional body model from the virtual try-on image, calculate the area ratio of the actual coverage area to the theoretical coverage area, and obtain the coverage integrity of the clothing item. Calculate the distribution of normal distances between the edge contour lines of the clothing items in the virtual try-on image and the surface contour lines of the corresponding body parts in the user's 3D body model, and perform statistical analysis to obtain the edge fit. Deformation analysis is performed on the texture mesh of the clothing item in the virtual try-on image, the deformation ratio of each mesh unit in the texture mesh is calculated, and the deformation ratio is statistically analyzed in different regions of the texture mesh to obtain the local deformation rate. The coverage integrity, edge fit, and local deformation rate are each assigned an evaluation weight and then fused to obtain a quantitative evaluation result of the initial matching scheme; The initial pairing schemes whose quantitative evaluation results are higher than the preset screening threshold are retained and arranged in descending order of the quantitative evaluation results to obtain the recommended pairing schemes.
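The quantitative evaluation and screening can be sketched in Python as below. The area ratio and the threshold-then-sort step follow the text; the mapping of normal distances to a score in (0, 1], the use of mean relative area change as the deformation ratio, the weights, and all function names are assumptions chosen only to make the example concrete.

```python
import numpy as np

def coverage_integrity(actual_mask, theoretical_mask):
    """Area ratio of the actual coverage region to the theoretical one."""
    theo = theoretical_mask.sum()
    return float((actual_mask & theoretical_mask).sum() / theo) if theo else 0.0

def edge_fit(normal_distances, scale=5.0):
    """Assumed statistic: mean absolute normal distance mapped into (0, 1]."""
    return float(np.exp(-np.mean(np.abs(normal_distances)) / scale))

def local_deformation_rate(orig_areas, deformed_areas):
    """Mean relative area change of the texture mesh cells."""
    return float(np.mean(np.abs(deformed_areas - orig_areas) / orig_areas))

def fuse(cov, fit, deform, weights=(0.4, 0.4, 0.2)):
    """Weighted fusion; lower deformation should raise the score."""
    w_c, w_f, w_d = weights
    return w_c * cov + w_f * fit + w_d * (1.0 - min(deform, 1.0))

def rank_schemes(scores, threshold):
    """Keep schemes above the screening threshold, sorted in descending order."""
    kept = [(name, s) for name, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

actual = np.zeros((4, 4), dtype=bool)
actual[:2, :] = True                             # garment covers the upper half
theoretical = np.ones((4, 4), dtype=bool)
cov = coverage_integrity(actual, theoretical)
deform = local_deformation_rate(np.array([1.0, 2.0]), np.array([1.5, 2.0]))
ranked = rank_schemes({"a": 0.9, "b": 0.3, "c": 0.7}, threshold=0.5)
```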
[0012] A second aspect of the present invention provides an intelligent clothing matching recommendation and virtual try-on system based on user profiles, comprising: The first unit is used to perform multi-dimensional feature extraction and semantic mapping on user profile data to generate user representation vectors, and to perform feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vectors. The second unit is used to construct a relationship graph structure between clothing items based on the semantic similarity between the user representation vector and the clothing item representation vector, and to perform graph traversal and subgraph extraction operations on the relationship graph structure to generate an initial matching scheme that satisfies connectivity constraints and compatibility constraints. The third unit is used to perform semantic segmentation and body node detection on user body image data. By extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, a three-dimensional model of the user's body is constructed. The fourth unit is used to acquire the two-dimensional image data of each clothing item in the initial matching scheme and perform geometric transformation and texture mapping, map the clothing items to the corresponding body parts of the user's three-dimensional body model, perform deformation adjustment on the mapped clothing item images, and generate virtual try-on images. The fifth unit is used to calculate the coverage, edge fit, and local deformation rate of clothing items on the user's three-dimensional body model based on the virtual try-on image, obtain quantitative evaluation results, and filter and sort the initial matching schemes according to the quantitative evaluation results to obtain recommended matching schemes.
[0013] A third aspect of the present invention provides an electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the aforementioned method.
[0014] A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the aforementioned method.
[0015] The beneficial effects of this application are as follows: By performing multi-dimensional feature extraction and semantic mapping on user profile data, and feature encoding and semantic space mapping on clothing item attributes, matching is achieved within the same semantic space, improving the accuracy and personalization of outfit recommendations. A relationship graph structure is constructed using the semantic similarity between user representation vectors and clothing item representation vectors. Through graph traversal and subgraph extraction operations, outfit schemes that satisfy connectivity and compatibility constraints can be automatically generated, effectively solving the problem of insufficient coordination between outfit items in traditional outfit recommendations. By performing semantic segmentation and skeletal node detection on user body images, a personalized 3D user body model is constructed, providing a precise body reference basis for subsequent virtual try-on.
Attached Figure Description
[0016] Figure 1 is a flowchart illustrating the intelligent clothing matching recommendation and virtual try-on method based on user profiles according to an embodiment of the present invention. Figure 2 is a schematic diagram of the multi-granularity feature encoding and semantic space mapping process.
Detailed Implementation
[0017] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] The technical solution of the present invention will be described in detail below with reference to specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.
[0019] Figure 1 is a flowchart illustrating the intelligent clothing matching recommendation and virtual try-on method based on user profiles according to an embodiment of the present invention. As shown in Figure 1, the method includes: Multi-dimensional feature extraction and semantic mapping are performed on user profile data to generate user representation vectors. Feature encoding and semantic space mapping are performed on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vectors. Based on the semantic similarity between the user representation vector and the clothing item representation vector, a relationship graph structure between clothing items is constructed. Graph traversal and subgraph extraction operations are performed on the relationship graph structure to generate an initial matching scheme that satisfies connectivity constraints and compatibility constraints. Semantic segmentation and body node detection are performed on user body image data. By extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, a three-dimensional model of the user's body is constructed. Two-dimensional image data of each clothing item in the initial matching scheme are obtained and geometric transformation and texture mapping are performed. The clothing items are mapped to the corresponding body parts of the user's three-dimensional body model. The mapped clothing item images are deformed and adjusted to generate virtual try-on images. Based on the virtual try-on images, quantitative evaluation results are obtained by calculating the coverage, edge fit, and local deformation rate of clothing items on the user's three-dimensional body model. Based on the quantitative evaluation results, the initial matching schemes are screened and sorted to obtain recommended matching schemes.
[0020] In one optional implementation, performing multi-dimensional feature extraction and semantic mapping on user profile data to generate user representation vectors, and performing feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vectors, includes: By constructing a multi-granularity feature hierarchy structure for user profile data, the user profile data is decomposed into atomic-level feature units and combined-level feature clusters, and feature encoding is performed on each, generating a set of user sub-representation vectors with a hierarchical structure. Cross-level feature interaction operations are performed on the user sub-representation vector set. By establishing attention weight matrices between different granularity levels, the dependency relationship between atomic-level features and combined-level features is captured, and the user representation vector that integrates multi-granularity feature information is generated. The attribute feature data of individual clothing items are nonlinearly transformed in multiple independent feature subspaces to generate multiple viewpoint representation vectors. A semantic space alignment operation is performed on the multiple viewpoint representation vectors and the user representation vector. By constructing a bidirectional mapping relationship between the user representation vector and each viewpoint representation vector, the multiple viewpoint representation vectors are projected onto the semantic space where the user representation vector is located. The projected viewpoint representation vectors are then fused under consistency constraints to generate the clothing item representation vector that is in the same semantic space as the user representation vector.
[0021] As shown in Figure 2, user profile data includes various data types such as basic user attributes, behavioral patterns, and preference tags. This data is decomposed into two levels: atomic-level feature units and combined-level feature clusters. Atomic-level feature units include basic user attributes such as age, gender, and region, as well as basic behaviors such as single clicks and favorites. Combined-level feature clusters are high-level feature sets formed by combining multiple atomic-level features, such as seasonal purchasing preferences and style preferences. For atomic-level features, an embedding layer is used to convert discrete features into dense vector representations, while continuous features are used directly after normalization. For example, the "gender" feature is mapped to a 64-dimensional vector, and the "age" feature is discretized and mapped to a 128-dimensional vector. For combined-level features, the atomic-level feature vectors constituting the feature cluster are concatenated and nonlinearly transformed using a multilayer perceptron to obtain the combined-level feature representation. For example, "recent purchase history" and "click browsing history" are combined into a "short-term interest feature cluster," which is then converted into a 256-dimensional vector through two fully connected layers. Through the above processing, a hierarchical set of user sub-representation vectors is generated, containing both the atomic-level feature vector set and the combined-level feature vector set.
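A minimal NumPy sketch of this two-level encoding is given below. The output dimensions (64, 128, 256) follow the example in the text; the vocabulary sizes, the age bucket edges, the 128-dimensional history vectors, the hidden width of 512, the ReLU activation, and the randomly initialised (untrained) weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Atomic level: embedding lookup tables for discrete features.
gender_emb = rng.normal(size=(3, 64))       # 3 assumed categories -> 64-dim
age_emb = rng.normal(size=(8, 128))         # 8 assumed age buckets -> 128-dim
AGE_EDGES = np.array([18, 25, 30, 35, 40, 50, 60])

def encode_atomic(gender_id, age):
    bucket = int(np.digitize(age, AGE_EDGES))   # discretize age into a bucket index
    return gender_emb[gender_id], age_emb[bucket]

def mlp(x, w1, b1, w2, b2):
    """Two fully connected layers with ReLU activations."""
    h = np.maximum(0.0, x @ w1 + b1)
    return np.maximum(0.0, h @ w2 + b2)

# Combined level: concatenate atomic vectors of a cluster, then apply the MLP.
purchase_vec = rng.normal(size=128)         # stand-in for "recent purchase history"
click_vec = rng.normal(size=128)            # stand-in for "click browsing history"
w1, b1 = rng.normal(size=(256, 512)) * 0.05, np.zeros(512)
w2, b2 = rng.normal(size=(512, 256)) * 0.05, np.zeros(256)

g_vec, a_vec = encode_atomic(gender_id=1, age=29)
short_term = mlp(np.concatenate([purchase_vec, click_vec]), w1, b1, w2, b2)
```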
[0022] Cross-level feature interaction operations are performed on the user sub-representation vector set to construct an attention mechanism between atomic-level features and combined-level features, and the semantic relevance between them is calculated. For each combined-level feature vector, its attention weight with all atomic-level feature vectors is calculated to obtain an attention weight matrix. Specifically, let the set of atomic-level feature vectors be A and the set of combined-level feature vectors be C. For each pair (ai, cj), the attention weight wij is calculated by dot product and then normalized using the softmax function. Based on the attention weights, the information of the atomic-level features is integrated into the combined-level features, enabling the combined-level features to dynamically pay attention to relevant atomic-level features. At the same time, a reverse attention mechanism is used to enable the atomic-level features to also perceive the combined-level semantic environment. Through the above bidirectional interaction, atomic-level features and combined-level features mutually enhance each other. Finally, the interacting feature vectors are concatenated and passed through a fully connected layer and a nonlinear activation function to generate a user representation vector U with a dimension of 512 that integrates multi-granularity feature information.
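The bidirectional cross-level attention can be sketched as follows. Because a dot product requires equal dimensions, the sketch assumes that all sub-representation vectors have first been projected to a common dimension `d`; that projection, the residual-style updates, the `tanh` activation, and the random weights are assumptions, while the dot-product-plus-softmax weighting and the 512-dimensional output follow the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64                                   # assumed common dimension after projection
A = rng.normal(size=(5, d))              # set A: 5 atomic-level vectors a_i
C = rng.normal(size=(2, d))              # set C: 2 combined-level vectors c_j

# Forward attention: each combined-level vector attends over atomic-level vectors.
W = softmax(C @ A.T, axis=1)             # W[j, i] is the weight for the pair (a_i, c_j)
C_enh = C + W @ A

# Reverse attention: each atomic-level vector attends over combined-level vectors.
W_rev = softmax(A @ C.T, axis=1)
A_enh = A + W_rev @ C

# Concatenate, then one fully connected layer with a nonlinearity -> 512-dim U.
z = np.concatenate([A_enh.ravel(), C_enh.ravel()])
W_fc = rng.normal(size=(z.size, 512)) / np.sqrt(z.size)
U = np.tanh(z @ W_fc)
```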
[0023] The attributes of clothing items include dimensions such as category, material, color, style, and suitability for different occasions. Independent feature subspaces are constructed for each attribute dimension, and specialized encoders are used to process them separately. For example, a category encoder Ec is used for the "category" dimension, mapping discrete values such as "T-shirt" and "dress" to 128-dimensional vectors; a style encoder Es is used for the "style" dimension, converting style labels such as "casual" and "formal" into 128-dimensional vectors; and a pre-trained convolutional neural network is used to extract 256-dimensional visual representation vectors for clothing image features. This results in multiple representation vectors V1, V2, ..., Vn from different perspectives, each reflecting the characteristics of the clothing in a specific attribute dimension.
[0024] A bidirectional mapping relationship is constructed between the user representation vector U and the viewpoint representation vectors Vi. For each viewpoint representation vector Vi, a forward mapping function fi is designed to project Vi onto the semantic space where the user representation vector resides; simultaneously, a backward mapping function gi is designed to project the user representation vector U back into the viewpoint representation space. Semantic space alignment is achieved by minimizing the reconstruction error of the bidirectional mapping. Consistency constraints are applied to the projected viewpoint representation vectors for fusion, and a cross-viewpoint attention mechanism is introduced to calculate the importance weights of different viewpoint representation vectors in the user semantic space. Based on these weights, the viewpoint representation vectors are weighted and summed, and through residual connections and nonlinear transformations, a clothing item representation vector P residing in the same semantic space as the user representation vector is generated.
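The alignment and fusion steps can be illustrated with the sketch below. It assumes linear forward and backward maps, measures only the round-trip error of each viewpoint vector (the text does not give the exact form of the reconstruction error), and uses scaled dot-product scores against U as the cross-viewpoint importance weights. No training loop is shown; the weights are random placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_user = 512
view_dims = [128, 128, 256]              # category, style, and visual viewpoints
U = rng.normal(size=d_user)
views = [rng.normal(size=d) for d in view_dims]

# Hypothetical linear maps: f_i projects V_i into user space, g_i maps back.
F = [rng.normal(size=(d, d_user)) / np.sqrt(d) for d in view_dims]
G = [rng.normal(size=(d_user, d)) / np.sqrt(d_user) for d in view_dims]

def reconstruction_error(views, F, G):
    """Sum over viewpoints of ||g_i(f_i(V_i)) - V_i||^2, the quantity to minimize."""
    return sum(float(np.sum((v @ f @ g - v) ** 2)) for v, f, g in zip(views, F, G))

def fuse_views(views, F, U):
    projected = np.stack([v @ f for v, f in zip(views, F)])   # (n_views, 512)
    weights = softmax(projected @ U / np.sqrt(U.size))        # importance per viewpoint
    fused = weights @ projected                               # weighted sum
    return fused + np.tanh(fused), weights                    # residual + nonlinearity

loss = reconstruction_error(views, F, G)
P, weights = fuse_views(views, F, U)
```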
[0025] The above methods enable the representation of user profile data and clothing item feature data in the same semantic space, laying the foundation for subsequent personalized recommendations. In practical applications, similarity can be calculated using user representation vectors and clothing item representation vectors, and personalized clothing recommendations can be made based on the similarity level, improving the accuracy of the recommendation system and the user experience.
[0026] In one optional implementation, a relationship graph structure between clothing items is constructed based on the semantic similarity between the user representation vector and the clothing item representation vector. Graph traversal and subgraph extraction operations are performed on the relationship graph structure to generate an initial matching scheme that satisfies connectivity and compatibility constraints, including: A multi-dimensional similarity metric matrix is constructed by calculating the distance metric between the representation vectors of the clothing items. An adaptive threshold segmentation strategy is set for each dimension similarity metric in the multi-dimensional similarity metric matrix. Based on the adaptive threshold segmentation strategy, clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension are selected. The clothing item pairing relationships are transformed into edge connections of a graph structure to construct the relationship graph structure between clothing items. Calculate the matching score between the user representation vector and the representation vector of each clothing item in the relationship graph structure, and select a preset number of clothing item nodes with the highest matching scores as the starting node set for graph traversal. Starting from each starting node in the set of starting nodes, a depth-first traversal is performed. During the traversal, the edge weights between adjacent clothing item nodes on the path are accumulated and a penalty factor for the path length is introduced to obtain a cumulative compatibility evaluation value. When the cumulative compatibility evaluation value is lower than a preset compatibility threshold or the traversal depth exceeds the preset maximum number of matching items, the traversal is terminated and the traversal path is extracted as a candidate matching subgraph. 
The candidate matching subgraphs are subjected to structural integrity verification, and the candidate matching subgraphs that pass the structural integrity verification are transformed into the initial matching scheme.
[0027] A multi-dimensional similarity metric matrix is constructed by calculating the distance metrics between the representation vectors of individual clothing items. This matrix contains similarity information for each pair of clothing items across different semantic dimensions, such as color harmony, style consistency, and occasion suitability. For each dimension's similarity metric in the matrix, an adaptive threshold segmentation strategy is implemented. A percentile-based method is used to analyze the similarity distribution of each dimension, and an appropriate threshold is determined based on the distribution characteristics. For example, for the color harmony dimension, the threshold can be set at the value that separates the top 30% of the similarity distribution (the 70th percentile); for the style consistency dimension, it can be set at the value that separates the top 40% (the 60th percentile).
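The percentile-based thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `percentile_threshold`, the linear-interpolation percentile convention, and the sample similarity values are hypothetical and are not specified by the source.

```python
def percentile_threshold(values, top_fraction):
    """Return the similarity value that separates the top `top_fraction` of values.

    Uses linear interpolation between order statistics, which is one common
    percentile convention (an assumption; the source does not name one).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < top_fraction < 1.0:
        raise ValueError("top_fraction must be in (0, 1)")
    ordered = sorted(values)
    # Position of the (1 - top_fraction) quantile on a 0..n-1 index scale.
    pos = (1.0 - top_fraction) * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


# Hypothetical similarity values for all item pairs in one dimension.
sample_sims = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.0]
thresholds = {
    "color": percentile_threshold(sample_sims, 0.30),  # top 30% of pairs
    "style": percentile_threshold(sample_sims, 0.40),  # top 40% of pairs
}
```

In practice each dimension would use its own similarity distribution; the same sample list is reused here only to keep the sketch short.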
[0028] Based on the aforementioned adaptive threshold segmentation strategy, clothing item pairings that satisfy similarity constraints in at least one semantic dimension are selected. If the similarity value of two clothing items in any semantic dimension is higher than the adaptive threshold for that dimension, then the two items are considered to be paired. These pairings are then transformed into edge connections in a graph structure. For example, if a white T-shirt and a pair of blue jeans have a similarity higher than the corresponding threshold in the style consistency dimension, then an edge connecting these two items is established in the graph structure, and the edge weight can be set as a weighted average of their similarity values. In this way, a complete clothing item relationship graph structure is constructed.
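As a hedged sketch of the edge construction described above, the following code pairs two items when any dimension exceeds its threshold and sets the edge weight to a weighted average of the per-dimension similarities. The data layout, the function name `build_relation_graph`, and the equal default weights are assumptions made for illustration.

```python
def build_relation_graph(similarity, thresholds, weights=None):
    """Build an undirected item graph from per-dimension pair similarities.

    similarity: {dimension: {(item_a, item_b): value}}, with each pair stored
                under one canonical key order (hypothetical layout).
    thresholds: {dimension: adaptive threshold for that dimension}
    Returns {item: {neighbor: edge_weight}}.
    """
    dims = list(similarity)
    weights = weights or {d: 1.0 for d in dims}
    pairs = set()
    for d in dims:
        pairs.update(similarity[d])
    graph = {}
    for a, b in sorted(pairs):
        values = {d: similarity[d].get((a, b), 0.0) for d in dims}
        # Pair the items if ANY dimension exceeds its own threshold.
        if any(values[d] > thresholds[d] for d in dims):
            total_w = sum(weights[d] for d in dims)
            edge_w = sum(weights[d] * values[d] for d in dims) / total_w
            graph.setdefault(a, {})[b] = edge_w
            graph.setdefault(b, {})[a] = edge_w
    return graph


similarity = {
    "color": {("tee", "jeans"): 0.5, ("tee", "coat"): 0.2},
    "style": {("tee", "jeans"): 0.9, ("tee", "coat"): 0.3},
}
graph = build_relation_graph(similarity, {"color": 0.7, "style": 0.6})
```

Here the T-shirt and jeans are connected because their style similarity exceeds the style threshold, while the T-shirt and coat remain unconnected.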
[0029] The matching score between the user's representation vector and the representation vectors of each clothing item in the relationship graph structure is calculated. The matching score can be calculated using cosine similarity, reflecting the degree of fit between user preferences and clothing characteristics. The calculated matching scores are then sorted, and a predetermined number of clothing item nodes with the highest rankings are selected as the starting node set for graph traversal. For example, the top five clothing items with the highest matching scores can be selected as the starting nodes; these items typically match the user's style preferences very well.
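The scoring and start-node selection above can be illustrated with a short sketch. The vectors, item names, and the function names `cosine_similarity` and `select_start_nodes` are hypothetical; ties are broken alphabetically here, which the source does not specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_start_nodes(user_vec, item_vecs, k):
    """Return the names of the k items whose vectors best match the user vector."""
    scored = [(cosine_similarity(user_vec, vec), name)
              for name, vec in item_vecs.items()]
    # Highest score first; alphabetical order breaks ties (an assumption).
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


user = [1.0, 0.0]
items = {"tee": [1.0, 0.0], "coat": [0.0, 1.0], "jeans": [1.0, 1.0]}
starts = select_start_nodes(user, items, k=2)
```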
[0030] A depth-first traversal is performed starting from each node in the set of starting nodes. During the traversal, the edge weights between adjacent clothing item nodes on the path are accumulated, and a penalty factor based on the path length is introduced. The accumulated evaluation value is updated as: Accumulated evaluation value = Current accumulated value + New edge weight - Path length penalty factor × Current path length. The path length penalty factor balances the complexity of outfit combinations and prevents the generation of overly complex combinations. The traversal terminates when the accumulated compatibility evaluation value falls below a preset compatibility threshold or the traversal depth exceeds the preset maximum number of matching items. For example, if the preset maximum number of matching items is five, a path is not extended beyond five items; alternatively, the traversal terminates early when the accumulated compatibility evaluation value drops below a preset threshold (e.g., 0.6). The completed traversal path is extracted as a candidate outfit subgraph.
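The traversal can be sketched as below. Several details are assumptions, since the source leaves them open: the accumulated value starts at zero, "current path length" is taken as the number of edges already on the path, and an extension that would fall below the threshold is skipped so that the path built so far is emitted as the candidate. The function name and sample graph are hypothetical.

```python
def extract_candidate_paths(graph, start, penalty, min_score, max_items):
    """Depth-first search that emits each maximal admissible path.

    graph: {item: {neighbor: edge_weight}}
    Update rule: acc = acc + edge_weight - penalty * path_length, with
    path_length counted in edges already on the path (assumed definition).
    Returns a list of (path, accumulated_value) tuples.
    """
    candidates = []

    def dfs(path, acc):
        extended = False
        if len(path) < max_items:
            for nxt, w in sorted(graph.get(path[-1], {}).items()):
                if nxt in path:
                    continue  # an item appears at most once per outfit
                new_acc = acc + w - penalty * (len(path) - 1)
                if new_acc < min_score:
                    continue  # this extension violates the compatibility threshold
                extended = True
                dfs(path + [nxt], new_acc)
        # A path that cannot be extended further becomes a candidate subgraph.
        if not extended and len(path) > 1:
            candidates.append((path, acc))

    dfs([start], 0.0)
    return candidates


graph = {
    "tee": {"jeans": 0.9, "sneakers": 0.3},
    "jeans": {"tee": 0.9, "sneakers": 0.8},
    "sneakers": {"tee": 0.3, "jeans": 0.8},
}
candidates = extract_candidate_paths(
    graph, "tee", penalty=0.1, min_score=0.6, max_items=3)
```

With these sample weights the direct tee-to-sneakers edge is rejected (0.3 is below 0.6), so the only candidate is the three-item path through the jeans.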
[0031] The candidate outfit subgraphs are structurally validated to ensure that the outfits meet the basic rules for clothing category combination. The validation includes checking whether the outfits contain the necessary clothing categories (such as tops and bottoms) and whether there are any functionally conflicting items in the outfits (such as multiple coats or multiple pairs of pants). It can also validate the seasonal consistency of the outfits to ensure that summer items are not mixed with winter items.
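A minimal sketch of these structural checks follows. The item schema (`category` and `season` keys), the category names, and the rule sets `REQUIRED` and `SINGLE_ONLY` are illustrative assumptions, not values taken from the source.

```python
REQUIRED = {"top", "bottom"}      # categories every outfit must contain (assumed)
SINGLE_ONLY = {"coat", "bottom"}  # categories allowed at most once (assumed)


def validate_outfit(items):
    """Check a candidate outfit against basic category and season rules.

    items: list of dicts with 'category' and 'season' keys (hypothetical schema);
           season is 'summer', 'winter', or 'all'.
    """
    categories = [item["category"] for item in items]
    # Rule 1: the necessary clothing categories must be present.
    if not REQUIRED.issubset(categories):
        return False
    # Rule 2: no functionally conflicting duplicates.
    if any(categories.count(cat) > 1 for cat in SINGLE_ONLY):
        return False
    # Rule 3: summer and winter items must not be mixed.
    seasons = {item["season"] for item in items}
    if {"summer", "winter"}.issubset(seasons):
        return False
    return True


ok_outfit = [
    {"category": "top", "season": "summer"},
    {"category": "bottom", "season": "all"},
]
mixed_outfit = ok_outfit + [{"category": "coat", "season": "winter"}]
```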
[0032] Through the above steps, a relationship graph structure between clothing items is constructed based on the semantic similarity between user representation vectors and clothing item representation vectors. Then, through graph traversal and subgraph extraction operations, initial matching schemes that satisfy connectivity and compatibility constraints are successfully generated. This method fully utilizes the semantic associations between clothing items and user personalized preference information, enabling the generation of clothing matching schemes that both conform to matching rules and meet user personalized needs.
[0033] In one optional implementation, an adaptive threshold segmentation strategy is set for each dimension similarity measure value in the multi-dimensional similarity measure matrix, and the selection of clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension based on the adaptive threshold segmentation strategy includes: The frequency distribution of similarity measure values for each semantic dimension in the multi-dimensional similarity measure matrix is statistically analyzed. By performing multi-peak detection on the frequency distribution, the clustering pattern of the clothing item representation vector in each semantic dimension is identified. Based on the peak position and inter-peak interval of the clustering pattern, the boundary of the main peak region in the frequency distribution is determined and used as a strong compatibility threshold, and the transition boundary between the secondary peak region and the main peak region is determined and used as a weak compatibility threshold, thus obtaining the adaptive threshold segmentation strategy of the semantic dimension. Traverse the clothing item pairings in the multi-dimensional similarity measurement matrix, and compare the similarity measurement values of the clothing item pairings in each semantic dimension with the strong compatibility threshold and the weak compatibility threshold respectively based on the adaptive threshold segmentation strategy. Record the judgment results of the clothing item pairings satisfying the strong compatibility threshold and the weak compatibility threshold in each semantic dimension, and construct a cross-dimensional compatibility judgment matrix. Based on the cross-dimensional compatibility judgment matrix, a multi-level filtering strategy is executed. 
The multi-level filtering strategy prioritizes retaining clothing item pairs that meet strong compatibility threshold constraints in at least one semantic dimension, and secondarily retains clothing item pairs that meet weak compatibility threshold constraints in multiple semantic dimensions, thereby filtering out clothing item pairing relationships that satisfy the multi-level filtering strategy.
[0034] A multi-dimensional similarity metric matrix is calculated based on the obtained clothing item representation vectors. This matrix reflects the degree of similarity between different clothing items in various semantic dimensions (such as color, style, and material). For example, for the color dimension, the color similarity between two garments can be calculated by extracting the color histogram features from the clothing images; for the style dimension, the style similarity can be calculated by extracting contour and structural features; and for the material dimension, the similarity can be calculated using texture features.
[0035] The frequency distribution of similarity metrics across semantic dimensions in a multi-dimensional similarity matrix is analyzed. For example, in the color dimension, the similarity values of all clothing item pairs are divided into several intervals at certain intervals, and the number of clothing item pairs in each interval is counted to form a frequency distribution histogram. This step helps to understand the data distribution characteristics and provides a basis for subsequent threshold setting.
[0036] Multi-peak detection is performed on the frequency distribution to identify clustering patterns of the clothing item representation vectors in each semantic dimension. Kernel density estimation transforms the discrete frequency distribution into a continuous probability density function, and the peak locations are then identified as the local maxima of this function. For example, in the color dimension, the frequency distribution may exhibit several peaks, corresponding to highly similar, moderately similar, and weakly similar clothing item pairs, respectively.
[0037] Based on the peak positions and inter-peak intervals of clustering patterns, the boundary of the main peak region in the frequency distribution is determined and used as a strong compatibility threshold, while the transition boundary between the secondary peak region and the main peak region is determined and used as a weak compatibility threshold. Specifically, for multiple identified peaks, the peak region with the highest similarity is selected as the main peak region, and its lower boundary (the side with lower similarity) is used as the strong compatibility threshold; the valley position between the second most similar peak region and the main peak region is selected as the weak compatibility threshold. This adaptive threshold setting method based on data distribution characteristics can automatically adjust the threshold according to the data characteristics of different dimensions, avoiding the subjectivity and inflexibility of manually setting thresholds.
[0038] In practical applications, some dimensions may exhibit a unimodal distribution. In such cases, the mean and standard deviation of the similarity values in that dimension can be calculated. The mean minus one standard deviation is used as the strong compatibility threshold, and the mean minus two standard deviations is used as the weak compatibility threshold. For each semantic dimension, corresponding strong and weak compatibility thresholds are obtained, constituting a complete adaptive threshold segmentation strategy.
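The threshold derivation for one semantic dimension can be sketched as follows, using a hand-written Gaussian kernel density estimate. Several choices are assumptions made for illustration: the bandwidth, the grid on [0, 1], and in particular the reading of "boundary of the main peak region" as the point on the lower-similarity side where the density falls to half the main peak height. The function names are hypothetical.

```python
import math
import statistics


def kde(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]


def adaptive_thresholds(values, bandwidth=0.05, steps=200):
    """Return (strong_threshold, weak_threshold) for one semantic dimension."""
    grid = [i / steps for i in range(steps + 1)]
    dens = kde(values, grid, bandwidth)
    padded = [0.0] + dens + [0.0]  # lets maxima at the grid ends count as peaks
    peaks = [i for i in range(len(dens))
             if padded[i] < padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        # Unimodal fallback from the text: mean - 1 std and mean - 2 std.
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        return mean - std, mean - 2.0 * std
    main, second = peaks[-1], peaks[-2]  # the two highest-similarity peaks
    # Weak threshold: density valley between the second peak and the main peak.
    valley = min(range(second, main + 1), key=lambda i: dens[i])
    # Strong threshold: walk down the lower side of the main peak until the
    # density drops to half the peak height (assumed definition of "boundary").
    half = dens[main] / 2.0
    i = main
    while i > valley and dens[i] > half:
        i -= 1
    return grid[i], grid[valley]


bimodal = [0.28, 0.30, 0.32, 0.30, 0.78, 0.80, 0.82, 0.80]
strong, weak = adaptive_thresholds(bimodal)
uni_strong, uni_weak = adaptive_thresholds([0.48, 0.50, 0.52, 0.50])
```

By construction the strong threshold is never below the weak one, so the two levels stay ordered even for irregular distributions.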
[0039] The system iterates through the clothing item pairs in the multi-dimensional similarity measurement matrix and compares their similarity metrics in each semantic dimension with strong and weak compatibility thresholds based on an adaptive threshold segmentation strategy. For example, for clothing items A and B, it checks whether their similarity value in the color dimension is greater than the strong or weak compatibility threshold for color, and similarly checks their similarity values in other dimensions such as style and material.
[0040] Record the judgment results of clothing item pairings satisfying strong compatibility thresholds and weak compatibility thresholds in each semantic dimension, and construct a cross-dimensional compatibility judgment matrix. This matrix can be represented as a two-dimensional table, where rows represent clothing item pairs, columns represent each semantic dimension, and cells record the judgment results: for example, "2" indicates that the strong compatibility threshold is met, "1" indicates that the weak compatibility threshold is met, and "0" indicates that no compatibility requirements are met.
[0041] A multi-level filtering strategy is implemented based on a cross-dimensional compatibility judgment matrix. This strategy prioritizes clothing item pairs that meet strong compatibility threshold constraints in at least one semantic dimension. For example, clothing item pairs that meet the strong compatibility threshold in the color dimension will be prioritized, reflecting the high degree of color coordination among these items. Secondarily, clothing item pairs that simultaneously meet weak compatibility threshold constraints in multiple semantic dimensions are retained. For example, pairs that meet the weak compatibility thresholds in both color and style dimensions reflect that while these items may not be particularly outstanding in a single dimension, their overall coordination is good.
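The judgment matrix and the two-level filter can be illustrated with the sketch below. The data layout, the function names, and the parameter `min_weak_dims` (how many weakly compatible dimensions count as "multiple") are assumptions; the codes 2, 1, and 0 follow the example encoding given in the text.

```python
def judge(value, strong, weak):
    """Encode one comparison: 2 = strong, 1 = weak, 0 = not compatible."""
    if value > strong:
        return 2
    if value > weak:
        return 1
    return 0


def build_judgment_matrix(similarity, thresholds):
    """similarity: {pair: {dim: value}}; thresholds: {dim: (strong, weak)}."""
    return {pair: {d: judge(v, *thresholds[d]) for d, v in dims.items()}
            for pair, dims in similarity.items()}


def multi_level_filter(matrix, min_weak_dims=2):
    """Keep strongly compatible pairs first, then broadly weakly compatible ones."""
    primary, secondary = [], []
    for pair, codes in matrix.items():
        if 2 in codes.values():
            primary.append(pair)  # strong in at least one dimension
        elif sum(1 for c in codes.values() if c == 1) >= min_weak_dims:
            secondary.append(pair)  # weak in several dimensions
    return primary + secondary


thresholds = {"color": (0.8, 0.5), "style": (0.7, 0.4)}
similarity = {
    ("A", "B"): {"color": 0.9, "style": 0.2},
    ("A", "C"): {"color": 0.6, "style": 0.5},
    ("B", "C"): {"color": 0.6, "style": 0.1},
}
matrix = build_judgment_matrix(similarity, thresholds)
kept = multi_level_filter(matrix)
```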
[0042] This multi-level screening strategy effectively balances two compatibility modes: one is specialized coordination that excels in one dimension, and the other is broad coordination that is balanced across multiple dimensions. In this way, clothing item pairings that meet the multi-level screening strategy are selected, laying the foundation for subsequent outfit recommendations.
[0043] In one optional implementation, semantic segmentation and body node detection are performed on the user's body image data. By extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, a three-dimensional model of the user's body is constructed, including: Multi-scale feature extraction is performed on the user's body image data, and semantic segmentation operations are performed at different spatial resolution levels of the multi-scale features to obtain part segmentation results at different resolution levels. Boundary refinement is performed on the part segmentation results of different resolution levels through a cross-level feature fusion mechanism, in which the boundary detail information of different levels is fused with and corrected by semantic consistency information to obtain the part segmentation mask of the user's body. Body node detection is performed on the user's body image data to obtain the coordinates of the user's skeletal nodes. Based on the part segmentation mask, spatial constraint verification is performed on the skeletal node coordinates to determine whether the skeletal node coordinates are located within the region of the corresponding body part in the part segmentation mask. When the coordinates of a skeletal node are outside the region of the corresponding body part, the coordinates of the skeletal node are corrected according to the region boundary and centroid position in the part segmentation mask to obtain the corrected skeletal node coordinates. Based on the part segmentation mask and the corrected skeletal node coordinates, a three-dimensional model of the user's body is established.
[0044] An encoder-decoder network structure is used to extract multi-scale features from user body image data. The encoder part uses a pre-trained backbone network (such as ResNet or EfficientNet) to extract features layer by layer, generating five feature layers of different resolutions: F1, F2, F3, F4, and F5. F1 retains the highest resolution (1/2 of the original image), while F5 has the lowest resolution (1/32 of the original image). Each layer contains semantic information from different receptive fields; lower-level features retain more details and boundary information, while higher-level features contain more abstract semantic understanding.
[0045] When performing semantic segmentation at different resolution levels, a lightweight segmentation head is designed for each feature level Fi, containing a 3×3 convolutional layer (adjusting the number of channels) and a 1×1 convolutional layer (generating the segmentation map). Each segmentation head outputs the segmentation result Mi at the corresponding resolution, containing semantic labels for various parts of the human body (such as the head, torso, limbs, etc.). In specific implementation, M5 corresponding to the F5 level first identifies the general region, while M1 corresponding to the F1 level focuses on detailed boundaries.
[0046] The cross-level feature fusion mechanism adopts a top-down progressive fusion approach, upsampling M5 to M4 resolution and fusing it with M4 features using the formula: M4' = Conv(Concat(Upsample(M5), M4)). Here, Upsample is the upsampling operation, Concat is the feature concatenation, and Conv is the convolution operation. This process continues layer by layer to obtain M3', M2', and M1'. In the boundary regions, an attention mechanism is used to enhance the boundary feature weights: Bi = Sigmoid(Conv(Mi)) × Mi', where Bi represents the segmentation result after boundary enhancement. Finally, through boundary refinement, a high-precision part segmentation mask is obtained.
[0047] Body node detection is achieved through heatmap regression. The same backbone network is used to extract features, and a dedicated keypoint detection head is added to output heatmap predictions for 17 human keypoints (such as the top of the head, neck, shoulders, elbows, wrists, hips, knees, and ankles). Gaussian peak localization is performed on each heatmap to obtain initial skeletal node coordinates P = {p1, p2, …, p17}.
[0048] During the spatial constraint verification phase, each skeletal node pi is checked to see whether it falls within the segmentation mask region of the corresponding body part. For example, it is checked whether the elbow keypoint is located within the forearm segmentation region. A function Check(pi, Maski) is defined, returning a boolean value indicating whether the node position is reasonable. When Check(pi, Maski) is false, meaning the keypoint is outside the corresponding body part region, node coordinate correction is required.
[0049] Skeletal node coordinate correction is based on the regional characteristics of the part mask. The centroid coordinates ci and boundary contour Contouri of the corresponding part mask Maski are calculated, and the boundary point bi closest to the initial prediction point pi is found on the contour. The corrected keypoint pi' is determined by a weighted combination of the initial point pi, centroid ci, and boundary point bi: pi' = αpi + βbi + (1-α-β)ci, where the weights α and β are dynamically adjusted according to the confidence of the initial point.
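The verification and correction steps can be sketched on a small binary mask. This is an illustration under assumptions: the mask is a 2D list indexed as mask[y][x], the contour is taken as mask pixels with a 4-neighbor outside the mask, and α and β are fixed constants here rather than confidence-driven. The function name `correct_keypoint` is hypothetical.

```python
def correct_keypoint(p, mask, alpha=0.2, beta=0.4):
    """Return p unchanged if it lies in the mask, else pull it toward the part.

    p: (x, y) keypoint; mask: 2D list of 0/1 values indexed as mask[y][x].
    Correction follows p' = alpha * p + beta * b + (1 - alpha - beta) * c,
    with b the nearest contour pixel and c the mask centroid.
    """
    h, w = len(mask), len(mask[0])
    x, y = int(round(p[0])), int(round(p[1]))
    if 0 <= x < w and 0 <= y < h and mask[y][x]:
        return p  # Check(p, mask) is true, no correction needed
    pixels = [(c, r) for r in range(h) for c in range(w) if mask[r][c]]
    if not pixels:
        return p  # empty mask: nothing to correct against
    cx = sum(px for px, _ in pixels) / len(pixels)
    cy = sum(py for _, py in pixels) / len(pixels)

    def on_contour(c, r):
        # A mask pixel is on the contour if any 4-neighbor is outside the mask.
        for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nc, nr = c + dc, r + dr
            if not (0 <= nc < w and 0 <= nr < h) or not mask[nr][nc]:
                return True
        return False

    contour = [q for q in pixels if on_contour(*q)]
    bx, by = min(contour, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
    gamma = 1.0 - alpha - beta
    return (alpha * p[0] + beta * bx + gamma * cx,
            alpha * p[1] + beta * by + gamma * cy)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
corrected = correct_keypoint((4.0, 2.0), mask)
unchanged = correct_keypoint((2.0, 2.0), mask)
```

Note that a large α can leave the corrected point outside the mask, which is one reason the text ties the weights to the confidence of the initial prediction.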
[0050] Based on the part segmentation mask and the corrected skeletal node coordinates, a 3D model of the user's body is established as follows: the skeletal topology is constructed and the connection relationships between key points are determined; the part segmentation mask is mapped onto a 3D template; and the template is deformed and adjusted according to the corrected skeletal nodes. The part segmentation mask provides constraints on the body surface shape, while the skeletal nodes control pose changes. Through optimization algorithms, the error between the 3D model surface and the 2D projected contour is minimized, resulting in a 3D model that conforms to the user's body characteristics.
[0051] In practical applications, this method effectively handles occlusion and complex poses. For example, when a user's arms are crossed, keypoint detection may fail, but these errors can be corrected by using part segmentation masks, ensuring the accuracy of the generated 3D model's pose. Furthermore, this method exhibits good stability under various lighting conditions and background environments, making it suitable for interactive applications such as virtual try-on and fitness guidance.
[0052] In one optional implementation, acquiring two-dimensional image data of each garment item in the initial outfit scheme and performing geometric transformation and texture mapping, mapping the garment items to corresponding body parts of the user's three-dimensional body model, and performing deformation adjustments on the mapped garment item images to generate virtual try-on images includes: Two-dimensional image data of each clothing item in the initial matching scheme are obtained, foreground segmentation is performed on the two-dimensional image data, the outline boundary of the clothing item is extracted, and the perspective transformation matrix and projection transformation matrix are calculated based on the three-dimensional geometric information of the corresponding body parts in the user's three-dimensional body model. The perspective transformation matrix and the projection transformation matrix are sequentially applied to the two-dimensional image data of the garment to perform geometric transformation, thereby obtaining a transformed image of the garment that matches the spatial posture of the corresponding body part. The surface of the corresponding body part in the user's 3D model is meshed and the surface texture coordinates of the corresponding body part are extracted. The outline boundary of the geometrically transformed clothing item is aligned and matched with the outline boundary of the surface mesh to establish a mapping relationship between the pixel coordinates of the transformed clothing item image and the surface texture coordinates. Based on the mapping relationship, the texture information of the transformed clothing item image is mapped to the surface of the corresponding body part of the user's 3D model to obtain the mapped clothing item image. 
Based on the local curvature changes of corresponding body parts in the user's 3D body model, the local area of the mapped clothing item image is stretched or compressed, and then rendered and synthesized with the user's 3D body model to generate the virtual try-on image.
[0053] After obtaining the 2D image data of each garment item in the initial outfit scheme, foreground segmentation is performed to separate the garment items from the background. Foreground segmentation can employ deep learning-based image segmentation algorithms, such as U-Net or DeepLabV3+ networks. The original garment image is input, and a binary segmentation mask is output. This mask is applied to extract the effective pixel regions of the garment items, while edge detection algorithms such as the Canny operator are used to extract the set of contour boundary points of the garment items.
[0054] The transformation matrices are calculated based on the 3D geometric information of corresponding body parts in the user's 3D body model. Specifically, the coordinates of key points corresponding to the wearing areas are extracted from the user's 3D body model, such as the 3D coordinates of the chest, shoulders, and waist for a top, and of the waist, hips, and thighs for trousers. A local coordinate system is constructed using these key points, and the perspective transformation matrix P of the clothing from the 2D plane to 3D space is calculated. Let the pixel coordinates of n key points extracted from the 2D clothing image be (ui, vi), and the spatial coordinates of the corresponding 3D model key points be (Xi, Yi, Zi), i = 1, 2, ..., n. The perspective transformation matrix P is a 3×3 matrix that transforms the 2D homogeneous coordinates [ui, vi, 1]^T to the 3D coordinates [Xi, Yi, Zi]^T. For each corresponding point pair, this gives the linear equations Xi = p11×ui + p12×vi + p13, Yi = p21×ui + p22×vi + p23, and Zi = p31×ui + p32×vi + p33, where pij are the elements of P. The resulting overdetermined system is solved using the least squares method, minimizing the total squared error over all keypoint pairs, ∑i ||[Xi, Yi, Zi]^T - P × [ui, vi, 1]^T||^2, to obtain the optimal perspective transformation matrix P, so that the transformed keypoints of the 2D clothing deviate as little as possible from their corresponding keypoints in the 3D model. Simultaneously, based on camera parameters and viewing angle, the projection transformation matrix Q from 3D space to 2D screen space is calculated. The projection transformation matrix Q consists of the camera intrinsic parameter matrix K and the extrinsic parameter matrix [R|t]. K includes parameters such as the focal lengths fx and fy and the principal point coordinates cx and cy, while the extrinsic parameter matrix includes the rotation matrix R and translation vector t, describing the transformation from the 3D spatial coordinate system to the camera coordinate system. The projection transformation matrix Q = K × [R|t] projects the 3D spatial point (X, Y, Z) onto the 2D screen coordinates (u', v'), with the projection relationship s × [u', v', 1]^T = Q × [X, Y, Z, 1]^T, where s is the depth scale factor. The intrinsic parameter matrix K is obtained through camera calibration, and the extrinsic parameter matrix [R|t] is set according to the user's viewing angle to complete the calculation of the projection transformation matrix Q.
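The two matrices can be illustrated with a small standard-library sketch: P is fitted row by row through the normal equations of the least squares problem, and Q is applied as K × [R|t] followed by division by the depth scale factor s. The function names, the sample keypoints, and the camera parameters are hypothetical; a production system would more likely use a numerical library.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate keypoint configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def fit_plane_to_space(uv, xyz):
    """Least-squares 3x3 matrix P with [X, Y, Z]^T ~ P [u, v, 1]^T."""
    rows = [(u, v, 1.0) for u, v in uv]
    # Normal equations: (A^T A) p_k = A^T b_k for each row p_k of P.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    p = []
    for k in range(3):
        atb = [sum(r[i] * pt[k] for r, pt in zip(rows, xyz)) for i in range(3)]
        p.append(solve3(ata, atb))
    return p


def project(k, r, t, point):
    """Pinhole projection s [u', v', 1]^T = K [R|t] [X, Y, Z, 1]^T."""
    cam = [sum(r[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    img = [sum(k[i][j] * cam[j] for j in range(3)) for i in range(3)]
    s = img[2]  # depth scale factor
    if s <= 0:
        raise ValueError("point is behind the camera")
    return img[0] / s, img[1] / s


# Hypothetical keypoints generated from a known matrix, so the fit is exact.
uv = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
xyz = [(1.0, 2.0, 5.0), (3.0, 2.0, 5.0), (1.0, 5.0, 5.0), (3.0, 5.0, 5.0)]
P = fit_plane_to_space(uv, xyz)

K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
screen = project(K, R, [0.0, 0.0, 0.0], (1.0, 2.0, 5.0))
```

At least three non-collinear keypoints are needed for the normal equations to be solvable, which is why the solver raises an error on a degenerate configuration.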
[0055] Geometric transformation is performed on the two-dimensional image data of the clothing item using the calculated transformation matrix. The perspective transformation matrix P is applied to transform the clothing item from the original plane to a spatial position that matches the user's body posture. The transformed clothing image maintains the correct relative positional relationship with the body parts. The projection transformation matrix Q is applied to project the clothing image in space onto the observation plane, generating a transformed clothing image that matches the user's body spatial posture.
[0056] The corresponding body parts of the user's 3D model are meshed, representing continuous curved surfaces as discrete triangular meshes. The mesh density is dynamically adjusted according to the complexity of the body parts; for example, the mesh density is higher for joints with greater curvature. UV texture coordinates are assigned to each mesh vertex to construct a complete surface texture mapping space. The outline boundary points of the geometrically transformed clothing item are aligned and matched with the outline boundary points of the surface mesh. Precise alignment can be achieved using the Iterative Closest Point (ICP) algorithm.
[0057] The mapping relationship between the pixel coordinates of the transformed clothing image and the surface texture coordinates is then established. For each triangular facet of the surface mesh, the coordinates of its three vertices in 3D space (V1, V2, V3) and the corresponding UV texture coordinates (UV1, UV2, UV3) are known. For any pixel p in the transformed clothing image with pixel coordinates (up, vp), the corresponding 3D space coordinates are calculated through the inverse P^-1 of the perspective transformation matrix P: Pp = P^-1 × [up, vp, 1]^T. The triangular facet containing the 3D point Pp is determined by calculating the barycentric coordinates (α, β, γ) of Pp relative to the triangle (V1, V2, V3), where α + β + γ = 1 and α, β, γ ≥ 0. The barycentric coordinates are obtained by solving the linear equation Pp = α × V1 + β × V2 + γ × V3. Once they are determined, the UV texture coordinates UVp of that pixel are calculated through interpolation using the same barycentric coordinates, i.e., UVp = α × UV1 + β × UV2 + γ × UV3. By traversing all pixels of the transformed garment image, a complete mapping relationship is established between each pixel coordinate (up, vp) and the surface texture coordinates UVp. The mapping process uses a bilinear interpolation algorithm to ensure a smooth and natural texture transition. When the garment image pixels are mapped onto the surface of the 3D model, the impact of normal vector changes on lighting is considered, and local lighting effects are adjusted through normal mapping to enhance realism.
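The per-pixel UV lookup can be sketched as below. The code solves for the weights (α, β, γ) of a 3D point relative to a triangle using dot products, then reuses the same weights on the vertex UV coordinates. The function names and the sample triangle are hypothetical; for a point slightly off the triangle's plane, this formulation returns the weights of its projection onto that plane.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def barycentric(p, v1, v2, v3):
    """Return (alpha, beta, gamma) with p ~ alpha*v1 + beta*v2 + gamma*v3."""
    e0, e1, e2 = _sub(v2, v1), _sub(v3, v1), _sub(p, v1)
    d00, d01, d11 = _dot(e0, e0), _dot(e0, e1), _dot(e1, e1)
    d20, d21 = _dot(e2, e0), _dot(e2, e1)
    denom = d00 * d11 - d01 * d01
    if abs(denom) < 1e-12:
        raise ValueError("degenerate triangle")
    beta = (d11 * d20 - d01 * d21) / denom
    gamma = (d00 * d21 - d01 * d20) / denom
    return 1.0 - beta - gamma, beta, gamma


def inside(weights, eps=1e-9):
    """A point lies in the facet when all three weights are non-negative."""
    return all(w >= -eps for w in weights)


def interpolate_uv(weights, uv1, uv2, uv3):
    """Apply the same weights to the vertex texture coordinates."""
    a, b, g = weights
    return (a * uv1[0] + b * uv2[0] + g * uv3[0],
            a * uv1[1] + b * uv2[1] + g * uv3[1])


v1, v2, v3 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
weights = barycentric((0.25, 0.25, 0.0), v1, v2, v3)
uv = interpolate_uv(weights, (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
outside = barycentric((2.0, 2.0, 0.0), v1, v2, v3)
```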
[0058] Based on the local curvature changes of corresponding body parts in the user's 3D body model, the mapped clothing image is deformed and adjusted. The Gaussian curvature and mean curvature of each point on the body surface are calculated, identifying high-curvature areas such as elbows, knees, and shoulders. In these areas, the local stretching or compression parameters of the clothing texture are dynamically adjusted according to the curvature values to simulate the deformation effect of real fabric on the human body's curved surface. The clothing texture is appropriately compressed in high-curvature areas, while the original texture characteristics are maintained in low-curvature areas.
[0059] For clothing made of elastic materials, the degree of deformation is adjusted according to the material properties; for example, the elongation coefficient of knitwear is greater than that of denim. Physical simulations are used to calculate the stress and deformation state of the clothing, ensuring that different materials produce different fits on the same body parts.
[0060] The adjusted images of individual clothing items are rendered and composited with the user's 3D body model. The rendering process considers factors such as ambient lighting, shadows, and reflections, using Physically Based Rendering (PBR) technology. Different reflectivity, roughness, and metallicity parameters are set according to the clothing material properties. When generating the final virtual try-on image, the parts of the clothing that come into contact with the body are specially processed to ensure a natural transition at the boundary without any obvious discontinuity.
[0061] In one optional implementation, based on the virtual try-on image, a quantitative evaluation result is obtained by calculating the coverage, edge fit, and local deformation rate of the clothing item on the user's three-dimensional body model. The initial matching schemes are then filtered and ranked according to the quantitative evaluation result to obtain recommended matching schemes, including: The actual coverage area of the clothing item on the user's three-dimensional body model is extracted from the virtual try-on image, and the area ratio of the actual coverage area to the theoretical coverage area is calculated to obtain the coverage integrity of the clothing item. The distribution of normal distances between the edge contour lines of the clothing items in the virtual try-on image and the surface contour lines of the corresponding body parts in the user's 3D body model is calculated and statistically analyzed to obtain the edge fit. Deformation analysis is performed on the texture mesh of the clothing item in the virtual try-on image, the deformation ratio of each mesh unit in the texture mesh is calculated, and the deformation ratio is statistically analyzed in different regions of the texture mesh to obtain the local deformation rate. The coverage integrity, edge fit, and local deformation rate are each assigned an evaluation weight and then fused to obtain a quantitative evaluation result of the initial matching scheme. The initial matching schemes whose quantitative evaluation results are higher than the preset screening threshold are retained and arranged in descending order of the quantitative evaluation results to obtain the recommended matching schemes.
[0062] Based on virtual try-on images, three key indicators are calculated: coverage integrity, edge fit, and local deformation rate, to comprehensively evaluate the actual wearing effect of clothing items on the user's body.
[0063] To calculate coverage integrity, the actual coverage area of the clothing item on the user's 3D body model is extracted from the virtual try-on image. Depth image segmentation technology can be used to effectively separate the clothing area from the human body area in the image, obtaining the pixel-level coverage area of the clothing. Simultaneously, based on a preset theoretical coverage area template for the clothing type, the ratio of the actual coverage area to the theoretical coverage area is calculated to obtain the coverage integrity of the clothing item. For example, for a standard T-shirt, the theoretical coverage area should include the chest, shoulders, and upper abdomen. If the actual coverage area is A and the theoretical coverage area is B, then the coverage integrity is A/B. When this ratio is close to 1, it indicates that the clothing coverage is good.
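The ratio A/B can be computed directly from two binary masks, as in the sketch below. The mask representation (2D lists of 0/1 values) and the function name are assumptions, and areas are measured as pixel counts.

```python
def coverage_integrity(actual_mask, theoretical_mask):
    """Return A / B, where A and B are pixel counts of the two binary masks."""
    actual = sum(1 for row in actual_mask for v in row if v)
    theoretical = sum(1 for row in theoretical_mask for v in row if v)
    if theoretical == 0:
        raise ValueError("theoretical coverage area must be non-empty")
    return actual / theoretical


# Hypothetical masks: the garment covers 3 of the 4 pixels it should cover.
actual = [[0, 1, 1],
          [0, 1, 0]]
theoretical = [[0, 1, 1],
               [0, 1, 1]]
ratio = coverage_integrity(actual, theoretical)
```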
[0064] Edge fit assessment focuses on the fit between the garment's edges and the human body's contours. Edge detection algorithms extract the edge contours of individual garments from virtual try-on images, while simultaneously acquiring the surface contours of the corresponding locations on the user's 3D body model. The normal distance between the two contours is sampled and measured, and distance values are calculated at multiple sampling points to form distance distribution data. The edge fit index is obtained by calculating the mean, standard deviation, and maximum deviation of the distance distribution. Taking a shirt collar as an example, multiple distance sample points between the collar edge and the neck contour are collected. A small average distance and a low standard deviation indicate a high collar fit.
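The distance statistics can be sketched as follows. As a simplifying assumption, the normal distance at each garment edge sample is approximated by the distance to the nearest sampled point on the body contour; the source does not state how the three statistics are combined into one index, so the sketch only reports them. The function name and sample points are hypothetical.

```python
import math
import statistics


def edge_fit_statistics(garment_edge, body_contour):
    """Summarize distances from garment edge samples to the body contour.

    Both arguments are lists of (x, y) points. The nearest-point distance is
    used as a stand-in for the normal distance (an approximation).
    """
    distances = [min(math.dist(g, b) for b in body_contour)
                 for g in garment_edge]
    return {
        "mean": statistics.fmean(distances),
        "std": statistics.pstdev(distances),
        "max": max(distances),
    }


body = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
collar = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
stats = edge_fit_statistics(collar, body)
```

A small mean together with a small standard deviation corresponds to the close, even fit described for the collar example.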
[0065] Local deformation rate analysis assesses the deformation of the garment's texture mesh. During virtual try-on, the garment's texture mesh deforms to varying degrees depending on the user's body shape. The garment model is divided into several mesh units, and the geometric characteristics of each unit are compared between its original state and its worn state. The changes in area, angle, or length ratio of each mesh unit are calculated to obtain the local deformation ratio. The deformation rate is weighted and statistically analyzed according to different functional areas of the garment (such as the chest, waist, and shoulders) to form a regional deformation rate assessment. A reasonable deformation rate should be within a certain range; too high a rate indicates the garment is too tight, while too low a rate indicates it is too loose.
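One possible reading of this step is sketched below. It assumes triangular mesh units compared in a two-dimensional plane and uses the relative area change as the deformation ratio; the text equally allows angle or length ratios. The region names, region weights, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def triangle_area(t: Triangle) -> float:
    """Area of a 2D triangle from the cross product of two edge vectors."""
    (x1, y1), (x2, y2), (x3, y3) = t
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def cell_deformation(rest: Triangle, worn: Triangle) -> float:
    """Signed relative area change of one mesh unit (positive means stretched)."""
    rest_area = triangle_area(rest)
    if rest_area == 0:
        raise ValueError("degenerate mesh unit in the original state")
    return triangle_area(worn) / rest_area - 1.0


def regional_deformation(rest_cells: Sequence[Triangle],
                         worn_cells: Sequence[Triangle],
                         regions: Sequence[str]) -> Dict[str, float]:
    """Average the per-unit deformation ratios within each functional region."""
    per_region: Dict[str, List[float]] = defaultdict(list)
    for rest, worn, region in zip(rest_cells, worn_cells, regions):
        per_region[region].append(cell_deformation(rest, worn))
    return {r: sum(v) / len(v) for r, v in per_region.items()}


def local_deformation_rate(regional: Dict[str, float],
                           weights: Dict[str, float]) -> float:
    """Weighted mean of the regional deformation ratios."""
    total = sum(weights[r] for r in regional)
    return sum(weights[r] * d for r, d in regional.items()) / total


unit: Triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
worn_chest: Triangle = ((0.0, 0.0), (1.2, 0.0), (0.0, 1.0))   # stretched by 20 %
worn_waist: Triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.1))   # stretched by 10 %
regional = regional_deformation([unit, unit], [worn_chest, worn_waist],
                                ["chest", "waist"])
rate = local_deformation_rate(regional, {"chest": 0.6, "waist": 0.4})
```

A signed ratio is used so that stretching and slack can be told apart, in line with the remark that rates that are too high or too low both indicate a poor fit.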
[0066] The three indicators are fused by weight allocation, which accounts for the characteristics of different clothing types and wearing scenarios. For example, for close-fitting underwear, the weight of edge fit can be set to 0.4, the weight of coverage integrity to 0.3, and the weight of local deformation rate to 0.3; for outerwear, the weight of coverage integrity can be set to 0.5, the weight of edge fit to 0.3, and the weight of local deformation rate to 0.2. A comprehensive score is obtained by weighted summation and serves as the quantitative evaluation result.
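The weighted summation can be sketched with the example weights given above. The sketch assumes each indicator has already been normalised to the range 0 to 1 with higher values meaning a better fit; that normalisation, the dictionary keys, and the sample indicator values are assumptions made for illustration.

```python
from typing import Dict

# Example weights from the paragraph above, keyed by hypothetical type names.
EVALUATION_WEIGHTS: Dict[str, Dict[str, float]] = {
    "underwear": {"coverage": 0.3, "edge_fit": 0.4, "deformation": 0.3},
    "outerwear": {"coverage": 0.5, "edge_fit": 0.3, "deformation": 0.2},
}


def quantitative_evaluation(indicators: Dict[str, float], clothing_type: str) -> float:
    """Fuse the three normalised indicators by weighted summation."""
    weights = EVALUATION_WEIGHTS[clothing_type]
    return sum(weights[name] * indicators[name] for name in weights)


# Assumed normalised indicator values for one initial matching scheme.
indicators = {"coverage": 0.9, "edge_fit": 0.8, "deformation": 0.7}
underwear_score = quantitative_evaluation(indicators, "underwear")
outerwear_score = quantitative_evaluation(indicators, "outerwear")
```

With these assumed inputs the same indicators yield 0.80 under the underwear weights and 0.83 under the outerwear weights, which shows how the weight profile shifts the result by clothing type.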
[0067] A preset screening threshold is then applied; for example, a scheme may be required to score higher than 0.7 to enter the recommendation list. The initial matching schemes that exceed the threshold are retained and sorted from highest to lowest score to form the final recommended matching schemes. This ordering ensures that users first see the clothing combinations that best suit their body type.
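The screening and ordering step can be sketched as follows, using the example threshold of 0.7. The `ScoredScheme` type, the scheme identifiers, and the scores are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class ScoredScheme(NamedTuple):
    scheme_id: str   # hypothetical identifier of an initial matching scheme
    score: float     # quantitative evaluation result


def recommend(schemes: Sequence[ScoredScheme], threshold: float = 0.7) -> List[ScoredScheme]:
    """Keep schemes scoring strictly above the threshold, highest score first."""
    kept = [s for s in schemes if s.score > threshold]
    return sorted(kept, key=lambda s: s.score, reverse=True)


candidates = [
    ScoredScheme("A", 0.82),
    ScoredScheme("B", 0.65),
    ScoredScheme("C", 0.91),
    ScoredScheme("D", 0.70),
]
recommended = recommend(candidates)
```

Scheme D is dropped in this sketch because the comparison is strict ("greater than 0.7"); an inclusive comparison would be an equally plausible reading of a different threshold rule.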
[0068] This quantitative evaluation method addresses the issues of subjective evaluation and lack of precise measurement in traditional virtual fitting systems. By quantifying the effect of clothing wearing through objective indicators, it improves the practicality and user experience of virtual fitting systems.
[0069] A second aspect of the present invention provides an intelligent clothing matching recommendation and virtual try-on system based on user profiles. The system includes:
a first unit, configured to perform multi-dimensional feature extraction and semantic mapping on user profile data to generate a user representation vector, and to perform feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vector;
a second unit, configured to construct a relationship graph structure between clothing items based on the semantic similarity between the user representation vector and the clothing item representation vectors, and to perform graph traversal and subgraph extraction operations on the relationship graph structure to generate initial matching schemes that satisfy connectivity constraints and compatibility constraints;
a third unit, configured to perform semantic segmentation and body node detection on user body image data, and to construct the user's three-dimensional body model by extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes;
a fourth unit, configured to acquire the two-dimensional image data of each clothing item in the initial matching scheme, perform geometric transformation and texture mapping to map the clothing items to the corresponding body parts of the user's three-dimensional body model, and perform deformation adjustment on the mapped clothing item images to generate a virtual try-on image; and
a fifth unit, configured to calculate, based on the virtual try-on image, the coverage, edge fit, and local deformation rate of the clothing items on the user's three-dimensional body model to obtain a quantitative evaluation result, and to filter and rank the initial matching schemes according to the quantitative evaluation result to obtain recommended matching schemes.
[0070] A third aspect of the present invention provides an electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the aforementioned method.
[0071] A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the aforementioned method.
[0072] The present invention may be embodied as a method, an apparatus, a system, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for performing various aspects of the invention.
[0073] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for intelligent clothing matching recommendation and virtual try-on based on user profiles, characterized in that the method comprises:
performing multi-dimensional feature extraction and semantic mapping on user profile data to generate a user representation vector, and performing feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vector;
constructing a relationship graph structure between clothing items based on the semantic similarity between the user representation vector and the clothing item representation vectors, and performing graph traversal and subgraph extraction operations on the relationship graph structure to generate initial matching schemes that satisfy connectivity constraints and compatibility constraints;
performing semantic segmentation and body node detection on user body image data, and constructing a user's three-dimensional body model by extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes;
acquiring two-dimensional image data of each clothing item in the initial matching scheme, performing geometric transformation and texture mapping to map the clothing items to the corresponding body parts of the user's three-dimensional body model, and performing deformation adjustment on the mapped clothing item images to generate a virtual try-on image; and
calculating, based on the virtual try-on image, the coverage, edge fit, and local deformation rate of the clothing items on the user's three-dimensional body model to obtain a quantitative evaluation result, and filtering and ranking the initial matching schemes according to the quantitative evaluation result to obtain recommended matching schemes.
2. The method according to claim 1, characterized in that performing multi-dimensional feature extraction and semantic mapping on user profile data to generate a user representation vector, and performing feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vector, comprises:
constructing a multi-granularity feature hierarchy for the user profile data, decomposing the user profile data into atomic-level feature units and combined-level feature clusters, and performing feature encoding on each to generate a hierarchically structured set of user sub-representation vectors;
performing cross-level feature interaction operations on the set of user sub-representation vectors, capturing the dependency relationships between atomic-level features and combined-level features by establishing attention weight matrices between different granularity levels, and generating the user representation vector that integrates multi-granularity feature information;
performing nonlinear transformations on the attribute feature data of the clothing items in multiple independent feature subspaces to generate multiple viewpoint representation vectors; and
performing a semantic space alignment operation on the multiple viewpoint representation vectors and the user representation vector, projecting the multiple viewpoint representation vectors onto the semantic space in which the user representation vector is located by constructing a bidirectional mapping relationship between the user representation vector and each viewpoint representation vector, and fusing the projected viewpoint representation vectors under consistency constraints to generate the clothing item representation vector that is in the same semantic space as the user representation vector.
3. The method according to claim 1, characterized in that constructing a relationship graph structure between clothing items based on the semantic similarity between the user representation vector and the clothing item representation vectors, and performing graph traversal and subgraph extraction operations on the relationship graph structure to generate initial matching schemes that satisfy connectivity constraints and compatibility constraints, comprises:
constructing a multi-dimensional similarity metric matrix by calculating the distance metric between the representation vectors of the clothing items;
setting an adaptive threshold segmentation strategy for the similarity metric values of each dimension in the multi-dimensional similarity metric matrix, selecting, based on the adaptive threshold segmentation strategy, the clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension, and transforming the clothing item pairing relationships into edge connections of a graph structure to construct the relationship graph structure between clothing items;
calculating the matching score between the user representation vector and the representation vector of each clothing item in the relationship graph structure, and selecting a preset number of clothing item nodes with the highest matching scores as the starting node set for graph traversal;
performing a depth-first traversal from each starting node in the starting node set, accumulating during the traversal the edge weights between adjacent clothing item nodes on the path, and introducing a penalty factor for the path length to obtain a cumulative compatibility evaluation value;
terminating the traversal when the cumulative compatibility evaluation value is lower than a preset compatibility threshold or the traversal depth exceeds the preset maximum number of matching items, and extracting the traversal path as a candidate matching subgraph; and
performing structural integrity verification on the candidate matching subgraphs, and transforming the candidate matching subgraphs that pass the structural integrity verification into the initial matching schemes.
4. The method according to claim 3, characterized in that setting an adaptive threshold segmentation strategy for the similarity metric values of each dimension in the multi-dimensional similarity metric matrix, and selecting, based on the adaptive threshold segmentation strategy, the clothing item pairing relationships that satisfy similarity constraints in at least one semantic dimension, comprises:
statistically analyzing the frequency distribution of the similarity metric values for each semantic dimension in the multi-dimensional similarity metric matrix, and identifying the clustering pattern of the clothing item representation vectors in each semantic dimension by performing multi-peak detection on the frequency distribution;
determining, based on the peak positions and inter-peak intervals of the clustering pattern, the boundary of the main peak region in the frequency distribution as a strong compatibility threshold, and the transition boundary between the secondary peak region and the main peak region as a weak compatibility threshold, thereby obtaining the adaptive threshold segmentation strategy of the semantic dimension;
traversing the clothing item pairings in the multi-dimensional similarity metric matrix, and comparing, based on the adaptive threshold segmentation strategy, the similarity metric values of each clothing item pairing in each semantic dimension with the strong compatibility threshold and the weak compatibility threshold respectively;
recording, for each semantic dimension, whether each clothing item pairing satisfies the strong compatibility threshold and the weak compatibility threshold, and constructing a cross-dimensional compatibility judgment matrix; and
executing a multi-level filtering strategy based on the cross-dimensional compatibility judgment matrix, wherein the multi-level filtering strategy first retains the clothing item pairings that meet the strong compatibility threshold constraint in at least one semantic dimension, and secondarily retains the clothing item pairings that meet the weak compatibility threshold constraint in multiple semantic dimensions, thereby selecting the clothing item pairing relationships that satisfy the multi-level filtering strategy.
5. The method according to claim 1, characterized in that performing semantic segmentation and body node detection on user body image data, and constructing the user's three-dimensional body model by extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes, comprises:
performing multi-scale feature extraction on the user body image data, and performing semantic segmentation operations at different spatial resolution levels of the multi-scale features to obtain part segmentation results at different resolution levels;
performing boundary refinement on the part segmentation results of the different resolution levels through a cross-level feature fusion mechanism, wherein the boundary refinement fuses and corrects the boundary detail information of the different levels with semantic consistency information to obtain the part segmentation mask of the user's body;
performing body node detection on the user body image data to obtain the skeletal node coordinates of the user;
performing spatial constraint verification on the skeletal node coordinates based on the part segmentation mask to determine whether the skeletal node coordinates are located within the region of the corresponding body part in the part segmentation mask;
when the skeletal node coordinates are outside the region of the corresponding body part, correcting the skeletal node coordinates according to the region boundary and centroid position in the part segmentation mask to obtain corrected skeletal node coordinates; and
establishing the user's three-dimensional body model based on the part segmentation mask and the corrected skeletal node coordinates.
6. The method according to claim 1, characterized in that acquiring two-dimensional image data of each clothing item in the initial matching scheme, performing geometric transformation and texture mapping to map the clothing items to the corresponding body parts of the user's three-dimensional body model, and performing deformation adjustment on the mapped clothing item images to generate a virtual try-on image, comprises:
acquiring the two-dimensional image data of each clothing item in the initial matching scheme, performing foreground segmentation on the two-dimensional image data, extracting the outline boundary of the clothing item, and calculating a perspective transformation matrix and a projection transformation matrix based on the three-dimensional geometric information of the corresponding body parts in the user's three-dimensional body model;
applying the perspective transformation matrix and the projection transformation matrix sequentially to the two-dimensional image data of the clothing item to perform geometric transformation, thereby obtaining a transformed clothing item image that matches the spatial posture of the corresponding body part;
meshing the surface of the corresponding body part in the user's three-dimensional body model and extracting the surface texture coordinates of the corresponding body part;
aligning and matching the outline boundary of the geometrically transformed clothing item with the outline boundary of the surface mesh to establish a mapping relationship between the pixel coordinates of the transformed clothing item image and the surface texture coordinates;
mapping, based on the mapping relationship, the texture information of the transformed clothing item image to the surface of the corresponding body part of the user's three-dimensional body model to obtain the mapped clothing item image; and
stretching or compressing local areas of the mapped clothing item image based on the local curvature changes of the corresponding body parts in the user's three-dimensional body model, and then rendering and compositing the result with the user's three-dimensional body model to generate the virtual try-on image.
7. The method according to claim 1, characterized in that calculating, based on the virtual try-on image, the coverage, edge fit, and local deformation rate of the clothing items on the user's three-dimensional body model to obtain a quantitative evaluation result, and filtering and ranking the initial matching schemes according to the quantitative evaluation result to obtain recommended matching schemes, comprises:
extracting the actual coverage area of the clothing item on the user's three-dimensional body model from the virtual try-on image, and calculating the area ratio of the actual coverage area to the theoretical coverage area to obtain the coverage integrity of the clothing item;
calculating the distribution of normal distances between the edge contour lines of the clothing item in the virtual try-on image and the surface contour lines of the corresponding body parts in the user's three-dimensional body model, and performing statistical analysis on the distribution to obtain the edge fit;
performing deformation analysis on the texture mesh of the clothing item in the virtual try-on image, calculating the deformation ratio of each mesh unit in the texture mesh, and statistically analyzing the deformation ratios by region of the texture mesh to obtain the local deformation rate;
assigning an evaluation weight to each of the coverage integrity, the edge fit, and the local deformation rate, and fusing them to obtain the quantitative evaluation result of the initial matching scheme; and
retaining the initial matching schemes whose quantitative evaluation results are higher than the preset screening threshold, and arranging them in descending order of the quantitative evaluation results to obtain the recommended matching schemes.
8. A user profile-based intelligent clothing matching recommendation and virtual try-on system, used to implement the method according to any one of claims 1 to 7, characterized in that the system comprises:
a first unit, configured to perform multi-dimensional feature extraction and semantic mapping on user profile data to generate a user representation vector, and to perform feature encoding and semantic space mapping on the attribute feature data of clothing items to generate clothing item representation vectors that are in the same semantic space as the user representation vector;
a second unit, configured to construct a relationship graph structure between clothing items based on the semantic similarity between the user representation vector and the clothing item representation vectors, and to perform graph traversal and subgraph extraction operations on the relationship graph structure to generate initial matching schemes that satisfy connectivity constraints and compatibility constraints;
a third unit, configured to perform semantic segmentation and body node detection on user body image data, and to construct the user's three-dimensional body model by extracting the segmentation mask of the user's body parts and the coordinates of the skeletal nodes;
a fourth unit, configured to acquire the two-dimensional image data of each clothing item in the initial matching scheme, perform geometric transformation and texture mapping to map the clothing items to the corresponding body parts of the user's three-dimensional body model, and perform deformation adjustment on the mapped clothing item images to generate a virtual try-on image; and
a fifth unit, configured to calculate, based on the virtual try-on image, the coverage, edge fit, and local deformation rate of the clothing items on the user's three-dimensional body model to obtain a quantitative evaluation result, and to filter and rank the initial matching schemes according to the quantitative evaluation result to obtain recommended matching schemes.
9. An electronic device, characterized in that it comprises: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.