A store relocation analysis method and system

By using multi-objective weighted Bayesian iterative hierarchical grouping and machine learning models, the scientific issues of store grouping and operational goal setting were solved, enabling refined and digital upgrades in store management and improving the accuracy of relocation analysis and operational efficiency.

CN122243565A · Pending · Publication Date: 2026-06-19 · GUANGZHOU TIANCHEN HEALTH TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU TIANCHEN HEALTH TECH CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the current store lifecycle management, the grouping lacks a scientific labeling system and iterative layering logic, the setting of operational goals has not achieved quantitative gain assessment, and the accuracy and generalization ability of the member return prediction model are insufficient, making it difficult to support the refined management needs of large-scale stores.

Method used

Stores are labeled and then grouped with a multi-objective weighted Bayesian iterative hierarchical grouping method. Labels are generated with weighted K-Means clustering, with the k value taken from the elbow plot of WCSS values where the business does not prescribe it. A machine learning model is built to predict member return sales, operational goals are quantified, and operating profit is optimized.

Benefits of technology

It achieves scientific and precise store grouping, improves the accuracy of relocation analysis, quantifies the improvement effect of operational goals, and supports the digital and refined upgrading of store management.

✦ Generated by Eureka AI based on patent content.

Figure CN122243565A_ABST

Abstract

This invention discloses a method and system for analyzing store relocation. The method includes: S1, labeling stores and grouping them using a multi-objective weighted Bayesian iterative hierarchical grouping method, grouping stores with similar profitability conditions into the same group; S2, for each group of stores, setting operational goals for each store within the group based on the median value of its operational indicators, quantifying the effect of process indicator optimization on improving operating profit; S3, constructing a machine learning model based on the characteristic data of historically relocated stores and their member return data to predict the sales revenue from member return after the relocation of existing stores. This invention solves the technical pain points in the grouping, operation, and relocation stages of existing store lifecycle management, realizing the digital, refined, and intelligent upgrade of store management, and providing solid technical support for the large-scale and high-quality development of physical retail enterprises.

Description

Technical Field

[0001] This invention relates to the field of information mining technology, and in particular to a method and system for analyzing store relocation.

Background Technology

[0002] The development of the physical retail industry requires lifecycle management of each store, including site selection, operation, and closure / relocation. With the development of technology, its application has gone through three core development stages: The first stage is the traditional manual management stage, in which the selection of store locations, the setting of operational goals, and the decision to close or relocate stores all rely entirely on the personal experience of managers. Store groups are mostly based on administrative and operational structures, without taking into account the differences in the objective characteristics of the stores themselves. This stage is characterized by low management efficiency, strong decision-making subjectivity, and difficulty in adapting to the needs of large-scale store expansion.

[0003] The second stage is the initial algorithm application stage. With the rise of information mining technology, some companies have begun to apply simple algorithms to the store site selection process, realizing semi-automation of the site selection process and reducing the error of human decision-making to a certain extent. However, the formulation of operational goals after the store opens and the decision to withdraw / relocate still rely on the experience judgment of managers, and a standardized and quantitative technical solution has not been formed.

[0004] The third stage is the intelligent exploration stage, where machine learning technology begins to be applied to the field of store management. Some solutions attempt to group stores through clustering algorithms and predict store operation results through simple models. However, existing solutions still have obvious defects: store grouping lacks a scientific labeling system and iterative hierarchical logic, the formulation of operational goals has not achieved quantitative gain assessment, and the accuracy and generalization ability of the member return prediction model are insufficient, making it difficult to support the refined management needs of large-scale stores.

Summary of the Invention

[0005] This invention provides a technical solution based on data analysis that covers the entire process of "grouping-operation-relocation", which can effectively improve the accuracy of relocation analysis.

[0006] The technical solution of this invention is a store relocation analysis method, comprising the following steps: S1. Label the stores and use a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability conditions into the same group. S2. For each group of stores, based on the median value of its operating indicators, set operating targets for each store in the group, and quantify the effect of process indicator optimization on improving operating profit. S3. Based on the feature data of historically closed stores and their member return data, a machine learning model is built to predict the sales revenue of existing stores if they are relocated.

[0007] Furthermore, step S1 specifically includes: S11. Process the raw data to obtain the labeled store features. S12. Preset the target indicators and information weights for grouping, and divide the labeled store features into necessary labels and unnecessary labels; the unnecessary labels comprise non-exempt labels and exempt labels. S13. Take the Cartesian product of all necessary labels, directly output the groups whose sample size is less than a set threshold, and use the groups whose sample size is greater than the set threshold as the current grouping path nodes. S14. For the current grouping path node, exclude any non-exempt label that would produce a group with a sample size less than the set threshold, calculate the mutual information value between each remaining non-exempt label and the target indicators, and group further by the label with the largest mutual information value, provided that value is greater than 0. S15. For the current grouping path node, exclude any exempt label that still produces a small-sample group after its small-sample categories have been merged into other categories, calculate the mutual information value of the remaining exempt labels, and group by the best one. Grouping ends when the current grouping path node has no usable exempt label.

[0008] Furthermore, the tagging described in step S11 includes three methods: For feature data with clear business definitions or business classification thresholds, labels are generated based on the business definitions or classification thresholds, and the labels are discrete categorical variables. For feature data that has no clear business definition or business classification threshold but has a requirement for the number of business categories, the preset number of business categories is used as the k value, and the relevant feature data is weighted K-Means clustered for labeling. For feature data without business requirements, WCSS is used to identify the optimal inflection point in the elbow plot as the k value for weighted K-Means clustering for labeling.

[0009] Furthermore, the weighted K-Means clustering steps include: Assign a weight to each feature variable participating in clustering based on business experience; each weight lies in [0,1], and the weights of all feature variables sum to 1. Apply Min-Max standardization to each feature variable so that it is mapped to the preset interval [0,1], then multiply the standardized value by the corresponding weight to obtain the processed store features. Based on a preset k value, randomly select k coordinates from the processed store features as the initial cluster centers. Calculate the Euclidean distance from each processed store feature to each cluster center, and assign it to the cluster whose center is nearest. Calculate the mean of the processed store features in each cluster and use it as the new cluster center. Repeat the assignment and update steps until the change in the position of the cluster centers is less than the preset convergence threshold, and use the final cluster number as the label category number.

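[0010] Furthermore, the mutual information value $I(X;Y)$ mentioned in step S14 is calculated as:

$$I(X;Y) = \sum_{i=1}^{k} p(x_i)\, D_{KL}\big(f(y \mid x_i)\,\|\,f(y)\big)$$

[0011] where $X$ is the store label, with probability distribution $p(x_i)$, $i = 1, \dots, k$, and $\sum_{i=1}^{k} p(x_i) = 1$; $Y$ is the target indicator, with probability density function $f(y)$; the conditional probability density function is $f(y \mid x_i)$; and $D_{KL}\big(f(y \mid x_i)\,\|\,f(y)\big)$ is the KL divergence between the conditional distribution $f(y \mid x_i)$ and the marginal distribution $f(y)$.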

[0012] Furthermore, step S2 specifically includes: S21. Decompose the North Star indicator of store profitability into process indicators that have a direct impact on the North Star indicator. S22. Quantify the gains of each process indicator optimization on the North Star indicator; S23. Use the median value of each process indicator of stores in the same group as the corresponding operational target for each store in the group, and identify opportunities for improvement in store operations and the extent of improvement. S24. Based on the gain prediction coefficient and the improvement of process indicators, calculate the increase in operating profit for the store after the target is achieved, and output the store operation target and profit improvement prediction results.
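Steps S23 and S24 can be illustrated with a minimal sketch. It assumes a linear gain model (profit uplift = gain coefficient × improvement) and assumes that a higher value is better for every process indicator; the function name, the indicator names, and all numbers are hypothetical and do not come from the patent.

```python
from statistics import median

def set_targets_and_uplift(group, gain_coefficients):
    """For one store group, use the group median of each process indicator
    as the target (S23) and estimate the profit uplift linearly (S24).

    group: {store_id: {indicator: value}}
    gain_coefficients: {indicator: profit gain per unit of improvement}
        (hypothetical; the text only says the gain is quantified in S22)
    """
    indicators = list(gain_coefficients)
    targets = {m: median(s[m] for s in group.values()) for m in indicators}
    report = {}
    for store_id, metrics in group.items():
        # Room for improvement: only indicators below the group median count.
        gaps = {m: max(0.0, targets[m] - metrics[m]) for m in indicators}
        uplift = sum(gain_coefficients[m] * gaps[m] for m in indicators)
        report[store_id] = {"gaps": gaps, "profit_uplift": uplift}
    return targets, report

# Hypothetical group of three stores with two process indicators.
group = {
    "A": {"orders": 100, "avg_price": 30.0},
    "B": {"orders": 140, "avg_price": 28.0},
    "C": {"orders": 120, "avg_price": 35.0},
}
targets, report = set_targets_and_uplift(
    group, {"orders": 25.0, "avg_price": 110.0}
)
```

In this toy group, store A is below the median on order volume only, store B on average price only, and store C already meets both targets, so its estimated uplift is zero.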

[0013] Furthermore, step S3 specifically includes: S31. Obtain data on historically closed stores, and clean and standardize the obtained data. S32. Calculate the sales contribution $r_i$ of returning members for each historically closed store $i$. The calculation formula is:

$$r_i = \frac{\mathrm{Sales}_i(A_i)}{\mathrm{Sales}_i(B_i)}$$

[0014] where $A_i$ is the set of members of closed store $i$ who made purchases in the quarter before the closure and who returned within the quarter of the closure; $B_i$ is the set of members of closed store $i$ who made purchases in the quarter before the closure, with $A_i \subseteq B_i$; and $\mathrm{Sales}_i(A_i)$ and $\mathrm{Sales}_i(B_i)$ are both sales generated at closed store $i$. S33. Using the XGBoost algorithm, with the store feature data as features and $r_i$ as the target variable, construct, train, and evaluate a regression prediction model to obtain the coefficient of determination $R^2$ and the adjusted coefficient of determination $R^2_{adj}$. S34. When the difference between the model's coefficient of determination $R^2$ and its adjusted coefficient of determination $R^2_{adj}$ is greater than the preset threshold, retrain the model after feature filtering to obtain the final member return prediction model. S35. Use the final member return prediction model to predict the $r$ value of each existing store, multiply it by the corresponding store's sales revenue to estimate the potential return sales revenue after the store is relocated, and output the estimated return sales revenue.

[0015] Further, in step S34, the feature selection includes: Calculate the variance of each feature variable and discard features whose variance is close to 0. Calculate the correlation coefficient between each feature variable and the target variable, and discard features whose correlation coefficient is close to 0. Calculate the correlation coefficients between the feature variables and discard features with a correlation coefficient greater than 0.8. Select the core features based on the feature importance index embedded in the XGBoost model.
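The first three filters can be sketched in plain Python. This is an illustration under stated assumptions, not the patented procedure: the cut-offs `var_eps` and `corr_eps` for "close to 0" are invented, and the text does not say which member of a highly correlated pair is discarded, so the sketch keeps the one more correlated with the target. The importance-based step is omitted because it needs a trained XGBoost model.

```python
from statistics import mean, pvariance

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def filter_features(features, target, var_eps=1e-6, corr_eps=0.05, pair_max=0.8):
    """features: {name: list of values}; target: list of values.
    var_eps and corr_eps are assumed cut-offs for 'close to 0'."""
    # 1. Drop near-constant features.
    kept = [n for n, v in features.items() if pvariance(v) > var_eps]
    # 2. Drop features almost uncorrelated with the target.
    kept = [n for n in kept if abs(pearson(features[n], target)) > corr_eps]
    # 3. Visit features from most to least target-correlated and skip any
    #    feature that correlates above pair_max with one already selected.
    kept.sort(key=lambda n: -abs(pearson(features[n], target)))
    selected = []
    for n in kept:
        if all(abs(pearson(features[n], features[m])) <= pair_max
               for m in selected):
            selected.append(n)
    return selected

# Hypothetical closed-store features and return-contribution targets.
features = {
    "area": [50, 60, 70, 80, 90, 100],
    "area_sqft": [538, 646, 753, 861, 969, 1076],  # near-duplicate of area
    "has_sign": [1, 1, 1, 1, 1, 1],                # constant
    "age_years": [3, 1, 4, 1, 5, 9],
}
target = [0.20, 0.24, 0.29, 0.31, 0.36, 0.41]
selected = filter_features(features, target)
```

Here the constant feature is dropped by the variance filter, and only one of the two near-duplicate area features survives the pairwise filter.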

[0016] Furthermore, in step S34, the methods for model training and evaluation include: The sample set is randomly divided into training and test sets in an 8:2 ratio, and 20% of the samples in the training set are held out as a validation set for early-stopping monitoring to prevent overfitting. The basic parameters and the hyperparameter search range for random search are configured. The mean squared error is used as the evaluation index during training to optimize the hyperparameters, and the hyperparameter combination with the smallest mean squared error is selected as the optimal combination. The mean squared error measures the mean squared deviation between predicted and actual values. The coefficient of determination $R^2$ reflects the model's ability to explain variation in the sales contribution. The adjusted coefficient of determination $R^2_{adj}$ corrects $R^2$ to penalize redundant features in the model, so that the fit of the multi-feature regression model is evaluated more objectively.
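The split and the three evaluation metrics can be written out in a few lines. This sketch uses the standard definitions of MSE, $R^2$ and adjusted $R^2$; the function names, the fixed seed, and the sample predictions are assumptions for illustration, and no XGBoost model is trained here.

```python
import random

def split_80_20(samples, seed=0):
    """Random 8:2 train/test split, then 20% of the training part as validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * 0.2)
    test, train_full = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(train_full) * 0.2)
    return train_full[n_val:], train_full[:n_val], test

def mse(y_true, y_pred):
    """Mean squared deviation between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(y_true, y_pred, n_features):
    """R^2 penalized for the number of features p: 1 - (1-R^2)(n-1)/(n-p-1)."""
    n = len(y_true)
    return 1 - (1 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)

train, val, test = split_80_20(list(range(100)))
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
# The gap below is what S34 compares against its preset threshold.
gap = r2(y_true, y_pred) - adjusted_r2(y_true, y_pred, n_features=2)
```

A large gap between the two coefficients signals that the feature count, rather than genuine explanatory power, is inflating $R^2$, which is the condition under which the text retrains after feature filtering.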

[0017] This invention also provides a store relocation analysis system, including a store grouping unit, an operational target customization unit, and a prediction unit, wherein: The store grouping unit is used to label stores. It uses a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability into the same group. The operational target customization unit is used to set operational targets for each group of stores based on the median value of their operational indicators, and to quantify the effect of process indicator optimization on improving operating profit. The prediction unit is used to build a machine learning model based on the feature data of historically closed stores and their member return data, in order to predict the sales revenue of existing stores if they are relocated.

[0018] In practical applications, the modules described in the methods and systems disclosed in this invention can be deployed on a single target server, or each module can be deployed independently on different target servers. In particular, as needed, to provide more powerful computing capabilities, the modules can also be deployed on a cluster of target servers.

[0019] Therefore, this invention has constructed a scientific, complete, and practical store relocation analysis technology system, which solves the technical pain points in the existing store lifecycle management of grouping, operation, and relocation, and realizes the digital, refined, and intelligent upgrade of store management, providing solid technical support for the large-scale and high-quality development of physical retail enterprises.

[0020] To provide a clearer and more comprehensive understanding of the present invention, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without any creative effort.

[0022] Figure 1 is a schematic diagram of the store relocation analysis method according to an embodiment of the present invention.

Detailed Implementation

[0023] Referring to Figure 1, to address the shortcomings of existing technologies, this invention proposes a store relocation analysis method comprising the following steps: S1. Label the stores and use a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability conditions into the same group. S2. For each group of stores, based on the median value of its operating indicators, set operating targets for each store in the group, and quantify the effect of process indicator optimization on improving operating profit. S3. Based on the feature data of historically closed stores and their member return data, a machine learning model is built to predict the sales revenue of existing stores if they are relocated.

[0024] The technical solution of this application will be described below with reference to various preferred embodiments.

[0025] S1. Stores are labeled and grouped using a multi-objective weighted Bayesian iterative hierarchical grouping method. Stores with similar profitability are grouped together, providing a data foundation for subsequent operational target analysis and formulation. This includes: S11. Process the raw data to obtain labeled store features. The raw data comprise data collected from offline data collection terminals and data accumulated from expert experience. The collected data include population data (population profile data such as number of residents, age, and gender), POI data (business environment data such as number of competitors and market size), and POS transaction data (store operation data such as order volume, average number of items per order, average price per item, sales revenue, and gross profit).

[0026] The data mentioned above was obtained through offline data collection terminals in chain pharmacies and third-party data platforms to ensure the authenticity and completeness of the data.

[0027] In addition, this application also incorporates data accumulated by experts based on their experience. These data, determined by business experts, include objective factors that cannot be changed by operational methods, such as: customer flow profiles (age, gender), market size (number of residents in the surrounding area), industry competition (number of pharmacies in the surrounding area), medical insurance qualifications, store age, market share, etc. These data also have a significant impact on the analysis of this application.

[0028] Based on the data collected by the acquisition terminal and the data accumulated by experts, this application embodiment uses three labeling methods to label these objective factors.

[0029] 1. Tagging feature data with clear business definitions or business classification thresholds: This includes "medical insurance qualification labels" and "store age group labels," which are directly defined using business definitions. Medical insurance qualification label: a discrete categorical variable, divided into "with medical insurance qualification" (e.g., 45 stores) and "without medical insurance qualification" (e.g., 55 stores). Store age group label: discrete categorical variable, divided into "stores that have been open for less than one year" (e.g., 28 stores) and "stores that have been open for more than one year" (e.g., 72 stores).

[0030] 2. Labeling feature data that lacks a clear business definition but has a required number of business categories: Taking the "market share type label" as an example, the business requirement is to divide market share types into 4 categories (i.e., preset k=4), and weighted K-Means clustering is used for labeling. The specific steps include: ① Feature weighting: The feature variables participating in clustering are "store share (F1), store density (F2), and surrounding resident population (F3)". Based on the experience of business experts, the number of residents has the greatest impact on market share, followed by the store share, and store density has the least impact; weights are assigned accordingly. Each weight lies in [0,1], and the weights sum to 1. The weight allocation has been reviewed and confirmed as reasonable by three business experts.

[0031] ② Standardization of feature variables: For example, Min-Max standardization is applied to F1, F2, and F3 for the 100 stores. Min-Max standardization maps each feature variable to the [0,1] interval to eliminate the influence of units. The standardization formula is:

$$\mathrm{std\_F}_{ij} = \frac{F_{ij} - \min(F_j)}{\max(F_j) - \min(F_j)}$$

[0032] where $i$ denotes the $i$-th store ($i = 1, \dots, 100$) and $j$ the $j$-th feature ($j = 1, 2, 3$); $\mathrm{std\_F}_{ij}$ is the Min-Max standardized value of the $j$-th feature of the $i$-th store; $F_{ij}$ is the original value of the $j$-th feature of the $i$-th store; $\min(F_j)$ is the minimum value of the $j$-th feature across all 100 stores; and $\max(F_j)$ is the maximum value of the $j$-th feature across all 100 stores.

[0033] After standardization, each value is multiplied by the corresponding weight to obtain the processed store features [w1×std_F1, w2×std_F2, w3×std_F3], which are used for subsequent clustering. For example, if a store has an original feature value of 0.2, with min()=0.05 and max()=0.3 for that feature and a weight of 0.4, then the standardized value is (0.2-0.05) / (0.3-0.05)=0.6 and the weighted value is 0.4×0.6=0.24. The other features are calculated in the same way to obtain the processed feature coordinates of the store.

[0034] ③ Initial cluster center selection: Based on the preset k=4, four coordinates are randomly selected from the processed 100 store feature coordinates as initial cluster centers, which are denoted as C1, C2, C3 and C4 respectively.

[0035] ④ Assign store features to clusters: Calculate the Euclidean distance from the coordinates of each processed store feature to the four initial cluster centers. The Euclidean distance formula is:

$$d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$

[0036] Where (x1, x2, x3) are the processed store feature coordinates, and (y1, y2, y3) are the cluster center coordinates. The store is assigned to the cluster containing the nearest cluster center. For example, if a store's feature coordinates are 0.12 from C1, 0.18 from C2, 0.25 from C3, and 0.21 from C4, then the store is assigned to the cluster containing C1.

[0037] ⑤ Cluster Center Update: Calculate the mean of the processed feature coordinates of all stores in each cluster, and use this mean as the new cluster center. For example, if the cluster containing C1 has 25 stores, calculate the mean of the processed feature coordinates of these 25 stores ((Σw1×std_F1) / 25, (Σw2×std_F2) / 25, (Σw3×std_F3) / 25), and use this mean as the new C1.

[0038] ⑥ Iteration Termination: Repeat steps ④ and ⑤ until the change in the position of the cluster center is less than the preset convergence threshold (e.g., set to 0.001), that is, in two adjacent iterations, the change in the coordinates of the four cluster centers is less than 0.001, then stop the iteration.

[0039] Four clusters were ultimately obtained, numbered 0, 1, 2, and 3, corresponding to the "market share type label" [market share type 0, market share type 1, market share type 2, market share type 3]. Among them, market share type 0 is high market share stores (22 stores), market share type 1 is medium-high market share stores (28 stores), market share type 2 is medium-low market share stores (25 stores), and market share type 3 is low market share stores (25 stores).
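Steps ① to ⑥ above can be condensed into a short, dependency-free sketch. It is an illustration rather than the patented implementation: the function name, the fixed random seed, the `max_iter` guard, the handling of empty clusters, and the sample rows and weights are all assumptions. The default tolerance of 0.001 follows the example convergence threshold in the text.

```python
import math
import random

def weighted_kmeans(rows, weights, k, tol=1e-3, max_iter=100, seed=0):
    """rows: list of equal-length feature lists; weights: one weight per feature.
    Returns (labels, centers, wcss) computed on the weighted, scaled features."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Step 2: Min-Max scale each feature to [0, 1], then multiply by its weight.
    pts = [
        [w * ((v - l) / (h - l) if h > l else 0.0)
         for v, l, h, w in zip(row, lo, hi, weights)]
        for row in rows
    ]
    rng = random.Random(seed)
    centers = [p[:] for p in rng.sample(pts, k)]  # step 3: random initial centers
    for _ in range(max_iter):
        # Step 4: assign each store to the nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in pts]
        # Step 5: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append([sum(x) / len(members) for x in zip(*members)]
                               if members else centers[c])
        # Step 6: stop once no center moves by tol or more.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    wcss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(pts, labels))
    return labels, centers, wcss

# Hypothetical stores: (store share, store density, resident population).
rows = [[0.10, 1, 100], [0.12, 1, 110], [0.11, 2, 105],
        [0.90, 9, 900], [0.88, 8, 910], [0.92, 9, 905]]
labels, centers, wcss = weighted_kmeans(rows, weights=[0.35, 0.25, 0.40], k=2)
```

Because the weights are applied before the distance is computed, a feature with a larger weight spans a longer axis in the scaled space and therefore has more influence on which center is nearest.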

[0040] 3. Labeling feature data without business requirements: Taking the "population profile label" as an example, where there is no explicit requirement on the number of business categories, WCSS (the inertia value, i.e., the sum of squared distances from every sample in a K-Means cluster to its cluster center) is used to identify the optimal inflection point in the elbow plot as the k value for weighted K-Means clustering. The specific steps are as follows: ① The feature variables participating in the clustering are "number of residents in the surrounding area (F1), age distribution (F2), and gender ratio (F3)", which are assigned weights [w1=3, w2=1, w3=1] (for example, the number of residents is considered the key factor driving differences in store sales). After standardization, the processed store features are obtained. ② Set k=2, 3, 4, 5, 6 in turn, perform weighted K-Means clustering, calculate the WCSS value corresponding to each k value, and draw an elbow plot (horizontal axis: k value; vertical axis: WCSS value). ③ Identify the optimal inflection point in the elbow plot, that is, the turning point at which the decline in the WCSS value changes from steep to gentle. In this embodiment, the optimal inflection point corresponds to k=3, so k=3 is used for clustering. ④ Perform weighted K-Means clustering (the steps are the same as for the market share type label above) to obtain 3 clusters, corresponding to the "population profile label" [population type 0, population type 1, population type 2], which represent young residents of high-density areas, middle-aged residents of medium-density areas, and elderly residents of low-density areas, respectively.
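Reading the inflection point off an elbow plot can be automated with a simple heuristic. The sketch below assumes the elbow is the k with the largest discrete second difference of the WCSS curve; this rule, the function name, and the WCSS values are illustrative assumptions, since the text only states that the inflection point is identified from the plot.

```python
def pick_elbow(wcss_by_k):
    """Return the k where the WCSS curve bends most sharply.

    Uses the discrete second difference: (drop before k) - (drop after k).
    The first and last k cannot be chosen because they lack a neighbor.
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = ((wcss_by_k[prev_k] - wcss_by_k[k])
                - (wcss_by_k[k] - wcss_by_k[next_k]))
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical WCSS values for k = 2..6 (not taken from the patent).
wcss_by_k = {2: 9.8, 3: 4.1, 4: 3.5, 5: 3.1, 6: 2.9}
best_k = pick_elbow(wcss_by_k)
```

With these made-up values the drop from k=2 to k=3 is large and every later drop is small, so the heuristic selects k=3, the same choice the embodiment describes.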

[0041] Through the three labeling methods, the objective characteristics of stores are standardized and discretized, transforming abstract objective factors into categorical labels that can be used for grouping. Weighted K-Means clustering assigns weights based on business experience so that the clustering results meet business needs. Using WCSS values and elbow plots selects the k value for labels without business requirements, which avoids the subjectivity of setting k manually and provides accurate feature support for subsequent store grouping.

[0042] S12. Preset the target indicators and information weights for grouping, and distinguish the tagged store characteristics into necessary labels and unnecessary labels, among which unnecessary labels are further divided into non-exempt labels and exempt labels.

[0043] Based on the business objectives of chain pharmacies, the preset target indicators for grouping are "net profit (Y1) and sales (Y2)", with information weights of 0.6 and 0.4 respectively, to ensure that the grouping strategy can "separate" the target indicators. That is, stores in the same group have similar net profit and sales performance, while stores in different groups differ significantly.

[0044] Necessary labels are those whose categories make stores incomparable with one another, such as the "medical insurance qualification label": pharmacies with medical insurance qualifications can sell prescription drugs, so their business scope and sales volume differ greatly from those of pharmacies without the qualification.

[0045] Unnecessary labels are those whose categories do not make stores fundamentally different, so stores may still be comparable when the label is combined with other labels. These include the "store age group label", the "market share type label", and the "population profile label", and they are further divided as follows: Non-exempt labels: applied under a strict rule. After a non-exempt label is used, none of the resulting groups may have a sample size less than the set threshold (in this embodiment, 10 stores); otherwise, the label is not used for grouping. In this embodiment, the non-exempt labels are the "market share type label" and the "store age group label".

[0046] Exempt labels: applied under a looser rule and with the lowest priority. They are considered only after grouping by necessary labels and non-exempt labels. If grouping by an exempt label produces a group with a sample size less than the set threshold, the label category of that small-sample group is merged into another category. If the sample size is still less than the set threshold after merging, the label is not used for grouping. In this embodiment, the exempt label is the "population profile label".

[0047] By setting target indicators and information weights, the business orientation of the grouping is made explicit, ensuring that the grouping results can support operational decisions. The finer classification of labels (necessary / unnecessary, non-exempt / exempt) avoids invalid grouping caused by unsuitable labels. At the same time, the threshold ensures that every group has a sample size large enough to be statistically meaningful, laying the foundation for subsequent hierarchical grouping.
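The settings chosen in S12 can be captured in a small configuration object. This is only one possible representation: the class name, field names, and validation rules are assumptions (in particular, the text gives information weights of 0.6 and 0.4 but does not state that they must sum to 1).

```python
from dataclasses import dataclass

@dataclass
class GroupingConfig:
    """Hypothetical container for the S12 grouping settings."""
    target_weights: dict          # target indicator -> information weight
    necessary: list               # labels that make stores incomparable
    non_exempt: list              # strict rule: no small-sample groups allowed
    exempt: list                  # loose rule: small categories may be merged
    min_group_size: int = 10      # sample-size threshold from the embodiment

    def __post_init__(self):
        # Assumed sanity checks, not requirements stated in the text.
        if abs(sum(self.target_weights.values()) - 1.0) > 1e-9:
            raise ValueError("information weights are expected to sum to 1")
        names = self.necessary + self.non_exempt + self.exempt
        if len(names) != len(set(names)):
            raise ValueError("each label must belong to exactly one class")

config = GroupingConfig(
    target_weights={"net_profit": 0.6, "sales": 0.4},
    necessary=["medical_insurance"],
    non_exempt=["market_share_type", "store_age_group"],
    exempt=["population_profile"],
)
```

Keeping the three label classes in separate lists mirrors the order in which the grouping procedure consumes them: necessary labels first, then non-exempt labels, then exempt labels.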

[0048] S13. Take the Cartesian product of all necessary labels, directly output the groups whose sample size is less than a set threshold, and use the groups whose sample size is greater than the set threshold as the current grouping path nodes.

[0049] The Cartesian product is taken over all necessary labels (here, only the medical insurance qualification label). Since the medical insurance qualification label has two categories (with medical insurance qualification, without medical insurance qualification), two groups are created: Group 1: Stores with medical insurance qualifications, 45 stores in total. The sample size is greater than the set threshold (10 stores), so this group becomes a current grouping path node and proceeds to the next step. Group 2: Stores without medical insurance qualifications, 55 stores in total. The sample size is greater than the set threshold (10 stores), so this group also becomes a current grouping path node and proceeds to the next step. Both groups obtained from the Cartesian product of the necessary labels therefore exceed the set threshold, and there are no small-sample groups that need to be output directly.

[0050] Taking the Cartesian product of the necessary labels first separates stores that are completely incomparable and prevents stores with different necessary-label categories from being mixed into one group, which keeps the grouping reasonable. At the same time, the sample size is screened so that small-sample groups are set aside, ensuring the effectiveness of subsequent grouping.
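Step S13 amounts to grouping stores by the tuple of their necessary-label values and splitting the result by sample size. The sketch below is a minimal illustration: the function and field names are hypothetical, only label combinations that actually occur are materialized, and a group whose size equals the threshold is treated as a path node, a boundary case the text leaves open.

```python
from collections import defaultdict

def split_by_necessary_labels(stores, necessary, threshold=10):
    """Group stores by every observed combination of necessary-label values.

    stores: list of dicts holding label values; necessary: list of label names.
    Returns (path_nodes, small_groups): groups with at least `threshold`
    stores continue down the grouping path, the rest are output directly.
    """
    groups = defaultdict(list)
    for store in stores:
        key = tuple(store[name] for name in necessary)
        groups[key].append(store)
    path_nodes = {k: v for k, v in groups.items() if len(v) >= threshold}
    small_groups = {k: v for k, v in groups.items() if len(v) < threshold}
    return path_nodes, small_groups

# 45 stores with and 55 stores without medical insurance qualification,
# matching the counts used in the embodiment.
stores = ([{"id": i, "insured": True} for i in range(45)]
          + [{"id": i, "insured": False} for i in range(45, 100)])
path_nodes, small_groups = split_by_necessary_labels(stores, ["insured"])
```

With a single necessary label the product has just two cells; with several necessary labels the same function would key each group by a longer tuple.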

[0051] S14. For the current grouping path node, exclude any non-exempt label that would produce a group with a sample size less than the set threshold, calculate the mutual information value between each remaining non-exempt label and the target indicators, and group further by the label with the largest mutual information value, provided that value is greater than 0.

[0052] Taking Group 1 (45 stores with medical insurance qualifications) as an example, the steps for grouping by non-exempt labels are as follows: ① Exclude invalid labels: The current non-exempt labels are the "market share type label" and the "store age group label". Determine whether using either label would create a group with a sample size of fewer than 10 stores. Market share type label: There are 4 categories, distributed in Group 1 as follows: market share type 0 (8 stores), market share type 1 (15 stores), market share type 2 (12 stores), and market share type 3 (10 stores). The sample size of market share type 0 is 8 stores, which is less than the set threshold of 10 stores, so the market share type label is excluded. Store age group label: There are 2 categories, distributed in Group 1 as follows: open for less than one year (10 stores) and open for more than one year (35 stores). Neither category has a sample size below the set threshold of 10 stores, so the store age group label is retained.

[0053] ② Calculate the mutual information value: The only remaining non-exempt label is the store age group label. Calculate the mutual information value between this label and the preset target indicators (net profit Y1, sales Y2). The formula for calculating mutual information value is:

[0054] in, For store tags, the corresponding probability distribution is: ,and , For the target index, the corresponding probability density function is: The conditional probability density function is , It is a conditional distribution With marginal distribution The KL divergence.

[0055] $X$: the store label (a discrete categorical variable); in this embodiment, $X$ is the store age group label. $x_i$: the i-th category of the label, i.e. $x_1$ = "open for less than one year" and $x_2$ = "open for more than one year", with $k = 2$ (the number of label categories). $p(x_i)$: the probability that label $X$ takes the i-th category, i.e. $p(x_1) = 10/45 \approx 0.222$, $p(x_2) = 35/45 \approx 0.778$, and $p(x_1) + p(x_2) = 1$. $Y$: the target indicator (a continuous variable); in this embodiment, the mutual information with respect to net profit Y1 is calculated first and is then combined with the information weights to calculate the comprehensive mutual information. $f(y)$: the probability density function of the target indicator Y, i.e. the overall distribution density of net profit. $f(y \mid x_i)$: the conditional probability density function, i.e. the distribution density of target indicator Y when label X takes the i-th category; in this embodiment, these are the net profit distribution density of stores open for less than one year and that of stores open for more than one year. The Kullback-Leibler divergence (KL divergence) measures the difference between the conditional distribution $f(y \mid x_i)$ and the marginal distribution $f(y)$. A larger KL divergence indicates that the label category discriminates the target indicator more strongly. The formula for calculating the KL divergence is:

$$D_{KL}\big(f(y \mid x_i)\,\|\,f(y)\big) = \int f(y \mid x_i)\,\log\frac{f(y \mid x_i)}{f(y)}\,dy$$

[0056] where $f(y \mid x_i)$ is the conditional distribution and $f(y)$ is the marginal distribution.

[0057] The mutual information value $I(X;Y)$ measures the degree of dependence between store label X and target indicator Y. The higher the value, the better the label discriminates the target indicator, and the greater the between-group differences in the target indicator after grouping by that label.
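As a non-limiting illustration, the mutual information between a categorical label and a continuous target can be estimated by replacing the densities with histograms, so that each KL divergence becomes a finite sum. The histogram estimator, the number of bins, and the function name `mutual_information` are assumptions of this sketch; the embodiment does not state how the densities are estimated. The result is in nats because the natural logarithm is used.

```python
import math
from collections import Counter


def mutual_information(labels, values, n_bins=4):
    """Estimate I(X;Y) = sum_i p(x_i) * KL(f(y|x_i) || f(y)) with equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant target

    def bin_of(v):
        return min(int((v - lo) / width), n_bins - 1)

    bins = [bin_of(v) for v in values]
    n = len(values)
    marginal = Counter(bins)  # histogram standing in for f(y)
    mi = 0.0
    for x, n_x in Counter(labels).items():
        # histogram standing in for f(y | x_i)
        cond = Counter(b for lab, b in zip(labels, bins) if lab == x)
        kl = sum(
            (c / n_x) * math.log((c / n_x) / (marginal[b] / n))
            for b, c in cond.items()
        )
        mi += (n_x / n) * kl  # weight each KL term by p(x_i)
    return mi
```

A label that perfectly separates two equally sized groups yields log 2, and a label unrelated to the target yields 0, which is consistent with the statement that a larger value means stronger discrimination.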

[0058] In this embodiment, the calculated mutual information value between the store age group label and net profit Y1 is I(X;Y1) = 0.32, and the mutual information value with sales Y2 is I(X;Y2) = 0.28. Combining the information weights (Y1 weight 0.6, Y2 weight 0.4), the comprehensive mutual information value = 0.32×0.6 + 0.28×0.4 = 0.304. Since the comprehensive mutual information value is greater than 0, Group 1 is further grouped by the store age group label.
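As a non-limiting illustration, the weighting of the per-target mutual information values and the selection of the grouping label can be sketched as follows. The function names and dictionary keys are assumptions of this sketch; the numeric values are those of the embodiment.

```python
def comprehensive_mi(mi_by_target, weights):
    """Weighted sum of the per-target mutual information values."""
    return sum(weights[t] * mi for t, mi in mi_by_target.items())


def best_label(mi_by_label, weights):
    """Return (label, score) for the largest comprehensive MI, or None if no score > 0."""
    scored = {lab: comprehensive_mi(mi, weights) for lab, mi in mi_by_label.items()}
    if not scored:
        return None
    label = max(scored, key=scored.get)
    return (label, scored[label]) if scored[label] > 0 else None


weights = {"net_profit": 0.6, "sales": 0.4}  # information weights for Y1 and Y2
candidates = {"store_age_group": {"net_profit": 0.32, "sales": 0.28}}
choice = best_label(candidates, weights)
```

With the values of the embodiment, `choice` names the store age group label with a comprehensive mutual information value of approximately 0.304.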

[0059] ③ Grouping results: Group 1 (stores with medical insurance qualifications) is divided into 2 subgroups based on the store age group label: Subgroup 1-1: 10 stores with medical insurance qualifications that have been open for less than one year; Subgroup 1-2: 35 stores with medical insurance qualifications that have been open for more than one year. Since neither subgroup falls below the set threshold, proceed to the next step and attempt further grouping using the exempt labels among the non-essential labels.

[0060] Similarly, the above steps were performed on Group 2 (55 stores without medical insurance qualifications). After excluding invalid non-exempt labels, the group was divided by the store age group label into two subgroups: Subgroup 2-1: 18 stores without medical insurance qualifications that have been open for less than one year; Subgroup 2-2: 37 stores without medical insurance qualifications that have been open for more than one year.

[0061] By excluding invalid non-exempt labels, the generation of small sample groups was avoided; the application of mutual information value calculation ensured that the labels used could distinguish the target indicators to the greatest extent, thereby maximizing the differences in net profit and sales between groups; the hierarchical grouping method gradually refined the grouping results, making the profitability conditions of stores in the same subgroup more similar, and further improving the scientific nature of the grouping.

[0062] S15. For the current grouping path node, merge the small-sample categories of each exempt label into an "other" category, and exclude any exempt label that would still produce a small-sample group after this merging. Calculate the mutual information of the remaining exempt labels and group by the best one. When no usable exempt label remains at the current grouping node, the grouping of that path ends.

[0063] Taking subgroup 1-2 (35 stores with medical insurance qualifications and more than one year of operation) as an example, exempt label grouping is implemented in the following steps: ① Exclude invalid labels: The current exempt label is the "population profile label", which has 3 categories (population type 0, population type 1, population type 2). The distribution in subgroup 1-2 is as follows: population type 0 (8 stores), population type 1 (18 stores), and population type 2 (9 stores). If the subgroup were grouped by this label directly, the sample sizes of population type 0 (8 stores) and population type 2 (9 stores) would be less than the set threshold of 10 stores. Therefore, these two small-sample categories are merged into an "other" category. After merging, the sample size of the "other" category is 8 + 9 = 17 stores, which is greater than the set threshold of 10 stores, so the label is retained. ② Calculate the mutual information value: Calculate the comprehensive mutual information value between the merged population profile label (categories: population type 1, other) and the preset target indicators. The calculation method is the same as in S14. The comprehensive mutual information value is 0.21. Since this value is greater than 0, the subgroup is grouped by this label. ③ Grouping results: Subgroup 1-2 is divided into two subgroups based on the population profile label: Subgroup 1-2-1: 18 stores with medical insurance qualifications, open for more than one year, and of population type 1; Subgroup 1-2-2: 17 stores with medical insurance qualifications, open for more than one year, and of other population types. ④ Grouping termination judgment: There are no other usable exempt labels at the current grouping path node, so the grouping of this path ends.
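As a non-limiting illustration, the merging of small-sample categories into an "other" category can be sketched as follows. The function name `merge_small_categories` and the requirement that at least two categories remain after merging are assumptions of this sketch; the embodiment states only the sample size threshold.

```python
from collections import Counter


def merge_small_categories(values, min_size=10, other="other"):
    """Merge categories with fewer than min_size stores into one 'other' category.

    Returns the merged values and whether the label is still usable, i.e. every
    merged category reaches min_size and at least two categories remain
    (the two-category requirement is an assumption of this sketch).
    """
    counts = Counter(values)
    small = {c for c, n in counts.items() if n < min_size}
    merged = [other if v in small else v for v in values]
    merged_counts = Counter(merged)
    usable = len(merged_counts) >= 2 and min(merged_counts.values()) >= min_size
    return merged, usable


# Subgroup 1-2 of the embodiment: population types 0, 1, 2 with 8, 18, 9 stores.
population = [0] * 8 + [1] * 18 + [2] * 9
merged, usable = merge_small_categories(population)
print(Counter(merged), usable)  # Counter({1: 18, 'other': 17}) True
```

A label whose merged "other" category still falls below the threshold is reported as unusable, which corresponds to the exclusion described in S15.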

[0064] Similarly, the above exemption label grouping steps were performed on other subgroups (subgroup 1-1, subgroup 2-1, subgroup 2-2) to finally complete the grouping of all stores, resulting in a total of 8 store groups. The sample size of each group is greater than 10 stores, and the profitability conditions (medical insurance qualifications, store age, population profile, market share) of stores in the same group are highly similar.

[0065] The lenient logic of using exemption labels not only makes full use of the store's characteristic information and further refines the grouping results, but also avoids invalid grouping by merging small sample groups.

[0066] Moreover, selecting labels by their mutual information values ensures that the grouping results discriminate the target indicators, ultimately achieving accurate grouping of stores with similar profitability conditions and overcoming the unreasonable grouping found in existing technologies.

[0067] As can be seen, step S1 achieves standardization of store labeling and grouping, avoiding the subjectivity of manual grouping and the dimensional explosion problem of simple label grouping; the grouping results ensure that stores in the same group have similar profitability conditions, while stores in different groups have significantly different profitability conditions, providing a solid foundation for the formulation of operational goals in the subsequent step S2, while improving the efficiency of store grouping and reducing management costs.

[0068] S2. For each group of stores, based on the median of its operational metrics, formulate operational targets for each store within the group, quantifying the effect of process metric optimization on improving operating profit, specifically including: S21. Break down the North Star indicator (core profitability indicator, such as net profit) of store profitability into process indicators that have a direct impact on the North Star indicator. Process indicators include, but are not limited to, order volume, one-piece rate per order, average number of pieces per order, unit price, procurement cost, rent, salary expenses, etc.

[0069] By breaking down the North Star indicator into its hierarchical structure, the relationship between net profit and various process indicators is clarified. This transforms abstract profit targets into concrete process indicators that can be implemented and optimized, making operational measures more targeted and solving the problem of vague and undirected operational goals in existing technologies.

[0070] S22. Quantify the gain that optimizing each process indicator brings to the North Star indicator. Taking the reduction of the "one-piece rate per order" for offline orders as an example, the gain prediction coefficient k for offline sales is calculated as follows:

$$k = -\frac{\Delta\,\text{offline one-piece rate}}{\text{average number of items per offline order}_0}$$

[0071] where k is the gain prediction coefficient of optimizing the process indicator (offline one-piece rate) on offline sales; when k is positive, optimizing the process indicator increases offline sales and thus net profit. △offline one-piece rate: the change in the offline one-piece rate; a negative value indicates a decrease in the one-piece rate (the optimization direction). In this embodiment, the preset △offline one-piece rate = -0.1 (that is, the one-piece rate decreases from 0.4 to 0.3). Average number of items per offline order_0: the baseline value of this indicator, i.e. the store's current average number of items per offline order (in this embodiment, 1.5 items / order). Offline order volume_0: the baseline value of this indicator, i.e. the store's current offline order volume (in this embodiment, taking a store in subgroup 1-2-1 as an example, 1000 orders / month). Offline item price_0: the baseline value of this indicator, i.e. the store's current offline item price (in this embodiment, 80 yuan / item). Offline order volume_0 and offline item price_0, together with the average number of items per offline order_0, determine the baseline offline sales to which k is applied.

[0072] Substituting the values, we get: k = -(-0.1) / 1.5 ≈ 0.0667, which means that if the offline one-piece rate decreases by 0.1, offline sales will increase by 6.67%.

[0073] Furthermore, combining the offline gross profit margin (in this embodiment, the offline gross profit margin = 25%), the gain of this process indicator optimization on net profit can be calculated: Net profit increase = Offline sales increase × Offline gross profit margin - Optimization cost (optimization cost is negligible in this embodiment), where the offline sales increase = 1000 × 1.5 × 80 × 0.0667 ≈ 8004 yuan / month, therefore the net profit increase ≈ 8004 × 25% ≈ 2001 yuan / month.
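As a non-limiting illustration, the gain calculation of S22 can be sketched as follows. The function and parameter names are assumptions of this sketch. Note that the code keeps k unrounded, which gives a sales increase of 8000 yuan / month and a net profit increase of 2000 yuan / month; the figures of 8004 and 2001 above result from rounding k to 0.0667 before multiplying.

```python
def gain_coefficient(delta_one_piece_rate, avg_items_per_order_0):
    """k = -(change in offline one-piece rate) / (baseline average items per order)."""
    return -delta_one_piece_rate / avg_items_per_order_0


def net_profit_increase(k, orders_0, avg_items_0, item_price_0, gross_margin, cost=0.0):
    """Net profit increase = baseline offline sales * k * gross margin - optimization cost."""
    baseline_sales = orders_0 * avg_items_0 * item_price_0
    return baseline_sales * k * gross_margin - cost


# Values of the embodiment (a store in subgroup 1-2-1).
k = gain_coefficient(-0.1, 1.5)
sales_increase = 1000 * 1.5 * 80 * k
profit_increase = net_profit_increase(k, 1000, 1.5, 80, 0.25)
```

The same two functions can be reused for any store by substituting its own baseline values and the intended change in the one-piece rate.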

[0074] Similarly, other process indicators (such as increasing offline order volume, increasing offline unit price, and reducing procurement costs) are quantified for gain, and their respective gain prediction coefficients and corresponding net profit increases are calculated to form a process indicator gain quantification table.

[0075] By calculating the gain prediction coefficient, the optimization effect of process indicators was quantified, the value of each operational measure in improving net profit was clarified, and the problem of difficulty in quantifying the benefits of existing technical operational measures was solved. Moreover, the calculation through formulas reduces the complexity of quantitative calculations, making it easier for store managers to understand and apply.

[0076] S23. Use the median value of each process indicator of the stores in the same group as the corresponding operational target for each store in the group, and identify each store's improvement opportunities and the magnitude of improvement. For example, where the group median of the offline one-piece rate is 0.40, for a store in the group whose offline one-piece rate exceeds 0.40 (such as store 1-2-1-1, indicator value 0.42), the improvement opportunity is determined to be "reducing the offline one-piece rate", with an improvement magnitude of 0.02 (from 0.42 to 0.40). Combined with the gain quantification results of S22, it can be estimated that by reducing the one-piece rate, the store can increase its net profit by ≈ ( -(-0.02) / 1.5 )×1000×1.5×80×25% ≈ 400 yuan / month, thus clarifying the benefit of the operational measure.
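As a non-limiting illustration, setting the target from the group median and estimating the resulting profit uplift can be sketched as follows. The list `group_rates` is hypothetical data chosen so that its median is 0.40, and the function name is an assumption of this sketch; only the store value 0.42 and the baseline values come from the embodiment.

```python
from statistics import median


def improvement_opportunity(store_rate, group_rates):
    """Return (target, gap): the group median and how far the store is above it.

    A lower one-piece rate is better, so a store at or below the median has no gap.
    """
    target = median(group_rates)
    gap = max(store_rate - target, 0.0)
    return target, gap


# Hypothetical offline one-piece rates of the stores in one group.
group_rates = [0.36, 0.38, 0.40, 0.41, 0.42]
target, gap = improvement_opportunity(0.42, group_rates)

# Net profit uplift with the baseline values of the embodiment:
# k = gap / 1.5, baseline sales = 1000 * 1.5 * 80, gross margin = 25%.
uplift = (gap / 1.5) * 1000 * 1.5 * 80 * 0.25
```

With these inputs the target is 0.40, the gap is about 0.02, and the uplift is about 400 yuan / month.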

[0077] Setting targets based on the median of stores in the same group ensures the rationality and feasibility of the targets; the logic that "stores with similar profitability conditions should achieve similar profitability" frees target setting from the limitations of the traditional "benchmark store" comparison method and avoids inappropriate comparisons between stores in different market environments.

[0078] S24. Based on the gain prediction coefficient and the improvement of process indicators, calculate the increase in operating profit for the store after the target is achieved, and output the store operation target and profit improvement prediction results.

[0079] S3. Based on the feature data of historically closed stores and their member return data, a machine learning model is built to predict the sales revenue of existing stores if they are relocated.

[0080] S31. Obtain multiple types of feature data for historically closed stores, including member consumption data for the quarter before closure, member return data for the quarter after closure, and store characteristic data. The member-related data includes member profiles, consumption records, and repurchase information. The acquired data is cleaned and standardized: an outlier detection algorithm is used to remove outliers; missing values are filled using a missing value imputation algorithm, or samples whose missing value ratio exceeds a preset threshold are removed directly; and standardization is performed using the Min-Max standardization method.
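As a non-limiting illustration, the removal of samples with too many missing values, the imputation, and the Min-Max standardization can be sketched as follows. Median imputation, the 0.5 missing-ratio threshold, and the function name are assumptions of this sketch, since the embodiment does not name a specific imputation algorithm; outlier removal is omitted here for brevity.

```python
from statistics import median


def clean_and_scale(rows, max_missing_ratio=0.5):
    """rows: equal-length lists of floats, with None marking a missing value.

    Assumes every column keeps at least one observed value after row removal.
    """
    # Drop samples whose share of missing values exceeds the threshold.
    kept = [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing_ratio]
    scaled_cols = []
    for j in range(len(kept[0])):
        present = [r[j] for r in kept if r[j] is not None]
        fill = median(present)  # median imputation (assumption)
        col = [fill if r[j] is None else r[j] for r in kept]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-Max standardization to [0, 1]; a constant column maps to 0.
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, None]]
scaled = clean_and_scale(rows)
print(scaled)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

The last sample is removed because all of its values are missing, and the single gap in the second sample is filled before scaling.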

[0081] S32. Calculate the sales contribution of returning members, $c_i$, for each historically closed store i. The calculation formula is:

$$c_i = \frac{S(A_i)}{S(B_i)}$$

[0082] where $A_i$ is the set of members of closed store i who made purchases in the quarter before the closure and who returned within the quarter after the closure; $B_i$ is the set of members of closed store i who made purchases in the quarter before the closure, with $A_i \subseteq B_i$; and $S(A_i)$ and $S(B_i)$ are both sales generated at closed store i by the respective member set.
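As a non-limiting illustration, the sales contribution of returning members can be sketched as follows, assuming it is the share of a closed store's pre-closure member sales that is attributable to members who later returned. This reading of the formula, the function name, and the data layout are assumptions of this sketch.

```python
def return_contribution(pre_closure_sales, returned_members):
    """Share of pre-closure member sales that came from members who later returned.

    pre_closure_sales: member id -> sales at the closed store in the quarter
    before closure. returned_members: ids of members observed returning.
    """
    total = sum(pre_closure_sales.values())
    if total == 0:
        return 0.0  # no pre-closure member sales, so nothing can return
    # Only members who purchased before closure count as returning members.
    returned = sum(s for m, s in pre_closure_sales.items() if m in returned_members)
    return returned / total


sales = {"m1": 300.0, "m2": 500.0, "m3": 200.0}  # hypothetical data
c_i = return_contribution(sales, {"m1", "m3", "m9"})
print(c_i)  # 0.5
```

Member "m9" is ignored because it did not purchase at the store before closure, so the returning set stays a subset of the pre-closure set.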

[0083] S33. The XGBoost algorithm is adopted: with the store feature data as features and $c_i$ as the target variable, a regression prediction model is constructed, trained, and evaluated to obtain the coefficient of determination $R^2$ and the adjusted coefficient of determination $R^2_{adj}$. Model training and evaluation proceed as follows: the sample set is randomly divided into a training set and a test set in an 8:2 ratio; 20% of the training samples are held out as a validation set for early-stopping monitoring to prevent overfitting; the base parameters and the hyperparameter search range for random search (RandomizedSearchCV in scikit-learn) are configured; the mean squared error (MSE) is used as the evaluation index during training to optimize the hyperparameters, and the hyperparameter combination with the smallest MSE is selected as the optimal combination. The MSE measures the mean squared deviation between predicted and actual values; $R^2$ reflects the model's ability to explain the variation in sales contribution; and $R^2_{adj}$ corrects $R^2$ by penalizing the number of redundant features in the model, giving a more objective evaluation of the fit of the multi-feature regression model.

[0084] S34. When the difference between the model's coefficient of determination $R^2$ and its adjusted coefficient of determination $R^2_{adj}$ is greater than the preset threshold (0.1), the model is retrained after feature filtering to obtain the final member return prediction model.
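As a non-limiting illustration, the check that triggers feature filtering can be sketched as follows. The embodiment does not print its own formula for the adjusted coefficient of determination, so the standard definition, 1 - (1 - R²)(n - 1)/(n - p - 1) for n samples and p features, is assumed here; the function names are also assumptions of this sketch.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Standard adjusted R^2; requires n_samples > n_features + 1."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def needs_feature_filtering(r2, r2_adj, threshold=0.1):
    """True when the gap between R^2 and adjusted R^2 exceeds the preset threshold."""
    return (r2 - r2_adj) > threshold


r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
r2_adj = adjusted_r2(r2, n_samples=4, n_features=2)
retrain = needs_feature_filtering(r2, r2_adj)
```

A large gap between the two coefficients indicates that many features add little explanatory power relative to the sample size, which is why it is used as the trigger for feature filtering.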

[0085] Feature filtering includes the following steps: Calculate the variance of each feature variable and discard features with variance close to 0 (e.g., <0.01), as these features contribute little to the prediction. Calculate the correlation coefficient between each feature variable and the target variable cᵢ, and discard features with correlation coefficients close to 0 (e.g., |ρ|<0.1), as these features also contribute little to the prediction. Calculate the correlation coefficients between feature variables and, for any pair whose correlation coefficient is greater than 0.8, discard one of the two features to avoid redundant collinear features. Select core features based on the feature importance index (feature_importances_) embedded in the XGBoost model, retaining the top 80% of features by importance.
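As a non-limiting illustration, the first three filtering steps can be sketched as follows. The rule of keeping the earlier feature of a collinear pair, the use of the Pearson coefficient and of the population variance, and the function names are assumptions of this sketch; the importance-based step is omitted because it requires a trained model.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def filter_features(features, target, var_min=0.01, corr_min=0.1, collinear_max=0.8):
    """features: name -> list of values. Returns the names kept, in input order."""
    # Steps 1 and 2: drop near-constant features and features unrelated to the target.
    kept = [
        name for name, col in features.items()
        if pvariance(col) >= var_min and abs(pearson(col, target)) >= corr_min
    ]
    # Step 3: of each highly correlated pair, keep the earlier feature (assumption).
    selected = []
    for name in kept:
        if all(abs(pearson(features[name], features[s])) <= collinear_max for s in selected):
            selected.append(name)
    return selected


target = [1, 2, 3, 4, 5, 6]  # hypothetical data
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [2, 4, 6, 8, 10, 12],   # collinear with "a"
    "const": [5, 5, 5, 5, 5, 5],      # zero variance
    "noise": [1, 0, 0, 0, 0, 1],      # uncorrelated with the target
    "b": [3, 1, 4, 1, 5, 9],
}
selected = filter_features(features, target)
print(selected)  # ['a', 'b']
```

The thresholds 0.01, 0.1, and 0.8 are the example values given in the embodiment and can be passed as arguments when other values are preferred.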

[0086] After feature filtering is complete, repeat the training and evaluation steps in S33 until the difference between $R^2$ and $R^2_{adj}$ is less than 0.1, at which point the final member return prediction model is obtained.

[0087] S35. Use the final member return prediction model to predict the $c$ value of each existing store, multiply the predicted value by the corresponding store's sales revenue to estimate the potential return sales revenue after the store is relocated, and output the estimated return sales revenue. Combining the predicted return contribution with actual sales revenue directly yields a financial indicator (return sales revenue) that business personnel can understand, providing an intuitive and quantitative basis for relocation decisions.

[0088] This invention also provides a store relocation analysis system, including a store grouping unit, an operational target customization unit, and a prediction unit, wherein: The store grouping unit is used to label stores. It uses a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability into the same group. The operational target customization unit is used to set operational targets for each group of stores based on the median value of their operational indicators, and to quantify the effect of process indicator optimization on improving operating profit. The prediction unit is used to build a machine learning model based on the feature data of historically closed stores and their member return data, in order to predict the sales revenue of existing stores if they are relocated.

[0089] The technical effects achieved by this application are as follows: 1. The multi-objective weighted Bayesian iterative hierarchical grouping method prioritizes necessary labels, selects the best non-exempt labels, supplements them with exempt labels, and uses mutual information values to select the best grouping label at each step. This avoids the dimensionality explosion caused by grouping on the simple Cartesian product of all labels, while maximizing the discriminative power of the grouping results with respect to core target indicators such as net profit and sales. At the same time, the sample size threshold ensures that each group is statistically meaningful, thereby enabling accurate clustering of stores with similar profitability conditions and solving the problem of unreasonable grouping.

[0090] 2. A process indicator optimization gain quantification model was established, which decomposes the North Star indicator (net profit) of store profitability into process indicators that can be directly optimized. Through mathematical formulas, the improvement coefficient and specific benefits of optimizing each process indicator on sales and net profit are accurately calculated. This solves the problem that the benefits of innovative operational measures in existing technologies are difficult to quantify, and provides a quantitative basis for the allocation of operational resources, avoiding the investment of ineffective operational resources.

[0091] 3. An XGBoost regression prediction model is built on the real feature data and member return data of historically closed stores. Multiplying the predicted $c$ value by the existing store's sales improves the accuracy of predicting the sales from returning members after a store relocation, reduces the subjective risk of relocation decisions, makes fuller use of the store's customer resources, and addresses the problem of extensive, unrefined management in the last link of the store life cycle.

[0092] Therefore, this invention has constructed a scientific, complete, and practical store relocation analysis technology system, which solves the technical pain points in the existing store lifecycle management of grouping, operation, and relocation, and realizes the digital, refined, and intelligent upgrade of store management, providing solid technical support for the large-scale and high-quality development of physical retail enterprises.

[0093] This invention also provides an electronic device, including: a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the machine-readable instructions to perform the store relocation analysis method as described above.

[0094] It should be noted that those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, which may include, but is not limited to, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, etc.

[0095] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for analyzing store relocation, characterized in that, Includes the following steps: S1. Label the stores and use a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability conditions into the same group. S2. For each group of stores, based on the median value of its operating indicators, set operating targets for each store in the group, and quantify the effect of process indicator optimization on improving operating profit. S3. Based on the feature data of historically closed stores and their member return data, a machine learning model is built to predict the sales revenue of existing stores if they are relocated.

2. The store relocation analysis method according to claim 1, characterized in that, step S1 specifically includes: S11. Process the raw data to obtain the tagged store features; S12. Preset the target indicators and information weights for grouping, and divide the tagged store features into necessary labels and unnecessary labels, the unnecessary labels including non-exempt labels and exempt labels; S13. Take the Cartesian product of all necessary labels, output the groups with sample sizes less than a set threshold, and use the groups with sample sizes greater than the set threshold as the current grouping path nodes; S14. For the current grouping path node, exclude any non-exempt label whose use would produce a group with a sample size below the set threshold, calculate the mutual information value between each remaining non-exempt label and the target indicators, and further group the node by the label with the largest mutual information value, provided that value is greater than 0; S15. For the current grouping path node, merge the small-sample categories of each exempt label into an "other" category, exclude any exempt label that would still produce a small-sample group after this merging, calculate the mutual information of the remaining exempt labels, and group by the best one; when no usable exempt label remains at the current grouping node, the grouping ends.

3. The store relocation analysis method according to claim 2, characterized in that, The tagging described in step S11 includes three methods: For feature data with clear business definitions or business classification thresholds, labels are generated based on the business definitions or classification thresholds, and the labels are discrete categorical variables. For feature data that has no clear business definition or business classification threshold but has a requirement for the number of business categories, the preset number of business categories is used as the k value, and the relevant feature data is weighted K-Means clustered for labeling. For feature data without business requirements, WCSS is used to identify the optimal inflection point in the elbow plot as the k value for weighted K-Means clustering for labeling.

4. The store relocation analysis method according to claim 3, characterized in that, the steps of the weighted K-Means clustering include: Weights are assigned to each feature variable participating in clustering based on business experience; each weight lies in the range [0,1], and the weights of all feature variables sum to 1; Each feature variable is Min-Max standardized, which maps it to the preset interval [0,1], and is then multiplied by its corresponding weight to obtain the processed store features; Based on a preset k value, k points are randomly selected from the processed store features as initial cluster centers; The Euclidean distance from each processed store feature to each cluster center is calculated, and the store is assigned to the nearest cluster; The mean of the processed store features in each cluster is calculated and used as the new cluster center; The above steps are repeated until the change in the location of the cluster centers is less than the preset convergence threshold, and the final cluster number is used as the label category number.

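5. The store relocation analysis method according to claim 2, characterized in that, the mutual information value $I(X;Y)$ mentioned in step S14 is calculated by the formula: $I(X;Y) = \sum_{i=1}^{k} p(x_i)\, D_{KL}\big(f(y \mid x_i)\,\|\,f(y)\big)$; where $X$ is the store label, with probability distribution $p(x_i)$, $i = 1, \dots, k$, and $\sum_{i=1}^{k} p(x_i) = 1$; $Y$ is the target indicator, with probability density function $f(y)$; the conditional probability density function is $f(y \mid x_i)$; and $D_{KL}\big(f(y \mid x_i)\,\|\,f(y)\big)$ is the KL divergence between the conditional distribution $f(y \mid x_i)$ and the marginal distribution $f(y)$.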

6. The store relocation analysis method according to claim 1, characterized in that, Step S2 specifically includes: S21. Decompose the North Star indicator of store profitability into process indicators that have a direct impact on the North Star indicator. S22. Quantify the gains of each process indicator optimization on the North Star indicator; S23. Use the median value of each process indicator of stores in the same group as the corresponding operational target for each store in the group, and identify opportunities for improvement in store operations and the extent of improvement. S24. Based on the gain prediction coefficient and the improvement of process indicators, calculate the increase in operating profit for the store after the target is achieved, and output the store operation target and profit improvement prediction results.

7. The store relocation analysis method according to claim 1, characterized in that, step S3 specifically includes: S31. Obtain data on historically closed stores, and perform data cleaning and standardization on the obtained data; S32. Calculate the sales contribution of returning members, $c_i$, for each historically closed store i, by the formula: $c_i = S(A_i) / S(B_i)$; where $A_i$ is the set of members of closed store i who made purchases in the quarter before the closure and who returned within the quarter after the closure; $B_i$ is the set of members of closed store i who made purchases in the quarter before the closure, with $A_i \subseteq B_i$; and $S(A_i)$ and $S(B_i)$ are both sales generated at closed store i by the respective member set; S33. The XGBoost algorithm is adopted: with the store feature data as features and $c_i$ as the target variable, a regression prediction model is constructed, trained, and evaluated to obtain the coefficient of determination $R^2$ and the adjusted coefficient of determination $R^2_{adj}$; S34. When the difference between the model's coefficient of determination $R^2$ and its adjusted coefficient of determination $R^2_{adj}$ is greater than the preset threshold, the model is retrained after feature filtering to obtain the final member return prediction model; S35. Use the final member return prediction model to predict the $c$ value of each existing store, multiply the predicted value by the corresponding store's sales revenue to estimate the potential return sales revenue after the store is relocated, and output the estimated return sales revenue.

8. The store relocation analysis method according to claim 7, characterized in that, in step S34, the feature filtering includes: Calculate the variance of each feature variable and discard features with variance close to 0; Calculate the correlation coefficient between each feature variable and the target variable $c_i$, and discard features with a correlation coefficient close to 0; Calculate the correlation coefficients between the feature variables and, for any pair with a correlation coefficient greater than 0.8, discard one of the two features; Select core features based on the feature importance index embedded in the XGBoost model.

9. The store relocation analysis method according to claim 7, characterized in that, in step S34, the methods for model training and evaluation include: The sample set is randomly divided into a training set and a test set in an 8:2 ratio; 20% of the samples in the training set are held out as a validation set for early-stopping monitoring to prevent overfitting; The base parameters and the hyperparameter search range for random search are configured; The mean squared error is used as the evaluation index during training to optimize the hyperparameters, and the hyperparameter combination with the smallest mean squared error is selected as the optimal hyperparameter combination; The mean squared error measures the mean squared deviation between predicted and actual values; the coefficient of determination $R^2$ reflects the model's ability to explain the variation in sales contribution; and the adjusted coefficient of determination $R^2_{adj}$ corrects $R^2$ by penalizing the number of redundant features in the model, thereby more objectively evaluating the fit of the multi-feature regression model.

10. A store relocation analysis system, characterized in that, This includes store grouping units, operational target customization units, and forecasting units, among which: The store grouping unit is used to label stores. It uses a multi-objective weighted Bayesian iterative hierarchical grouping method to group stores with similar profitability into the same group. The operational target customization unit is used to set operational targets for each group of stores based on the median value of their operational indicators, and to quantify the effect of process indicator optimization on improving operating profit. The prediction unit is used to build a machine learning model based on the feature data of historically closed stores and their member return data, in order to predict the sales revenue of existing stores if they are relocated.