Intelligent decision-making method and system for cross-border e-commerce product selection based on multi-language comment mining
By performing unified modeling and fine-grained analysis on cross-border e-commerce reviews, the method addresses the challenge of processing multilingual reviews in cross-border e-commerce product selection, enabling accurate identification of product attributes and user needs and improving the accuracy and stability of product selection results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG OPEN UNIV (GUANGDONG POLYTECHNIC VOCATIONAL COLLEGE)
- Filing Date
- 2026-04-09
- Publication Date
- 2026-06-12
AI Technical Summary
Existing cross-border e-commerce product selection methods struggle to fully utilize genuine user feedback from multi-platform, multilingual reviews. They lack comprehensive processing of review credibility, quality, and trends over time. As a result, migration of review contribution sources and the spread of risk areas go undetected, and ranking results that appear stable are in fact fragile.
By establishing a correspondence between candidate products and a standardized corpus of reviews, and combining sentence segmentation, term matching, and sequence labeling, evaluation elements are identified and cross-linguistic semantic alignment and ambiguity resolution are performed. Fine-grained evaluation features are extracted, an evaluation contribution trajectory tensor is constructed, and high-value vulnerable states are identified and dynamically corrected to form differentiated attribution results.
It improves the standardization and usability of comment data processing, accurately identifies product attributes and implicit user needs, identifies abnormal products with high surface value but internal structural migration or diffusion, and enhances the accuracy and robustness of product selection results.
Figure CN122199030A_ABST
Description
Technical Field
[0001] This invention relates to the field of cross-border e-commerce data mining and intelligent decision-making technology, and more specifically, to a cross-border e-commerce product selection intelligent decision-making method and system based on multilingual review mining.
Background Technology
[0002] With the rapid development of cross-border e-commerce platforms and overseas sites, sellers are increasingly relying on user reviews to judge market demand, product reputation, and improvement directions during product selection. Current product selection methods largely depend on sales volume, ratings, human experience, or data from a single site, making it difficult to fully utilize the genuine user feedback contained in reviews from multiple platforms and in multiple languages. Especially in cross-border scenarios, there are significant differences in the language, expression habits, evaluation focus, and needs of reviews from different countries and regions. Simply translating or performing coarse-grained sentiment analysis can easily overlook specific attributes, implicit needs, and localized risks.
[0003] Furthermore, existing technologies typically lack comprehensive processing of review credibility, review quality, review subject attribution, review intensity, and time-varying trends, making it difficult to effectively identify issues such as review manipulation, templated expressions, semantic mismatch, and variant differences. For candidate products, even with high overall scores, there may be migration of review contribution sources, diffusion of risk areas, or instability of local variants, resulting in seemingly stable but actually fragile ranking results.
[0004] To address the above problems, this invention proposes a solution.
Summary of the Invention
[0005] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide a method and system for intelligent decision-making in cross-border e-commerce product selection based on multilingual review mining, in order to solve the problems mentioned in the background art.
[0006] To achieve the above objectives, the present invention provides the following technical solution. In a preferred embodiment, the method includes the following.
The review collection targets are defined by product identifiers, and variant identifiers are associated with product identifiers. By combining review time, reviewer location identifier, product variant attributes, and source identifiers, the original review records from different page sources are uniformly mapped to establish the correspondence between candidate products and the corresponding standardized review corpus sets.
Based on the standardized comment corpus, the standardized comment units are segmented into sentences, and evaluation element candidates are identified by combining term matching and sequence labeling. The evaluation information corresponding to the evaluation element candidates is then parsed using the rating mapping value, comment credibility coefficient, and comment quality score retained in each comment sentence segment. After cross-language semantic alignment and ambiguity resolution, the information is aggregated to form a set of fine-grained evaluation elements corresponding to each candidate product.
The set of fine-grained evaluation elements is read according to the product identifier, and an evaluation element statistics table is constructed with the standard evaluation element name as the primary key. Based on the comment time distribution and product variant attribute distribution in the evaluation element statistics table, time change characteristics and variant difference characteristics are extracted. Combined with the total evaluation frequency, weighted evaluation intensity, demand attention, and frequency of implicit demand, the product selection decision value is calculated. Time subsets are then divided according to comment time and variant subsets according to product variant attributes, and local product selection decision values are repeatedly calculated on each time subset and each variant subset. The final product selection value is obtained by applying a correction based on the dispersion and risk concentration of the multiple local product selection decision values, and the selected candidate products are determined.
A three-dimensional evaluation distribution unit set is constructed for the selected candidate products, covering time interval, product variant attributes, and source region, and an evaluation contribution trajectory tensor is established. Based on the evaluation contribution trajectory tensor, the evolutionary relationship of evaluation contribution among time interval, product variant attributes, source region, and standard evaluation elements is extracted, distinguishing between changes in internal dominant paths and changes in external diffusion paths. On this basis, high-value vulnerable state identification and structural track-changing state identification are performed, and segmented correction and rebound correction are applied to the final product selection value. The corrected product selection values are then re-ranked, and a differentiated attribution result corresponding to the corrected ranking result is formed by combining the element contribution values.
[0007] In a preferred embodiment, the review collection target is defined by the product identifier, and the variant identifiers corresponding to the color, size, set or specification are associated with and saved with the product identifier. At the same time, the review time, reviewer's geographic identifier, product variant attributes and source identifier are extracted.
[0008] In a preferred embodiment, unified field mapping, time parsing, and rating mapping are performed on the original review records, so that the review time, reviewer geographic identifier, product variant attributes, and source information from different page sources are entered into a unified field structure. Subsequently, language recognition, language-specific normalization, and cross-language domain terminology mapping are performed on the comment texts. Duplicate comment identification, abnormal comment identification, and comment quality assessment are conducted within each comment set sharing the same product identifier. Abnormal comment identification calculates the comment credibility coefficient based on templated risk indicators, behavioral abnormality risk indicators, and semantic mismatch risk indicators. Comment quality assessment calculates the comment quality score based on text integrity indicators, feature word richness indicators, scene description indicators, and expression clarity indicators. The comments are then grouped by product identifier to establish a correspondence between candidate products and the corresponding standardized comment corpora.
[0009] In a preferred embodiment, standardized comment units are segmented into sentences, and each comment segment retains the product identifier, comment language tag, comment time, rating mapping value, comment credibility coefficient, and comment quality score. Candidate evaluation elements are then identified through a combination of domain terminology matching and sequence labeling. The identified candidate evaluation elements are subjected to evaluation object attribution determination, evaluation polarity determination, evaluation intensity quantification, evaluation evidence fragment extraction, and implicit demand identification. Evaluation intensity is calculated based on basic polarity intensity, degree correction, negation correction, transition correction, and rating mapping correction. Implicit demand content is extracted based on demand triggering mode, evaluation object category, and evaluation evidence fragments, and is mapped to unified demand terms.
Subsequently, cross-language semantic alignment and ambiguity resolution are performed on the candidate evaluation elements, and evaluation expressions in different languages are replaced with standard evaluation element names. The standard evaluation element names, evaluation object categories, evaluation polarity categories, evaluation intensity, evaluation evidence fragments, implicit demand content, comment time, comment credibility coefficient, and comment quality score are assembled into fine-grained evaluation records.
Finally, using product identifiers as the aggregation condition and standard evaluation element names as the aggregation primary key, the frequencies of positive evaluations, negative evaluations, neutral evaluations, and implicit demand occurrences, together with the evaluation evidence fragment sets, are tallied over the fine-grained evaluation records of the same candidate product. Weighted evaluation intensity is calculated based on evaluation intensity, comment credibility coefficient, and comment quality score. Demand attention is calculated based on occurrence frequency, negative evaluation ratio, and implicit demand occurrence frequency. This establishes a structured correspondence between candidate products and standard evaluation elements, evaluation object categories, evaluation polarity categories, evaluation intensity, implicit demand content, comment time, comment credibility coefficient, and comment quality score, forming a set of fine-grained evaluation elements for each candidate product.
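The aggregation by standard evaluation element described above can be sketched as follows. This is a minimal illustration only: the record keys (element, polarity, intensity, credibility, quality, implicit_demand) are hypothetical names, and using the product of credibility and quality as the weight for a weighted mean is an assumption, since the text does not give the exact formula. Demand attention is omitted for the same reason.

```python
from collections import defaultdict


def aggregate_by_element(records):
    """Tally polarity counts and a weighted intensity per standard element.

    Assumption: weight = credibility * quality, and weighted intensity is
    the weighted mean of evaluation intensity. The source does not fix this.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0,
                                  "implicit_demand": 0})
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)

    for r in records:
        element = r["element"]
        counts[element][r["polarity"]] += 1
        if r.get("implicit_demand"):
            counts[element]["implicit_demand"] += 1
        weight = r["credibility"] * r["quality"]
        weighted_sum[element] += weight * r["intensity"]
        weight_total[element] += weight

    result = {}
    for element, tally in counts.items():
        total = weight_total[element]
        row = dict(tally)
        row["weighted_intensity"] = weighted_sum[element] / total if total else 0.0
        result[element] = row
    return result
```

A review with a low credibility coefficient or low quality score therefore moves the element's weighted intensity less than a credible, detailed review does.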
[0010] In a preferred embodiment, a set of fine-grained evaluation elements is read according to the product identifier, and an evaluation element statistics table is constructed using the standard evaluation element name as the primary key. The evaluation element statistics table is filled with the total evaluation frequency, weighted evaluation intensity, demand attention, frequency of implicit demand, comment time distribution, and product variant attribute distribution. Then, based on the evaluation element statistics table, market demand indicators, product risk indicators, and product selection decision values are calculated. The comment time distribution is formed by statistically analyzing the comment times of the corresponding fine-grained evaluation records in chronological order. The product variant attribute distribution is formed by statistically analyzing the product variant attributes in the corresponding fine-grained evaluation records. The time growth trend and time surge risk are extracted based on the change amplitude of adjacent time intervals. The variant difference risk is calculated based on the dispersion of the weighted evaluation intensity of the same standard evaluation element under different product variant attributes.
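As a rough illustration of the time and variant features in this paragraph, the sketch below derives a growth trend and a surge risk from the change between adjacent time intervals, and a variant difference risk from the dispersion of weighted evaluation intensity across variants. The specific choices (relative change, mean for the trend, largest positive change for the surge risk, population standard deviation for dispersion) are assumptions; the text names the inputs but not the formulas.

```python
from statistics import mean, pstdev


def adjacent_changes(counts):
    """Relative change between each pair of adjacent time intervals."""
    return [(later - earlier) / max(earlier, 1)
            for earlier, later in zip(counts, counts[1:])]


def time_features(counts):
    """Return (growth_trend, surge_risk) for per-interval evaluation counts.

    Assumption: trend is the mean relative change, and surge risk is the
    largest positive relative change.
    """
    changes = adjacent_changes(counts)
    if not changes:
        return 0.0, 0.0
    return mean(changes), max(max(changes), 0.0)


def variant_difference_risk(intensity_by_variant):
    """Dispersion of one element's weighted intensity across variants."""
    values = list(intensity_by_variant.values())
    return pstdev(values) if len(values) > 1 else 0.0
```

For example, an element whose weighted intensity is 0.2 for one color and 0.8 for another yields a dispersion of 0.3, while identical values across variants yield 0.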
[0011] In a preferred embodiment, the standardized comment corpus is divided into multiple time subsets according to the comment time and into multiple variant subsets according to the product variant attributes. Local product selection decision values are repeatedly calculated in each time subset and each variant subset. The product selection stability is calculated based on the dispersion of multiple local product selection decision values. Then, the product selection correction value is calculated in combination with the risk concentration state. Finally, the final product selection value is calculated from the product selection decision value and the selection correction value. After that, the products are sorted according to the final product selection value, and the candidate products that meet the screening criteria and are located in the top sorting interval are determined as selected candidate products.
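The stability correction in this paragraph could look like the following sketch. Every formula here is an assumption made for illustration: stability is taken as 1 / (1 + coefficient of variation) of the local decision values, the correction value as stability multiplied by (1 - risk concentration), and the final value as the decision value multiplied by the correction. The text specifies only which quantities feed into each step.

```python
from statistics import mean, pstdev


def selection_stability(local_values):
    """Stability in (0, 1]; 1.0 means all local decision values agree."""
    if not local_values:
        return 0.0
    center = mean(local_values)
    if center == 0:
        return 0.0
    variation = pstdev(local_values) / abs(center)
    return 1.0 / (1.0 + variation)


def final_selection_value(decision_value, local_values, risk_concentration):
    """Apply the assumed correction; risk_concentration is expected in [0, 1]."""
    correction = selection_stability(local_values) * (1.0 - risk_concentration)
    return decision_value * correction
```

Under these assumptions, a product whose local values vary widely across time subsets and variant subsets is pulled down in the ranking even when its overall decision value is high.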
[0012] In a preferred embodiment, after the shortlisted candidate products have obtained their final selection values, the standardized review units are first divided into three layers according to review time, product variant attributes, and source region. This constructs a three-dimensional evaluation distribution unit set for time interval, product variant attributes, and source region. Each three-dimensional evaluation distribution unit is then associated with the standard evaluation element name, evaluation object category, evaluation polarity category, evaluation intensity, demand attention, implicit demand content, review credibility coefficient, and review quality score. Next, the local evaluation contribution value is calculated by multiplying the local demand attention, local weighted evaluation intensity, local review credibility aggregation value, and local review quality aggregation value in each three-dimensional evaluation distribution unit. Based on this, an evaluation contribution trajectory tensor is constructed, and the evaluation contribution change relationship is displayed along the four dimensions of time interval, product variant attributes, source region, and standard evaluation elements within the same candidate product. Subsequently, parallel calculations are performed on the following quantities: evaluation contribution migration, evaluation center of gravity shift, implicit demand shift, source region diffusion, commodity variant attribute penetration, and high value retention. 
Evaluation contribution migration is obtained by comparing the changes in the contribution distribution of standard evaluation elements within adjacent time intervals; evaluation center of gravity shift is obtained by assigning positional codes to each evaluation object category and calculating the changes in evaluation center of gravity coordinates within adjacent time intervals; implicit demand shift is obtained by comparing the changes in the distribution of unified demand terms within adjacent time intervals; source region diffusion is obtained by statistically analyzing the changes in the expansion quantity of preset problem conditions in different source regions; commodity variant attribute penetration is obtained by statistically analyzing the changes in the coverage quantity of preset problem conditions in different commodity variant attributes; and high value retention is obtained by analyzing the time series high value distribution after normalizing the local product selection decision values within each time interval. Then, the aforementioned multiple quantities are coupled pairwise to construct a high-value vulnerability state judgment matrix, and high-value vulnerability is calculated in conjunction with the high value retention. Finally, the structural track-changing coefficient is calculated through the combination relationship between evaluation contribution migration, evaluation center of gravity shift, implicit demand shift, source region diffusion, and commodity variant attribute penetration, thus separating and representing the internal dominant path track-changing and the external diffusion path.
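A minimal sketch of the evaluation contribution trajectory tensor follows, stored sparsely as a dictionary keyed by (time interval, variant, source region, standard evaluation element). The local contribution is the product of the four local quantities, as stated above. The cell key names are hypothetical, and measuring evaluation contribution migration as the total variation distance between the element distributions of two adjacent intervals is an assumption, since the text says only that the distributions are compared.

```python
from collections import defaultdict


def build_contribution_tensor(cells):
    """Map (time, variant, region, element) to a local contribution value."""
    tensor = {}
    for c in cells:
        key = (c["time"], c["variant"], c["region"], c["element"])
        tensor[key] = (c["attention"] * c["intensity"]
                       * c["credibility"] * c["quality"])
    return tensor


def element_distribution(tensor, time):
    """Share of absolute contribution per element within one time interval."""
    totals = defaultdict(float)
    for (t, _variant, _region, element), value in tensor.items():
        if t == time:
            totals[element] += abs(value)
    norm = sum(totals.values())
    return {e: v / norm for e, v in totals.items()} if norm else {}


def contribution_migration(tensor, earlier, later):
    """Assumed measure: total variation distance between two intervals."""
    p = element_distribution(tensor, earlier)
    q = element_distribution(tensor, later)
    return 0.5 * sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in set(p) | set(q))
```

A migration of 0 means the same elements carry the same share of contribution in both intervals; a value near 1 means the contribution has moved almost entirely to different elements.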
[0013] In a preferred embodiment, the comparison result between high-value vulnerability and a preset vulnerability condition threshold is used as the first segmentation basis, and the comparison result between the structural track-changing coefficient and a preset diffusion condition threshold is used as the second segmentation basis. Dynamic correction factors are calculated segment by segment, and the original final selection value is attenuated and corrected by the dynamic correction factors. Then, the recovery judgment amount is calculated based on the decrease in the amount of evaluation contribution migration, the amount of evaluation center of gravity shift, the amount of implicit demand shift, the amount of diffusion in the source area, and the amount of penetration of commodity variant attributes in the two most recent time intervals. Combined with the high-value retention amount and the selection stability, the rebound adjustment coefficient is calculated, and the selection value after dynamic correction is rebounded and corrected. Finally, the selection values after rebound correction are re-sorted, and the contribution value of the elements is calculated for the selected candidate commodities. The demand attention, weighted evaluation intensity, implicit demand aggregation value, and negative pressure value are combined to divide the main driving factor set and the main limiting factor set. Then, the corresponding evaluation evidence fragments are collected to form a differentiated attribution result corresponding to the corrected ranking result.
[0014] In a preferred embodiment, the system includes a standardized comment corpus construction module, a fine-grained evaluation element extraction module, a product selection decision value calculation module, and a high-value vulnerability identification and correction module, with signal connections between the modules.
The standardized comment corpus construction module is used to define the comment collection targets by product identifiers and associate variant identifiers with product identifiers. It combines comment time, commenter geographic identifier, product variant attributes, and source identifiers to uniformly map the original comment records from different page sources and establish the correspondence between candidate products and the corresponding standardized comment corpus sets.
The fine-grained evaluation element extraction module is used to segment standardized comment units into sentence segments based on the standardized comment corpus and to identify evaluation element candidates by combining term matching and sequence labeling. Using the rating mapping value, comment credibility coefficient, and comment quality score retained in each comment segment, it parses the evaluation information corresponding to the evaluation element candidates and, after cross-language semantic alignment and ambiguity resolution, aggregates it to form a fine-grained evaluation element set corresponding to each candidate product.
The product selection decision value calculation module is used to read the set of fine-grained evaluation elements by product identifier, construct an evaluation element statistics table with the standard evaluation element name as the primary key, and extract time change features and variant difference features based on the comment time distribution and product variant attribute distribution in the evaluation element statistics table. Combined with the total evaluation frequency, weighted evaluation intensity, demand attention, and frequency of implicit demand, it calculates the product selection decision value. The module then divides time subsets according to comment time and variant subsets according to product variant attributes, and repeatedly calculates the local product selection decision value on each time subset and each variant subset. It applies a correction based on the dispersion and risk concentration of the multiple local product selection decision values to obtain the final product selection value and determine the selected candidate products.
The high-value vulnerability identification and correction module is used to construct a three-dimensional evaluation distribution unit set for the selected candidate products, covering time interval, product variant attributes, and source region, and to establish an evaluation contribution trajectory tensor. Based on the evaluation contribution trajectory tensor, it extracts the evolutionary relationship of evaluation contribution among time interval, product variant attributes, source region, and standard evaluation elements, distinguishes between changes in internal dominant paths and changes in external diffusion paths, and performs high-value vulnerability state identification and structural track-changing state identification accordingly. It then performs segmented correction and rebound correction on the final product selection value. After that, the corrected product selection values are re-ranked, and a differentiated attribution result corresponding to the corrected ranking result is formed by combining the element contribution values.
[0015] The technical effects and advantages of this invention, which is a cross-border e-commerce product selection intelligent decision-making method and system based on multilingual review mining, are as follows: This invention improves the standardization and usability of review data processing by unifying modeling and fine-grained analysis of reviews across multiple platforms and languages. It accurately identifies multi-dimensional evaluation objects such as product attributes, performance, packaging, logistics, and compatibility, and uncovers implicit user needs. By constructing an indicator system encompassing market demand, product satisfaction, product risk, and competitive opportunities, this invention can more comprehensively reflect the true market performance of candidate products. Furthermore, by introducing product selection stability, high-value vulnerability, and dynamic correction mechanisms, it can identify abnormal products with apparent high value but whose internal structure has migrated or spread, avoiding ranking distortion. This improves the accuracy, robustness, and interpretability of product selection results, making it more suitable for intelligent product selection decisions in cross-border e-commerce scenarios.
Attached Figure Description
[0016] Figure 1 is a diagram illustrating the three-dimensional evaluation distribution units, the evaluation contribution trajectory tensor, and the calculation relationships of the various change quantities for candidate products in the intelligent decision-making method and system for cross-border e-commerce product selection based on multilingual review mining of this invention.
[0017] Figure 2 is a schematic diagram illustrating the dynamic correction and recovery rebound process for the high-value vulnerable state in the intelligent decision-making method and system for cross-border e-commerce product selection based on multilingual comment mining of this invention.
Detailed Implementation
[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0019] Example: This invention discloses an intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining, including:
Step 1: Define the review collection target by product identifier, associate the variant identifier with the product identifier, and combine review time, reviewer's geographical identifier, product variant attribute, and source identifier to uniformly map the original review records from different page sources, establishing the correspondence between candidate products and the corresponding standardized review corpus sets.
Step 2: Based on the standardized comment corpus, segment the standardized comment units into sentences, and identify candidate evaluation elements by combining term matching and sequence labeling. Then parse the evaluation information corresponding to the candidate evaluation elements using the rating mapping value, comment credibility coefficient, and comment quality score retained in each comment sentence segment. After cross-language semantic alignment and ambiguity resolution, aggregate the evaluation information to form a fine-grained set of evaluation elements corresponding to each candidate product.
Step 3: Read the fine-grained evaluation element set by product identifier, construct an evaluation element statistics table with the standard evaluation element name as the primary key, and extract time change features and variant difference features based on the comment time distribution and product variant attribute distribution in the evaluation element statistics table. Combine the total evaluation frequency, weighted evaluation intensity, demand attention, and frequency of implicit demand to calculate the product selection decision value. Then divide time subsets according to comment time and variant subsets according to product variant attributes, and repeatedly calculate the local product selection decision value on each time subset and each variant subset. Apply a correction based on the dispersion and risk concentration of the multiple local product selection decision values to obtain the final product selection value and determine the selected candidate products.
Step 4: Construct a three-dimensional evaluation distribution unit set for the selected candidate products, covering time interval, product variant attributes, and source region, and establish an evaluation contribution trajectory tensor. Based on the evaluation contribution trajectory tensor, extract the evolutionary relationship of evaluation contribution among time interval, product variant attributes, source region, and standard evaluation elements, distinguish between changes in internal dominant paths and changes in external diffusion paths, identify high-value vulnerable states and structural track-changing states accordingly, and apply segmented correction and rebound correction to the final product selection values. Then re-rank the corrected product selection values and combine them with the element contribution values to form a differentiated attribution result corresponding to the corrected ranking result.
[0020] Step 1 begins by determining the target of review data collection. The collection target consists of user review texts directly corresponding to candidate products. Candidate products are defined by product identifiers, where a product identifier is a unique product number assigned by the e-commerce platform, a unique identification field in the product link, or unique marker information in the product details page. When the same product has variants in color, size, set, or specification, the variant identifier is associated with and saved alongside the product identifier. In addition to product identifiers, category identifiers, brand identifiers, and store identifiers are also recorded. Category identifiers come from category information on the e-commerce platform's product category page or product details page; brand identifiers come from the product title, brand field, or brand display area; and store identifiers come from the store homepage link, store name field, or seller information field. These identifiers are extracted together during review collection, establishing a correspondence between each review and the corresponding product, category, brand, and store.
[0021] After the collection target is determined, raw review data is obtained based on product identifiers and review display entry points. The sources of raw review data include the main review section, the follow-up review section, the review section with images, text sections with evaluative content in the product details page (such as the product Q&A section), and publicly available review pages directly associated with the product page. Text in the main review section serves as the initial review content, text in the follow-up review section serves as supplementary review content, text in the review section with images serves as review content accompanying image or video submissions, and text with evaluative content in the product Q&A section serves as evaluation content related to product usage experience, quality, and compatibility. For each source page, the review text and its directly corresponding ancillary fields are extracted according to page structure rules.
[0022] For each original comment, extract the comment text, comment title, comment time, rating tags, commenter's geographic identifier, corresponding product variant attributes, interactive feedback tags, and source identifier. The comment text comes from the main content area of the page; the comment title comes from the title area; the comment time comes from the comment time field; the rating tags come from star ratings, scores, positive review tags, or recommendation tags; the commenter's geographic identifier comes from country, region, website, or language website tags; the product variant attributes come from the purchase attributes attached to the comment, such as specifications, color, size, and model; the interactive feedback tags come from the number of likes, helpful comments, replies, or adoption tags; and the source identifier is represented by the platform name, website name, page type, and page link. If the page contains a comment number, user ID, or order verification tag, extract and save these elements as well.
[0023] After data collection is complete, a unified field mapping process is performed on all original review records. This process converts fields with the same meaning but different names into uniform fields. Each original review record is then saved according to a unified structure, which includes at least the product identifier, category identifier, brand identifier, store identifier, review text, review title, review time, rating tag, reviewer's location identifier, product variant attribute, interaction feedback tag, source identifier, and an optional review number.
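The unified field mapping can be pictured as a simple alias table applied to each raw record. In the sketch below, the source-side field names in FIELD_ALIASES and the unified field names are hypothetical, chosen only to mirror the fields listed above; a real deployment would maintain one alias table per platform.

```python
# Hypothetical aliases: source-specific field name -> unified field name.
FIELD_ALIASES = {
    "asin": "product_id",
    "item_id": "product_id",
    "body": "review_text",
    "content": "review_text",
    "headline": "review_title",
    "date": "review_time",
    "posted_at": "review_time",
    "stars": "rating_tag",
    "score": "rating_tag",
    "country": "reviewer_region",
    "variant": "variant_attributes",
    "helpful_votes": "interaction_feedback",
    "source": "source_id",
}

# Unified structure following the field list in the text (names assumed).
UNIFIED_FIELDS = (
    "product_id", "category_id", "brand_id", "store_id", "review_text",
    "review_title", "review_time", "rating_tag", "reviewer_region",
    "variant_attributes", "interaction_feedback", "source_id", "review_number",
)


def to_unified_record(raw):
    """Copy a raw record into the unified structure; missing fields stay None."""
    record = dict.fromkeys(UNIFIED_FIELDS)
    for key, value in raw.items():
        unified = FIELD_ALIASES.get(key.lower(), key.lower())
        # Keep the first value seen if two source fields map to one unified field.
        if unified in record and record[unified] is None:
            record[unified] = value
    return record
```

Keeping every unified field present, even when empty, lets the later integrity check treat records from all sources the same way.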
[0024] After obtaining the original comment records, an integrity check is performed on them. The integrity check is performed line by line, checking for the existence of the comment text, product identifier, parsability of the comment time, recognizability of the rating tag, and existence of the source identifier. Original comment records with empty text, missing product identifier, or missing source identifier are not processed further. If the comment time is missing but the source page contains a sorting position, adjacent time, or site date context, it is parsed according to the available time clues on the page. If parsing still fails, the original time string is retained and an unparsed tag is appended. If the rating tag is missing, the comment text is retained and the rating tag is set to null. After the integrity check is completed, the original comment records are divided into processable and unprocessable records. Processing of unprocessable records is stopped, while processing of processable records continues.
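The integrity check rules above can be sketched as a single function over a unified record. The field names and the list of accepted time formats are assumptions, and recovering a missing time from page context (sorting position, adjacent times, site date) is not modeled here; the sketch only keeps the original string and sets an unparsed flag.

```python
from datetime import datetime

# Assumed input formats; real sites would need a longer, locale-aware list.
TIME_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y-%m-%d %H:%M:%S")
REQUIRED_FIELDS = ("review_text", "product_id", "source_id")


def parse_time(raw):
    """Return a datetime, or None when the string matches no known format."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def check_integrity(record):
    """Return (processable, checked_copy) for one unified review record."""
    out = dict(record)
    # Empty text, missing product identifier, or missing source: unprocessable.
    if any(not str(out.get(field) or "").strip() for field in REQUIRED_FIELDS):
        return False, out
    parsed = parse_time(out.get("review_time"))
    if parsed is not None:
        out["review_time"] = parsed
    # Otherwise the original time string is retained and flagged as unparsed.
    out["time_unparsed"] = parsed is None
    # A missing rating keeps the text and sets the rating tag to null.
    if out.get("rating_tag") in ("", None):
        out["rating_tag"] = None
    return True, out
```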
[0025] The comment text that passes integrity verification undergoes a unified encoding process. This process converts all text from different sites and crawling environments to the same character encoding format. After encoding unification, the comment text is then cleaned up. This cleanup process removes web page tags, script fragments, style remnants, meaningless control characters, abnormal line breaks, duplicate spaces, illegal escape characters, and page noise introduced during the crawling process. Web page tags are residual HTML tags or nested tags; script fragments are residual page scripts or event code; style remnants are CSS style tags or page style content; and control characters are invisible characters or format control characters. After cleanup, only characters relevant to the review content remain in the comment text.
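A minimal sketch of the cleanup step, assuming the text has already been decoded to a Unicode string. The regular expressions are a rough approximation for illustration; a production crawler would more likely rely on a real HTML parser. The function name `clean_comment_text` is hypothetical.

```python
import html
import re
import unicodedata

# Zero-width space and byte-order mark, treated here as page noise (assumption).
INVISIBLE = {"\u200b", "\ufeff"}

def clean_comment_text(raw):
    # Drop script and style blocks together with their content.
    text = re.sub(r"(?is)<(script|style)\b[^>]*>.*?</\1\s*>", " ", raw)
    # Drop any remaining tags, including nested ones.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Resolve HTML entities such as &nbsp; or &amp;.
    text = html.unescape(text)
    # Replace control characters and selected invisible characters with spaces.
    text = "".join(
        " " if unicodedata.category(ch) == "Cc" or ch in INVISIBLE else ch
        for ch in text
    )
    # Collapse abnormal line breaks and duplicate spaces.
    return re.sub(r"\s+", " ", text).strip()
```

Line breaks are collapsed into spaces here for simplicity; a variant that preserves them would be needed if later sentence segmentation relies on line breaks.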
[0026] The cleaned comment text undergoes expression normalization. Expression normalization includes compressing consecutive repeated punctuation marks into a single normalized mark, converting common emoticons and kaomojis into corresponding emotion tags, replacing colloquial abbreviations, internet slang, and non-standard spellings with standard word forms, and restoring characters elongated by repeated input to their normal spelling. For example, consecutive repeated exclamation marks are converted into an emphasis mark, and emoticons representing satisfaction, disappointment, anger, and surprise are replaced with their corresponding emotion tags. After expression normalization, each comment yields a purified comment text.
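The normalization rules above might look like the following sketch. The emoticon table, slang table, and tag names such as `[EMPHASIS]` are hypothetical placeholders, and the elongation rule (collapse three or more repeated letters to one) is a crude heuristic that a real system would back with a dictionary check.

```python
import re

# Hypothetical lookup tables; real tables would be per language and far larger.
EMOTICON_TAGS = {
    ":)": "[SATISFIED]",
    ":(": "[DISAPPOINTED]",
    ">:(": "[ANGRY]",
    ":o": "[SURPRISED]",
}
SLANG = {"thx": "thanks", "gr8": "great", "u": "you"}

def normalize_expression(text):
    # Replace emoticons, longest first so ">:(" is not read as ":(".
    for emo in sorted(EMOTICON_TAGS, key=len, reverse=True):
        text = text.replace(emo, f" {EMOTICON_TAGS[emo]} ")
    # Repeated exclamation marks become a single emphasis mark.
    text = re.sub(r"!{2,}", " [EMPHASIS] ", text)
    # Other repeated punctuation is compressed to one mark.
    text = re.sub(r"([?.,;])\1+", r"\1", text)
    # Elongated letters ("soooo") are restored; heuristic, see lead-in.
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1", text)
    # Slang and abbreviations are replaced word by word.
    text = re.sub(r"\b\w+\b", lambda m: SLANG.get(m.group(0).lower(), m.group(0)), text)
    return re.sub(r"\s+", " ", text).strip()
```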
[0027] After obtaining the purified comment text, language recognition is performed on each comment. Language recognition takes the purified comment text as input and uses character distribution features, word fragment statistical features, and a pre-trained language recognition model to jointly determine the comment's language. Character distribution features capture differences in the distribution of letters, characters, or writing systems across languages; word fragment statistical features capture common affixes, high-frequency words, and habitual fragments of a language; and the pre-trained language recognition model outputs the language category and corresponding probability for the entire text. When both a comment title and a comment body exist, they are concatenated and used as input; when follow-up comments exist, the language of the follow-up comment text is identified separately and then compared with the language of the first comment. When two or more languages appear in the same comment text, the text is first segmented into multiple fragments according to punctuation, line breaks, conjunctions, and semantic pauses. Each fragment is then identified separately to obtain fragment-level language labels. Finally, the fragment-level language labels are combined according to fragment length proportions and semantic integrity to form the language label of the whole comment, and the language confidence is calculated. Language confidence is determined by both the model's output probability and fragment consistency: it increases when the fragments are highly consistent in language and decreases when the fragments are mixed and lack a clear dominant language. After this processing, each comment is assigned a language tag and a language confidence.
[0028] After obtaining the language tags for the comments, language-specific text normalization is performed on the comment text. For languages with clearly defined word boundaries, word segmentation tools are used to divide the comment text into a word sequence. For languages that do not use spaces as natural boundaries, appropriate word segmentation models or sub-word segmentation models are used. For languages with inflectional morphology, lemmatization tools are used to reduce different inflected forms of the same word to a base form. For languages with derivational morphology, stemming tools are used to normalize derived forms into stem forms. Subsequently, based on the stop expression table corresponding to each language, function words, conjunctions, and high-frequency generic words that carry no effective evaluative semantics are deleted, while words expressing negation, comparison, degree, contrast, and time state are retained. After this processing, each comment forms a normalized word sequence for its language.
[0029] After forming a standardized term sequence, a unified mapping is performed on the domain terms in the comments. A domain terminology table and a cross-language terminology alignment table for cross-border e-commerce comments are pre-established. The terms in the domain terminology table are derived from product titles, product attribute pages, platform category terms, common evaluation terms, and manually compiled domain terminology sets. This terminology table includes at least material terms, size terms, color terms, workmanship terms, packaging terms, logistics terms, usage scenario terms, odor terms, durability terms, comfort terms, compatibility terms, and defect terms. The cross-language terminology alignment table establishes the correspondence between different language expressions and unified terms. During mapping, the terminology rules for the corresponding language are first selected based on the comment's language tag, and then abbreviations, colloquial expressions, spelling variations, and synonyms in the comments are replaced with unified terms. In cases where a word has multiple meanings, the contextual collocation of the word in the comment sentence is used for judgment, and mapping is performed only when the context satisfies the target evaluation semantics. After this processing, each comment forms a standardized comment text, and the evaluation expressions in the standardized comment text are represented using unified terms.
[0030] After obtaining the normalized comment text, duplicate comment identification is performed on the comment records. Duplicate comment identification is limited to the set of comments with the same product identifier. For comment records under the same product, a text vector and a normalized term set are first generated for each comment. The text vector is generated by a pre-trained text representation model or by a weighted aggregation of word vectors, and the normalized term set is formed by deduplicating the normalized terms in the comment. Then, the comment similarity between each candidate comment pair is calculated according to the following formula:
$$\mathrm{Sim}(c_a, c_b) = \alpha \cos(\mathbf{v}_a, \mathbf{v}_b) + \beta\, J(T_a, T_b) + \gamma\, \tau(c_a, c_b)$$
where $c_a$ and $c_b$ represent the two comment records to be compared; $\mathbf{v}_a$ and $\mathbf{v}_b$ represent the text vectors of the two comment records; $\cos(\mathbf{v}_a, \mathbf{v}_b)$ represents the cosine similarity between the text vectors; $T_a$ and $T_b$ represent the normalized term sets of the two comment records; $J(T_a, T_b)$ represents the Jaccard similarity between the normalized term sets; $\tau(c_a, c_b)$ represents the time proximity correction term; and $\alpha$, $\beta$, and $\gamma$ represent the weight coefficients of cosine similarity, term set similarity, and time proximity correction, respectively. If two comments have the same comment ID, they are directly identified as duplicate comments; if no comment ID exists, the similarity result determines whether the pair is a duplicate or near-duplicate. For duplicate comments, the comment record with higher information completeness is retained; for near-duplicate comments, if the two comments have the same main body but different ratings, product variant attributes, or interactive feedback, a single main record is retained, and the differing fields are merged into the main record.
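A minimal sketch of the pairwise similarity, assuming the three components (cosine similarity of text vectors, Jaccard similarity of term sets, time proximity) are combined as a weighted sum. The exponential decay used for time proximity, the 7-day scale, and the default weights are illustrative assumptions; the text does not specify them.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(terms_a, terms_b):
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def time_proximity(days_apart, scale_days=7.0):
    # Assumed form: 1.0 for the same day, decaying as posting times diverge.
    return math.exp(-abs(days_apart) / scale_days)

def comment_similarity(vec_a, vec_b, terms_a, terms_b, days_apart,
                       alpha=0.6, beta=0.3, gamma=0.1):
    # Weighted sum of the three components; weights are placeholders.
    return (alpha * cosine(vec_a, vec_b)
            + beta * jaccard(terms_a, terms_b)
            + gamma * time_proximity(days_apart))
```

In use, pairs scoring above one threshold would be treated as duplicates and pairs above a lower threshold as near-duplicates; both thresholds would need tuning per category.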
[0031] After identifying duplicate comments, anomalous comment identification is performed on the comment records. Anomalous comment identification includes templated risk detection, behavioral anomaly risk detection, and semantic mismatch risk detection. Templated risk detection identifies mechanically copied text by comparing comment sentence structure, the repetition of fixed phrases, and the frequency of large-scale duplicated segments. Behavioral anomaly risk detection identifies unnatural comment behavior by analyzing the distribution of comment posting times, the clustering of similar comments, abnormal consistency in interactive feedback, and traces of comment source behavior. Semantic mismatch risk detection identifies comments whose content does not match the target product or contradicts the rating, by checking whether the evaluated object in the comment text matches the product attributes and whether the semantic direction of the comment is consistent with the rating tag. For each comment, a comment credibility coefficient is calculated from the results of the three types of risk detection according to the following formula:
$$C_i = 1 - \left(\lambda_1 R^{\mathrm{tpl}}_i + \lambda_2 R^{\mathrm{beh}}_i + \lambda_3 R^{\mathrm{sem}}_i\right)$$
where $C_i$ represents the credibility coefficient of the i-th comment; $R^{\mathrm{tpl}}_i$ represents the templated risk indicator, calculated from the similarity between the comment text and the template set, the frequency of repeated fixed phrases, and the proportion of repeated sentence patterns; $R^{\mathrm{beh}}_i$ represents the behavioral anomaly risk indicator, calculated from the degree of abnormal concentration of comment posting times, the degree of abnormal consistency of interactive feedback, and the degree of abnormality of the source behavior; $R^{\mathrm{sem}}_i$ represents the semantic mismatch risk indicator, calculated from the degree of mismatch between the evaluated object and the product attributes and the degree of conflict between the comment semantics and the rating direction; and $\lambda_1$, $\lambda_2$, and $\lambda_3$ represent the weighting coefficients of the three risk indicators. For each comment, the comment credibility coefficient is written into the comment record.
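As a rough illustration of how the three risk indicators could yield a credibility coefficient, the sketch below assumes each indicator lies in [0, 1], the weights sum to 1, and credibility is one minus the weighted risk. These are assumptions made for the example; the weights shown are placeholders.

```python
def credibility(template_risk, behavior_risk, mismatch_risk,
                weights=(0.4, 0.3, 0.3)):
    """Credibility in [0, 1]; higher combined risk gives lower credibility."""
    w_tpl, w_beh, w_sem = weights
    risk = w_tpl * template_risk + w_beh * behavior_risk + w_sem * mismatch_risk
    # Clamp to guard against indicators or weights outside the assumed range.
    return min(1.0, max(0.0, 1.0 - risk))
```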
[0032] After obtaining the credibility coefficient of the comments, a comment quality assessment is performed on the comment records. For each comment, four indices are calculated: text integrity, feature word richness, scene description, and clarity of expression. The text integrity index is determined by whether sentences are complete, whether sentences are coherent, and whether there are any serious truncations. The feature word richness index is determined by the quantity and distribution of product attribute words, experience words, problem words, and comparison words appearing in the comment. The scene description index is determined by whether the comment describes the user, the usage environment, the usage process, or the usage result. The clarity of expression index is determined by whether the comment points to a specific attribute, whether there is a clear judgment of good or bad, and whether the focus of the evaluation can be identified. Based on these four indices, a comment quality score is calculated using the following formula:
$$Q_i = \mu_1 I_i + \mu_2 F_i + \mu_3 G_i + \mu_4 E_i$$
where $Q_i$ represents the quality score of the i-th comment; $I_i$ represents the text integrity index; $F_i$ represents the feature word richness index; $G_i$ represents the scene description index; $E_i$ represents the clarity of expression index; and $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ represent the weighting coefficients of the four indices. For each comment, the comment quality score is written into the comment record.
[0033] After calculating review credibility and assessing review quality, all review records undergo standardized time formatting and rating mapping. Standardized time formatting converts time expressions from different source pages into a unified time format, including converting month-day, days ago, site local time, or other regional time formats into a unified date-time representation; when the time field can only be parsed to a date, it is saved as a date-level time; when only the month can be identified, it is saved as a month-level time with an added time precision marker. Standardized rating mapping converts star ratings, numerical ratings, recommendation tags, and positive / negative review tags from different source pages into unified rating mapping values. When the source page uses star ratings, they are mapped to unified rating values according to star levels; when using percentage, ten-point, or tag-based ratings, they are converted to a unified rating scale based on the rating range; when only "recommended" or "not recommended" or "satisfied" tags are present, they are mapped to binary rating tags. After processing, the rating information in all review records is converted into a unified format.
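The rating mapping could be sketched as below. The unified scale is assumed to be [0, 1], and the scheme names (`star5`, `ten_point`, `percent`, `tag`) and tag vocabularies are hypothetical labels introduced for the example.

```python
from typing import Optional

# Assumed source rating ranges: (minimum, maximum).
RANGES = {"star5": (1, 5), "ten_point": (0, 10), "percent": (0, 100)}
POSITIVE_TAGS = {"recommended", "satisfied", "positive"}
NEGATIVE_TAGS = {"not recommended", "not satisfied", "negative"}

def map_rating(kind: str, value) -> Optional[float]:
    """Map a source rating to a unified value in [0, 1], or None if unknown."""
    if kind in RANGES:
        low, high = RANGES[kind]
        return (float(value) - low) / (high - low)
    if kind == "tag":
        tag = str(value).strip().lower()
        if tag in POSITIVE_TAGS:
            return 1.0  # binary rating tag: positive
        if tag in NEGATIVE_TAGS:
            return 0.0  # binary rating tag: negative
    return None
```

Returning `None` for unrecognized input mirrors the earlier rule that a missing rating tag is stored as null rather than guessed.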
[0034] Finally, the processed comment records are standardized. The standardization process generates standardized comment units according to a unified field structure. Each standardized comment unit includes at least the following: product identifier, category identifier, brand identifier, store identifier, standardized comment text, original comment text, comment title, comment language tag, language confidence score, comment time, time precision marker, rating mapping value, commenter location identifier, product variant attribute, interactive feedback marker, source identifier, comment credibility coefficient, and comment quality score. Then, the standardized comment units are aggregated according to the product identifier to form a standardized comment corpus set for each candidate product.
[0035] In step two, the standardized comment corpus formed in step one is processed one by one to identify fine-grained evaluation elements, determine the evaluation object attribution, determine the evaluation polarity, quantify the evaluation intensity, identify implicit needs, perform cross-language semantic alignment, and aggregate product-level evaluation elements, so as to obtain the set of fine-grained evaluation elements corresponding to each candidate product.
[0036] First, each standardized comment unit in the standardized comment corpus is segmented into sentences. Sentence segmentation uses both standardized comment text and original comment text as processing objects. Standardized comment text is used for element identification under unified terminology, while original comment text is used to preserve modification relationships and semantic transition relationships in the original context. During sentence segmentation, the comment text is segmented according to periods, question marks, exclamation marks, semicolons, line breaks, item separators, conjunctions, and transition words. For cases where the same sentence contains multiple parallel evaluation contents, it is further segmented into clauses according to parallel conjunctions, comma-separated structures, and attribute parallel expressions. After sentence segmentation, each standardized comment unit corresponds to one or more comment sentences. Each comment sentence retains its product identifier, comment language tag, comment time, rating mapping value, comment credibility coefficient, and comment quality score.
[0037] After obtaining the comment segments, evaluation element candidate identification is performed on each comment segment. Evaluation element candidates refer to words, phrases, or short sentences in the comment segment that can represent a specific evaluation object or content of the product. Evaluation objects include product attributes, product components, product performance, product appearance, product packaging, product logistics, product compatibility, product usage scenarios, and product defects. Product attributes refer to descriptive objects such as material, size, color, weight, and odor; product components refer to the specific parts that make up the product; product performance refers to usability aspects such as durability, water resistance, comfort, and stability; product appearance refers to visual and tactile aspects such as style, design, texture, and workmanship; product packaging refers to packaging integrity, aesthetics, and protective measures; product logistics refers to delivery timeliness, transportation timeliness, and delivery integrity; product compatibility refers to the matching relationship between the product and the target audience, scenario, equipment, size, space, or purpose; product usage scenarios refer to usage environments such as office, outdoor, home, and travel; and product defects refer to abnormal conditions such as damage, odor, color fading, loose threads, and jamming.
[0038] During the evaluation element candidate identification process, the domain terminology list established in Step 1 is first used to perform term matching on the standardized terms in the comment segments, resulting in the first evaluation element candidate set. Then, a sequence labeling model is used to identify the boundaries of continuous term sequences in the comment segments, resulting in the second evaluation element candidate set. The sequence labeling model uses comment segments as input and term-level labels as output, with the output including the start position, end position, and category of the evaluation object. Subsequently, the first and second evaluation element candidate sets are merged to form the final evaluation element candidate set. For synonymous evaluation element candidates appearing in the same comment segment, they are mapped to a unified evaluation element name based on the cross-language terminology alignment table from Step 1. The evaluation element name uses a unified terminology expression, without retaining abbreviations or undefined alternative expressions.
[0039] After obtaining the candidate set of evaluation elements, an evaluation object attribution determination is performed for each candidate evaluation element. This determination determines whether a candidate evaluation element belongs to the product itself, product variant attributes, product packaging, product logistics, product adaptation relationships, or product usage scenarios. During the evaluation object attribution determination, the product identifier, category identifier, brand identifier, product variant attributes, and original product attribute text corresponding to the comment segment are first read. The original product attribute text comes from the attribute description, specification description, title text, or parameter text on the product details page. Then, similarity matching and contextual dependency analysis are performed between the candidate evaluation element and the original product attribute text. If a candidate evaluation element matches an attribute name or value in the original product attribute text, it is assigned to the corresponding product attribute. If it matches a set of packaging terms, it is assigned to product packaging. If it matches a set of logistics terms, it is assigned to product logistics. If it matches a set of adaptation relation terms, it is assigned to product adaptation relation. If it matches a set of scenario terms, it is assigned to product usage scenario. In other cases, the assignment of the candidate evaluation element is determined by combining the modification relationship between the evaluation term and the head noun in the syntactic dependency relation. After the assignment determination, each candidate evaluation element corresponds to a unique evaluation object category.
[0040] After determining the category of the evaluation object, the evaluation polarity is determined for each candidate evaluation element. Evaluation polarity refers to the positive, negative, or neutral attitude expressed in the comment segment regarding a candidate evaluation element. The evaluation polarity determination uses the evaluation words, degree words, negation words, transition words, comparison words, and time state words associated with the candidate evaluation element in the comment segment as input information. Evaluation words directly express attitudes, such as satisfied or dissatisfied, good or bad, strong or weak, and fast or slow; degree words indicate strength, such as very, relatively, slightly, and extremely; negation words indicate negation, such as no, not, and none; transition words indicate semantic reversal, such as but and however; comparison words indicate comparative relationships, such as more, compared with, and most; and time state words indicate time stages, such as just started, after use, and after long use. When determining the polarity of an evaluation, the system first identifies the set of evaluation words in the comment segment that have a dependency or adjacency relationship with the candidate evaluation element. Then, it corrects the polarity direction of the evaluation words based on negation words, transition words, and comparison words. Finally, a fine-grained sentiment classification model outputs the evaluation polarity category corresponding to the candidate evaluation element. The evaluation polarity categories include at least positive, negative, and neutral.
[0041] After obtaining the evaluation polarity category, evaluation intensity quantification is performed on each evaluation element candidate. Evaluation intensity represents the strength of the attitude expressed by the comment segment towards the evaluation element candidate. Evaluation intensity is calculated jointly from the base polarity intensity, the degree correction, the negation correction, the transition correction, and the rating mapping correction, which can be written as:
$$S_{ij} = f\left(P_{ij},\ \Delta^{\mathrm{deg}}_{ij},\ \Delta^{\mathrm{neg}}_{ij},\ \Delta^{\mathrm{tr}}_{ij},\ \Delta^{\mathrm{rate}}_{i}\right)$$
where $S_{ij}$ represents the evaluation intensity of the j-th evaluation element candidate in the i-th standardized comment unit; $P_{ij}$ represents the base polarity intensity output by the fine-grained sentiment classification model; $\Delta^{\mathrm{deg}}_{ij}$ represents the degree correction amount, determined by the degree word associated with the evaluation element candidate; $\Delta^{\mathrm{neg}}_{ij}$ represents the negation correction amount, determined by the position and scope of the negation word; $\Delta^{\mathrm{tr}}_{ij}$ represents the transition correction amount, which adjusts for a change in evaluation direction between the clauses before and after a transition word; and $\Delta^{\mathrm{rate}}_{i}$ represents the rating mapping correction amount, determined by the rating mapping value of the standardized comment unit. A positive evaluation intensity indicates a positive attitude, and a negative evaluation intensity indicates a negative attitude. The larger the absolute value of the evaluation intensity, the stronger the attitude.
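The text names the five inputs to the evaluation intensity. The sketch below assumes, purely for illustration, that the corrections are signed amounts added to the base polarity intensity and that the result is clipped to [-1, 1]; a multiplicative treatment of degree or negation would be an equally plausible reading.

```python
def evaluation_intensity(base, degree=0.0, negation=0.0,
                         transition=0.0, rating=0.0):
    """Signed intensity: positive means a positive attitude, negative the opposite.

    Assumption: all corrections are additive and the output range is [-1, 1].
    """
    raw = base + degree + negation + transition + rating
    return max(-1.0, min(1.0, raw))
```

For a segment such as "very comfortable" in a five-star review, the call might be `evaluation_intensity(0.6, degree=0.2, rating=0.1)`, with all three numbers being hypothetical model outputs.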
[0042] After quantifying the evaluation intensity, evaluation evidence fragments are extracted for each candidate evaluation element. An evaluation evidence fragment is the smallest semantic segment in a comment sentence that directly supports the evaluation object and the evaluation polarity determination result. During evaluation evidence fragment extraction, the process expands along the syntactic dependency tree from the candidate evaluation element to its modifiers, governing words, and complements, forming continuous text segments containing evaluation object words, evaluation words, degree words, negation words, and transition words. When multiple evaluation objects exist in the same comment sentence, corresponding evaluation evidence fragments are extracted for each. Each evaluation evidence fragment is mapped to its corresponding candidate evaluation element, evaluation object category, evaluation polarity category, and evaluation intensity. After this processing, each candidate evaluation element has traceable textual evidence.
[0043] After obtaining the evaluation evidence fragments, implicit demand identification is performed for each evaluation element candidate. Implicit demand refers to product needs that are not explicitly expressed in the comment segment but are indirectly reflected through complaints, comparisons, usage obstacles, functional deficiencies, or improvement tendencies. Implicit demand identification first identifies demand trigger patterns in the comment segment. Demand trigger patterns include missing expressions, obstacle expressions, improvement expressions, unfavorable comparison expressions, and expectation expressions. Missing expressions are statements such as "not available," "lacking," "not included," or "not equipped"; obstacle expressions are statements such as "cannot be installed," "not easy to use," "inconvenient," or "prone to breaking"; improvement expressions are statements such as "should be added," "best to change," or "hope for improvement"; unfavorable comparison expressions are statements such as "worse than before," "not as good as other models," or "not as good as the old model"; and expectation expressions are statements such as "if it could be included" or "it would be better if it were added." After the demand trigger patterns are identified, the implicit demand content is extracted by combining the evaluation object category and the evaluation evidence fragment corresponding to the evaluation element candidate, and the implicit demand content is mapped to unified demand terminology. The unified demand terminology adopts the format "demand object + demand direction." The demand object corresponds to the specific object within the evaluation object category, while the demand direction corresponds to a standardized directional term such as "add," "optimize," "strengthen," "reduce," "simplify," "improve," "stabilize," or "adapt." After this processing, the implicit demands related to product improvement in each comment segment are represented in structured form.
[0044] After identifying implicit needs, cross-language semantic alignment is performed on each candidate evaluation element. The objects of cross-language semantic alignment include the evaluation element name, evaluation object category, evaluation polarity category, core evaluation words in the evaluation evidence fragment, and implicit need content. During cross-language semantic alignment, the corresponding cross-language word vector mapping table or cross-language semantic representation model is first invoked based on the comment's language tag to map the candidate evaluation elements from different languages to a shared semantic space; then, the semantic similarity between the candidate evaluation element and each entry in the standard evaluation element vocabulary is calculated. The standard evaluation element vocabulary consists of evaluation element names expressed in unified terminology, with each evaluation element name corresponding to a specific evaluation object category. The alignment similarity is calculated using the following formula:
$$A_{ij,m} = \omega_1 \cos(\mathbf{e}_{ij}, \mathbf{u}_m) + \omega_2 M_{ij,m} + \omega_3 X_{ij,m}$$
where $A_{ij,m}$ represents the alignment similarity between the j-th candidate evaluation element in the i-th standardized comment unit and the m-th standard evaluation element; $\mathbf{e}_{ij}$ represents the semantic vector of the j-th candidate evaluation element in the i-th standardized comment unit within the shared semantic space; $\mathbf{u}_m$ represents the semantic vector of the m-th standard evaluation element in the shared semantic space; $\cos(\mathbf{e}_{ij}, \mathbf{u}_m)$ represents the cosine similarity between the two; $M_{ij,m}$ represents the term mapping match item, determined by the direct matching result of the cross-language terminology alignment table from step one, taking a larger value when there is a direct match and a smaller value when there is none; $X_{ij,m}$ represents the contextual consistency item, determined by the degree of consistency between the comment segment containing the candidate evaluation element and the typical context of the standard evaluation element; and $\omega_1$, $\omega_2$, and $\omega_3$ represent the weight coefficients of cosine similarity, term mapping matching, and contextual consistency, respectively. For each candidate evaluation element, the standard evaluation element with the highest alignment similarity is selected as the alignment result, and the original candidate evaluation element is replaced with the corresponding standard evaluation element name.
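The alignment step can be sketched as scoring each standard evaluation element and keeping the best one. In this illustration the score is assumed to be a weighted sum of cosine similarity, a 1/0 term-table match, and a context consistency value supplied by the caller; the function name, data layout, and weights are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def align_element(cand_vec, cand_term, standards, term_table, context,
                  weights=(0.6, 0.3, 0.1)):
    """Return (standard_name, score) for the best-aligned standard element.

    standards:  {standard_name: vector in the shared semantic space}
    term_table: {surface term in any language: standard_name}
    context:    {standard_name: consistency in [0, 1]} (assumed precomputed)
    """
    w_cos, w_map, w_ctx = weights

    def score(name):
        # Larger value on a direct term-table match, smaller otherwise.
        match = 1.0 if term_table.get(cand_term) == name else 0.0
        return (w_cos * cosine(cand_vec, standards[name])
                + w_map * match
                + w_ctx * context.get(name, 0.0))

    best = max(standards, key=score)
    return best, score(best)
```

For example, a Spanish candidate "cremallera" with a vector close to the standard element "zipper" and a direct table match would be replaced by "zipper".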
[0045] After completing cross-language semantic alignment, ambiguity resolution is performed on the aligned standard evaluation elements. Ambiguity resolution addresses situations where the same term has different meanings in different categories, products, or contexts. During ambiguity resolution, the category identifier, product title text, product attribute text, and evaluation evidence fragments corresponding to the current standardized review unit are read, and the standard evaluation elements are matched with the category-specific terminology. When a standard evaluation element has multiple candidate meanings under the current category, the attribute words in the product attribute text and adjacent words in the evaluation evidence fragments are used as the criteria to select the meaning consistent with the current product context as the final standard evaluation element name. After ambiguity resolution, each evaluation element retains only one definite meaning.
[0046] After obtaining the standard evaluation element names, a structured assembly of evaluation elements is performed for each standardized comment unit. This structured assembly integrates the identified standard evaluation element names, evaluation object categories, evaluation polarity categories, evaluation intensity, evaluation evidence fragments, implicit demand content, comment language tags, comment time, rating mapping values, comment credibility coefficients, and comment quality scores within the same standardized comment unit into fine-grained evaluation records. Each fine-grained evaluation record corresponds to one standard evaluation element and only one evaluation object category, one evaluation polarity category, and one evaluation intensity. If the same standard evaluation element is mentioned repeatedly in the same comment segment, multiple evaluation evidence fragments are merged; when there are both positive and negative expressions, they are split into multiple fine-grained evaluation records according to the scope of transition words and the scope of time state words.
[0047] After assembling the fine-grained evaluation records, aggregation is performed on the fine-grained evaluation records of the same candidate product. Aggregation uses the product identifier as the grouping condition and the standard evaluation element name as the aggregation primary key. For the multiple fine-grained evaluation records corresponding to the same product and the same standard evaluation element name, the frequency of positive evaluations, the frequency of negative evaluations, the frequency of neutral evaluations, the average evaluation intensity, the weighted evaluation intensity, the frequency of implicit demand occurrence, and the set of evaluation evidence fragments are calculated. The weighted evaluation intensity is calculated using the following formula:
$$W_{km} = \frac{\sum_{r=1}^{N_{km}} S_r\, C_r\, Q_r}{\sum_{r=1}^{N_{km}} C_r\, Q_r}$$
where $W_{km}$ represents the weighted evaluation intensity of the k-th candidate product on the m-th standard evaluation element; $N_{km}$ represents the number of fine-grained evaluation records of the k-th candidate product on the m-th standard evaluation element; $S_r$ represents the evaluation intensity of the r-th fine-grained evaluation record; $C_r$ represents the credibility coefficient of the comment corresponding to the r-th fine-grained evaluation record; and $Q_r$ represents the comment quality score corresponding to the r-th fine-grained evaluation record. The weighted evaluation intensity characterizes the overall evaluation direction and degree of a candidate product on a given standard evaluation element.
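A minimal sketch of the weighted evaluation intensity for one product and one standard evaluation element. It assumes each record's intensity is weighted by the product of its credibility coefficient and quality score, and that the sum is normalized by the total weight; the normalization choice is an assumption of this example.

```python
def weighted_intensity(records):
    """records: iterable of (intensity, credibility, quality) tuples."""
    rows = list(records)
    numerator = sum(s * c * q for s, c, q in rows)
    denominator = sum(c * q for _, c, q in rows)
    # No usable weight (e.g. all credibility zero) yields a neutral 0.0.
    return numerator / denominator if denominator else 0.0
```

With this weighting, a strongly negative comment that has low credibility and low quality moves the aggregate less than a trusted, detailed positive comment does.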
[0048] After calculating the weighted evaluation intensity, the demand attention score is calculated for each standard evaluation element under the same candidate product. The demand attention score indicates the degree to which that standard evaluation element is the focus of discussion in the comments. The demand attention score is calculated using the following formula:
$$D_{km} = \eta_1 f_{km} + \eta_2\, p^{\mathrm{neg}}_{km} + \eta_3 h_{km}$$
where $D_{km}$ represents the demand attention of the k-th candidate product on the m-th standard evaluation element; $f_{km}$ represents the frequency of occurrence of the m-th standard evaluation element in the k-th candidate product, obtained as the ratio of the number of corresponding fine-grained evaluation records to the total number of fine-grained evaluation records of the candidate product; $p^{\mathrm{neg}}_{km}$ represents the proportion of negative evaluations for the m-th standard evaluation element in the k-th candidate product, obtained as the ratio of the frequency of negative evaluations to the total frequency of evaluations for that standard evaluation element; $h_{km}$ represents the frequency of occurrence of the implicit demand corresponding to the m-th standard evaluation element in the k-th candidate product; and $\eta_1$, $\eta_2$, and $\eta_3$ represent the weighting coefficients of the frequency of occurrence, the proportion of negative evaluations, and the frequency of implicit demand, respectively. After this processing, each candidate product has a demand attention score for each standard evaluation element.
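The demand attention score can be illustrated with the sketch below, assuming a weighted sum of the three components. The text defines the first two as ratios; the implicit demand frequency is normalized here by the element's record count so that all three lie in [0, 1], which is an assumption of this example, as are the weights.

```python
def demand_attention(n_element, n_total, n_negative, n_implicit,
                     weights=(0.4, 0.4, 0.2)):
    """Score for one standard evaluation element of one candidate product.

    n_element:  fine-grained records mentioning this element
    n_total:    all fine-grained records of the product
    n_negative: negative records for this element
    n_implicit: records of this element carrying an implicit demand
    """
    w_freq, w_neg, w_imp = weights
    frequency = n_element / n_total if n_total else 0.0
    negative_share = n_negative / n_element if n_element else 0.0
    implicit_rate = n_implicit / n_element if n_element else 0.0  # assumed normalization
    return w_freq * frequency + w_neg * negative_share + w_imp * implicit_rate
```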
[0049] Finally, based on the product identification, all standard evaluation element names, evaluation object categories, positive evaluation frequency, negative evaluation frequency, neutral evaluation frequency, weighted evaluation intensity, demand attention, implicit demand content set, and evaluation evidence fragment set are summarized to form a fine-grained evaluation element set corresponding to each candidate product.
[0050] In step three, indicator extraction, quantification, normalization, fusion, and decision value calculation are performed on the fine-grained evaluation element set formed in step two, thereby obtaining the product selection decision value corresponding to each candidate product.
[0051] First, the set of fine-grained evaluation elements for each candidate product is retrieved according to the product identifier. Each set of fine-grained evaluation elements includes the standard evaluation element name, evaluation object category, positive evaluation frequency, negative evaluation frequency, neutral evaluation frequency, weighted evaluation intensity, demand attention level, implicit demand content set, and evaluation evidence fragment set. Based on the set of fine-grained evaluation elements, an evaluation element statistical table is constructed for each candidate product. The evaluation element statistical table is a structured data table organized by the standard evaluation element name, with each row corresponding to one standard evaluation element and each column corresponding to one statistical field. The statistical fields include at least the total evaluation frequency, positive evaluation frequency, negative evaluation frequency, neutral evaluation frequency, weighted evaluation intensity, demand attention level, implicit demand occurrence frequency, comment time distribution, and product variant attribute distribution. The total evaluation frequency is obtained by adding the frequencies of positive, negative, and neutral evaluations corresponding to the standard evaluation element; the frequency of implicit demand occurrence is obtained by the number of records corresponding to the standard evaluation element in the implicit demand content set; the comment time distribution is obtained by statistically analyzing the comment time of the fine-grained evaluation records corresponding to the standard evaluation element; and the product variant attribute distribution is obtained by statistically analyzing the product variant attributes in the fine-grained evaluation records corresponding to the standard evaluation element. After this processing, each candidate product corresponds to an evaluation element statistics table.
[0052] After the evaluation element statistics table is obtained, a market demand indicator is constructed for each candidate product. The market demand indicator represents the degree of demand concentration and activity reflected in the comments on the candidate product. It is calculated jointly from the evaluation element coverage, the demand attention aggregation value, the implicit demand density, and the time growth trend. Evaluation element coverage represents the breadth of effective standard evaluation elements involved in the candidate product; it is the ratio of the number of standard evaluation elements whose total evaluation frequency reaches the preset effective condition to the total number of standard evaluation elements of the candidate product. The demand attention aggregation value is obtained by summing the demand attention of all standard evaluation elements of the candidate product and dividing by the number of standard evaluation elements. Implicit demand density is the ratio of the total implicit demand occurrence frequency to the total evaluation frequency. The time growth trend is the slope of the change in demand attention across the time intervals of the comment time distribution. Based on the above data, the market demand indicator for the k-th candidate product is calculated using the following formula: $M_k = \alpha_1 C_k + \alpha_2 A_k + \alpha_3 H_k + \alpha_4 G_k$; where $M_k$ represents the market demand indicator for the k-th candidate product; $C_k$ represents the evaluation element coverage of the k-th candidate product; $A_k$ represents the demand attention aggregation value of the k-th candidate product; $H_k$ represents the implicit demand density of the k-th candidate product; $G_k$ represents the time growth trend of the k-th candidate product; and $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ represent the weighting coefficients for evaluation element coverage, demand attention aggregation value, implicit demand density, and time growth trend, respectively.
Higher evaluation element coverage indicates that the candidate product is discussed by users across more standard evaluation elements; a higher demand attention aggregation value indicates a higher concentration of discussion across multiple standard evaluation elements; a higher implicit demand density indicates that comments contain more expressions about improvement directions and unmet needs; and a stronger time growth trend indicates that related demands are strengthening over time.
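As a rough illustration of the market demand calculation, the following Python sketch computes a time growth trend as a least-squares slope over equally spaced time intervals and combines the four components linearly. The function names, the equal default coefficients, and the choice of an ordinary least-squares slope are assumptions; the source only states that the trend is a slope of demand attention over time intervals.

```python
def trend_slope(series):
    """Least-squares slope of demand attention over equally spaced intervals."""
    n = len(series)
    if n < 2:
        return 0.0  # a single interval carries no trend information
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def market_demand(coverage, attention, density, trend,
                  alpha=(0.25, 0.25, 0.25, 0.25)):
    """Linear combination of the four components (coefficients are assumed)."""
    return (alpha[0] * coverage + alpha[1] * attention
            + alpha[2] * density + alpha[3] * trend)
```

For example, `trend_slope([0.1, 0.2, 0.3])` yields a positive slope, which raises the indicator for a product whose demand attention grows over time.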
[0053] After constructing market demand indicators, a product satisfaction indicator is constructed for each candidate product. The product satisfaction indicator represents the overall evaluation performance of the candidate product across all standard evaluation elements. The product satisfaction indicator is calculated jointly by the proportion of positive evaluations, the proportion of negative evaluations, the weighted evaluation intensity, and the consistency of ratings. The proportion of positive evaluations is obtained by the ratio of the sum of the frequencies of positive evaluations across all standard evaluation elements of the candidate product to the total frequency of evaluations; the proportion of negative evaluations is obtained by the ratio of the sum of the frequencies of negative evaluations across all standard evaluation elements of the candidate product to the total frequency of evaluations; the weighted evaluation intensity is obtained by weighting the weighted evaluation intensities corresponding to all standard evaluation elements of the candidate product according to the total frequency of evaluations; the consistency of ratings is obtained by statistically analyzing the consistency between the evaluation polarity category corresponding to the fine-grained evaluation records and the rating mapping values. The consistency of ratings increases when positive evaluation polarity mainly corresponds to higher rating mapping values and negative evaluation polarity mainly corresponds to lower rating mapping values. 
Based on the above data, the product satisfaction indicator for the k-th candidate product is calculated using the following formula: $S_k = \beta_1 P_k^{+} - \beta_2 P_k^{-} + \beta_3 W_k + \beta_4 Q_k$; where $S_k$ represents the product satisfaction indicator for the k-th candidate product; $P_k^{+}$ represents the proportion of positive evaluations for the k-th candidate product; $P_k^{-}$ represents the proportion of negative evaluations for the k-th candidate product; $W_k$ represents the comprehensive weighted evaluation intensity of the k-th candidate product, obtained by weighting the weighted evaluation intensity of each standard evaluation element according to its total evaluation frequency; $Q_k$ represents the rating consistency of the k-th candidate product; and $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ represent the weighting coefficients for the proportion of positive evaluations, the proportion of negative evaluations, the comprehensive weighted evaluation intensity, and the rating consistency, respectively. The negative sign before the proportion of negative evaluations indicates that a higher proportion of negative evaluations lowers the product satisfaction indicator.
[0054] After constructing product satisfaction indicators, product risk indicators are constructed for each candidate product. Product risk indicators represent the degree of defect concentration, negative diffusion, and evaluation instability revealed in reviews of candidate products. Product risk indicators are calculated jointly by defect concentration, negative evaluation intensity, time-dependent increase risk, and variant difference risk. Defect concentration is obtained by the ratio of the sum of negative evaluation frequencies corresponding to negative standard evaluation elements belonging to the product defect performance, product logistics, and product packaging categories to the total evaluation frequency. Negative evaluation intensity is obtained by weighting the absolute values of evaluation intensity of all negative fine-grained evaluation records according to the review credibility coefficient and review quality score. Time-dependent increase risk is obtained by the increase in negative evaluation frequency between adjacent time intervals; when the frequency of negative evaluations in a certain time interval increases significantly compared to the previous time interval, the time-dependent increase risk increases. Variant difference risk is obtained by the dispersion of weighted evaluation intensity of the same standard evaluation elements under different product variant attributes; when different colors, sizes, specifications, or models show significant differences in evaluation results, the variant difference risk increases. 
Based on the above data, the product risk indicator for the k-th candidate product is calculated using the following formula: $R_k = \gamma_1 F_k + \gamma_2 N_k + \gamma_3 B_k + \gamma_4 D_k$; where $R_k$ represents the product risk indicator of the k-th candidate product; $F_k$ represents the defect concentration of the k-th candidate product; $N_k$ represents the negative evaluation intensity of the k-th candidate product; $B_k$ represents the time-dependent increase risk of the k-th candidate product; $D_k$ represents the variant difference risk of the k-th candidate product; and $\gamma_1$, $\gamma_2$, $\gamma_3$, and $\gamma_4$ represent the weighting coefficients for defect concentration, negative evaluation intensity, time-dependent increase risk, and variant difference risk, respectively. The adjacent time intervals used for the time-dependent increase risk are obtained by converting the comment times to a uniform format and dividing them in chronological order; the dispersion used for the variant difference risk is calculated as the variance or standard deviation of the weighted evaluation intensity across different product variant attributes.
[0055] After constructing product risk indicators, a competitive opportunity indicator is constructed for each candidate product. The competitive opportunity indicator represents the potential entry points and differentiation directions of the candidate product as demonstrated in existing evaluations. The competitive opportunity indicator is calculated jointly by the proportion of high-attention, low-satisfaction evaluation elements, the degree of implicit demand aggregation, and the expansion of the evaluation object. The proportion of high-attention, low-satisfaction evaluation elements refers to the ratio of the number of standard evaluation elements whose demand attention is higher than the average level of all standard evaluation elements for the candidate product, but whose weighted evaluation intensity is lower than the average level of all standard evaluation elements for the candidate product, to the total number of standard evaluation elements. The degree of implicit demand aggregation is obtained by the number and concentration of recurring unified demand terms in the implicit demand content set; this indicator increases when unified demand terms are concentrated in a few standard evaluation elements. The expansion of the evaluation object is obtained by the coverage of evaluation object categories such as product attributes, product components, product performance, product appearance, product packaging, product logistics, product compatibility, and product usage scenarios; the more categories covered, the higher the expansion of the evaluation object. 
Based on the above data, the competitive opportunity indicator for the k-th candidate product is calculated using the following formula: $O_k = \delta_1 L_k + \delta_2 J_k + \delta_3 E_k$; where $O_k$ represents the competitive opportunity indicator for the k-th candidate product; $L_k$ represents the proportion of high-attention, low-satisfaction evaluation elements for the k-th candidate product; $J_k$ represents the degree of implicit demand aggregation of the k-th candidate product; $E_k$ represents the expansion of the evaluation object for the k-th candidate product; and $\delta_1$, $\delta_2$, and $\delta_3$ represent the weighting coefficients for the proportion of high-attention, low-satisfaction evaluation elements, the degree of implicit demand aggregation, and the expansion of the evaluation object, respectively. In the first term, high attention is determined by comparing the demand attention of an element with the average demand attention within the candidate product, and low satisfaction is determined by comparing its weighted evaluation intensity with the average weighted evaluation intensity within the candidate product.
[0056] After the market demand indicator, product satisfaction indicator, product risk indicator, and competitive opportunity indicator are obtained, they are normalized. Normalization converts indicators with different dimensions and value ranges to a unified scale. Range normalization is used for positive indicators, and reverse range normalization is used for negative indicators. Positive indicators, namely the market demand, product satisfaction, and competitive opportunity indicators, are those for which a larger value indicates better product performance. Negative indicators, namely the product risk indicator, are those for which a larger value indicates worse product performance. For any positive indicator $x_k$ of the k-th candidate product, its normalized result is calculated according to the following formula: $\tilde{x}_k = \frac{x_k - x_{\min}}{x_{\max} - x_{\min}}$; where $\tilde{x}_k$ represents the normalized result of the positive indicator for the k-th candidate product; $x_k$ represents the original indicator value of the k-th candidate product; $x_{\min}$ represents the minimum value of this indicator among all candidate products; and $x_{\max}$ represents the maximum value of this indicator among all candidate products. For any negative indicator $y_k$ of the k-th candidate product, its normalized result is calculated according to the following formula: $\tilde{y}_k = \frac{y_{\max} - y_k}{y_{\max} - y_{\min}}$; where $\tilde{y}_k$ represents the normalized result of the negative indicator for the k-th candidate product; $y_k$ represents the original indicator value of the k-th candidate product; $y_{\min}$ represents the minimum value of this indicator among all candidate products; and $y_{\max}$ represents the maximum value of this indicator among all candidate products. After normalization, the values of every candidate product on every indicator are on the same comparison scale.
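The range normalization and reverse range normalization described above correspond to standard min-max scaling. A minimal Python sketch follows; the function name and the fallback value of 0.5 when all candidates share the same indicator value are assumptions, since the source does not say how that degenerate case is handled.

```python
def normalize(values, inverse=False):
    """Min-max scale one indicator across all candidate products.

    inverse=True applies reverse range normalization, so that a larger
    raw value (for example, higher risk) maps to a smaller result.
    """
    low, high = min(values), max(values)
    if high == low:
        # Assumption: with no spread, every candidate gets the midpoint.
        return [0.5] * len(values)
    span = high - low
    if inverse:
        return [(high - v) / span for v in values]
    return [(v - low) / span for v in values]
```

Applied to raw risk values `[2, 4, 6]` with `inverse=True`, the candidate with the lowest risk receives the highest normalized result, which keeps all four indicators pointing in the same direction.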
[0057] After normalizing all indicators, indicator weights are determined. Indicator weights represent the relative importance of different indicators in product selection decisions. Indicator weights are determined using a combination of comment-driven dispersion and domain rule constraints. Comment-driven dispersion is obtained by comparing the differences among candidate products on the same indicator; the greater the difference, the more significant the indicator's role in distinguishing candidate products. Domain rule constraints limit the range of indicator weights through pre-defined product selection rules. These rules consist of general judgment requirements in cross-border e-commerce product selection scenarios, including four categories: demand priority, satisfaction balance, risk mitigation, and opportunity identification. First, cross-product dispersion coefficients are calculated for market demand, product satisfaction, product risk, and competitive opportunity indicators. Then, initial weights are formed based on the proportion of dispersion coefficients. Finally, the initial weights are adjusted according to domain rule constraints to obtain the final indicator weights. The final indicator weights are denoted as market demand weight, product satisfaction weight, product risk weight, and competitive opportunity weight, and their sum is one.
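One possible reading of this weighting procedure is sketched below in Python. The coefficient of variation as the dispersion coefficient, the expression of domain rule constraints as per-indicator lower and upper bounds, and the final renormalization are all assumptions; the source does not fix any of these details.

```python
from statistics import mean, pstdev


def indicator_weights(columns, bounds):
    """Derive indicator weights from cross-product dispersion.

    columns: indicator name -> normalized values across candidate products.
    bounds:  indicator name -> (low, high), a hypothetical encoding of the
             domain rule constraints.
    """
    dispersion = {}
    for name, values in columns.items():
        m = mean(values)
        # Assumption: dispersion coefficient = coefficient of variation.
        dispersion[name] = pstdev(values) / m if m > 0 else 0.0

    total = sum(dispersion.values())
    count = len(dispersion)
    initial = {k: (v / total if total else 1.0 / count)
               for k, v in dispersion.items()}

    # Clip to the rule-based range, then rescale so the weights sum to one.
    clipped = {k: min(max(w, bounds[k][0]), bounds[k][1])
               for k, w in initial.items()}
    scale = sum(clipped.values())
    return {k: w / scale for k, w in clipped.items()}
```

Note that the final rescaling can move a weight slightly outside its bound; a stricter variant would iterate clipping and rescaling until both conditions hold.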
[0058] After the final indicator weights are obtained, a product selection decision value is calculated for each candidate product. The product selection decision value represents the overall selection priority of the candidate product after demand, satisfaction, risk, and opportunity are considered together. The product selection decision value for the k-th candidate product is calculated using the following formula: $V_k = w_1 \tilde{M}_k + w_2 \tilde{S}_k + w_3 \tilde{R}_k + w_4 \tilde{O}_k$; where $V_k$ represents the product selection decision value for the k-th candidate product; $\tilde{M}_k$ represents the normalized result of the market demand indicator for the k-th candidate product; $\tilde{S}_k$ represents the normalized result of the product satisfaction indicator for the k-th candidate product; $\tilde{R}_k$ represents the normalized result of the product risk indicator for the k-th candidate product; $\tilde{O}_k$ represents the normalized result of the competitive opportunity indicator for the k-th candidate product; and $w_1$, $w_2$, $w_3$, and $w_4$ represent the market demand weight, product satisfaction weight, product risk weight, and competitive opportunity weight, respectively. Because the product risk indicator has already undergone reverse range normalization, a larger normalized result indicates lower risk, so in the decision value formula it points in the same direction as the positive indicators.
[0059] After the product selection decision value is obtained, the product selection stability is calculated for each candidate product. Product selection stability represents the consistency of the product selection decision value across different time intervals and different product variant attributes. To calculate it, the standardized comment corpus for each candidate product is first divided into multiple time subsets according to comment time, and into multiple variant subsets according to product variant attributes. The market demand indicator, product satisfaction indicator, product risk indicator, competitive opportunity indicator, and product selection decision value are then recalculated on each time subset and each variant subset to obtain multiple local product selection decision values. The product selection stability is calculated from the dispersion of these local values. The product selection stability of the k-th candidate product is calculated using the following formula: $T_k = \max\left(0,\ 1 - \frac{\sigma_k}{\mu_k}\right)$; where $T_k$ represents the product selection stability of the k-th candidate product; $\sigma_k$ represents the standard deviation of the local product selection decision values of the k-th candidate product across all time subsets and variant subsets; and $\mu_k$ represents the mean of the local product selection decision values of the k-th candidate product across all time subsets and variant subsets. A smaller standard deviation indicates more stable decision results across time intervals and product variant attributes; the mean is used to scale the standard deviation. If $\sigma_k$ is greater than $\mu_k$, the product selection stability is truncated at its lower limit of zero.
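The stability calculation can be sketched in Python as one minus the ratio of the standard deviation to the mean of the local decision values, truncated at zero. This form is an assumption inferred from the description of scaling and truncation; the use of the population standard deviation and the handling of a non-positive mean are also assumptions.

```python
from statistics import mean, pstdev


def selection_stability(local_values):
    """Stability of local decision values across time and variant subsets."""
    mu = mean(local_values)
    if mu <= 0:
        return 0.0  # assumption: no meaningful scaling is possible
    # Truncate at zero when the standard deviation exceeds the mean.
    return max(0.0, 1.0 - pstdev(local_values) / mu)
```

Identical local values give a stability of one, while widely scattered values such as `[0.0, 0.0, 3.0]` are truncated to zero.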
[0060] After the product selection stability is obtained, a product selection correction value is calculated for each candidate product. The correction value is used to adjust the product selection decision value according to the product selection stability and the risk concentration status. Risk concentration status refers to whether negative evaluations are concentrated in a few high-attention standard evaluation elements; it is obtained as the proportion of negative evaluation frequencies that fall on standard evaluation elements with high demand attention. The product selection correction value for the k-th candidate product is calculated by a formula whose terms are: the product selection correction value of the k-th candidate product; the product selection stability of the k-th candidate product; the risk concentration indicator of the k-th candidate product; and the weighting coefficients for the product selection stability and the risk concentration indicator, respectively. A higher risk concentration indicator means that negative evaluations are more concentrated on high-attention evaluation elements, which weakens the product selection result.
[0061] After the product selection correction value is calculated, the final product selection value is calculated for each candidate product. The final product selection value is calculated by a formula whose terms are: the final product selection value of the k-th candidate product; the product selection decision value of the k-th candidate product; and the product selection correction value of the k-th candidate product. The larger the final product selection value, the higher the priority of the candidate product after market demand, product satisfaction, product risk, competitive opportunity, time stability, variant stability, and risk concentration are considered together.
[0062] Finally, according to the product identifier, the market demand index, product satisfaction index, product risk index, competitive opportunity index, product selection decision value, product selection stability, product selection correction value and final product selection value of each candidate product are saved to form a result set of candidate product selection indexes.
[0063] In step four, the candidate product selection index result set formed in step three undergoes ranking, calculation, screening, correction, and differentiation, and the final product selection result is generated, yielding the final ranking of the candidate products and the corresponding set of product selection criteria.
[0064] First, the candidate product selection indicator result set for each candidate product is retrieved according to its product identifier. This result set includes market demand indicators, product satisfaction indicators, product risk indicators, competitive opportunity indicators, selection decision value, selection stability, selection correction value, and final selection value. Using the final selection value as the primary ranking criterion, an initial ranking is performed on all candidate products, resulting in an initial ranking sequence. In this initial ranking sequence, each candidate product has a unique ranking position, with the candidate product having a larger final selection value ranking higher. When two or more candidate products have the same final selection value or a difference less than a preset distinction threshold, the final selection value is not used for forced distinction; instead, a further judgment process for candidates with the same value is initiated.
[0065] After obtaining the initial sorting sequence, a re-judgment process is performed on candidate products with the same final selection value or whose difference is less than a preset differentiation threshold. The re-judgment process compares market demand indicators, product satisfaction indicators, product risk indicators, competitive opportunity indicators, and product selection stability in that order. The comparison order follows the sequence of demand priority, satisfaction balance, risk mitigation, opportunity identification, and stability constraints. If two candidate products with the same value have different market demand indicators, the candidate product with the larger market demand indicator is ranked higher. If the market demand indicators are the same, the product satisfaction indicators are compared, and the candidate product with the larger product satisfaction indicator is ranked higher. If the product satisfaction indicators are the same, the normalized results of the product risk indicators are compared, and the candidate product with the larger normalized results of the product risk indicators is ranked higher. If the normalized results of the product risk indicators are the same, the competitive opportunity indicators are compared, and the candidate product with the larger competitive opportunity indicators is ranked higher. If the competitive opportunity indicators are the same, the product selection stability is compared, and the candidate product with the larger product selection stability is ranked higher. If all the above indicators are the same, the number of standard evaluation elements covered in the set of fine-grained evaluation elements is compared, and the candidate product with the larger number of standard evaluation element coverages is ranked higher. After this re-judgment process, all candidate products form a ranking result with no ties.
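The initial ranking and the re-judgment of near-equal candidates can be sketched in Python as a sort followed by a lexicographic tie-break inside each group of close values. The dictionary keys, the default threshold, and the rule that a group is formed by chaining adjacent candidates whose difference is below the threshold are assumptions made for illustration.

```python
def rank_candidates(products, threshold=0.01):
    """Rank by final value, re-judging candidates that are too close to call.

    Each product is a dict with hypothetical keys: id, final, demand,
    satisfaction, risk_norm, opportunity, stability, coverage.
    """
    def tie_key(p):
        # Comparison order: demand, satisfaction, normalized risk,
        # opportunity, stability, then element coverage count.
        return (p["demand"], p["satisfaction"], p["risk_norm"],
                p["opportunity"], p["stability"], p["coverage"])

    ordered = sorted(products, key=lambda p: p["final"], reverse=True)
    ranked, group = [], []
    for p in ordered:
        # Close the current group once the gap reaches the threshold.
        if group and group[-1]["final"] - p["final"] >= threshold:
            ranked.extend(sorted(group, key=tie_key, reverse=True))
            group = []
        group.append(p)
    ranked.extend(sorted(group, key=tie_key, reverse=True))
    return [p["id"] for p in ranked]
```

Tuple comparison in Python proceeds field by field, which matches the sequential comparison order described above.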
[0066] After sorting, a selection screening process is performed on each candidate product. This screening includes absolute and relative screening. Absolute screening compares each candidate product's market demand index, product satisfaction index, normalized product risk index, competitive opportunity index, and product stability with the corresponding screening criteria. If a candidate product's market demand index is lower than the effective demand condition, or its product satisfaction index is lower than the effective satisfaction condition, or its normalized product risk index is lower than the effective risk condition, or its competitive opportunity index is lower than the effective opportunity condition, or its product stability is lower than the effective stability condition, the candidate product is marked as not meeting the selection criteria. Relative screening selects candidates from those meeting the absolute screening criteria, ranking them from highest to lowest final selection value within the top-ranked interval. The top-ranked interval can be represented by a fixed ranking position interval or by a top proportion interval corresponding to the total number of candidate products. After this process, all candidate products are divided into selected and unselected candidate products.
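A minimal Python sketch of the two-stage screening follows. The dictionary keys, the representation of the screening criteria as minimum thresholds, and the use of a fixed top-N interval (rather than a top proportion) are assumptions.

```python
def screen(candidates, thresholds, top_n):
    """Split candidates into selected and unselected identifiers.

    thresholds: hypothetical mapping from indicator key to its minimum
    acceptable value (the "effective condition" for that indicator).
    """
    # Absolute screening: every indicator must meet its condition.
    passed = [p for p in candidates
              if all(p[key] >= limit for key, limit in thresholds.items())]
    # Relative screening: keep the top-ranked interval by final value.
    passed.sort(key=lambda p: p["final"], reverse=True)
    selected = [p["id"] for p in passed[:top_n]]
    unselected = [p["id"] for p in candidates if p["id"] not in selected]
    return selected, unselected
```

A candidate with a high final selection value is still rejected if any single indicator, such as stability, falls below its condition.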
[0067] It should be noted that, in cross-border e-commerce product selection based on multilingual review mining, after candidate products have undergone standardized review corpus construction, fine-grained evaluation element extraction, and product selection decision value calculation, an abnormal ranking state may emerge that appears stable on the surface but shifts internally. In this state, the final selection value of a candidate product remains high, fluctuates little, or increases continuously over successive calculations, yet the evaluation basis behind that value is not stable: it keeps shifting between different time intervals, product variant attributes, source regions, and standard evaluation elements. Specifically, the standard evaluation elements that drove up the ranking of a candidate product in an earlier stage gradually weaken or drop out of the main contributing position in a later stage, while standard evaluation elements, implicit demand content, or negative evaluation information that were originally marginal take over the main influencing position; highly positive evaluation results under one product variant attribute still pull up the overall ranking, while negative evaluations under another product variant attribute begin to gather around the core evaluation object; and the same candidate product still shows positive enhancement in one source region, while in another source region the evaluation focus has shifted from non-core evaluation objects to core evaluation objects. Because these changes may cancel or mask one another in the overall numerical value, the final selection value of the candidate product remains high, but the evaluation structure behind that value is no longer stable; the product has entered a high-value vulnerable state driven by multiple evaluation factors.
[0068] In this high-value vulnerable state, the ranking of candidate products is no longer solely influenced by the evaluation level, but is simultaneously affected by the migration of evaluation contribution sources, the shift in the center of evaluation influence, changes in the direction of implicit demand aggregation, and the reorganization of the multi-source evaluation pathways. It is difficult to identify whether this high-value state has transitioned from a stable high value to a vulnerable high value, and it is also difficult to identify whether a changing relationship of alternating dominance, partial coverage, and directional restructuring has formed among the multiple evaluation factors within a candidate product. This makes it easy to keep candidate products that have entered the high-value vulnerable state in the top ranking range.
[0069] Therefore, in this embodiment, after the selection and screening process is completed, high-value vulnerability identification and dynamic correction processing is performed on each shortlisted candidate product. High-value vulnerability identification and dynamic correction processing addresses situations where the final selection value of a candidate product remains high, fluctuates little, or continues to rise, but the evaluation basis constituting that final selection value continuously shifts between different time intervals, different product variant attributes, different source regions, and different standard evaluation elements. For each shortlisted candidate product, the corresponding standardized review unit is first divided into multiple time interval subsets according to the review time. Then, each time interval subset is further divided into multiple product variant attribute subsets according to the product variant attributes. Finally, each product variant attribute subset is further divided into multiple source region subsets according to the regional site information in the source identifier, thus forming a three-dimensional evaluation distribution unit set corresponding to the shortlisted candidate product, consisting of time interval, product variant attributes, and source region. Each time interval, product variant attribute, and source region three-dimensional evaluation distribution unit is associated with the corresponding standard evaluation element name, evaluation object category, evaluation polarity category, evaluation intensity, demand attention, implicit demand content, review credibility coefficient, and review quality score.
[0070] As shown in Figure 1, after the three-dimensional evaluation distribution unit set covering time interval, product variant attribute, and source region has been formed, an evaluation contribution trajectory tensor is constructed for each shortlisted candidate product. The tensor represents how the evaluation contribution of the same candidate product changes across time intervals, product variant attributes, source regions, and standard evaluation elements. For the k-th shortlisted candidate product, the local evaluation contribution value under the u-th time interval, the v-th product variant attribute, the s-th source region, and the m-th standard evaluation element is calculated according to the following formula: [formula omitted in the source text]. The formula combines four quantities, each taken for the m-th standard evaluation element in the corresponding three-dimensional evaluation distribution unit. The local demand attention is calculated from the occurrence frequency, the negative evaluation ratio, and the implicit demand occurrence frequency of the element in that unit. The local weighted evaluation intensity is calculated by weighting the evaluation intensity with the review credibility coefficient and the review quality score in that unit. The local review credibility aggregation value is obtained by aggregating the review credibility coefficients of the corresponding review records within the unit. The local review quality aggregation value is obtained by aggregating the review quality scores of the corresponding review records within the unit. Calculating the local evaluation contribution value for all combinations of u, v, s, and m forms the evaluation contribution trajectory tensor of the k-th shortlisted candidate product.
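To make the tensor construction concrete, the sketch below stores one local evaluation contribution value per index tuple (u, v, s, m). The formula itself is not available in this text, so the multiplicative combination of the four local quantities is purely an assumption, as are the function names and dictionary keys.

```python
def local_contribution(attention, intensity, credibility, quality):
    """Combine the four local quantities into one contribution value.

    ASSUMPTION: a plain product. The source formula is not reproduced,
    so the real combination may differ.
    """
    return attention * intensity * credibility * quality


def build_trajectory_tensor(unit_stats):
    """Map each (u, v, s, m) index to its local evaluation contribution value.

    unit_stats: dict keyed by (u, v, s, m) whose values hold the four
    local quantities under hypothetical key names.
    """
    return {
        key: local_contribution(
            q["attention"], q["intensity"], q["credibility"], q["quality"]
        )
        for key, q in unit_stats.items()
    }


# Illustrative numbers only.
unit_stats = {
    (0, 0, 0, 0): {"attention": 0.5, "intensity": 0.8, "credibility": 0.5, "quality": 1.0},
    (1, 0, 0, 0): {"attention": 0.5, "intensity": 0.4, "credibility": 0.5, "quality": 1.0},
}
tensor = build_trajectory_tensor(unit_stats)
```

A sparse dictionary is used because many (time, variant, region, element) combinations will have no reviews; a dense array would work equally well when the index ranges are small.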
[0071] After the evaluation contribution trajectory tensor is obtained, the evaluation contribution migration amount is calculated for each shortlisted candidate product. This amount indicates whether the standard evaluation elements that keep the candidate product at a high value have changed their dominant position between adjacent time intervals. First, within each time interval, the local evaluation contribution values of all standard evaluation elements of the shortlisted candidate product are normalized to obtain the standard evaluation element contribution distribution. Then, the degree of change in this distribution between adjacent time intervals is compared. The evaluation contribution migration amount of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the evaluation contribution migration amount of the k-th shortlisted candidate product, the number of time intervals corresponding to that product, the number of standard evaluation elements corresponding to that product, and the contribution share of the m-th standard evaluation element within the u-th time interval. The contribution share is obtained by dividing the sum of the local evaluation contribution values of the m-th standard evaluation element within the u-th time interval by the sum of the local evaluation contribution values of all standard evaluation elements within that interval. The larger the evaluation contribution migration amount, the more the high-value state of the shortlisted candidate product depends on different standard evaluation elements taking turns to support it at different times.
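The normalization and adjacent-interval comparison can be illustrated as follows. The contribution share follows the definition given in the paragraph. The distance between adjacent distributions is an assumption (mean absolute change summed over elements), since the source formula is not available. The sketch also assumes non-negative contribution values and zero-based indices.

```python
def contribution_shares(tensor, n_intervals, n_elements):
    """Share of each standard evaluation element within each time interval.

    tensor: dict keyed by (u, v, s, m); values are summed over the variant
    and region indices before normalizing within each interval.
    """
    totals = [[0.0] * n_elements for _ in range(n_intervals)]
    for (u, _v, _s, m), value in tensor.items():
        totals[u][m] += value
    shares = []
    for row in totals:
        denom = sum(row)
        shares.append([x / denom if denom else 0.0 for x in row])
    return shares


def contribution_migration(shares):
    """ASSUMPTION: average, over adjacent interval pairs, of the summed
    absolute change in element shares."""
    if len(shares) < 2:
        return 0.0
    steps = [
        sum(abs(cur - prev) for prev, cur in zip(a, b))
        for a, b in zip(shares, shares[1:])
    ]
    return sum(steps) / len(steps)


# Element 0 dominates interval 0; element 1 dominates interval 1.
tensor = {
    (0, 0, 0, 0): 3.0, (0, 0, 0, 1): 1.0,
    (1, 0, 0, 0): 1.0, (1, 0, 0, 1): 3.0,
}
shares = contribution_shares(tensor, n_intervals=2, n_elements=2)
migration = contribution_migration(shares)
```

In this example the total contribution is identical in both intervals, yet the migration amount is large because the element supporting it has changed, which is the situation the paragraph is designed to detect.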
[0072] After the evaluation contribution migration amount is obtained, the evaluation center of gravity offset is calculated for each shortlisted candidate product. This offset indicates whether the center of evaluation influence of the candidate product has shifted from a non-core evaluation object to a core evaluation object, or vice versa. First, all evaluation object categories are assigned fixed position codes. These codes characterize the positions of product attributes, components, performance, appearance, packaging, logistics, compatibility, and usage scenarios within a unified evaluation object coordinate system. Then, the evaluation center of gravity coordinates of each time interval are calculated from the local evaluation contribution value of each evaluation object category. Finally, the changes in the evaluation center of gravity coordinates between adjacent time intervals are compared. The evaluation center of gravity offset of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the evaluation center of gravity offset of the k-th shortlisted candidate product and the evaluation center of gravity coordinates of that product in the u-th time interval. These coordinates are obtained as the average of the position codes of the evaluation object categories, weighted by their local evaluation contribution values within the u-th time interval. The larger the evaluation center of gravity offset, the more the main influencing object category of the candidate product keeps shifting.
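A small numeric sketch of the center of gravity calculation follows. The weighted average matches the description in the paragraph. The scalar position codes, the category names, and the use of the mean absolute change between adjacent intervals as the offset are assumptions for illustration.

```python
def centroid(codes, contributions):
    """Contribution-weighted average of the category position codes."""
    total = sum(contributions.values())
    if total == 0:
        return 0.0
    return sum(codes[c] * w for c, w in contributions.items()) / total


def centroid_offset(codes, per_interval):
    """ASSUMPTION: mean absolute centroid change between adjacent intervals."""
    cs = [centroid(codes, contrib) for contrib in per_interval]
    if len(cs) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(cs, cs[1:])) / (len(cs) - 1)


# Hypothetical one-dimensional position codes for two object categories.
codes = {"performance": 1.0, "packaging": 5.0}
per_interval = [
    {"performance": 3.0, "packaging": 1.0},  # centroid 2.0
    {"performance": 1.0, "packaging": 3.0},  # centroid 4.0
]
offset = centroid_offset(codes, per_interval)
```

If the position codes are vectors rather than scalars, the same structure applies with a vector norm in place of `abs`.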
[0073] After the evaluation center of gravity offset is obtained, the implicit demand shift amount is calculated for each shortlisted candidate product. This amount indicates whether the implicit demand content of the same candidate product has continuously shifted from one demand direction to another. First, the implicit demand content is categorized according to unified demand terms, each of which is represented by a demand object together with a demand direction. Then, the changes in the distribution of unified demand terms between adjacent time intervals are compared. The implicit demand shift amount of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the implicit demand shift amount of the k-th shortlisted candidate product, the number of unified demand terms corresponding to that product, and the occurrence share of the d-th unified demand term within the u-th time interval. The occurrence share is obtained by dividing the occurrence frequency of the d-th unified demand term within the u-th time interval by the total occurrence frequency of all unified demand terms within that interval. The larger the implicit demand shift amount, the more significantly the demand focus of the shortlisted candidate product changes across time intervals.
[0074] After the implicit demand shift amount is obtained, the source region diffusion amount is calculated for each shortlisted candidate product. This amount indicates whether evaluation problems originally confined to certain source regions have begun to spread to more source regions. First, the negative evaluation frequency, the negative evaluation intensity, and the implicit demand occurrence frequency of each standard evaluation element in each source region are aggregated to obtain the problem distribution vector of that element in each source region. Then, the expansion of the problem distribution vector along the source region dimension is compared between adjacent time intervals. The source region diffusion amount of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the source region diffusion amount of the k-th shortlisted candidate product, the number of source regions in which the preset problem condition appears at least once within the u-th time interval, and the total number of source regions corresponding to that product. The preset problem condition is that some standard evaluation element in a source region simultaneously satisfies the following: the negative evaluation frequency is not zero, the negative evaluation intensity is less than zero, and the occurrence frequency of the corresponding implicit demand is not zero. The larger the source region diffusion amount, the further the problems of the candidate product have spread from a few regions to more regions.
[0075] After the source region diffusion amount is obtained, the product variant attribute penetration amount is calculated for each shortlisted candidate product. This amount indicates whether problems originally confined to some product variant attributes have begun to spread to more product variant attributes. First, the problem distribution of each standard evaluation element under each product variant attribute is counted. Then, the change in problem coverage along the product variant attribute dimension is compared between adjacent time intervals. The product variant attribute penetration amount of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the product variant attribute penetration amount of the k-th shortlisted candidate product, the number of product variant attributes under which the preset problem condition appears at least once within the u-th time interval, and the total number of product variant attributes corresponding to that product. The preset problem condition is that some standard evaluation element under a product variant attribute simultaneously satisfies the following: the negative evaluation frequency is not zero, the negative evaluation intensity is less than zero, and the occurrence frequency of the corresponding implicit demand is not zero. The larger the product variant attribute penetration amount, the further the problems of the candidate product have spread from a few product variant attributes to more of them.
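Source region diffusion and product variant attribute penetration share one structure: count the groups (regions or variant attributes) in which the preset problem condition holds, divide by the total number of groups, and compare adjacent time intervals. The sketch below implements the problem condition as stated. The aggregation across intervals (mean positive increase of the coverage ratio) and all field names are assumptions.

```python
def meets_problem_condition(stat):
    """Preset problem condition: all three tests must hold at once."""
    return (
        stat["neg_freq"] != 0
        and stat["neg_intensity"] < 0
        and stat["implicit_freq"] != 0
    )


def coverage_ratios(intervals, n_groups):
    """Per interval, the share of groups where at least one element has a problem.

    intervals[u] maps a group name (region or variant attribute) to a list
    of per-element statistics for that group.
    """
    ratios = []
    for groups in intervals:
        hit = sum(
            1 for stats in groups.values()
            if any(meets_problem_condition(s) for s in stats)
        )
        ratios.append(hit / n_groups)
    return ratios


def spread_amount(ratios):
    """ASSUMPTION: mean positive increase of coverage between adjacent intervals."""
    if len(ratios) < 2:
        return 0.0
    return sum(max(0.0, b - a) for a, b in zip(ratios, ratios[1:])) / (len(ratios) - 1)


bad = {"neg_freq": 2, "neg_intensity": -0.6, "implicit_freq": 1}
ok = {"neg_freq": 0, "neg_intensity": 0.0, "implicit_freq": 0}
intervals = [
    {"US": [bad], "DE": [ok], "JP": [ok], "FR": [ok]},
    {"US": [bad], "DE": [bad], "JP": [bad], "FR": [ok]},
]
ratios = coverage_ratios(intervals, n_groups=4)
diffusion = spread_amount(ratios)
```

Passing variant attribute names instead of region names as the group keys gives the corresponding penetration amount under the same assumptions.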
[0076] After the above quantities are obtained, a high-value retention amount is calculated for each shortlisted candidate product. This amount represents the degree to which a candidate product maintains a high selection value over consecutive time intervals. First, the local selection decision value within each time interval is calculated. The calculation method is the same as in step three, except that the calculation is limited to the review data within the corresponding time interval. Then, the local selection decision values of the time intervals are normalized to obtain a time-series high-value distribution. Finally, the high-value retention amount is calculated from this distribution. The high-value retention amount of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the high-value retention amount of the k-th shortlisted candidate product and the normalized local selection decision value of that product within the u-th time interval. The larger the high-value retention amount, the more consistently the candidate product maintains a high value across multiple time intervals.
[0077] After the evaluation contribution migration amount, the evaluation center of gravity offset, the implicit demand shift amount, the source region diffusion amount, the product variant attribute penetration amount, and the high-value retention amount are obtained, a high-value vulnerability state judgment matrix is constructed for each shortlisted candidate product. The matrix represents the interactions among these quantities. For the k-th shortlisted candidate product, the matrix is constructed as follows: [formula omitted in the source text]. Each off-diagonal element of the matrix represents the degree of coupling between two quantities. The larger a matrix element, the stronger the linkage between the two corresponding quantities.
[0078] After the high-value vulnerability state judgment matrix is constructed, the high-value vulnerability is calculated for each shortlisted candidate product. The high-value vulnerability indicates whether the current high-value state of the candidate product rests on continuous track switching and diffusion. The high-value vulnerability of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the high-value vulnerability of the k-th shortlisted candidate product, the element in the i-th row and j-th column of its high-value vulnerability state judgment matrix, and its high-value retention amount. The high-value retention amount is placed in the denominator to distinguish a stably maintained high value from a high value maintained through track switching. When several changes are strongly coupled and the high-value retention amount is insufficient to absorb them, the high-value vulnerability increases.
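The role of the retention amount in the denominator can be illustrated with a toy calculation. Neither the matrix construction nor the vulnerability formula is available in this text, so everything structural here is an assumption: the matrix is taken over the five change quantities only, coupling is modeled as the product of two quantities, and vulnerability is the off-diagonal sum divided by retention (with a small `eps` guarding against division by zero).

```python
def coupling_matrix(quantities):
    """ASSUMPTION: coupling between quantities i and j modeled as their product."""
    n = len(quantities)
    return [[quantities[i] * quantities[j] for j in range(n)] for i in range(n)]


def high_value_vulnerability(matrix, retention, eps=1e-9):
    """ASSUMPTION: sum of off-diagonal couplings divided by the retention amount."""
    n = len(matrix)
    off_diagonal = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diagonal / (retention + eps)


# Order: migration, centroid offset, demand shift, diffusion, penetration.
q = [0.5, 0.25, 0.0, 0.5, 0.0]
matrix = coupling_matrix(q)
fragile = high_value_vulnerability(matrix, retention=0.5)
stable = high_value_vulnerability(matrix, retention=1.0)
```

With identical change quantities, the product with the lower retention amount receives the higher vulnerability, which matches the qualitative behavior the paragraph describes.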
[0079] After the high-value vulnerability is obtained, a structural track-switching coefficient is calculated for each shortlisted candidate product. This coefficient represents the degree to which the dominant path within the internal evaluation structure of the candidate product has been reorganized. The structural track-switching coefficient of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the structural track-switching coefficient of the k-th shortlisted candidate product together with its evaluation contribution migration amount, evaluation center of gravity offset, implicit demand shift amount, source region diffusion amount, and product variant attribute penetration amount. A larger structural track-switching coefficient indicates that changes in the dominant path originate more from shifts within the evaluation contribution structure; a smaller coefficient indicates that the problem manifests more as external diffusion.
[0080] After the high-value vulnerability and the structural track-switching coefficient are obtained, a dynamic correction factor is calculated for each shortlisted candidate product. The dynamic correction factor is determined piecewise. When the high-value vulnerability of the k-th shortlisted candidate product does not meet the vulnerability condition, the dynamic correction factor is set to one. When the high-value vulnerability meets the vulnerability condition and the structural track-switching coefficient is higher than the diffusion condition, the dynamic correction factor is calculated according to the structural track-switching attenuation formula. When the high-value vulnerability meets the vulnerability condition and the structural track-switching coefficient is lower than the diffusion condition, the dynamic correction factor is calculated according to the diffusion attenuation formula. Specifically, the dynamic correction factor of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the dynamic correction factor, the high-value vulnerability, the structural track-switching coefficient, the source region diffusion amount, and the product variant attribute penetration amount of the k-th shortlisted candidate product, the threshold corresponding to the vulnerability condition, the threshold corresponding to the diffusion condition, and the structural track-switching attenuation intensity coefficient and the diffusion attenuation intensity coefficient. A smaller dynamic correction factor indicates that the current ranking of the shortlisted candidate product is more strongly affected by an abnormal high-value state.
[0081] After the dynamic correction factor is obtained, the dynamically corrected selection value is calculated for each shortlisted candidate product according to the following formula: [formula omitted in the source text]. The formula involves the dynamically corrected selection value of the k-th shortlisted candidate product, the final selection value obtained by that product in step three, and its dynamic correction factor. This process preserves the high values of candidate products whose evaluation basis is stable, while lowering the ranking values of candidate products whose high values are sustained by continuous track switching, regional diffusion, or product variant attribute penetration.
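The piecewise rule and the subsequent correction can be sketched as below. The three-branch structure follows the text. The exponential attenuation forms, the handling of the case where the coefficient equals the diffusion threshold, and the multiplication of the final selection value by the factor are assumptions, since the source formulas are not available. Parameter names are hypothetical.

```python
import math


def dynamic_correction_factor(vulnerability, track_switch, diffusion, penetration,
                              vuln_threshold, diffusion_threshold, alpha, beta):
    """Piecewise dynamic correction factor in (0, 1].

    alpha: structural track-switching attenuation intensity coefficient.
    beta:  diffusion attenuation intensity coefficient.
    """
    if vulnerability <= vuln_threshold:
        return 1.0  # vulnerability condition not met: no correction
    if track_switch > diffusion_threshold:
        # ASSUMPTION: exponential decay driven by internal track switching.
        return math.exp(-alpha * vulnerability * track_switch)
    # ASSUMPTION: exponential decay driven by external spread; the source
    # does not say which branch applies when the two values are equal.
    return math.exp(-beta * vulnerability * (diffusion + penetration))


def dynamically_corrected_value(final_value, factor):
    """ASSUMPTION: the correction is a simple multiplication."""
    return final_value * factor


stable = dynamic_correction_factor(0.5, 0.8, 0.1, 0.1, 1.0, 0.5, 0.5, 0.5)
switching = dynamic_correction_factor(2.0, 0.8, 0.1, 0.1, 1.0, 0.5, 0.5, 0.5)
spreading = dynamic_correction_factor(2.0, 0.2, 0.3, 0.2, 1.0, 0.5, 0.5, 0.5)
corrected = dynamically_corrected_value(10.0, switching)
```

Any monotonically decreasing function bounded by one would serve the same purpose; the exponential is chosen only because it keeps the factor strictly positive.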
[0082] As shown in Figure 2, after the dynamically corrected selection values are obtained, a recovery judgment value is calculated for each shortlisted candidate product. The recovery judgment value indicates whether the candidate product has shown a trend of stabilizing from its abnormal high-value state in the most recent time interval. It is calculated jointly from the decreases in the evaluation contribution migration amount, the evaluation center of gravity offset, the implicit demand shift amount, the source region diffusion amount, and the product variant attribute penetration amount between the two most recent time intervals. The recovery judgment value of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text].
[0083] The formula involves the recovery judgment value of the k-th shortlisted candidate product and, for each of the following quantities, its value in the penultimate time interval and its value in the last time interval: the evaluation contribution migration amount, the evaluation center of gravity offset, the implicit demand shift amount, the source region diffusion amount, and the product variant attribute penetration amount. A recovery judgment value greater than zero indicates that the degree of internal track switching and diffusion has weakened in the most recent time interval.
[0084] After the recovery judgment value is obtained, a rebound correction is applied to the dynamically corrected selection value. The rebound correction prevents a candidate product from remaining suppressed once it has stabilized in the most recent time interval. The rebound-corrected selection value of the k-th shortlisted candidate product is calculated according to the following formula: [formula omitted in the source text]. The formula involves the rebound-corrected selection value, the dynamically corrected selection value, the recovery judgment value, and the rebound adjustment coefficient of the k-th shortlisted candidate product. The rebound adjustment coefficient is jointly determined by the high-value retention amount and the product stability of that product: the higher the high-value retention amount and the product stability, the larger the rebound adjustment coefficient.
[0085] In a preferred embodiment, the rebound adjustment coefficient is determined by a weighted combination of high-value retention and product stability. For the k-th selected candidate product, a uniform scaling process is first applied to both the high-value retention and product stability. Then, the rebound adjustment coefficient is calculated according to a preset combination weight, where the sum of the combination weight corresponding to the high-value retention and the combination weight corresponding to the product stability is one. A larger high-value retention and higher product stability indicate that the selected candidate product has a more stable high-value distribution over previous time intervals and its local product selection decision value has a lower dispersion, thus resulting in a larger rebound adjustment coefficient. Conversely, a lower high-value retention or lower product stability results in a smaller rebound adjustment coefficient to prevent excessive rebound of the selected product value under short-term fluctuations after dynamic correction.
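The recovery judgment, the weighted rebound adjustment coefficient, and the rebound correction can be put together in a short sketch. The weighted combination with weights summing to one follows the preferred embodiment. The unweighted sum of decreases, the rule that only a positive recovery value lifts the score, and the multiplicative form of the rebound are assumptions.

```python
def recovery_value(penultimate, last):
    """ASSUMPTION: unweighted sum of decreases of the five change quantities
    from the penultimate to the last time interval."""
    return sum(p - l for p, l in zip(penultimate, last))


def rebound_coefficient(retention, stability, w_retention=0.5):
    """Weighted combination; both inputs are assumed already scaled to [0, 1],
    and the two weights sum to one."""
    return w_retention * retention + (1.0 - w_retention) * stability


def rebound_corrected(corrected_value, recovery, coefficient):
    """ASSUMPTION: only a positive recovery value lifts the corrected score."""
    return corrected_value * (1.0 + coefficient * max(0.0, recovery))


# Order: migration, centroid offset, demand shift, diffusion, penetration.
penultimate = [0.4, 0.3, 0.2, 0.2, 0.1]
last = [0.2, 0.2, 0.2, 0.1, 0.1]
recovery = recovery_value(penultimate, last)
coefficient = rebound_coefficient(retention=0.8, stability=0.6)
value = rebound_corrected(10.0, recovery, coefficient)
```

Clamping the recovery value at zero means a product that is still deteriorating is never penalized twice: it simply keeps its dynamically corrected value.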
[0086] In a preferred embodiment, the preset vulnerability threshold and the preset diffusion threshold are determined from the distribution of the corresponding indicators over all shortlisted candidate products within the current statistical period. For the preset vulnerability threshold, the high-value vulnerabilities of all shortlisted candidate products are sorted by magnitude, and the boundary value of the upper quantile interval is taken as the threshold. For the preset diffusion threshold, the structural track-switching coefficients of all shortlisted candidate products are sorted by magnitude, and the boundary value of the median interval or of a preset quantile interval is taken as the threshold. The structural track-switching attenuation intensity coefficient and the diffusion attenuation intensity coefficient are jointly determined from the historical ranking fluctuation amplitude, the change in selection value after dynamic correction, and the recovery in selection value after rebound correction, so that the dynamic correction factor attenuates abnormal high-value states without disturbing the ranking stability of shortlisted candidate products that are in a normal high-value maintenance state.
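Deriving the two thresholds from the current distribution could look like the following sketch, which uses the standard library `statistics` module. Which quantile counts as the "upper quantile interval" is not specified in the text; the upper quartile used here is an assumption, as are the function names and sample values.

```python
import statistics


def vulnerability_threshold(values, n=4):
    """Lower boundary of the top 1/n of the values.

    ASSUMPTION: n=4 (upper quartile); the source only says "upper quantile".
    """
    return statistics.quantiles(values, n=n)[-1]


def diffusion_threshold(values):
    """Median of the structural track-switching coefficients."""
    return statistics.median(values)


# Illustrative indicator values for seven shortlisted products.
vulnerabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
coefficients = [0.2, 0.9, 0.5]
v_thr = vulnerability_threshold(vulnerabilities)
d_thr = diffusion_threshold(coefficients)
flagged = [v for v in vulnerabilities if v > v_thr]
```

Because both thresholds are recomputed from the current statistical period, the share of products flagged as vulnerable stays roughly constant even when the overall level of the indicators drifts.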
[0087] Next, all shortlisted candidate products are re-ranked according to their rebound-corrected selection values, yielding a dynamically corrected ranking sequence. Each shortlisted candidate product in this sequence is mapped to its evaluation contribution migration amount, evaluation center of gravity offset, implicit demand shift amount, source region diffusion amount, product variant attribute penetration amount, high-value retention amount, high-value vulnerability, structural track-switching coefficient, dynamic correction factor, recovery judgment value, and rebound-corrected selection value. Differentiated attribution processing and generation of the final selection results are then performed on the basis of the dynamically corrected ranking sequence.
[0088] After the corrected ranking sequence is obtained, differentiated attribution processing is performed on each shortlisted candidate product. This processing determines the main driving factors and the main limiting factors behind a shortlisted candidate product reaching the top of the ranking. The main driving factors are the set of standard evaluation elements that contribute positively to the corrected selection value, while the main limiting factors are the set of standard evaluation elements that constrain it. During differentiated attribution processing, the fine-grained evaluation element set and the evaluation element statistics table of the shortlisted candidate product are first read, and an element contribution value is then calculated for each standard evaluation element according to the following formula: [formula omitted in the source text]. The formula involves the following quantities for the m-th standard evaluation element of the k-th shortlisted candidate product: the element contribution value, the demand attention, the weighted evaluation intensity, the implicit demand clustering value, and the negative pressure value, together with a weighting coefficient for each of the last four. The implicit demand clustering value is calculated from the occurrence frequency and the concentration of the same unified demand term on the standard evaluation element. The negative pressure value is calculated by combining the negative evaluation frequency, the negative evaluation intensity, and the risk concentration state. For the same shortlisted candidate product, standard evaluation elements with larger contribution values are placed in the main driving factor set, while those with smaller or negative contribution values are placed in the main limiting factor set.
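The attribution step can be illustrated with a small scoring function. The four inputs and their weighting coefficients come from the paragraph. The linear form, the minus sign on the negative pressure value, the `driver_cutoff` parameter used to split the two sets, and all names and numbers are assumptions.

```python
def element_contribution(e, weights):
    """ASSUMPTION: weighted sum in which negative pressure is subtracted."""
    return (
        weights["attention"] * e["attention"]
        + weights["intensity"] * e["intensity"]
        + weights["cluster"] * e["cluster"]
        - weights["pressure"] * e["pressure"]
    )


def attribute(elements, weights, driver_cutoff=0.0):
    """Split standard evaluation elements into driving and limiting sets.

    driver_cutoff is a hypothetical parameter: elements scoring at or above
    it are treated as drivers, the rest as limiters.
    """
    scores = {name: element_contribution(e, weights) for name, e in elements.items()}
    drivers = sorted((n for n in scores if scores[n] >= driver_cutoff),
                     key=scores.get, reverse=True)
    limiters = sorted((n for n in scores if scores[n] < driver_cutoff),
                      key=scores.get)
    return scores, drivers, limiters


weights = {"attention": 0.25, "intensity": 0.25, "cluster": 0.25, "pressure": 0.25}
elements = {
    "battery": {"attention": 0.8, "intensity": 0.6, "cluster": 0.4, "pressure": 0.2},
    "packaging": {"attention": 0.4, "intensity": -0.2, "cluster": 0.2, "pressure": 0.8},
}
scores, drivers, limiters = attribute(elements, weights)
```

Drivers are listed from the strongest positive contribution downward and limiters from the most negative upward, so the first entry of each list is the most influential factor on its side.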
[0089] After obtaining the sets of main driving factors and main limiting factors, an evaluation evidence fragment aggregation process is performed on each shortlisted candidate product. This aggregation process uses the name of the standard evaluation element as the aggregation condition, extracting evaluation evidence fragments corresponding to the sets of main driving factors and main limiting factors from the evaluation evidence fragment set corresponding to the evaluation element statistics table. During extraction, the correspondence between the evaluation evidence fragments and the comment language tag, comment time, rating mapping value, comment credibility coefficient, and comment quality score is preserved. For evaluation evidence fragments with duplicate content under the same standard evaluation element, they are merged based on text similarity; for multiple evaluation evidence fragments expressing the same implicit demand content, they are merged based on unified demand terminology. After this processing, each shortlisted candidate product corresponds to a set of main driving factors, a set of main limiting factors, and a set of evaluation evidence fragments corresponding to both.
[0090] After completing the differential attribution process, a reason for non-selection is determined for each unselected candidate product. The determination follows the order of absolute screening and relative screening. If a candidate product is excluded in the absolute screening stage, the specific indicators it fails to meet are recorded. If a candidate product meets the criteria in the absolute screening stage but fails to enter the top ranking range in the relative screening stage, its differences from adjacent selected candidate products in market demand indicators, product satisfaction indicators, product risk indicators, competitive opportunity indicators, selection stability, or result correction coefficients are recorded. Subsequently, relevant standard evaluation elements, implicit demand content, and evaluation evidence fragments are extracted from the corresponding differences to form a set of reasons for non-selection. A one-to-one correspondence is established between each unselected candidate product and its set of reasons for non-selection.
[0091] After compiling the reasons for selecting and rejecting candidate products, the final selection results are generated. This generation includes generating the final ranking of candidate products and a set of selection criteria for each product. The final ranking is output according to a revised order, with each product including its identifier, ranking position, revised selection value, and selection status. The set of selection criteria includes market demand indicators, product satisfaction indicators, product risk indicators, competitive opportunity indicators, selection stability, result correction coefficient, set of main driving factors, set of main limiting factors, set of implicit demand content, and set of evaluation evidence fragments. For selected candidate products, the selection criteria are recorded in the selection criteria set; for rejected candidate products, the set of reasons for rejection is recorded. The final ranking and selection criteria are mapped according to the product identifiers to form the final selection results.
[0092] This invention also proposes an intelligent decision-making system for cross-border e-commerce product selection based on multilingual review mining. The system includes a standardized review corpus construction module, a fine-grained evaluation element extraction module, a product selection decision value calculation module, and a high-value vulnerability identification and correction module, with signal connections between the modules. The standardized review corpus construction module is used to limit the review collection targets by product identifier and to associate variant identifiers with product identifiers. It combines review time, reviewer geographic identifier, product variant attributes, and source identifiers to uniformly map the original review records from different page sources, and establishes the correspondence between candidate products and their standardized review corpus sets. The fine-grained evaluation element extraction module is used to segment the standardized review units of the standardized review corpus into sentence segments and to identify evaluation element candidates by combining term matching and sequence labeling. It then parses the evaluation information corresponding to the evaluation element candidates using the rating mapping value, review credibility coefficient, and review quality score retained in each sentence segment and, after cross-language semantic alignment and ambiguity resolution, aggregates the results into a fine-grained evaluation element set for each candidate product. The product selection decision value calculation module is used to read the fine-grained evaluation element set by product identifier, to construct an evaluation element statistics table with the standard evaluation element name as the primary key, and to extract time change features and variant difference features from the review time distribution and the product variant attribute distribution in that table. Combining these with the total evaluation frequency, weighted evaluation intensity, demand attention, and implicit demand occurrence frequency, it calculates the product selection decision value. The module then divides time subsets by review time and variant subsets by product variant attribute, and recalculates the local product selection decision value on each time subset and each variant subset. The product selection decision value is then corrected according to the dispersion and risk concentration of the local product selection decision values to obtain the final selection value and determine the shortlisted candidate products. The high-value vulnerability identification and correction module is used to construct, for each shortlisted candidate product, a three-dimensional evaluation distribution unit set covering time interval, product variant attribute, and source region, and to establish an evaluation contribution trajectory tensor. From this tensor it extracts how the evaluation contribution evolves across time intervals, product variant attributes, source regions, and standard evaluation elements, distinguishes changes in the internal dominant path from changes in the external diffusion path, and identifies the high-value vulnerability state and the structural track-switching state accordingly. It then applies piecewise correction and rebound correction to the final selection value, re-ranks the candidates by the corrected selection value, and forms a differentiated attribution result corresponding to the corrected ranking by combining the element contribution values.
[0093] The above formulas all operate on dimensionless numerical values. Each formula was obtained through software simulation on a large amount of collected data so as to approximate the real situation as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.
[0094] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product.
[0095] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
[0096] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
[0097] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0098] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. An intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining, characterized in that it comprises: The review collection targets are limited by product identifiers, and variant identifiers are associated with the product identifiers. By combining review time, reviewer geographic identifier, product variant attributes, and source identifiers, the original review records from different page sources are uniformly mapped to establish the correspondence between candidate products and their standardized comment corpus sets. Based on the standardized comment corpus, the standardized comment units are segmented into sentences, and evaluation element candidates are identified by combining term matching and sequence labeling. The evaluation information corresponding to the evaluation element candidates is then parsed by combining the rating mapping value, comment credibility coefficient, and comment quality score retained in each comment sentence segment. After cross-language semantic alignment and ambiguity resolution, the information is aggregated to form a set of fine-grained evaluation elements corresponding to each candidate product. The set of fine-grained evaluation elements is read according to the product identifier, and an evaluation element statistics table is constructed with the standard evaluation element name as the primary key. Based on the comment time distribution and product variant attribute distribution in the evaluation element statistics table, time change features and variant difference features are extracted. Combined with the total evaluation frequency, weighted evaluation intensity, demand attention, and implicit demand occurrence frequency, the product selection decision value is calculated. Time subsets are then divided according to comment time, and variant subsets are divided according to product variant attributes. Local product selection decision values are repeatedly calculated on each time subset and each variant subset. The product selection decision value is corrected based on the dispersion and risk concentration of the multiple local product selection decision values to obtain the final product selection value and determine the selected candidate products. For the selected candidate products, a three-dimensional evaluation distribution unit set covering time interval, product variant attributes, and source region is constructed, and an evaluation contribution trajectory tensor is established. Based on the evaluation contribution trajectory tensor, the evolutionary relationship of evaluation contribution across time intervals, product variant attributes, source regions, and standard evaluation elements is extracted. Changes in internal dominant paths and changes in external diffusion paths are distinguished, and high-value vulnerability state identification and structural track-switching state identification are performed accordingly. The final product selection value is then subjected to segmented correction and rebound correction. The corrected product selection values are then re-ranked, and the element contribution values are combined to form a differentiated attribution result corresponding to the corrected ranking result.
2. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 1, characterized in that: The scope of review collection is limited by product identifiers, and the variant identifiers corresponding to colors, sizes, sets, or specifications are associated with the product identifiers and saved. The review time, reviewer geographic identifier, product variant attributes, and source identifier are also extracted.
3. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 2, characterized in that: Unified field mapping, time parsing, and rating mapping are performed on the original review records to incorporate the review time, reviewer geographic identifier, product variant attributes, and source identifier from different page sources into a unified field structure; language recognition, language-specific normalization, and cross-language domain terminology mapping are then performed on the comment texts. Duplicate comment identification, abnormal comment identification, and comment quality assessment are conducted within the comment set sharing the same product identifier. Abnormal comment identification calculates the comment credibility coefficient based on templated risk indicators, behavioral abnormality risk indicators, and semantic mismatch risk indicators. Comment quality assessment calculates the comment quality score based on text integrity indicators, feature word richness indicators, scene description indicators, and expression clarity indicators. The comments are then grouped by product identifier to establish the correspondence between candidate products and their standardized comment corpus sets.
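The two scores named in claim 3 can be illustrated with a minimal sketch. The claim lists the input indicators but this section does not state how they are combined, so the weighted sums, the weight values, the [0, 1] indicator ranges, and the names `ReviewIndicators`, `credibility_coefficient`, and `quality_score` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ReviewIndicators:
    # Risk indicators, each assumed to lie in [0, 1] (higher = more suspicious).
    template_risk: float
    behavior_risk: float
    mismatch_risk: float
    # Quality indicators, each assumed to lie in [0, 1] (higher = better).
    text_integrity: float
    feature_richness: float
    scene_description: float
    expression_clarity: float


def credibility_coefficient(r, weights=(0.4, 0.3, 0.3)):
    """Assumed form: 1 minus a weighted sum of the three risk indicators, clipped to [0, 1]."""
    risk = (weights[0] * r.template_risk
            + weights[1] * r.behavior_risk
            + weights[2] * r.mismatch_risk)
    return min(1.0, max(0.0, 1.0 - risk))


def quality_score(r, weights=(0.25, 0.25, 0.25, 0.25)):
    """Assumed form: weighted sum of the four quality indicators."""
    parts = (r.text_integrity, r.feature_richness,
             r.scene_description, r.expression_clarity)
    return sum(w * p for w, p in zip(weights, parts))
```

Under these assumed weights, a review with a templated risk of 0.5 and no other risk would keep a credibility coefficient of about 0.8; the actual weights would be among the preset parameters chosen by the implementer.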
4. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 3, characterized in that: The standardized comment units are segmented into sentences, and each comment sentence segment retains the product identifier, comment language tag, comment time, rating mapping value, comment credibility coefficient, and comment quality score; evaluation element candidates are then identified by combining domain terminology matching and sequence labeling. The identified evaluation element candidates are subjected to evaluation object attribution determination, evaluation polarity determination, evaluation intensity quantification, evaluation evidence fragment extraction, and implicit demand identification. Evaluation intensity is calculated based on the basic polarity intensity, degree correction amount, negation correction amount, transition correction amount, and rating mapping correction amount. Implicit demand content is extracted based on the demand triggering mode, evaluation object category, and evaluation evidence fragment, and is then mapped to unified demand terms. Cross-language semantic alignment and ambiguity resolution are then performed on the evaluation element candidates, and evaluation expressions in different languages are replaced with standard evaluation element names. The standard evaluation element names, evaluation object categories, evaluation polarity categories, evaluation intensity, evaluation evidence fragments, implicit demand content, comment time, comment credibility coefficient, and comment quality score are assembled into fine-grained evaluation records. 
Finally, using the product identifier as the aggregation condition and the standard evaluation element name as the aggregation primary key, the positive evaluation frequency, negative evaluation frequency, neutral evaluation frequency, implicit demand occurrence frequency, and evaluation evidence fragment set are compiled for the fine-grained evaluation records under the same candidate product. Weighted evaluation intensity is calculated based on evaluation intensity, comment credibility coefficient, and comment quality score. Demand attention is calculated based on occurrence frequency, negative evaluation ratio, and implicit demand occurrence frequency. This establishes a structured correspondence between candidate products and standard evaluation elements, evaluation object categories, evaluation polarity categories, evaluation intensity, implicit demand content, comment time, comment credibility coefficient, and comment quality score, forming a set of fine-grained evaluation elements for each candidate product.
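The aggregation step at the end of claim 4 can be sketched as a group-by over fine-grained evaluation records. The record keys (`element`, `polarity`, `intensity`, `credibility`, `quality`, `implicit_demand`), the credibility-times-quality weighting, and the coefficients `alpha`, `beta`, `gamma` are hypothetical choices; the claim names the inputs of each quantity but not the formula.

```python
from collections import defaultdict


def aggregate_elements(records, alpha=0.5, beta=0.3, gamma=0.2):
    """Group one product's records by standard evaluation element name."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["element"]].append(rec)

    total = len(records)
    table = {}
    for name, recs in groups.items():
        n = len(recs)
        negative = sum(1 for r in recs if r["polarity"] == "negative")
        positive = sum(1 for r in recs if r["polarity"] == "positive")
        implicit = sum(1 for r in recs if r.get("implicit_demand"))
        # Assumed weighting: each record counts in proportion to credibility * quality.
        weight_sum = sum(r["credibility"] * r["quality"] for r in recs)
        weighted_intensity = (
            sum(r["intensity"] * r["credibility"] * r["quality"] for r in recs) / weight_sum
            if weight_sum else 0.0
        )
        # Assumed linear mix of share of mentions, negative ratio, and implicit-demand ratio.
        attention = alpha * n / total + beta * negative / n + gamma * implicit / n
        table[name] = {
            "positive": positive,
            "negative": negative,
            "neutral": n - positive - negative,
            "implicit": implicit,
            "weighted_intensity": weighted_intensity,
            "attention": attention,
        }
    return table
```

Keying the result by standard evaluation element name mirrors the "aggregation primary key" in the claim, so records written in different languages fall into the same row once they have been aligned to the same standard name.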
5. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 4, characterized in that: The set of fine-grained evaluation elements is read according to the product identifier, and an evaluation element statistics table is constructed with the standard evaluation element name as the primary key. The evaluation element statistics table is filled with the total evaluation frequency, weighted evaluation intensity, demand attention, implicit demand occurrence frequency, comment time distribution, and product variant attribute distribution. Market demand indicators, product risk indicators, and product selection decision values are then calculated based on the evaluation element statistics table. The comment time distribution is formed by compiling, in chronological order, the comment times of the corresponding fine-grained evaluation records, and the product variant attribute distribution is formed by compiling the product variant attributes in the corresponding fine-grained evaluation records. The time growth trend and time surge risk are extracted based on the magnitude of change between adjacent time intervals, and the variant difference risk is calculated based on the dispersion of the weighted evaluation intensity of the same standard evaluation element across different product variant attributes.
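The time and variant features in claim 5 can be sketched as follows. The claim only says that the time features come from the change between adjacent intervals and that the variant risk comes from dispersion, so the use of relative change, the mean and maximum as summaries, and the population standard deviation are assumptions, as are the function names.

```python
from statistics import mean, pstdev


def time_features(counts):
    """counts: evaluation counts for consecutive time intervals, oldest first.

    Returns (growth_trend, surge_risk) under the assumed definitions:
    mean relative change and largest single-interval relative increase.
    """
    # max(a, 1) avoids division by zero for empty intervals.
    changes = [(b - a) / max(a, 1) for a, b in zip(counts, counts[1:])]
    if not changes:
        return 0.0, 0.0
    return mean(changes), max(0.0, max(changes))


def variant_difference_risk(intensity_by_variant):
    """Dispersion of one element's weighted evaluation intensity across variants."""
    values = list(intensity_by_variant.values())
    if len(values) < 2:
        return 0.0
    return pstdev(values)
```

For example, an element whose weighted intensity is 0.2 on one color and 0.8 on another yields a dispersion of about 0.3, flagging that the product's appeal depends heavily on the variant.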
6. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 5, characterized in that: The standardized comment corpus is divided into multiple time subsets based on comment time and into multiple variant subsets based on product variant attributes. Local product selection decision values are repeatedly calculated on each time subset and each variant subset. The product selection stability is calculated based on the dispersion of the multiple local product selection decision values, and the product selection correction value is then calculated in combination with the risk concentration state. The final product selection value is calculated from the product selection decision value and the product selection correction value. The candidate products are then sorted by final product selection value, and those that meet the screening criteria and rank within the top sorting interval are determined as the selected candidate products.
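A minimal sketch of the stability correction and ranking in claim 6 follows. The coefficient-of-variation form of stability, the multiplicative correction, the `penalty` weight, and the top-k screening rule are assumptions; the claim states only which quantities feed into each value.

```python
from statistics import mean, pstdev


def selection_stability(local_values):
    """Assumed form: 1 / (1 + coefficient of variation) of the local decision values."""
    if not local_values:
        return 0.0
    m = mean(local_values)
    if m == 0:
        return 0.0
    return 1.0 / (1.0 + pstdev(local_values) / abs(m))


def final_selection_value(decision_value, local_values, risk_concentration, penalty=0.5):
    """Scale the global decision value by stability and by an assumed risk penalty."""
    correction = selection_stability(local_values) * (1.0 - penalty * risk_concentration)
    return decision_value * correction


def select_candidates(final_values, top_k, min_value=0.0):
    """Rank products by final value; keep the top_k that also pass the threshold."""
    ranked = sorted(final_values.items(), key=lambda kv: kv[1], reverse=True)
    return [pid for pid, value in ranked[:top_k] if value >= min_value]
```

The intent of the correction is that a product whose local values agree across all time and variant subsets keeps its full decision value, while one whose score is carried by a single subset is discounted before ranking.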
7. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 6, characterized in that: After the selected candidate products have obtained their final product selection values, the standardized comment units are first stratified by comment time, product variant attributes, and source region to construct a three-dimensional evaluation distribution unit set of time interval, product variant attributes, and source region. Each three-dimensional evaluation distribution unit is then associated with the standard evaluation element name, evaluation object category, evaluation polarity category, evaluation intensity, demand attention, implicit demand content, comment credibility coefficient, and comment quality score. Next, the local evaluation contribution value is calculated by multiplying the local demand attention, local weighted evaluation intensity, local comment credibility aggregation value, and local comment quality aggregation value in each three-dimensional evaluation distribution unit. On this basis, an evaluation contribution trajectory tensor is constructed, which represents how the evaluation contribution changes along the four dimensions of time interval, product variant attributes, source region, and standard evaluation element within the same candidate product. The evaluation contribution migration amount, evaluation center of gravity shift amount, implicit demand shift amount, source region diffusion amount, product variant attribute penetration amount, and high value retention amount are then calculated in parallel. 
Specifically, the evaluation contribution migration amount is obtained by comparing the changes in the contribution distribution of standard evaluation elements between adjacent time intervals; the evaluation center of gravity shift amount is obtained by assigning position codes to each evaluation object category and calculating the change in the evaluation center of gravity coordinates between adjacent time intervals; the implicit demand shift amount is obtained by comparing the changes in the distribution of unified demand terms between adjacent time intervals; the source region diffusion amount is obtained by counting the change in the number of source regions in which the preset problem conditions occur; the product variant attribute penetration amount is obtained by counting the change in the number of product variant attributes covered by the preset problem conditions; and the high value retention amount is obtained from the time-series distribution of high values after the local product selection decision values in each time interval are normalized.
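The tensor construction and the first of the six quantities in claim 7 can be sketched with a sparse, dictionary-based tensor. The four-way product follows the claim; the input layout (one pre-aggregated row per unit and element, with hypothetical keys) and the use of total variation distance to compare adjacent-interval distributions are assumptions.

```python
from collections import defaultdict


def build_tensor(units):
    """Sparse tensor keyed by (interval, variant, region, element).

    Each input row is assumed to hold already-aggregated local values for one
    three-dimensional distribution unit and one standard evaluation element.
    """
    tensor = defaultdict(float)
    for u in units:
        key = (u["interval"], u["variant"], u["region"], u["element"])
        # Local contribution = attention * intensity * credibility * quality.
        tensor[key] += u["attention"] * u["intensity"] * u["credibility"] * u["quality"]
    return dict(tensor)


def element_distribution(tensor, interval):
    """Share of total contribution held by each element in one time interval."""
    totals = defaultdict(float)
    for (t, _variant, _region, element), value in tensor.items():
        if t == interval:
            totals[element] += value
    s = sum(totals.values())
    return {e: v / s for e, v in totals.items()} if s else {}


def contribution_migration(tensor, t_prev, t_next):
    """Assumed measure: total variation distance between adjacent-interval distributions."""
    p = element_distribution(tensor, t_prev)
    q = element_distribution(tensor, t_next)
    return 0.5 * sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in set(p) | set(q))
```

With this measure, 0 means the same elements drive the product's value in both intervals and 1 means the contribution has moved entirely to different elements, which is the kind of hidden structural migration the method aims to expose behind a stable headline score.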
8. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 7, characterized in that: The evaluation contribution migration amount, evaluation center of gravity shift amount, implicit demand shift amount, source region diffusion amount, and product variant attribute penetration amount are coupled in pairs to construct a high-value vulnerability state judgment matrix, and the high-value vulnerability is calculated in combination with the high value retention amount. The structural track-switching coefficient is then calculated from the combined relationship between, on the one hand, the evaluation contribution migration amount, evaluation center of gravity shift amount, and implicit demand shift amount and, on the other hand, the source region diffusion amount and product variant attribute penetration amount, so that internal dominant path switching and external diffusion path changes are represented separately.
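One way to read the pairwise coupling in claim 8 is sketched below. The product as the coupling operator, the mean over pairs, the multiplication by retention, the simple averages for the two path components, and the dictionary keys are all assumptions; the claim does not give the matrix entries or the coefficient formula.

```python
from itertools import combinations

INTERNAL = ("migration", "center_shift", "implicit_shift")
EXTERNAL = ("region_diffusion", "variant_penetration")


def judgment_matrix(shifts):
    """Pairwise coupling of the shift amounts (upper triangle only), assumed as products."""
    return {(a, b): shifts[a] * shifts[b] for a, b in combinations(sorted(shifts), 2)}


def high_value_vulnerability(shifts, retention):
    """Assumed form: mean pairwise coupling scaled by the high value retention amount."""
    matrix = judgment_matrix(shifts)
    coupling = sum(matrix.values()) / len(matrix) if matrix else 0.0
    return retention * coupling


def track_switching(shifts):
    """Return (internal, external) components, kept separate as the claim requires."""
    internal = sum(shifts[k] for k in INTERNAL) / len(INTERNAL)
    external = sum(shifts[k] for k in EXTERNAL) / len(EXTERNAL)
    return internal, external
```

Scaling by retention reflects the idea that vulnerability matters most for products that still look high-value on the surface while their underlying structure is shifting.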
9. The intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in claim 8, characterized in that: The result of comparing the high-value vulnerability with the preset vulnerability condition threshold is used as the first segmentation basis, and the result of comparing the structural track-switching coefficient with the preset diffusion condition threshold is used as the second segmentation basis. The dynamic correction factor is calculated segment by segment, and the final product selection value is attenuated by the dynamic correction factor. Next, the recovery judgment amount is calculated based on the decrease, over the two most recent time intervals, in the evaluation contribution migration amount, evaluation center of gravity shift amount, implicit demand shift amount, source region diffusion amount, and product variant attribute penetration amount. The rebound adjustment coefficient is calculated in combination with the high value retention amount and the product selection stability, and rebound correction is then applied to the dynamically corrected product selection values. Finally, the selected candidate products are re-ranked according to the rebound-corrected product selection values, and the element contribution values of the selected candidate products are recalculated. The demand attention, weighted evaluation intensity, implicit demand aggregation value, and negative pressure value are combined to divide the main driving element set and the main limiting element set. The corresponding evaluation evidence fragments are then collected to form a differentiated attribution result corresponding to the corrected ranking result.
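The two-stage correction in claim 9 can be sketched as a piecewise attenuation followed by a bounded rebound. The thresholds, the attenuation factors for each segment, the form of the recovery judgment amount, and the rebound cap are illustrative assumptions; in the patent these are preset parameters left to the implementer.

```python
def dynamic_correction_factor(vulnerability, switching,
                              vuln_threshold=0.6, diffusion_threshold=0.5):
    """Piecewise factor chosen by the two threshold comparisons (values are assumed)."""
    vulnerable = vulnerability >= vuln_threshold
    diffusing = switching >= diffusion_threshold
    if vulnerable and diffusing:
        return 0.6
    if vulnerable:
        return 0.8
    if diffusing:
        return 0.9
    return 1.0


def rebound_coefficient(prev_shifts, last_shifts, retention, stability, cap=0.2):
    """Assumed form: mean decrease of the five shift amounts over the two most
    recent intervals, scaled by retention and stability and capped."""
    drops = [max(0.0, p - l) for p, l in zip(prev_shifts, last_shifts)]
    recovery = sum(drops) / len(drops) if drops else 0.0
    return 1.0 + min(cap, recovery * retention * stability)


def corrected_value(final_value, vulnerability, switching,
                    prev_shifts, last_shifts, retention, stability):
    """Attenuate first, then apply the rebound to the attenuated value."""
    attenuated = final_value * dynamic_correction_factor(vulnerability, switching)
    return attenuated * rebound_coefficient(prev_shifts, last_shifts, retention, stability)
```

Applying the rebound only after attenuation means a product is first penalized for an unstable structure and recovers part of its value only if the shift amounts have actually fallen in the latest interval.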
10. An intelligent decision-making system for cross-border e-commerce product selection based on multilingual review mining, used to implement the intelligent decision-making method for cross-border e-commerce product selection based on multilingual review mining as described in any one of claims 1-9, characterized in that the system comprises: a standardized comment corpus construction module, a fine-grained evaluation element extraction module, a product selection decision value calculation module, and a high-value vulnerability identification and correction module, with signal connections between the modules. The standardized comment corpus construction module is used to limit the review collection targets by product identifiers and associate variant identifiers with the product identifiers. It combines review time, reviewer geographic identifier, product variant attributes, and source identifiers to uniformly map the original review records from different page sources and establish the correspondence between candidate products and their standardized comment corpus sets. The fine-grained evaluation element extraction module is used to segment the standardized comment units into sentences based on the standardized comment corpus and to identify evaluation element candidates by combining term matching and sequence labeling. By combining the rating mapping value, comment credibility coefficient, and comment quality score retained in each comment sentence segment, it parses the evaluation information corresponding to the evaluation element candidates and, after cross-language semantic alignment and ambiguity resolution, aggregates it to form a set of fine-grained evaluation elements corresponding to each candidate product. 
The product selection decision value calculation module is used to read the set of fine-grained evaluation elements by product identifier, construct an evaluation element statistics table with the standard evaluation element name as the primary key, and extract time change features and variant difference features based on the comment time distribution and product variant attribute distribution in the evaluation element statistics table. Combined with the total evaluation frequency, weighted evaluation intensity, demand attention, and implicit demand occurrence frequency, the product selection decision value is calculated. The module then divides time subsets according to comment time and variant subsets according to product variant attributes, and repeatedly calculates the local product selection decision value on each time subset and each variant subset. The product selection decision value is then corrected based on the dispersion and risk concentration of the multiple local product selection decision values to obtain the final product selection value and determine the selected candidate products. The high-value vulnerability identification and correction module is used to construct, for the selected candidate products, a three-dimensional evaluation distribution unit set covering time interval, product variant attributes, and source region, and to establish an evaluation contribution trajectory tensor. Based on the evaluation contribution trajectory tensor, it extracts the evolutionary relationship of evaluation contribution across time intervals, product variant attributes, source regions, and standard evaluation elements, distinguishes between changes in internal dominant paths and changes in external diffusion paths, and performs high-value vulnerability state identification and structural track-switching state identification accordingly. It then performs segmented correction and rebound correction on the final product selection value. 
After that, the corrected product selection values are re-ranked, and a differentiated attribution result corresponding to the corrected ranking result is formed by combining the element contribution values.