User comment analysis method and device, equipment and storage medium

By employing stratified sampling and large language model analysis, the inefficiency caused by full-data processing in user review analysis was resolved, enabling efficient generation of analysis reports.

CN122199078APending Publication Date: 2026-06-12SHENZHEN ZHIXIAN VISION SOFTWARE TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN ZHIXIAN VISION SOFTWARE TECHNOLOGY CO LTD
Filing Date
2026-02-26
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing user comment analysis methods require full processing of massive amounts of data, resulting in low analysis efficiency and a large workload for manual processing.

Method used

By obtaining the total number of comments, setting batch divisions, and combining the total number of comments with the current batch divisions for stratified sampling, a comment sample set is obtained. A comment analysis report is then generated using a pre-configured large language model, avoiding full data processing.

Benefits of technology

Significantly reduces data processing volume, improves analysis efficiency, reduces manual intervention, and enhances the output efficiency and accuracy of analysis reports.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122199078A_ABST
    Figure CN122199078A_ABST
Patent Text Reader

Abstract

The application discloses a user comment analysis method and device, equipment and a storage medium. The method comprises the following steps: obtaining a user comment data set, and determining a total number of comments corresponding to the user comment data set; setting a current division batch according to the total number of comments, and performing stratified sampling on the user comment data set in combination with the total number of comments and the current division batch to obtain a comment sample set; and inputting the comment sample set into a first large language model to obtain a comment analysis report, wherein the first large language model is a preconfigured large language model used for generating a comment analysis report based on input.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This application relates to the field of data analysis technology, and in particular to a user review analysis method, apparatus, device, and storage medium. Background Technology

[0002] With e-commerce becoming the mainstream consumption channel, television product sales are increasingly moving online, resulting in a massive accumulation of post-purchase user reviews across various sales platforms. This data, as direct user feedback, contains valuable information on product performance, functional experience, quality reliability, and service satisfaction, among other things. Effective analysis of this data has become a key requirement for companies to optimize product design, improve user experience, and formulate market strategies.

[0003] Currently, for this type of unstructured text comment data, the primary approach relies on general natural language processing tools for initial sentiment analysis and keyword extraction. The results are then manually processed, summarized, and reported. However, when dealing with massive amounts of raw text comment data exported from sales platforms, this manual-driven secondary processing model requires processing all the raw text data, resulting in a significant workload and hindering the efficiency of user comment data analysis. Summary of the Invention

[0004] The main objective of this application is to provide a user review analysis method, apparatus, device, and storage medium, aiming to solve the technical problem that existing user review analysis methods require full data processing, resulting in low efficiency in analyzing user review data.

[0005] To achieve the above objectives, this application proposes a user review analysis method, which includes: Obtain the user comment dataset and determine the total number of comments corresponding to the user comment dataset; The current batch is set based on the total number of comments, and the user comment dataset is stratified and sampled by combining the total number of comments and the current batch to obtain the comment sample set; The comment sample set is input into the first large language model to obtain the comment analysis report. The first large language model is a pre-configured large language model used to generate comment analysis reports based on the input.

[0006] Furthermore, to achieve the above objectives, this application also proposes a user review analysis device, which includes: The data acquisition module is used to acquire user comment datasets and determine the total number of comments corresponding to the user comment datasets; The intelligent sampling module sets the current batch division based on the total number of comments, and performs stratified sampling on the user comment dataset by combining the total number of comments and the current batch division to obtain a comment sample set; The model analysis module is used to input the comment sample set into the first large language model to obtain a comment analysis report. The first large language model is a pre-configured large language model used to generate comment analysis reports based on the input.

[0007] In addition, to achieve the above objectives, this application also proposes a user review analysis device, which includes: a memory, a processor, and a user review analysis program stored in the memory and executable on the processor, the user review analysis program being configured to implement the steps of the user review analysis method described above.

[0008] In addition, to achieve the above objectives, this application also proposes a storage medium that is a computer-readable storage medium, on which a user review analysis program is stored, which, when executed by a processor, implements the steps of the user review analysis method described above. Attached Figure Description

[0009] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0010] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 This is a flowchart illustrating the first embodiment of the user comment analysis method of this application; Figure 2 This is a flowchart illustrating the second embodiment of the user comment analysis method of this application; Figure 3 This is a flowchart illustrating the third embodiment of the user comment analysis method of this application; Figure 4 This is a schematic diagram illustrating the entire process of the user comment analysis method used in this application; Figure 5 This is a schematic diagram of the module structure of the user comment analysis device in this application; Figure 6 This is a schematic diagram of the user review analysis device for this application.

[0012] The realization of the purpose, functional features and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings. Detailed Implementation

[0013] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.

[0014] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.

[0015] The main solution of this application embodiment is: to obtain a user comment dataset and determine the total number of comments corresponding to the user comment dataset; to set the current batch division based on the total number of comments, and to perform stratified sampling of the user comment dataset by combining the total number of comments and the current batch division to obtain a comment sample set; to input the comment sample set into the first large language model to obtain a comment analysis report, wherein the first large language model is a pre-configured large language model used to generate a comment analysis report based on the input.

[0016] While existing user review analysis methods utilize general-purpose natural language processing tools for text sentiment analysis and keyword extraction, these tools lack customized training for the semantic features of television product reviews. This limits the industry relevance and accuracy of the analysis results, necessitating further manual processing of the tool-processed user review data. Given the large volume of raw text review data on sales platforms, the existing methods require full processing of all text review data, resulting in a significant workload for manual processing and ultimately low efficiency in user review data analysis.

[0017] This application can adaptively set batch divisions based on the total number of comments and perform stratified sampling of the original data. It obtains a statistically representative comment sample set without requiring full processing of all user comment data, thus significantly reducing the data size required for large language model analysis and effectively avoiding the excessive manual workload caused by processing full data in existing methods. Simultaneously, by inputting the comment sample set into a pre-configured first large language model specifically designed for generating comment analysis reports, its customized analysis capabilities for the semantic features of television product comments directly output the comment analysis report, eliminating the need for manual reprocessing of intermediate results and thereby improving the efficiency of user comment data analysis.

[0018] It should be noted that the executing entity in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, mainframe computer, server cluster, etc., or a system server of a user review analysis system, such as a laptop computer. The following description uses a system server of a user review analysis system (hereinafter referred to as "the system") as an example to illustrate this embodiment and the subsequent embodiments.

[0019] Based on this, this application proposes a user review analysis method, referring to... Figure 1 , Figure 1 This is a flowchart illustrating the first embodiment of the user comment analysis method of this application. In this embodiment, the user comment analysis method includes steps S10 to S30: Step S10: Obtain the user comment dataset and determine the total number of comments corresponding to the user comment dataset.

[0020] It should be understood that a user review dataset can be a structured collection of raw data generated from original review files obtained from a sales platform. This user review dataset may include several review entries, each of which may include at least: review ID, product model, review time, user rating, and review text. The total number of review entries in the user review dataset is equal to the total number of reviews, N.

[0021] Furthermore, considering that the original review files obtained from the sales platform are unformatted data, step S10 specifically includes the following steps S101-S104 to illustrate how to convert the platform review files into a user review dataset containing structured data: Step S101: Obtain the uploaded platform review file. The platform review file is a file exported by the sales platform through the interface, containing all user reviews related to TV products on the sales platform.

[0022] It should be noted that the platform review file can contain all original review data related to the target TV product from the sales platform, and the platform review file can be a comma-separated value (CSV) file or a spreadsheet (Excel) file.

[0023] In practice, users can download the original review data from the sales platform they want to analyze in CSV / Excel format to their local machine through the application programming interface or data export function provided by the platform, and then upload it to this system through the web interface or import interface.

[0024] Step S102: Call the data parsing engine to parse the platform comment files and obtain the original user comment text set.

[0025] It should be noted that the data parsing engine can be software pre-integrated into the system to read files of different formats and convert their contents into readable text. It can have functions such as recognizing file formats (e.g., comma-separated values, spreadsheets), processing character encoding, and mapping data types. Specifically, it can be an open-source engine such as Pandas or Apache POI.

[0026] In its implementation, the system can call the integrated data parsing engine to identify the file format of the platform comment files imported into the system, and then read their binary or text content. According to the built-in format specifications, the system performs lexical and syntactic analysis on the text content, extracts semantic fields and records, obtains several original comment records, and thus constitutes the original user comment text set.

[0027] Step S103: Standardize the original user comment text set to obtain the user comment dataset.

[0028] It should be noted that since the original review records from the sales platform may contain multiple duplicate reviews or blank reviews generated at the same time, the aforementioned original user review text set can also be processed for data standardization, including data cleaning operations such as data deduplication and null value processing.

[0029] Furthermore, the system can filter or escape special characters, invisible characters, and HTML tags present in the original comment records; or convert numerical fields such as ratings in the original comment records from text to numeric types. This ensures that the final standard comment records have the same or similar structured format.

[0030] In its implementation, the system can perform the aforementioned data standardization process on the original user comment text set, thereby converting each original comment record into a standard comment record, which then constitutes a user comment dataset. This dataset can also be in CSV or Excel format. Each standard comment record in this dataset has the same field structure and data type, and there are no obvious duplicate or invalid records, making it a suitable data foundation for subsequent user comment analysis.

[0031] Furthermore, if the platform review files imported by the user come from two or more sales platforms, the system can generate user review datasets for different sales platforms based on the above steps, and generate corresponding review analysis reports based on different sales platforms in subsequent steps, thereby preventing the accuracy of report generation from being affected by differences in platform review patterns across different sales platforms.

[0032] Step S104: Determine the total number of user comments contained in the user comment dataset as the total number of comments.

[0033] It should be noted that the system can perform a counting operation on the aforementioned generated user comment dataset to count the total number of standard comment records, i.e., the number of user comments, and thus determine the total number of comments.

[0034] Step S20: Set the current batch segmentation based on the total number of comments, and perform stratified sampling on the user comment dataset by combining the total number of comments and the current batch segmentation to obtain the comment sample set.

[0035] It should be understood that the current batch K can be a batch set by the user in real time based on the current analysis needs and the total number of comments, or it can be automatically generated based on predefined mapping rules within the system (e.g., adding a batch every 10,000 comments). This embodiment does not impose any restrictions on this. Furthermore, the current batch can be the total number of user comment datasets that the system sets for this comment analysis task, dividing the user comment dataset into independently processable units.

[0036] Understandably, the system first dynamically generates the total number of sampled entries S based on the total number of comments N and the current batch K, and then determines the batch sampled entries T=S / K for each batch. Next, it divides the user comment dataset into several layers based on pre-set stratification conditions, and then determines the corresponding layer weights based on the number of comments in different layers. Finally, it performs stratified sampling in the user comment dataset based on the layer weights and the target number of sampled entries to obtain the comment sample set.

[0037] It should be noted that, considering the difference in the total number of comments, when the total number of comments is small, stratified sampling can be omitted, and the user comment dataset can be directly determined as the comment sample set. Therefore, before step S20, the process may include: comparing the total number of comments with a preset sampling trigger threshold; and determining the user comment dataset as the comment sample set when the total number of comments is less than or equal to the preset sampling trigger threshold.

[0038] Accordingly, step S20 is defined as follows: when the total number of comments is greater than the preset sampling trigger threshold, the current batch is set according to the total number of comments, and the user comment dataset is stratified and sampled by combining the total number of comments and the current batch to obtain the comment sample set.

[0039] Understandably, the preset sampling trigger threshold M is a fixed parameter pre-set within the system to limit the maximum data size that does not require sampling; for example, it can be set to M=100. The system determines whether to perform stratified sampling on the user comment dataset by comparing the total number of comments N with the preset sampling trigger threshold M.

[0040] Furthermore, the value of the aforementioned current batch K can also be set based on the size of the preset sampling trigger threshold M to ensure that the total number of comments in the final comment sample set is less than the aforementioned preset sampling trigger threshold M.

[0041] Step S30: Input the comment sample set into the first large language model to obtain the comment analysis report. The first large language model is a pre-configured large language model used to generate the comment analysis report based on the input.

[0042] It should be understood that this primary language model can be a model derived from a large, open-source language model, fine-tuned using historical comment sample sets and corresponding sample comment analysis reports. This model can be specifically designed for this comment analysis task. That is, it can accept only the comment sample set as input and output the corresponding generated comment analysis report.

[0043] Furthermore, the primary language model can also directly utilize open-source online language models. Since these models are neural networks pre-trained on massive amounts of text data and possess deep semantic understanding and complex text generation capabilities, common analytical prompts can be pre-configured based on the comment analysis task. The system can then input these prompts along with the aforementioned comment sample set into the primary language model, leveraging its generation capabilities to generate and output the corresponding comment analysis report.

[0044] This embodiment can adaptively set batch divisions based on the total number of comments and perform stratified sampling of the original data. It obtains a statistically representative comment sample set without processing all user comment data, significantly reducing the data size required for large language model analysis and effectively avoiding the excessive manual workload caused by processing full data in existing methods. Simultaneously, by inputting the comment sample set into a pre-configured first large language model specifically designed for generating comment analysis reports, its customized analysis capabilities for the semantic features of television product comments directly output the comment analysis report, eliminating the need for manual reprocessing of intermediate results and thus improving the efficiency of user comment data analysis.

[0045] Based on the first embodiment of this application, in the second embodiment of this application, the content that is the same as or similar to that in Embodiment 1 above can be referred to the above description, and will not be repeated hereafter. Based on this, please refer to... Figure 2 , Figure 2 This is a flowchart illustrating the second embodiment of the user comment analysis method of this application.

[0046] In this embodiment, to improve sampling efficiency, a large language model can be used to perform stratified sampling of the user comment dataset. Therefore, step S20 includes: steps S201~S203: Step S201: Set the current batch division based on the total number of comments, and divide the user comment dataset into several batches of comment data subsets based on the current batch division.

[0047] It should be noted that the system can use the total number of comments N as the independent variable, obtain the calculation formula through preset mapping rules, and determine a positive integer K as the current batch division.

[0048] For example, if the total number of comments is 50,000, and the mapping rule is to add a batch every 5,000 comments, then the current batch division can be 10, and each batch corresponds to a subset of comment data, and the number of comments in the subset can be 5,000.

[0049] Step S202: Obtain preset sampling prompts and send the preset sampling prompts and each subset of comment data to at least one second-largest language model to obtain the sampled data returned by each second-largest language model. The preset sampling prompts are used to guide each second-largest language model to perform stratified sampling operations on each subset of comment data according to preset category conditions.

[0050] It should be noted that the preset sampling prompts can be pre-written text commands built into the system. These commands can be written by developers during the system development phase and then embedded into the system. The preset sampling prompts can be structured natural language command text, including role settings, skills, output format specifications, constraints, etc. These preset sampling prompts can be used to temporarily configure the second language model as a dedicated model with stratified sampling capabilities.

[0051] For example, the preset sampling prompt can be configured as follows: "(1) Role setting: You are a professional data analyst who is good at sampling and analyzing input data in Excel and CSV formats to provide a reliable sample set for subsequent analysis. (2) Skills: Data sampling, based on the following steps: A. Receive a set of input data in Excel or CSV format; B. Identify the column most suitable for stratification in the input data, based on: columns containing a limited number of categories (such as ratings, sentiment tags, product categories, etc.); if there are multiple columns containing a limited number of categories, select the one with the most uneven category distribution; C. Count the number of comments in each category in the target stratification column; D. Calculate the proportion of each category in the total; E. Randomly select the corresponding number of comments from each category according to the proportion, so that the total number of batch samples is 1.<sample_size> For example: if a certain category accounts for 10%, then extract...<sample_size> 10% (rounded up, at least 1 item; F. Ensure the sampling process is random and not sequential. (3) Output format specifications: The output must be strictly divided into two parts: A. The first line is metadata, in the format: #STRATIFY_COLUMN:[the column name of the selected target hierarchical column]; B. Starting from the second line, output the sampled data, including the table header, and the total number of rows (corresponding to the number of comments, excluding metadata rows and table headers) must be exactly 10%.<sample_size> . (4) Constraints: A. Only output metadata rows and CSV data, without any other analysis, interpretation or additional text; B. The sampling process is strictly performed according to the steps defined in the above skills.

[0052] in,<sample_size> The value in the table is the batch sampling number T determined above.

[0053] It should be understood that the second major language model can also be a similar open-source language model from the internet as the first major language model mentioned above. In the specific model selection, this second major language model can be an open-source language model from the internet with superior data processing capabilities, unlike the first major language model which has superior contextual reading and generation capabilities.

[0054] It should also be noted that, in order to further improve the efficiency of stratified sampling, the system can also simultaneously call the second largest language model with the same number of segments as the current segment: the system simultaneously initiates call requests for the same model service with the same number of segments as the current segment to achieve parallel calls of multiple second largest language models.

[0055] In its implementation, the system first packages the preset sampling prompts and a subset of comment data to obtain a batch of model input data, which is then concurrently submitted to multiple second-large language models that are invoked in parallel. Each second-large language model receives a batch of model input data and performs independent stratified sampling operations on its corresponding subset of comment data based on the preset sampling prompts, outputting a sample set corresponding to that batch, which contains approximately T comments.

[0056] Step S203: Integrate the various sample sets to obtain the comment sample set.

[0057] It should be understood that each sample set contains T sampled comment records drawn from the subset of comment data in that batch. The system can then merge the sample sets returned by each of the second-largest language models to obtain the comment sample set.

[0058] Furthermore, the aforementioned stratified sampling of the user comment dataset can also be directly performed using an algorithm, and this algorithm can be consistent with the skill steps configured by the aforementioned preset sampling prompts. Therefore, step S20 further includes: steps A201~A205: Step A201: Set the current batch based on the total number of comments, and divide the user comment dataset into several batches of comment data subsets based on the current batch.

[0059] It should be noted that this step can be directly referred to as step S201 above, and will not be repeated here.

[0060] Step A202: Determine the target hierarchical column in each subset of comment data. The target hierarchical column is a sequence of numbers in each subset of comment data that meets the preset category conditions. Each sequence includes comments of several categories.

[0061] It should be understood that the system can use a single subset of comment data as a processing unit, traversing all data columns within that subset, performing feature detection on each column, and determining whether it meets preset category conditions. These preset category conditions can be a pre-set set of standards used by the system to determine whether a column is suitable as a stratification criterion, and may include at least: the column's data type is discrete (non-continuous numerical), the column has a limited number of value types (e.g., ratings from 1 to 5 stars, a total of 5 categories), and the column's missing value ratio is below a preset tolerance threshold.

[0062] It should be noted that each column contains different discrete values, and each value can represent a category. Comment records belonging to different categories can be differentiated during subsequent sampling.

[0063] Step A203: Obtain the number of comments corresponding to different categories in the target hierarchical column, and determine the proportion of each category based on the number of comments.

[0064] It should be understood that the system uses the target hierarchical column as the grouping key to divide the subset of comment data into different categories, determines the number of comment records in each category, and then calculates the proportion of each category based on the number of comments in each category / the total number of the subset of comment data.

[0065] For example, if the target hierarchy is listed as "rating", and the rating categories include 5 types from 1 to 5 stars, with a subset of 500 comments, then the system can determine the number of comments under each category from 1 to 5 stars. For example, 1 star comments are 10, 2 star comments are 50, ..., 5 star comments are 200, etc.; and then calculate the percentage of 1 star as 0.02; the percentage of 2 star as 0.1, ..., and the percentage of 5 star as 0.4.

[0066] Step A204: In each subset of comment data, extract comments belonging to each category according to their respective proportions to obtain the sample set corresponding to each subset of comment data.

[0067] It should be understood that, in the subset of comment data, based on the previously determined batch sampling number T, the number of comments sampled for each category (rounded up, at least one) can be calculated. Then, based on the corresponding number, random sampling is performed in each category, and the sampled comment records of each category are integrated to obtain the sample set corresponding to the subset of comment data.

[0068] It should also be noted that, because the rounding up was used when calculating the number of category samples, the number of comment records in the final sample set may be slightly higher than T.

[0069] Step A205: Integrate the various sample sets to obtain the comment sample set.

[0070] It should be noted that since each sample set contains approximately T sampled comment records drawn from the subset of comment data in that batch, the system can merge the sample sets returned by each of the second-largest language models to obtain the comment sample set.

[0071] Furthermore, since the above stratified sampling process involves random sampling in each column after determining the proportion of each category, and different comment records have different information densities, reflecting different quality levels, to ensure that high-quality comments are prioritized for inclusion in the sampling set / comment sample set, the following steps can be included after the aforementioned steps S201 or A201: importing each batch of comment data subsets into a preset comment quality assessment model to obtain a quality score for each batch of data subsets, and identifying high-quality comment records in each batch of data subsets whose effective word count exceeds a preset word count threshold.

[0072] The preset comment quality assessment model can be a multi-head output model trained in advance based on a subset of historical comment data, subset quality scores, and high-quality sample comment records to achieve comment quality assessment. Its two outputs can predict the subset quality score and identify the set of high-quality comment records, respectively. The preset comment quality assessment model can also be constructed by inputting preset quality analysis words into an open-source large language model. The preset quality analysis words can be configured with two skills: subset quality score and high-quality comment record identification. High-quality comment records are those with an effective word count higher than a preset word count threshold or those with images.

[0073] Accordingly, the system can assign corresponding subset weights w to each batch of comment data subsets in descending order of subset quality scores. Then, by multiplying the total number of samples S by the weight w of each subset weight, a differentiated batch sample number T is obtained. This ensures that the final comment sample set contains a greater number of sampled comment records from the subsets with higher subset quality scores.

[0074] Meanwhile, in each subset of comment data, the system can also assign higher weight values ​​to the high-quality comment records identified above, so that when randomly sampling in each category, high-quality comment records can be given priority to enter the sampling set.

[0075] This embodiment performs stratified sampling through two parallel implementation paths. It allows for flexible selection of an adaptation scheme based on cost constraints, processing speed requirements, or model service availability in the actual deployment environment. While ensuring sample representativeness, it effectively reduces the data processing load and time overhead of the overall analysis process: First, preset sampling prompts and subsets of comment data are sent to the second large language model, which then performs stratified sampling according to preset category conditions and returns the sampled sample sets. Second, target stratification columns are determined within the comment data subsets, the number and proportion of comments in different categories are obtained, and comments from each category are extracted according to their proportions to obtain sampled sample sets. Finally, these sampled sample sets are integrated to obtain the comment sample set. Therefore, when dealing with massive amounts of user comment data, it is unnecessary to process the entire original set. Instead, by dividing the data into batches and using stratified sampling, a controllable and statistically representative comment sample set is obtained, significantly reducing the data processing volume in the subsequent deep analysis stage of the large language model.

[0076] Based on the first and second embodiments of this application, in the third embodiment of this application, the content that is the same as or similar to the first and second embodiments described above can be referred to the above description, and will not be repeated hereafter. Based on this, please refer to... Figure 3 , Figure 3 This is a flowchart illustrating the third embodiment of the user comment analysis method of this application.

[0077] In this embodiment, to specifically illustrate how to generate a comment analysis report using preset prompt words and the first large language model of the open-source network language model, step S30 specifically includes: steps S301~S302: Step S301: Obtain analysis prompts. The analysis prompts are used to guide the first language model to generate a global analysis report, which includes at least the following: report analysis dimension requirements, report structure requirements, and report format requirements.

[0078] It should be noted that the analysis prompts can also be pre-written text commands built into the system. These commands can be written by developers during the system development phase and then embedded into the system. The analysis prompts can be structured natural language command text, including report analysis dimension requirements, report structure requirements, and report format requirements. These analysis prompts can be used to temporarily configure the general second-largest language model into a dedicated model with in-depth analysis capabilities for television product reviews.

[0079] It should also be noted that the analysis prompts may also include: role setting, visualization requirements, etc.

[0080] For example, the analysis prompt can be configured as follows: "(1) Role setting: You are a senior data analyst and visualization expert who focuses on the consumer electronics field, especially TV products. You are able to convert representative TV product user review sample data that has been randomly sampled in a stratified manner into a beautiful and professional HTML analysis report that focuses on the performance and user experience of TV products; (2) Report analysis dimension requirements: A. Picture quality performance: Analyze users' evaluations of color, brightness, contrast, HDR effect, motion compensation, etc.; B. Sound effect experience: Analyze users' feedback on sound quality, volume, surround sound effect, built-in speakers, etc.; C. System and smart functions: Analyze users' evaluations of operating system smoothness, boot speed, interface design, application ecosystem, screen casting function, boot advertisements, etc.; D. Hardware and design: Analyze users' views on appearance design, screen bezel, stand, number and layout of interfaces, ease of use of remote control, etc.; E. Price and cost performance: Combine user ratings to analyze price satisfaction and product cost performance perception; F. Service quality: Analyze user feedback on the performance of TV products. (3) Visualization requirements: A. Use the Chart.js library to create interactive charts; B. Select appropriate charts for different dimensions (e.g., use pie charts for sentiment distribution, radar charts or bar charts for each dimension rating, and bar charts for problem frequency); C. Use professional and coordinated color schemes to ensure that all charts have clear titles and labels. (4) Report structure requirements: A. Executive summary: Summarize the core findings, main advantages and key improvements in the core dimensions, and indicate "This analysis is based on a statistically representative sample of user reviews"; B. Overview of key indicators: Display overall satisfaction, negative review rate, positive review rate of core dimensions, etc. in the form of a dashboard; C. Detailed dimension analysis: Display data charts and in-depth insights according to the above 6 core dimensions; D. Conclusions and action recommendations: Based on the analysis results, propose specific and actionable recommendations for product improvement, marketing and after-sales service. (5) Report format requirements: A. HTML report format: Use modern and responsive design (integrated Bootstrap) 5. CSS), The report title should be "In-depth Analysis Report on User Reviews of TV Products," and the analysis results of each of the above dimensions should be presented in sections; B. Output format requirements: Generate complete, stand-alone HTML code, which must embed: all necessary CSS styles (Bootstrap 5 CDN link), Chart.js library reference, Chart.js library reference, and professional analysis text and business insights for TV products.

[0081] It should be understood that the primary language model can be an open-source language model from the network that has better context reading and generation capabilities.

[0082] Step S302: Input the analysis prompts and comment sample set into the first language model to obtain a global analysis report, and use the global analysis report as the comment analysis report.

[0083] It should be noted that the system can directly call upon built-in analysis prompts and input these prompts along with the aforementioned comment sample set into the first language model. Based on this comment sample set, the first language model generates a global analysis report according to the analysis prompts. This global analysis report can be a complete Hypertext Markup Language (HTML) document, including all elements such as style definitions, chart library references, data visualization scripts, and analysis conclusion text, and can be directly rendered into a professional report with rich graphics and text in a browser.

[0084] In its implementation, after the first major language model outputs a global analysis report, the system can directly upload it as a comment analysis report to the associated cloud storage service (such as AWS S3) and obtain a publicly accessible Uniform Resource Locator (URL). Then, the system's interactive interface returns the URL link of the final global analysis report to the user. Clicking this URL link allows the user to view the global analysis report through the associated webpage.

[0085] Furthermore, in order to generate more personalized analysis reports based on different user analysis needs, after the first language model generates the global analysis report, it can be personalized based on the current user need prompts. Therefore, the step of using the global analysis report as the comment analysis report can also include: obtaining the current user need prompts, which are used to guide the first language model to update the generated global analysis report to obtain a demand analysis report; inputting the current user need prompts into the first language model to obtain the demand analysis report, and using the demand analysis report as the comment analysis report.

[0086] It should be noted that the current user need prompt can be natural language text freely written by the user, expressing the user's specific focus on the analysis report (personal focus dimension, specific time range) or personalized preferences (such as output tone, personal visualization preferences). For example, the current user need prompt could be: "Analyze the main complaints and positive reviews of users regarding the price of TV model A in the past month, and output them in a pie chart."

[0087] It should be understood that the current user demand prompts can be entered by the user through the interactive interface provided by the system at the same time as the user uploads the platform comment file to the system. Furthermore, in the comment analysis task, the system can input the current user demand prompts into the first large language model simultaneously with the aforementioned input of the analysis prompts and comment sample set into the first large language model, or it can input them into the first large language model after generating the aforementioned global analysis report. This embodiment does not impose any limitations on this.

[0088] In its implementation, after generating a global analysis report based on the analysis prompts, the first language model can further personalize the global analysis report based on the current user needs prompts: performing modifications such as adding or deleting content, adjusting the structure, and re-highlighting key points, thereby obtaining a needs analysis report focused on the user's personalized needs; this needs analysis report can then be used as the final comment analysis report displayed to the user.

[0089] This embodiment, after obtaining a sample set of comments, guides the primary language model through analysis prompts to complete the end-to-end generation of a complete report document from data input in one go, without requiring manual intervention in data interpretation, chart creation, or report writing. This significantly improves the efficiency and standardization of the analysis report output. Furthermore, it allows users to update the generated global analysis report in real time using natural language commands, enabling the report content to dynamically focus on current business priorities without re-executing the entire sample analysis process. This maintains high efficiency while achieving flexible expansion from standardized to customized analysis, which is beneficial for meeting diverse information needs in different decision-making scenarios.

[0090] Furthermore, you can also refer to this section. Figure 4 This document explains the entire process of the user comment analysis method used in this application. Figure 4 This is a schematic diagram illustrating the entire process of the user comment analysis method used in this application.

[0091] exist Figure 4 In this paper, the comment analysis task can be divided into two main stages and other auxiliary steps. The two main stages include: Stage 1: Large model-driven batch intelligent sampling; Stage 2: Large model-driven in-depth report analysis.

[0092] Depend on Figure 4 As can be seen, firstly, users who need comment analysis can upload a platform comment file in CSV / Excel format through the interactive interface provided by the system, and fill in personalized prompts for their current user needs.

[0093] After receiving the platform's comment file, the system can call the data parsing engine to parse it and perform data normalization processing (duplicate removal, null value handling, etc.) to obtain structured data, namely the user comment dataset, which includes several standard comment records.

[0094] The system performs a counting operation on the user comment dataset to determine the total number of standard comment records as the total number of comments N. Then, it compares the total number of comments N with a preset sampling trigger threshold M. Figure 4 The value is set to 1000 for comparison to determine whether stratified sampling should be performed on the user review dataset.

[0095] If N > M, then stratified sampling is required, which involves the steps in Phase 1: The system divides the user comment dataset into several batches of comment data subsets based on the current batch division settings; then, it sets the batch sampling count T for each batch of comment data subsets; it calls the corresponding number of second-large language models based on the current batch size K; it inputs preset sampling prompts for each batch of comment data subsets into each large language model; guided by the preset sampling prompts, the second-large language model performs stratified sampling on each comment data subset according to preset category conditions; finally, it outputs the sampled sets for each batch. Integrating the sampled sets from each batch yields the comment sample set.

[0096] If N If M is selected, then it is determined that no stratified sampling operation is required, and the user comment dataset can be directly used as the comment sample set.

[0097] Among them, the number of comment records in the comment sample set is no greater than the aforementioned preset sampling trigger threshold M.

[0098] After determining the comment sample set, the first language model can be invoked, which is the second step in the execution phase: injecting analysis prompts, the aforementioned current user demand prompts, and the comment sample set into the first language model; the first language model generates a global analysis report based on the analysis prompts, and updates and optimizes the global analysis report based on the current demand prompts to obtain an HTML format comment analysis report.

[0099] Finally, the system can upload the HTML-formatted comment analysis report to a cloud storage service to convert it into an accessible URL; then, the system's interactive interface will return this URL to the user so that the user can click to view the comment analysis report.

[0100] The method proposed in this application does not require full processing of the original comment data. It significantly reduces the amount of data entering the deep analysis stage through adaptive sampling decision-making. At the same time, it guides the large language model to complete sample extraction, report generation, and requirement updates end-to-end through prompt words. It eliminates the need for manual intervention in data cleaning, report writing, and visualization, effectively reducing the degree of manual intervention and time cost in the analysis process, and improving the efficiency of user comment analysis and the convenience of report delivery.

[0101] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the user comment analysis method of this application. Any simple modifications based on this technical concept are within the protection scope of this application.

[0102] This application also provides a user review analysis device; please refer to... Figure 5 , Figure 5 This is a schematic diagram of the module structure of the user comment analysis device of this application. The user comment analysis device includes: The data acquisition module 501 is used to acquire the user comment dataset and determine the total number of comments corresponding to the user comment dataset; The intelligent sampling module 502 sets the current batch division based on the total number of comments, and performs stratified sampling on the user comment dataset by combining the total number of comments and the current batch division to obtain a comment sample set; The model analysis module 503 is used to input the comment sample set into the first large language model to obtain a comment analysis report. The first large language model is a pre-configured large language model used to generate a comment analysis report based on the input.

[0103] The user review analysis device provided in this application, employing the user review analysis method described in the above embodiments, can solve the technical problem of low efficiency in analyzing user review data caused by existing user review analysis methods. Compared with the prior art, the beneficial effects of the user review analysis device provided in this application are the same as those of the user review analysis method provided in the above embodiments, and other technical features in the user review analysis device are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.

[0104] This application also provides a user review analysis device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the user review analysis method in Embodiment 1 above.

[0105] The following is for reference. Figure 6 , Figure 6This is a schematic diagram of the user review analysis device of this application. The user review analysis device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable media players (PMPs), and fixed terminals such as digital TVs and desktop computers. Figure 6 The user review analysis device shown is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.

[0106] like Figure 6 As shown, the user review analysis device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.) that can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 1002 or a program loaded from storage device 1003 into random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the user review analysis device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via a bus 1005. An input / output (I / O) interface 1006 is also connected to the bus. Typically, the following systems can be connected to the I / O interface 1006: input devices 1007 including, for example, touchscreens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, etc.; output devices 1008 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 1003 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009. Communication device 1009 allows the user review analysis device to communicate wirelessly or wiredly with other devices to exchange data. While the figure shows user review analysis devices with various systems, it should be understood that implementing or having all of the systems shown is not required. More or fewer systems may be implemented alternatively.

[0107] The user review analysis device provided in this application, employing the user review analysis method described in the above embodiments, can solve the technical problems of user review analysis. Compared with the prior art, the beneficial effects of the user review analysis device provided in this application are the same as those of the user review analysis method described in the above embodiments, and other technical features of the user review analysis device are the same as those disclosed in the previous embodiment method, and will not be repeated here.

[0108] This application also provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the user comment analysis method in the above embodiments.

[0109] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, system, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, radio frequency (RF), etc., or any suitable combination thereof.

[0110] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described user review analysis method, and is capable of solving the technical problems of the user review analysis method. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as the beneficial effects of the user review analysis method provided in the above embodiments, and will not be repeated here.

[0111] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other elements in the process, method, article, or system that includes that element.

[0112] The above embodiment numbers are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments. They are only some embodiments of this application and do not limit the scope of this application. All equivalent structural transformations made under the technical concept of this application and using the content of this application specification and drawings, or direct / indirect applications in other related technical fields, are included within the protection scope of this application.

Claims

1. A user review analysis method, characterized in that, The method includes: Obtain the user comment dataset and determine the total number of comments corresponding to the user comment dataset; The current batch is set based on the total number of comments, and the user comment dataset is stratified and sampled by combining the total number of comments and the current batch to obtain a comment sample set; The comment sample set is input into the first large language model to obtain a comment analysis report. The first large language model is a pre-configured large language model used to generate comment analysis reports based on the input.

2. The method as described in claim 1, characterized in that, The step of setting the current batch segment based on the total number of comments, and performing stratified sampling on the user comment dataset by combining the total number of comments and the current batch segment to obtain a comment sample set includes: The current batch is set based on the total number of comments, and the user comment dataset is divided into several batches of comment data subsets based on the current batch. Obtain preset sampling prompt words and send the preset sampling prompt words and each of the comment data subsets to at least one second largest language model to obtain the sampled sample sets returned by each second largest language model. The preset sampling prompt words are used to guide each second largest language model to perform stratified sampling operations on each of the comment data subsets according to preset category conditions. By integrating the various sample sets, a comment sample set is obtained.

3. The method as described in claim 1, characterized in that, The step of setting the current batch based on the total number of comments, and performing stratified sampling on the user comment dataset by combining the total number of comments and the current batch to obtain a comment sample set, further includes: The current batch is set based on the total number of comments, and the user comment dataset is divided into several batches of comment data subsets based on the current batch. In each of the aforementioned comment data subsets, a target hierarchical column is determined. The target hierarchical column is a sequence of numbers in each of the aforementioned comment data subsets that satisfies a preset category condition. Each sequence includes comments of several categories. Obtain the number of comments corresponding to different categories in the target hierarchical column, and determine the proportion of each category based on the number of comments. In each of the aforementioned comment data subsets, comments belonging to each of the aforementioned categories are extracted according to the aforementioned proportions to obtain the sample set corresponding to each of the aforementioned comment data subsets; By integrating the various sample sets, a comment sample set is obtained.

4. The method as described in claim 1, characterized in that, The step of inputting the comment sample set into the first large language model to obtain a comment analysis report further includes: Obtain analysis prompts, which are used to guide the first large language model to generate a global analysis report, including at least: report analysis dimension requirements, report structure requirements, and report format requirements; The analysis prompts and the comment sample set are input into the first large language model to obtain a global analysis report, which is then used as the comment analysis report.

5. The method as described in claim 4, characterized in that, The step of using the global analysis report as a comment analysis report includes: Obtain the current user demand prompt, which is used to guide the first large language model to update the generated global analysis report to obtain a demand analysis report; The current user demand prompts are input into the first language model to obtain a demand analysis report, which is then used as a comment analysis report.

6. The method as described in claim 1, characterized in that, The step of obtaining the user comment dataset and determining the total number of comments corresponding to the user comment dataset includes: Obtain the uploaded platform review file, which is a file exported by the sales platform through an interface and contains all user reviews related to the TV product on the sales platform; The data parsing engine is invoked to parse the platform's comment files, obtaining the original set of user comment texts; The original user comment text set is standardized to obtain a user comment dataset; The number of user comments contained in the user comment dataset is determined as the total number of comments.

7. The method as described in claim 1, characterized in that, Before the step of setting the current batch division based on the total number of comments, and performing stratified sampling of the user comment dataset by combining the total number of comments and the current batch division to obtain the comment sample set, the method further includes: The total number of comments is compared with a preset sampling trigger threshold; When the total number of comments is less than or equal to the preset sampling trigger threshold, the user comment dataset is determined as the comment sample set; Accordingly, the step of setting the current batch division based on the total number of comments, and performing stratified sampling of the user comment dataset by combining the total number of comments and the current batch division to obtain a comment sample set includes: When the total number of comments exceeds the preset sampling trigger threshold, the current batch is set according to the total number of comments, and the user comment dataset is stratified and sampled by combining the total number of comments and the current batch to obtain a comment sample set.

8. A user review analysis device, characterized in that, The device includes: The data acquisition module is used to acquire user comment datasets and determine the total number of comments corresponding to the user comment datasets; The intelligent sampling module sets the current batch division based on the total number of comments, and performs stratified sampling on the user comment dataset by combining the total number of comments and the current batch division to obtain a comment sample set; The model analysis module is used to input the comment sample set into the first large language model to obtain a comment analysis report. The first large language model is a pre-configured large language model used to generate a comment analysis report based on the input.

9. A user review analysis device, characterized in that, The device includes: a memory, a processor, and a user review analysis program stored in the memory and executable on the processor, the user review analysis program being configured to implement the steps of the user review analysis method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that, The storage medium is a computer-readable storage medium, and the storage medium stores a user review analysis program, which, when executed by a processor, implements the steps of the user review analysis method as described in any one of claims 1 to 7.