Intelligent comment measurement method, system, electronic device and storage medium for public landscape perception quantification
By standardizing massive volumes of landscape comments, identifying their quality, and screening high-value samples from them, a multi-level perception quantification model is constructed. This addresses insufficient quantification granularity, the lack of comment-quality stratification, and the lack of closed-loop optimization in existing technologies, and enables efficient, stable quantification of public landscape perception in support of management decisions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- BEIJING FORESTRY UNIVERSITY
- Filing Date
- 2026-04-20
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to quantify public landscape perception in a unified, stable way across massive volumes of landscape reviews. They also lack stratification of review quality, use key samples inefficiently, and have no closed-loop optimization mechanism.
Massive comment collections are standardized, comment quality is identified and stratified, and high-value samples are automatically selected for targeted supplementation. These steps build a multi-level perception quantification model, which is validated on a frozen external test set. The result is a public landscape perception quantification applicable to scenic spots and regions.
It quantifies massive comment collections efficiently and stably, improves the model's ranking consistency and scoring stability in key boundary intervals, and reduces the need for manual intervention. It also provides a quantitative basis for landscape evaluation and management decision-making.
Figure: abstract drawing (CN122309749A)
Abstract
Description
Technical Field
[0001] This invention relates to the fields of natural language processing, machine learning, public perception computing, and landscape evaluation. In particular, it relates to a method and system for the automated processing, hierarchical quality filtering, targeted sample supplementation, and multi-level perception quantification of massive landscape comment data. It is applicable to public perception measurement, satisfaction analysis, and management decision support for urban parks, tourist attractions, urban green spaces, natural landscapes, and related public spaces.
Technical Background
In recent years, various patented technologies and algorithms have been proposed for comment sentiment analysis, comment rating prediction, and quantification of public subjective sentiment. When the goal is to quantify public perception from massive landscape comments, however, existing technologies still have clear shortcomings in four areas: quantification granularity, comment quality control, efficiency of key-sample utilization, and closed-loop model optimization:
1. Insufficient quantification granularity makes it difficult to meet the need for refined measurement of public landscape perception. Most mainstream patent solutions focus on sentiment identification, sentiment classification, or recommendation rating prediction. For example, CN111353044B mainly uses sentiment dictionaries, rules, and classification logic to analyze comment sentiment. CN113591487A mainly applies sentence segmentation, word segmentation, sentiment identification, cluster analysis, and co-word matrix correction to produce cognitive feedback from tourist attraction reviews. CN105701229A combines comment sentiment analysis with collaborative filtering for rating prediction in recommendation systems.
These solutions can extract comment sentiment or estimate ratings to some extent, but they generally remain focused on "positive/negative judgment" or "user-item rating prediction." None has established an ordered, aggregatable, multi-level (1 to 7) quantification mechanism for landscape review texts, so they cannot meet the practical need for refined, continuous measurement of public landscape perception.
[0002] 2. The lack of a quality stratification mechanism for comments means that low-information and weakly relevant comments can easily interfere with the quantitative results. Existing patented technologies typically input comment text directly into sentiment analysis models or rating prediction models, rarely establishing independent identification and stratification mechanisms for differences in comment quality. However, in landscape commentary scenarios, there are a large number of "low-information short comments," "weakly relevant comments," and "invalid comments," such as those containing only simple attitude expressions, only details about ticket purchases or services, or even purely symbolic or obviously off-topic content. Without pre-identification and stratification of these comments, low-quality text can easily be mixed with high-information text in the training and prediction process, thus affecting the model's learning effect and quantitative stability regarding public landscape perception. While existing technologies such as CN111353044B and CN113591487A involve comment data acquisition, preprocessing, and sentiment classification, their focus remains on sentiment analysis itself, without disclosing specific identification, triage, and weight control mechanisms for comment quality levels.
[0003] 3. The lack of automatic screening of key boundary samples makes sample supplementation inefficient and prevents targeted optimization of the model's weak areas. In multi-level ordered quantification tasks, the samples a model most easily confuses usually lie near the boundaries between adjacent ratings, such as reviews near the 4 / 5, 5 / 6, and 6 / 7 score boundaries. Most existing patents disclose no mechanism for automatically ranking unlabeled reviews and selecting high-value samples by model uncertainty, boundary proximity, and sample scarcity. Nor do they establish an optimization path of automatic selection, targeted labeling, and retraining. For example, CN105701229A focuses on rating prediction within a collaborative filtering framework, CN111353044B on dictionary- and rule-driven sentiment classification, and CN113591487A on sentiment analysis and cognitive feedback for travel reviews. None of these specifically reinforces the boundary samples the model most easily confuses. As a result, none can substantially improve the model's ability to distinguish adjacent score intervals at limited labeling cost.
[0004] 4. The lack of a complete closed-loop optimization system for quantifying public perception from massive comments makes it difficult to balance ranking consistency, rating stability, and large-scale applicability. Some existing patents, such as CN114565300A, propose "quantifying public subjective emotions." However, they target urban images, and their technical approach is built on image encoders and image sentiment index models, which do not carry over to comment text. CN117079124A emphasizes the quantification and enhancement of urban and rural landscape imagery. It relies mainly on image and scene classification and spatial cognitive analysis, and likewise builds no public perception quantification model for massive comment text. Overall, existing technologies have not formed a complete technical chain covering comment quality identification, massive comment stratification, and automatic mining of high-value samples. The chain would continue through targeted labeling, iterative model training, validation on a frozen external test set, and quantification output for all comments. Therefore, for massive landscape comment collections, existing solutions still cannot simultaneously achieve fine-grained public perception quantification, consistent ranking, stable scoring, and engineering applicability.
[0005] Therefore, a new intelligent measurement method for public landscape perception quantification is urgently needed. It must address insufficient quantification granularity, the lack of comment-quality stratification, inefficient use of key samples, and the lack of a closed-loop optimization mechanism in existing technologies. It should thereby achieve automated, refined, and aggregatable quantitative representation of massive landscape comment texts.
Summary of the Invention
[0006] Technical problem to be solved by the present invention
In existing technologies, it is difficult to achieve unified, stable, and aggregatable quantification of public landscape perception from massive landscape reviews, and this invention addresses that difficulty. It also addresses three further problems in massive review collections. High-information, low-information, and weakly relevant reviews are hard to distinguish. Key boundary samples are supplemented inefficiently. Closed-loop optimization and automated output mechanisms are lacking. The invention proposes an intelligent review measurement method and system for quantifying public landscape perception. The method standardizes massive reviews, identifies review quality, and automatically screens high-value samples for targeted supplementation. It then iteratively trains the model and outputs quantification results for the full review set, producing public landscape perception quantification results for scenic spots, regions, and other research objects.
[0007] Technical solution of the present invention
To address the aforementioned technical problems, this invention proposes an intelligent measurement method for public landscape perception quantification. The method performs unified multi-level perception quantification on massive landscape comments and outputs structured results. It includes the following steps:
S1: Original Comment Acquisition and Standardization Processing
The system acquires the comment data to be analyzed from a predetermined directory, database, or online platform and performs format identification and filtering. It then executes field standardization, null-value removal, duplicate removal, and text standardization to obtain standardized comment data. Preferably, each input text is uniformly constructed as a standardized expression that combines the location context with the comment body. This enhances the model's ability to associate comment semantics with landscape objects.
[0008] S2: Initial Quantification Sample Construction and Soft Label Aggregation
Comments are sampled from the standardized comment data to build an initial public landscape perception annotation set. Each sampled comment is assigned a public landscape perception quantification label on a preset number of levels. Preferably, the label is an ordered rating from 1 to 7, where lower scores indicate lower perception evaluations and higher scores indicate higher ones.
[0009] Duplicate samples with identical or equivalent comment content and location information in the initial public landscape perception annotation sample set are aggregated. For samples with multiple quantization results, in addition to determining the main label, a soft label distribution of the sample at each quantization level is further constructed to describe the quantization uncertainty of the sample in the boundary interval. Preferably, the corresponding probability distribution vector is generated by statistically normalizing the multiple quantization results for subsequent model training.
[0010] S3: Comment Quality Identification and Massive Comment Stratification
A comment quality identification model is constructed to classify comments by quality. Preferably, comment quality is divided into at least the following categories: invalid or weakly relevant comments, low-information but quantifiable comments, and high-information valid comments. Invalid or weakly relevant comments are purely symbolic, garbled, or obviously off-topic, or concern only content weakly related to the landscape, such as ticketing, customer service, or accommodation. Low-information but quantifiable comments contain limited information but still reflect a basic attitude. High-information valid comments contain a relatively complete description of the landscape experience and can consistently support quantitative judgments of public landscape perception. The quality model is then applied to massive unlabeled comment data. For each comment it outputs the quality category and the corresponding category probabilities, yielding a stratified comment quality result. Preferably, each comment also receives a quality training weight derived from its quality category and prediction confidence. This weight controls sample weighting in the subsequent training of the public landscape perception quantification model.
[0011] S4: Automatic Screening and Relabeling of High-Value Samples
High-value samples are automatically screened from the massive comment set. The screening uses the current public landscape perception quantification model's predictions for unlabeled comments together with the comment quality stratification results. It is based on one or more of the following factors: model uncertainty, quantification boundary proximity, comment quality category, and scarcity of low-scoring samples. Preferably, model uncertainty is characterized by one of three measures: the entropy of the predicted probability distribution, the difference between the highest and second-highest probabilities, or the dispersion of the distribution. Boundary proximity is characterized by the distance between a comment's continuous predicted score and preset key boundaries.
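The uncertainty and boundary-proximity indicators named above can be computed along the lines of the minimal Python sketch below. It is an illustration only, not the patent's implementation. It assumes the model outputs a probability vector over levels 1 to 7. It also assumes the 4 / 5, 5 / 6 and 6 / 7 boundaries sit at the midpoints 4.5, 5.5 and 6.5, which is a hypothetical choice. The function names are illustrative.

```python
import math

# Hypothetical: the "4 / 5", "5 / 6" and "6 / 7" key boundaries are placed at
# the midpoints 4.5, 5.5 and 6.5 of the continuous 1-7 score.
KEY_BOUNDARIES = (4.5, 5.5, 6.5)


def entropy(probs):
    """Shannon entropy (natural log) of a predicted level distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def top2_margin(probs):
    """Difference between the highest and second-highest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def dispersion(probs):
    """Variance of the level under the distribution (levels numbered 1..len(probs))."""
    levels = range(1, len(probs) + 1)
    mean = sum(k * p for k, p in zip(levels, probs))
    return sum(p * (k - mean) ** 2 for k, p in zip(levels, probs))


def boundary_distance(score, boundaries=KEY_BOUNDARIES):
    """Distance from a continuous predicted score to the nearest key boundary."""
    return min(abs(score - b) for b in boundaries)
```

Higher entropy or dispersion and a smaller margin all indicate a less certain prediction. A smaller boundary distance indicates a comment closer to a key boundary.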
[0012] The screening results form a high-value supplementary labeling sample set. Preferably, this set can be further divided into one or more of the following categories:
- high-score boundary samples with high uncertainty among high-information comments;
- medium-to-high-score boundary samples with high uncertainty among high-information comments;
- low-score key samples;
- low-information samples on which the model is highly hesitant.
The samples in this set are then labeled with public landscape perception quantification labels to form a newly labeled sample set.
[0013] S5: Iterative Training of the Quantification Model and Determination of the Formal Model
The newly labeled sample set is merged into the initial public landscape perception annotation sample set. Duplicate-sample aggregation, soft label construction, comment quality prediction, and training weight calculation are then re-executed to form an updated training sample set. The public landscape perception quantification model is iteratively trained on this updated set to obtain a candidate quantification model. Preferably, each training sample's comprehensive weight includes at least three components: a comment quality weight, a duplicate-sample correction weight, and a high-value supplementary labeling enhancement weight. Further, a frozen external test set is constructed to evaluate the candidate quantification model independently. The formal public landscape perception quantification model is determined from the evaluation results.
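One way for the comprehensive sample weight and the soft labels to enter training is a weighted soft-label cross-entropy loss. The sketch below makes two assumptions that the text does not specify: the three weight components combine multiplicatively, and the loss is cross-entropy against the soft label distribution. The function names are hypothetical. The duplicate-sample correction weight is passed in as a plain number because the text does not give its form.

```python
import math


def composite_weight(quality_w, duplicate_w, supplement_w):
    """Assumed multiplicative combination of the three weight components."""
    return quality_w * duplicate_w * supplement_w


def weighted_soft_ce(pred_probs, soft_labels, weights, eps=1e-12):
    """Weighted mean of -sum_k q_k * log(p_k) over samples.

    pred_probs / soft_labels: per-sample level distributions; weights: per-sample
    comprehensive weights. eps guards against log(0).
    """
    total, wsum = 0.0, 0.0
    for p, q, w in zip(pred_probs, soft_labels, weights):
        ce = -sum(qk * math.log(max(pk, eps)) for pk, qk in zip(p, q))
        total += w * ce
        wsum += w
    return total / wsum if wsum > 0 else 0.0
```

Training against the soft label preserves a sample's boundary uncertainty. For example, a comment rated 6, 6 and 7 still pushes some probability mass toward level 7.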
[0014] S6: Automatic Quantification Output and Aggregation Analysis of Massive Comments
The formal public landscape perception quantification model performs batch prediction on the massive comment data. For each comment it outputs the integer perception level, the continuous perception value, and the probability distribution over levels. Preferably, these outputs are aggregated by scenic spot, region, route, image object, or other preset research object to form public landscape perception quantification indicators. The indicators support subsequent landscape evaluation, public preference analysis, planning and management, and decision support.
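The three per-comment outputs and an object-level aggregation might be produced as sketched below. The text does not say how the integer level and the continuous value are derived. This sketch assumes the integer level is the most probable level and the continuous value is the probability-weighted mean level. It aggregates by simple averaging. All names are illustrative.

```python
from collections import defaultdict


def comment_outputs(probs):
    """Return (integer level, continuous value) for one 1..7 level distribution.

    Assumptions: integer level = arg-max level (first one on ties);
    continuous value = probability-weighted mean level.
    """
    level = max(range(len(probs)), key=lambda i: probs[i]) + 1
    score = sum((i + 1) * p for i, p in enumerate(probs))
    return level, score


def aggregate_by_object(rows):
    """rows: iterable of (research_object, probs). Returns per-object summaries."""
    groups = defaultdict(list)
    for obj, probs in rows:
        groups[obj].append(probs)
    summary = {}
    for obj, dists in groups.items():
        n = len(dists)
        mean_dist = [sum(d[k] for d in dists) / n for k in range(len(dists[0]))]
        mean_score = sum(comment_outputs(d)[1] for d in dists) / n
        summary[obj] = {"n": n, "mean_score": mean_score, "mean_distribution": mean_dist}
    return summary
```

Averaging the distributions keeps the share of low and high ratings per object visible alongside the mean score.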
[0015] Based on the above method, the present invention further provides an intelligent measurement system for public landscape perception quantification, used to execute the method. The system comprises:
1. A comment preprocessing module, used to read massive comment data, identify fields, remove null values, remove duplicate samples, and standardize text;
2. An initial annotation set construction and soft label aggregation module, used to construct a training sample set with multi-level public landscape perception quantification labels and to aggregate the multiple quantification results of repeated samples into corresponding soft label distributions;
3. A comment quality identification and stratification module, used to classify comments by quality and output the comment quality category, probability information, and quality weight;
4. A high-value sample mining and labeling module, used to automatically screen high-value samples for labeling from massive comments and to support their subsequent re-quantification and labeling. Screening is based on model uncertainty, boundary proximity, comment quality category, and scarcity of low-scoring samples;
5. A model training and evaluation module, used to merge newly added supplementary samples with the original training samples, iteratively train the public landscape perception quantification model, and determine the formal model on the frozen external test set;
6. A quantification output and aggregation analysis module, used to output integer levels, continuous scores, and probability distributions for massive reviews with the formal model, and to support aggregation analysis by scenic spot, region, or other research object;
7. An optional human-computer interaction module, used to receive control parameters and coordinate the execution flow of the above modules. The parameters include input paths, output paths, sample screening parameters, and quantification level parameters.
[0016] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it is configured to implement the steps of the aforementioned intelligent measurement method for quantifying public landscape perception. The present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the aforementioned method.
[0017] Beneficial effects
Compared with the prior art, the present invention has the following beneficial effects:
1. Massive comment data is uniformly read, standardized, and automatically quantified. This enables public perception measurement across large-scale landscape comments, significantly improving processing efficiency and reducing the need for manual intervention;
2. A comment quality identification mechanism divides comments into invalid or weakly relevant comments, low-information but quantifiable comments, and high-information valid comments. This reduces the interference of low-quality comments with the training and prediction of the quantification model and improves the stability of the quantification results;
3. High-value samples are screened automatically based on model uncertainty, boundary proximity, comment quality category, and scarcity of low-scoring samples. New supplementary labeling resources therefore go first to the key regions the model most easily confuses, significantly improving labeling efficiency and the value of each labeled sample;
4. High-value samples are supplemented in a targeted way, and the model is iteratively trained after the new samples are merged with the original ones. This improves the ranking consistency and scoring stability of the public landscape perception quantification model in key boundary intervals;
5. Repeated samples are aggregated into soft labels instead of being compressed into a single label. The uncertainty information at quantification boundaries is thus preserved, improving the model's adaptability to fuzzy and boundary samples;
6. The formal model is validated on a frozen external test set. This avoids the evaluation distortion caused by repeatedly tuning parameters on the internal validation set alone, making the quantification model more generalizable and reliable in application;
7. Integer perception levels, continuous perception values, and probability distributions are output for each comment. These form quantitative indices of public landscape perception for scenic spots, regions, and other research objects, providing a quantitative basis for landscape evaluation, public preference research, urban park management, and tourism decision-making;
8. Modular design and an optional human-computer interaction mode package comment preprocessing, quality identification, sample screening, model training, and quantification output into an operable tool. This lowers the barrier to use and enhances the invention's application value for research institutions, planning departments, and related industries.
Brief Description of the Drawings
[0018] The accompanying drawings of this invention are used to illustrate the technical solutions of this invention and are only for the purpose of assisting in understanding this invention. They do not constitute a limitation on the scope of protection of this invention. For those skilled in the art, other drawings or equivalent modifications to the drawings can be obtained based on these drawings without any creative effort.
[0019] Figure 1 is a schematic diagram of the overall process of the intelligent measurement method for public landscape perception quantification provided by the present invention;
Figure 2 is a schematic flow diagram of an embodiment of comment quality identification in the present invention;
Figure 3 is a schematic diagram of an embodiment of automatic high-value sample screening in the present invention;
Figure 4 is a schematic diagram of an embodiment of iterative model training and formal model determination in the present invention;
Figure 5 is a schematic diagram of the architecture of the intelligent measurement system for public landscape perception quantification provided by the present invention;
Figure 6 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description of Embodiments
To make the objectives, technical solutions, and beneficial effects of this invention clearer, its embodiments are further described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this invention. Equivalent modifications and substitutions that those skilled in the art make on the basis of these embodiments without creative effort all fall within the protection scope of this invention.
[0020] Example 1: An Intelligent Comment Measurement Method for Public Landscape Perception Quantification
As shown in Figure 1, this embodiment provides an intelligent measurement method for public landscape perception quantification, including the following steps:
S1: Original Comment Acquisition and Standardization Processing
In this embodiment, user-specified comment data is the object to be processed. This data can come from tourism platforms, lifestyle service platforms, social media platforms, Q&A platforms, or other information carriers that reflect public landscape experiences and subjective perceptions. Preferably, the original comment data includes at least the comment text, location information, and comment identifier. In one implementation, it may further include fields such as comment time, user identifier, image identifier, attraction name, attraction number, and comment-image pairing.
[0021] The original comment data undergoes format identification and filtering, followed by standardization. The standardization process includes at least:
S11: Read comment data files or database records and identify the comment content field, location field, unique comment identifier field, and other auxiliary fields;
S12: Remove empty comments, garbled comments, unparseable records, and samples with missing key fields;
S13: Standardize the comment text, including but not limited to removing extra spaces, unifying punctuation, unifying encoding, and text cleaning;
S14: Standardize location information, for example by unifying the names of cities, scenic spots, parks, or other spatial objects;
S15: Perform preliminary duplicate removal. Preferably, "comment content + location information" is used as the joint key to identify duplicate records, so that the same comment does not enter the subsequent quantification process more than once;
S16: Combine the comment text with the location information to generate a standardized text expression for model input. Preferably, the structure is "City: X. Comment: Y" or "Attraction: X. Comment: Y". This enhances the model's ability to associate landscape objects with comment content.
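Steps S12 to S16 can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation. Records are assumed to be dicts with `comment` and `location` keys. `LOCATION_ALIASES` is a hypothetical alias table for S14. S11 field detection and garbled-text detection are omitted.

```python
import re
import unicodedata

# Hypothetical alias table for S14; a real deployment would supply its own.
LOCATION_ALIASES = {"Peking": "Beijing"}


def normalize_text(text):
    """S13 sketch: NFKC folds full-width characters (e.g. U+FF01 to '!'),
    the ideographic full stop U+3002 is mapped to '.', and whitespace is collapsed."""
    text = unicodedata.normalize("NFKC", text).replace("\u3002", ".")
    return re.sub(r"\s+", " ", text).strip()


def standardize(records, prefix="City"):
    """S12-S16 sketch over dict records with 'comment' and 'location' keys."""
    seen, out = set(), []
    for rec in records:
        comment = normalize_text(rec.get("comment") or "")
        location = normalize_text(rec.get("location") or "")
        location = LOCATION_ALIASES.get(location, location)   # S14: unify names
        if not comment or not location:                       # S12: drop empty/missing
            continue
        key = (comment, location)                             # S15: joint dedup key
        if key in seen:
            continue
        seen.add(key)
        out.append({**rec, "comment": comment, "location": location,
                    # S16: location context + comment body as the model input
                    "model_input": f"{prefix}: {location}. Comment: {comment}"})
    return out
```

Deduplication runs after normalization and alias mapping, so comments that differ only in spacing, full-width punctuation, or place-name spelling collapse to one record.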
[0022] Through the above steps, standardized comment data with uniform format, standardized content, and suitable for subsequent quantitative analysis can be obtained, providing basic data for the construction of quantitative samples of public landscape perception and model training.
[0023] S2: Initial Quantification Sample Construction and Soft Label Aggregation
A portion of the standardized comment data obtained in step S1 is sampled to construct an initial public landscape perception quantification sample set. Each sampled comment is assigned a perception quantification label at a preset level.
[0024] In this embodiment, the quantitative label uses an ordered scoring system of 1 to 7 levels, where lower scores represent lower public perception evaluations of the landscape, and higher scores represent higher public perception evaluations. Preferably, the scoring comprehensively considers factors such as landscape aesthetics, environmental experience, spatial comfort, visitor experience, and overall satisfaction, ultimately forming a unified quantitative result for perception. Furthermore, the scoring reasons, supporting explanations, or manual review comments can be recorded for each sample to improve labeling consistency and traceability.
[0025] The same review may be labeled repeatedly at different stages, or receive several ratings over multiple rounds of optimization. This embodiment therefore further aggregates duplicate samples. The process may include:
S21: Group duplicate samples using comment content and location information as joint matching conditions;
S22: Statistically analyze the multiple quantification results of each group to determine the group's main quantification label. Preferably, the mode, the median, or a rule-corrected main score is used as the main label;
S23: While retaining the main label, count the frequency of the sample at each quantification level and normalize the counts to generate the corresponding soft label distribution;
S24: Save the soft label distribution together with the main label as input for subsequent model training.
[0026] For example, in one implementation, if a comment sample is quantified to 6, 6 and 7 points respectively in multiple rounds of annotation, its main label can be determined to be 6 points. At the same time, a corresponding soft label distribution is generated to indicate that the sample is more biased towards the high perception range, but there is still some boundary uncertainty.
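The aggregation in S21 to S24 can be illustrated with a short Python sketch. It assumes annotations arrive as `(comment, location, score)` tuples. It uses the mode as the main label and falls back to the lower median on ties. The tie rule and all names are assumptions. For the 6, 6, 7 example above, it yields a main label of 6 and a soft label of 2/3 at level 6 and 1/3 at level 7.

```python
import statistics
from collections import Counter, defaultdict

NUM_LEVELS = 7  # ordered 1-7 scale from the text


def aggregate_labels(annotations):
    """annotations: iterable of (comment, location, score) with score in 1..7.

    Returns {(comment, location): {"main": int, "soft": list, "n": int}}.
    """
    groups = defaultdict(list)
    for comment, location, score in annotations:
        groups[(comment, location)].append(score)      # S21: joint matching key
    result = {}
    for key, scores in groups.items():
        counts = Counter(scores)
        top = max(counts.values())
        modes = [level for level, c in counts.items() if c == top]
        # S22: mode as main label; ties fall back to the lower median (assumption).
        main = modes[0] if len(modes) == 1 else statistics.median_low(scores)
        # S23: normalized frequency at each level 1..7 forms the soft label.
        soft = [counts.get(level, 0) / len(scores) for level in range(1, NUM_LEVELS + 1)]
        result[key] = {"main": main, "soft": soft, "n": len(scores)}  # S24
    return result
```

The count `n` is kept so that later steps, such as the duplicate-sample correction weight, can tell how many ratings stand behind each soft label.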
[0027] Through the above steps, an initial quantized sample set containing both main labels and soft labels can be constructed. Compared to schemes that only retain a single integer label, this embodiment can more fully preserve the quantization information of boundary samples and fuzzy samples, thereby improving the ability of subsequent models to distinguish adjacent perception levels.
[0028] S3: Comment Quality Identification and Massive Comment Quality Stratification
This embodiment further introduces a comment quality identification and quality stratification mechanism. It prevents low-quality, low-information, or weakly relevant comments from interfering with the training and prediction of the public landscape perception quantification model.
[0029] As shown in Figure 2, a comment quality identification model is first constructed. Preferably, comment quality is divided into the following three categories:
(1) Invalid or weakly relevant comments: comments that are purely symbolic, garbled, or obviously off-topic, or that concern only content weakly related to landscape perception, such as ticket purchase, customer service, accommodation, and transportation;
(2) Low-information but quantifiable comments: short comments containing little information that nonetheless reflect a basic attitude or overall perception;
(3) High-information valid comments: comments containing a relatively complete description of the landscape experience, environmental feelings, spatial impressions, or tour evaluation, which can stably support a quantitative judgment of public landscape perception.
[0030] In this embodiment, a batch of comment samples can be further extracted from the original comment data to construct a comment quality annotation set, and a comment quality recognition model can be trained based on the quality annotation set. The quality recognition model can be implemented using traditional machine learning models, neural network models, pre-trained language models, or combinations thereof, preferably using a text classification model with context representation capabilities.
[0031] After training, the comment quality recognition model is applied to massive unlabeled comments, specifically:
S31: Input the standardized comment text into the comment quality recognition model;
S32: Output the quality category of each comment and the probability of each category;
S33: Generate comment quality weights from the quality category and its predicted confidence. Preferably, high-information valid comments receive higher weights, low-information but quantifiable comments medium weights, and invalid or weakly relevant comments lower weights;
S34: Automatically stratify the massive comments by quality category into a high-information comment layer, a low-information comment layer, and an invalid comment layer.
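Steps S33 and S34 could look like the sketch below. The text fixes only the ordering of the weights (high, medium, low). The base weights and the rule that blends them by the predicted class probabilities are therefore hypothetical choices. The category keys are illustrative names.

```python
# Hypothetical base weights; the text only fixes their order (high > medium > low).
BASE_WEIGHT = {"high_info": 1.0, "low_info": 0.6, "invalid": 0.1}
LAYER = {"high_info": "high-information layer",
         "low_info": "low-information layer",
         "invalid": "invalid layer"}


def quality_weight_and_layer(class_probs):
    """class_probs: {category: probability} from the quality model (S32).

    S33 (assumed form): weight = probability-weighted mean of the base weights,
    so a confident high-information prediction scores close to 1.0.
    S34: the layer is the arg-max category.
    """
    category = max(class_probs, key=class_probs.get)
    weight = sum(p * BASE_WEIGHT[c] for c, p in class_probs.items())
    return category, LAYER[category], weight
```

Because the base weights are blended by probability, a comment on which the quality model is unsure receives a weight between the base weights of the competing categories.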
[0032] Through the above steps, the differences in comment quality can be automatically identified and hierarchically controlled in massive comment scenarios, thereby reducing the impact of low-quality comments on the quantitative results of public landscape perception and providing a basis for subsequent high-value sample selection.
[0033] S4: Automatic screening and re-labeling of high-value samples
After the comment quality stratification results are obtained, this embodiment further uses model uncertainty analysis and boundary proximity analysis to automatically select, from the massive unlabeled comments, the samples most worth re-labeling.
[0034] As shown in Figure 3, the specific process may include: S41: use the current public landscape perception quantification model to predict the massive unlabeled comments and obtain, for each comment, the integer prediction level, the continuous prediction value, and the probability distribution over the quantification levels; S42: calculate the model uncertainty index from the probability distribution. Preferably, the uncertainty of the current model's judgment on a comment can be measured by indicators such as the entropy of the predicted probability distribution, the difference between the highest and second-highest probabilities, and the dispersion of the probability distribution; S43: calculate the quantification boundary proximity from the continuous prediction value. Preferably, the distance between the continuous prediction value and the key boundaries is used as the boundary proximity index, the key boundaries including at least perception-level boundaries such as 4/5, 5/6, and 6/7; S44: rank the samples comprehensively by combining the uncertainty and boundary proximity indices from S42 and S43 with the comment quality category and the scarcity of low-scoring samples in the current data. Preferably, high-information comments with high boundary uncertainty and low-information comments with high model hesitation are selected first, and comments with relatively low predicted scores, which are underrepresented in the current data, are appropriately added; S45: output a high-value supplementary sample set according to the comprehensive ranking results.
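The indicators in S42 to S45 can be sketched as follows. This is an illustration under stated assumptions, not the method's actual scoring rule: it assumes levels 1 to 7, places the boundary between levels k and k+1 at k + 0.5 on the continuous scale, and uses a hypothetical weighted sum (`priority`) whose coefficients, `low_cutoff`, and `scarcity_bonus` are invented for the example.

```python
import math
from typing import Dict, List, Sequence

# Assumption: on the continuous scale, the boundary between levels k and k+1 sits at k + 0.5.
KEY_BOUNDARIES = (4.5, 5.5, 6.5)  # 4/5, 5/6, 6/7


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of the level distribution, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def top2_margin(probs: Sequence[float]) -> float:
    """Difference between the highest and second-highest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def boundary_proximity(value: float, width: float = 0.5) -> float:
    """1.0 when the continuous prediction sits on a key boundary, 0.0 at >= width away."""
    dist = min(abs(value - b) for b in KEY_BOUNDARIES)
    return max(0.0, 1.0 - dist / width)


def priority(sample: Dict, low_cutoff: float = 3.5, scarcity_bonus: float = 0.3) -> float:
    """Hypothetical ranking score combining the S42-S44 factors."""
    probs, value, quality = sample["probs"], sample["value"], sample["quality"]
    hesitation = 0.5 * normalized_entropy(probs) + 0.5 * (1.0 - top2_margin(probs))
    proximity = boundary_proximity(value)
    if quality == "high_info":
        score = 0.7 * proximity + 0.3 * hesitation  # boundary uncertainty first
    elif quality == "low_info":
        score = 0.3 * proximity + 0.7 * hesitation  # model hesitation first
    else:
        return 0.0  # invalid or weakly related comments are not prioritized
    if value < low_cutoff:
        score += scarcity_bonus  # favor scarce low-scoring samples
    return score


def select_high_value(samples: List[Dict], k: int) -> List[str]:
    """S45: ids of the k highest-priority samples."""
    ranked = sorted(samples, key=priority, reverse=True)
    return [s["id"] for s in ranked[:k]]


one_hot_5 = [0, 0, 0, 0, 1.0, 0, 0]
uniform = [1 / 7] * 7
samples = [
    {"id": "on_5_6_boundary", "probs": one_hot_5, "value": 5.5, "quality": "high_info"},
    {"id": "confident", "probs": one_hot_5, "value": 6.0, "quality": "high_info"},
    {"id": "hesitant", "probs": uniform, "value": 5.2, "quality": "low_info"},
    {"id": "off_topic", "probs": uniform, "value": 5.5, "quality": "invalid"},
]
chosen = select_high_value(samples, 2)
```

Here the hesitant low-information comment outranks the confident high-information one, which mirrors the preference stated in S44; in practice the coefficients would be tuned to the annotation budget.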
[0035] In one specific implementation, the high-value supplementary sample set can be further divided into the following categories: (1) high-score boundary, high-uncertainty samples among high-information comments, e.g., samples whose continuous prediction value lies between 5 and 7 and is close to the 5/6 or 6/7 boundary; (2) mid-score boundary, high-uncertainty samples among high-information comments, e.g., samples whose continuous prediction value lies between 4 and 6 and is close to the 4/5 or 5/6 boundary; (3) key low-scoring samples, i.e., samples with low predicted scores that are relatively scarce in the existing training data; (4) low-information samples with high model hesitation, i.e., samples whose comments carry little information but whose predicted probability distribution is dispersed and whose boundary uncertainty is high.
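The four categories above can be assigned with a simple rule cascade. The sketch below is hypothetical: the tolerance `tol`, the cutoffs, the scarcity threshold `scarce_count`, and the `level_counts` input (training-set counts per level) are assumptions, and the checks run in order, so a value exactly on the 5/6 boundary lands in category 1 rather than category 2.

```python
from typing import Dict, Optional


def near(value: float, boundaries, tol: float) -> bool:
    """True if value lies within tol of any of the given boundaries."""
    return any(abs(value - b) <= tol for b in boundaries)


def supplementary_category(
    quality: str,
    value: float,
    hesitation: float,
    level_counts: Dict[int, int],
    tol: float = 0.25,
    low_cutoff: float = 3.5,
    scarce_count: int = 50,
    hesitation_cutoff: float = 0.6,
) -> Optional[int]:
    """Return category 1-4 of the high-value set, or None if no category applies.

    Checks run in order, so a value on the 5/6 boundary is assigned to category 1.
    """
    if quality == "high_info" and 5.0 <= value <= 7.0 and near(value, (5.5, 6.5), tol):
        return 1  # high-score boundary, high uncertainty
    if quality == "high_info" and 4.0 <= value <= 6.0 and near(value, (4.5, 5.5), tol):
        return 2  # mid-score boundary, high uncertainty
    level = min(7, max(1, int(value + 0.5)))  # nearest integer level
    if value < low_cutoff and level_counts.get(level, 0) < scarce_count:
        return 3  # low-scoring and scarce in the training data
    if quality == "low_info" and hesitation >= hesitation_cutoff:
        return 4  # low information, high model hesitation
    return None
```

Keeping the categories explicit makes it easy to set a separate labeling quota for each one.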
[0036] The high-value supplementary samples output in step S45 are then annotated with public landscape perception quantification labels to form a new annotated sample set. Preferably, the new annotated sample set follows the same scoring criteria and quantification rules as the initial quantification sample set, ensuring that annotation results from different stages are consistent and comparable.
[0037] Through this automatic screening and re-labeling mechanism for high-value samples, this embodiment directs new annotation effort first to the regions where the model is most easily confused and where improvements matter most for quantification performance, thereby improving sample utilization efficiency and the effect of model optimization.
[0038] S5: Iterative training of the quantification model and determination of the formal model
As shown in Figure 4, after the new high-value supplementary sample set is obtained in step S4, it is merged with the initial quantification sample set, and the public landscape perception quantification model is iteratively trained to obtain a better-performing formal model.
[0039] The specific process may include: S51: incorporate the new high-value supplementary sample set into the initial quantification sample set to form an updated training sample set; S52: re-execute repeated-sample grouping, main label determination, and soft label distribution construction on the updated training sample set, so that labeled samples from different stages are expressed uniformly; S53: calculate the comprehensive training weight of each training sample based on its comment quality category, duplicate-sample status, and whether it comes from the supplementary set. Preferably, the comprehensive training weight includes at least a comment quality weight, a duplicate-sample correction weight, and a high-value supplementary sample enhancement weight; S54: iteratively train the public landscape perception quantification model on the updated training sample set with the comprehensive training weights, and output candidate quantification models; S55: construct a frozen external test set to validate the candidate quantification models independently. The frozen external test set is preferably a dataset that was not used in training and on which no parameters are tuned during model comparison, ensuring the objectivity of the evaluation results; S56: based on the external test results, select from the candidate models the optimal model that meets the preset evaluation requirements as the formal model.
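The comprehensive training weight of S53 can be sketched as a product of its three components. The multiplicative form, the `1 / group size` duplicate correction (which assumes duplicate records remain separate training rows), the `SUPPLEMENT_BOOST` value, and the unit-mean normalization are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

SUPPLEMENT_BOOST = 1.5  # hypothetical enhancement factor for S4 supplementary samples


@dataclass
class TrainingSample:
    quality_weight: float      # comment quality weight from step S3
    duplicate_group_size: int  # records sharing the same content + location key
    from_supplement: bool      # True if the sample came from the high-value set


def comprehensive_weight(s: TrainingSample) -> float:
    """S53 sketch: product of quality, duplicate-correction and enhancement weights."""
    duplicate_correction = 1.0 / max(1, s.duplicate_group_size)  # keep repeats from dominating
    enhancement = SUPPLEMENT_BOOST if s.from_supplement else 1.0
    return s.quality_weight * duplicate_correction * enhancement


def normalize_to_unit_mean(weights: List[float]) -> List[float]:
    """Rescale so the average weight is 1, keeping the overall loss scale stable."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


batch = [
    TrainingSample(1.0, 1, False),
    TrainingSample(0.8, 2, True),
    TrainingSample(0.2, 1, False),
]
weights = normalize_to_unit_mean([comprehensive_weight(s) for s in batch])
```

The normalized weights can be passed as per-sample weights to whatever loss the quantification model uses.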
[0040] In this embodiment, model evaluation metrics may include, but are not limited to, accuracy, macro-average F1 score, weighted F1 score, mean absolute error, continuous sub-error, quadratic weighted Kappa coefficient, and correlation metrics. Preferably, the formal model is determined by comprehensively considering ranking consistency, scoring error, and boundary judgment ability.
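Two of the listed metrics and the selection in S56 can be sketched in plain Python. The quadratic weighted Kappa below follows the standard definition over levels 1 to 7; the selection rule (a hypothetical `min_qwk` threshold, then higher Kappa, then lower MAE) is only one example of a "preset evaluation requirement".

```python
from typing import Dict, Optional, Sequence, Tuple


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], lo: int = 1, hi: int = 7) -> float:
    """Cohen's kappa with quadratic disagreement weights over levels lo..hi."""
    n = hi - lo + 1
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - lo][p - lo] += 1
    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / total  # expected under independence
    return 1.0 - num / den if den else 1.0


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_formal_model(results: Dict[str, Tuple[float, float]], min_qwk: float = 0.6) -> Optional[str]:
    """S56 sketch: among candidates with QWK >= min_qwk, prefer higher QWK, then lower MAE."""
    eligible = {name: m for name, m in results.items() if m[0] >= min_qwk}
    if not eligible:
        return None
    return max(eligible, key=lambda name: (eligible[name][0], -eligible[name][1]))
```

Computed only on the frozen external test set, these numbers keep the model comparison independent of the training data.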
[0041] Through the above steps, the quantification model is iteratively optimized through targeted supplementation of high-value samples, improving the ranking stability and perception quantification accuracy of the formal model in massive-comment scenarios.
[0042] S6: Automatic quantification output and aggregation analysis of massive comments
After the formal public landscape perception quantification model is determined in step S5, it is used to quantify massive comments automatically in batches. The specific process may include: S61: input standardized comment texts in batches into the formal public landscape perception quantification model; S62: output the integer perception level of each comment, the integer perception level preferably being one of levels 1 to 7; S63: output the continuous perception value of each comment, which characterizes the intensity of public landscape perception at finer granularity; S64: output the probability distribution over the quantification levels, which reflects the model's confidence in its judgment of the comment; S65: according to research needs, aggregate the quantification results of individual comments by scenic spot, region, city, image object, route, or other preset object to generate public landscape perception quantification indicators.
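Step S65 can be sketched as a group-by over per-comment results. The record layout `(object_key, integer_level, continuous_value)`, the indicator names, and the "share of level 6 or above" indicator are hypothetical choices for illustration; the source does not specify which indicators are computed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate(records: Iterable[Tuple[str, int, float]], high_level: int = 6) -> Dict[str, Dict[str, float]]:
    """Group per-comment results by research object and compute summary indicators.

    records: (object_key, integer_level, continuous_value) per comment.
    """
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, level, value in records:
        groups[key].append((level, value))
    summary = {}
    for key, items in groups.items():
        levels = [lvl for lvl, _ in items]
        summary[key] = {
            "n_comments": len(items),
            "mean_level": mean(levels),
            "mean_continuous": mean(v for _, v in items),
            "share_high": sum(1 for lvl in levels if lvl >= high_level) / len(levels),
        }
    return summary


rows = [("Park A", 6, 6.2), ("Park A", 4, 4.4), ("Park B", 7, 6.8)]
result = aggregate(rows)
```

The resulting dictionary maps directly onto spreadsheet rows or database records for export.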
[0043] In one implementation, the aggregated analysis results can be further exported in the form of spreadsheets, database records, or visualization charts to facilitate subsequent landscape evaluation, public preference research, planning management, and decision support.
[0044] Through the above steps, this embodiment forms a complete automated processing flow of "comment acquisition and standardization, initial quantitative sample construction and soft label aggregation, comment quality identification and massive comment quality stratification, automatic screening and re-labeling of high-value samples, iterative training of quantitative models and determination of formal models, and automatic quantitative output and aggregation analysis of massive comments". It solves the problems of insufficient quantitative granularity, lack of comment quality stratification, low utilization efficiency of key samples, and lack of closed-loop optimization mechanism in the existing technology.
[0045] Example 2: Intelligent measurement system for public landscape perception quantification comments
As shown in Figure 5, this embodiment provides an intelligent measurement system for public landscape perception quantification, which implements the method described in Embodiment 1 and includes the following modules.
1. Comment preprocessing module 41
The comment preprocessing module 41 is used to read, identify, filter, and standardize the original comment data to form standardized comment data suitable for subsequent quantitative analysis of public landscape perception.
[0046] In this embodiment, the comment preprocessing module 41 may include at least the following sub-functional units: (1) Comment data reading unit, used to read raw comment data from local files, databases, servers or network platforms; (2) Field identification and filtering unit, used to identify comment content field, location information field, comment identifier field and other auxiliary fields, and to remove data records that are missing key fields; (3) Null value removal unit, used to remove empty comments, empty location records, unparseable garbled samples and invalid data; (4) Duplicate sample removal unit, used to identify duplicate samples and remove duplicates based on the joint key of comment content and location information; (5) Text standardization processing unit, used to unify the encoding of comment text, clean up spaces, standardize punctuation and clean up content; (6) Location information fusion unit, used to fuse location information with comment text to form a standardized text expression form suitable for quantitative model input.
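Units (3) to (6) can be sketched with the standard library. Treating the Unicode replacement character as the garbled-text signal, using NFKC normalization for encoding unification, and the `"location: content"` fusion format are assumptions; the record keys `id`, `content`, and `location` are hypothetical field names.

```python
import re
import unicodedata
from typing import Dict, Iterable, List


def normalize_text(text: str) -> str:
    """Unify character width/compatibility forms and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    seen = set()
    output = []
    for rec in records:
        raw_content = rec.get("content") or ""
        if "\ufffd" in raw_content:
            continue  # undecodable (garbled) text
        content = normalize_text(raw_content)
        location = normalize_text(rec.get("location") or "")
        if not content or not location:
            continue  # empty comment or missing location
        key = (content, location)  # joint deduplication key
        if key in seen:
            continue
        seen.add(key)
        output.append({
            "id": rec.get("id", ""),
            "content": content,
            "location": location,
            "model_input": f"{location}: {content}",  # assumed fusion format
        })
    return output
```

Deduplicating after normalization means records that differ only in spacing or character width collapse into one.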
[0047] By setting up the above modules, unified preprocessing of comment data from multiple sources can be achieved, ensuring the consistency of input for subsequent quantitative analysis.
[0048] 2. Initial annotation set construction and soft label aggregation module 42
The initial annotation set construction and soft label aggregation module 42 is used to construct an initial training sample set with public landscape perception quantification labels and to aggregate the multiple quantification results of repeated samples.
[0049] In this embodiment, the module 42 may include at least the following sub-functional units: (1) Initial labeled sample extraction unit, used to extract a portion of the comment samples from the standardized comment data to form an initial quantification sample set; (2) Multi-level quantitative labeling unit, used to assign level 1 to 7 public landscape perception quantification labels to the extracted samples; (3) Duplicate sample aggregation unit, used to identify and group duplicate samples whose comment content and location information are identical or equivalent; (4) Soft label distribution construction unit, used to count how often the same repeated sample is assigned to each quantification level and construct the corresponding soft label distribution; (5) Initial training sample set generation unit, used to output training sample data containing both main labels and soft labels.
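Units (3) to (5) can be sketched as follows: annotations are grouped by the (content, location) key, the main label is the most frequent level, and the soft label is the normalized frequency over levels 1 to 7. Breaking ties toward the lower level is an assumption, as are the function and field names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LEVELS = range(1, 8)  # quantification levels 1..7


def aggregate_labels(annotations: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], dict]:
    """Group (content, location, level) annotations and build main + soft labels."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for content, location, level in annotations:
        groups[(content, location)].append(level)
    result = {}
    for key, levels in groups.items():
        counts = Counter(levels)
        # Most frequent level; ties broken toward the lower level (assumed rule).
        main = max(LEVELS, key=lambda lvl: (counts[lvl], -lvl))
        soft = [counts[lvl] / len(levels) for lvl in LEVELS]  # normalized frequencies
        result[key] = {"main_label": main, "soft_label": soft, "n": len(levels)}
    return result


anns = [
    ("view", "Park A", 6), ("view", "Park A", 6), ("view", "Park A", 5),
    ("ok", "Park B", 4), ("ok", "Park B", 5),
]
labels = aggregate_labels(anns)
```

Keeping the soft label alongside the main label lets an ordinal model learn from annotator disagreement instead of discarding it.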
[0050] This module produces, at the initial stage, a high-quality sample set suitable for training an ordinal quantification task.
[0051] 3. Comment quality identification and stratification module 43
The comment quality identification and stratification module 43 is used to determine the information completeness and quantification applicability of comment samples and to stratify massive comments by quality automatically.
[0052] In this embodiment, the module 43 may include at least the following sub-functional units: (1) Comment quality identification model unit, used to classify comments by quality; (2) Quality category prediction unit, used to output the quality category to which each comment belongs, preferably three categories: invalid or weakly relevant comments, low-information but quantifiable comments, and high-information effective comments; (3) Category probability output unit, used to output the predicted probability of a comment belonging to each quality category; (4) Quality training weight generation unit, used to generate comment quality weights based on quality category and its prediction confidence; (5) Massive comment quality stratification unit, used to stratify all comments by quality, forming a high-information comment layer, a low-information comment layer and an invalid comment layer.
[0053] This module can effectively reduce the interference of low-quality comments on the training and prediction results of quantization models.
[0054] 4. High-value sample mining and labeling module 44
The high-value sample mining and labeling module 44 is used to identify, among massive unlabeled comments, the samples most worth re-quantifying and labeling, and to generate a new labeled sample set.
[0055] In this embodiment, the module 44 may include at least the following sub-functional units: (1) Current model prediction output unit, used to generate the integer prediction level, continuous prediction value, and probability distribution for unlabeled comments using the current public landscape perception quantification model; (2) Uncertainty calculation unit, used to calculate sample uncertainty from the probability distribution entropy, the difference between the highest and second-highest probabilities, or other dispersion indicators; (3) Boundary proximity calculation unit, used to calculate boundary proximity from the distance between the continuous prediction value and the key quantification boundaries; (4) Multi-factor comprehensive ranking unit, used to combine model uncertainty, boundary proximity, comment quality category, and scarcity of low-scoring samples to generate high-value ranking results; (5) High-value sample set generation unit, used to output the high-value supplementary sample set; (6) Re-labeling interface unit, used to support manual or semi-automatic re-assignment of level 1 to 7 public landscape perception quantification labels to the high-value samples.
[0056] This module allows newly added annotation resources to be prioritized for the weakest areas of the model, thereby improving the efficiency of supplementary annotation and the utilization value of samples.
[0057] 5. Model training and evaluation module 45
The model training and evaluation module 45 is used to iteratively train the public landscape perception quantification model by fusing the newly added supplementary samples with the original samples, and to determine the formal model using the frozen external test set.
[0058] In this embodiment, the module 45 may include at least the following sub-functional units: (1) Sample fusion and update unit, used to merge newly added high-value supplementary samples with the original training samples and deduplicate them; (2) Repeated sample aggregation and soft label update unit, used to reconstruct the main labels and soft label distributions of the merged samples; (3) Training weight calculation unit, used to calculate the comprehensive training weight of each sample, the comprehensive training weight including at least the comment quality weight, the duplicate sample correction weight, and the supplementary sample enhancement weight; (4) Model iterative training unit, used to continue training or retrain the public landscape perception quantification model on the updated training sample set; (5) Frozen external test set evaluation unit, used to evaluate candidate models on an independent dataset not used for training; (6) Formal model determination unit, used to select the formal public landscape perception quantification model according to preset evaluation criteria.
[0059] This module enables continuous model optimization and objective determination of the official version.
[0060] 6. Quantitative output and aggregation analysis module 46
The quantitative output and aggregation analysis module 46 is used to perform batch prediction on massive comment data with the formal public landscape perception quantification model and to aggregate the results.
[0061] In this embodiment, the module 46 may include at least the following sub-functional units: (1) Massive comment batch prediction unit, used to output quantification results for all comments to be processed; (2) Integer level output unit, used to output the integer perception level (1 to 7) of each comment; (3) Continuous perception value output unit, used to output finer-grained continuous perception values; (4) Probability distribution output unit, used to output the probability distribution over the quantification levels; (5) Aggregation analysis unit, used to aggregate and statistically analyze quantification results by scenic spot, region, city, image object, route, or other research object; (6) Quantitative index output unit, used to output the final public landscape perception quantification indicators.
[0062] This module enables the conversion from quantitative results of a single comment to quantitative indicators at the regional or object level.
[0063] 7. Optional human-computer interaction module 47
Preferably, the system further includes an optional human-computer interaction module 47, which provides users with a visual operation entry point and process control interface. The module 47 can be used to receive input paths, output paths, sample screening parameters, quantification level parameters, model training parameters, and other operation control parameters, and can further provide functions such as processing progress display, result visualization, and operation status monitoring.
[0064] In one embodiment, the human-computer interaction module 47 can be connected to the above-mentioned functional modules to coordinate the execution flow between the modules and realize automated control from comment import to quantitative result output.
[0065] 8. Data and resources
In this embodiment, the system can also be connected to a data and resource unit, which preferably includes: (1) an original comment database; (2) a location information database; (3) an initial labeled dataset; (4) a frozen external test set; (5) a model file library; (6) a run log and results database.
[0066] Through the modular architecture design described above, this embodiment integrates comment preprocessing, quality identification, high-value sample mining, model training and evaluation, and quantitative output into a unified system, which has a good degree of automation, scalability, and engineering application value.
[0067] Example 3: Electronic device and computer-readable storage medium
As shown in Figure 6, this embodiment provides an electronic device, which can be a server, personal computer, laptop, workstation, or other terminal device with data processing capabilities. The electronic device includes hardware components such as a processor, memory, and bus, wherein the memory stores a computer program which, when executed by the processor, implements the steps of the intelligent measurement method for public landscape perception quantification described in Embodiment 1, including original comment acquisition and standardization processing, initial quantification sample construction and soft label aggregation, comment quality identification and massive comment quality stratification, automatic screening and re-labeling of high-value samples, iterative training of the quantification model and determination of the formal model, and automatic quantification output and aggregation analysis of massive comments.
[0068] In one implementation, the processor can call program code stored in memory to perform automated processing on the raw comment data, and complete the corresponding processing flow according to the user-defined input / output paths, filtering parameters, quantization parameters, and training parameters. Furthermore, the electronic device can also communicate with external databases, servers, or cloud platforms via a network interface to achieve functions such as comment data reading, model loading, result feedback, and log recording.
[0069] The present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, is used to implement the method described in any embodiment of the present invention. The computer-readable storage medium may be a read-only memory, a random access memory, a magnetic disk, an optical disk, a flash memory, a memory card, a portable hard drive, or any combination thereof.
[0070] Those skilled in the art will understand that the above-mentioned functional modules can be implemented by software, hardware, or a combination of both. These functional modules can be physically separated or integrated into a single device, and their specific implementation does not affect the substantive content of the technical solution of this invention. Equivalent substitutions and structural adjustments made to the embodiments without departing from the spirit and scope of the claims should be considered to fall within the protection scope of this invention.
Claims
1. A method for intelligently measuring public landscape perception through comments, characterized in that it comprises the following steps:
Step S1: obtaining massive comment data from a user-specified source; performing format recognition, field filtering, null value removal, duplicate sample removal, and text standardization on the comment data; and fusing the comment content with location information to obtain standardized comment data;
Step S2: extracting a portion of the comment samples from the standardized comment data to construct an initial public landscape perception quantification sample set, assigning preset quantification level labels to the comment samples, and aggregating the multiple quantification results of duplicate samples to generate a main label and a soft label distribution;
Step S3: constructing a comment quality identification model to classify comments by quality, applying the comment quality identification model to massive unlabeled comment data to obtain comment quality categories and the corresponding category probabilities, and generating comment quality weights based on the comment quality categories;
Step S4: based on the prediction results of the current public landscape perception quantification model for unlabeled comments, in combination with model uncertainty, quantification boundary proximity, comment quality category, and scarcity of low-scoring samples, automatically screening high-value samples from the massive comments, and re-quantifying and labeling the screened high-value samples to form a new labeled sample set;
Step S5: incorporating the new labeled sample set into the initial public landscape perception quantification sample set; re-executing repeated sample aggregation, soft label construction, comment quality prediction, and training weight calculation; iteratively training the public landscape perception quantification model on the updated training sample set; and determining the formal public landscape perception quantification model in combination with the evaluation results on the frozen external test set;
Step S6: using the formal public landscape perception quantification model to perform batch prediction on massive comment data, outputting for each comment the integer perception level, the continuous perception value, and the probability distribution over the quantification levels, and performing aggregation analysis by scenic spot, region, route, image object, or other preset research object to form public landscape perception quantification indicators.
2. The intelligent measurement method for public landscape perception quantification according to claim 1, characterized in that, Step S1 includes: Read comment data files or database records and identify comment content fields, location information fields, comment unique identifier fields, and auxiliary fields; Remove empty comments, garbled comments, unparseable records, and samples missing key fields; The comment text is standardized, including removing extra spaces, standardizing punctuation, standardizing encoding, and text cleaning. The location information is standardized and then merged with the comment text into a standardized text expression form, which is used as input for subsequent models.
3. The intelligent measurement method for public landscape perception quantification according to claim 1, characterized in that, Step S2 includes: grouping duplicate samples using comment content and location information as the joint matching criterion; statistically analyzing the multiple quantification results of each group of repeated samples to determine the main quantification label; while retaining the main quantification label, counting the frequency with which the sample occurs at each quantification level and normalizing the counts to generate a soft label distribution; and saving the main quantification label together with the soft label distribution as input information for subsequent model training.
4. The intelligent measurement method for public landscape perception quantification according to claim 1, characterized in that, Step S3 includes: The quality of comments is categorized into three types: invalid or weakly relevant comments, low-information but quantifiable comments, and high-information effective comments. Input standardized comment text into the comment quality recognition model, and output the quality category of each comment and the probability of the corresponding category. Generate comment quality weights based on comment quality categories and their predicted confidence levels; The massive amount of comments is automatically stratified based on their quality category, forming a high-information comment layer, a low-information comment layer, and an invalid comment layer.
5. The intelligent measurement method for public landscape perception quantification according to claim 1, characterized in that, Step S4 includes: using the current public landscape perception quantification model to predict massive unlabeled comments and obtain integer prediction levels, continuous prediction values, and probability distributions over the quantification levels; calculating a model uncertainty index from the probability distribution, the model uncertainty index including one or more of the entropy of the predicted probability distribution, the difference between the highest and second-highest probabilities, and the dispersion of the probability distribution; calculating the quantification boundary proximity from the continuous prediction values, the quantification boundaries including one or more of 4/5, 5/6, and 6/7; and comprehensively ranking the samples in combination with the comment quality category and the distribution of low-scoring samples in the current data, and outputting a high-value supplementary sample set.
6. The intelligent measurement method for public landscape perception quantification according to claim 1, characterized in that, Step S5 includes: The newly added high-value supplementary sample set is incorporated into the initial quantized sample set to form the updated training sample set; Repeated sample aggregation and soft label distribution construction are performed again on the updated training sample set; Based on the comment quality category, duplicate sample status, and source of supplementary samples, the comprehensive training weight of the training samples is calculated. The comprehensive training weight includes at least the comment quality weight, duplicate sample correction weight, and high-value supplementary sample enhancement weight. The public landscape perception quantification model is iteratively trained based on the updated training sample set and comprehensive training weights to output candidate quantification models. Construct a frozen external test set, independently evaluate the candidate quantization models, and determine the formal model according to preset evaluation requirements.
7. A smart measurement system for public landscape perception quantification, characterized in that it comprises: a comment preprocessing module, used to read massive comments, identify fields, remove null values, remove duplicate samples, and standardize text; an initial annotation set construction and soft label aggregation module, used to construct a training sample set with multi-level public landscape perception quantification labels and to aggregate the multiple quantification results of repeated samples to generate the corresponding soft label distribution; a comment quality identification and stratification module, used to classify comments by quality and output the comment quality category, probability information, and quality weight; a high-value sample mining and labeling module, used to automatically screen high-value samples for labeling from massive comments based on model uncertainty, boundary proximity, comment quality category, and scarcity of low-scoring samples, and to support subsequent re-quantification and labeling; a model training and evaluation module, used to merge newly added supplementary samples with the original training samples, iteratively train the public landscape perception quantification model, and determine the formal model based on the frozen external test set; and a quantitative output and aggregation analysis module, used to output integer levels, continuous scores, and probability distributions for massive comments using the formal model, and to support aggregation analysis by attraction, region, or other research object.
8. An electronic device, characterized in that it comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the intelligent measurement method for public landscape perception quantification according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the intelligent measurement method for public landscape perception quantification according to any one of claims 1 to 6.