A method for extracting important opinions in public opinion events
The method combines mutual information and left-right cross-entropy algorithms, the GloVe model, and an NER model with expert rules to extract and classify viewpoints in public opinion events. It thereby overcomes the limitations of existing viewpoint-analysis techniques and accurately extracts and classifies important viewpoints from public opinion events of various data types, supporting applications across multiple industries.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG SHUYUAN ZHIHUI TECH CO LTD
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies struggle to analyze public opinion events on the internet and to classify opinions and rank them by importance and influence, and they lack support for multiple data types.
By aggregating public opinion event data, the most frequent phrases and words are extracted using mutual information and left-right cross-entropy algorithms. A dictionary set is constructed using the GloVe model, words that do not express opinions are filtered out using expert rules, and entities are extracted using an NER model and a syntactic dependency tree to screen and classify opinions.
It enables accurate extraction and classification of important and influential viewpoints from public opinion events of various data types, supports multiple industries and event types, eliminates noise, and provides a basis for decision-making.
Figure: abstract drawing (CN116150461B_ABST)
Description
Technical Field
[0001] This invention relates to the field of network information processing technology, specifically to a method for extracting important viewpoints from public opinion events.
Background Technology
[0002] Currently, the main techniques for extracting and recognizing opinions primarily utilize word vectors, sentiment analysis, keyword clustering, and other methods. These techniques are mainly used to extract opinions from online forum data or user comment data, or as opinion extraction methods for specific scenarios, such as e-commerce.
[0003] Current identification techniques are limited to extracting viewpoints from textual data. With the development of new media, netizens are no longer confined to text-based platforms like forums and microblogs, but are increasingly using video, audio, and images to express their opinions. Furthermore, there is a lack of categorization and ranking of viewpoints based on their importance; some seemingly insignificant viewpoints are extracted as key points. After processing massive amounts of data, the extracted viewpoint data is vast, and there is a lack of effective methods for classifying and identifying these extracted viewpoints.
[0004] Patent CN108363725A discloses a method for extracting user comment opinions and generating opinion tags. This method first constructs an initial opinion part-of-speech rule base based on user comments, then automatically discovers new user opinion part-of-speech rules through continuous iteration, and obtains user comment opinions through part-of-speech rule matching. This method focuses more on comment data and does not incorporate other data types, nor does it provide a method for classifying opinions, thus having its limitations.
[0005] Patent CN201210038746 discloses a method for extracting attribute-opinion pairs from Chinese opinion and evaluation information. This method offers high accuracy and robustness and requires neither annotation nor model training. However, it extracts opinions primarily from the metadata itself and does not consider how opinions differ across events: the same opinion may appear in one event but not in another, so the method fails to tie opinions to the event itself.
[0006] Patent CN101408883B discloses a method for collecting online public opinions. This method extracts trending terms from online forums, then retrieves related documents based on these terms to form a set of trending event documents. Finally, it clusters key sentences within these documents to obtain multiple sets of key opinion sentences for a specific trending event. However, this method analyzes only forum data and is not comprehensive enough. New media have largely displaced online forums, so extracting opinions from forums alone no longer suits the current state of the internet. Furthermore, it does not provide a method for differentiating the importance of different opinions.
[0007] In today's internet environment, the main challenge in analyzing public opinion events is to identify the relevant viewpoints, categorize them effectively, and determine the relative importance and influence of the different viewpoints within each event.
Summary of the Invention
[0008] In view of the problems existing in the prior art, the present invention discloses a method for extracting important viewpoints from public opinion events, including the following steps:
[0009] Step 1: Aggregate public opinion event data: Describe the theme of the online public opinion event in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the main keywords using AND, OR, and NOT operations. Search the public opinion database with the combined keywords to obtain the dataset related to the event.
[0010] Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model. These phrases and words are used as proper nouns representing the corresponding events. The vector values of the extracted proper nouns are calculated using the mutual information and left-right cross-entropy algorithms.
[0011] Step 3: Construct a dictionary based on the GloVe model and the extracted phrases and proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with the proper nouns and industry-specific phrase libraries. Each element of the matrix, X_ij, represents the number of times word i and its context word j co-occur within a context window of a specific size. Generally, each co-occurrence adds 1 to this count, but under the GloVe model each co-occurrence is instead weighted by the decay function decay = 1 / d, where d is the distance between the two words in the context window. The formula is as follows:
[0012]
[0013] Based on this formula, its loss function is constructed as follows:
[0014]
[0015] After training the machine learning model, a dictionary set that matches or is similar to the event is obtained, and the top 30 with the highest scores are selected.
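The co-occurrence definition in Step 3 matches the published GloVe model, whose standard fitting relation and weighted least-squares loss are given below for reference. This is the textbook form, offered on the assumption that the method follows it. Here $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ their biases, $V$ the vocabulary size, and $f$ a weighting function whose commonly used defaults are $x_{\max} = 100$ and $\alpha = 3/4$.

```latex
w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}

J = \sum_{i,j=1}^{V} f(X_{ij})\,
    \bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```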
[0016] Step 4: Expert-rule filtering of non-opinion words: Experts manually screen out words that do not express subjective opinions and remove them from the dictionary;
[0017] Step 5: Extract candidate viewpoint context based on dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and consider these data as candidate viewpoints;
[0018] Step 6: Extract entities based on NER model and syntactic dependency tree: The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs.
[0019] In business scenarios, an expressed opinion typically consists of a speaker and the content. The speaker is not only a specific person but also that person's organization, title, and position. Entity extraction is performed on the content using NER (named entity recognition) and dependency trees, and the entities are categorized by organization, title, and position. The categorization rules use a pre-organized code table library, and the importance of each entity is determined by comparing it with this library. Examples include authoritative opinions, official opinions, and opinions from influential figures (such as KOLs).
[0020] As a preferred embodiment of the present invention, the mutual information formula in step two is:
[0021]
[0022] The formula for the cross-entropy loss function is:
[0023]
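For orientation, phrase-discovery methods of this kind commonly score a candidate phrase with pointwise mutual information and with the entropy of its left and right neighbours, while cross-entropy compares two distributions. The expressions below are these standard definitions, given as an assumption; they may differ from the formulas the method actually uses, and the symbols $p$, $q$, $x$, $y$, $w$, $a$, and $A_L$ are introduced only for illustration.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}

H(p, q) = -\sum_{x} p(x)\,\log q(x)

H_L(w) = -\sum_{a \in A_L(w)} p(a \mid w)\,\log p(a \mid w)
```

Here $A_L(w)$ is the set of tokens seen immediately to the left of candidate $w$; the right-hand entropy is defined in the same way over right neighbours.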
[0024] As a preferred embodiment of the present invention, the NER (named entity recognition) mentioned in step six, also known as proper name recognition, is a fundamental task in natural language processing with a very wide range of applications; entities generally refer to items in text that have specific meanings or strong referentiality, usually including personal names, organization names, proper nouns, etc.
[0025] The beneficial effects of this invention are as follows. The invention uses machine learning and algorithmic models to extract specific phrases and proper nouns from massive amounts of text based on mutual information and left-right cross-entropy. It trains a word vector model on industry-specific corpora using the GloVe model, recalls synonyms for "say" and "express" through the word vectors, builds a dictionary of proper nouns, and recalls opinion sentences based on expert rules. It uses an NER model to determine whether the expressive field of an opinion contains a business-specified entity type and filters the opinions accordingly, and it analyzes the lexical dependencies of the expressive field with syntactic dependency trees to extract expressive entity relationships as important evidence for the opinions. The key point is that machine learning models are combined to train a set of industry- and event-related proper noun and phrase models, expressive entity models, and so on; through the vector relationships between these models and actual public opinion events, important and influential opinions are extracted effectively. The technology can be extended to multiple industries and event types and is not limited to a single data type. It clusters multiple viewpoints in large data volumes for easy viewing and understanding, judges the importance of viewpoints by influence, distinguishes subjective viewpoints from objective facts, extracts them accurately, eliminates noise, and provides a basis for decision-making.
Attached Figure Description
[0026] Figure 1 is a flowchart illustrating the steps of the invention;
[0027] Figure 2 is a flowchart of the method of the invention.
Detailed Implementation
[0028] Example 1
[0029] This invention discloses a method for extracting key viewpoints from public opinion events, comprising the following steps:
[0030] Step 1: Aggregate public opinion event data: Describe the theme of the online public opinion event in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the main keywords using AND, OR, and NOT operations. Search the public opinion database with the combined keywords to obtain the dataset related to the event.
[0031] For example, in the "Li Moumou XX Clothing Public Opinion Incident," Li Moumou is the entity keyword, XX is the event keyword, and the new product launch is the location keyword. By associating the keywords "Li Moumou + new product launch + XX," all data results related to this event can be retrieved from the public opinion database, thus completing the first step of event aggregation. Theoretically, as long as there are relevant keywords that can represent the event, they should be included in the keyword combination scope to maximize the aggregation of event data.
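The keyword combination in this example can be sketched as a simple in-memory filter. This is a minimal illustration, not the implementation used by the method: the `matches` and `aggregate_event` helpers, the `must` / `any_of` / `must_not` parameters, and the sample documents are all hypothetical, and a production system would more likely send the query to a search engine sitting over the public opinion database.

```python
from typing import Iterable, List


def matches(text: str,
            must: Iterable[str] = (),
            any_of: Iterable[str] = (),
            must_not: Iterable[str] = ()) -> bool:
    """Return True if text satisfies the AND / OR / NOT keyword groups."""
    any_of = list(any_of)
    if not all(k in text for k in must):                # AND: every keyword present
        return False
    if any_of and not any(k in text for k in any_of):   # OR: at least one present
        return False
    return not any(k in text for k in must_not)         # NOT: none present


def aggregate_event(docs: List[str], **query) -> List[str]:
    """Keep only the documents that match the boolean keyword query."""
    return [d for d in docs if matches(d, **query)]


# Hypothetical documents standing in for the public opinion database.
docs = [
    "Li Moumou unveiled XX clothing at the new product launch",
    "Li Moumou attended a charity dinner",
    "XX clothing advertisement: new product launch discount",
]
event_docs = aggregate_event(
    docs,
    must=["Li Moumou", "XX"],
    any_of=["new product launch"],
    must_not=["advertisement"],
)
```

Only the first document satisfies all three groups. Substring matching is used because it also works on unsegmented Chinese text; token-level matching would need a word segmenter first.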
[0032] Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model.
[0033] The mutual information formula is:
[0034]
[0035] The formula for the cross-entropy loss function is:
[0036]
[0037] Based on the above formula, text processing of the event document set revealed that keywords such as "XX", "Li Moumou", "XX helmet", "hat", and "helmet" appeared frequently. This indicates that these keywords are strongly correlated with the event itself and have extensibility, making them suitable as proper nouns representing the public opinion event related to Li Moumou's XX uniform. The vector values of the extracted keywords were calculated using mutual information and cross-entropy, as shown in the table below.
[0038]
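A rough sketch of how such phrase scores could be computed is shown below. It assumes the text has already been segmented into tokens, scores only two-word candidates, and uses pointwise mutual information together with left and right neighbour entropy, which is a common stand-in for what the text calls left-right cross-entropy. The function name `score_bigrams` and the sample sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def score_bigrams(sentences: List[List[str]], min_count: int = 2
                  ) -> Dict[Tuple[str, str], Tuple[float, float, float]]:
    """Return {bigram: (pmi, left_entropy, right_entropy)} for frequent bigrams."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    left: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    right: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left[pair][tokens[i - 1]] += 1      # token just before the pair
            if i + 2 < len(tokens):
                right[pair][tokens[i + 2]] += 1     # token just after the pair

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def entropy(neighbours: Counter) -> float:
        total = sum(neighbours.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in neighbours.values())

    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:                       # keep phrases seen at least twice
            continue
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        scores[pair] = (math.log(p_xy / (p_x * p_y)),
                        entropy(left[pair]), entropy(right[pair]))
    return scores


# Hypothetical, pre-segmented sentences.
sentences = [
    ["Li", "Moumou", "wore", "XX", "helmet"],
    ["fans", "praised", "XX", "helmet", "design"],
    ["critics", "mocked", "XX", "helmet", "today"],
]
scores = score_bigrams(sentences)
```

A high mutual information value suggests the two tokens belong together, and high neighbour entropy on both sides suggests the pair is used freely in varied contexts, which is the usual argument for treating it as a standalone phrase.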
[0039] Step 3: Construct a dictionary based on the GloVe model and the extracted phrases and proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with the proper nouns and industry-specific phrase libraries. Each element of the matrix, X_ij, represents the number of times word i and its context word j co-occur within a context window of a specific size. Generally, each co-occurrence adds 1 to this count, but under the GloVe model each co-occurrence is instead weighted by the decay function decay = 1 / d, where d is the distance between the two words in the context window. The formula is as follows:
[0040]
[0041] Based on this formula, its loss function is constructed as follows:
[0042]
[0043] After training the machine learning model, a dictionary set that matches or is similar to the event is obtained, and the top 30 with the highest scores are selected.
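The distance-weighted co-occurrence counting described in Step 3 can be sketched as follows. This covers only the construction of the matrix, not the training of the vectors or the top-30 selection, and it assumes pre-segmented input; the function name `cooccurrence`, the `window` parameter, and the sample sentence are illustrative choices rather than details from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cooccurrence(sentences: List[List[str]], window: int = 5
                 ) -> Dict[Tuple[str, str], float]:
    """Accumulate symmetric co-occurrence weights X[(i, j)] with 1/d decay."""
    X: Dict[Tuple[str, str], float] = defaultdict(float)
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            # Scan to the right only and add both directions, so X stays symmetric.
            for d in range(1, window + 1):
                if pos + d >= len(tokens):
                    break
                context = tokens[pos + d]
                X[(word, context)] += 1.0 / d
                X[(context, word)] += 1.0 / d
    return dict(X)


# Hypothetical pre-segmented sentence.
X = cooccurrence([["spokesperson", "said", "helmet", "safe"]], window=2)
```

With a window of 2, adjacent words contribute 1.0, words two positions apart contribute 0.5, and words further apart contribute nothing.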
[0044] Step 4: Expert-rule filtering of non-opinion words: Experts manually screen out words that do not express subjective opinions and remove them from the dictionary;
[0045] Step 5: Extract candidate viewpoint context based on dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and consider these data as candidate viewpoints;
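Steps 4 and 5 together amount to pruning the dictionary and then keeping the sentences that contain a surviving dictionary word. The sketch below illustrates that flow under simplifying assumptions: a regular expression stands in for the word segmenter (Chinese text would need a real segmenter), and the dictionary, the expert-rejected set, and the sample text are all made up for the example.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Split on common Chinese and English sentence terminators."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def candidate_viewpoints(text: str, dictionary: Set[str],
                         rejected: Set[str]) -> List[str]:
    """Keep sentences containing at least one expert-approved dictionary word."""
    approved = dictionary - rejected                  # Step 4: expert filtering
    candidates = []
    for sentence in split_sentences(text):
        # Stand-in for a word segmenter: lower-cased alphanumeric tokens.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if tokens & approved:                         # Step 5: dictionary hit
            candidates.append(sentence)
    return candidates


# Hypothetical event text and dictionary.
text = ("A university professor said the helmet is safe. "
        "The launch took place on Friday. "
        "A county spokesperson stated that sales were suspended.")
result = candidate_viewpoints(
    text,
    dictionary={"said", "stated", "took"},
    rejected={"took"},
)
```

The second sentence is dropped because its only dictionary word, "took", was rejected by the expert rule as not expressing an opinion.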
[0046] Step 6: Entity extraction based on the NER model and syntactic dependency tree: NER (named entity recognition), also known as proper name recognition, is a fundamental task in natural language processing with a wide range of applications. Entities generally refer to items in text that have specific meaning or strong referentiality, typically including names of people, organizations, proper nouns, etc. The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs.
[0047] In business scenarios, an expressed opinion typically consists of a speaker and the content. The speaker is not only a specific person but also that person's organization, title, and position. Entity extraction is performed on the content using NER and dependency trees, and the entities are categorized by organization, title, and position. The categorization rules use a pre-organized code table library, and the importance of each entity is determined by comparing it with this library. Examples include authoritative opinions, official opinions, and opinions from influential figures. Important opinions are then extracted from the obtained opinion data and organized, as shown in the table below:
[0048]
| Example text | Classification |
| --- | --- |
| A university professor said | authority |
| Epidemic prevention experts | authority |
| A spokesperson for the county government stated | official |
| The municipal party secretary said in his speech | official |
| CBN Big V Dr. Zeng | Big V |
| …… | …… |
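The code table comparison can be pictured as a keyword lookup on the speaker text. The sketch below assumes that NER and dependency parsing have already isolated the speaker phrase, and it covers only the lookup. The `CODE_TABLE` entries and the `classify_speaker` helper are hypothetical and merely mirror the example rows above.

```python
from typing import Dict, Optional

# Hypothetical code table: title keyword -> category (illustrative entries only).
CODE_TABLE: Dict[str, str] = {
    "professor": "authority",
    "expert": "authority",
    "spokesperson": "official",
    "party secretary": "official",
    "big v": "Big V",
}


def classify_speaker(speaker: str,
                     table: Dict[str, str] = CODE_TABLE) -> Optional[str]:
    """Return the category of the first code-table keyword found in the speaker text."""
    lowered = speaker.lower()
    for keyword, category in table.items():
        if keyword in lowered:
            return category
    return None  # not in the code table: no importance category assigned


examples = [
    "A university professor said",
    "A spokesperson for the county government stated",
    "CBN Big V Dr. Zeng",
]
labels = [classify_speaker(e) for e in examples]
```

A real code table would likely be much larger and keyed on normalized entity types rather than raw substrings; the dictionary here is only meant to show the shape of the comparison.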
[0049] The parts not described in detail in this article are existing technologies.
[0050] While the specific embodiments of the present invention have been described in detail above, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention, and modifications or variations without creative effort are still within the protection scope of the present invention.
Claims
1. A method for extracting important viewpoints from public opinion events, characterized in that the steps include the following:
Step 1: Aggregate public opinion event data: Describe the theme of the online public opinion event in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the main keywords using AND, OR, and NOT operations. Search the public opinion database with the combined keywords to obtain the dataset related to the event.
Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model. These phrases and words are used as proper nouns representing the corresponding events. The vector values of the extracted proper nouns are calculated using the mutual information and left-right cross-entropy algorithms.
Step 3: Construct a dictionary based on the GloVe model and the extracted phrases and proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with the proper nouns and industry-specific phrase libraries. Each element of the matrix, X_ij, represents the number of times word i and its context word j co-occur within a context window of a specific size. Each co-occurrence would ordinarily add 1 to this count, but under the GloVe model the weights are calculated using the decay function decay = 1 / d based on the distance d between the two words in the context, where the formula is as follows: ; Based on this formula, its loss function is constructed as follows: ; After training the machine learning model, a dictionary set matching the events is obtained, and the top 30 with the highest scores are selected.
Step 4: Expert-rule filtering of non-opinion words: Experts manually screen out words that do not express subjective opinions and remove them from the dictionary;
Step 5: Extract candidate viewpoint context based on the dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and treat these data as candidate viewpoints;
Step 6: Extract entities based on the NER model and syntactic dependency tree: The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs; the classification rules use a pre-organized code table library, and the importance of an entity is determined by comparing it with the code table library; important viewpoints can be extracted sequentially from the obtained viewpoint data and then processed in an orderly manner.
2. The method for extracting important viewpoints from public opinion events according to claim 1, characterized in that: The mutual information formula mentioned in step two is: ; The formula for the cross-entropy loss function is: .
3. The method for extracting important viewpoints from public opinion events according to claim 1, characterized in that: The NER mentioned in step six, also known as proper name recognition, is a fundamental task in natural language processing. Entities refer to items in text that have specific meaning and strong referentiality, including personal names, organization names, and proper nouns.