A method for extracting important opinions in public opinion events

This method combines mutual information and cross-entropy algorithms, the GloVe model, and an NER model with expert rules to extract and classify viewpoints in public opinion events, overcoming the limitations of existing technologies in viewpoint analysis. It achieves accurate extraction and classification of important viewpoints from public opinion events of various data types, supporting applications across multiple industries.

CN116150461B · Active · Publication Date: 2026-06-23 · GUANGDONG SHUYUAN ZHIHUI TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GUANGDONG SHUYUAN ZHIHUI TECH CO LTD
Filing Date: 2022-12-30
Publication Date: 2026-06-23

IPC: G06F16/953; G06F16/334; G06F16/353; G06F16/355; G06F16/36; G06F40/242; G06F40/295

AI Tagging

Application Domain

Natural language data processing; Special data processing applications

Technology Topics

Data class; Mutual information

Technical Efficacy Phrases

Easy to read and understand; accurate extraction

AI Technical Summary

Technical Problem

Existing technologies struggle to identify the opinions expressed in online public opinion events, to classify them effectively and rank them by importance and influence, and to support multiple data types.

Method used

After public opinion event data is aggregated, phrases and words that recur in the event dataset are extracted using mutual information and left-right cross-entropy algorithms. A dictionary set is constructed using the GloVe model, words that do not express opinions are filtered out using expert rules, and entities are extracted using the NER model and a syntactic dependency tree to screen and classify opinions.

Benefits of technology

It enables accurate extraction and classification of important and influential viewpoints from public opinion events of various data types, supports multiple industries and event types, eliminates noise, and provides a basis for decision-making.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application relates to a method for extracting important opinions in public opinion events. The method uses machine learning and algorithm models to extract industry-specific phrases and proper nouns from massive text based on mutual information and left-right cross-entropy, and trains a word vector model on an industry corpus based on the GloVe model. The word vectors are used to recall synonyms of "say" and "express" and to extract a proper noun dictionary, and sentences that express opinions are recalled according to expert rules. An NER model judges whether the speaker field of an opinion contains an entity type specified by the business, and the opinions are screened accordingly. A syntactic dependency tree is used to analyze the lexical dependencies of the speaker field, and the resulting speaker entity relationships serve as the basis for identifying important opinions. The technology can be extended to multiple industries and multiple types of events, is not limited to a single data type, and clusters multiple opinions under large data volumes, which makes them convenient to view and understand.

Description

Technical Field

[0001] This invention relates to the field of network information processing technology, specifically a method for extracting important viewpoints from public opinion events.

Background Technology

[0002] Currently, the main techniques for extracting and recognizing opinions primarily utilize word vectors, sentiment analysis, keyword clustering, and other methods. These techniques are mainly used to extract opinions from online forum data or user comment data, or as opinion extraction methods for specific scenarios, such as e-commerce.

[0003] Current identification techniques are limited to extracting viewpoints from textual data. With the development of new media, netizens are no longer confined to text-based platforms like forums and microblogs, but are increasingly using video, audio, and images to express their opinions. Furthermore, there is a lack of categorization and ranking of viewpoints based on their importance; some seemingly insignificant viewpoints are extracted as key points. After processing massive amounts of data, the extracted viewpoint data is vast, and there is a lack of effective methods for classifying and identifying these extracted viewpoints.

[0004] Patent CN108363725A discloses a method for extracting user comment opinions and generating opinion tags. This method first constructs an initial opinion part-of-speech rule base based on user comments, then automatically discovers new user opinion part-of-speech rules through continuous iteration, and obtains user comment opinions through part-of-speech rule matching. This method focuses more on comment data and does not incorporate other data types, nor does it provide a method for classifying opinions, thus having its limitations.

[0005] Patent CN201210038746 discloses a method for extracting attribute-opinion pairs from Chinese opinion and evaluation information. This method boasts high accuracy and robustness, eliminating the need for annotation and model training. However, it primarily extracts opinions from the metadata itself, neglecting to consider the logic of opinion differentiation across different events. A single opinion might be present in one event but not in another, failing to integrate the opinion with the event itself.

[0006] Patent CN101408883B discloses a method for collecting viewpoints from online public opinion. This method extracts trending terms from online forums, then extracts related information documents based on these terms to form a set of trending event documents related to those terms. Finally, it clusters key sentences within these documents to obtain multiple sets of key opinion sentences for a specific trending event. However, this method only analyzes forum data and is not comprehensive enough. New media have largely displaced online forums, so extracting opinions from forums alone no longer reflects the current state of the internet. Furthermore, the method does not provide a way to differentiate the importance of different opinions.

[0007] In today's internet-driven world, the main challenge is to analyze public opinion events by identifying relevant viewpoints, effectively categorizing them, and determining the relative importance and influence of different viewpoints within each event.

Summary of the Invention

[0008] In view of the problems existing in the prior art, the present invention discloses a method for extracting important viewpoints from public opinion events, including the following steps:

[0009] Step 1: Aggregate public opinion event data: Describe the theme of the online public opinion event in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the main keywords using AND, OR, and NOT operations. Search the public opinion database with the combined keywords to obtain the dataset related to the event.

[0010] Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model. These phrases and words are used as proper nouns representing the corresponding events. The vector values of the extracted proper nouns are calculated using the mutual information and left-right cross-entropy algorithms.

[0011] Step 3: Construct a dictionary based on the GloVe model and extracted phrase proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with proper nouns and industry-specific phrase libraries. Each element of the matrix is denoted $X_{ij}$, which represents the number of times word i and its context word j co-occur in a context window of a specific size. Generally, the smallest unit of this count is 1, but in the GloVe model the contribution of each co-occurrence is weighted by a decay function decay = 1 / d, where d is the distance between the two words in the context. The formula is as follows:

[0012]

[0013] Based on this formula, its loss function is constructed as follows:

[0014]

[0015] After training the machine learning model, a dictionary set that matches or is similar to the event is obtained, and the top 30 with the highest scores are selected.
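The formula images for paragraphs [0012] and [0014] are not present in this text. For orientation only, the forms published for the GloVe model are shown below; this is an assumption about what the patent's formulas contain, not a transcription of them. The symbols $V$ (vocabulary size), $w_i$, $\tilde{w}_j$ (word and context vectors), $b_i$, $\tilde{b}_j$ (biases), $x_{\max}$, and $\alpha$ come from the GloVe formulation and do not appear in the surrounding text.

```latex
\begin{align}
X_{ij} &= \sum_{\text{co-occurrences of } (i, j)} \frac{1}{d} \\
J &= \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 \\
f(x) &= \begin{cases} (x / x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\end{align}
```

The first line expresses the decay = 1 / d weighting described in Step 3: a pair of words separated by d positions contributes 1 / d to $X_{ij}$ rather than a full count of 1.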

[0016] Step 4: Expert rule filtering of non-speech opinions: Experts manually screen out words that do not conform to the expression of subjective opinions and remove them from the dictionary;

[0017] Step 5: Extract candidate viewpoint context based on dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and consider these data as candidate viewpoints;

[0018] Step 6: Extract entities based on NER model and syntactic dependency tree: The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs.

[0019] In business scenarios, an expressed opinion typically consists of the speaker and the content. The speaker, besides being a specific person, also includes their organization, title, and position. Entity extraction is performed on the content using NER (named entity recognition) and dependency trees, and the entities are categorized based on their organization, title, and position. The categorization rules utilize a pre-organized code table library, and the importance of each entity is determined by comparing it with this library. Examples include authoritative opinions, official opinions, and opinions from influential figures (such as KOLs).

[0020] As a preferred embodiment of the present invention, the mutual information formula in step two is:

[0021]

[0022] The formula for the cross-entropy loss function is:

[0023]
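The formula images for paragraphs [0021] and [0023] are likewise missing from this text. As a hedged reference, the standard definitions of pointwise mutual information and cross-entropy are given below, together with the left and right neighbor entropies commonly used in phrase discovery. Whether the patent uses exactly these forms is an assumption; $p$, $q$, $a$, and $b$ are illustrative symbols, with $a$ and $b$ ranging over the words seen immediately to the left and right of a candidate phrase $w$.

```latex
\begin{align}
I(x; y) &= \log \frac{p(x, y)}{p(x)\,p(y)} \\
H(p, q) &= -\sum_{x} p(x) \log q(x) \\
H_{L}(w) &= -\sum_{a} p(a \mid w) \log p(a \mid w), \qquad
H_{R}(w) = -\sum_{b} p(b \mid w) \log p(b \mid w)
\end{align}
```

A high mutual information value suggests the parts of a phrase belong together, while high left and right entropies suggest the phrase appears in varied contexts and can stand as a unit.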

[0024] As a preferred embodiment of the present invention, the NER mentioned in step six is also known as proper name recognition, which is a fundamental task in natural language processing and has a very wide range of applications; entities generally refer to entities in text that have specific meanings or strong referentiality, usually including personal names, organization names, proper nouns, etc.

[0025] The beneficial effects of this invention are as follows: This invention utilizes machine learning and algorithmic models to extract specific phrases and proper nouns from massive amounts of text based on mutual information and left-right cross-entropy. It trains a word vector model using industry-specific corpora based on the GloVe model, recalls synonyms for "say" and "express" using word vectors, extracts a dictionary of proper nouns, and recalls sentences belonging to opinions based on expert rules. It uses an NER model to determine whether the expressive field in the opinion contains a business-specified entity type, filters the opinions, and analyzes the lexical dependencies of the expressive field using syntactic dependency trees to extract expressive entity relationships as important evidence for the opinions. The key point is the combination of machine learning algorithmic models to train a batch of industry- and event-related proper noun and phrase models, expressive entity models, etc. Through the vector relationships between these models and actual public opinion events, important and influential opinions are effectively extracted. This technology can be extended to multiple industries and various types of events, and is not limited to a single data type. It supports multiple data types, clusters multiple viewpoints under large data volumes for easy viewing and understanding, judges the importance of viewpoints based on influence, provides a basis for decision-making, distinguishes them from objective facts, accurately extracts subjective viewpoints, and eliminates noise.

Attached Figure Description

[0026] Figure 1 is a flowchart illustrating the steps of the invention;

[0027] Figure 2 is a flowchart of the method of the invention.

Detailed Implementation

[0028] Example 1

[0029] This invention discloses a method for extracting key viewpoints from public opinion events, comprising the following steps:

[0030] Step 1: Aggregate public opinion event data: Describe the theme of the online public opinion event in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the main keywords using AND, OR, and NOT operations. Search the public opinion database with the combined keywords to obtain the dataset related to the event.

[0031] For example, in the "Li Moumou XX Clothing Public Opinion Incident," Li Moumou is the entity keyword, XX is the event keyword, and the new product launch is the location keyword. By associating the keywords "Li Moumou + new product launch + XX," all data results related to this event can be retrieved from the public opinion database, thus completing the first step of event aggregation. Theoretically, as long as there are relevant keywords that can represent the event, they should be included in the keyword combination scope to maximize the aggregation of event data.
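The keyword combination in this step can be sketched as a simple boolean filter. This is a minimal illustration under stated assumptions: the `KeywordQuery` structure, the `matches` and `aggregate_event` functions, and the sample documents are all hypothetical, and a real public opinion database would expose its own query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordQuery:
    """Hypothetical boolean query built from AND / OR / NOT keyword groups."""
    all_of: list = field(default_factory=list)   # AND: every keyword must appear
    any_of: list = field(default_factory=list)   # OR: at least one must appear, if given
    none_of: list = field(default_factory=list)  # NOT: none may appear


def matches(text: str, query: KeywordQuery) -> bool:
    """Return True if `text` satisfies the boolean keyword query."""
    if not all(k in text for k in query.all_of):
        return False
    if query.any_of and not any(k in text for k in query.any_of):
        return False
    return not any(k in text for k in query.none_of)


def aggregate_event(documents, query):
    """Keep only the documents related to the event."""
    return [d for d in documents if matches(d, query)]


# Illustrative data only; not taken from the patent.
docs = [
    "Li Moumou attended the XX new product launch",
    "Li Moumou comments on the weather",
    "XX new product launch advertisement, Li Moumou absent",
]
query = KeywordQuery(
    all_of=["Li Moumou", "XX"],
    any_of=["new product launch"],
    none_of=["advertisement"],
)
event_dataset = aggregate_event(docs, query)
```

Substring matching is used here for brevity; adding further representative keywords to `any_of` widens the aggregation, in line with the aim of maximizing coverage of the event.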

[0032] Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model.

[0033] The mutual information formula is:

[0034]

[0035] The formula for the cross-entropy loss function is:

[0036]

[0037] Based on the above formula, text processing of the event document set revealed that keywords such as "XX", "Li Moumou", "XX helmet", "hat", and "helmet" appeared frequently. This indicates that these keywords are strongly correlated with the event itself and have extensibility, making them suitable as proper nouns representing the public opinion event related to Li Moumou's XX uniform. The vector values of the extracted keywords were calculated using mutual information and cross-entropy, as shown in the table below.

[0038]
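The scoring in Step 2 can be illustrated with a small sketch that computes pointwise mutual information and left/right neighbor entropy for adjacent word pairs. It assumes the text has already been tokenized, restricts candidates to two-word phrases, and uses the "appears at least twice" threshold from the step description; the function name `score_bigrams` and the sample sentence are hypothetical, and the patent's exact formulas may differ.

```python
import math
from collections import Counter, defaultdict


def score_bigrams(tokens, min_count=2):
    """Score adjacent token pairs by PMI and left/right neighbor entropy."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    left = defaultdict(Counter)   # tokens seen immediately before each pair
    right = defaultdict(Counter)  # tokens seen immediately after each pair
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    def entropy(counter):
        total = sum(counter.values())
        if not total:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:  # keep only phrases that recur in the dataset
            continue
        p_xy = count / (n - 1)
        p_x = unigrams[pair[0]] / n
        p_y = unigrams[pair[1]] / n
        scores[pair] = {
            "count": count,
            "pmi": math.log(p_xy / (p_x * p_y)),
            "left_entropy": entropy(left[pair]),
            "right_entropy": entropy(right[pair]),
        }
    return scores


# Illustrative text only; not taken from the patent.
tokens = ("the XX helmet was shown and the XX helmet was praised "
          "while a XX helmet sold").split()
scores = score_bigrams(tokens)
```

Pairs with high PMI and high entropy on both sides are the ones a pipeline of this kind would keep as candidate event phrases.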

[0039] Step 3: Construct a dictionary based on the GloVe model and extracted phrase proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with proper nouns and industry-specific phrase libraries. Each element of the matrix is denoted $X_{ij}$, which represents the number of times word i and its context word j co-occur in a context window of a specific size. Generally, the smallest unit of this count is 1, but in the GloVe model the contribution of each co-occurrence is weighted by a decay function decay = 1 / d, where d is the distance between the two words in the context. The formula is as follows:

[0040]

[0041] Based on this formula, its loss function is constructed as follows:

[0042]

[0043] After training the machine learning model, a dictionary set that matches or is similar to the event is obtained, and the top 30 with the highest scores are selected.
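The distance-weighted co-occurrence matrix and the top-30 selection can be sketched as follows. Note the simplification: this sketch does not train GloVe vectors. It ranks words by cosine similarity of their raw co-occurrence rows as a stand-in for similarity between trained vectors. The function names, window size, and sample sentences are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cooccurrence(sentences, window=5):
    """Build X[i][j]: co-occurrence counts weighted by 1/d within a window."""
    X = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                X[word][tokens[j]] += 1.0 / d
                X[tokens[j]][word] += 1.0 / d  # treat context as symmetric
    return X


def cosine(u, v):
    """Cosine similarity between two sparse rows stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_similar(X, seed, k=30):
    """Rank words by similarity of their co-occurrence rows to `seed`."""
    seed_row = X.get(seed, {})
    scored = [(w, cosine(seed_row, X[w])) for w in X if w != seed]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


# Illustrative sentences only; not taken from the patent.
sentences = [
    ["expert", "said", "helmet", "is", "safe"],
    ["official", "stated", "helmet", "is", "safe"],
]
X = cooccurrence(sentences, window=2)
ranked = top_similar(X, "said", k=30)
```

With a seed such as "said", the ranked list plays the role of the candidate dictionary that the expert filtering in Step 4 would then prune.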

[0044] Step 4: Expert rule filtering of non-speech opinions: Experts manually screen out words that do not conform to the expression of subjective opinions and remove them from the dictionary;

[0045] Step 5: Extract candidate viewpoint context based on dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and consider these data as candidate viewpoints;
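Step 5 amounts to keeping the sentences that contain at least one word from the filtered dictionary. The sketch below assumes whitespace tokenization so that it stays self-contained; for Chinese text the `tokenize` argument would be replaced by a word segmenter (for example `jieba.lcut`). The function names and the sample document are hypothetical.

```python
import re


def split_sentences(text):
    """Split on common Chinese and English sentence terminators."""
    return [s.strip() for s in re.split(r"[。！？!?.]+", text) if s.strip()]


def candidate_viewpoints(documents, speech_words, tokenize=str.split):
    """Return sentences whose tokens include at least one dictionary word."""
    vocab = set(speech_words)
    candidates = []
    for doc in documents:
        for sentence in split_sentences(doc):
            if vocab & set(tokenize(sentence)):
                candidates.append(sentence)
    return candidates


# Illustrative data only; not taken from the patent.
docs = [
    "A university professor said the helmet meets the standard. "
    "The launch was held on Friday."
]
candidates = candidate_viewpoints(docs, ["said", "stated", "expressed"])
```

Splitting on every period is a deliberate simplification and would break abbreviations such as "Dr."; a production pipeline would need a more careful sentence splitter.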

[0046] Step 6: Entity Extraction Based on NER Model and Syntactic Dependency Tree: NER, also known as proper name recognition, is a fundamental task in natural language processing with a wide range of applications. Entities generally refer to items in text that have specific meaning or strong referentiality, typically including names of people, organizations, proper nouns, etc. The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs.

[0047] In business scenarios, expressing opinions typically consists of the speaker and the content. The speaker includes not only a specific person but also their organization, title, and position. Entity extraction is performed on the content using NER and dependency trees, and the entities are categorized based on their organization, title, and position. The categorization rules utilize a pre-organized code table library, and the importance of entities is determined by comparing them with this library. Examples include authoritative opinions, official opinions, and opinions from influential figures. Important opinions are then extracted from the obtained opinion data and processed systematically, as shown in the table below:

[0048]
Example text | Classification
A university professor said | authority
Epidemic prevention experts | authority
A spokesperson for the county government stated | official
The municipal party secretary said in his speech | official
CBN Big V Dr. Zeng | Big V
…… | ……
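The comparison against the code table library can be sketched as a keyword lookup on the speaker field. This is a simplified stand-in: the patent obtains the speaker's organization, title, and position through an NER model and a syntactic dependency tree, neither of which is implemented here. The `CODE_TABLE` contents and the `classify_speaker` function are hypothetical, chosen to reproduce the classifications in the table above.

```python
# Hypothetical code table: title keyword -> importance category.
CODE_TABLE = {
    "professor": "authority",
    "expert": "authority",
    "spokesperson": "official",
    "party secretary": "official",
    "Big V": "Big V",
}


def classify_speaker(speaker_text, code_table=CODE_TABLE):
    """Return the category of the first code-table keyword found, else None."""
    lowered = speaker_text.lower()
    for keyword, category in code_table.items():
        if keyword.lower() in lowered:
            return category
    return None


# Speaker fields taken from the example table.
examples = [
    "A university professor said",
    "Epidemic prevention experts",
    "A spokesperson for the county government stated",
    "The municipal party secretary said in his speech",
    "CBN Big V Dr. Zeng",
]
labels = [classify_speaker(e) for e in examples]
```

Speakers that match no entry return `None`, which a pipeline could treat as viewpoints of ordinary importance.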

[0049] The parts not described in detail in this article are existing technologies.

[0050] While the specific embodiments of the present invention have been described in detail above, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention, and modifications or variations without creative effort are still within the protection scope of the present invention.

Claims

1. A method for extracting key viewpoints from public opinion events, characterized in that the steps include the following:

Step 1: Aggregate public opinion event data: Describe the theme of online public opinion events in terms of entities, locations, and events, and extract the main keywords accordingly. Combine the extracted keywords with AND, OR, and NOT operations using the main keywords. Search the public opinion database using these keywords to obtain the dataset related to the event.

Step 2: Extract the most frequent phrases and words using mutual information and left-right cross-entropy algorithms: Extract phrases and words that appear at least twice in the event dataset from the massive data based on the mutual information and left-right cross-entropy algorithm model. These phrases and words are used as proper nouns representing the corresponding events. The vector values of the extracted proper nouns are calculated using the mutual information and left-right cross-entropy algorithms.

Step 3: Construct a dictionary based on the GloVe model and extracted phrase proper nouns: Based on the vector values of the corresponding event proper nouns, construct a co-occurrence matrix by combining the GloVe model with proper nouns and industry-specific phrase libraries. Each element in the matrix is denoted $X_{ij}$, which represents the number of times word i and its context word j co-occur in a context of a specific size. The smallest unit of this number is 1, but based on the GloVe model, the weights are calculated using the decay function decay = 1 / d based on the distance d between two keywords in the context, where the formula is as follows: [formula not reproduced]; based on this formula, its loss function is constructed as follows: [formula not reproduced]; after training the machine learning model, a dictionary set matching the events is obtained, and the top 30 with the highest scores are selected.

Step 4: Expert rule filtering of non-speech opinions: Experts manually screen out words that do not conform to the expression of subjective opinions and remove them from the dictionary;

Step 5: Extract candidate viewpoint context based on dictionary: Based on the filtered dictionary, after word segmentation by a word segmenter, extract context sentences related to the viewpoint from the event dataset, and consider these data as candidate viewpoints;

Step 6: Extract entities based on NER model and syntactic dependency tree: The NER system extracts entities from unstructured input text and identifies more categories of entities according to business needs; the classification rules use a pre-organized code table library, and the importance of an entity is determined by comparing it with the code table library; important viewpoints can be extracted sequentially from the obtained viewpoint data and then processed in an orderly manner.

2. The method for extracting important viewpoints from public opinion events according to claim 1, characterized in that: the mutual information formula mentioned in step two is: [formula not reproduced]; the formula for the cross-entropy loss function is: [formula not reproduced].

3. The method for extracting important viewpoints from public opinion events according to claim 1, characterized in that: the NER mentioned in step six, also known as proper name recognition, is a fundamental task in natural language processing. Entities refer to items in text that have specific meaning and strong referentiality, including personal names, organization names, and proper nouns.

Citation Information

Patent Citations

Method for collecting network public feelings viewpoint
CN101408883B
Method for extracting attribute-viewpoint pairs of Chinese viewpoint and evaluation information
CN102637165A
User comment viewpoint extraction and viewpoint tag generation method
CN108363725A
Opinion mining method for ten-million-scale news comments
CN104778209A
Emotion viewpoint information analysis method and device, storage medium and electronic equipment
CN115203412A
