Unsupervised method for recognizing spam long texts in online public opinion
A method for identifying online public opinion spam, applied in the field of information processing, that addresses the reduced effectiveness, unusability, and low accuracy of existing approaches.
Examples
Embodiment 1
[0049] An unsupervised method for identifying spam long texts in online public opinion, comprising the following steps:
[0050] (1) Corpus acquisition: obtain labeled public opinion spam texts and normal texts from the existing internal system;
[0051] (2) Model training: build two models, a language model trained on online public opinion text and a BERT next sentence prediction model trained on online public opinion text, and feed the long public opinion text to be predicted into both the language model and the BERT next sentence prediction model;
[0052] The language model makes its judgment as follows:
[0053] (X1) Statistical language model;
[0054] A statistical language model computes the probability that a sentence is a normal sentence, formalized as $P(S) = P(w_1, w_2, \ldots, w_n)$, where $P(S)$ denotes the probability of sentence $S$ and $w_i$ denotes the $i$-th mini...
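The idea of scoring a sentence with a statistical language model can be sketched with a small bigram model. This is an illustration under stated assumptions, not the patent's implementation: the class name `BigramLanguageModel`, the whitespace tokenization, the add-one smoothing, and the toy corpus are all hypothetical choices made here. Chinese text would first need a word or character segmenter.

```python
import math
from collections import Counter


class BigramLanguageModel:
    """Bigram model with add-one (Laplace) smoothing over whitespace tokens."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            # Pad with start/end markers so sentence boundaries are modelled too.
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Return log P(S) as a sum of smoothed bigram log-probabilities."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total

    def avg_log_prob(self, sentence):
        """Log-probability divided by the number of bigram transitions."""
        n_transitions = len(sentence.split()) + 1
        return self.log_prob(sentence) / n_transitions


# Toy "normal text" corpus (hypothetical data, for illustration only).
corpus = [
    "the city council approved the new budget",
    "the council discussed the new policy",
    "residents discussed the budget with the council",
]
lm = BigramLanguageModel(corpus)
normal = lm.avg_log_prob("the council approved the new policy")
garbled = lm.avg_log_prob("budget the the approved council new")
```

A sentence whose word order resembles the training text receives a higher average log-probability than a shuffled one, which is the signal a threshold-based spam check could use. The threshold itself would have to be tuned on real data.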
Embodiment 2
[0063] An unsupervised method for identifying spam long texts in online public opinion, comprising the following steps:
[0064] (1) Corpus acquisition: obtain labeled public opinion spam texts and normal texts from the existing internal system;
[0065] (2) Model training: build two models, a language model trained on online public opinion text and a BERT next sentence prediction model trained on online public opinion text, and feed the long public opinion text to be predicted into both the language model and the BERT next sentence prediction model;
[0066] The language model makes its judgment as follows:
[0067] (X1) Statistical language model;
[0068] A statistical language model computes the probability that a sentence is a normal sentence, formalized as $P(S) = P(w_1, w_2, \ldots, w_n)$, where $P(S)$ denotes the probability of sentence $S$ and $w_i$ denotes the $i$-th mini...
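Step (2) also feeds the long text into a BERT next sentence prediction model. The excerpt does not say how the text is segmented or how the pair scores are aggregated, so the following is only a sketch under assumptions: the text is split into sentences, each adjacent pair is scored, and the scores are averaged. The names `split_sentences`, `coherence_score`, and `toy_pair_score` are hypothetical. `toy_pair_score` is a lexical-overlap stand-in, not BERT; in practice a callable wrapping a real next sentence prediction model would be passed in its place.

```python
import re
from statistics import mean


def split_sentences(text):
    """Split after common English and Chinese sentence terminators."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def toy_pair_score(first, second):
    """Stand-in for an NSP probability: Jaccard overlap of the token sets."""
    a = set(re.findall(r"\w+", first.lower()))
    b = set(re.findall(r"\w+", second.lower()))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence_score(text, pair_score):
    """Average pair_score over all adjacent sentence pairs in the text.

    pair_score(first, second) should return a value in [0, 1] estimating
    how plausibly `second` follows `first`.
    """
    sentences = split_sentences(text)
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        # A single sentence has no adjacent pair to judge.
        return 1.0
    return mean(pair_score(a, b) for a, b in pairs)


coherent = (
    "The council approved the budget. The budget funds new roads. "
    "New roads reduce traffic."
)
stitched = (
    "The council approved the budget. Cheap watches for sale online! "
    "Win a free holiday today."
)
coherent_score = coherence_score(coherent, toy_pair_score)
stitched_score = coherence_score(stitched, toy_pair_score)
```

Keeping the pair scorer as a parameter means the segmentation and aggregation logic can be exercised without loading a model, and the stand-in can later be swapped for a real next sentence predictor without changing the rest of the code.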
Embodiment 3
[0083] An unsupervised method for identifying spam long texts in online public opinion, comprising the following steps:
[0084] (1) Corpus acquisition: obtain labeled public opinion spam texts and normal texts from the existing internal system;
[0085] (2) Model training: build two models, a language model trained on online public opinion text and a BERT next sentence prediction model trained on online public opinion text, and feed the long public opinion text to be predicted into both the language model and the BERT next sentence prediction model;
[0086] The language model makes its judgment as follows:
[0087] (X1) Statistical language model;
[0088] A statistical language model computes the probability that a sentence is a normal sentence, formalized as $P(S) = P(w_1, w_2, \ldots, w_n)$, where $P(S)$ denotes the probability of sentence $S$ and $w_i$ denotes the $i$-th mini...
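For reference, a statistical language model conventionally factorizes the sentence probability with the chain rule and then truncates the conditioning context to a fixed window. Here $S$ is a sentence, $w_i$ is its $i$-th token, and $n$ is its length. The window size $k$ is an assumption of this sketch, since the excerpt is cut off before stating which order the method uses; $k = 2$ gives a bigram model.

```latex
\begin{align}
P(S) &= P(w_1, w_2, \ldots, w_n)
      = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \\
     &\approx \prod_{i=1}^{n} P(w_i \mid w_{i-k+1}, \ldots, w_{i-1})
\end{align}
```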