Public opinion information text processing method for minority language countries

A public opinion information text processing method, applied in the field of processing public opinion texts from minority-language countries, that addresses problems such as large text volume, complex structure, and concealed effective information.

Pending Publication Date: 2019-03-19
大连瀚闻资讯有限公司
Cites: 4 | Cited by: 2

AI Technical Summary

Problems solved by technology

However, for political and economic news texts related to minority-language countries, the...

Method used

The minority-language text is translated into English and then into Chinese; the Chinese text is segmented with a hidden Markov model; repeated, uninformative words are cleaned out; and words whose TF-IDF values fall below a set threshold are deleted, with the retained words forming the analysis corpus.

Examples

Embodiment 1

[0046] As shown in Figure 2, the method proposed by the present invention is used to monitor and process, in real time, public opinion information from countries along the "Belt and Road". Effective morphemes can be extracted from the text in real time, providing a high-quality data basis for subsequent in-depth mining of public opinion information, such as sentiment analysis or hotspot identification, and greatly improving the efficiency and accuracy of extracting effective information from public opinion texts. In addition, the hot words of public opinion can be obtained from the TF-IDF value of each word in the text. The attached figure shows the public opinion hot words for countries along the "Belt and Road" on a certain day.
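The hot-word step in [0046] relies on TF-IDF values. Below is a minimal Python sketch, assuming the texts have already been segmented into word lists (English stand-ins are used for the segmented Chinese words to keep the example readable). The unsmoothed formula idf = log(N / df) and the rule of ranking each word by its highest score across documents are illustrative choices, not details given in the patent; `tf_idf` and `hot_words` are hypothetical names.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: tf-idf} mapping per document.

    tf  = occurrences of the word in the document / number of words in the document
    idf = log(N / df), where df is the number of documents containing the word.
    This is one common TF-IDF variant; the patent does not state its exact formula.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def hot_words(docs: list[list[str]], top_k: int = 3) -> list[str]:
    """Rank words by their highest TF-IDF score in any document (hypothetical aggregation)."""
    best: dict[str, float] = {}
    for doc_scores in tf_idf(docs):
        for word, score in doc_scores.items():
            best[word] = max(best.get(word, 0.0), score)
    # Sort by score descending, breaking ties alphabetically for a stable result.
    return sorted(best, key=lambda w: (-best[w], w))[:top_k]


docs = [["port", "railway", "port"], ["railway", "investment"], ["railway", "election"]]
print(hot_words(docs))  # ['port', 'election', 'investment']
```

A word that appears in every document ("railway" here) gets an IDF of zero, so terms repeated across all texts drop to the bottom of the hot-word list.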

Embodiment 2

[0048] Figure 3 shows the priority assignment process for each keyword during word segmentation in this embodiment. Figure 4 shows some of the words in the cleaning lexicon used in the cleaning process of this embodiment. Figure 5 is the hot-word ranking chart obtained after removing non-keywords whose values fall below the set threshold; it gives an intuitive, real-time view of the popularity of hot words.
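[0048] describes cleaning against a lexicon, then removing non-keywords below a set threshold and ranking what remains. The sketch below assumes a hand-built lexicon (the four entries shown are hypothetical examples, not the patent's lexicon) and per-word scores such as TF-IDF values; `clean` and `rank_keywords` are illustrative names.

```python
# Hypothetical cleaning lexicon: frequent words that carry no effective information.
# Glosses: 的 (possessive particle), 了 (aspect particle), 据报道 ("it is reported"), 表示 ("stated").
CLEANING_LEXICON = {"的", "了", "据报道", "表示"}


def clean(tokens: list[str], lexicon: set[str] = CLEANING_LEXICON) -> list[str]:
    """Remove every token that appears in the cleaning lexicon."""
    return [t for t in tokens if t not in lexicon]


def rank_keywords(scores: dict[str, float], threshold: float) -> list[tuple[str, float]]:
    """Drop words scoring below `threshold`; sort the rest by score, highest first."""
    kept = {w: s for w, s in scores.items() if s >= threshold}
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


tokens = ["据报道", "铁路", "的", "港口", "铁路"]  # "reportedly", "railway", particle, "port", "railway"
print(clean(tokens))  # ['铁路', '港口', '铁路']
print(rank_keywords({"铁路": 0.42, "港口": 0.30, "会议": 0.05}, threshold=0.1))
# [('铁路', 0.42), ('港口', 0.3)]
```

Keeping the lexicon as a set makes each membership check constant-time, which matters when long news texts are cleaned in real time.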

[0049] The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

Abstract

The invention provides a public opinion information text processing method for minority language countries. The method comprises the following steps: translating a minority-language text into English, and then translating the English text into Chinese; performing word segmentation on the Chinese text based on a hidden Markov model; cleaning all the segmented words, completely removing repeatedly occurring words that do not help in judging effective information; calculating the term frequency-inverse document frequency (TF-IDF) value of each segmented word, deleting non-keywords whose values fall below a set threshold, and building an analysis corpus from the retained words. The method greatly expands the scope of public opinion monitoring, allowing the model to monitor and analyze, in real time, information from the various minority-language countries along the route. It also performs high-accuracy segmentation and cleaning of news texts that are difficult to segment. Manual inspection found that the occurrence rate of non-effective information and the misarrangement rate of effective information are extremely low.
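The abstract names hidden-Markov-model segmentation without giving its details. One common formulation tags each character as B, M, E (begin, middle, end of a multi-character word) or S (single-character word) and decodes the most probable tag sequence with the Viterbi algorithm. The sketch below assumes that formulation; the toy probabilities are invented for illustration (a real system would estimate them from a tagged corpus), and `viterbi` and `segment` are hypothetical names.

```python
import math

STATES = "BMES"  # B/M/E: begin/middle/end of a multi-character word; S: single-character word

# Toy parameters (hypothetical); real ones would be estimated from a segmented corpus.
START_P = {"B": 0.6, "S": 0.4}
TRANS_P = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT_P = {
    "B": {"港": 0.5, "建": 0.5},
    "M": {},
    "E": {"口": 0.5, "设": 0.5},
    "S": {"港": 0.1, "口": 0.1, "建": 0.1, "设": 0.1},
}


def _log(p: float, floor: float = 1e-8) -> float:
    # Unseen events get a tiny floor probability instead of log(0).
    return math.log(p if p > 0 else floor)


def viterbi(text: str, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P) -> str:
    """Return the most probable BMES tag string for `text` (log-space Viterbi)."""
    if not text:
        return ""
    # score[s]: best log-probability of any tag path ending in state s at the current character
    score = {s: _log(start_p.get(s, 0)) + _log(emit_p[s].get(text[0], 0)) for s in STATES}
    backpointers = []
    for ch in text[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(trans_p[p].get(s, 0)))
            new_score[s] = score[prev] + _log(trans_p[prev].get(s, 0)) + _log(emit_p[s].get(ch, 0))
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # A complete sentence must end on a word boundary (E or S).
    tag = max("ES", key=lambda s: score[s])
    tags = [tag]
    for pointer in reversed(backpointers):
        tag = pointer[tag]
        tags.append(tag)
    return "".join(reversed(tags))


def segment(text: str, tags: str) -> list[str]:
    """Cut `text` after every character tagged E or S."""
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":
            words.append(text[start:i + 1])
            start = i + 1
    return words


print(segment("港口建设", viterbi("港口建设")))  # ['港口', '建设'] ("port", "construction")
```

Working in log space avoids numeric underflow on long news sentences, where a product of many small probabilities would otherwise round to zero.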

Description

Technical field

[0001] The present invention relates to the technical field of text processing, and in particular to a text processing method for public opinion information from minority language countries.

Background

[0002] Existing public opinion processing methods on the market mainly focus on users' posts and comments on Weibo or other social media, where each text is short and simply structured. Political and economic news texts related to minority-language countries, by contrast, are long and complex in structure, and their effective information is concealed. Moreover, public opinion monitoring on the market targets mainly domestic social media and self-media; monitoring of public opinion in minority-language countries is missing.

Contents of the invention

[0003] According to the technical problems raised above, a text processing method for public opinion information in small-language co...

Application Information

IPC(8): G06F17/27; G06F17/28
CPC: G06F40/216; G06F40/289; G06F40/58
Inventor: 童友俊
Owner: 大连瀚闻资讯有限公司