Method and device for processing text information

A text information processing method, applied in the computer field, that addresses the difficulty of finding useful information in user-generated content, which is voluminous, rapidly updated, and widely varied.

Active Publication Date: 2020-11-27
ADVANCED NEW TECH CO LTD
11 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, due to the huge amount of UGC, the extremely rapid update, and the wide variety of content, there is no effective way to find useful information from UGC.

Method used

Text information is acquired and filtered, the filtered text information is classified into different events according to its similarity, an importance index is calculated for each event, and an event is marked when its importance index exceeds a preset threshold.

Examples

Embodiment 1

[0049] Figure 1 is a flow chart of the method of this embodiment. As shown in Figure 1, the method includes:

[0050] S1: Obtain text information, and filter the text information.

[0051] Obtaining text information includes obtaining text information from user-generated content. Preferably, it includes obtaining text information from news channels, Weibo channels, and forum channels, and using the text content in these channels as the text information. The news channels include Sina, Netease, Sohu, Tencent, and "Today's Headlines"; the Weibo channels include Sina Weibo; the forum channels include Tianya, Baidu Tieba, and Zhihu. For a news channel, the title text of the news item is used as the text information; for a forum channel, the text content of the post is used; for a Weibo channel, the text content of the Weibo post is used. New text information can be obtained very well through the text information obtained by...
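The per-channel rule in paragraph [0051] can be sketched as a small selection function. This is only an illustration: the `RawItem` type, its field names, and the channel labels `"news"`, `"weibo"`, and `"forum"` are assumptions made for the example and do not appear in the patent text.

```python
from dataclasses import dataclass


@dataclass
class RawItem:
    """One crawled item (hypothetical structure, not from the patent)."""
    channel: str  # assumed labels: "news", "weibo", or "forum"
    title: str
    body: str


def extract_text(item: RawItem) -> str:
    """Pick the field that serves as text information for the item's channel."""
    if item.channel == "news":
        # News channel: the title text is used.
        return item.title.strip()
    if item.channel in ("weibo", "forum"):
        # Weibo and forum channels: the text content of the post is used.
        return item.body.strip()
    raise ValueError(f"unknown channel: {item.channel!r}")


items = [
    RawItem("news", "Flood warning issued", "Full article text ..."),
    RawItem("forum", "Question", "Has anyone seen the flood warning?"),
]
texts = [extract_text(it) for it in items]
```

Keeping the channel rule in one function means a new channel type only requires one extra branch.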

Embodiment 2

[0096] In Embodiment 1, the similarity between pieces of text information is calculated as the similarity between the text vectors I corresponding to them. However, because the text vector I contains both text features and node features, its dimension is often as high as hundreds of thousands. Calculating a similarity therefore requires multiplying and adding hundreds of thousands of numbers, so the computational load is very large. In addition, storing such a high-dimensional text vector I takes up a considerable amount of space.
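To make the cost described in paragraph [0096] concrete, the following is a minimal sketch of a dense similarity computation. It assumes cosine similarity, which this excerpt does not name as the measure, and the dimension of 200,000 and the 8-byte float size are illustrative assumptions chosen to match "hundreds of thousands of dimensions".

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors: one multiply-add per dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = norm_a = norm_b = 0.0
    for x, y in zip(a, b):
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / math.sqrt(norm_a * norm_b)


# Illustrative storage estimate (assumed numbers, not from the patent).
DIM = 200_000          # assumed dimension of text vector I
BYTES_PER_FLOAT = 8    # assumed 64-bit floats
storage_mb = DIM * BYTES_PER_FLOAT / 1_000_000  # megabytes per dense vector
```

Under these assumptions each pairwise comparison walks all 200,000 dimensions, and each stored vector occupies about 1.6 MB.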

[0097] In this embodiment, the similarity between pieces of text information is calculated as follows:

[0098] Calculate the Count-Min Sketch of the text vector I corresponding to each piece of text information;

[0099] Calculate the similarity between the text vector I corresponding to the different t...
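The remainder of paragraph [0099] is truncated here, so the following is only a sketch of one common way to compare vectors through Count-Min Sketches, not the patent's exact procedure. It assumes non-negative feature weights, a sparse `{index: weight}` representation, and hypothetical `width` and `depth` values; the inner product is estimated as the minimum over rows of the row-wise dot products, which never underestimates the true value for non-negative vectors.

```python
import hashlib
import math
from typing import Dict, List


class CountMinSketch:
    """Fixed-size summary of a sparse vector with non-negative weights."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[float]] = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row: int, key: int) -> int:
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: int, weight: float) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += weight

    @classmethod
    def from_vector(cls, vec: Dict[int, float], width: int = 1024,
                    depth: int = 4) -> "CountMinSketch":
        sketch = cls(width, depth)
        for key, weight in vec.items():
            sketch.add(key, weight)
        return sketch


def estimate_dot(a: CountMinSketch, b: CountMinSketch) -> float:
    """Estimate the inner product as the minimum row-wise dot product."""
    if (a.width, a.depth) != (b.width, b.depth):
        raise ValueError("sketches must share width and depth")
    return min(sum(x * y for x, y in zip(row_a, row_b))
               for row_a, row_b in zip(a.table, b.table))


def estimate_similarity(a: CountMinSketch, b: CountMinSketch) -> float:
    """Cosine-style similarity computed entirely from the sketches."""
    denom = math.sqrt(estimate_dot(a, a) * estimate_dot(b, b))
    return estimate_dot(a, b) / denom if denom else 0.0


vec_a = {3: 1.0, 70_000: 2.0, 150_000: 3.0}
vec_b = {3: 2.0, 150_000: 1.0, 199_999: 4.0}
sketch_a = CountMinSketch.from_vector(vec_a)
sketch_b = CountMinSketch.from_vector(vec_b)
similarity = estimate_similarity(sketch_a, sketch_b)
```

With these assumed settings each sketch holds 4 × 1024 counters regardless of the dimension of text vector I, so both the arithmetic per comparison and the storage per text are bounded by the sketch size rather than by the original dimension.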

Embodiment 3

[0112] The embodiment of the present invention also proposes a text information processing device, which includes:

[0113] a text information filtering device, which acquires text information and filters the text information;

[0114] a text information classification device, which calculates the similarity of the filtered text information and classifies the filtered text information into different events according to the similarity;

[0115] a marking device, which calculates a metric for each event and marks the event when the metric exceeds a threshold.

[0116] In this embodiment, the text information filtering device obtains and filters the text information, so that text information from webpages can be obtained. The text information is classified into different events according to the similarity calculated by the text information classification device, and the marking device calculates a metric for each event and marks an event when its metric exceeds a threshold. T...
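The three devices of this embodiment can be sketched as three functions chained together. Everything specific below is an assumption made for illustration: the filter rule (a minimum token count), the similarity measure (Jaccard overlap of word sets, used as a simple stand-in for the text-vector similarity of the earlier embodiments), the greedy assignment of each text to the first sufficiently similar event, and the metric (the number of texts in the event).

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of lower-cased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def filter_texts(texts: List[str], min_tokens: int = 2) -> List[str]:
    """Filtering device: drop texts that are too short (hypothetical rule)."""
    return [t for t in texts if len(t.split()) >= min_tokens]


def classify_into_events(texts: List[str],
                         sim: Callable[[str, str], float],
                         threshold: float = 0.5) -> List[List[str]]:
    """Classification device: join the first event whose first text is similar."""
    events: List[List[str]] = []
    for text in texts:
        for event in events:
            if sim(event[0], text) >= threshold:
                event.append(text)
                break
        else:
            events.append([text])  # no similar event found: start a new one
    return events


def mark_events(events: List[List[str]],
                metric_threshold: int = 1) -> List[List[str]]:
    """Marking device: mark events whose metric (size) exceeds the threshold."""
    return [event for event in events if len(event) > metric_threshold]


raw = [
    "flood warning issued for river",
    "flood warning issued for the river",
    "new phone released today",
    "ok",
]
events = classify_into_events(filter_texts(raw), jaccard)
marked = mark_events(events)
```

In this toy run the one-word text is filtered out, the two flood texts fall into one event, and only that event exceeds the assumed metric threshold of 1.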

Abstract

The invention discloses a text information processing method and a text information processing device. The method comprises the step of acquiring text information and filtering the text information; calculating a similarity of the filtered text information, and classifying the filtered text information into different events according to the similarity; calculating an importance index of each event according to the text information in the events; judging whether the importance index value corresponding to each event exceeds a preset importance threshold value; and if so, marking the event, the importance index value of which exceeds the preset importance threshold value. By use of the method and the device disclosed by the invention, the text information can be automatically filtered and classified into different events, each event is monitored, the event is marked when the index of the event exceeds the threshold value so as to facilitate searching of useful information.

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a method and device for processing text information by using a computer.

Background art

[0002] With the advent of the wave of informatization and the popularization of the Internet, more and more users publish and exchange text information on the Internet, generating more and more User Generated Content (UGC for short). Common UGC includes microblogs, forum posts, news, and other content published by users. A large amount of new UGC appears at every moment. This new UGC contains various information, some of which repeats old information and some of which is brand new and not yet well known to the public. Whether new or old, the information may include information that meets predetermined conditions, such as information that attracts a high level of attention, and such information that meets predetermined conditions is of great value and i...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9535, G06F16/335
CPC: G06F16/335, G06F16/9535
Inventor: 任望
Owner: ADVANCED NEW TECH CO LTD