A hot event aggregation method and device

A hot event aggregation method, applied in the field of information processing. It addresses defects in how similarity is expressed, which distort text similarity and lead to inaccurate similarity judgments, and it achieves a better aggregation effect.

Active Publication Date: 2021-05-25
BEIJING QIYI CENTURY SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, because TF-IDF does not take the context of the text into account, it has defects in expressing similarity, which bring the following disadvantages when it is applied to aggregating hot events:
[0005] 1. The calculation of TF-IDF depends heavily on the size and quality of the corpus. The larger and higher-quality the corpus, the more accurate the calculated TF-IDF, but preparing such a corpus is costly.
[0006] 2. TF-IDF is calculated under the assumption that words are independent of each other, so the resulting word weights are also independent of each other. In actual text, however, words are closely related, and this directly affects the subsequent calculation of text similarity.
[0008] In addition, because TF-IDF treats words independently, evaluating the similarity between reports and events ignores the emphasis of the text itself, resulting in inaccurate similarity judgments.
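
The word-independence limitation described in [0006] can be illustrated with a small sketch. It is not taken from the patent: the function name `bow_cosine`, the whitespace tokenization, and the example sentences are all assumptions chosen for illustration, and raw word counts stand in for TF-IDF weights. Because a bag-of-words vector only records how often each word occurs, two sentences built from the same words in a different order receive a cosine similarity of approximately 1.0, even though they describe different events.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of two whitespace-tokenized bag-of-words vectors.

    Hypothetical helper for illustration; raw counts are used as weights.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Same words, different order and meaning: the vectors are identical.
reordered = bow_cosine("dog bites man", "man bites dog")

# No shared words: the vectors are orthogonal.
disjoint = bow_cosine("dog bites man", "stock market falls")
```

Here `reordered` is indistinguishable from a perfect match, which is the kind of inaccuracy that motivates taking context and emphasis into account.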



Embodiment Construction

[0072] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0073] Referring to Figure 1, a flow chart of the steps of an embodiment of a hot event aggregation method of the present invention is shown. The method may specifically include the following steps:

[0074] Step 101, obtaining an original report based on the title of the hot event;

[0075] In practical applications, a search engine can obtain hot events from the hot search list, or mine them from data showing a sharp increase in query hits. Hot events can of course also be determined in other ways, which is not limited here.

[0076] In this embodiment of the present invention, there may be one hot event, in which case there is also one hot event title.

[0077] In a preferred embodiment of the present invent...


Abstract

An embodiment of the present invention provides a hot event aggregation method and device. The method comprises: obtaining an original report based on the title of the hot event; determining, based on the title of the hot event and the original report, a seed report and multiple non-seed reports; using the seed report to generate a hot event cluster; calculating the similarity between each non-seed report and both the title of the hot event and each report in the hot event cluster; obtaining the non-seed report with the highest similarity; determining whether the similarity of that non-seed report is greater than a similarity threshold; and, if so, storing that non-seed report in the hot event cluster. The embodiment of the present invention introduces seed reports, hot events, and the similarity between reports into the aggregation process, so that the clustering algorithm focuses more on the event itself, measures text similarity more accurately, and obtains a better aggregation effect.
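
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not state: the seed is taken to be the report most similar to the title, the title similarity and the mean cluster similarity are averaged with equal weight, the select-and-store step is repeated until no candidate exceeds the threshold, and a simple token-overlap (Jaccard) measure stands in for whatever similarity the patent actually uses. The names `jaccard` and `aggregate` are hypothetical.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of whitespace token sets (an assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def aggregate(title: str, reports: List[str], threshold: float,
              sim: Callable[[str, str], float] = jaccard) -> List[str]:
    """Grow a hot event cluster from a seed report, one report at a time."""
    if not reports:
        return []
    # Assumption: the seed is the report most similar to the event title.
    seed_idx = max(range(len(reports)), key=lambda i: sim(title, reports[i]))
    cluster = [reports[seed_idx]]
    candidates = [r for i, r in enumerate(reports) if i != seed_idx]

    def score(report: str) -> float:
        # Assumption: equal-weight mix of title similarity and mean
        # similarity to the reports already in the cluster.
        to_cluster = sum(sim(report, c) for c in cluster) / len(cluster)
        return 0.5 * sim(title, report) + 0.5 * to_cluster

    while candidates:
        best = max(candidates, key=score)
        if score(best) <= threshold:
            break  # the best remaining report is not similar enough
        cluster.append(best)
        candidates.remove(best)
    return cluster


event_title = "earthquake hits city"
event_reports = [
    "strong earthquake hits city center",
    "earthquake hits city overnight rescue",
    "new phone released today",
]
event_cluster = aggregate(event_title, event_reports, threshold=0.3)
```

Scoring each candidate against both the title and the current cluster keeps the cluster anchored to the event, so an unrelated report such as the phone announcement above is left out.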

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a hot event aggregation method and a hot event aggregation device.

Background art

[0002] Hot event aggregation is an important basic technology of NLP (natural language processing), and plays an important role in recommendation, search, bubble and other businesses.

[0003] To aggregate reports related to hot events, most current approaches use a TF-IDF word-weight clustering method, which achieves a certain effect in measuring the similarity between related reports. After the text is segmented into words, TF-IDF is calculated as the weight of each word. After the word vectors are generated, similarity is calculated according to the cosine distance, and the corresponding reports are then aggregated by a clustering algorithm according to the similarity between the texts.

[0004] However, since ...
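
The conventional pipeline described in [0003] (segment, weight with TF-IDF, compare with cosine similarity) might look like the sketch below. It is a baseline illustration, not the patent's method. The function names `tfidf_vectors` and `cosine`, the smoothed IDF variant, and the pre-tokenized English corpus are assumptions; Chinese text would first need a word segmenter, which is omitted here.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; this is one common variant, chosen here as an assumption.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Tiny illustrative corpus, already segmented into words.
corpus = [
    "earthquake hits city center".split(),
    "earthquake hits city overnight".split(),
    "new phone released today".split(),
]
vecs = tfidf_vectors(corpus)
related = cosine(vecs[0], vecs[1])    # two reports on the same event
unrelated = cosine(vecs[0], vecs[2])  # reports with no shared terms
```

A clustering algorithm would then group reports whose pairwise similarity is high. Note that the IDF values depend entirely on the corpus passed in, which is the corpus-size sensitivity the background section goes on to criticize.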


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/9535, G06F16/35, G06F40/258, G06F40/289, G06F40/30
CPC: G06F40/258, G06F40/289, G06F40/30
Inventor: 张轩玮
Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD