Method and device for mining hot-spot words

Hot-word and word-cluster technology, applied in the field of computer communication, addresses problems such as the heavy workload of manual curation, the narrow mining range of hot word lists, and the difficulty of reflecting a hot event or topic, thereby improving mining efficiency and expanding the mining range.

Inactive Publication Date: 2013-04-17
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0014] As can be seen from the above, existing methods for mining hot words require the hot word vocabulary to be organized manually, which is a heavy workload. At the same time, many newly emerging personal names, place names, and organization names may be unregistered words, that is, words that have not yet been sorted into the hot word vocabulary. These words are often the core of a hot event or topic, so a manually curated hot word vocabulary has a narrow mining scope, such hot events or topics cannot be mined, and hot word mining becomes less efficient. Furthermore, many hot words, such as Beijing, movie, and gossip, are simply words with high frequency. Because such a word is included in multiple events, especially on Weibo ...




Embodiment Construction

[0062] In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0063] In an existing method for mining hot words, the candidate word set is matched against the hot word vocabulary, and the N candidate words with the highest frequency are then output as hot words. Because the hot word vocabulary has a long update cycle, many hot words in the candidate word set are filtered out by it, so the mining range of hot words is narrow and the mining efficiency is low. In the embodiment of the present invention, the historical frequency of each candidate word in the candidate word set is recorded and taken into account, a frequency anomaly degree is calculated from it together with the current frequency of the candidate word, and hot words are mined according to the frequency anomaly degree, so that the hot word and hot word word of e...
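The contrast drawn in [0063] can be illustrated with a small sketch. The excerpt does not give the formula for the frequency anomaly degree, so the smoothed z-score used here is an assumption, and the names `anomaly_degree`, `top_by_frequency`, `top_by_anomaly`, and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def anomaly_degree(current, history, eps=1.0):
    """Assumed measure (not the patent's formula): how far the current
    frequency lies above the historical mean, in smoothed standard deviations.
    A word with no history is compared against a baseline of zero."""
    mu = mean(history) if history else 0.0
    sigma = pstdev(history) if history else 0.0
    return (current - mu) / (sigma + eps)  # eps avoids division by zero


def top_by_frequency(current, n):
    """Baseline ranking: the n words with the highest raw frequency."""
    return sorted(current, key=lambda w: current[w], reverse=True)[:n]


def top_by_anomaly(current, history, n):
    """Ranking by frequency anomaly degree instead of raw frequency."""
    return sorted(
        current,
        key=lambda w: anomaly_degree(current[w], history.get(w, [])),
        reverse=True,
    )[:n]


# Hypothetical counts: "Beijing" is always frequent, "earthquake" has just spiked.
current = {"Beijing": 1000, "movie": 800, "earthquake": 300}
history = {
    "Beijing": [980, 1010, 990],
    "movie": [790, 805, 810],
    "earthquake": [5, 3, 4],
}

by_frequency = top_by_frequency(current, 1)
by_anomaly = top_by_anomaly(current, history, 1)
```

With these sample counts, the raw-frequency ranking picks the consistently common word, while the anomaly ranking picks the word whose frequency departs most from its own history.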



Abstract

The invention discloses a method and a device for mining hot-spot words. The method includes: acquiring an input text stream; performing word segmentation on the text stream to obtain a candidate word set; counting the current frequency with which each candidate word in the candidate word set appears in the text stream, and acquiring the historical frequencies of each candidate word from pre-stored historical data; calculating a frequency anomaly value for the candidate word from its current frequency and its historical frequencies; storing the current frequency information of the candidate word in the historical data; and outputting a preset number of candidate words with anomalous frequencies. The method and device extend the mining range of hot-spot words and improve mining efficiency.
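The sequence of steps in the abstract can be sketched as a small stateful pipeline. This is an illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for word segmentation, the anomaly value is a smoothed relative increase over the historical mean (the abstract does not specify the calculation), and the class name `HotWordMiner` and its parameters are hypothetical.

```python
from collections import Counter


class HotWordMiner:
    def __init__(self, top_n=3, segment=str.split):
        self.top_n = top_n        # preset number of words to output
        self.segment = segment    # stand-in segmenter (assumption)
        self.history = {}         # word -> list of past frequencies

    def process(self, texts):
        # Segment the text stream and count current frequencies.
        current = Counter()
        for text in texts:
            current.update(self.segment(text))

        # Compare each current frequency with the stored history.
        scores = {}
        for word, freq in current.items():
            past = self.history.get(word, [])
            baseline = sum(past) / len(past) if past else 0.0
            # Assumed anomaly value: smoothed relative increase.
            scores[word] = (freq - baseline) / (baseline + 1.0)

        # Store this batch's frequencies (0 for known words that are absent).
        for word in set(self.history) | set(current):
            self.history.setdefault(word, []).append(current[word])

        # Output the preset number of most anomalous words (ties by word).
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        return ranked[: self.top_n]


miner = HotWordMiner(top_n=1)
first = miner.process(["beijing movie", "beijing news"])
second = miner.process(["beijing quake quake quake", "beijing quake"])
```

In the first batch nothing has a history, so the most frequent word ranks first. In the second batch the word that is new relative to the stored history ranks first, even though an established word still appears.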

Description

Technical field

[0001] The invention relates to computer communication technology, and in particular to a method and device for mining hot words.

Background art

[0002] With the development of computer communication technology, especially 3G networks and intelligent mobile terminals, users' online life is becoming increasingly rich: chatting, browsing news, watching movies, playing games, searching, shopping, and publishing information on the Internet are increasingly part of it. For example, a microblog (MicroBlog) is a platform for sharing, disseminating, and acquiring information based on user relationships; users can form personal communities through WEB, WAP, and various clients, post updates of about 140 characters, and share them instantly.

[0003] Because network content is so abundant, it takes network users more and more time to obtain relevant information. In order to improve the ...


Application Information

IPC(8): G06F17/30
Inventor 罗侃陈洪亮杨志峰
Owner TENCENT TECH (SHENZHEN) CO LTD