Automatic hot topic mining system based on internet corpora
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: 北京一览群智数据科技有限责任公司
- Publication Date: 2016-04-13
Description
Technical Field
[0001] The invention relates to an automatic hot topic mining system based on Internet corpora.
Background Art
[0002] Existing hot word mining systems use three main methods: rule matching, site statistics, and event detection. The rule-matching method requires extensive domain knowledge and mines hot words with manually built hot word matching templates. The site-statistics method relies mainly on site traffic data, such as the news access logs of portal websites and the query logs of search engines, and mines hot words from frequently accessed content. The event-detection method first mines candidate hot words using named entity recognition, high-frequency string statistics, and similar techniques, and then applies time series analysis to select the candidates with a clear upward trend in popularity as the fin...