An Automatic Hot Topic Mining System Based on Internet Corpus
A hot topic automatic mining technology, applied in unstructured text data retrieval, instrumentation, and computing, that addresses problems such as poor scalability and non-reusable matching templates
Active Publication Date: 2019-01-22
北京一览群智数据科技有限责任公司
 AI Technical Summary 
Problems solved by technology
Rule-matching methods require extensive prior knowledge. Although their accuracy is high, they scale poorly, and matching templates cannot be reused across fields. Site-statistics methods need large volumes of logs collected from a large user base, data that small and medium-sized companies or research institutes cannot obtain. Event-detection methods must first generate high-quality candidate words; because information on the Internet changes daily and new words emerge constantly, unregistered (out-of-vocabulary) words are a challenge for this approach.
Abstract
The invention discloses an automatic hot topic mining system based on Internet corpora. The system consists of two routes: 1) crawling hot words from existing hot-word statistics sites and generating a series of hot topics through clustering, entity extraction, and keyword mining; and 2) extracting n-grams from massive news documents, mining high-frequency hot words by calculating the mutual information and conditional entropy of each n-gram, and recognizing new topics with a time-series-based event detection method. The system can mine current hot events in real time, and it can also mine the relevant keywords and named entities of a hot topic as soon as the topic is generated.
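The second route can be illustrated with a minimal sketch. It assumes the documents are already tokenized, restricts itself to bigrams, uses pointwise mutual information as the mutual-information score, and interprets "conditional entropy" as the entropy of the tokens adjacent to the bigram (left and right). These choices, and the names `bigram_scores` and `neighbor_entropy`, are illustrative assumptions; the abstract does not specify the exact formulas.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(counter):
    """Shannon entropy (bits) of a neighbor-token frequency table."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def bigram_scores(docs):
    """Score every bigram in `docs` (a list of token lists).

    Returns {bigram: (pmi, left_entropy, right_entropy)}.
    High PMI suggests the two tokens belong together; high neighbor
    entropy suggests the bigram occurs in varied contexts, i.e. it
    behaves like a free-standing word or phrase.
    """
    unigrams = Counter()
    bigrams = Counter()
    left = defaultdict(Counter)   # tokens seen just before each bigram
    right = defaultdict(Counter)  # tokens seen just after each bigram

    for tokens in docs:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            bg = (tokens[i], tokens[i + 1])
            bigrams[bg] += 1
            if i > 0:
                left[bg][tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[bg][tokens[i + 2]] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for bg, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[bg[0]] / n_uni
        p_y = unigrams[bg[1]] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[bg] = (pmi, neighbor_entropy(left[bg]), neighbor_entropy(right[bg]))
    return scores


if __name__ == "__main__":
    docs = [["a", "b", "c"], ["x", "a", "b", "d"]]
    for bigram, (pmi, h_left, h_right) in bigram_scores(docs).items():
        print(bigram, round(pmi, 3), round(h_left, 3), round(h_right, 3))
```

In practice a candidate would be kept only if both its PMI and its neighbor entropies exceed thresholds tuned on the corpus; those thresholds are not given in the source.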
Description
Technical field
The invention relates to an automatic hot topic mining system based on Internet corpus.
Background
Existing hot word mining systems use three main methods: rule matching, site statistics, and event detection.
The rule-matching method requires extensive domain knowledge and mines hot words using manually built hot-word matching templates.
The site-statistics method mainly uses site traffic data, such as the news access logs of portal websites and the query logs of search engines, and mines hot words from frequently accessed content.
The event-detection method first mines candidate hot words using named entity recognition, high-frequency string statistics, and similar techniques, and then applies time series analysis to select the candidates with an obvious hot trend as the final result.
Th...
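The time-series selection step of the event-detection method described in the background can be sketched as follows. The excerpt does not say which time series analysis is used, so this example substitutes a simple z-score test against a trailing window of daily counts; the function name `is_bursting`, the window length, and the threshold are all hypothetical.

```python
from statistics import mean, pstdev


def is_bursting(daily_counts, window=7, threshold=3.0):
    """Return True if the latest count is a spike relative to recent history.

    daily_counts: chronological list of per-day occurrence counts for one
    candidate word. The last element is the day being tested; the
    `window` days before it form the baseline.
    """
    if len(daily_counts) < window + 1:
        return False  # not enough history to form a baseline
    history = daily_counts[-(window + 1):-1]
    today = daily_counts[-1]
    mu = mean(history)
    # Floor the deviation at 1.0 so a flat history does not divide by zero
    # or turn tiny fluctuations into huge z-scores.
    sigma = max(pstdev(history), 1.0)
    return (today - mu) / sigma >= threshold


if __name__ == "__main__":
    print(is_bursting([2, 3, 2, 3, 2, 3, 2, 40]))  # sudden spike
    print(is_bursting([2, 3, 2, 3, 2, 3, 2, 3]))   # steady usage
```

A word that is merely frequent every day is rejected by this test, which reflects the distinction the background draws between high-frequency strings and words with an obvious hot trend.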
Application Information
Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/9535
CPC: G06F16/35, G06F16/9535
Inventor: 窦志成, 文继荣, 江政宝
Owner: 北京一览群智数据科技有限责任公司



