Topic detection method and device based on big data

A topic detection technology based on big data, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses problems such as poor timeliness and ensures both accuracy and detection efficiency.

Active Publication Date: 2013-06-26
亿赞普(北京)科技有限公司

AI Technical Summary

Problems solved by technology

The advantage of the backtracking detection method is that it can apply better-performing text mining algorithms to the collected webpage data offline, and so obtains better-optimized results. Because it processes webpage data offline, however, its biggest shortcoming is poor timeliness. The online detection method is receiving more and more attention because it can meet the need for real-time detection of hot topics. Because of constraints on processing time, however, the algorithms it uses are generally relatively simple, so its detection quality still lags behind that of the backtracking detection method.
[0006] In short, a technical problem that those skilled in the art urgently need to solve is how to resolve the sharp conflict between the accuracy and the timeliness of topic detection when a large number of webpage texts are updated rapidly in the Internet environment.

Examples

Embodiment Construction

[0034] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0035] Big data refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make better operating decisions. It is often used in fields such as social public opinion statistics to discover hot topics.

[0036] A hot topic is a topic that attracts the attention of many users, that is, a topic with high user attention, and it cannot arise without the attention of a large number of users. Therefore, user behavior plays an important role in the hot topic det...
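
As a minimal sketch of how user behavior might feed hot topic detection, the snippet below ranks web pages by the number of distinct users who visited them. The log format `(user_id, url)`, the function name `extract_hot_pages`, the `min_users` threshold, and the sample URLs are all assumptions made for illustration; the text above does not specify them.

```python
from collections import defaultdict


def extract_hot_pages(visits, min_users=2):
    """Return (url, distinct_user_count) pairs for pages seen by at least
    `min_users` distinct users, most-visited first.

    `visits` is an iterable of (user_id, url) pairs (hypothetical log format).
    """
    users_per_url = defaultdict(set)
    for user_id, url in visits:
        # A set ignores repeat visits by the same user.
        users_per_url[url].add(user_id)

    hot = [
        (url, len(users))
        for url, users in users_per_url.items()
        if len(users) >= min_users
    ]
    # Sort by descending user count, then by URL so ties are deterministic.
    hot.sort(key=lambda item: (-item[1], item[0]))
    return hot


# Hypothetical behavior log.
visits = [
    ("u1", "a.example/news/1"),
    ("u2", "a.example/news/1"),
    ("u1", "a.example/news/1"),  # repeat visit, counted once
    ("u3", "b.example/post/9"),
    ("u2", "a.example/news/2"),
    ("u3", "a.example/news/2"),
    ("u4", "a.example/news/1"),
]
hot_pages = extract_hot_pages(visits)
print(hot_pages)
```

Counting distinct users rather than raw hits is one possible reading of "user attention"; a real system might also weight dwell time, shares, or comments.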

Abstract

The invention provides a topic detection method and device based on big data. The method and device can guarantee both the accuracy and the timeliness of detection when a large amount of web page text is updated quickly in the Internet environment. The method includes the following steps: extracting hot web pages according to user network behavior data; collecting the contents of the hot web pages; extracting web page feature vectors of the hot web pages from their contents; clustering the hot web pages according to their feature vectors to obtain corresponding potential hot topic classes; performing incremental clustering on newly added web pages, with the potential hot topic classes as seed classes, wherein the newly added web pages include online web pages; and judging whether the potential hot topic classes after incremental clustering are hot topic classes by analyzing their corresponding user attention parameters.
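
The clustering steps in the abstract can be illustrated with a small sketch. It is not the patented implementation: the term-frequency features, cosine similarity, single-pass clustering, the similarity `THRESHOLD`, the `MIN_ATTENTION` cutoff, and all names and sample data are assumptions chosen to keep the example short. New pages that match no seed class are simply dropped here, which is also an assumption.

```python
import math
from collections import Counter

THRESHOLD = 0.3      # assumed minimum cosine similarity to join a class
MIN_ATTENTION = 100  # assumed user-attention cutoff for a "hot" class


def feature_vector(text):
    """Term-frequency vector (an assumed stand-in for the page features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TopicClass:
    def __init__(self):
        self.centroid = Counter()  # summed term counts of member pages
        self.pages = []
        self.attention = 0         # summed user counts of member pages

    def add(self, url, vec, users):
        self.centroid.update(vec)
        self.pages.append(url)
        self.attention += users


def assign(classes, url, text, users, allow_new):
    """Put a page in its most similar class, or open a new one if allowed."""
    vec = feature_vector(text)
    best = max(classes, key=lambda c: cosine(c.centroid, vec), default=None)
    if best is not None and cosine(best.centroid, vec) >= THRESHOLD:
        best.add(url, vec, users)
    elif allow_new:
        new_class = TopicClass()
        new_class.add(url, vec, users)
        classes.append(new_class)


# Hypothetical data: (url, page text, distinct users who visited it).
hot_pages = [
    ("p1", "earthquake rescue teams arrive", 50),
    ("p2", "earthquake rescue continues overnight", 40),
    ("p3", "football cup final result", 5),
]
new_pages = [
    ("p4", "rescue teams find earthquake survivors", 30),
    ("p5", "recipe for apple pie", 3),
]

classes = []
# Step 1: cluster hot pages into potential hot topic classes.
for url, text, users in hot_pages:
    assign(classes, url, text, users, allow_new=True)
# Step 2: incremental clustering; existing classes act as seed classes.
for url, text, users in new_pages:
    assign(classes, url, text, users, allow_new=False)
# Step 3: judge hotness from the accumulated user attention.
hot_topics = [c.pages for c in classes if c.attention >= MIN_ATTENTION]
print(hot_topics)
```

Seeding the incremental pass with classes built from already-popular pages means each new page only needs a similarity check against a few centroids, which is one way to keep the online step cheap.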

Description

Technical field

[0001] The invention relates to the technical field of Internet information processing, and in particular to a method and device for topic detection based on big data.

Background

[0002] With the rapid development of the Internet, the information on it has become more and more diverse and abundant. At the same time, the social influence of Internet public opinion has continued to grow, and many hot social events are first disclosed and disseminated on the Internet, so its value is becoming more and more evident. The Internet environment contains a large number of webpage texts in natural language, including news, blogs, forum posts, and emerging microblogs. These webpage texts provide the most basic data source for discovering hot topics.

[0003] The TDT (Topic Detection and Tracking) project carried out by the US Department of Defense was the first to conduct...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 罗峰, 黄苏支, 李娜
Owner: 亿赞普(北京)科技有限公司