Method and device for mining Internet hot words

A method and device for mining Internet hot words, classified under special data processing applications, instruments, and electrical digital data processing. It addresses efficiency bottlenecks in Internet hot word mining, improving efficiency while preserving accuracy.

Active Publication Date: 2015-06-03
TRS INFORMATION TECH CO LTD
6 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

For this reason, the present invention divides Internet hot words into entity strings and non-entity strings, and pro

Examples

Example Embodiment

[0052] Detailed description of embodiments

[0053] In order to make the objectives, technical methods, and advantages of the embodiments of the present invention clearer, the technical solutions provided by the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings, but they are not intended to limit the present invention.

[0054] Hot words are words that are used frequently within a certain period of time and have certain time attributes. Therefore, the embodiment of the present invention builds a background library to store the corpus and statistical information from before a given period. Meanwhile, hot words are divided into entity strings and non-entity strings, so that the attribute characteristics of each type of entity string can be better used for training and learning, while a high-frequency string statistical algorithm is used to extract candidate non-entity strings. When calculating the popularity, not only the basic w...
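
The extraction of candidate non-entity strings by high-frequency string statistics can be illustrated with a minimal sketch. The text available here does not give the algorithm's details, so everything below is an assumption made for illustration: candidates are character n-grams, the function name `candidate_strings` and its parameters `min_len`, `max_len`, and `min_count` are hypothetical, and a substring is dropped when a longer frequent string contains it with the same count.

```python
from collections import Counter


def candidate_strings(docs, min_len=2, max_len=4, min_count=2):
    """Return frequent character n-grams as candidate non-entity strings.

    Illustrative assumption only: the patent's actual high-frequency
    string statistical algorithm is not specified in this excerpt.
    """
    counts = Counter()
    for doc in docs:
        for n in range(min_len, max_len + 1):
            # Slide a window of width n over the unsegmented text.
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1

    # Keep only strings that reach the frequency threshold.
    frequent = {s: c for s, c in counts.items() if c >= min_count}

    # Drop a string when a longer frequent string contains it with the
    # same count: the shorter one never occurs on its own.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s != t and s in t and frequent[t] == c for t in frequent)
    }
```

Working on raw character windows avoids relying on a word segmentation system, which the background section notes usually fails to identify such words.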

Abstract

The invention provides a method for mining Internet hot words. The method comprises the following steps: initializing a word graph and a background library; identifying entity strings and non-entity strings; updating the statistical indexes of the word strings; calculating the popularity of each word string; and sorting the word strings by popularity and outputting them. Word strings are divided into entity strings and non-entity strings, each of which is identified in a targeted way, and the background library enables incremental updating of the corpus and the calculated indexes, so both the accuracy and the efficiency of hot word extraction can be improved. The invention also provides a device for mining Internet hot words. The device comprises a storage unit, an entity string identification unit, a non-entity string identification unit, and a hot word extraction unit, wherein the hot word extraction unit performs the incremental updating of the statistical indexes, the calculation of word string popularity, and the sorted output of the word strings. Hot words can thus be extracted in an orderly, efficient, and accurate manner.
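
The roles of the background library, the incremental index update, and the popularity ranking can be sketched as follows. This is a hedged illustration, not the patented formula: the class `BackgroundLibrary`, the function `rank_hot_strings`, the `smoothing` parameter, and the choice of scoring a string by its current count divided by its smoothed historical mean are all assumptions introduced here.

```python
from collections import Counter


class BackgroundLibrary:
    """Accumulates string counts from past periods (hypothetical design)."""

    def __init__(self):
        self.counts = Counter()
        self.periods = 0

    def update(self, period_counts):
        # Incremental update: fold one period's counts into the totals
        # without reprocessing the earlier corpus.
        self.counts.update(period_counts)
        self.periods += 1

    def mean_count(self, string):
        # Average count per past period; 0.0 when no history exists.
        return self.counts[string] / self.periods if self.periods else 0.0


def rank_hot_strings(current, background, smoothing=1.0):
    """Sort strings by an assumed popularity score, highest first.

    score = current count / (historical mean count + smoothing)
    """
    scores = {
        s: c / (background.mean_count(s) + smoothing)
        for s, c in current.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this assumed score, a string that is common in every period ranks low even with a high raw count, while a string that is new in the current period ranks high, which matches the idea that hot words carry a time attribute.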

Description

Technical field

[0001] The invention relates to natural language processing technology, and in particular to a method and device for mining Internet hot words.

Background

[0002] Hot words are words that are used frequently within a certain period of time. They often carry the character of their era and reflect the hot topics and livelihood issues of that period. Beyond the words recorded in dictionaries, there are also hot words native to the Internet. These words originate and spread in cyberspace and are widely used in daily communication, such as "how to give up treatment", "unknown and serious", and "Chen Ou Ti". Word segmentation systems usually have difficulty identifying such words, yet Internet hot words have become a new and important communication phenomenon on the Internet and keep evolving with the times.

[0003] Internet hot words are closely related to social events or phenomena, a...

Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 肖诗斌, 孙丽华
Owner: TRS INFORMATION TECH CO LTD