Public opinion hot word finding method based on keyword weighting algorithm

A technology involving public opinion analysis and part-of-speech tagging, applied to hot word discovery based on a keyword weighting algorithm, which improves recognition accuracy and ensures real-time performance.

Status: Inactive. Publication date: 2017-09-12
CHANGZHOU PUSHI INFORMATION TECH +1
Cites: 0. Cited by: 29.

AI Technical Summary

Problems solved by technology

[0018] The present invention mainly addresses the problems and deficiencies of existing hot word discovery methods in the self-media era. It provides a hot word discovery method based on a keyword-weighted TF-IDF algorithm that solves the efficiency and accuracy problems of hot word discovery over massive public opinion information, so that hot words can be identified efficiently and accurately.




Embodiment Construction

[0037] The invention discloses a hot word discovery method based on keyword-weighted TF-IDF for massive public opinion information, which uses the following components:

[0038] A public opinion corpus, which stores a large amount of preprocessed public opinion information captured from the Internet;

[0039] A filter lexicon, divided into two parts (a part-of-speech filter table and a word-meaning filter table), which removes words without actual meaning from the word segmentation results: function words such as auxiliary words, prepositions and conjunctions, adjectives expressing modification, adverbs expressing degree, numeral-quantifier collocations, and the like;

[0040] An IDF table, which stores the inverse document frequency (IDF) of words or phrases and supports dynamic updating;

[0041] A part-of-speech weight table, which stores the weights of different parts of speech. The weight level ranges from 1 to 5, increasing seq...
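The components listed in [0038] to [0041] can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the POS tag codes follow the common jieba/ICTCLAS convention, the word-meaning filter entries and the 1 to 5 weight levels are hypothetical (the excerpt is truncated before the actual assignment), and the smoothed IDF formula is one common variant chosen for illustration. The IDF table keeps only the document count and per-word document frequencies, which is what makes incremental updating cheap.

```python
import math
from collections import Counter

# --- Filter lexicon (hypothetical contents) ---
# POS tag prefixes assume the jieba/ICTCLAS convention: u auxiliary,
# p preposition, c conjunction, a adjective, d adverb, m numeral,
# q quantifier. Matching on the first letter is a simplification.
POS_FILTER_PREFIXES = {"u", "p", "c", "a", "d", "m", "q"}
MEANING_FILTER = {"大家", "今天"}  # hypothetical word-meaning filter entries

# --- Part-of-speech weight table (hypothetical levels on the 1-5 scale) ---
POS_WEIGHTS = {"nr": 5, "ns": 5, "nz": 5, "n": 4, "vn": 3, "v": 2}
DEFAULT_WEIGHT = 1


def filter_tokens(tagged):
    """Drop (word, pos) pairs caught by the POS filter or word-meaning filter."""
    return [(w, p) for w, p in tagged
            if p[:1] not in POS_FILTER_PREFIXES and w not in MEANING_FILTER]


class IDFTable:
    """IDF table that is updated incrementally as new documents arrive.

    Only the document count N and per-word document frequency df are stored,
    so a new batch is folded in without recomputing over the whole corpus.
    The smoothed form log((N + 1) / (df + 1)) + 1 is an assumption.
    """

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add_document(self, words):
        self.num_docs += 1
        self.doc_freq.update(set(words))  # each word counted once per document

    def idf(self, word):
        df = self.doc_freq.get(word, 0)
        return math.log((self.num_docs + 1) / (df + 1)) + 1.0


def weighted_tfidf(tagged, idf):
    """Score each word of one document as tf * idf * POS weight."""
    counts = Counter(w for w, _ in tagged)
    total = sum(counts.values())
    weight = {}
    for w, p in tagged:
        # If a word carries several tags, keep its highest weight.
        weight[w] = max(weight.get(w, DEFAULT_WEIGHT),
                        POS_WEIGHTS.get(p, DEFAULT_WEIGHT))
    return {w: (c / total) * idf(w) * weight[w] for w, c in counts.items()}
```

In use, each segmented and tagged document is passed through `filter_tokens`, its words are folded into the `IDFTable` with `add_document`, and `weighted_tfidf(kept, table.idf)` gives per-word scores. Multiplying by the POS weight lets a noun outrank a verb of equal frequency, which is the kind of non-frequency information the method is described as exploiting.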



Abstract

The invention discloses a hot word discovery method, in particular a public opinion hot word discovery method based on a keyword weighting algorithm. The method uses a Chinese word segmentation tool to perform preliminary word segmentation and part-of-speech tagging on massive public opinion information. Combining an IDF table, a filter word table and a part-of-speech weight table, it computes a popularity value for each candidate word with a weighted TF-IDF algorithm. The calculation does not rely on word frequency alone; it also takes full account of the useful information carried by a word's part of speech, position and similar features, which provides a reliable basis for hot word recognition. In addition, the method exploits the fact that public opinion in the self-media era has distinct topics and themes: corpus processing is organized mainly by public opinion topic, which solves the efficiency problem of hot word recognition over massive public opinion information. Finally, the IDF table is updated dynamically and incrementally, which keeps the inverse document frequencies of words current and improves the accuracy of hot word recognition.
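The topic-level scoring described in the abstract can be illustrated with a small Python sketch. It is not the patent's formula: the abstract says position is taken into account but not how, so the title boost `TITLE_FACTOR` is a hypothetical choice, as is summing per-document scores within a topic to obtain a word's popularity ("heat"). The `idf` and `pos_weight` callables stand in for the IDF table and part-of-speech weight table, and the input is assumed to be already segmented, tagged, filtered and grouped by topic.

```python
import heapq
from collections import Counter, defaultdict

TITLE_FACTOR = 2.0  # hypothetical boost for words appearing in a document title


def hot_words(topic_docs, idf, pos_weight, k=10):
    """Rank candidate hot words for one public opinion topic.

    topic_docs: documents already grouped under one topic, each a dict
        {"title": [(word, pos), ...], "body": [(word, pos), ...]} holding
        segmented, tagged and filtered tokens.
    idf: callable word -> IDF value (e.g. from an incrementally updated table).
    pos_weight: callable POS tag -> weight level.
    Returns up to k (word, heat) pairs, highest heat first.
    """
    heat = defaultdict(float)
    for doc in topic_docs:
        tokens = doc["title"] + doc["body"]
        if not tokens:
            continue
        counts = Counter(w for w, _ in tokens)
        title_words = {w for w, _ in doc["title"]}
        tag = dict(tokens)  # word -> POS tag (last occurrence wins)
        for w, c in counts.items():
            position = TITLE_FACTOR if w in title_words else 1.0
            heat[w] += (c / len(tokens)) * idf(w) * pos_weight(tag[w]) * position
    return heapq.nlargest(k, heat.items(), key=lambda item: item[1])
```

Restricting the sum to one topic's documents keeps each ranking pass small, in line with the abstract's point about processing the corpus by topic, and `heapq.nlargest` avoids sorting the whole candidate list when only the top k words are needed.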

Description

Technical field

[0001] The invention relates to a hot word discovery method, in particular to a hot word discovery method based on a keyword weighting algorithm.

Technical background

[0002] With the popularization and rapid development of the Internet, a large amount of news data appears online every day. At the same time, the emergence of self-media such as microblogs, blogs and forums has shifted the publishers of online information from professional news reporters to ordinary netizens from all walks of life, and the general public has changed from passive receivers of information into active disseminators of it. As a result, Internet vocabulary has become increasingly colorful, with new words such as "Gili", "Diaosi" and "Lie Gun" emerging one after another. Under such circumstances, how to mine hot words from complicated network information, how to obtain popular new entries and new concepts, and then effectiv...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30; G06F17/27
CPC: G06F16/951; G06F40/289
Inventors: Zhao Yixin (赵一昕), Li Huakang (李华康), Yang Tianruo (杨天若), Yang Tianchu (杨天楚)
Owner: CHANGZHOU PUSHI INFORMATION TECH