Method for monitoring novel words on Internet

A method for monitoring novel words on the Internet, applied in the field of Internet information mining. It addresses two problems of earlier approaches: the history of each term is not considered, and abnormal hotspot information cannot be measured accurately. The stated effects are improved efficiency and more accurate alerts.

Inactive Publication Date: 2010-02-10
PEKING UNIV

AI Technical Summary

Problems solved by technology

The main disadvantage of the existing method is that it does not consider the history of each term. As a result, abnormal hotspot information cannot be measured accurately from the information entropy of each word, and only a horizontal comparison across terms is possible.



Examples


Embodiment 1

[0073] (1) This embodiment takes the financial field as the target field, selects "finance" as the field keyword to collect websites, and stores the collected website list into the database. Table 1 shows a part of it.

[0074] Table 1

[0075], [0076]

Serial number   Link
1               http://finance.sina.com.cn/
2               http://finance.163.com/
3               http://cn.finance.yahoo.com/
4               http://finance.sohu.com/
5               http://finance.tom.com/
6               http://www.jrj.com
7               http://www.hexun.com.cn
8               http://www.enet.com.cn/finance/
9               http://www.qq.com/finance/
10              http://news.chinabyte.com/
11              http://www.gov.cn/jrzg/zgyw.htm
12              http://news.hexun.com/
13              http://news.china.com/
14              http://msn.ynet.com/
15              http://www.zaobao.com.sg/
16              http://www.xinhua.org/
17              http://www.people.com.cn/...
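
The storage step in [0073] can be illustrated with a minimal sketch. The source does not say which database or schema is used, so the SQLite backend, the table name `source_site`, and the helper `store_sites` are assumptions made only for illustration; the three links are taken from Table 1.

```python
import sqlite3

# A few of the links listed in Table 1.
SITES = [
    "http://finance.sina.com.cn/",
    "http://finance.163.com/",
    "http://cn.finance.yahoo.com/",
]


def store_sites(conn, keyword, links):
    """Store collected site links under a field keyword; return the row count.

    The table layout is hypothetical, not taken from the patent text.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_site ("
        "id INTEGER PRIMARY KEY, "
        "keyword TEXT NOT NULL, "
        "link TEXT NOT NULL UNIQUE)"
    )
    # INSERT OR IGNORE keeps the list free of duplicates on repeated runs.
    conn.executemany(
        "INSERT OR IGNORE INTO source_site (keyword, link) VALUES (?, ?)",
        [(keyword, link) for link in links],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM source_site").fetchone()[0]


conn = sqlite3.connect(":memory:")
count = store_sites(conn, "finance", SITES)
```

Making the link column unique means that re-collecting the same sites on a later day does not grow the list.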

Embodiment 2

[0115] This embodiment uses a test set to evaluate the effectiveness of the present invention. The test set is taken from the news and messages of various portal websites. Because the invention is a domain-specific novel word monitoring system, the financial field is taken as the research object for the evaluation.

[0116] Because novel words are a brand-new concept, a reference standard is needed to judge whether the selected words are correct and how high their accuracy is. At present there is no objective and comprehensive standard for such an evaluation. The invention therefore relies on a manually determined reference list of novel words, built from the information rankings on portal websites such as Sina Finance and Economics (because the vocabulary changes over time, the list also changes and may differ every day).

[0117] At present, in the field of information retrie...



Abstract

The invention discloses a method for monitoring novel words on the Internet and belongs to the field of Internet information mining. The method comprises the following steps: according to a target set of information sources, acquiring all articles of the same day from the home pages and from the subpages at every level reached through the internal links of those home pages; segmenting the articles into words, assigning a weight to each word in each article, and taking the top a words with the largest weights as the candidate (alternative) novel words of that article; when a word is a candidate novel word of several articles at once, taking the number of those articles as the frequency freq of the candidate novel word for that day; calculating the novelty coefficient n of each candidate word from the freq records of the candidate novel words over b days; and determining the novelty θ from the novelty coefficient, a candidate word being judged a novel word of the day when its novelty is greater than the novelty threshold θt. The method can effectively find the novel words of the day and can guide the practice of finding and monitoring Internet information.
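
The candidate selection, the daily freq count, and the threshold test can be sketched as follows. This is only an illustration under stated assumptions: word segmentation and weighting are assumed to be done already (each article arrives as a word-to-weight mapping), and the abstract does not give the formulas for the novelty coefficient n or the novelty θ, so `placeholder_novelty` is a hypothetical stand-in and not the patent's formula. All function names are invented for this sketch.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def top_candidates(weights: Dict[str, float], a: int) -> List[str]:
    """Return the a highest-weighted words of one article (ties broken by word)."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:a]


def daily_freq(articles: Sequence[Dict[str, float]], a: int) -> Counter:
    """freq of a word = number of articles whose top-a candidates include it."""
    freq: Counter = Counter()
    for weights in articles:
        freq.update(top_candidates(weights, a))
    return freq


def placeholder_novelty(series: Sequence[int]) -> float:
    """HYPOTHETICAL stand-in: today's freq relative to the earlier average.

    The real novelty coefficient and novelty are not given in the abstract.
    """
    previous = series[:-1]
    baseline = sum(previous) / len(previous) if previous else 0.0
    return series[-1] / (1.0 + baseline)


def novel_words(
    history: Sequence[Counter],
    theta_t: float,
    novelty: Callable[[Sequence[int]], float] = placeholder_novelty,
) -> List[str]:
    """history holds the freq counters of the last b days, oldest first."""
    today = history[-1]
    picked = []
    for word in today:
        # A Counter returns 0 for words that did not appear on a given day.
        series = [day[word] for day in history]
        if novelty(series) > theta_t:
            picked.append(word)
    return sorted(picked)


articles_today = [
    {"bank": 0.9, "rate": 0.5, "the": 0.1},
    {"bank": 0.8, "bond": 0.7, "rate": 0.2},
]
yesterday = Counter({"rate": 1})
today = daily_freq(articles_today, a=2)
result = novel_words([yesterday, today], theta_t=1.5)
```

Passing the novelty function as a parameter keeps the per-day counting separate from the scoring rule, so the stand-in can be replaced by the formulas from the full description without touching the rest.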

Description

Technical field

[0001] The invention relates to a method for monitoring novel words on the Internet and belongs to the field of Internet information mining.

Background

[0002] As the network has become the main medium through which people publish and exchange information, it has gradually turned into a diversified information platform that carries both official news and gossip. Grasping this news as soon as it appears, understanding people's views on it, and finding the new focuses and hot spots that people pay attention to has become a natural demand. Both ordinary users and industry experts hope for an automated tool or method that helps them track the latest hot topics or news in their field of interest in real time, so that they can follow the latest developments in that field.

[0003] It is not difficult to find that, under normal circumstances, a sudden and concentrated appearance of a certain keyword often means the occ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 王超, 梁循
Owner: PEKING UNIV