Title high-frequency segmentation-based news hotspot phrase extraction method

A news hotspot phrase extraction method, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as overly short phrases, incorrect semantics and references, and the inability to let users quickly understand hot events, and it improves the efficiency of information display and retrieval.

Active Publication Date: 2018-01-09
贵州耕云科技有限公司

AI Technical Summary

Problems solved by technology

The abstract-type method extracts a multi-document summary. Its disadvantages are that a multi-document summary is itself too long, the order in which to splice the summary sentences of each document cannot be determined, and the resulting summary may contain incorrect semantics and references. This method therefore cannot be applied effectively to this scenario.
Keyword-type and tag-type methods have the following disadvantages: individual words are limited in what they can describe and cannot express complete information as accurately as a phrase can. Multiple words can also be combined syntactically in different ways, so the same keywords or tags may yield completely different meanings depending on the order in which a user reads them.
Phrase-type methods have the following disadvantages: most of the extracted phrases are too short and qualify as phrases only loosely. Describing a topic completely still requires several phrases, so the fragmented description still does not fit users' reading habits.
[0004] It can be seen that the current ways of summarizing topic content all have limitations: they cannot provide a brief and accurate summary of a hot topic, and so cannot meet users' need to quickly understand hot events as they happen.
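
The keyword limitation described above can be illustrated with a small sketch. It is not taken from the patent: the `keyword_set` helper and the two sample titles are hypothetical, and whitespace splitting stands in for a real word segmenter. It shows two titles with opposite meanings collapsing to the same unordered keyword set.

```python
def keyword_set(text):
    # Hypothetical helper: an unordered set of words, as a keyword or tag
    # summary would store them. Whitespace splitting is an assumption.
    return frozenset(text.lower().split())


first = "dog bites man"
second = "man bites dog"

# The two titles differ, yet their keyword sets are identical, so the
# keywords alone cannot tell the reader which event happened.
same_keywords = keyword_set(first) == keyword_set(second)
print(first != second, same_keywords)
```

A phrase keeps word order, which is why the method summarizes a topic with a phrase instead of a keyword list.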




Embodiment Construction

[0051] In order to make the technical problems solved, the technical solutions, and the beneficial effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it.

[0052] In the prior art, describing topic content by extracting keywords or tags has the disadvantage that individual words are limited in what they can describe and cannot express complete information as accurately as phrases can. Multiple words can also be combined in different ways, so the same keywords or tags may take on completely different meanings depending on the order in which users read them. The traditional method of describing topic content by extracting phrases has the disadvantage that the current phrases are generated by first segmenting and then combining them...


Abstract

The invention provides a news hotspot phrase extraction method based on high-frequency segmentation of titles. The method comprises the following steps: extracting the news titles for each hotspot topic class; performing word segmentation on the news titles, counting the word frequency of each segmented word, and selecting the n segmented words with the highest word frequencies as a high-frequency word set; using the high-frequency word set to search for the high-frequency segmentation boundaries of each news title, and segmenting the news title at those boundaries to obtain candidate phrases, which together form a candidate phrase set; and evaluating each candidate phrase in the candidate phrase set and selecting the candidate phrase with the highest evaluation index as the optimal phrase. The method has the advantages that one hotspot phrase that describes the topic content accurately and concisely can be extracted for each hotspot topic; a solution is provided for quickly summarizing and effectively displaying the hotspot topic content of current news; and both the efficiency of information display and the efficiency with which users obtain information are improved.
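
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how a segmentation boundary is located or what the evaluation index is, so the sketch assumes the boundary is the first and last high-frequency word in a title and scores candidates by how many titles produce them, preferring shorter phrases on ties. The function names and sample titles are hypothetical, and whitespace splitting stands in for a real word segmenter.

```python
from collections import Counter


def tokenize(title):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return title.lower().split()


def high_frequency_words(titles, n):
    # Count every segmented word across the topic's titles; keep the top n.
    counts = Counter(word for title in titles for word in tokenize(title))
    return {word for word, _ in counts.most_common(n)}


def candidate_phrase(title, hf_words):
    # Assumed boundary rule: cut at the first and last high-frequency word.
    words = tokenize(title)
    hits = [i for i, word in enumerate(words) if word in hf_words]
    if not hits:
        return None
    return " ".join(words[hits[0]:hits[-1] + 1])


def best_phrase(titles, n=6):
    hf_words = high_frequency_words(titles, n)
    candidates = Counter()
    for title in titles:
        phrase = candidate_phrase(title, hf_words)
        if phrase is not None:
            candidates[phrase] += 1
    if not candidates:
        return None
    # Assumed evaluation index: number of supporting titles, then brevity.
    return max(candidates, key=lambda p: (candidates[p], -len(p.split())))


# Hypothetical titles belonging to one hotspot topic.
titles = [
    "breaking city council approves new transit budget",
    "city council approves new transit budget after debate",
    "report city council approves new transit budget",
    "city council approves new transit budget amid protests",
]
print(best_phrase(titles, n=6))
```

Cutting each title at its high-frequency words drops the low-frequency lead-ins and trailers ("breaking", "after debate"), so the titles converge on one shared phrase. Any real evaluation index would replace the `max` key used here.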

Description

Technical field

[0001] The invention belongs to the technical field of news text data mining, and in particular relates to a method for extracting news hotspot phrases based on high-frequency segmentation of titles.

Background

[0002] With the explosive growth of Internet news data, identifying current real-time news hotspots and presenting them to users has become increasingly important. News hot topic detection technology was developed for this purpose. However, the amount of news detected within a single hot topic may still be too large to present to users directly. How to summarize a hot topic briefly and accurately from the news it contains has therefore become a key issue: users only need to read the summarized topic description to quickly understand the hot events currently happening.

[0003] At present, the main methods of generalizing topic content can be roughly divided into: abstract type, tag type, keyword type, an...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 黄瑞章, 刘于雷, 梁山雪
Owner: 贵州耕云科技有限公司