Topic extraction method based on news text

A text topic extraction method, applied in the field of news text-based topic extraction, that addresses problems such as the inability to express clustering results in words, the strong dependence of the single-pass algorithm on input order, and unclear clustering results.

Active Publication Date: 2016-02-24
TIANYUN RONGCHUANG DATA TECH BEIJING CO LTD
4 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

These clustering algorithms each have shortcomings. The result of the single-pass algorithm depends heavily on the order in which articles are input; the KNN algorithm has high time complexity; and the k-means algorithm requires the number of clusters to be fixed in advance, which is usually difficult to determine.
Moreover, the clustering results these algorithms produce are often hard to interpret, because they cannot be expressed with specific words or content.
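The order sensitivity of single-pass clustering can be seen in a toy sketch. This is an illustration under assumptions, not part of the patent: documents are replaced by 1-D numbers, each cluster is summarized by its running mean, and `threshold` is a hypothetical distance cutoff. Feeding the same three points in two different orders yields two different partitions.

```python
def single_pass(points, threshold):
    """Toy single-pass clustering of 1-D points (illustrative assumption).

    Each point joins the cluster whose running mean is closest, provided
    that distance is within `threshold`; otherwise it starts a new cluster.
    Points are seen exactly once, in input order.
    """
    clusters = []  # each cluster is a list of its member points
    for p in points:
        best, best_dist = None, None
        for cluster in clusters:
            centre = sum(cluster) / len(cluster)  # running mean of the cluster
            dist = abs(p - centre)
            if best_dist is None or dist < best_dist:
                best, best_dist = cluster, dist
        if best is not None and best_dist <= threshold:
            best.append(p)        # close enough: join the nearest cluster
        else:
            clusters.append([p])  # too far from every cluster: start a new one
    return clusters


def partition(clusters):
    """Normalize clusters so partitions can be compared regardless of order."""
    return sorted(sorted(c) for c in clusters)


print(partition(single_pass([0, 2, 4], 2.5)))
print(partition(single_pass([4, 2, 0], 2.5)))
```

With input order `[0, 2, 4]` the point 2 joins 0, and 4 ends up alone; with order `[4, 2, 0]` the point 2 joins 4, and 0 ends up alone.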

Examples

Embodiment Construction

[0046] The news text-based topic extraction method of the present invention is further described below with reference to the accompanying drawings and specific embodiments.

[0047] The present invention provides a method for extracting topics from news text. First, the user-defined dictionary is expanded through new-word identification or manual addition, so that the extracted words cover both commonly used words and new words in the news domain. The text is then segmented into words, and word-frequency and document-frequency statistics are computed. This information is used to calculate the weight of each word in the news text and to obtain the sequence of subject words for the text set. Finally, each topic is represented by two subject words, and the texts belonging to that topic are gathered under it to produce the final hot topics. As shown in Figure 1 of the accompanying drawings, the specific steps are as follows:
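The weighting step can be sketched as TF-IDF ranking over already-segmented texts. This is an assumption: the patent states only that word-frequency and document-frequency statistics feed each word's weight, not the exact formula. The smoothed IDF `log((1 + N) / (1 + df)) + 1` and the function name `subject_words` are hypothetical choices for illustration.

```python
import math
from collections import Counter


def subject_words(docs, k=5):
    """Return the top-k (word, weight) pairs of each document.

    `docs` is a list of already-segmented documents (lists of words).
    The TF-IDF weighting below is an assumed stand-in for the patent's
    unspecified formula built on word frequency and document frequency.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    ranked_docs = []
    for doc in docs:
        tf = Counter(doc)  # raw word frequency within this document
        total = len(doc)
        weights = {
            # Normalized term frequency times smoothed inverse document frequency.
            w: (count / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, count in tf.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
        ranked_docs.append(ranked[:k])
    return ranked_docs


docs = [["quake", "rescue", "quake", "toll"], ["quake", "rescue", "aid"], ["stocks", "rally"]]
for top in subject_words(docs, k=2):
    print([word for word, _ in top])
```

The abstract mentions extracting subject words from both titles and contents; how title words are weighted is not specified, so this sketch treats all words alike.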

[0048] 1. User dictionary expansion. User-defined ...
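Why the user dictionary matters can be shown with a simple dictionary-based segmenter. The sketch uses forward maximum matching, an assumed stand-in because the patent's segmenter is not described in this excerpt; `fmm_segment` and the sample dictionary are hypothetical. Adding the new word 人工智能 ("artificial intelligence") to the dictionary keeps it as one token instead of splitting it into 人工 and 智能.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching segmentation (illustrative assumption).

    At each position, take the longest dictionary word that matches;
    if nothing longer matches, emit a single character.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


base = {"人工", "智能", "大会", "开幕"}  # hypothetical base dictionary
expanded = base | {"人工智能"}          # hypothetical user-added new word
print(fmm_segment("人工智能大会开幕", base))
print(fmm_segment("人工智能大会开幕", expanded))
```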

Abstract

The present invention relates to a topic extraction method based on news text. The method comprises the following steps: (1) setting a user-defined dictionary, preprocessing the text, and generating word, part-of-speech, and segmentation sequence vectors; (2) calculating and ranking weights from word information such as word frequency, and extracting the higher-weighted words from the news titles and contents as the article's subject-term sequence; (3) obtaining the hotspot subject-term sequence of the text set from the weights of the subject terms of all articles; (4) forming the subject-term expression vector of a topic from the collection of subject terms; (5) gathering topics by using the inclusion relations of clue words in the titles, subject terms, and contents of the articles, together with the subject-term expression vector of each topic, finally obtaining a plurality of hotspot topics. The method uses an article's subject terms to express its topic and to gather related articles, and finally expresses each current hotspot news topic with two clue words.
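Steps 3 to 5 can be sketched under loudly stated assumptions: each article's subject terms arrive as `(term, weight)` pairs from step 2; a candidate topic is a pair of subject terms that co-occur in some article, scored by summing their weights over every article where they co-occur (this collapses steps 3 and 4 into one heuristic); and, for step 5, an article joins a topic when both clue words appear in its title or content. The function name `hot_topics` and this scoring rule are hypothetical, not the patent's formulas.

```python
from collections import defaultdict
from itertools import combinations


def hot_topics(articles, subject_terms, n_topics=3):
    """Return up to `n_topics` topics as ((clue_a, clue_b), member_indices).

    articles: list of dicts with "title" and "content" strings.
    subject_terms: per-article lists of (term, weight) pairs from step 2.
    """
    # Score each pair of subject terms that co-occur in one article by
    # summing their weights over all such articles (hypothetical rule).
    pair_score = defaultdict(float)
    for terms in subject_terms:
        for (a, wa), (b, wb) in combinations(sorted(terms), 2):
            pair_score[(a, b)] += wa + wb

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(pair_score.items(), key=lambda kv: (-kv[1], kv[0]))

    topics = []
    for (a, b), _score in ranked[:n_topics]:
        # Assumed gathering rule: both clue words must appear in the
        # article's title or its content.
        members = [
            i
            for i, art in enumerate(articles)
            if all(w in art["title"] or w in art["content"] for w in (a, b))
        ]
        topics.append(((a, b), members))
    return topics


articles = [
    {"title": "quake rescue under way", "content": "quake toll rises"},
    {"title": "aid for quake", "content": "rescue teams deliver aid as toll climbs"},
    {"title": "stocks rally", "content": "markets up"},
]
subject_terms = [
    [("quake", 0.64), ("toll", 0.42)],
    [("aid", 0.56), ("quake", 0.43)],
    [("rally", 0.85), ("stocks", 0.85)],
]
for clues, members in hot_topics(articles, subject_terms, n_topics=2):
    print(clues, members)
```

Plain substring tests suit unsegmented Chinese text; for space-delimited languages a token-level check would avoid false matches inside longer words.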

Description

Technical field

[0001] The invention relates to the fields of natural language processing and artificial intelligence, and in particular to a news text-based topic extraction method.

Background art

[0002] With the popularization of the Internet, people have more and more ways to obtain information, and the Internet has gradually become the carrier of many kinds of information in society. In particular, with the continuous development of China's economy, online news products have become an important channel for obtaining information, and more and more people obtain real-time news and related information through the Internet. Web text has become an important source of information, and a large number of news hotspots are generated every day. Obtaining current hot topics from massive volumes of news text has therefore become a necessary basic technology for news text processing.

[0003] General text clustering technology mostly adopts the automatic text clusteri...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/374
Inventors: 雷涛 (Lei Tao), 吕慧 (Lü Hui), 张鹏起 (Zhang Pengqi)
Owner TIANYUN RONGCHUANG DATA TECH BEIJING CO LTD