Method and device for extracting topic names

An extraction method and device, applied in the information field, that address the problems of inaccurate noun phrase extraction, poorly readable topic names, and topic names that fail to represent topic content accurately, and that achieve highly readable topic names.

Active Publication Date: 2019-12-31
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, in the topic representation method based on word clusters, noun phrases are difficult to extract from text data, and their extraction is affected by Chinese word segmentation and part-of-speech tagging, so the noun phrase extraction results contain errors. As a result, the topic representation method based on word clusters will not be abl...



Examples


Example Embodiment

[0024] Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. On the contrary, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0025] In order to make the advantages of the technical solutions of the present invention clearer, the present invention will be described in detail below with reference to the accompanying drawings and embodiments.

[0026] An embodiment of the present invention provides a topic name extraction method. As shown in Figure 1, the method includes:

[0027] S101: Obtain the mutual information value correspon...
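Step S101 is truncated in this extract, and the text does not state how the mutual information value is computed. The following is a minimal sketch, assuming that a "co-occurrence word" is a pair of adjacent words in already segmented text and that the value is pointwise mutual information (PMI). The function name `cooccurrence_pmi` and both assumptions are illustrative and are not taken from the patent.

```python
import math
from collections import Counter


def cooccurrence_pmi(sentences):
    """Return {(w1, w2): PMI} for adjacent word pairs.

    `sentences` is a list of token lists (segmentation is assumed done).
    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ), an assumed choice
    of mutual information measure.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        # Adjacent pairs stand in for "co-occurrence words" (assumption).
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    pmi = {}
    for (w1, w2), n in pair_counts.items():
        p_pair = n / total_pairs
        p_w1 = word_counts[w1] / total_words
        p_w2 = word_counts[w2] / total_words
        pmi[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return pmi


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "b"], ["c", "d"]]
    for pair, value in cooccurrence_pmi(docs).items():
        print(pair, round(value, 3))
```

Pairs whose words appear together more often than their individual frequencies predict receive a higher value, which is why a preset mutual information value can be used to filter them.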


Abstract

The invention discloses a method and device for extracting topic names and relates to the field of information technology. It solves the problem of low topic name readability. In the main technical scheme, the mutual information values corresponding to all co-occurrence words in the text data are acquired; target co-occurrence words whose mutual information value is greater than a preset mutual information value are extracted from the co-occurrence words; the similarity values between the target co-occurrence words and the topic word clusters of the text data are acquired; and the target co-occurrence words whose similarity value is greater than a preset threshold are determined to be the topic names of the text data. The method and device are mainly used for extracting topic names from text data.
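The two-threshold selection described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the mutual information values are taken as a precomputed input, and the similarity measure (the fraction of a co-occurrence word's component words that appear in the topic word cluster) is a stand-in, since this extract does not define it. The names `extract_topic_names` and `overlap_similarity` are hypothetical.

```python
def overlap_similarity(words, cluster):
    """Fraction of `words` found in `cluster` (assumed similarity measure)."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in cluster) / len(words)


def extract_topic_names(mi_values, topic_cluster, mi_threshold, sim_threshold):
    """Select topic names from co-occurrence words.

    mi_values:     {(w1, w2): mutual information value}, precomputed
    topic_cluster: iterable of words forming one topic word cluster
    """
    cluster = set(topic_cluster)

    # Step 1: keep co-occurrence words whose mutual information value
    # is greater than the preset mutual information value.
    targets = [pair for pair, mi in mi_values.items() if mi > mi_threshold]

    # Step 2: keep target co-occurrence words whose similarity to the
    # topic word cluster is greater than the preset threshold.
    names = []
    for pair in targets:
        if overlap_similarity(pair, cluster) > sim_threshold:
            names.append(" ".join(pair))
    return names


if __name__ == "__main__":
    mi = {("stock", "market"): 3.2, ("of", "the"): 0.1, ("blue", "sky"): 2.9}
    cluster = ["stock", "market", "shares", "index"]
    print(extract_topic_names(mi, cluster, mi_threshold=1.0, sim_threshold=0.5))
```

In this toy input, ("of", "the") is dropped by the mutual information filter and ("blue", "sky") by the similarity filter, leaving a readable phrase rather than a bare word cluster as the topic name.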

Description

Technical field

[0001] The invention relates to the field of information technology, in particular to a method and device for extracting topic names.

Background technique

[0002] A topic name is a word or phrase that represents the focus of text data such as news, microblogs, forums, and blogs. Because topic names represent the core content of text data, extracting them from massive text data helps in analyzing that core content.

[0003] At present, there are two main methods for extracting topic names: cluster-based extraction and topic-model-based extraction. Both methods represent topics as word clusters composed of multiple words, where each word cluster represents one topic.

[0004] However, in the topic representation method based on word clusters, noun phrases are difficult to extract from text data, and the extraction of noun phrases is affected...


Application Information

IPC(8): G06F17/27
Inventor: 朱波
Owner: BEIJING GRIDSUM TECH CO LTD