
Automatic extraction method of conversation text topic

A text topic extraction technology, applied in the field of automatic topic extraction from dialogue texts. It addresses problems of dialogue text such as a large spoken vocabulary, chaotic organizational structure, and domain limitations, and achieves high-accuracy results.

Inactive Publication Date: 2009-12-09
HUAZHONG UNIV OF SCI & TECH
0 Cites, 54 Cited by

AI Technical Summary

Problems solved by technology

Existing topic extraction methods applied to dialogue texts are restricted to specific domains, whereas dialogue texts on the Internet are open-domain. They also require a large number of knowledge understanding systems to be compiled manually, so their feasibility is low.
[0010] However, because of the characteristics of network communication dialogue texts (low word similarity between sentences, many spoken words, intertwined themes, and a chaotic organizational structure), the subject words extracted by the above methods have low accuracy.

Examples

Embodiment Construction

[0025] The embodiment of the present invention focuses on dialogue text in the form of online chat, and identifies three salient features that distinguish it from written-language text: the dialogue text contains a large number of question-answer sentence patterns, the boundaries between dialogues on different topics are blurred, and the themes are intertwined and disorganized. In view of these three characteristics, the embodiment first performs data preprocessing on the dialogue text, such as word segmentation and part-of-speech tagging. It then finds all question-answer pairs in the dialogue text and combines each question sentence with its corresponding answer sentence into a single sentence. Next, the dialogue text is segmented by topic, so that adjacent dialogue sentences belonging to different topics are divided into different language chunks. Finally, clustering is performed on adjacent chunk groups belonging to different themes, a...
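The first steps described above (merging question-answer pairs, then splitting the dialogue into language chunks) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the source does not state how questions are detected or how topic boundaries are found, so the sketch treats a sentence ending in "?" as a question answered by the next sentence, uses a regular expression in place of real word segmentation and part-of-speech tagging, and cuts a chunk where the Jaccard word overlap between neighbouring sentences falls below a hypothetical `threshold`.

```python
import re


def tokenize(sentence):
    """Lowercase word set; a stand-in for real word segmentation and POS tagging."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def merge_question_answer_pairs(sentences):
    """Join each question with the sentence after it (assumed to be its answer)."""
    merged, i = [], 0
    while i < len(sentences):
        current = sentences[i]
        if current.rstrip().endswith("?") and i + 1 < len(sentences):
            merged.append(current + " " + sentences[i + 1])
            i += 2
        else:
            merged.append(current)
            i += 1
    return merged


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_into_chunks(sentences, threshold=0.1):
    """Start a new chunk whenever adjacent sentences share too few words."""
    chunks = []
    for sentence in sentences:
        if chunks and jaccard(tokenize(chunks[-1][-1]), tokenize(sentence)) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


dialogue = [
    "Did you watch the football match?",
    "Yes, the football match was great.",
    "The match went to extra time.",
    "What should we cook tonight?",
    "Let's cook pasta tonight.",
]
merged = merge_question_answer_pairs(dialogue)
chunks = segment_into_chunks(merged)
```

Merging the pairs first means the answer's vocabulary is attached to its question before similarity is measured, which is one plausible reason for the ordering of the steps in the embodiment.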

Abstract

The invention discloses a method for automatically extracting the topics of a conversation text, comprising: preprocessing the conversation text data and detecting question-answer pairs in the preprocessed conversation text; segmenting the conversation text by topic; clustering the language chunk groups obtained after topic segmentation; and extracting a topic sentence from the clustered language chunk groups. The extraction method yields more accurate conversation text topics and enables a user to search for or retrieve conversation records of interest from the extracted topic sentences, thereby improving the user experience.
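The last two steps of the abstract (clustering the language chunks, then extracting a topic sentence per cluster) can be sketched as follows. This is not the patented procedure, whose details are not given in this text. As an assumption, the sketch uses greedy single-pass clustering on Jaccard vocabulary overlap with a hypothetical `threshold`, and picks as topic sentence the sentence whose words are on average most frequent within its cluster.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word list; a stand-in for real word segmentation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_chunks(chunks, threshold=0.2):
    """Greedy clustering: a chunk joins the first cluster with similar enough vocabulary."""
    clusters = []  # each cluster: {"vocab": set of words, "sentences": list of str}
    for chunk in chunks:
        vocab = set(tokenize(" ".join(chunk)))
        for cluster in clusters:
            if jaccard(vocab, cluster["vocab"]) >= threshold:
                cluster["vocab"] |= vocab
                cluster["sentences"].extend(chunk)
                break
        else:
            # No existing cluster was similar enough, so open a new one.
            clusters.append({"vocab": vocab, "sentences": list(chunk)})
    return clusters


def topic_sentence(sentences):
    """Pick the sentence whose words are, on average, most frequent in the cluster."""
    counts = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)


chunks = [
    ["The football match was great.", "The match went to extra time."],
    ["Let's cook pasta tonight."],
    ["Extra time in that football match was tense."],
]
clusters = cluster_chunks(chunks)
topics = [topic_sentence(c["sentences"]) for c in clusters]
```

In this toy input the third chunk returns to the football topic after an unrelated chunk, so clustering brings the two football chunks back together, which is the situation the source describes as intertwined themes.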

Description

Technical field

[0001] The invention relates to the technical fields of computers and communication, and in particular to a method for automatically extracting dialogue text topics.

Background technique

[0002] Network communication has become an important part of people's daily communication and makes it far more convenient. At the same time, communication methods such as instant messaging software, online message boards, e-mail, and online conferences generate a large amount of network information data. These data are fundamentally different from web page data in that they convey the views and attitudes of one or more participants. Network dialogue data therefore contains rich information that can be of great help to people's work and study. For example, it can be used to assist the police in detecting the thoughts and actions of suspects, help psychologists understand the way patients think, and assist anthropologists in exploring human behavior patterns. How...

Application Information

IPC(8): G06F17/30
Inventor 黄本雄黄毅青胡广温杰
Owner HUAZHONG UNIV OF SCI & TECH