Dialogue topic partitioning method and system based on context correlation

A context-correlation technology, applied in semantic analysis, special data processing applications, instruments, etc. It addresses the problem that no method for detecting topic shifts in dialogue has yet appeared, and achieves stable, highly reliable results with high test accuracy.

Active Publication Date: 2017-12-15
SHANDONG NORMAL UNIV
Cites: 6; Cited by: 18

AI Technical Summary

Problems solved by technology

At present, topic tracking in dialogue systems is based on open-domain text correlation, and the topic boundary is determined by calculating the correlation...



Examples


Embodiment 1

[0055] This embodiment provides a method for segmenting dialogue topics based on contextual correlation, including the following steps:

[0056] Step 1: Collect multi-round dialogue data and randomly sample it to obtain a training data set;

[0057] Step 2: Vectorize the training data set to obtain its corresponding corpus vector space;

[0058] Step 3: Organize the corpus vector space into a sentence sequence;

[0059] Step 4: Calculate the correlation between adjacent sentences;

[0060] Step 5: Identify the topic boundaries of the multi-round dialogue data based on the correlation between adjacent sentences, forming a topic segmentation model.

[0061] Optionally, the method further includes Step 6: testing the topic segmentation model with a verification data set, which is obtained by randomly sampling the collected multi-round dialogue data.

[0062] Optionally, step 7 is also inclu...
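Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the excerpt does not say how utterances are vectorized, so smoothed TF-IDF over whitespace tokens is assumed (Chinese dialogue data would first need a word segmenter), and the 80/20 split ratio, the seed, the toy dialogues and the function names `split_dialogues`, `build_vector_space` and `to_sentence_sequence` are all hypothetical. The held-out part of the split plays the role of the verification data set used in Step 6.

```python
# Hypothetical sketch of Steps 1-3; TF-IDF and all names are assumptions, not from the patent.
import math
import random
from collections import Counter


def split_dialogues(dialogues, train_ratio=0.8, seed=0):
    """Randomly sample whole dialogues into a training set and a verification set."""
    rng = random.Random(seed)
    shuffled = list(dialogues)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def build_vector_space(train):
    """Compute smoothed IDF weights over every utterance in the training set."""
    utterances = [u for dialogue in train for u in dialogue]
    doc_freq = Counter()
    for u in utterances:
        doc_freq.update(set(u.lower().split()))  # whitespace tokens: an assumption
    n = len(utterances)
    # log((1 + n) / (1 + df)) + 1 keeps every weight >= 1, even for very common terms.
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}


def to_sentence_sequence(dialogue, idf):
    """Map each utterance, in dialogue order, to a sparse {term: tf-idf} vector."""
    sequence = []
    for u in dialogue:
        tf = Counter(u.lower().split())
        # Terms unseen in training are dropped because they have no IDF weight.
        sequence.append({t: c * idf[t] for t, c in tf.items() if t in idf})
    return sequence


dialogues = [
    ["hi there", "hello how are you", "fine thanks"],
    ["any good movies", "try the new thriller", "is it scary"],
    ["what is the weather", "it will rain", "take an umbrella"],
    ["i lost my keys", "check your bag", "found them"],
    ["book a table", "for how many", "two people"],
]
train, verification = split_dialogues(dialogues)
idf = build_vector_space(train)
vectors = to_sentence_sequence(train[0], idf)
```

Splitting whole dialogues rather than individual utterances keeps each conversation's sentence order intact, which the later adjacency steps depend on.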

Embodiment 2

[0114] The purpose of this embodiment is to provide a dialogue topic segmentation system based on context information.

[0115] In order to achieve the above object, the present invention adopts the following technical scheme:

[0116] A dialogue topic segmentation system based on context information, comprising a processor and a computer-readable storage medium, wherein the processor is configured to execute instructions, and the computer-readable storage medium stores a plurality of instructions adapted to be loaded by the processor to perform the following processing:

[0117] Step 1: Collect multi-round dialogue data and randomly sample it to obtain a training data set;

[0118] Step 2: Vectorize the training data set to obtain its corresponding corpus vector space;

[0119] Step 3: Organize the corpus vector space into a sentence sequence;

[0120] Step 4: Calculate the correlation...
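Step 4, calculating the correlation between adjacent sentences, can be illustrated as below. The excerpt does not name the correlation measure; cosine similarity between sparse term-weight vectors is assumed here, and `cosine`, `adjacent_correlations` and the toy vectors are hypothetical.

```python
# Hypothetical sketch of Step 4; cosine similarity is an assumed correlation measure.
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty utterance is treated as unrelated to its neighbour
    return dot / (norm_u * norm_v)


def adjacent_correlations(sequence):
    """Score each gap i (between sentence i and sentence i + 1) by the cosine of their vectors."""
    return [cosine(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]


sequence = [
    {"movie": 1.0, "thriller": 1.0},
    {"thriller": 1.0, "scary": 1.0},
    {"weather": 1.0, "rain": 1.0},
]
scores = adjacent_correlations(sequence)
```

The first pair shares one of two terms and scores about 0.5; the second pair shares nothing and scores 0, the kind of drop a boundary detector would look for.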

Embodiment 3

[0123] The purpose of this embodiment is to provide a computer-readable storage medium.

[0124] In order to achieve the above object, the present invention adopts the following technical scheme:

[0125] A computer-readable storage medium storing a computer program for dialogue topic segmentation based on context information, wherein the program, when executed by a processor, performs the following steps:

[0126] Step 1: Collect multi-round dialogue data and randomly sample it to obtain a training data set;

[0127] Step 2: Vectorize the training data set to obtain its corresponding corpus vector space;

[0128] Step 3: Organize the corpus vector space into a sentence sequence;

[0129] Step 4: Calculate the correlation between adjacent sentences;

[0130] Step 5: Identify the topic boundaries of the multi-round dialogue data according to the correlation between adjacent sentences, and form a topic segmen...
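Step 5 turns the adjacent-sentence correlations into topic boundaries. The excerpt does not give the decision rule, so this sketch assumes a fixed threshold: a correlation below it between neighbouring sentences marks a topic shift. The threshold value 0.1, the example scores and the names `find_boundaries` and `segment` are illustrative only.

```python
# Hypothetical sketch of Step 5; the fixed-threshold rule is an assumption.
def find_boundaries(scores, threshold=0.1):
    """Return the indices of sentences that start a new topic.

    scores[i] is the correlation between sentence i and sentence i + 1; when it
    falls below `threshold`, sentence i + 1 is taken to open a new topic.
    """
    return [i + 1 for i, s in enumerate(scores) if s < threshold]


def segment(sentences, boundaries):
    """Cut the sentence sequence into topic segments at the given start indices."""
    starts = [0] + boundaries
    ends = boundaries + [len(sentences)]
    return [sentences[a:b] for a, b in zip(starts, ends)]


sentences = ["any good movies", "try the new thriller", "is it scary", "will it rain", "take an umbrella"]
scores = [0.3, 0.4, 0.0, 0.5]  # illustrative adjacent-sentence correlations
boundaries = find_boundaries(scores, threshold=0.1)
topics = segment(sentences, boundaries)
```

Here only the gap between "is it scary" and "will it rain" falls below the threshold, so the dialogue splits into a movie topic and a weather topic. A tuned threshold, or a rule based on local minima, would be natural refinements; neither is specified in the excerpt.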


Abstract

The invention discloses a dialogue topic partitioning method and system based on context correlation. The method comprises: collecting multi-round dialogue data and randomly sampling it to obtain a training data set; vectorizing the training data set to obtain its corresponding corpus vector space; arranging the corpus vector space into a sentence sequence; calculating the correlation between adjacent sentences; and, according to that correlation, recognizing the topic boundaries of the multi-round dialogue data to form a topic partitioning model, thereby achieving topic partitioning of the multi-round dialogue data. The method offers high test accuracy, high reliability, and stability.
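The abstract credits the method with high test accuracy, but this excerpt does not define the metric. One plausible way to score a segmentation on a verification dialogue, assumed here rather than taken from the patent, is to compare predicted and reference boundaries gap by gap and also report boundary precision and recall. The function name `boundary_scores` and the example boundaries are hypothetical.

```python
# Hypothetical evaluation sketch; the metrics are assumptions, not the patent's test procedure.
def boundary_scores(predicted, gold, num_sentences):
    """Compare predicted and reference topic boundaries on one verification dialogue.

    Boundaries are indices of sentences that start a new topic (index 0 never is).
    Returns (gap accuracy, boundary precision, boundary recall).
    """
    pred, ref = set(predicted), set(gold)
    gaps = range(1, num_sentences)  # every position where a boundary could occur
    correct = sum((g in pred) == (g in ref) for g in gaps)
    accuracy = correct / len(gaps) if len(gaps) else 1.0
    hits = len(pred & ref)
    # Convention: with nothing predicted (or nothing to find) the ratio counts as perfect.
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(ref) if ref else 1.0
    return accuracy, precision, recall


acc, p, r = boundary_scores(predicted=[3, 5], gold=[3], num_sentences=6)
```

With one spurious boundary at index 5, four of the five gaps are labelled correctly, precision drops to one half and recall stays at one.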

Description

Technical field

[0001] The invention relates to the field of data mining, and in particular to building a dialogue topic tracking system based on the correlation of words and sentences in the dialogue context.

Background

[0002] The core task of a human-computer dialogue system is to generate answer sentences from the historical dialogue information, and the key to accomplishing this task is topic tracking. Topic tracking detects topic changes over the whole dialogue and thereby segments it into topics. When generating an answer, the system can then produce sentences relevant to the current topic, or topic-guidance sentences, so that it does not give answers unrelated to the question. Topic segmentation is based on the chat content of the dialogue system, which provides an important reference for retrieving and generating answer sentences. However, these historical dialogue materials ha...
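The background's point that answers should follow the current topic can be made concrete: once topic boundaries are known, only the utterances since the last boundary form the current topic context. A minimal sketch, assuming boundaries are given as the indices of utterances that open a new topic (a hypothetical representation); `current_topic_context` is an illustrative name, not part of the patent.

```python
# Hypothetical sketch: select the still-open topic from the dialogue history.
def current_topic_context(history, boundaries):
    """Return the utterances belonging to the topic that is still open.

    `boundaries` holds indices of utterances that start a new topic; the current
    topic runs from the last boundary to the end of the history.
    """
    start = max(boundaries, default=0)  # no boundary yet: the whole history is one topic
    return history[start:]


history = ["hi", "hello", "any good movies", "try the new thriller", "is it scary"]
context = current_topic_context(history, boundaries=[2])
```

Feeding only this slice to the answer generator keeps the greeting out of the context for a question about the thriller.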


Application Information

IPC(8): G06F17/27
CPC: G06F40/30
Inventors: 王红, 何天文, 胡晓红, 于晓梅, 周莹, 房有丽, 孟广婷, 狄瑞彤, 刘海燕, 王露潼, 王倩, 宋永强
Owner: SHANDONG NORMAL UNIV