Core content mining method and equipment for large-scale voice data

A technology concerning voice data and core content, applied in the computer field, that addresses problems such as low mining efficiency and inconsistent mined content, with the effect of improving mining efficiency and accuracy.

Active Publication Date: 2018-01-16
BEIJING SINOVOICE TECH CO LTD
Cites: 8; Cited by: 16

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and device for mining the core content of large-scale voice data.


Examples


Example Embodiment

[0041] Example one

[0042] Figure 1 is a flow chart of the steps of a method for mining the core content of large-scale voice data provided by the first embodiment of the present invention. As shown in Figure 1, the method may include:

[0043] Step 101: Convert a large-scale to-be-processed speech data set into a corresponding to-be-processed text data set.

[0044] In the embodiment of the present invention, the large-scale to-be-processed voice data set includes multiple pieces of to-be-processed voice data, and the corresponding to-be-processed text data set includes corresponding multiple pieces of to-be-processed text data. As an example, suppose that a large-scale to-be-processed speech data set includes 3 pieces of to-be-processed speech data, and the corresponding 3 pieces of to-be-processed text data are obtained after conversion, then these 3 pieces of to-be-processed text data constitute the to-be-processed text data set.
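As a minimal sketch of Step 101, assuming a hypothetical `transcribe` callable that stands in for whatever speech recognition engine is used (the text does not name one), the conversion keeps a one-to-one correspondence between each piece of to-be-processed voice data and its text. The function name `convert_voice_set` and the use of item IDs as keys are illustrative assumptions.

```python
from typing import Callable, Dict


def convert_voice_set(
    voice_set: Dict[str, bytes],
    transcribe: Callable[[bytes], str],
) -> Dict[str, str]:
    """Convert a to-be-processed voice data set into its text data set.

    `transcribe` is a hypothetical stand-in for a speech recognition engine.
    Keys (e.g. call IDs) are preserved so each text maps back to its voice item.
    """
    return {item_id: transcribe(audio) for item_id, audio in voice_set.items()}


# Usage with a fake transcriber, mirroring the three-item example in [0044].
fake_asr = {b"a1": "check my bill", b"a2": "cancel my plan", b"a3": "bill is wrong"}
texts = convert_voice_set({"v1": b"a1", "v2": b"a2", "v3": b"a3"}, fake_asr.__getitem__)
print(len(texts))  # 3
```

Three voice items yield exactly three texts, which together form the to-be-processed text data set.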

[0045] When the to-be-processed voice data set is co...

Example Embodiment

[0059] Example two

[0060] Figure 2 is a flow chart of the steps of another method for mining the core content of voice data provided by the second embodiment of the present invention. As shown in Figure 2, the method may include:

[0061] Step 201: Convert a large-scale to-be-processed speech data set into a corresponding to-be-processed text data set.

[0062] Voice data commonly comes in many formats, such as MP3, WMA, and VMA, so the formats of the to-be-processed voice data may differ. In this embodiment of the present invention, before the large-scale to-be-processed voice data set is converted into the corresponding to-be-processed text data set, the formats of the to-be-processed voice data can be unified. For example, all to-be-processed voice data can be converted to MP3 format, or all to WMA format, etc., which can facilitate the conversion of the large-scale to-be-processed voice data set fu...
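One way to unify formats before conversion is sketched below. It only plans the work: for every file not already in the target format it builds an `ffmpeg -i <input> <output>` command line, relying on ffmpeg choosing the output format from the file extension. The helper name `build_unify_commands` and the use of ffmpeg are assumptions; the patent does not say which tool performs the unification.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def build_unify_commands(paths: Sequence[str], target_ext: str = "mp3") -> List[List[str]]:
    """Hypothetical helper: plan ffmpeg conversions into one unified format.

    Files whose extension already matches `target_ext` (case-insensitive)
    are skipped; every other file gets one conversion command.
    """
    commands: List[List[str]] = []
    for p in map(Path, paths):
        if p.suffix.lower().lstrip(".") == target_ext.lower():
            continue  # already in the unified format
        out = p.with_suffix("." + target_ext)
        # ffmpeg picks the output format from the output file's extension.
        commands.append(["ffmpeg", "-i", str(p), str(out)])
    return commands


cmds = build_unify_commands(["calls/a.wma", "calls/b.mp3", "calls/c.WMA"])
for cmd in cmds:
    print(shlex.join(cmd))
# ffmpeg -i calls/a.wma calls/a.mp3
# ffmpeg -i calls/c.WMA calls/c.mp3
```

Each planned command can then be executed with `subprocess.run(cmd, check=True)`. Keeping planning separate from execution makes it easy to review or parallelize a large batch.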

Example Embodiment

[0107] Example three

[0108] Figure 3 shows a device for mining the core content of large-scale voice data provided by the third embodiment of the present invention. As shown in Figure 3, the device 30 may include:

[0109] The conversion module 301 is configured to convert the to-be-processed speech data set into a corresponding to-be-processed text data set;

[0110] The preprocessing module 302 is configured to preprocess the corresponding to-be-processed text data set to obtain the to-be-processed word text set corresponding to the to-be-processed text data set;

[0111] The clustering module 303 is configured to perform text clustering on the to-be-processed word text set by using a text clustering algorithm to obtain at least one corresponding category;

[0112] The determining module 304 is configured to determine the theme corresponding to the at least one category as the core content of the large-scale voice data set to be processed.
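The sketch below shows one way modules 301 to 304 could fit together. It is a minimal illustration that relies on several assumptions. `transcribe` is a stand-in speech recognition callable. `STOPWORDS` is a tiny hypothetical English list. The class and method names are hypothetical. The patent only says "a text clustering algorithm", so single-pass clustering with a cosine-similarity `threshold` is swapped in for illustration. Describing each category's theme by its `top_k` most frequent words is also an assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical English stopword list; a real system would use a language-specific one.
STOPWORDS = {"the", "a", "my", "is", "i", "to", "and", "of", "want"}


class CoreContentMiner:
    """Sketch of device 30; modules 301-304 become methods (names are hypothetical)."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 threshold: float = 0.3, top_k: int = 2) -> None:
        self.transcribe = transcribe  # stand-in for the speech recognition engine
        self.threshold = threshold    # assumed similarity threshold for joining a category
        self.top_k = top_k            # assumed number of words used to describe a theme

    def convert(self, voice_set: List[bytes]) -> List[str]:
        """Module 301: voice data set -> text data set (one text per item)."""
        return [self.transcribe(audio) for audio in voice_set]

    def preprocess(self, texts: List[str]) -> List[List[str]]:
        """Module 302: lowercase, split into words, drop stopwords."""
        return [[w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOPWORDS]
                for t in texts]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(n * b[w] for w, n in a.items())
        norm = (math.sqrt(sum(n * n for n in a.values()))
                * math.sqrt(sum(n * n for n in b.values())))
        return dot / norm if norm else 0.0

    def cluster(self, docs: List[List[str]]) -> List[int]:
        """Module 303: single-pass clustering on term-count vectors."""
        centroids: List[Counter] = []  # summed term counts per category
        labels: List[int] = []
        for doc in docs:
            vec = Counter(doc)
            scores = [self._cosine(vec, c) for c in centroids]
            best = max(range(len(scores)), key=scores.__getitem__, default=-1)
            if best >= 0 and scores[best] >= self.threshold:
                centroids[best].update(vec)  # join the most similar category
                labels.append(best)
            else:
                centroids.append(Counter(vec))  # open a new category
                labels.append(len(centroids) - 1)
        return labels

    def themes(self, docs: List[List[str]], labels: List[int]) -> Dict[int, str]:
        """Module 304: describe each category by its most frequent words."""
        counts: Dict[int, Counter] = {}
        for doc, label in zip(docs, labels):
            counts.setdefault(label, Counter()).update(doc)
        return {label: " ".join(w for w, _ in c.most_common(self.top_k))
                for label, c in counts.items()}

    def mine(self, voice_set: List[bytes]) -> Tuple[List[int], Dict[int, str]]:
        docs = self.preprocess(self.convert(voice_set))
        labels = self.cluster(docs)
        return labels, self.themes(docs, labels)


fake_asr = {
    b"1": "I want to check my bill",
    b"2": "The bill is wrong",
    b"3": "Cancel my plan",
    b"4": "I want to cancel the plan",
}
miner = CoreContentMiner(fake_asr.__getitem__)
labels, themes = miner.mine([b"1", b"2", b"3", b"4"])
print(labels)  # [0, 0, 1, 1]
print(themes)  # {0: 'bill check', 1: 'cancel plan'}
```

Single-pass clustering needs no preset number of categories, but its result depends on input order. K-means or hierarchical clustering over TF-IDF vectors would be equally reasonable substitutes for module 303.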

[0113] To sum up, the core content mining device...



Abstract

The invention provides a core content mining method and equipment for large-scale voice data, and belongs to the technical field of computers. With the method and equipment provided by the embodiments of the invention, a to-be-processed voice data set is converted into a corresponding to-be-processed text data set. A text clustering algorithm is then applied to the to-be-processed word text set corresponding to that text data set to obtain at least one category. The subject of the at least one category is determined as the core content of the to-be-processed voice data set. The subject of the category to which each piece of to-be-processed text data belongs is determined as the core content of that text data, that is, as the core content of the corresponding piece of to-be-processed voice data. The method and equipment thus mine the core content of large-scale voice data under a zero-prior condition and improve both the efficiency and the accuracy of core content mining.
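The final step in the abstract, reading each item's core content from its category, can be sketched as follows. The sketch assumes that per-item category `labels` and a per-category `themes` mapping come from whatever text clustering step is used. The function name and the largest-first ordering of the set-level result are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def assign_core_content(
    labels: List[int], themes: Dict[int, str]
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Hypothetical helper mapping category themes back to the data.

    Returns (per-item core content, set-level core content). The set-level
    list pairs each theme with its category size, largest category first.
    """
    per_item = [themes[label] for label in labels]  # theme of each item's category
    sizes = Counter(labels)
    set_level = [(themes[label], n) for label, n in sizes.most_common()]
    return per_item, set_level


per_item, set_level = assign_core_content([0, 0, 1, 0], {0: "bill check", 1: "cancel plan"})
print(per_item)   # ['bill check', 'bill check', 'cancel plan', 'bill check']
print(set_level)  # [('bill check', 3), ('cancel plan', 1)]
```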

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a method and equipment for mining the core content of large-scale voice data.

Background art

[0002] At present, some clients generate large amounts of voice data in the course of their business. For example, clients offering telephone customer service, live video broadcasting, and Internet telephony usually accumulate large amounts of voice data. To better understand users' points of interest, intentions, and so on, service providers usually mine the core content of this voice data in order to serve users better.

[0003] In the prior art, mining the core content of voice data usually requires someone to listen to the voice data manually and then summarize its core content based on personal understanding.

[0004] However, the manual mining method adopted in the prior art is inefficient, and due to certain d...


Application Information

Patent Timeline
no application Login to view more
IPC(8): G06F17/30; G06F17/27; G10L15/08
Inventors: Wang Futian, Li Jian, Zhang Lianyi, Wu Weidong
Owner BEIJING SINOVOICE TECH CO LTD