Core content mining method and equipment for large-scale voice data

A technology for mining the core content of voice data, applied in the computer field, that addresses the problems of low mining efficiency and inconsistent content and improves mining efficiency and accuracy.

Active Publication Date: 2018-01-16
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method and device for mining the core content of large-scale voice data, so as to solve the problems of low mining efficiency and inconsistent content.



Examples


Embodiment 1

[0042] Figure 1 is a flow chart of the steps of a method for mining the core content of large-scale speech data provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include:

[0043] Step 101, converting a large-scale speech data set to be processed into a corresponding text data set to be processed.

[0044] In the embodiment of the present invention, the large-scale voice data set to be processed includes multiple pieces of voice data to be processed, and the corresponding text data set to be processed includes corresponding pieces of text data to be processed. For example, assuming that the large-scale speech data set to be processed includes 3 pieces of speech data to be processed, and the corresponding 3 pieces of text data to be processed are obtained after conversion, then these 3 pieces of text data to be processed constitute the text data set to be processed.
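The one-to-one correspondence described in [0044] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `transcribe` callable is a hypothetical stand-in for whatever speech recognizer is used, and `fake_transcribe` merely decodes bytes so that the example runs without audio files.

```python
from typing import Callable, List


def convert_dataset(voice_items: List[bytes],
                    transcribe: Callable[[bytes], str]) -> List[str]:
    """Convert each piece of voice data into one piece of text data.

    Order and count are preserved, so item i of the result corresponds
    to item i of the input.
    """
    return [transcribe(item) for item in voice_items]


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical placeholder: a real system would call a speech recognizer here.
    return audio.decode("utf-8")


# Three pieces of voice data yield three pieces of text data.
voice_data_set = [b"refund request", b"change my plan", b"reset password"]
text_data_set = convert_dataset(voice_data_set, fake_transcribe)
```

Keeping the recognizer behind a callable means the rest of the sketch does not depend on any particular speech recognition engine.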

[0045] When converting the speech data set to be processed into the ...

Embodiment 2

[0060] Figure 2 is a flow chart of the steps of another core content mining method for voice data provided in Embodiment 2 of the present invention. As shown in Figure 2, the method may include:

[0061] Step 201, converting a large-scale speech data set to be processed into a corresponding text data set to be processed.

[0062] Voice data generally comes in multiple formats, for example MP3 format, WMA format and VMA format, so the pieces of voice data to be processed may be in different formats. In the embodiment of the present invention, before the large-scale voice data set to be processed is converted into the corresponding text data set to be processed, the format of the voice data to be processed can be unified. For example, all the voice data to be processed can be unified into MP3 format, or into WMA format, etc., which can facilitate the conversion of the large-scale speech data set to be processed, thereby improving the accuracy of the conve...
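The format unification idea in [0062] can be sketched as a planning step that lists which files still need conversion to the chosen target format. The function name `plan_format_unification`, the file paths, and the choice of MP3 as the target are assumptions for illustration; the actual transcoding would be done by an external audio tool, which is not shown here.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def plan_format_unification(paths: List[str],
                            target_ext: str = ".mp3") -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for files not yet in the target format."""
    plan = []
    for p in paths:
        path = PurePosixPath(p)
        # Compare extensions case-insensitively so ".WMA" and ".wma" match.
        if path.suffix.lower() != target_ext:
            plan.append((p, str(path.with_suffix(target_ext))))
    return plan


# Hypothetical file names; only the two WMA files need conversion.
files = ["calls/a.mp3", "calls/b.wma", "calls/c.WMA"]
plan = plan_format_unification(files)
```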

Embodiment 3

[0108] Figure 3 shows a core content mining device for large-scale speech data provided by Embodiment 3 of the present invention. As shown in Figure 3, the device 30 may include:

[0109] A conversion module 301, configured to convert the speech data set to be processed into a corresponding text data set to be processed;

[0110] A preprocessing module 302, configured to preprocess the corresponding text data set to be processed to obtain a word text set to be processed corresponding to the text data set to be processed;

[0111] A clustering module 303, configured to perform text clustering on the word text set to be processed by a text clustering algorithm to obtain at least one corresponding category;

[0112] A determination module 304, configured to determine the subject corresponding to the at least one category as the core content of the large-scale speech data set to be processed.
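The four modules listed above can be sketched as four functions chained into a pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: the text visible here does not name a specific clustering algorithm, so a simple greedy single-pass clustering on Jaccard word overlap is swapped in; the stop-word list, the threshold, the byte-decoding "recognizer" and the rule of taking the most frequent word as a category's subject are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical stop-word list used only for this illustration.
STOPWORDS = {"i", "to", "my", "a", "the", "want", "please", "how", "do"}


def convert(voice_items: List[bytes], transcribe: Callable[[bytes], str]) -> List[str]:
    """Conversion module (301): voice data set -> text data set."""
    return [transcribe(v) for v in voice_items]


def preprocess(texts: List[str]) -> List[List[str]]:
    """Preprocessing module (302): text data set -> word text set."""
    return [[w for w in t.lower().split() if w not in STOPWORDS] for t in texts]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(word_texts: List[List[str]], threshold: float = 0.2) -> List[List[int]]:
    """Clustering module (303): assumed greedy single-pass clustering."""
    categories: List[List[int]] = []   # member indices per category
    vocab: List[Set[str]] = []         # accumulated words per category
    for i, words in enumerate(word_texts):
        ws = set(words)
        best, best_sim = -1, 0.0
        for c, v in enumerate(vocab):
            sim = jaccard(ws, v)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            categories[best].append(i)
            vocab[best] |= ws
        else:
            categories.append([i])
            vocab.append(set(ws))
    return categories


def determine_subjects(word_texts: List[List[str]],
                       categories: List[List[int]]) -> List[str]:
    """Determination module (304): assumed subject = most frequent word."""
    subjects = []
    for members in categories:
        counts = Counter(w for i in members for w in word_texts[i])
        subjects.append(counts.most_common(1)[0][0] if counts else "")
    return subjects


voice = [b"I want a refund", b"refund my order please",
         b"reset my password", b"how do I reset password"]
texts = convert(voice, lambda b: b.decode("utf-8"))  # placeholder recognizer
word_texts = preprocess(texts)
categories = cluster(word_texts)
subjects = determine_subjects(word_texts, categories)

# Core content of each individual item is the subject of its category.
per_item: Dict[int, str] = {i: subjects[c]
                            for c, members in enumerate(categories)
                            for i in members}
```

With this toy input, the two refund requests fall into one category and the two password requests into another, and no labels or predefined topics are supplied in advance.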

[0113] In summary, the core content mining device for large-scale speech data provid...


Abstract

The invention provides a core content mining method and equipment for large-scale voice data, belonging to the technical field of computers. In the method and equipment provided by the embodiments of the invention, a to-be-processed voice data set is converted into a corresponding to-be-processed text data set; text clustering is then performed, using a text clustering algorithm, on the to-be-processed word text set corresponding to the to-be-processed text data set to obtain at least one corresponding category. The subject corresponding to the at least one category is determined as the core content of the to-be-processed voice data set, and the subject of the category to which each piece of to-be-processed text data belongs is determined as the core content of that piece of text data, that is, the core content of each piece of to-be-processed voice data is determined. The method and equipment thus mine the core content of large-scale voice data with zero prior knowledge, and improve the efficiency and accuracy of core content mining.

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a method and equipment for mining the core content of large-scale voice data.

Background

[0002] At present, some clients generate a large amount of voice data for business reasons. For example, clients involved in services such as telephone customer service, live video broadcast, and Internet telephony usually have a large amount of voice data. In order to better understand users' points of interest or intentions, the service provider usually mines the core content of the voice data, so as to provide better services for users.

[0003] In the prior art, mining the core content of voice data usually requires manually listening to the voice data and then manually summarizing the core content based on personal understanding.

[0004] However, the manual mining method adopted in the prior art is inefficient, and due to certain d...


Application Information

IPC(8): G06F17/30, G06F17/27, G10L15/08
Inventor: 王富田, 李健, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD