Core content mining method and equipment for large-scale voice data
A technology for mining the core content of voice data, applied in the computer field. It addresses problems such as low mining efficiency and inconsistent content, and improves mining efficiency and accuracy.
Examples
Example Embodiment
[0041] Embodiment One
[0042] Figure 1 is a flowchart of the steps of a method for mining the core content of large-scale voice data, provided by the first embodiment of the present invention. As shown in Figure 1, the method can include:
[0043] Step 101: Convert a large-scale to-be-processed speech data set into a corresponding to-be-processed text data set.
[0044] In the embodiment of the present invention, the large-scale to-be-processed voice data set includes multiple pieces of to-be-processed voice data, and the corresponding to-be-processed text data set includes the corresponding multiple pieces of to-be-processed text data. For example, if a large-scale to-be-processed speech data set includes 3 pieces of to-be-processed speech data, conversion yields 3 corresponding pieces of to-be-processed text data, and these 3 pieces constitute the to-be-processed text data set.
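The one-to-one mapping described in Step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not name a speech recognizer, so `transcribe` is a hypothetical callable supplied by the caller, and `fake_transcribe` is a stand-in that only decodes bytes so the example runs without audio.

```python
from typing import Callable, List


def convert_dataset(voice_items: List[bytes],
                    transcribe: Callable[[bytes], str]) -> List[str]:
    """Convert each piece of voice data into one piece of text data, in order."""
    return [transcribe(item) for item in voice_items]


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a real speech recognizer (assumption, not from
    # the source): it treats the bytes as already-encoded text.
    return audio.decode("utf-8")


# Three pieces of to-be-processed voice data yield three pieces of text data.
voice_data_set = [b"hello", b"refund request", b"network outage"]
text_data_set = convert_dataset(voice_data_set, fake_transcribe)
```

Passing the recognizer in as a parameter keeps the data-set conversion independent of whichever speech-to-text engine is actually used.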
[0045] When the to-be-processed voice data set is co...
Example Embodiment
[0059] Embodiment Two
[0060] Figure 2 is a flowchart of the steps of another method for mining the core content of voice data, provided by the second embodiment of the present invention. As shown in Figure 2, the method can include:
[0061] Step 201: Convert a large-scale to-be-processed speech data set into a corresponding to-be-processed text data set.
[0062] Voice data generally comes in many formats, such as MP3 format, WMA format, VMA format, etc., so the pieces of to-be-processed voice data may differ in format. In the embodiment of the present invention, before the large-scale to-be-processed voice data set is converted into the corresponding to-be-processed text data set, the format of the to-be-processed voice data can be unified. For example, all to-be-processed voice data can be unified into MP3 format, or unified into WMA format, etc., which can facilitate the conversion operation of the large-scale voice data set to be processed fu...
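The format unification described above could be planned along the following lines. This is a sketch under stated assumptions: the source does not say how formats are detected or converted, so detecting format by file extension, the file paths, and the use of the `ffmpeg` command-line tool are all assumptions made for illustration. The code only builds the commands and does not execute them.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

TARGET_EXT = ".mp3"  # assumed target format; the source also mentions WMA


def plan_unification(paths: List[str],
                     target_ext: str = TARGET_EXT) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for files not yet in the target format."""
    plan = []
    for p in paths:
        path = PurePosixPath(p)
        # Assumption: the file extension reflects the actual audio format.
        if path.suffix.lower() != target_ext:
            plan.append((p, str(path.with_suffix(target_ext))))
    return plan


def ffmpeg_command(src: str, dst: str) -> List[str]:
    # Basic ffmpeg invocation: output format is inferred from the destination
    # extension. Using ffmpeg here is an assumption, not part of the source.
    return ["ffmpeg", "-i", src, dst]


files = ["calls/a.mp3", "calls/b.wma", "calls/c.WMA"]  # hypothetical paths
plan = plan_unification(files)
commands = [ffmpeg_command(src, dst) for src, dst in plan]
```

Files already in the target format are skipped, so only the remaining files need to be converted before the speech-to-text step.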
Example Embodiment
[0107] Embodiment Three
[0108] Figure 3 shows a large-scale voice data core content mining device provided by the third embodiment of the present invention. As shown in Figure 3, the device 30 may include:
[0109] The conversion module 301 is configured to convert the to-be-processed speech data set into a corresponding to-be-processed text data set;
[0110] The preprocessing module 302 is configured to preprocess the corresponding to-be-processed text data set to obtain the to-be-processed word text set corresponding to the to-be-processed text data set;
[0111] The clustering module 303 is configured to perform text clustering on the to-be-processed word text set by using a text clustering algorithm to obtain at least one corresponding category;
[0112] The determining module 304 is configured to determine the theme corresponding to the at least one category as the core content of the large-scale voice data set to be processed.
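The four modules listed above can be sketched as a small pipeline. This is an illustrative toy, not the device itself. The source does not specify the text clustering algorithm, so a simple single-pass clustering on Jaccard word overlap is swapped in here. The stopword list, the threshold of 0.2, the `transcribe` callable, and the rule "theme = most frequent word in the category" are likewise assumptions.

```python
import re
from collections import Counter
from typing import Callable, List, Set

STOPWORDS = {"i", "a", "the", "my", "is", "for", "on"}  # toy list (assumption)


def convert(voice_items: List[bytes], transcribe: Callable[[bytes], str]) -> List[str]:
    """Conversion module 301: voice data set -> text data set."""
    return [transcribe(v) for v in voice_items]


def preprocess(texts: List[str]) -> List[List[str]]:
    """Preprocessing module 302: text data set -> word text set."""
    return [[w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOPWORDS]
            for t in texts]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(word_texts: List[List[str]], threshold: float = 0.2) -> List[List[int]]:
    """Clustering module 303: single-pass clustering (stand-in algorithm)."""
    clusters: List[List[int]] = []
    for i, words in enumerate(word_texts):
        best, best_sim = None, threshold
        for c in clusters:
            vocab = set().union(*(word_texts[j] for j in c))
            sim = jaccard(set(words), vocab)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])  # no similar category yet: open a new one
        else:
            best.append(i)
    return clusters


def determine_themes(word_texts: List[List[str]], clusters: List[List[int]]) -> List[str]:
    """Determining module 304: take each category's most frequent word as its theme."""
    themes = []
    for c in clusters:
        counts = Counter(w for j in c for w in word_texts[j])
        themes.append(counts.most_common(1)[0][0] if counts else "")
    return themes


voice = [b"I want a refund", b"refund for the damaged order",
         b"the network is down", b"network outage on my line"]
texts = convert(voice, lambda audio: audio.decode("utf-8"))  # hypothetical recognizer
words = preprocess(texts)
groups = cluster(words)
themes = determine_themes(words, groups)
```

On this toy input the sketch forms two categories, one for the refund messages and one for the network messages, and reports one theme per category as the core content.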
[0113] To sum up, the core content mining device...