Method and device for extracting colloquial sentences
A colloquial sentence extraction technology, applied in the field of information technology, that addresses the time-consuming and laborious construction of spoken-language corpora, the shortage of such corpora, and the resulting difficulty of systematizing a corpus.
Examples
Embodiment 1
[0023] Figure 1A is a flowchart of a colloquial sentence extraction method provided by Embodiment 1 of the present invention. This embodiment is applicable to a variety of colloquial sentence extraction scenarios. The method can be executed by the colloquial sentence extraction device provided by the embodiments of the present invention. The device can be implemented in software and/or hardware and can be integrated into any equipment that provides a colloquial sentence extraction function, for example a computer. As shown in Figure 1A, the method includes:
[0024] S110. Count the word frequencies of the words in the movie corpus and the mixed corpus respectively, and sort the words in the movie corpus and the mixed corpus according to the word frequencies.
[0025] Specifically, both the movie corpus and the mixed corpus are obtained from the Internet. Among them, since the movie corpus is derived from the dialogue in the movie, it can be sp...
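Step S110 can be illustrated with a minimal sketch. It assumes each corpus is already split into words; the sample sentences, the function name `rank_words`, and the alphabetical tie-break are illustrative assumptions, not part of the patent text. Descending order follows the "from high to low" wording used in Embodiment 2.

```python
from collections import Counter


def rank_words(tokenized_sentences):
    """Count word frequencies and return (word, count) pairs, most frequent first.

    `tokenized_sentences` is a list of sentences, each a list of words.
    The alphabetical tie-break is an assumption made for deterministic output.
    """
    counts = Counter(word for sentence in tokenized_sentences for word in sentence)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpora standing in for the movie corpus and the mixed corpus.
movie_corpus = [["you", "know", "what"], ["you", "are", "kidding"]]
mixed_corpus = [["the", "report", "shows"], ["the", "data", "shows", "growth"]]

# Each corpus is counted and sorted separately, as S110 describes.
movie_ranking = rank_words(movie_corpus)
mixed_ranking = rank_words(mixed_corpus)

print(movie_ranking[:3])
print(mixed_ranking[:3])
```

Keeping the two rankings separate matters here because the step counts and sorts the movie corpus and the mixed corpus independently of each other.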
Embodiment 2
[0072] Figure 2A is a flowchart of a colloquial sentence extraction method provided by Embodiment 2 of the present invention. This embodiment builds on the above embodiments and refines the step of counting the word frequencies of the words in the movie corpus and the mixed corpus and sorting the words in each corpus by word frequency. Specifically: using the reference thesaurus and the jieba word segmentation component, perform word segmentation on the sentences in the movie corpus and in the mixed corpus to obtain the words in each corpus; count the word frequencies of the words in the movie corpus and in the mixed corpus respectively; and sort the words in each corpus by word frequency from high to low.
[0073] Correspondingly, the method of this embodiment incl...
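The segment, count, and sort sequence described in paragraph [0072] might look roughly like the sketch below. It assumes that the reference thesaurus can be supplied to jieba as a user dictionary through `jieba.load_userdict`; the patent text does not say how the thesaurus is applied, so that mapping is an assumption. The function names and the `cut` parameter are hypothetical, and the demo uses `str.split` on English text as a stand-in segmenter so that it runs without jieba installed.

```python
from collections import Counter


def segment_corpus(sentences, cut=None, reference_dict_path=None):
    """Split every sentence into words.

    If `cut` is None, jieba is imported lazily and `jieba.lcut` is used.
    Loading the reference thesaurus as a jieba user dictionary is an
    assumption about how the two are combined.
    """
    if cut is None:
        import jieba  # third-party package; only needed on this path

        if reference_dict_path is not None:
            jieba.load_userdict(reference_dict_path)
        cut = jieba.lcut
    # Drop whitespace-only tokens so they are not counted as words.
    return [[word for word in cut(sentence) if word.strip()] for sentence in sentences]


def rank_by_frequency(segmented_sentences):
    """Return (word, count) pairs sorted by frequency from high to low."""
    counts = Counter(word for sentence in segmented_sentences for word in sentence)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Demo with a stand-in segmenter; real use would omit `cut` to get jieba.
segmented = segment_corpus(["a b a", "b a c"], cut=str.split)
print(rank_by_frequency(segmented))
```

With jieba installed, the same functions would be called once per corpus, for example `segment_corpus(movie_sentences, reference_dict_path="reference_dict.txt")`, where both the variable and the file name are placeholders.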
Embodiment 3
[0098] Figure 3 is a schematic structural diagram of a colloquial sentence extraction device provided by Embodiment 3 of the present invention. This embodiment is applicable to a variety of colloquial sentence extraction scenarios, and the device can execute the colloquial sentence extraction method provided by the embodiments of the present invention. The device can be implemented in software and/or hardware and can be integrated into any equipment that provides a colloquial sentence extraction function, for example a computer. As shown in Figure 3, the device specifically includes: a word frequency statistics module 31, a spoken language corpus confirmation module 32, and a colloquial sentence extraction module 33.
[0099] The word frequency statistics module 31 is used to count the word frequencies of the words in the movie corpus and the mixed corpus respectively, and to sort the words in the movie corpus and the mixed corpus according to th...


