
Method and device for extracting colloquial sentences

A colloquial sentence extraction technique, applied in the field of information technology, that addresses the time-consuming and laborious construction of user-defined spoken language corpora and the lack of a systematic spoken language corpus.

Active Publication Date: 2020-04-21
GUANGZHOU SHIYUAN ELECTRONICS CO LTD

AI Technical Summary

Problems solved by technology

[0004] Having users define a spoken language corpus by hand is time-consuming and laborious, and the result reflects personal bias and lacks authority. The lack of a systematic spoken language corpus hinders improvement of the corpus system as a whole.



Examples


Embodiment 1

[0023] Figure 1A is a flow chart of a colloquial sentence extraction method provided by Embodiment 1 of the present invention. This embodiment is applicable to various colloquial sentence extraction scenarios. The method can be executed by the colloquial sentence extraction device provided by the embodiment of the present invention; the device can be implemented in software and/or hardware and can be integrated in any equipment that provides a colloquial sentence extraction function, for example a computer. As shown in Figure 1A, the method includes:

[0024] S110. Count the word frequencies of the words in the movie corpus and the mixed corpus respectively, and sort the words in the movie corpus and the mixed corpus according to the word frequencies.
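As a minimal sketch of step S110, the counting and ranking could look like the following. It assumes the corpora have already been split into words; the function name `rank_by_frequency`, the sample word lists, and the alphabetical tie-break are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter

def rank_by_frequency(words):
    """Return {word: (frequency, rank)}; rank 1 is the most frequent word.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: (freq, rank) for rank, (word, freq) in enumerate(ordered, start=1)}

# Hypothetical, already-segmented corpora.
movie_words = ["yeah", "okay", "yeah", "the", "gonna", "yeah", "the"]
mixed_words = ["the", "report", "the", "yeah", "therefore", "the"]

movie_table = rank_by_frequency(movie_words)
mixed_table = rank_by_frequency(mixed_words)
```

Each table keeps both the frequency and the rank of a word, since the later steps are described as using both.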

[0025] Specifically, both the movie corpus and the mixed corpus are obtained from the Internet. Among them, since the movie corpus is derived from the dialogue in the movie, it can be sp...

Embodiment 2

[0072] Figure 2A is a flowchart of a colloquial sentence extraction method provided by Embodiment 2 of the present invention. This embodiment builds on the above embodiments and refines the step of counting the word frequencies of the words in the movie corpus and the mixed corpus and sorting the words by word frequency. Specifically: segment the sentences in the movie corpus and the mixed corpus using the reference thesaurus and the jieba word segmentation component to obtain the words in each corpus; count the word frequency of the words in each corpus; and sort the words in each corpus by word frequency from high to low.
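The three sub-steps of this embodiment (segment, count, sort from high to low) can be sketched as below. The segmenter is passed in as a callable so the example runs without third-party packages; the function names and the sample sentences are assumptions for illustration. With jieba installed, the reference thesaurus could plausibly be loaded through `jieba.load_userdict` and `jieba.lcut` used as the segmenter, as noted in the comments; the file path shown there is hypothetical.

```python
from collections import Counter

def segment_corpus(sentences, segment):
    """Split every sentence into words with the given segmenter callable."""
    words = []
    for sentence in sentences:
        # Drop empty or whitespace-only tokens that some segmenters emit.
        words.extend(tok for tok in segment(sentence) if tok.strip())
    return words

def sorted_frequencies(words):
    """Return (word, frequency) pairs ordered from high to low frequency."""
    return Counter(words).most_common()

# With jieba available, the segmenter might be prepared like this:
#   import jieba
#   jieba.load_userdict("reference_lexicon.txt")  # hypothetical path
#   segment = jieba.lcut
# A whitespace splitter stands in here so the sketch is self-contained.
segment = str.split

movie_sentences = ["you know what", "you know", "you"]
movie_ranked = sorted_frequencies(segment_corpus(movie_sentences, segment))
```

The same two calls would be applied to the mixed corpus to obtain its own sorted word list.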

[0073] Correspondingly, the method of this embodiment incl...

Embodiment 3

[0098] Figure 3 is a schematic structural diagram of a colloquial sentence extraction device provided in Embodiment 3 of the present invention. This embodiment is applicable to various colloquial sentence extraction scenarios. The device can execute the colloquial sentence extraction method provided in the embodiment of the present invention; it can be implemented in software and/or hardware and can be integrated in any equipment that provides a colloquial sentence extraction function, for example a computer. As shown in Figure 3, the device specifically includes: a word frequency statistics module 31, a spoken language corpus confirmation module 32, and a colloquial sentence extraction module 33.

[0099] The word frequency statistics module 31 is used to count the word frequencies of the words in the movie corpus and the mixed corpus respectively, and to sort the words in the movie corpus and the mixed corpus according to th...
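To make the division of labour between modules 31, 32, and 33 concrete, here is a rough sketch with one function per module. The visible text does not give the formula for the difference degree or the rule for selecting sentences, so both are assumptions here: the difference degree is taken to be the gap in relative frequency between the two corpora, combined with a rank comparison, and a sentence is kept when a minimum share of its words is in the colloquial lexicon. All names, thresholds, and sample data are hypothetical.

```python
from collections import Counter

def word_stats(sentences):
    """Module 31 analogue: map word -> (relative frequency, rank)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {w: (c / total, r) for r, (w, c) in enumerate(ordered, start=1)}

def colloquial_lexicon(movie_stats, mixed_stats, threshold):
    """Module 32 analogue, using an ASSUMED difference degree."""
    unseen = (0.0, len(mixed_stats) + 1)  # word absent from the mixed corpus
    lexicon = set()
    for word, (m_freq, m_rank) in movie_stats.items():
        x_freq, x_rank = mixed_stats.get(word, unseen)
        gap = m_freq - x_freq  # hypothetical difference degree
        if gap > threshold and m_rank < x_rank:
            lexicon.add(word)
    return lexicon

def extract_colloquial(sentences, lexicon, min_ratio):
    """Module 33 analogue: keep sentences rich in lexicon words (assumed rule)."""
    return [s for s in sentences
            if s and sum(w in lexicon for w in s) / len(s) >= min_ratio]

# Hypothetical, already-segmented corpora.
movie = [["yeah", "gonna", "go"], ["yeah", "okay"], ["gonna", "be", "okay"]]
mixed = [["the", "committee", "will", "go"],
         ["yeah", "gonna", "be", "late"],
         ["the", "report", "will", "be", "late"]]

lexicon = colloquial_lexicon(word_stats(movie), word_stats(mixed), threshold=0.1)
extracted = extract_colloquial(mixed, lexicon, min_ratio=0.5)
```

In this toy data only the second mixed-corpus sentence passes, because half of its words are far more prominent in the movie corpus than in the mixed corpus. A different difference degree would slot into `colloquial_lexicon` without touching the other two functions.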


Abstract

The embodiment of the invention discloses a method and device for extracting colloquial sentences. The method comprises: counting the word frequencies of the words in a movie corpus and in a mixed corpus separately, and ranking the words in each corpus by word frequency; calculating the difference degree between the words in the movie corpus and the words in the mixed corpus from their word frequency and rank information, and determining a colloquial corpus according to the difference degree; and extracting the colloquial sentences in the mixed corpus according to the colloquial corpus. Because the colloquial corpus is determined from the word frequency and rank statistics of the two corpora and is then used to extract colloquial sentences from the mixed corpus, the method avoids the time and labor that the prior art spends on having users define a colloquial corpus by hand, improves the efficiency of extracting colloquial sentences, and improves the colloquial corpus system as a whole.

Description

Technical field

[0001] The embodiment of the present invention relates to the field of information technology, in particular to a method and device for extracting colloquial sentences.

Background

[0002] With the advancement of science and technology, the large storage capacity of computers has been applied to the storage of language, and corpora have developed as a result.

[0003] A spoken language corpus is likewise a basic resource of language knowledge carried by computer. A complete spoken language corpus is used for language model construction, dictionary compilation, and text classification. The existing approach is a user-defined spoken language corpus, that is, one constructed from colloquial sentences that users extract word by word.

[0004] Having users define the spoken language corpus by hand is time-consuming and laborious, and the result reflects personal bias and lacks authority. The lack of a systematic spoken language corpus is not conducive to the improvement of the en...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/36
CPC: G06F16/3344, G06F16/3346, G06F16/36
Inventor: 李贤
Owner: GUANGZHOU SHIYUAN ELECTRONICS CO LTD