Method for automatically extracting sentence template

An automatic extraction technique, applied to studying the similarity of sentences and their structures, that addresses the drawbacks of searching for templates manually: templates are easily missed, and the work is labor-intensive and time-consuming.

Inactive Publication Date: 2008-07-16
IFLYTEK CO LTD
0 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

In the past, searching for templates was usually carried out manually. The disadvantage is that it is easy to m...

Examples

Embodiment 1

[0023] The method for automatically extracting sentence templates includes the following steps:

[0024] (1) Sentence segmentation: Divide the text into sentences at punctuation marks, and prefix each sentence with a serial number in order of appearance;

[0025] (2) Word segmentation: Use a word segmentation technique to split each sentence obtained in step (1) into word-level units;

[0026] (3) Grouping: After word segmentation is complete, divide the sentences into groups according to the number of words they contain, ordered from most to fewest or from fewest to most;

[0027] (4) Template extraction: Apply the LCS (longest common subsequence) algorithm to the sentences in the same group to obtain their longest common subsequences. In the process, discard any longest common subsequence whose internal active part has a length of zero; the remaining subsequences are the sentence templates.
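
Step (4) of this embodiment can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the source: the "internal active part" is read as the unmatched tokens lying between matched tokens (the variable slots of a template), the LCS algorithm is applied to every pair of sentences in a group, and the group is already tokenized. The names `lcs_pairs`, `interior_gap`, and `render`, and the `*` slot marker, are hypothetical.

```python
from itertools import combinations

def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def interior_gap(pairs):
    """Count unmatched tokens between consecutive matches, in both sentences.
    Assumption: this is what the text calls the internal active part."""
    gap = 0
    for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]):
        gap += (i1 - i0 - 1) + (j1 - j0 - 1)
    return gap

def render(a, pairs, slot="*"):
    """Write the template, inserting a slot marker wherever either sentence has a gap."""
    out = []
    for k, (i, j) in enumerate(pairs):
        if k > 0:
            prev_i, prev_j = pairs[k - 1]
            if i - prev_i > 1 or j - prev_j > 1:
                out.append(slot)
        out.append(a[i])
    return out

# One group of sentences with the same word count (six words each).
s1 = "I want to go to Beijing".split()
s2 = "I want to fly to Shanghai".split()
s3 = "you want to go home now".split()

templates = []
for a, b in combinations([s1, s2, s3], 2):
    pairs = lcs_pairs(a, b)
    # Discard subsequences that have no variable slot between their fixed words.
    if pairs and interior_gap(pairs) > 0:
        templates.append(render(a, pairs))

print(templates)  # [['I', 'want', 'to', '*', 'to']]
```

Under this reading, the pair `s1`/`s3` shares the contiguous run "want to go", which has no slot and is therefore dropped, while `s1`/`s2` yields a template with one slot.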

Embodiment 2

[0029] The method for automatically extracting sentence templates includes the following steps:

[0030] (1) Sentence segmentation: Divide the text into sentences at punctuation marks, and prefix each sentence with a serial number in order of appearance;

[0031] (2) Word segmentation: Use a word segmentation technique to split each sentence obtained in step (1) into word-level units;

[0032] (3) Template extraction: Apply the LCS (longest common subsequence) algorithm to the word-segmented sentences to obtain their longest common subsequence, which is the sentence template.
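
The three steps of this embodiment can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the punctuation set, the whitespace-based `segment` stand-in (a real system for Chinese text would use a dedicated word segmenter), and the function names are all hypothetical, and the LCS is computed with the textbook dynamic-programming table.

```python
import re

def split_sentences(text):
    """Step (1): split at sentence-ending punctuation and number the sentences."""
    parts = [p.strip() for p in re.split(r"[.!?。！？]+", text)]
    return list(enumerate((p for p in parts if p), start=1))

def segment(sentence):
    """Step (2): stand-in word segmenter (whitespace split, an assumption)."""
    return sentence.split()

def lcs(a, b):
    """Step (3): longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

text = "I want to go to Beijing. I want to fly to Shanghai."
numbered = split_sentences(text)
tokens = [segment(sentence) for _, sentence in numbered]
template = lcs(tokens[0], tokens[1])

print(numbered)  # [(1, 'I want to go to Beijing'), (2, 'I want to fly to Shanghai')]
print(template)  # ['I', 'want', 'to', 'to']
```

Because this embodiment has no grouping step, sentences of very different lengths may be compared directly, which is presumably why the other embodiments group by word count first.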

Embodiment 3

[0034] The method for automatically extracting sentence templates includes the following steps:

[0035] (1) Sentence segmentation: Divide the text into sentences at punctuation marks, and prefix each sentence with a serial number in order of appearance;

[0036] (2) Word segmentation: Use a word segmentation technique to split each sentence obtained in step (1) into word-level units;

[0037] (3) Grouping: After word segmentation is complete, divide the sentences into groups according to the number of words they contain, ordered from most to fewest or from fewest to most;

[0038] (4) Template extraction: Apply the LCS (longest common subsequence) algorithm to the sentences in the same group to obtain their longest common subsequence, which is the sentence template.
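
The grouping in step (3) and the per-group extraction in step (4) might look like the following Python sketch. The source does not say how the LCS algorithm is applied across a group, so comparing every pair of sentences within a group is an assumption here, as are the function names and the choice to put sentences with exactly the same word count in one group.

```python
from collections import defaultdict
from itertools import combinations

def lcs(a, b):
    """Longest common subsequence of two token lists (row-by-row DP)."""
    # prev[j] holds one LCS of the part of a processed so far and b[:j]
    prev = [()] * (len(b) + 1)
    for x in a:
        cur = [()]
        for j, y in enumerate(b):
            if x == y:
                cur.append(prev[j] + (x,))
            else:
                cur.append(max(prev[j + 1], cur[j], key=len))
        prev = cur
    return list(prev[-1])

def group_by_length(token_lists, descending=True):
    """Step (3): group sentences by word count, ordered by that count."""
    groups = defaultdict(list)
    for tokens in token_lists:
        groups[len(tokens)].append(tokens)
    return [groups[k] for k in sorted(groups, reverse=descending)]

def templates_per_group(token_lists):
    """Step (4): apply LCS to each pair of sentences within the same group."""
    result = []
    for group in group_by_length(token_lists):
        for a, b in combinations(group, 2):
            common = lcs(a, b)
            if common:
                result.append(common)
    return result

sentences = [
    "the meeting starts at nine",
    "I want to go to Beijing",
    "hello",
    "the train leaves at noon",
    "I want to fly to Shanghai",
]
token_lists = [s.split() for s in sentences]

print([len(g) for g in group_by_length(token_lists)])  # [2, 2, 1]
print(templates_per_group(token_lists))
# [['I', 'want', 'to', 'to'], ['the', 'at']]
```

Grouping keeps the number of LCS comparisons down and avoids pairing sentences whose lengths differ widely; a single-sentence group such as `hello` produces no template.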

Abstract

The invention relates to a method for automatically extracting sentence templates, comprising the following steps: a text is divided into sentences according to punctuation, and serial numbers are marked in front of the sentences in order; each sentence is split into word-level units using word segmentation technology; after word segmentation, the sentences are divided into groups, in ascending or descending order, according to the number of words they contain; and the LCS algorithm is applied to the sentences in the same group to obtain their longest common subsequence, which is the sentence template. The invention can automatically and efficiently collect commonly used words and sentences from large volumes of text.

Description

Technical field

[0001] The invention relates to an auxiliary text analysis technology, in particular to a method for identifying the inherent similarity of sentences and structures in a batch of texts and abstracting it into templates.

Background technique

[0002] In Chinese language studies, common words and sentences are often studied, and common sentences receive more attention, for example when making products similar to "900 sentences in English". How can good sentences be chosen from a vast body of text? As with "900 sentences in English", a good sentence is one that covers the common sentence patterns of the language. These common sentence patterns can in fact be abstracted into sentence templates. Moreover, the selection of sentence templates is also very important for speech research. For example, when performing speech synthesis, abstracting common sentence patterns into templates and making them into corpus can greatly improve the synt...

Application Information

IPC(8): G06F17/27
Inventors: 高毅, 徐波, 陈志刚, 胡国平, 赵志伟, 严峻, 吴晓如, 刘庆峰, 王仁华
Owner: IFLYTEK CO LTD