Automatic quotation extraction method and device with semantic integrity kept

An automatic citation extraction technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the problems of existing citation extraction, namely citations with incomplete semantics, broken integrity, and truncation, which leave the extraction results unable to meet users' needs.

Active Publication Date: 2014-09-17
吴涛军

AI Technical Summary

Problems solved by technology

The obvious defect of this method is that the generated citations often lack semantic integrity. Frequently half of a sentence is included in the citation while the other half is left out, or a word is even cut in two, which leaves readers confused.
Moreover, in some cases such broken and truncated citations prevent users from making use of them. For example, if the text contains information such as e-mail addresses, URLs, or phone numbers and the citation truncates it, the citation provided has no real value.
[0007] It can be seen that none of the existing citation extraction technologies can keep the length of the citation within the threshold while also preserving its semantic integrity and avoiding cuts through integral strings such as complete sentences, words, and e-mail addresses, so they fail to meet people's needs.

Examples

Embodiment Construction

[0056] Preferred embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be pointed out that the purpose of describing the preferred embodiments is to more fully demonstrate the characteristics and beneficial effects of various aspects of the present invention. Therefore, the preferred embodiments are used as illustrations, and should not be construed as limiting the protection scope of the present invention. The protection scope of the present invention should be determined by the contents requested in the claims.

[0057] The present invention is a method and a device that extract context in units of complete semantic units. First, the meaning of a complete semantic unit is introduced. A complete semantic unit is a text fragment with independent and complete semantics; it is an inherent ideographic unit of natural language. For example, in Chinese, punctuation such as pe...
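The idea of dividing text into complete semantic units can be illustrated with a minimal sketch. It is not the patent's procedure: the helper name `split_units` and the delimiter set are assumptions made here for illustration, since the paragraph above breaks off before listing its own delimiters. The sketch treats sentence-final punctuation as the end of a unit and counts an ASCII period only when whitespace or the end of the text follows, so that strings such as e-mail addresses stay in one piece.

```python
import re

# Assumption: a unit ends at sentence-final punctuation. The delimiter set is
# illustrative only; the source text breaks off before listing its own.
# An ASCII period counts only when followed by whitespace or end of text,
# so strings such as "a.b@x.org" are not split.
_UNIT = re.compile(r".+?(?:[。！？!?]+|\.(?=\s|$)|$)", re.S)


def split_units(text):
    """Return (start, end) spans of the units found in text (hypothetical helper)."""
    return [(m.start(), m.end()) for m in _UNIT.finditer(text) if m.group().strip()]


if __name__ == "__main__":
    sample = "Mail a.b@x.org now. Then rest! 好的。"
    for start, end in split_units(sample):
        print(repr(sample[start:end].strip()))
```

Returning character spans rather than substrings keeps the original offsets available, which is convenient when a later step has to locate the unit that contains a given reading focus.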

Abstract

The invention provides an automatic quotation extraction method and device. The method and device take characters or character strings that serve as reading focuses in a text as centers and automatically extract their contexts. The length of the extracted quotation stays within a preset length range, and the quotation keeps its semantic integrity. In this way a semantic scene of appropriate length and complete meaning, with the selected characters or character strings as its reading focus, can be extracted from the text, and a user can conveniently restore the correct meaning of the reading focus in its context.
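The behaviour summarized in the abstract can be sketched roughly as follows. This is an illustrative reading, not the claimed method: the function name `extract_quote`, its parameters, and the alternating left/right growth order are assumptions. The sketch assumes the text has already been divided into complete semantic units, given as contiguous `(start, end)` character spans, and grows the quotation outward from the unit that contains the reading focus, adding only whole units while the total length stays within `max_len`.

```python
def extract_quote(units, focus_start, focus_end, max_len):
    """Return the (start, end) span of a quotation made of whole units.

    units: contiguous (start, end) spans in text order (assumed precomputed).
    focus_start, focus_end: span of the reading focus.
    max_len: assumed upper bound on the quotation length, in characters.
    """
    # Units that overlap the reading focus form the initial quotation.
    hit = [i for i, (s, e) in enumerate(units) if s < focus_end and e > focus_start]
    if not hit:
        raise ValueError("reading focus lies outside every unit")
    lo, hi = hit[0], hit[-1]

    # Assumption: if the focus units alone exceed the budget, fall back to
    # the bare focus. The source does not say how this case is handled.
    if units[hi][1] - units[lo][0] > max_len:
        return focus_start, focus_end

    grew = True
    while grew:
        grew = False
        # Try to add one whole unit on the left, then one on the right.
        if lo > 0 and units[hi][1] - units[lo - 1][0] <= max_len:
            lo -= 1
            grew = True
        if hi + 1 < len(units) and units[hi + 1][1] - units[lo][0] <= max_len:
            hi += 1
            grew = True
    return units[lo][0], units[hi][1]


if __name__ == "__main__":
    text = "Aaa. Bbb focus. Ccc. Ddd."
    units = [(0, 4), (4, 15), (15, 20), (20, 25)]
    f = text.index("focus")
    s, e = extract_quote(units, f, f + len("focus"), 16)
    print(text[s:e])
```

Because the quotation only ever grows by whole units, it cannot end in the middle of a sentence or of a string that the segmentation step kept intact.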

Description

Technical field

[0001] The present application relates to text analysis and extraction technology, and more specifically to a method and device for automatic citation extraction that maintains semantic integrity.

Background

[0002] In electronic structured documents, many application scenarios call for a function that extracts citation text centered on keywords, phrases, sentences, and the like, which are either selected manually by the user or selected automatically according to predetermined rules (such as matching rules). For example, while reading documents such as web pages, users can use marking tools to select the reading focuses that interest them for later reference. When users want to share these reading focuses through social networks such as Weibo, the marked keywords, phrases, and sentences alone are not enough for readers to restore the context of the reading focus, and readers cannot understand the meaning of the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 吴涛军
Owner: 吴涛军