Method and device for structuring corpus

Corpus structuring technology, applied in the field of information processing, addresses the low efficiency of extracting content from corpus files and enables rapid acquisition of specific content.

Inactive Publication Date: 2015-08-19
PEKING UNIV FOUNDER GRP CO LTD +1

AI Technical Summary

Problems solved by technology

[0007] The present invention provides a method and device for structuring a corpus, which solve the prior-art problem that content is extracted from digital resources inefficiently because of the way corpus files are stored.

Embodiment Construction

[0021] An embodiment of the present invention provides a method for structuring a corpus, the method comprising: obtaining a corpus file corresponding to the corpus to be structured, and adding segmentation tags between the different specific contents of the corpus file according to the font attribute information of the characters in the corpus file, so as to generate an intermediate file; extracting the character information corresponding to each specific content from the intermediate file according to the correspondence between font attribute information and specific content set in the preset automatic structuring rules; and combining the extracted character information into a structured corpus file according to the hierarchical relationship among the different specific contents set in the automatic structuring rules, and uploading the structured corpus file to the server for storage.
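The three steps in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the font labels (`Hei-16`, `Song-12`, `Kai-9`), the content types, the rule tables `FONT_TO_CONTENT` and `PARENT_OF`, and every function name are hypothetical. The intermediate file is modeled as an in-memory list of font runs, where each run boundary stands in for a segmentation tag, and the upload to the server is left out.

```python
import json
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Char:
    text: str
    font: str  # hypothetical font attribute, e.g. typeface plus size


# Hypothetical automatic structuring rules (not taken from the patent).
FONT_TO_CONTENT = {"Hei-16": "title", "Song-12": "body", "Kai-9": "annotation"}
PARENT_OF = {"title": None, "body": "title", "annotation": "body"}


def build_intermediate(chars):
    """Step 1: split the character stream wherever the font changes.

    Each boundary between runs plays the role of a segmentation tag.
    """
    return [(font, "".join(c.text for c in run))
            for font, run in groupby(chars, key=lambda c: c.font)]


def extract(segments, font_to_content=FONT_TO_CONTENT):
    """Step 2: map each run to a content type; unknown fonts are skipped."""
    return [(font_to_content[font], text)
            for font, text in segments if font in font_to_content]


def _ancestors(ctype, parent_of):
    """Yield the ancestor content types of ctype, nearest first."""
    parent = parent_of.get(ctype)
    while parent is not None:
        yield parent
        parent = parent_of.get(parent)


def combine(items, parent_of=PARENT_OF):
    """Step 3: nest the extracted items according to the hierarchy rules."""
    root = {"type": "corpus", "text": "", "children": []}
    latest = {}  # content type -> most recently opened node of that type
    for ctype, text in items:
        node = {"type": ctype, "text": text, "children": []}
        # Attach to the nearest ancestor type that is currently open.
        parent = next((latest[a] for a in _ancestors(ctype, parent_of)
                       if a in latest), root)
        parent["children"].append(node)
        # A new node closes any open nodes that are descendants of its type.
        for other in list(latest):
            if ctype in _ancestors(other, parent_of):
                del latest[other]
        latest[ctype] = node
    return root


def styled(text, font):
    """Helper for the demo: give every character of text the same font."""
    return [Char(ch, font) for ch in text]


chars = (styled("Analects", "Hei-16")
         + styled("To learn and practise.", "Song-12")
         + styled("A gloss.", "Kai-9"))
tree = combine(extract(build_intermediate(chars)))
print(json.dumps(tree, ensure_ascii=False, indent=2))
```

In this sketch the printed JSON tree stands in for the structured corpus file. A real system would serialize it in whatever format the server expects and then upload it, a step the sketch does not attempt.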

[0022] As shown in Figure 1, the embodiment of the present invention provides a met...

Abstract

The invention discloses a method and a device for structuring a corpus, applied in the technical field of information processing. The method comprises: obtaining a corpus file corresponding to the corpus to be structured, and adding segmentation tags between the different specific contents of the corpus file according to the font attribute information of the characters in the corpus file, so as to generate an intermediate file; extracting the character information corresponding to each specific content from the intermediate file according to the correspondence between font attribute information and specific content in a preset automatic structuring rule; and combining the extracted character information into a structured corpus file according to the hierarchical relationship among the different specific contents in the automatic structuring rule, and then uploading the structured corpus file to a server for storage. The method and device structure the corpus file and thereby meet the requirement for precise retrieval.

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a method and device for structuring a corpus.

Background

[0002] In the current field of information publishing, a lot of information is published through paper media, and the minimum storage unit for archiving published documents is generally a whole document. When a document is reprinted, or when specific content of a document needs to be found, the document has to be searched line by line. This cannot meet advanced retrieval requirements for specific content within a document (such as text, annotations, proper nouns, etc.), nor can it support in-depth processing of parts of the content of ancient books, for example, modifying or expanding the annotations of ancient books based on archaeological discoveries.

[0003] In addition, the characteristics of the ancient book corpus files left by this publishing method are that the content and style of the ancient books are mix...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 李凯翟因为黄冶
Owner: PEKING UNIV FOUNDER GRP CO LTD