Corpus searching method and system

A corpus retrieval technology that addresses the poor flexibility, low retrieval efficiency, and lack of mixed word-and-phrase retrieval in existing methods, reducing the number of corpus scans and improving retrieval speed.

Status: Inactive; Publication Date: 2016-07-20
汇智明德(北京)教育科技有限公司

AI Technical Summary

Problems solved by technology

[0004] To address the poor flexibility, low retrieval efficiency, and lack of hybrid retrieval support in existing corpus retrieval methods, the present invention provides a corpus retrieval method, said method comprising:



Embodiment Construction

[0029] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0030] Referring to Figure 2, an embodiment of the present invention provides a corpus retrieval method comprising the following steps:

[0031] Step 101: Input retrieval conditions, and construct a retrieval syntax tree according to the retrieval conditions and preset construction rules.

[0032] The retrieval conditions include lexical descriptions, phrases, or a combination of words and phrases, where:

[0033] 1) Lexical description

[0034] A lexical description includes the original text of the vocabulary, part-of-speech subcategories, or lexical variants, specifically:

[0035] a) Vocabulary original text

[0036] The original text of the vocabulary must be entered in the order required by the grammatical structure and idiomatic usage of the language in question.

[0037] b) Part-of-speech subcategories

[0038] The ...
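
As an illustration of Step 101, the retrieval conditions listed above (vocabulary original text, part-of-speech subcategory, lexical variant, phrases, and combinations of these) might be represented as a small tree. The following is a minimal sketch only: the patent's preset construction rules are not reproduced in this excerpt, so the `QueryNode` class, the `build_query_tree` function, the nested-dictionary input format, and the tag names `VP`, `VB`, and `NN` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryNode:
    """One node of a hypothetical retrieval syntax tree (not the patent's format)."""
    kind: str                       # "word" or "phrase" (assumed node kinds)
    label: Optional[str] = None     # phrase category, for phrase nodes
    text: Optional[str] = None      # vocabulary original text, if given
    pos: Optional[str] = None       # part-of-speech subcategory, if given
    variant: Optional[str] = None   # lexical variant, if given
    children: List["QueryNode"] = field(default_factory=list)


def build_query_tree(condition: dict) -> QueryNode:
    """Recursively turn a nested condition dict into a QueryNode tree.

    Assumed rule: a dict with a "phrase" key becomes a phrase node whose
    children are built from its "parts"; any other dict becomes a word node.
    """
    if "phrase" in condition:
        parts = condition.get("parts", [])
        return QueryNode(
            kind="phrase",
            label=condition["phrase"],
            children=[build_query_tree(p) for p in parts],
        )
    return QueryNode(
        kind="word",
        text=condition.get("text"),
        pos=condition.get("pos"),
        variant=condition.get("variant"),
    )


# A mixed condition: a phrase made of one specific word followed by any noun.
query = build_query_tree({
    "phrase": "VP",
    "parts": [{"text": "take", "pos": "VB"}, {"pos": "NN"}],
})
```

Leaving a field as `None` is used here to mean "unconstrained", which is one simple way to let a single tree mix exact words with looser part-of-speech constraints.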


Abstract

The invention discloses a corpus searching method and system, belonging to the technical field of corpus searching. The method comprises the following steps: constructing a searching syntax tree according to a searching condition and a construction rule; traversing the syntax trees in a corpus in level-priority order to find nodes that match the searching syntax tree; and outputting the searching result. The system comprises a corpus database and a client, wherein the client is provided with an input module, a construction module, a traversing module, a searching module and an output module. With this method and system, a user's searching conditions are described in the form of a syntax tree, and the corpus is traversed nonlinearly along its own syntax trees. This makes full use of the structural characteristics of the corpus syntax trees, greatly reduces the number of scans of the whole corpus, and effectively improves searching speed.
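
The traversal step in the abstract can be pictured with a short sketch. It assumes that "level priority" means a level-by-level (breadth-first) visit, and it uses a simple matching rule invented for illustration: labels must be equal, a query word (if given) must equal the node's word, and the query's children must match some of the node's children in left-to-right order. The `Node` class, `matches`, `search`, and the tag names are hypothetical and are not taken from the patent.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # phrase category or part-of-speech tag
    word: Optional[str] = None      # surface word for leaves; None otherwise
    children: List["Node"] = field(default_factory=list)


def matches(query: Node, target: Node) -> bool:
    """True if `target` satisfies `query` at this node and in its children.

    Assumed rule: each query child must match a distinct target child, in
    left-to-right order; unmatched target children may lie in between.
    """
    if query.label != target.label:
        return False
    if query.word is not None and query.word != target.word:
        return False
    pos = 0
    for q_child in query.children:
        # Advance to the next target child that satisfies this query child.
        while pos < len(target.children) and not matches(q_child, target.children[pos]):
            pos += 1
        if pos == len(target.children):
            return False
        pos += 1
    return True


def search(query: Node, corpus_tree: Node) -> List[Node]:
    """Visit the corpus tree level by level and collect matching nodes."""
    hits = []
    queue = deque([corpus_tree])
    while queue:
        node = queue.popleft()
        if matches(query, node):
            hits.append(node)
        queue.extend(node.children)
    return hits


# Toy parse of "She takes a walk" and a query for a verb phrase
# consisting of a verb followed by a noun phrase.
tree = Node("S", children=[
    Node("NP", children=[Node("PRP", "She")]),
    Node("VP", children=[
        Node("VBZ", "takes"),
        Node("NP", children=[Node("DT", "a"), Node("NN", "walk")]),
    ]),
])
query = Node("VP", children=[Node("VBZ"), Node("NP")])
hits = search(query, tree)
```

Because matching follows the tree structure, a phrase-level query is compared against whole subtrees rather than against a flat word sequence. Whether this reproduces the efficiency gains the abstract describes depends on details of the patented method that this excerpt does not give.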

Description

Technical field

[0001] The invention relates to the technical field of corpus retrieval, and in particular to a corpus retrieval method and system.

Background

[0002] A corpus is a collection of electronic texts gathered by sampling, usually for the study of a language or its variants. Data-driven research on a language or its variants requires not only the corpus itself but also software tools to analyze and mine its content, and these tools play a vital role in the development and application of the corpus. Current corpus processing tools include WordSmith, MonoConc Pro, Concordance, and the online corpus tool Sketch Engine. Their main functions include vocabulary usage and syntax retrieval, automatic vocabulary annotation, manual annotation aids, text organization, and statistical analysis.

[0003] The most commonly used functions in corpus research are lexical usage and syntactic retrieval. The core of ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/322, G06F16/3344, G06F40/211
Inventor: 贾云龙
Owner: 汇智明德(北京)教育科技有限公司