Data-driven corpus automatic construction method

A data-driven method for automatic corpus construction, applied in the field of corpus construction. It addresses the slow acquisition of new corpora and the resulting slow corpus optimization, and it produces corpora with rich content and clear classification through a highly automated process.

Status: Inactive
Publication date: 2019-11-05
福建奇点时空数字科技有限公司

AI Technical Summary

Problems solved by technology

[0008] When a corpus is being built, new corpus material is acquired slowly, which in turn slows corpus optimization.

Examples

Embodiment 2

[0050] Building on the above embodiment, the present invention also proposes a data-driven automatic corpus construction system comprising a data acquisition module, a feasibility analysis module, a storage construction module, a matching module, and a development module. The output of the data acquisition module is communicatively connected to the input of the feasibility analysis module; the output of the feasibility analysis module is communicatively connected to the input of the storage construction module; the output of the storage construction module is communicatively connected to the input of the matching module; and the output of the matching module is communicatively connected to the input of the development module.

[0051] In an optional embodiment, the data acquisition module is used to acquire corpora.

[0052] In an optional embodiment, the feasibility analysis module is used to perform feasibility analysis on the acquired corp...
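
The module chain described in [0050] can be pictured as a sequence of stages in which each output feeds the next input. The following is a minimal sketch under stated assumptions, not the patented implementation: the function names, the in-memory list used as the data source, and the placeholder logic inside each stage (whitespace stripping, dropping empty texts, de-duplication, pass-through) are all hypothetical, since the source does not specify what each module does internally.

```python
from typing import Callable, List

# Each module takes a list of texts and returns a list of texts.
Module = Callable[[List[str]], List[str]]


def acquire(source: List[str]) -> List[str]:
    """Data acquisition module (assumption: data is an in-memory list)."""
    return [text.strip() for text in source]


def analyze_feasibility(texts: List[str]) -> List[str]:
    """Feasibility analysis module (assumed criterion: keep non-empty texts)."""
    return [text for text in texts if text]


def build_storage(texts: List[str]) -> List[str]:
    """Storage construction module (assumed behavior: order-preserving de-duplication)."""
    return list(dict.fromkeys(texts))


def match(stored: List[str]) -> List[str]:
    """Matching module (placeholder: passes the stored texts through)."""
    return list(stored)


def develop(matched: List[str]) -> List[str]:
    """Development module (placeholder: passes the matched texts through)."""
    return list(matched)


# Output of each module is connected to the input of the next, as in [0050].
MODULES: List[Module] = [acquire, analyze_feasibility, build_storage, match, develop]


def run_pipeline(source: List[str], modules: List[Module] = MODULES) -> List[str]:
    """Feed the source through every module in order."""
    data = source
    for module in modules:
        data = module(data)
    return data
```

Keeping every stage behind the same call signature is one way to mirror the output-to-input connections in the text; a real system would likely pass richer objects than plain strings between modules.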

Abstract

A data-driven automatic corpus construction method comprises the following steps: obtaining corpora, wherein the corpora come from data; performing feasibility analysis on the obtained corpora; classifying and storing the corpora according to a corpus classification model to construct a standard corpus and an extension corpus; accessing the corpora through a corpus matching module and performing corpus matching according to the classification model; and developing new corpora through a corpus development module based on the standard corpus and the extension corpus obtained by the matching module. The method constructs corpora quickly and accurately with a high degree of automation, and the resulting corpus is clearly classified, rich in content, and continuously expanding.
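
As an illustration of the classify, store, match, and develop steps in the abstract, the sketch below makes several assumptions that are not in the source: the classification model is replaced by a toy keyword rule, the first text seen in a category becomes its standard entry, later texts in that category go to the extension corpus, and "developing" means adding an unseen query to the matching category. All function names and category labels are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_keyword(text: str) -> str:
    """Toy stand-in for the classification model (hypothetical keyword rules)."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "password" in lowered:
        return "account"
    return "other"


def build_corpora(texts: List[str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split texts into a standard corpus and an extension corpus per category.

    Assumption: the first text in a category is its standard entry and
    every later text in that category is an extension entry.
    """
    standard: Dict[str, str] = {}
    extension: Dict[str, List[str]] = {}
    for text in texts:
        label = classify_keyword(text)
        if label not in standard:
            standard[label] = text
        else:
            extension.setdefault(label, []).append(text)
    return standard, extension


def match_corpus(query: str, standard: Dict[str, str],
                 extension: Dict[str, List[str]]) -> List[str]:
    """Return the standard entry and extension entries for the query's category."""
    label = classify_keyword(query)
    if label not in standard:
        return []
    return [standard[label]] + extension.get(label, [])


def develop_corpus(query: str, standard: Dict[str, str],
                   extension: Dict[str, List[str]]) -> bool:
    """Add the query to the corpora if it is new; return True when something was added."""
    if query in match_corpus(query, standard, extension):
        return False
    label = classify_keyword(query)
    if label not in standard:
        standard[label] = query  # a new category starts with a standard entry
    else:
        extension.setdefault(label, []).append(query)
    return True
```

In this sketch the same classifier drives both storage and matching, which reflects the abstract's statement that matching is performed according to the classification model.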

Description

Technical field

[0001] The invention relates to the field of corpus construction, in particular to a data-driven method for automatically constructing a corpus.

Background

[0002] Three basic points about corpora are generally accepted: a corpus stores language material that has actually occurred in real language use; a corpus is a basic resource that carries language knowledge, with the computer as its carrier; and raw language material must be processed (analyzed and processed) before it becomes a useful resource. There are many types of corpus. The type is determined mainly by the research purpose and intended use, which is usually reflected in the principles and methods of corpus collection. One proposed scheme divides corpora into four types: (1) heterogeneous: no specific collection principle is applied, and a wide variety of material is collected and stored as is; (2) homogeneous: only material of the same type of content is collected;...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/33, G06F16/35, G06F17/27
CPC: G06F16/31, G06F16/3344, G06F16/353
Inventor: 肖清林
Owner: 福建奇点时空数字科技有限公司