Corpus tagging method, apparatus and system

A corpus tagging technology, applied in the computer field, that addresses the low efficiency of manual corpus tagging and the drop in automatic tagging efficiency caused by the growing memory footprint of corpus dictionaries.

Inactive Publication Date: 2016-01-13
INSPUR QILU SOFTWARE IND

AI Technical Summary

Problems solved by technology

Existing corpus annotation methods fall into two main categories. In the first, full-time annotators label the corpus by hand; because the volume of corpus to be annotated is large, manual annotation is inefficient.
In the second, a labeling server labels the corpus automatically with the help of a corpus dictionary. This improves labeling efficiency over manual annotation to some extent, but as the corpus dictionary expands, the memory it occupies grows, which reduces automatic labeling efficiency.


Specific embodiments

[0053] In one embodiment of the present invention, to reduce the memory occupied by the entity-word pairs formed from corpus fragments, the method further includes, after step 103 and before step 104: for each application server, controlling the current application server to output the key-value pair corresponding to each corpus fragment on that server, where the key represents the target sentence and the value represents the entity words matched by that corpus fragment; and, for each sub-corpus dictionary, merging the key-value pairs of the corpus fragments belonging to that sub-corpus dictionary to form the set of entity words corresponding to the target sentence for that sub-corpus dictionary. Step 104 is then implemented by determining whether a first entity word in the resulting entity word sets contains a second entity word.
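The merging step described in this embodiment can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the record layout `(sub_dictionary, sentence, entity_words)`, the function name `merge_key_value_pairs`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def merge_key_value_pairs(pairs):
    """Merge the key-value pairs emitted for individual corpus fragments.

    Each record is (sub_dictionary, sentence, entity_words): the key is the
    target sentence, the value is the entity words one fragment matched.
    The result maps (sub_dictionary, sentence) to a single set of words.
    """
    merged = defaultdict(set)
    for sub_dictionary, sentence, entity_words in pairs:
        # A set drops duplicates reported by different fragments.
        merged[(sub_dictionary, sentence)].update(entity_words)
    return dict(merged)


# Hypothetical output of the application servers for one target sentence.
sentence = "I live in New York City"
emitted = [
    ("place", sentence, ["New York"]),
    ("place", sentence, ["New York City"]),
    ("place", sentence, ["New York"]),  # duplicate from another fragment
    ("org", sentence, []),              # fragment with no match
]

entity_sets = merge_key_value_pairs(emitted)
```

Storing one set per sub-corpus dictionary and sentence, rather than one pair per fragment, is one plausible reading of how the merge reduces memory.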

[0054] In one embodiment of the present invention, in order to further im...


Abstract

The present invention provides a corpus tagging method, apparatus and system. The corpus tagging method comprises: determining and loading a data dictionary; splitting the data dictionary according to corpus type to form sub-corpus dictionaries, and assigning each sub-corpus dictionary to a corresponding application server; determining a target sentence; controlling each application server to perform entity word matching on the target sentence using the sub-corpus dictionary assigned to it; and determining whether a first entity word among the matched entity words of the target sentence contains a second entity word. If so, only the first entity word is retained and tagged; if not, the first entity word and the second entity word are tagged separately. The method thereby effectively improves automatic tagging efficiency.
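The flow in the abstract can be sketched in a few lines of Python. This is a single-process illustration only: each "application server" is modeled as a plain function call, matching is naive substring search, and "contains" is read as string containment. The function names and the sample dictionary are assumptions, not taken from the patent.

```python
from collections import defaultdict


def split_by_type(data_dictionary):
    """Split {entity_word: corpus_type} into one sub-dictionary per type."""
    subs = defaultdict(set)
    for word, corpus_type in data_dictionary.items():
        subs[corpus_type].add(word)
    return dict(subs)


def match_entities(sub_dictionary, sentence):
    """Stand-in for one application server: naive substring matching."""
    return {word for word in sub_dictionary if word in sentence}


def drop_contained(words):
    """Keep only entity words that are not contained in another match."""
    return {
        w for w in words
        if not any(w != other and w in other for other in words)
    }


def tag_sentence(data_dictionary, sentence):
    """Return {entity_word: corpus_type} for the words that get tagged."""
    matched = set()
    for sub_dictionary in split_by_type(data_dictionary).values():
        matched |= match_entities(sub_dictionary, sentence)
    return {word: data_dictionary[word] for word in drop_contained(matched)}


# Hypothetical data dictionary and target sentence.
data_dictionary = {
    "New York": "place",
    "New York City": "place",
    "Inspur": "org",
}
tags = tag_sentence(data_dictionary, "Inspur opened an office in New York City")
```

Here "New York" is dropped because "New York City" contains it, while "Inspur" and "New York City" do not contain each other and are tagged separately.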

Description

Technical field

[0001] The present invention relates to the field of computers, and in particular to a corpus tagging method, apparatus and system.

Background

[0002] In the current era of Internet big data, the importance of data is self-evident. Natural language processing is an important technology for understanding data, and it requires corpora to be annotated. Existing corpus annotation methods fall into two main categories. In the first, full-time annotators label the corpus by hand; because the volume of corpus to be annotated is large, manual annotation is inefficient. In the second, a labeling server labels the corpus automatically with the help of a corpus dictionary. This improves labeling efficiency over manual annotation to some extent, but as the corpus dictionary expands, the memory it occupies increases, ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 刘福明, 杨培强
Owner: INSPUR QILU SOFTWARE IND