Corpus sample set construction method, computing equipment and computer storage medium

This technology concerns corpus sample collection and is applied in computing, natural language data processing, and related fields. It addresses problems such as heavy manual workload, missing corpus sample support, and insufficient corpus sample coverage, and it improves efficiency.

Pending Publication Date: 2020-12-18
ZHANGYUE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, the corpus samples used by existing named entity recognition technology do not cover the e-book field, so named entity recognition in the e-book field lacks corpus sample support.
Manual labeling would involve a huge workload and consume considerable labor cost and time.




Embodiment Construction

[0073] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0074] In view of the prior art's lack of corpus samples for named entity recognition in the e-book field, the present invention provides a method for constructing a corpus sample set. The method labels samples from a general corpus and trains the corpus recognition model on the labeling results; it then introduces a book corpus and uses the trained corpus recognition model to carry out sample labeling ...



Abstract

The invention discloses a corpus sample set construction method, a computing device, and a computer storage medium. The method comprises: S1, identifying the corpora in a corpus to obtain an initial corpus sample set and a corpus recognition model trained on that initial corpus sample set; S2, recognizing a book corpus with the corpus recognition model to obtain book corpus samples; S3, adding the book corpus samples to the corpus sample set; S4, training the corpus recognition model on the updated corpus sample set; and repeating steps S2 to S4 until the updated corpus sample set meets a first preset condition. With this scheme, corpus construction and learning start from zero samples using the corpus of the initial data source, the constructed corpus sample set suits the e-book field, and the corpus samples are diverse and accurate.
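The loop in steps S1 to S4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not describe the recognition model or the first preset condition, so `ToyRecognizer` is a hypothetical stand-in that memorizes entity names and the token preceding each one, and the loop stops when the sample set reaches a hypothetical `target_size` or stops changing.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Tokens = Tuple[str, ...]
# Sample set: tokenized sentence -> indices of tokens labeled as entities.
SampleSet = Dict[Tokens, FrozenSet[int]]


@dataclass
class ToyRecognizer:
    """Hypothetical stand-in for the corpus recognition model."""
    names: Set[str] = field(default_factory=set)  # known entity tokens
    cues: Set[str] = field(default_factory=set)   # tokens seen right before an entity

    def train(self, samples: SampleSet) -> None:
        """Rebuild the name and cue sets from all labeled samples."""
        self.names.clear()
        self.cues.clear()
        for tokens, entity_idx in samples.items():
            for i in entity_idx:
                self.names.add(tokens[i])
                if i > 0:
                    self.cues.add(tokens[i - 1])

    def recognize(self, sentence: str) -> Optional[Tuple[Tokens, FrozenSet[int]]]:
        """Label a sentence; return None when no entity is found."""
        tokens = tuple(sentence.split())
        idx = frozenset(
            i for i, tok in enumerate(tokens)
            if tok in self.names or (i > 0 and tokens[i - 1] in self.cues)
        )
        return (tokens, idx) if idx else None


def build_sample_set(initial: SampleSet, book_sentences: Iterable[str],
                     target_size: int, max_rounds: int = 10):
    """Iterate S2-S4 until the (assumed) stopping condition holds."""
    book_sentences = list(book_sentences)
    samples: SampleSet = dict(initial)
    model = ToyRecognizer()
    model.train(samples)                                  # S1: initial set and model
    for _ in range(max_rounds):
        labeled = (model.recognize(s) for s in book_sentences)
        new = dict(r for r in labeled if r is not None)   # S2: label the book corpus
        updated = {**samples, **new}                      # S3: add book samples
        changed = updated != samples
        samples = updated
        model.train(samples)                              # S4: retrain on updated set
        # Assumed "first preset condition": size target reached or no change.
        if len(samples) >= target_size or not changed:
            break
    return samples, model


initial = {("Mr.", "Smith", "arrived"): frozenset({1})}
book = ["Mr. Darcy smiled", "said Darcy quietly", "said Elizabeth warmly"]
samples, model = build_sample_set(initial, book, target_size=100)
print(sorted(model.names))
```

In this toy run each round lets the model label sentences it could not label before, which is the point of repeating S2 to S4; a real system would replace `ToyRecognizer` with a trained named entity recognition model.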

Description

Technical Field

[0001] The invention relates to the technical field of e-book processing, in particular to a method for constructing a corpus sample set, a computing device and a computer storage medium.

Background Technique

[0002] Named Entity Recognition (NER for short) refers to identifying entities with specific meanings in text, mainly including names of people, places, institutions, proper nouns, etc. NER technology is an important basic tool for information extraction, question answering systems, syntax analysis, machine translation and other application fields. The recognition of named entities requires sample annotation of a large amount of corpus as a sample set for model training.

[0003] In the field of e-book processing, book search is a routine function. Users often search for characters in books, place names in books, etc., so it is necessary to accurately extract the names of characters in books and place names in books from books. The basis and premise ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/216
Inventor: 柳燕煌
Owner: ZHANGYUE TECH CO LTD