Construction method, system, device and storage medium of named entity recognition corpus

A named entity recognition technology, applied in semantic tool creation, natural language data processing, and unstructured text data retrieval, which achieves wide coverage and wide application.

Active Publication Date: 2022-04-12
SUZHOU UNIV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Different corpora use different named entity categories, labeling rules, and corpus sizes. To ensure quality, these corpora must be labeled by professionals, which both limits the size and domain coverage of the corpora and consumes considerable manpower and material resources.
For example, the January 1998 "People's Daily" corpus from the news domain is not only outdated, but also yields low accuracy when applied to domains other than news.

Embodiment Construction

[0041] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0042] The purpose of the present invention is to provide a named entity recognition corpus construction method, system, device, and storage medium that can automatically construct a Chinese named entity recognition corpus with rich content and wide application fields.

[0043] In order to enable those skilled in the art to better understand the technical solution of the present invention, the present invention will be furt...

Abstract

The invention discloses a method for constructing a Chinese named entity recognition corpus. Using a computer and Chinese Wikipedia as the source corpus, features are extracted from Chinese Wikipedia entries so that the entries can be classified and the Chinese Wikipedia entity entries identified. The named entity type corresponding to each entity entry is then predicted. Finally, a Chinese Wikipedia entity list containing the named entities is constructed based on the predicted types and redirection information. The Chinese named entity recognition corpus can be composed of all named entities in this entity list, giving it rich content and wide domain coverage. Moreover, because the construction method runs on a computer, the corpus can be constructed automatically, saving manpower and material resources. The invention further discloses a corresponding Chinese named entity recognition corpus construction system, device, and computer-readable storage medium, which achieve the same effects.
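The pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The WikiEntry record, the keyword cues in TYPE_CUES, the PER/LOC/ORG labels, and every function name are hypothetical. The simple cue counting stands in for whatever feature extraction and classifiers the invention actually uses. The handling of redirection information is also an assumption: each redirect title is added as an alias that receives the same type as its target entry.

```python
from dataclasses import dataclass, field

# Hypothetical record for one Chinese Wikipedia entry (field names are assumptions).
@dataclass
class WikiEntry:
    title: str
    categories: list[str]
    first_sentence: str
    redirects: list[str] = field(default_factory=list)  # titles that redirect here

# Hypothetical keyword cues standing in for the patent's feature-based classifiers.
TYPE_CUES = {
    "PER": ("births", "deaths", "people", "人物"),
    "LOC": ("cities", "countries", "rivers", "城市"),
    "ORG": ("companies", "universities", "organizations", "大学"),
}

def extract_features(entry: WikiEntry) -> dict[str, int]:
    """Count how many cues of each type occur in the categories and first sentence."""
    text = (" ".join(entry.categories) + " " + entry.first_sentence).lower()
    return {ne_type: sum(cue in text for cue in cues) for ne_type, cues in TYPE_CUES.items()}

def is_entity_entry(features: dict[str, int]) -> bool:
    """Step 1: treat an entry as an entity entry if any type cue fires."""
    return any(score > 0 for score in features.values())

def predict_type(features: dict[str, int]) -> str:
    """Step 2: pick the type with the most cue hits (ties go to the first type in TYPE_CUES)."""
    return max(features, key=features.get)

def build_entity_list(entries: list[WikiEntry]) -> dict[str, str]:
    """Step 3: map each entity title and its redirect titles to the predicted type."""
    entity_list: dict[str, str] = {}
    for entry in entries:
        features = extract_features(entry)
        if not is_entity_entry(features):
            continue  # e.g. a concept article rather than a named entity
        ne_type = predict_type(features)
        for name in (entry.title, *entry.redirects):
            entity_list.setdefault(name, ne_type)  # keep the first type assigned to a name
    return entity_list

# Toy data for illustration only.
entries = [
    WikiEntry("苏州大学", ["Universities in Jiangsu"], "苏州大学是一所大学", ["苏大"]),
    WikiEntry("苏州", ["Cities in Jiangsu"], "苏州是江苏省的城市", ["姑苏"]),
    WikiEntry("命名实体识别", ["Natural language processing"], "命名实体识别是一项任务"),
]

if __name__ == "__main__":
    print(build_entity_list(entries))
```

On the toy data, the university and city entries, together with their redirect aliases, are kept with types ORG and LOC. The concept entry 命名实体识别 (named entity recognition) is filtered out as a non-entity entry. Separating the entity/non-entity decision from type prediction mirrors the two steps the abstract describes, even though both are crude rules here.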

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method, system, device, and storage medium for constructing a named entity recognition corpus.

Background art

[0002] The purpose of information extraction is to extract entities and their relationships from unstructured free text and transform them into structured representations, thereby providing a data basis for constructing knowledge bases.

[0003] In the existing technology, research on Chinese named entity recognition mainly uses high-quality manually labeled corpora. Examples include the January 1998 "People's Daily" corpus, the MSRA corpus from Microsoft Research Asia, the CityU corpus from City University of Hong Kong, and the ACE2005 Chinese corpus. Different corpora use different named entity categories, labeling rules, and corpus sizes, and in order to ensure the quality of the co...

Application Information

Patent type and authority: Patents (China)
IPC (8): G06F16/36; G06F40/295
CPC: G06F40/295
Inventors: 钱龙华 (Qian Longhua), 何云琪 (He Yunqi), 李雁群 (Li Yanqun), 王红玲 (Wang Hongling), 周国栋 (Zhou Guodong)
Owner: SUZHOU UNIV