Construction method, system, device and storage medium of named entity recognition corpus

A named entity recognition technology, applied in the creation of semantic tools, natural language data processing, and unstructured text data retrieval, that achieves wide coverage and wide application.

Active Publication Date: 2022-04-12

SUZHOU UNIV

AI Technical Summary

Problems solved by technology

Different corpora use different named entity categories, labeling rules and corpus sizes. To ensure quality, these corpora must be labeled by professionals, which not only limits the size and domain of a corpus but also consumes a lot of manpower and material resources.

For example, the January 1998 "People's Daily" corpus from the news field is not only outdated but also gives low accuracy when applied to fields other than news.

Embodiment Construction

[0041] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

[0042] The purpose of the present invention is to provide a method, system, device and storage medium for constructing a named entity recognition corpus, which can automatically construct a Chinese named entity recognition corpus that is rich in content and covers a wide range of application fields.

[0043] In order to enable those skilled in the art to better understand the technical solution of the present invention, the present invention will be furt...

Abstract

The invention discloses a method for constructing a Chinese named entity recognition corpus. The method is computer-based and uses Chinese Wikipedia as the source corpus. Features are extracted from Chinese Wikipedia entries and used to classify them, which determines the Chinese Wikipedia entity entries and predicts the named entity type corresponding to each entity entry. A Chinese Wikipedia entity list containing the named entities is then constructed from the predicted types and the redirection information. The Chinese named entity recognition corpus can be composed of all named entities in this entity list, so it is rich in content and covers a wide range of fields. Because the corpus is constructed automatically by computer, the method also saves manpower and material resources. The invention further discloses a Chinese named entity recognition corpus construction system, a device and a computer-readable storage medium, which achieve the same effects.
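The pipeline in the abstract (extract entry features, classify entries, then build an entity list from the predicted types and redirects) can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `Entry` structure, the `TYPE_CUES` table, the PER/LOC/ORG labels and the sample entries are all assumptions made for the example, and the hand-written cue matching merely stands in for whatever feature-based classifier the invention actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """A simplified Wikipedia entry (hypothetical structure for illustration)."""
    title: str
    categories: List[str]
    redirects: List[str] = field(default_factory=list)


# Hypothetical cue substrings per entity type. A real system would train a
# classifier on entry features instead of using hand-written cues.
TYPE_CUES: Dict[str, List[str]] = {
    "PER": ["出生", "作家"],
    "LOC": ["城市", "地级市"],
    "ORG": ["院校", "机构"],
}


def predict_type(entry: Entry) -> Optional[str]:
    """Return the best-scoring entity type, or None for a non-entity entry."""
    scores = {
        etype: sum(any(cue in cat for cat in entry.categories) for cue in cues)
        for etype, cues in TYPE_CUES.items()
    }
    best = max(scores, key=lambda etype: scores[etype])
    return best if scores[best] > 0 else None


def build_entity_list(entries: List[Entry]) -> Dict[str, str]:
    """Map each entity title, and each redirect pointing to it, to a type."""
    entity_list: Dict[str, str] = {}
    for entry in entries:
        etype = predict_type(entry)
        if etype is None:
            continue  # entry is not classified as a named entity
        entity_list[entry.title] = etype
        for alias in entry.redirects:
            # A redirect title inherits the type of its target entry.
            entity_list.setdefault(alias, etype)
    return entity_list


# Made-up sample data, for illustration only.
SAMPLE_ENTRIES = [
    Entry("鲁迅", ["1881年出生", "中国作家"], redirects=["周树人"]),
    Entry("苏州市", ["江苏地级市"], redirects=["苏州"]),
    Entry("苏州大学", ["江苏高等院校"]),
    Entry("自然语言处理", ["计算机科学", "人工智能"]),
]

if __name__ == "__main__":
    for name, etype in build_entity_list(SAMPLE_ENTRIES).items():
        print(name, etype)
```

Using the redirect titles as aliases is what lets a single classified entry contribute several surface forms to the entity list, which is one plausible reading of why the abstract mentions redirection information.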

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method, system, device and storage medium for constructing a named entity recognition corpus.

Background art

[0002] The purpose of information extraction is to extract entities and their interrelationships from unstructured free text and transform them into structured representations, so as to provide a data basis for the construction of knowledge bases.

[0003] In the prior art, research on Chinese named entity recognition mainly relies on high-quality manually labeled corpora, such as the January 1998 "People's Daily" corpus, the MSRA corpus of Microsoft Research Asia, the CityU corpus of City University of Hong Kong, and the ACE2005 Chinese corpus. The named entity categories used by different corpora, as well as the labeling rules and the size of the corpus are different, and in order to ensure the quality of the co...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/36, G06F40/295

CPC: G06F40/295

Inventor: 钱龙华, 何云琪, 李雁群, 王红玲, 周国栋

Owner: SUZHOU UNIV
