Method, system, device and storage medium for constructing a named entity recognition corpus

A named entity recognition technology applied in semantic tool creation, natural language data processing, and unstructured text data retrieval, achieving wide field coverage and broad applicability.
CN108520065B · Active · Publication Date: 2022-04-12 · SUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
SUZHOU UNIV
Publication Date
2022-04-12

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses a method for constructing a Chinese named entity recognition corpus. Using Chinese Wikipedia as the source corpus, the method runs on a computer to extract features of Chinese Wikipedia entries, classifies the entries to identify which of them are entity entries, and predicts the named entity type of each entity entry. Finally, a Chinese Wikipedia entity list containing named entities is constructed from these types and the redirect information. All named entities in this list together form the Chinese named entity recognition corpus, which is rich in content and covers a wide range of fields. Because the method builds the corpus automatically on a computer, it saves manpower and material resources. The invention also discloses a corresponding Chinese named entity recognition corpus construction system, device, and computer-readable storage medium, which achieve the same effects.
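The pipeline summarized in the abstract (classify entries, predict a type for each entity entry, then expand the list with redirect titles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `WikiEntry` record, the category-keyword cues in `TYPE_CUES`, and the rule-based `is_entity` and `predict_type` functions are hypothetical stand-ins for the feature extraction and classifiers, which the text above does not detail. The PER/LOC/ORG label set is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical entry record; the patent text shown here does not specify a data format.
@dataclass
class WikiEntry:
    title: str
    categories: list[str] = field(default_factory=list)
    redirects: list[str] = field(default_factory=list)  # titles that redirect to this entry


# Hypothetical category-keyword cues per named entity type (assumed label set).
TYPE_CUES = {
    "PER": ("人物", "出生"),          # person, born
    "LOC": ("城市", "国家", "地理"),  # city, country, geography
    "ORG": ("公司", "大学", "组织"),  # company, university, organization
}


def extract_features(entry: WikiEntry) -> dict[str, int]:
    """Count how many cues of each type occur in the entry's category names."""
    text = " ".join(entry.categories)
    return {ne_type: sum(cue in text for cue in cues) for ne_type, cues in TYPE_CUES.items()}


def is_entity(features: dict[str, int]) -> bool:
    """Stand-in entry classifier: an entry is an entity if any type cue fires."""
    return any(score > 0 for score in features.values())


def predict_type(features: dict[str, int]) -> str:
    """Stand-in type predictor: pick the type with the most matching cues."""
    return max(features, key=features.get)


def build_entity_list(entries: list[WikiEntry]) -> dict[str, str]:
    """Map each entity name (title or redirect alias) to its predicted type."""
    entity_list: dict[str, str] = {}
    for entry in entries:
        feats = extract_features(entry)
        if not is_entity(feats):
            continue  # non-entity entries (e.g. concepts, disciplines) are skipped
        ne_type = predict_type(feats)
        # Redirect titles are aliases of the same entity, so they share its type.
        for name in [entry.title, *entry.redirects]:
            entity_list[name] = ne_type
    return entity_list


if __name__ == "__main__":
    sample = [
        WikiEntry("苏州大学", categories=["江苏高等院校", "大学"], redirects=["苏大"]),
        WikiEntry("鲁迅", categories=["中国作家", "人物"], redirects=["周树人"]),
        WikiEntry("数学", categories=["学科"]),
    ]
    print(build_entity_list(sample))
    # → {'苏州大学': 'ORG', '苏大': 'ORG', '鲁迅': 'PER', '周树人': 'PER'}
```

In this sketch, redirect titles such as 苏大 inherit the type of their target entry, which is one way redirection information can widen the coverage of the entity list. A real system would presumably use learned classifiers over richer entry features in place of the keyword rules.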

Description

Technical Field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a method, system, device and storage medium for constructing a named entity recognition corpus.

Background Art

[0002] The purpose of information extraction is to extract entities and the relationships between them from unstructured free text and transform them into structured representations, thereby providing a data basis for building knowledge bases.

[0003] In the prior art, research on Chinese named entity recognition mainly relies on high-quality manually annotated corpora, such as the January 1998 "People's Daily" corpus, the MSRA corpus from Microsoft Research Asia, the CityU corpus from City University of Hong Kong, and the ACE2005 Chinese corpus. These corpora differ in the named entity categories they use, their annotation rules, and their size, and in order to ensure the quality of the co...
