
Chinese named entity extraction method based on multi-annotation framework and fusion features

A named entity and feature fusion technology applied in the fields of artificial intelligence and natural language processing. It addresses the difficulty of identifying entity boundaries, the reliance on a single labeling framework, and the underuse of word information.

Active Publication Date: 2021-07-30
NANJING UNIV
Cites: 6; Cited by: 7

AI Technical Summary

Problems solved by technology

[0007] Purpose of the invention: In view of the problems and deficiencies in the prior art described above, the present invention proposes a Chinese named entity extraction method based on multiple annotation frameworks and fusion features. It addresses two problems of existing Chinese named entity extraction methods. First, they use a single annotation framework and are therefore confined to it. Second, they make little use of word information, which makes entity boundaries difficult to identify.
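
The contrast between the two annotation frameworks can be illustrated with a small sketch. It assumes character-level spans with inclusive end indices and the hypothetical entity types ORG and LOC. Sequence labeling assigns one B-/I-/O tag per character. Index labeling instead marks entity start and end positions in two separate label sequences. The helper names (`to_bio`, `to_pointers`, `decode_pointers`) and the nearest-end decoding rule are illustrative assumptions, not the patent's implementation.

```python
from typing import List, Tuple

# (start, end_inclusive, entity_type) over character indices
Entity = Tuple[int, int, str]


def to_bio(n_chars: int, entities: List[Entity]) -> List[str]:
    """Sequence-labeling view: one B-/I-/O tag per character."""
    tags = ["O"] * n_chars
    for start, end, etype in entities:
        tags[start] = "B-" + etype
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + etype
    return tags


def to_pointers(n_chars: int, entities: List[Entity]) -> Tuple[List[str], List[str]]:
    """Index-labeling view: entity type at start positions and at end positions."""
    starts = ["O"] * n_chars
    ends = ["O"] * n_chars
    for start, end, etype in entities:
        starts[start] = etype
        ends[end] = etype
    return starts, ends


def decode_pointers(starts: List[str], ends: List[str]) -> List[Entity]:
    """Pair each start with the nearest end of the same type at or after it
    (an assumed heuristic; other pairing rules exist)."""
    spans: List[Entity] = []
    for i, etype in enumerate(starts):
        if etype == "O":
            continue
        for j in range(i, len(ends)):
            if ends[j] == etype:
                spans.append((i, j, etype))
                break
    return spans
```

Take the sentence 南京大学位于南京 ("Nanjing University is located in Nanjing") with entities (0, 3, "ORG") and (6, 7, "LOC"). `to_bio` yields B-ORG I-ORG I-ORG I-ORG O O B-LOC I-LOC, and `decode_pointers` recovers both spans from the start and end sequences. A multi-task model along the lines of the abstract would predict both label views from a shared encoder.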



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0026] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. These embodiments serve only to illustrate the present invention and are not intended to limit its scope. After reading the present invention, those skilled in the art may make modifications in various equivalent forms, and all such modifications fall within the scope defined by the appended claims of this application.

[0027] The invention proposes a method for extracting Chinese named entities based on multiple annotation frameworks and fusion features. It solves two problems of existing Chinese named entity extraction methods: they have difficulty identifying entity boundaries, and they are limited to a single annotation framework. As shown in Figure 1, the complete process of the present invention includes six parts: dictionary feature construction stage, pinyin feature construction stage, dict...
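
The dictionary and pinyin feature construction stages can be sketched as follows. The sketch rests on two assumptions. First, a toy hand-written lexicon stores the pinyin of each word (numeric tone marks) and stands in for the pinyin software the patent uses. Second, word segmentation marks follow the common B/M/E/S scheme (begin, middle, end, single). For every character, the sketch collects each matched lexicon word, the character's position mark within that word, and the character's reading in that word. `LEXICON`, `position_mark`, and `match_lexicon` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> pinyin of each character in that word.
LEXICON: Dict[str, List[str]] = {
    "南京": ["nan2", "jing1"],
    "南京市": ["nan2", "jing1", "shi4"],
    "市长": ["shi4", "zhang3"],
    "长江": ["chang2", "jiang1"],
    "长江大桥": ["chang2", "jiang1", "da4", "qiao2"],
    "大桥": ["da4", "qiao2"],
}

# (matched word, B/M/E/S position mark, pinyin of the character in that word)
CharFeature = Tuple[str, str, str]


def position_mark(k: int, start: int, end: int) -> str:
    """B/M/E/S mark of character k inside the word text[start:end]."""
    if end - start == 1:
        return "S"
    if k == start:
        return "B"
    if k == end - 1:
        return "E"
    return "M"


def match_lexicon(
    text: str, lexicon: Dict[str, List[str]], max_len: int = 4
) -> List[List[CharFeature]]:
    """Collect every lexicon word covering each character, with mark and pinyin."""
    feats: List[List[CharFeature]] = [[] for _ in text]
    for start in range(len(text)):
        # Try every substring of up to max_len characters starting here.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word not in lexicon:
                continue
            for k in range(start, end):
                mark = position_mark(k, start, end)
                # The reading depends on the matched word, not on the character alone.
                feats[k].append((word, mark, lexicon[word][k - start]))
    return feats
```

A classic ambiguous phrase is 南京市长江大桥 ("Nanjing Yangtze River Bridge"). There the character 长 matches 市长 with mark E and reading zhang3, and it also matches 长江 and 长江大桥 with mark B and reading chang2. The dictionary and pinyin features therefore expose both candidate segmentations to the model.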



Abstract

The invention discloses a Chinese named entity extraction method based on a multi-annotation framework and fusion features. The method comprises the following steps. First, Chinese characters are encoded with a pre-trained language model. Then, word information and word segmentation mark information are introduced for each Chinese character through dictionary matching, and dictionary features are constructed. On this basis, each Chinese character is annotated with its pinyin by Chinese pinyin software according to its meaning in the matched word, and pinyin features are constructed. Third, the dictionary features and pinyin features are fused into the Chinese character encodings through a dot-product attention mechanism. This yields Chinese character semantic encodings that combine both kinds of features and improves recognition of Chinese named entity boundaries. Finally, the method combines the advantages of sequence labeling and index labeling by learning the two labeling tasks jointly through a multi-task learning model, which improves the accuracy of Chinese named entity extraction.
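
The fusion step can be sketched as plain dot-product attention. A character's encoding acts as the query, and its dictionary and pinyin feature vectors act as keys and values. This is a minimal stdlib sketch under three assumptions: the feature vectors already share the character encoding's dimension, no learned projection matrices are applied, and the attended context is added back to the character vector residually. The patent's exact projections and combination rule are not given in this excerpt, and `dot`, `softmax`, and `fuse` are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(char_vec: Vector, feature_vecs: List[Vector], scaled: bool = False) -> Vector:
    """Fuse dictionary/pinyin feature vectors into one character encoding.

    The character encoding is the query; each feature vector is both key and value.
    Assumption: features already have the same dimension as char_vec.
    """
    if not feature_vecs:
        # Characters with no lexicon match keep their original encoding.
        return list(char_vec)
    scale = math.sqrt(len(char_vec)) if scaled else 1.0
    weights = softmax([dot(char_vec, f) / scale for f in feature_vecs])
    # Attention-weighted sum of the feature vectors.
    context = [
        sum(w * f[i] for w, f in zip(weights, feature_vecs))
        for i in range(len(char_vec))
    ]
    # Assumption: residual addition; concatenation is an equally plausible choice.
    return [c + x for c, x in zip(char_vec, context)]
```

Consider `char_vec = [1.0, 0.0]` with features `[[1.0, 0.0], [0.0, 1.0]]`. The first feature aligns with the query, so it receives weight 1 / (1 + e^-1), about 0.73. Setting `scaled=True` divides the scores by the square root of the dimension, as in Transformer-style scaled dot-product attention.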

Description

Technical field [0001] The invention belongs to the field of artificial intelligence and natural language processing, and in particular relates to a method for extracting Chinese named entities based on a multi-label framework and fusion features. Background [0002] With the rapid development of Internet technology, the explosive growth of data in various industries has driven intelligent analysis and mining services and innovative big-data applications, further promoting the development of China's digital economy. This data contains a large amount of unstructured text. Extracting structured, useful information from such text has become a focus of the industry, and it depends on a basic task in natural language processing: named entity extraction. [0003] Early research on named entity recognition was mainly based on dictionaries and rules. These methods...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35, G06F40/216, G06F40/242, G06F40/295, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06F16/3346, G06F40/216, G06F40/242, G06F40/295, G06N3/08, G06N3/047, G06N3/044, G06F18/2415, G06F18/241, Y02D10/00
Inventors: 麦丞程 (Mai Chengcheng), 刘健 (Liu Jian), 黄宜华 (Huang Yihua)
Owner: NANJING UNIV