Trinity character annotation Chinese lexical analysis method based on Bi-LSTM-CRF

A lexical analysis and trinity character-tagging technology, applied in the field of lexical analysis. It addresses problems such as labor-intensive manual feature engineering, and achieves the effect of reducing labor and improving efficiency.

Pending Publication Date: 2021-02-12
ANYANG NORMAL UNIV
Citations: 3; cited by: 4

AI Technical Summary

Problems solved by technology

The prior works mentioned above, literature [1] and patent literature [2], are all implemented with traditional machine learning modeling. These methods depend on extensive manual feature engineering and are therefore labor-intensive.



Examples


Embodiment 1

[0035] This embodiment provides a Bi-LSTM-CRF-based Chinese lexical analysis method using trinity character labeling. The method comprises the following steps:

[0036] (1) Construct the training corpus and test corpus of the model based on the idea of trinity character tagging for Chinese lexical analysis, specifically:

[0037] (11) The three sub-tasks of Chinese lexical analysis are unified into a single character-tagging framework. The tag of each character carries three types of information, namely lexeme, part of speech and named entity, in the form "lexeme_part of speech or named entity category". The tag consists of two parts separated by an underscore: the part before the underscore is the lexeme information, and the part after it is the part-of-speech or named-entity category information. The tag "lexeme_part of speech or named entity category" of each character is called the lexical information mark of that character.

[0038] Wherein, lexeme refers to the word-forming position that...
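As an illustration of the character tag format in [0037], the following Python sketch converts a segmented, labeled sentence into per-character "lexeme_part of speech or named entity category" tags. The definition of lexeme in [0038] is truncated here, so the sketch assumes the widely used four-position set B/M/E/S (begin, middle and end of a multi-character word, and single-character word). The function names lexemes_for and to_trinity_tags and the labels v, LOC and ORG are hypothetical illustrations, not taken from the patent.

```python
# Minimal sketch of trinity character tagging ("lexeme_label" per character).
# ASSUMPTION: lexemes are B/M/E/S; the patent's own lexeme set is not shown here.
# ASSUMPTION: labels ("v", "LOC", "ORG") are hypothetical POS / NE categories.


def lexemes_for(word):
    """Return the word-forming position of each character in `word`."""
    if len(word) == 1:
        return ["S"]  # single-character word
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def to_trinity_tags(words):
    """Convert [(word, label), ...] into [(char, "lexeme_label"), ...].

    `label` is a part-of-speech tag for ordinary words or a named entity
    category for entity words; lexeme and label are joined by an underscore.
    """
    tagged = []
    for word, label in words:
        for char, lexeme in zip(word, lexemes_for(word)):
            tagged.append((char, f"{lexeme}_{label}"))
    return tagged


if __name__ == "__main__":
    # "Anyang Normal University is located in Anyang", pre-segmented and labeled.
    sentence = [("安阳师范学院", "ORG"), ("位于", "v"), ("安阳", "LOC")]
    for char, tag in to_trinity_tags(sentence):
        print(char, tag)
```

Running it prints one character per line: the six characters of 安阳师范学院 receive B_ORG, M_ORG, M_ORG, M_ORG, M_ORG, E_ORG, followed by B_v, E_v and B_LOC, E_LOC. Packing all three kinds of information into one tag is what lets a single sequence labeler handle the three sub-tasks within one framework.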



Abstract

The invention belongs to the technical field of lexical analysis and discloses a trinity character annotation Chinese lexical analysis method based on Bi-LSTM-CRF. The method comprises the steps of: constructing a training corpus and a test corpus for the model based on the idea of trinity character annotation Chinese lexical analysis; building a Bi-LSTM-CRF model; inputting the training corpus and training the Bi-LSTM-CRF model over multiple iterations; segmenting an input Chinese text and feeding it into the trained model; determining the final lexical information tag sequence of the input Chinese text; and performing Chinese word segmentation, Chinese part-of-speech tagging and Chinese named entity recognition on the input text according to this tag sequence to obtain the final Chinese lexical analysis result. The method dispenses with the manual feature engineering required by traditional machine learning modeling: by introducing representation learning, the model completes feature representation, extraction and selection automatically, which improves both efficiency and the precision of Chinese lexical analysis.
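The last two steps above, determining the final tag sequence and reading the three analysis results off it, can be sketched in Python as follows. The sketch assumes the standard linear-chain CRF setup, in which a sequence score is the sum of per-character emission scores (toy numbers here, standing in for the Bi-LSTM outputs) and tag-to-tag transition scores, maximized with Viterbi decoding. It also assumes B/M/E/S lexemes, the hypothetical POS labels r and v, and the hypothetical NE categories PER, LOC and ORG; the function names viterbi and decode are illustrative only, not the patent's implementation.

```python
# Sketch: CRF Viterbi decoding, then splitting the tag sequence into results.
# ASSUMPTIONS: emissions would come from the Bi-LSTM (toy values below);
# lexemes are B/M/E/S; NE_CATEGORIES is a hypothetical label set.

NE_CATEGORIES = {"PER", "LOC", "ORG"}


def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   -- score of tags[j] at character t
    transitions[i][j] -- score of moving from tags[i] to tags[j]
    """
    if not emissions:
        return []
    n = len(tags)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]


def decode(chars, tag_seq):
    """Derive segmentation, POS tags and named entities from trinity tags."""
    words, buffer, label = [], "", None
    for char, tag in zip(chars, tag_seq):
        lexeme, label = tag.split("_", 1)  # lexeme before "_", label after
        buffer += char
        if lexeme in ("E", "S"):  # a word ends at an E or S lexeme
            words.append((buffer, label))
            buffer = ""
    if buffer:  # keep an unterminated trailing word rather than drop it
        words.append((buffer, label))
    segmentation = [w for w, _ in words]
    pos_tags = [(w, l) for w, l in words if l not in NE_CATEGORIES]
    entities = [(w, l) for w, l in words if l in NE_CATEGORIES]
    return segmentation, pos_tags, entities


if __name__ == "__main__":
    tags = ["S_r", "S_v", "B_LOC", "E_LOC"]
    NEG = -10.0  # penalizes transitions that would break a B...E word
    transitions = [
        [0.0, 0.0, 0.0, NEG],  # from S_r
        [0.0, 0.0, 0.0, NEG],  # from S_v
        [NEG, NEG, NEG, 0.0],  # from B_LOC: only E_LOC may follow
        [0.0, 0.0, 0.0, NEG],  # from E_LOC
    ]
    # Toy emissions for 他去安阳 ("he goes to Anyang"); the last character is
    # ambiguous between S_v and E_LOC, and the transitions resolve it.
    emissions = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 2, 0], [0, 1, 0, 1]]
    best = viterbi(emissions, transitions, tags)
    print(best)  # ['S_r', 'S_v', 'B_LOC', 'E_LOC']
    print(decode("他去安阳", best))
    # (['他', '去', '安阳'], [('他', 'r'), ('去', 'v')], [('安阳', 'LOC')])
```

Because B_LOC is effectively required to be followed by E_LOC, Viterbi picks E_LOC for the last character even though its emission scores tie with S_v; a per-character argmax would not enforce that consistency. Words labeled with an NE category are reported as entities, and all other words carry a part-of-speech tag.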

Description

Technical field

[0001] The invention belongs to the technical field of lexical analysis, and in particular relates to a Bi-LSTM-CRF-based Chinese lexical analysis method using trinity character labeling.

Background art

[0002] In the field of Chinese information processing, Chinese lexical analysis is one of the important basic topics. It is not only the basis of deeper Chinese information processing such as syntactic analysis, semantic analysis and text understanding, but also a key link in applications such as machine translation, question answering systems, information extraction and reading comprehension. Chinese lexical analysis mainly comprises three sub-tasks: Chinese word segmentation, part-of-speech tagging and named entity recognition. The analysis of the existing technical solutions below focuses on two main concerns: (1) whether the three sub-tasks are processed independently or jointly; (2) whether Chinese lexical analysis modeling is based on traditional ma...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F40/211, G06F40/284, G06F40/295, G06N3/04, G06N3/08
CPC: G06F40/211, G06F40/284, G06F40/295, G06N3/049, G06N3/08, G06N3/044, G06N3/045
Inventors: 于江德, 胡顺义, 王希杰, 谷川, 赵红丹
Owner: ANYANG NORMAL UNIV