Entity recognition method based on semi-supervised learning and clustering

A rail transit entity recognition technology, applied in neural learning methods, text database clustering/classification, biological neural network models, etc. It addresses data labeling and other issues, improving extraction speed and accuracy, shortening processing time, and increasing the query rate.

Pending Publication Date: 2021-07-30

XIAN UNIV OF TECH

0 Cites, 7 Cited by


AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a rail transit entity recognition method based on semi-supervised learning and clustering. It addresses two problems with existing entity recognition methods for rail transit specifications: they require a large amount of labeled data, and when experts build ontology databases, fine-grained entity classification and limited labeled samples lead to low accuracy in the entity recognition results.


Examples


Embodiment

[0075] The present invention provides a named entity recognition method for rail transit specifications based on semi-supervised learning and clustering; the overall framework is shown in Figure 1. First, experts build an ontology database for the rail transit field and manually label part of the data, and the labeled entities are vectorized with the word2vec and BERT pre-trained models. Second, the entity word vectors are clustered with a hierarchical clustering method, and the clusters are proofread against the expert-defined entity categories to finalize the entity categories. The training data is then preprocessed again, and the generated word vectors are input into the BiLSTM-CRF algorithm to train the named entity recognition model, with the Softmax function used on the extracted entity features to iteratively train and optimize the entity recognition model. Finally, the deep learning model is deployed as a server to test the entity recognition model: the test data set is input into the model to output the entity cat...
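The clustering and proofreading step can be illustrated with a small sketch. It assumes average-linkage agglomerative clustering over cosine distance; the patent text only says "hierarchical clustering", so the linkage, the distance measure, the toy three-dimensional vectors, and the category names `Space` and `Facility` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a, b in combinations(range(len(clusters)), 2):
            dists = [cosine_distance(vectors[i], vectors[j])
                     for i in clusters[a] for j in clusters[b]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: entity strings, expert labels, and stand-in vectors.
entities = ["station", "platform", "elevator", "escalator"]
expert_labels = ["Space", "Space", "Facility", "Facility"]
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

clusters = agglomerative(vectors, n_clusters=2)

# Proofreading: a cluster whose members carry several expert labels is
# flagged so that the category definitions can be reviewed.
for members in clusters:
    labels = {expert_labels[i] for i in members}
    status = "consistent" if len(labels) == 1 else "needs review"
    print([entities[i] for i in members], sorted(labels), status)
```

In practice the vectors would come from word2vec or BERT rather than being written by hand, and a cluster flagged for review would prompt the experts to revisit the category definitions.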

Example

[0130] The rail transit specification corpus is labeled with entities in the following steps:

[0131] Step 11.3.1: taking the subway design specification "9.1.6 Stations should be equipped with barrier-free facilities" as an example, the training set is vectorized with the BERT model. Each word in "Stations should be equipped with barrier-free facilities" is mapped to a 768-dimensional vector, which serves as that word's initialization vector, and the result is then used as the input of the deep learning model.

[0132] Step 11.3.2: the BiLSTM-CRF algorithm from deep learning is used. A bidirectional LSTM considers both past and future features, reading the sequence once in the forward direction and once in reverse, to predict the semantics of each word in context. For example, after "station" is input, the BiLSTM predicts the probability that the next word is "should" (rendered as "ying" in the source translation); then "station should" is input to predict the probability of the next word, "be equipped", which is a forward inpu...
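The idea of reading the sequence in both directions can be sketched as follows. This is a minimal illustration, not the patent's model: it uses a plain tanh recurrent cell in place of an LSTM cell, random untrained weights, and 4-dimensional stand-in embeddings instead of 768-dimensional BERT vectors. The names `rnn_pass`, `bidirectional_pass`, and `random_matrix` are hypothetical.

```python
import math
import random


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh recurrent cell; return the hidden state at each step."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = [
            math.tanh(
                sum(w_in[k][d] * x[d] for d in range(len(x)))
                + sum(w_rec[k][j] * h[j] for j in range(hidden_size))
            )
            for k in range(hidden_size)
        ]
        states.append(h)
    return states


def bidirectional_pass(inputs, fwd_params, bwd_params):
    """Concatenate left-to-right and right-to-left states for each position."""
    forward = rnn_pass(inputs, *fwd_params)
    # Run over the reversed sequence, then re-align with the original order.
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens = ["Stations", "should", "be", "equipped", "with", "barrier-free", "facilities"]
embed_dim, hidden_size = 4, 3  # toy sizes chosen for illustration
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in tokens]
fwd_params = (random_matrix(hidden_size, embed_dim, rng),
              random_matrix(hidden_size, hidden_size, rng))
bwd_params = (random_matrix(hidden_size, embed_dim, rng),
              random_matrix(hidden_size, hidden_size, rng))

features = bidirectional_pass(embeddings, fwd_params, bwd_params)
print(len(features), len(features[0]))  # prints: 7 6
```

Each position ends up with a feature vector whose first half summarizes the words to its left and whose second half summarizes the words to its right; in a BiLSTM-CRF model these per-position features would be passed on to the CRF layer.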


Abstract

The invention relates to an entity recognition method based on semi-supervised learning and clustering. The method comprises the steps of: pre-defining entity categories through the ontology library to label unstructured rail transit specification data; vectorizing the labeled data with word2vec, and then applying a hierarchical clustering algorithm to the labeled entity word vectors; jointly analyzing the entity categories and the clustering results, proofreading the entity category definitions, and finally determining the entity types of the ontology library in the field of rail transit; and finally, rearranging the data set and inputting the generated word vectors into a BiLSTM-CRF deep learning model to train a named entity recognition model, wherein a Softmax function is used to classify the tags of the recognized entity features, and the entity tag classification result is evaluated. The method can improve the speed and accuracy of entity extraction from rail transit specifications, so that automatic question answering systems and semantic network labeling take less time to process the specifications, the query rate of employees in the building field on the rail transit specifications is improved, and the user experience is improved.
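The Softmax tag classification mentioned in the abstract can be sketched per token as follows. The tag set, the tokens, and the raw scores are hypothetical values chosen for illustration; in the described method the scores would come from the trained BiLSTM-CRF model.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical tag set and per-token raw scores.
TAGS = ["O", "B-FACILITY", "I-FACILITY"]
token_scores = [
    ("should", [2.5, 0.1, 0.3]),
    ("barrier-free", [0.2, 2.1, 0.4]),
    ("facilities", [0.1, 0.5, 1.9]),
]

predicted = []
for token, scores in token_scores:
    probs = softmax(scores)
    best = max(range(len(TAGS)), key=probs.__getitem__)  # index of top tag
    predicted.append((token, TAGS[best]))
    print(token, TAGS[best], round(probs[best], 3))
```

The highest-probability tag is taken as the prediction for each token, and comparing these predictions with the gold labels gives the evaluation of the tag classification result.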

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing in artificial intelligence, and relates to a rail transit entity recognition method based on semi-supervised learning and clustering.

Background

[0002] In recent years, artificial intelligence has become an important direction of industrial development, and natural language processing is an important research area within it. Its research results have been applied in the medical, legal, financial and other industries, greatly raising the level of intelligence in those fields. The rail transit field also contains a large amount of text information, yet very few related studies exist. In existing natural language processing research, methods for extracting information from rail transit regulations are mainly aimed at English rail transit regulations, while the r...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F40/295, G06F16/35, G06N3/04, G06N3/08

CPC: G06F40/295, G06F16/353, G06N3/08, G06N3/044

Inventor: 黑新宏, 董林靖, 朱磊

Owner: XIAN UNIV OF TECH
