
1263 patent results for "Named-entity recognition"

Named-entity recognition (NER), also known as entity identification, entity chunking and entity extraction, is a subtask of information extraction that seeks to locate named entity mentions in unstructured text and classify them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values and percentages.

Hybrid adaptation of named entity recognition

A machine translation method includes receiving a source text string and identifying any named entities. The identified named entities may be processed to exclude common nouns and function words. Features are extracted from the source text string relating to the identified named entities. Based on the extracted features, a protocol is selected for translating the source text string. A first translation protocol includes forming a reduced source string from the source text string in which the named entity is replaced by a placeholder, translating the reduced source string by machine translation to generate a translated reduced target string, while processing the named entity separately to be incorporated into the translated reduced target string. A second translation protocol includes translating the source text string by machine translation, without replacing the named entity with the placeholder. The target text string produced by the selected protocol is output.
Owner:XEROX CORP
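
The first (placeholder) protocol in the Xerox abstract can be sketched in Python as follows. This is a minimal illustration, not Xerox's implementation: it assumes the entity list is already known and uses toy callables in place of a real machine translation system. The names `reduce_source`, `translate_with_placeholders` and the `__NE0__` token format are hypothetical. The second protocol corresponds to calling `translate(text)` directly, without any placeholder substitution.

```python
from typing import Callable, Dict, List, Tuple


def reduce_source(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each entity mention with a numbered placeholder token."""
    mapping: Dict[str, str] = {}
    for i, ent in enumerate(entities):
        token = f"__NE{i}__"  # hypothetical placeholder format
        if ent in text:
            text = text.replace(ent, token)
            mapping[token] = ent
    return text, mapping


def translate_with_placeholders(
    text: str,
    entities: List[str],
    translate: Callable[[str], str],
    translate_entity: Callable[[str], str],
) -> str:
    """Translate the reduced string, then insert separately processed entities."""
    reduced, mapping = reduce_source(text, entities)
    target = translate(reduced)  # the MT system only sees the reduced string
    for token, ent in mapping.items():
        target = target.replace(token, translate_entity(ent))
    return target
```

For example, with a toy "translator" that only rewrites "works at", `translate_with_placeholders("Alice works at Acme Corp", ["Alice", "Acme Corp"], toy_mt, lambda e: e)` keeps both names intact while the surrounding text is translated.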

Text named entity recognition method based on Bi-LSTM, CNN and CRF

The invention discloses a text named entity recognition method based on Bi-LSTM, CNN and CRF. The method includes the following steps: (1) using a convolutional neural network to encode the character-level information of each word into a character vector; (2) combining the character vector and the word vector and feeding the combination into a bidirectional LSTM neural network that models the contextual information of every word; and (3) at the output of the LSTM network, using continuous conditional random fields to decode labels for the whole sentence and mark the entities in it. The invention is an end-to-end model that requires no data preprocessing apart from word vectors pre-trained on an unlabeled corpus, so it can be widely applied to sentence labeling in different languages and domains.
Owner:ZHEJIANG UNIV
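
The label-decoding step at the output of such a Bi-LSTM is commonly implemented with Viterbi search. The sketch below assumes a standard linear-chain CRF with per-position emission scores (standing in for the Bi-LSTM outputs) and pairwise transition scores; the function name and the dictionary-based score layout are illustrative assumptions, not the patented implementation.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]: score of label y at position t (e.g. from a Bi-LSTM).
    transitions[(a, b)]: score of moving from label a to label b (default 0).
    """
    # best[t][y]: best score of any label path ending in y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

A large negative transition score such as `("O", "I")` is one way to discourage invalid tag sequences; per-position argmax alone cannot enforce such constraints, which is the usual motivation for a CRF layer.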

Method and apparatus for named entity recognition in natural language

The present invention provides a method for recognizing named entities in natural language, comprising the steps of: training a gradual parsing model on natural language text to obtain a classification model; performing gradual parsing and recognition with the obtained classification model to obtain the positions and types of candidate named entities; performing rejection processing on the candidate named entities; and generating a candidate named entity lattice from the remaining candidates and searching it for an optimal path. Based on forward and backward parsing and recognition results obtained using only local features, the invention uses a one-class classifier to score these results and obtain the most reliable beginning and end boundaries of the named entities.
Owner:PANASONIC CORP
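
One simple way to search a lattice of scored candidate entities for an "optimal path" is to choose the set of non-overlapping candidates with the highest total score (weighted interval scheduling). The sketch below illustrates that idea only; the abstract does not specify its search algorithm, and the candidate tuple layout and the function name `best_entity_path` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[int, int, str, float]  # (start, end_exclusive, type, score)


def best_entity_path(cands: List[Candidate]) -> List[Candidate]:
    """Pick non-overlapping candidate entities with the maximum total score."""
    cands = sorted(cands, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in cands]
    # dp[i]: best total score using only the first i candidates
    dp = [0.0] * (len(cands) + 1)
    take = [False] * (len(cands) + 1)
    for i, (start, _end, _type, score) in enumerate(cands, 1):
        # j = number of earlier candidates that end at or before this start
        j = bisect_right(ends, start, 0, i - 1)
        if dp[j] + score > dp[i - 1]:
            dp[i], take[i] = dp[j] + score, True
        else:
            dp[i] = dp[i - 1]
    # Backtrack to recover the chosen candidates
    chosen: List[Candidate] = []
    i = len(cands)
    while i > 0:
        if take[i]:
            chosen.append(cands[i - 1])
            i = bisect_right(ends, cands[i - 1][0], 0, i - 1)
        else:
            i -= 1
    return list(reversed(chosen))
```

Here the scores could come from a boundary classifier such as the one-class scorer described above; candidates rejected earlier simply never enter the list.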

Enquiry statement analytical method and system for information retrieval

The invention discloses a query sentence analysis method and system based on natural language understanding, in the technical field of information retrieval. The method comprises the following steps: (1) performing automatic word segmentation, named entity recognition and part-of-speech tagging on an input Chinese query sentence; (2) analyzing the syntactic structure of the segmented sentence to obtain a syntax tree, and determining the meaning of each word from the part-of-speech-tagged sentence; (3) tagging the semantic roles of the predicates in the sentence according to the syntactic structure and word meanings; and (4) based on the lexical, syntactic and semantic analysis results, expanding keywords and extracting those that reflect the user's information retrieval needs. The system comprises a lexical analysis module, a syntactic analysis module, a semantic analysis module and a keyword extraction module. The method and system can greatly improve the accuracy of query results and return the results users are looking for.
Owner:PEKING UNIV

Question and answer method based on knowledge map

The invention provides a question answering method based on a knowledge map (knowledge graph), realized through subject entity matching, relationship matching and answer determination. Subject entity matching mainly comprises named entity recognition and entity linking. Named entity recognition identifies named entities such as person, place and organization names in a natural language question q. Entity linking maps each identified named entity to an entity in the knowledge base, that is, it finds the subject entity s of a triple. Relationship matching uses natural language understanding to interpret the semantics of question q and match a relationship p of the triples (s, p, o) in the search space, thereby determining the semantics of the question and its correspondence with the knowledge base. Entity recognition and entity linking yield candidate subject entities, and relationship matching yields candidate relationships, which together produce several candidate triples. Answer determination ranks these candidate triples by entity recognition score, relationship match score and similar scores to determine the final answer.
Owner:BEIHANG UNIV
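
The answer-determination step amounts to ranking candidate triples by their combined scores. The sketch below assumes a simple weighted sum of the entity and relationship scores; the weights, the `Candidate` type and `rank_answers` are hypothetical, since the abstract only says candidates are ranked "according to entity recognition score, relationship match score, etc."

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    subject: str
    relation: str
    obj: str
    entity_score: float    # from named entity recognition + entity linking
    relation_score: float  # from relationship matching


def rank_answers(
    cands: List[Candidate], w_entity: float = 0.5, w_relation: float = 0.5
) -> List[Candidate]:
    """Sort candidate triples by a weighted combination of their scores."""
    return sorted(
        cands,
        key=lambda c: w_entity * c.entity_score + w_relation * c.relation_score,
        reverse=True,
    )
```

The object `o` of the top-ranked triple would then be returned as the answer.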

Social Network Model for Semantic Processing

A social network model, based on data relevant to a user, is used for semantic processing to enable improved entity recognition among text accessed by the user. An entity extraction module of the server, with reference to a general training corpus, general gazetteers, user-specific gazetteers, and entity models, parses text to identify entities. The entities may be, for example, people, organizations, or locations. A social network module of the server builds the social network model implicit in the data accessed by the user. The social network model includes the relationships between entities and an indication of the strength of each relationship. The social network module is also used to disambiguate names and unify entities based on the social network model.
Owner:VULCAN TECH

Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method

The invention discloses an automatic question answering method over a knowledge base that semantically parses Chinese natural language interrogative sentences. First, Chinese natural language processing is applied to a fact-type question entered by the user: word segmentation, part-of-speech tagging, and named entity recognition and expansion are performed, and a semantic dependency tree is generated. Next, generalization templates and semantic analysis are used to obtain the time, place, fact entity, fact object and similar elements of the question; after semantic processing, the attributes of the elements of every event in the question and their values are extracted to form a set of attribute-value pairs, the elements to be answered are replaced with interrogatives, and a complex fact triple set is formed. Finally, the triple containing the part to be answered is combined with the other related fact triples to form a knowledge base query with conditional constraints, similarity-based query matching is performed in the knowledge base, and the result is extracted to obtain the final answer. The method provides fast and accurate query responses from the knowledge base.
Owner:NANJING UNIV

Method for named-entity recognition and verification

A method for named-entity (NE) recognition and verification is provided. The method extracts one or more to-be-tested segments from an article according to a text window and parses them with a predefined grammar to remove ill-formed segments. A statistical verification model then calculates a confidence measure for each remaining segment to determine whether it is a named entity. If the confidence measure is less than a predefined threshold, the segment is rejected; otherwise, it is accepted.
Owner:IND TECH RES INST
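
The window, grammar-filter and threshold pipeline can be sketched as below. This is an illustration under stated assumptions: the "grammar" is a hypothetical stand-in predicate (every token capitalized) and `confidence` stands in for the statistical verification model; none of these names come from the patent.

```python
from typing import Callable, List, Tuple


def candidate_segments(tokens: List[str], max_len: int) -> List[Tuple[int, int]]:
    """Enumerate all spans of up to max_len tokens (the text window)."""
    return [
        (i, j)
        for i in range(len(tokens))
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1)
    ]


def recognize(
    tokens: List[str],
    max_len: int,
    well_formed: Callable[[List[str]], bool],
    confidence: Callable[[List[str]], float],
    threshold: float,
) -> List[str]:
    accepted = []
    for i, j in candidate_segments(tokens, max_len):
        seg = tokens[i:j]
        if not well_formed(seg):  # grammar filter removes ill-formed segments
            continue
        if confidence(seg) >= threshold:  # reject only if below the threshold
            accepted.append(" ".join(seg))
    return accepted
```

In practice the confidence model would be trained on labeled data; the toy lambda in the test only exercises the control flow.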

Named entity identification method, device and equipment and computer readable storage medium

The embodiment of the invention discloses a named entity identification method, device and equipment, and a computer readable storage medium. The method comprises the following steps: character vectors and word vectors of the text to be identified are acquired, and a weighted sum of the character vectors and the word vectors is computed; the weighted sum is input into a trained bidirectional LSTM model to obtain a text character sequence; and the text character sequence is input into a trained CRF model to obtain the named entity identification result for the text. Computing a weighted sum of the character and word vectors makes better use of dynamic weight information, the bidirectional LSTM model fully takes into account the relationships among words in the context in both directions, and the subsequent CRF processing further improves the accuracy of named entity identification.
Owner:TENCENT TECH CHENGDU
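
One common way to obtain a "dynamic" weighted sum of a character vector and a word vector is a learned sigmoid gate. The sketch assumes that design (the abstract does not say how the weights are computed) and uses plain Python lists; `gate_w` and `gate_b` are hypothetical learned parameters.

```python
import math
from typing import List


def combine(
    char_vec: List[float], word_vec: List[float], gate_w: List[float], gate_b: float
) -> List[float]:
    """Dynamic weighted sum z * char + (1 - z) * word.

    z is a sigmoid gate over the concatenated inputs; gate_w must have
    length len(char_vec) + len(word_vec).
    """
    features = char_vec + word_vec
    logit = sum(w * f for w, f in zip(gate_w, features)) + gate_b
    z = 1.0 / (1.0 + math.exp(-logit))
    return [z * c + (1.0 - z) * w for c, w in zip(char_vec, word_vec)]
```

With all-zero gate parameters, z is 0.5 and the result is a plain average; training would move the gate toward whichever representation helps a given token.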

Chinese entity relation extraction method based on keyword and verb dependency

The invention discloses a Chinese entity relation extraction method based on keyword and verb dependency. Taking large-scale unstructured free text as the target text, the text is first segmented and keywords are extracted to form a text keyword thesaurus. The text is then subjected to sentence segmentation, word segmentation, part-of-speech tagging, named entity recognition and dependency parsing, and an entity corpus is constructed by combining the named entity thesaurus and the keyword thesaurus. Based on the characteristics of Chinese sentence structure, syntactic structure and the dependencies between words, entity-relation syntactic rules are constructed around verbs, and each sentence in the text is matched against these rules. Finally, relation triples are output to obtain the set of text relation triples. The invention makes entity relation extraction from large-scale Chinese text more effective and more accurate.
Owner:SHANGHAI DATATOM INFORMATION TECH CO LTD
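
A verb-centered relation rule can be illustrated with a single pattern: an entity that is the subject of a verb and an entity that is its object yield a (subject, verb, object) triple. The dependency labels `nsubj` and `obj` follow Universal Dependencies naming and, like the hand-built parse and function name, are assumptions for illustration; the patent's actual rule set is not given.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, str]  # (word, head_index, dep_label); head -1 marks the root


def extract_triples(
    tokens: List[Token], entities: Set[str], verbs: Set[str]
) -> List[Tuple[str, str, str]]:
    """Apply one subject-verb-object rule to a dependency-parsed sentence."""
    triples = []
    for v, (verb, _head, _dep) in enumerate(tokens):
        if verb not in verbs:
            continue
        subjects = [w for w, h, d in tokens if h == v and d == "nsubj" and w in entities]
        objects = [w for w, h, d in tokens if h == v and d == "obj" and w in entities]
        triples += [(s, verb, o) for s in subjects for o in objects]
    return triples
```

A real rule set would add patterns for prepositional objects, coordination and so on; each pattern would be another function of this shape.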

Chinese named entity recognition method based on BERT-BiGRU-CRF

The invention discloses a Chinese named entity recognition method based on BERT-BiGRU-CRF. The method comprises three stages: in the first stage, a large text corpus is preprocessed and used to pre-train a BERT language model; in the second stage, the named entity recognition corpus is preprocessed and encoded with the trained BERT language model; and in the third stage, the encoded corpus is input into a BiGRU+CRF model for training, and the trained model performs named entity recognition on the sentences to be recognized. The BERT pre-trained language model enhances the semantic representation of characters: semantic vectors are generated dynamically according to each character's context, so character ambiguity is represented effectively. Compared with methods based on fine-tuning a language model, the method reduces the number of training parameters and saves training time.
Owner:WUHAN UNIV

Named entities recognition method based on bidirectional LSTM and CRF

The invention discloses a named entity recognition method based on bidirectional LSTM and CRF, which improves and optimizes traditional named entity recognition algorithms in the prior art. The method comprises the following steps: (1) preprocessing a text and extracting its phrase information and character information; (2) encoding the character information with a bidirectional LSTM neural network to convert it into character vectors; (3) using the GloVe model to encode the phrase information into word vectors; (4) combining the character vectors and the word vectors into a context information vector and feeding it into a bidirectional LSTM neural network; and (5) decoding the output of the bidirectional LSTM with a linear-chain conditional random field to obtain the annotated entities of the text. Because the invention uses a deep neural network to extract text features and decodes them with a conditional random field, text feature information can be extracted effectively and good results can be achieved in entity recognition tasks across different languages.
Owner:Nanjing Anlian Data Technology Co., Ltd. (南京安链数据科技有限公司)

Method and device for establishing medical knowledge graph, and auxiliary diagnosis method

The invention discloses a method and device for establishing a medical knowledge graph, and an auxiliary diagnosis method. Establishing the medical knowledge graph comprises the following steps: a user dictionary is built from a medical database; electronic medical record data is processed and named entity recognition is performed; correlations are established for each recognized entity; and the medical knowledge graph is built from these correlations. The auxiliary diagnosis method based on the medical knowledge graph comprises the following steps: a patient's chief complaint data and examination data are acquired and processed to obtain the patient's symptom entities and sign entities; disease entities correlated with these symptom and sign entities are looked up in the medical knowledge graph, and the posterior probability of each disease entity given the set of symptom and sign entities is computed; and the disease entity with the maximum posterior probability is output together with the data of its correlated nodes. The invention provides intelligent auxiliary diagnosis for clinical medicine, easing the workload of medical workers, relieving pressure on medical resources and reducing the incidence of medical accidents.
Owner:HEFEI UNIV OF TECH
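
The posterior computation over disease entities can be illustrated with a naive Bayes model, which assumes findings are conditionally independent given the disease. That independence assumption, the probability floor for unseen pairs, and all numbers in the example are hypothetical; the abstract does not state how its posterior is modeled.

```python
from typing import Dict, Set


def disease_posteriors(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    observed: Set[str],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Naive Bayes posterior P(disease | observed symptom and sign entities).

    likelihood[d][s] approximates P(s | d); unseen pairs get a small floor.
    """
    scores: Dict[str, float] = {}
    for d, p in prior.items():
        for s in observed:
            p *= likelihood.get(d, {}).get(s, floor)
        scores[d] = p
    total = sum(scores.values())
    return {d: score / total for d, score in scores.items()}
```

With toy values, `disease_posteriors({"flu": 0.5, "cold": 0.5}, {"flu": {"fever": 0.9, "cough": 0.8}, "cold": {"fever": 0.1, "cough": 0.7}}, {"fever", "cough"})` puts most of the mass on "flu"; the argmax of this dictionary is the disease that would be output.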

Method and system for automatically constructing knowledge maps for mass unstructured texts

The invention belongs to the technical field of computer software and discloses a method and a system for automatically constructing knowledge maps from large volumes of unstructured text. The method comprises the steps of: casting named entity recognition as a sequence labeling problem, in which each word of a given sentence is assigned a label; designing effective features from the training data, learning various classification models, and using the trained classifiers to predict relationships; linking multiple existing knowledge bases to create a large-scale, unified knowledge network from the top down; and capturing and integrating entity information from three online encyclopedias, open websites, related knowledge bases or search engine logs. The method and system can greatly speed up knowledge map construction, improve time efficiency and reduce human resource costs by more than 30%. In addition, they have good domain portability: a knowledge map for a new domain can be built quickly by optimizing only the entity and relationship extraction algorithms.
Owner:GLOBAL TONE COMM TECH
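
When NER is cast as sequence labeling, each token typically receives a BIO-style tag (B- begins an entity, I- continues it, O is outside). The abstract does not name a tagging scheme, so BIO is an assumption here; the helper below converts a predicted tag sequence back into entity spans.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Collect (entity_text, type) pairs from tags such as B-PER, I-PER, O."""
    spans: List[Tuple[str, Optional[str]]] = []
    current: List[str] = []
    ctype: Optional[str] = None
    for tok, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs, starts a new entity
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ctype):
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open entity
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans
```

Treating a stray I- tag as the start of a new entity is one lenient convention; a stricter decoder could discard such spans instead.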

Named-entity recognition model training method and named-entity recognition method and device

An embodiment of the invention provides a named-entity recognition model training method, and a named-entity recognition method and device. The method for training a recurrent neural network (RNN) named-entity recognition model includes: acquiring multiple labeled samples, where each sample includes a text string and its term segment labels, and each term segment label includes a segmented term split from the text string and that term's named-entity attribute tag; mapping the segmented terms in the labeled samples to term vectors; and, taking the samples as training data, training the RNN named-entity recognition model and learning its parameters. With this training method and the named-entity recognition method and device, the trained model generalizes better, named entities in natural language text can be recognized rapidly, and recognition accuracy is improved.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Named entity recognition

Named entity recognition is described, for example, to detect an instance of a named entity in a web page and classify the named entity as being an organization or other predefined class. In various examples, named entity recognition results are used to augment text from which the named entity was recognized; the augmentation may comprise information retrieval results about the named entity mention. In various embodiments, labeled training sentences in many different languages and for many different classes, are obtained to train machine learning components of a multi-lingual, multi-class, named entity recognition system. In examples, labeled training sentences are obtained from at least two sources, a first source using a multi-lingual or monolingual corpus of inter-linked documents and a second source using machine translation training data. In examples, labeled training sentences from the two sources are selectively sampled for training the named entity recognition system.
Owner:MICROSOFT TECH LICENSING LLC

Named entity identification method capable of combining attention mechanism and multi-target cooperative training

The invention provides a named entity identification method that combines an attention mechanism with multi-target cooperative training. The method comprises the following steps: (1) preprocessing the training data and obtaining the character vector representation of each sentence through character-level mapping; (2) inputting the character vector representation obtained in (1) into a bidirectional LSTM (Long Short-Term Memory) network to obtain a character-based representation of each word; (3) obtaining the word vector representation of each sentence through word-level mapping; (4) combining the word vector representation obtained in (3) with the character-based representation via the attention mechanism, and feeding the result into a bidirectional LSTM network to obtain the semantic feature vector of the sentence; and (5) using a conditional random field on the semantic feature vector obtained in (4) to annotate each word and decode its entity tag.
Owner:SUN YAT SEN UNIV

Multi-task named entity recognition and adversarial training method for the medical field

The invention discloses a multi-task named entity recognition and adversarial training method for the medical field. The method includes the following steps: (1) collecting and processing data sets so that each row consists of a word and its label; (2) using a convolutional neural network to encode character-level information of each word into character vectors, which are concatenated with word vectors to form input feature vectors; (3) constructing a shared layer in which a bidirectional long short-term memory network models the input feature vectors of each word in a sentence to learn features common to all tasks; (4) constructing a task layer in which a bidirectional long short-term memory network models the input feature vectors together with the output of (3) to learn task-private features; (5) using conditional random fields to decode labels from the outputs of (3) and (4); and (6) using the shared-layer representations to train an adversarial network that reduces the private features mixed into the shared layer. The method performs multi-task learning on data sets from multiple disease domains and introduces adversarial training to make the shared-layer and task-layer features more independent, so that multiple named entity recognition tasks in a specific domain can be trained simultaneously, quickly and efficiently.
Owner:ZHEJIANG UNIV

Conditional random fields (CRF)-based relation extraction system

A system for extracting information from text, the system including parsing functionality operative to parse a text using a grammar, the parsing functionality including named entity recognition functionality operative to recognize named entities and recognition probabilities associated therewith and relationship extraction functionality operative to utilize the named entities and the probabilities to determine relationships between the named entities, and storage functionality operative to store outputs of the parsing functionality in a database.
Owner:DIGITAL TROWEL ISRAEL

Dependency semantic-based Chinese unsupervised open entity relationship extraction method

The invention relates to an unsupervised open entity relationship extraction method for Chinese based on dependency semantics. The method comprises the following steps: preprocessing the input text by performing Chinese word segmentation, part-of-speech tagging and dependency parsing; performing named entity recognition on the input text; selecting any two recognized entities to form a candidate entity pair; finding the dependency path between the two entities of the candidate pair; checking whether the syntactic structure mapped by the dependency path matches a pattern in a set of dependency-semantic normal forms; if it does, extracting words or phrases from the rest of the input text according to the matched pattern as relation words and forming a relation triple from the relation words and the candidate entity pair, and if it does not, moving on to the next candidate entity pair; and outputting the relation triples. Compared with the prior art, the method has low computational complexity and high extraction efficiency, overcomes distance and position limitations, and can also extract relations from simple sentences.
Owner:TONGJI UNIV
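
Finding the dependency path between the two entities of a candidate pair reduces to a shortest-path search in the dependency tree, treating head-dependent arcs as undirected edges. The breadth-first sketch below assumes the parse is given as a list of head indices; the function name and input format are illustrative. Candidate pairs themselves could be enumerated with `itertools.combinations` over the recognized entities.

```python
from collections import deque
from typing import Dict, List, Optional


def dependency_path(heads: List[int], a: int, b: int) -> Optional[List[int]]:
    """Shortest path of token indices from a to b in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:  # treat each head-dependent arc as an undirected edge
            graph[i].append(h)
            graph[h].append(i)
    prev: Dict[int, Optional[int]] = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            path: List[int] = []
            cur: Optional[int] = node
            while cur is not None:  # walk back to the start token
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

For tokens `["Alice", "founded", "Acme", "in", "Paris"]` with heads `[1, -1, 1, 4, 1]`, the path from "Alice" to "Paris" passes through the verb "founded", which is the kind of structure a normal-form pattern would then be matched against.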

Geographical science domain named entity recognition method

CN107133220A (Active)
The invention discloses a named entity recognition method for the geographical science domain, used to recognize geographical science core term entities and geographical location entities. The method has three main steps: (1) building a geographical science domain dictionary, using a new word discovery algorithm to identify new domain words without supervision; (2) training and testing a conditional random field (CRF) model and a multichannel convolutional neural network (MCCNN) model; and (3) correcting and fusing the entities recognized by the two models with a rule-based method. Because new domain words are discovered without supervision and added to the dictionary, word segmentation improves. Semantic vectors of words are learned without supervision from large-scale unlabeled data and combined with basic word features as the input features of the MCCNN model, which avoids manual feature selection and construction. The predictions of the two models are fused with custom rules, which corrects labeling errors made during recognition.
Owner:SOUTHEAST UNIV

Creating a document index from a flex- and Yacc-generated named entity recognizer

Methods of constructing a document index including named entity information generated by at least one tool associated with parsing computer programs are presented. The methods include using a lexical analyzer generator, e.g. Flex, and / or a parser generator, e.g. Yacc, to generate named entity recognizers. The named entity recognizers are used to identify named entities in documents, in particular, very large document sets such as web pages available on the Internet. The identified named entities are stored as named entity annotations in the document index. Also, methods of performing searches using the document index are presented. The searches are performed based on queries that can be received on an application programming interface (API). Relevant documents are obtained using the named entity annotations, which can be returned across the API. Also presented are associated computer readable media.
Owner:MICROSOFT TECH LICENSING LLC
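
The idea of a lexer-generated recognizer feeding a document index can be approximated in Python, using the standard `re` module as a stand-in for Flex: each rule is a named regular expression, and every match is stored as a (type, text) annotation pointing to its documents. The two rules and all names below are hypothetical examples, not patterns from the patent.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical flex-style rules: (entity type, regular expression)
RULES: List[Tuple[str, str]] = [
    ("EMAIL", r"[\w.]+@[\w.]+\.\w+"),
    ("DATE", r"\d{4}-\d{2}-\d{2}"),
]
# One combined pattern; the name of the matching group gives the entity type
LEXER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def build_index(docs: Dict[str, str]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (entity type, entity text) annotation to the documents containing it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for m in LEXER.finditer(text):
            index[(m.lastgroup, m.group())].add(doc_id)
    return index
```

A search for documents mentioning a given date is then a single dictionary lookup, e.g. `index[("DATE", "2021-05-04")]`; a real Flex/Yacc recognizer would additionally handle grammar-level structure that regular expressions cannot.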

Online traditional Chinese medicine text named entity identifying method based on deep learning

The invention discloses a deep-learning-based method for recognizing named entities in online traditional Chinese medicine (TCM) text. Online TCM text data is collected with a web crawler, and its named entities are labeled using existing terminology dictionaries with manual assistance. The word2vec tool is trained on a large-scale unlabeled corpus to obtain fixed-length word vectors, which form a corresponding glossary. The online TCM text is segmented into words, each word is converted into its fixed-length vector by glossary lookup, and these vectors serve as the input of a convolutional neural network; sentences shorter than the required length are padded with a blank character. The output of the convolutional neural network serves as the input of a bidirectional long short-term memory recurrent neural network, which outputs the recognition result for the words of the online TCM text. Compared with traditional named entity recognition methods, the method reduces the complexity and workload of feature extraction, simplifies processing and markedly improves recognition efficiency.
Owner:SOUTH CHINA UNIV OF TECH
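
The glossary lookup and padding step that prepares CNN input can be sketched as follows. The `<PAD>` token standing in for the blank character, the zero vectors for padding and unknown words, and the truncation of long sentences are all assumptions; the abstract only says short sentences are filled with a blank character.

```python
from typing import Dict, List

PAD = "<PAD>"  # stands in for the blank filler character


def to_fixed_length(
    words: List[str], max_len: int, glossary: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Pad or truncate a segmented sentence, then look up fixed-length vectors.

    Padding and out-of-glossary words map to zero vectors (an assumption).
    """
    padded = (words + [PAD] * max_len)[:max_len]
    zero = [0.0] * dim
    return [list(glossary.get(w, zero)) for w in padded]
```

Every sentence then becomes a `max_len x dim` matrix, which is the fixed input shape a convolutional layer expects.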

Industry comment data fine grain sentiment analysis method

The invention relates to a fine-grained sentiment analysis method for industry comment data, applied to Internet data analysis. The method comprises: obtaining comment data on e-commerce goods and preprocessing it; building initial industry sentiment word libraries and computing the distribution of words under different sentiment polarities using 1-grams and 2-grams; performing Chinese word segmentation on the comment data; based on the sentiment word libraries built from 1-grams and 2-grams, using a joint sentiment-topic model to model words and obtain the probability distribution of words belonging to different topics under different sentiment distributions; using context information to re-determine the sentiment orientation of sentiment words in sentences; and performing named entity recognition and extracting comment features with conditional random fields in order to compute the sentiment orientation of the comment words associated with those features. By computing the sentiment of comment words along the two dimensions of topic and sentiment, the method achieves fine-grained sentiment analysis of industry comment data with high precision and interpretable results.
Owner:Zhongke Jiasu (Beijing) Information Technology Co., Ltd. (中科嘉速(北京)信息技术有限公司)
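
Computing "the distribution of words under different sentiment polarities through 1-gram and 2-gram" can be illustrated by simple relative-frequency counting over labeled comments. The input format (pre-segmented comments with one polarity label each) and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngram_polarity(corpus: List[Tuple[List[str], str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(polarity | n-gram) for unigrams and bigrams.

    corpus: (segmented comment, polarity label) pairs.
    """
    counts: Dict[str, Counter] = {}
    for words, label in corpus:
        bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
        for gram in words + bigrams:
            counts.setdefault(gram, Counter())[label] += 1
    return {
        gram: {lab: n / sum(c.values()) for lab, n in c.items()}
        for gram, c in counts.items()
    }
```

Bigrams matter because a word like "battery" is neutral on its own while "battery good" carries polarity.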

Text information associating and clustering collecting processing method based on domain knowledge model

The invention provides a method for associating and clustering text information based on a domain knowledge model. The method comprises the following steps: a text information training set is retrieved and stemming preprocessing is performed, and feature word vectors of the segmented word sequences of the training set are extracted using Chinese named entity recognition and domain dictionary lookup; representative feature words of a target event are extracted by training a topic graph model, and a weight for each word's topic association is calculated; a feature word set is built from these trained topic association weights, and an event topic word template is constructed; feature word vectors of the segmented word sequences of incoming real-time text are extracted in the same way, using Chinese named entity recognition and domain dictionary lookup; the similarity distance between these feature word vectors and every target event knowledge template is calculated; and texts associated with the same target event topic are identified using a similarity threshold, then classified and reorganized by ordering on similarity distance.
Owner:10TH RES INST OF CETC
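
The template-matching step can be sketched with cosine similarity over sparse weighted feature words. Cosine similarity, the dictionary representation and the example template contents are assumptions; the abstract refers only to a "similarity distance" and a threshold.

```python
import math
from typing import Dict, Optional


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weighted feature-word vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_event(
    doc: Dict[str, float], templates: Dict[str, Dict[str, float]], threshold: float
) -> Optional[str]:
    """Return the most similar event template, or None if below the threshold."""
    best = max(templates, key=lambda t: cosine(doc, templates[t]), default=None)
    if best is None or cosine(doc, templates[best]) < threshold:
        return None
    return best
```

Texts assigned to the same template would then be grouped together and ordered by their similarity scores.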

Entity relationship recognition method and apparatus

The present invention relates to an entity relationship recognition method and apparatus. The method comprises: obtaining a sentence sequence from a target text in a corpus, and performing named entity recognition and dependency parsing on it to obtain annotated sentences; matching and retrieving the annotated sentences against entity relationship seeds to obtain training examples; replacing the entity relationship seed words in each training example with a predetermined identifier, processing the replaced training examples together with their named entity and dependency annotations, and generating candidate rules; generalizing (fuzzifying) the candidate rules to obtain fuzzy rules; determining whether the fuzzy rules include a new rule; and, if they do, searching the corpus with the fuzzy rules to obtain a seed set and using it as the entity relationship recognition result. The method effectively reduces manual participation and dependence on a labeled corpus, discovers new entity relationships promptly, and adapts to entity relationship mining in different domains.
Owner:LETV HLDG BEIJING CO LTD +1
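The seed, rule and retrieval loop described above resembles classic pattern bootstrapping. The sketch below is a heavily simplified, hypothetical version: it works on raw strings instead of named entity and dependency-parse annotations, treats "fuzzifying" as keeping only the words between the two placeholders and turning the entity slots into `\w+` wildcards, and stops when a round produces no new rule. The function names and the `<E1>`/`<E2>` placeholders are illustrative, not from the patent.

```python
import re

def make_candidate_rule(sentence, head, tail):
    # Replace the seed entities with fixed placeholders (the "predetermined identifier").
    if head not in sentence or tail not in sentence:
        return None
    return sentence.replace(head, "<E1>").replace(tail, "<E2>")

def fuzzify(rule):
    # Keep only the context between the placeholders; entity slots become \w+ wildcards.
    match = re.search(r"<E1>(.*?)<E2>", rule)
    if match is None or not match.group(1).strip():
        return None
    middle = re.escape(match.group(1).strip())
    return r"(\w+)\s+" + middle + r"\s+(\w+)"

def bootstrap(corpus, seeds, max_rounds=5):
    pairs, rules = set(seeds), set()
    for _ in range(max_rounds):
        new_rules = set()
        for sentence in corpus:
            for head, tail in pairs:
                rule = make_candidate_rule(sentence, head, tail)
                fuzzy = fuzzify(rule) if rule else None
                if fuzzy:
                    new_rules.add(fuzzy)
        if new_rules <= rules:  # no new rule was found: stop iterating
            break
        rules |= new_rules
        # Retrieve the corpus with the fuzzy rules to grow the seed set.
        for pattern in rules:
            for sentence in corpus:
                for m in re.finditer(pattern, sentence):
                    pairs.add((m.group(1), m.group(2)))
    return pairs, rules

corpus = [
    "Alice founded Acme in 1999",
    "Bob founded Globex last year",
    "Carol joined Initech recently",
]
pairs, rules = bootstrap(corpus, {("Alice", "Acme")})
print(sorted(pairs))
```

Starting from the single seed ("Alice", "Acme"), the rule learned from the first sentence also extracts ("Bob", "Globex"), while the "joined" sentence is left alone because no seed ever produced that context.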

Electronic medical record text named entity recognition method based on pre-trained language model

The invention belongs to the technical field of medical information data processing, and particularly relates to an electronic medical record text named entity recognition method based on a pre-trained language model, comprising the following steps: collecting electronic medical record texts from a public data set as the original text, and preprocessing it; labeling entities in the preprocessed text according to a standard medical terminology set to obtain a labeled text; inputting the labeled text into the pre-trained language model to obtain a training text represented by word vectors; constructing a BiLSTM-CRF sequence labeling model and training it on the training text; and using the trained labeling model as the entity recognition model, which takes a test text as input and outputs a sequence of category labels. Because the deep language model is pre-trained on a very large Chinese corpus, it captures text features and semantic information and provides better semantic compression. The method avoids tedious and complex manual annotation, does not depend on dictionaries or rules, and improves the recall and accuracy of named entity recognition.
Owner:SUZHOU INST OF BIOMEDICAL ENG & TECH CHINESE ACADEMY OF SCI
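The final step of the method above, outputting a label sequence from the trained BiLSTM-CRF model, is normally done with Viterbi decoding over the CRF's emission and transition scores. The following is a minimal pure-Python sketch, assuming the emission scores come from the BiLSTM and the transition matrix was learned during training; the BIO label names and all numbers are illustrative only.

```python
def viterbi_decode(emissions, transitions, labels):
    # emissions[t][j]: score of label j at token t (e.g. BiLSTM output)
    # transitions[i][j]: score for moving from label i to label j (learned by the CRF)
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]

labels = ["O", "B-DIS", "I-DIS"]
NEG = -1e4  # a very low score that effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # O -> I-DIS is invalid in BIO tagging
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# Taking the per-token maximum here would give ["O", "I-DIS"], an invalid sequence.
emissions = [[2.0, 0.5, 0.1], [0.2, 0.4, 1.0]]
print(viterbi_decode(emissions, transitions, labels))  # ['O', 'B-DIS']
```

The transition scores are what let the CRF layer reject label sequences that a token-by-token classifier would happily produce.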

Chinese electronic medical record named entity recognition method

Inactive · CN109871538A
The invention discloses a Chinese electronic medical record named entity recognition method. The method comprises the following steps: 1) constructing a common vocabulary dictionary; 2) performing simple part-of-speech tagging; 3) constructing a mapping table from text and part-of-speech tags to vectors; 4) training a named entity prediction model; and 5) predicting named entity labels. Adding part-of-speech features makes the boundaries between named entities and common words easier to distinguish, which improves the boundary accuracy of named entities. In addition, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model to compute the relevance between the input at each time step and the other parts of the sentence, which alleviates the long-range dependency problem and improves named entity recognition accuracy.
Owner:SOUTH CHINA UNIV OF TECH
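The self-attention step added to the bidirectional LSTM above can be illustrated with scaled dot-product self-attention over the LSTM's hidden states. This sketch assumes the simplest variant, with no learned query, key or value projections, so each hidden vector serves as its own query, key and value; the abstract does not state which form of attention the patent uses.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(hidden):
    # hidden: T vectors of size d (e.g. concatenated forward/backward LSTM states).
    d = len(hidden[0])
    outputs, weights = [], []
    for query in hidden:
        # Relevance of this time step to every position in the sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in hidden]
        w = softmax(scores)
        weights.append(w)
        # Context vector: attention-weighted sum of all hidden states.
        outputs.append([sum(w[t] * hidden[t][i] for t in range(len(hidden))) for i in range(d)])
    return outputs, weights

# Toy hidden states for a three-token sentence (illustrative values only).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, attn = self_attention(hidden)
```

Because every position attends to every other position directly, distant tokens influence each other in one step instead of through many recurrent steps. How the context vectors are combined with the LSTM states before the CRF layer (concatenation, addition, or otherwise) is not specified and is left out here.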

Bi-LSTM-based named entity identification method

The invention relates to a Bi-LSTM-based named entity identification method. The method comprises the following steps: 1, a training corpus for named entity identification is tagged to form a tagged corpus; 2, the words and characters in the tagged corpus are converted into vectors; 3, a Bi-LSTM-based named entity identification model is built from the word and character vectors, and its parameters are trained; and 4, the trained model is used to identify named entities in the data to be predicted. Using both word and character vectors captures word-level and character-level features at the same time while avoiding the unknown-word problem. In addition, compared with a traditional pure CRF model, the bidirectional long short-term memory (Bi-LSTM) neural network absorbs more character and word features, which improves entity identification precision.
Owner:BEIJING ZHIDAO WEILAI INFORMATION TECHNOLOGY CO LTD (北京知道未来信息技术有限公司)
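The claim that combining word and character vectors avoids the unknown-word problem can be seen in a small sketch: a word missing from the word-vector table still receives a representation from its characters. Concatenating a word vector with the mean of its character vectors, the zero fallback vector, and the toy embeddings are all assumptions for illustration; the abstract does not specify how the two kinds of vectors are combined.

```python
def mean_vector(vectors, dim):
    # Element-wise mean; a zero vector if no vectors are available.
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def token_features(word, word_vecs, char_vecs, word_dim, char_dim):
    # Unknown words fall back to a zero word vector but keep character information.
    word_part = word_vecs.get(word, [0.0] * word_dim)
    char_part = mean_vector([char_vecs[c] for c in word if c in char_vecs], char_dim)
    return word_part + char_part  # list concatenation

# Toy 2-dimensional embeddings (illustrative values only).
word_vecs = {"北京": [0.5, 0.1]}  # "Beijing"
char_vecs = {"北": [1.0, 0.0], "京": [0.0, 1.0], "海": [0.2, 0.2]}
known = token_features("北京", word_vecs, char_vecs, 2, 2)
unknown = token_features("北海", word_vecs, char_vecs, 2, 2)  # "Beihai", not in word_vecs
```

The out-of-vocabulary word still gets a non-zero feature vector from its characters, which is the property the abstract relies on; in a trained model, both embedding tables would be learned rather than hand-set.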

Network text named entity recognition method based on neural network probability disambiguation

The invention discloses a network text named entity recognition method based on neural network probability disambiguation. The method includes: performing word segmentation on an unlabeled corpus and using Word2Vec to learn word vectors; converting a sample corpus into a word feature matrix and windowing it; building and training a deep neural network whose output layer applies a softmax function, normalizing the output into a probability matrix of named entity types for each word; and re-windowing the probability matrix and using a conditional random field model for disambiguation to obtain the final named entity annotation. Because network text is full of internet slang and new words, an incremental word vector learning method that does not change the structure of the neural network is provided; and a probability disambiguation method is adopted because network text often has nonstandard grammatical structure and contains many miswritten characters. As a result, the method achieves high accuracy on network text named entity recognition tasks.
Owner:CHINA UNIV OF MINING & TECH
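Two mechanical steps of the network text method above can be sketched in isolation: normalizing the network's per-word outputs into a probability matrix with softmax, and cutting fixed-size windows around each position (first over words, then again over the probability matrix). The window size, the padding values, the entity-type columns, and the function names are hypothetical; the neural network itself and the final CRF disambiguation are not shown.

```python
import math

PAD = "<PAD>"

def windows(items, size=1, pad=PAD):
    # For each position, return the items within `size` positions on either side.
    padded = [pad] * size + list(items) + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(items))]

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def probability_matrix(logits):
    # Normalize per-word network outputs into a (word x entity type) probability matrix.
    return [softmax(row) for row in logits]

word_windows = windows(["visit", "new", "york"], size=1)
# Hypothetical raw outputs for three words over entity types ["O", "LOC", "PER"].
probs = probability_matrix([[2.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.5, 3.0, 0.5]])
# Re-window the probability matrix, padding with all-zero rows.
prob_windows = windows(probs, size=1, pad=[0.0, 0.0, 0.0])
```

Each row of `probs` sums to 1, and each entry of `prob_windows` gives the disambiguation stage the type probabilities of a word together with those of its neighbours.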