Social media named entity identification method based on affix perception

A social media named entity recognition technology, applied in neural learning methods, biological neural network models, instruments, etc., that addresses problems such as irregular noun abbreviations, ungrammatical text, and performance degradation, and achieves richer semantic representation and improved recognition performance.

Inactive Publication Date: 2020-05-15
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although these methods have achieved good performance on news texts, their performance drops sharply on social media data because of its inherent characteristics, such as informal expressions, irregular noun abbreviations, ungrammatical expressions, and a larger number of unregistered (out-of-vocabulary) words.


Examples


Embodiment

[0020] This embodiment provides a method for identifying social media named entities based on affix perception. The flow chart of the method is shown in Figure 1, and the method includes the following steps:

[0021] (1) Collect social media datasets annotated with named entities; each record contains the original text and its annotated named entities.

[0022] The collected social media data set is used as a training set.

[0023] (2) Preprocess the text in the data set, and construct the index vector representation of the text at the word level and the index vector representation of the text at the character level.

[0024] Specifically, the preprocessing includes:

[0025] Replace all lowercase letters in the text with their corresponding uppercase letters;
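Step (2) can be illustrated with a minimal sketch. It is not the patent's implementation: whitespace tokenization, the reserved `<PAD>`/`<UNK>` symbols, and the helper names `normalize`, `build_vocab`, and `encode` are assumptions made for illustration. The case replacement follows paragraph [0025] as worded above.

```python
from typing import Dict, Iterable, List, Tuple

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical reserved symbols (assumption)


def normalize(text: str) -> str:
    # Case normalization as worded in [0025]: lowercase letters -> uppercase.
    return text.upper()


def build_vocab(items: Iterable[str]) -> Dict[str, int]:
    # Assign consecutive indices in first-seen order; 0 and 1 are reserved.
    vocab = {PAD: 0, UNK: 1}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


def encode(text: str,
           word_vocab: Dict[str, int],
           char_vocab: Dict[str, int]) -> Tuple[List[int], List[List[int]]]:
    # Whitespace tokenization is an assumption; the source does not name a tokenizer.
    tokens = normalize(text).split()
    # Word-level index vector: one index per token.
    word_ids = [word_vocab.get(t, word_vocab[UNK]) for t in tokens]
    # Character-level index vectors: one list of character indices per token.
    char_ids = [[char_vocab.get(c, char_vocab[UNK]) for c in t] for t in tokens]
    return word_ids, char_ids


# Toy training texts (invented for illustration only).
corpus = ["gonna see Justin Bieber 2nite", "Bieber in NYC"]
tokens = [t for line in corpus for t in normalize(line).split()]
word_vocab = build_vocab(tokens)
char_vocab = build_vocab(c for t in tokens for c in t)

word_ids, char_ids = encode("see Bieber tmrw", word_vocab, char_vocab)
```

In this toy run the unseen token "tmrw" maps to the `<UNK>` word index, while most of its characters still receive known character indices. This is the kind of sub-word signal that character-level and affix features are meant to exploit for unregistered words.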


Abstract

The invention discloses a social media named entity identification method based on affix perception. The method comprises the following steps: collecting a social media data set annotated with named entities; capturing the embedded representation, character-level representation, and affix feature representation of each word, and fusing the three to serve as the final representation of the word; inputting the final word representations into a bidirectional convolutional neural network and a conditional random field, predicting a label sequence, and calculating a loss value; training the model with a stochastic gradient descent algorithm according to the obtained loss value; and inputting text into the trained model to identify the named entities in the text. The method enriches the semantic representation of words, alleviates the problem of unregistered (out-of-vocabulary) words in social media data, and improves named entity recognition performance.
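The affix feature and the fusion step can be sketched roughly as below. The abstract does not say how affixes are selected or how the three representations are fused, so the following are all assumptions made for illustration: fixed-length character n-gram prefixes and suffixes, a frequency threshold, fusion by concatenation, and the helper names `affixes`, `build_affix_vocab`, `affix_ids`, and `fuse`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def affixes(word: str, n: int = 3) -> Tuple[str, str]:
    # Return the (prefix, suffix) character n-grams; empty strings if the word is too short.
    if len(word) <= n:
        return "", ""
    return word[:n], word[-n:]


def build_affix_vocab(words: Iterable[str], n: int = 3, min_count: int = 2) -> Dict[str, int]:
    # Keep only n-grams seen at least min_count times (assumed selection heuristic).
    counts: Counter = Counter()
    for w in words:
        prefix, suffix = affixes(w, n)
        if prefix:
            counts["P:" + prefix] += 1
            counts["S:" + suffix] += 1
    vocab = {"<NONE>": 0}  # index 0 means "no known affix"
    for key, count in sorted(counts.items()):
        if count >= min_count:
            vocab[key] = len(vocab)
    return vocab


def affix_ids(word: str, vocab: Dict[str, int], n: int = 3) -> Tuple[int, int]:
    # Map a word to (prefix index, suffix index); unknown affixes map to 0.
    prefix, suffix = affixes(word, n)
    return vocab.get("P:" + prefix, 0), vocab.get("S:" + suffix, 0)


def fuse(word_vec: Sequence[float],
         char_vec: Sequence[float],
         affix_vec: Sequence[float]) -> List[float]:
    # Fusion by simple concatenation (an assumption; the source only says "fusing").
    return list(word_vec) + list(char_vec) + list(affix_vec)


# Toy vocabulary (invented for illustration only).
train_words = ["walking", "talking", "unfollow", "unfriend", "lol"]
affix_vocab = build_affix_vocab(train_words)

unseen = affix_ids("unfollowing", affix_vocab)   # word absent from train_words
short = affix_ids("lol", affix_vocab)            # too short to carry an affix
final = fuse([0.1, 0.2], [0.3], [0.4])           # toy vectors
```

Here "unfollowing" never appears in the toy training words, yet it still receives known prefix and suffix indices. This illustrates how affix features may give the model usable signal for unregistered words.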

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a social media named entity recognition method based on affix perception.

Background

[0002] With the vigorous development of the mobile Internet, people publish information on social media all the time, producing a huge amount of social media data. Compared with traditional newswire data, data on social media is more time-sensitive. These data contain rich information and have gradually become a potential source of information for many applications, such as news hotspot tracking, public opinion analysis, and early warning of potential violent incidents. Therefore, how to mine potential information from social media data has become an important task. Entity extraction is a fundamental task of information extraction, and building a powerful entity extraction system for these applications is essen...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/242, G06F16/31, G06N3/04, G06N3/08
CPC: G06F16/316, G06N3/084, G06N3/045
Inventors: 蔡毅, 吴志威
Owner: SOUTH CHINA UNIV OF TECH