Neural network-based scholarship user portrait information extraction method and model

A neural network and information extraction technology, applied in the field of user portrait information extraction. It addresses the problems that existing methods cannot extract scholar user information well and generalize poorly, and it resolves long-term dependencies between the extracted entities.

Active Publication Date: 2019-04-19
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Extraction methods based on manually written rules require specific rules for each type of page and depend on specific domain knowledge, so they generalize very poorly. Traditional machine learning methods have improved on this to a certain extent, but they still cannot handle text nodes that are far apart in a web page well.
As web pages become more diverse and complex, these problems are becoming more prominent, and the existing methods cannot extract scholar user information well.



Examples


Embodiment Construction

[0044] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0045] As shown in Figure 1, a neural network-based method for extracting scholar user portrait information, in which the neural network is a Bi-LSTM-CRF neural network, comprises the following steps:

[0046] S1. Filter out the text information in the webpage through text preprocessing, delete blanks and comment characters, and extract the simplified content body. The webpage text is embedded in HTML tags, and a short text node corresponds to an entity, so the text node of a tag is used as the basic extraction unit;

[0047] S2. Construct vocabulary tables for all text nodes and their characters in the identified webpage, and convert the text node of each tag into an n-dimensional vector, expressed as a word vector w;

[0048] S3. Extract the cont...
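
Steps S1 and S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the class `TextNodeExtractor`, the helpers `extract_text_nodes` and `build_vocab`, the reserved `<pad>` and `<unk>` entries, and the sample page are all hypothetical. The sketch also stops at integer ids; turning each id into the n-dimensional vector w would need an embedding table, which the visible text does not specify.

```python
from html.parser import HTMLParser


class TextNodeExtractor(HTMLParser):
    """Collects non-empty text nodes; comments, scripts and styles are dropped."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore blank nodes.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.nodes.append(text)


def extract_text_nodes(html):
    """S1 (assumed form): return the page's text nodes in document order."""
    parser = TextNodeExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


def build_vocab(items):
    """S2 (assumed form): map each distinct item to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


page = """
<html><body>
  <!-- navigation -->
  <h1>Jane Doe</h1>
  <p>  Professor,   Example University </p>
  <script>var x = 1;</script>
  <span>jane@example.edu</span>
</body></html>
"""

nodes = extract_text_nodes(page)
node_vocab = build_vocab(nodes)
char_vocab = build_vocab(ch for node in nodes for ch in node)
node_ids = [node_vocab[node] for node in nodes]
print(nodes)
print(node_ids)
```

Keeping one vocabulary for whole text nodes and another for their characters mirrors the "text nodes and their characters" wording of S2; how the two are combined into w is left open here.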



Abstract

The invention relates to a neural network-based scholar user portrait information extraction method, which comprises the following steps: carrying out text preprocessing on a webpage, and carrying out structure adjustment and entity labeling on the webpage; constructing vocabulary tables for all the text nodes and their characters in the identified webpage; extracting context features of each text node and its preceding and following nodes, and then training on the text node sequence to obtain a word vector h containing the context information of the node sequence; and performing decoding calculation on the word vector output, correspondingly obtaining the score of each word vector on the target tag, calculating the tag probability distribution of the node sequence, decoding the model output, obtaining the optimal predicted tag sequence when the objective function is at its minimum, completing model construction, and performing model training. According to the method, end-to-end training of the model can be realized, and the problem of long-term dependency between target extraction entities is effectively solved by utilizing the sequence memory characteristic of the LSTM network.
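
The decoding step in the abstract can be illustrated with standard Viterbi decoding for a linear-chain CRF. The visible text does not name the decoding algorithm, so this is a minimal sketch assuming that choice; the function `viterbi_decode`, the tag set, and all score values are hypothetical. Here `emissions[t][k]` stands for the score of tag k at node t (the per-node scores produced from h), and `transitions[i][j]` for the score of moving from tag i to tag j.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][k]   -- score of tag k at position t
    transitions[i][j] -- score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(num_tags),
                         key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            step_back.append(best_i)
        scores = step_scores
        backpointers.append(step_back)
    # Trace the best path backwards from the best final tag.
    last = max(range(num_tags), key=lambda k: scores[k])
    best_score = scores[last]
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return best_score, path


tags = ["O", "NAME", "EMAIL"]  # hypothetical tag set
emissions = [
    [0.1, 2.0, 0.0],
    [1.0, 0.5, 0.2],
    [0.0, 0.1, 3.0],
]
transitions = [
    [0.5, 0.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, -1.0],
]

best_score, best_path = viterbi_decode(emissions, transitions)
best_tags = [tags[k] for k in best_path]
print(best_score, best_tags)
```

The transition scores are what let the model penalize or reward tag sequences as a whole rather than tagging each node independently. During training the objective is minimized; at prediction time the highest-scoring tag sequence is selected as above.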

Description

Technical field

[0001] The invention relates to the field of user portrait information extraction, in particular to a neural network-based method and model for extracting scholar user portrait information.

Background technique

[0002] With the rapid development of applications such as the Internet, mobile Internet, and the Internet of Things, the amount of global data has increased significantly, and user portraits have become one of the most important applications in the context of big data technology. As the first step in user portrait extraction, user information extraction lays the foundation for subsequent user portrait mining and analysis, and largely determines the accuracy and completeness of the final model. In recent years, with the development of big data technology, many studies on user information extraction models have emerged. In terms of information extraction for scholars, most of the current research is to abstract it into a sequence labeling (Sequence Label) pr...


Application Information

IPC(8): G06F16/9535, G06F16/335, G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/08, G06F40/295, G06N3/045, Y02D10/00
Inventors: 林伟伟, 游德光, 吴梓明, 温昂展
Owner SOUTH CHINA UNIV OF TECH