
Resume extraction method based on deep neural network

A resume extraction technology based on deep neural networks, applied in fields such as neural learning methods, biological neural network models, and neural architectures. It addresses problems such as unclear boundaries between words and rich vocabulary features that are difficult to learn, with the aim of reducing complexity and the impact of word segmentation, and of making extraction and maintenance easier.

Active Publication Date: 2019-04-16
DONGGUAN UNIV OF TECH
Cites: 7; Cited by: 19

AI Technical Summary

Problems solved by technology

For resume information element extraction, most existing methods rely on rule templates, which have the following defects:

1. Early-stage word segmentation performs poorly, and the quality of the word representation directly affects the final labeling and recognition of information elements. In a Chinese-language setting, word segmentation must be performed first, and its quality directly affects the subsequent named entity recognition. Because there are no obvious boundaries between words, this early segmentation step has long been a bottleneck in the industry.
2. Chinese words combine flexibly, which makes the vocabulary huge and its features rich and difficult to learn. A keyword is treated as a combination of words, which makes the roles of words very complicated: for example, the components of a keyword may be segmented into other, non-keyword words. Obtaining features after word segmentation therefore greatly increases the complexity of machine learning.
3. Traditional resume information extraction is mainly based on rule templates. Customized rules target only resumes in a specific format and are ineffective against large numbers of varied resume texts. Existing rules must be constantly added to, modified, and maintained, and conflicts between rules must be resolved.
4. Traditional rule-based extraction must first identify and locate the information elements of interest, and then customize extraction rules according to linguistic features (such as part of speech or subject-verb-object position) and format information (such as paragraph and punctuation information). Writing such rules requires professional domain knowledge and familiarity with the linguistic features involved, and the dictionary must be continuously updated and maintained. If an information element is not in the dictionary, it is missed, so versatility is poor.
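The brittleness of rule templates described above can be illustrated with a minimal sketch. The rule, the field name, and both sample resumes are hypothetical and are not taken from the patent; the point is only that a template written for one layout silently misses the same information in another.

```python
import re

# Hypothetical rule template written for one resume layout: "Name: <value>".
NAME_RULE = re.compile(r"^Name:\s*(?P<name>.+)$", re.MULTILINE)


def extract_name(resume_text):
    """Return the name captured by the template, or None if no line matches."""
    match = NAME_RULE.search(resume_text)
    return match.group("name").strip() if match else None


# Same person, two layouts (both invented for illustration).
resume_a = "Name: Li Lei\nEducation: BSc, Computer Science"
resume_b = "Li Lei | Software Engineer\nEducation: BSc, Computer Science"

print(extract_name(resume_a))  # Li Lei
print(extract_name(resume_b))  # None
```

Covering the second layout would require another rule, and every added rule has to be checked against the existing ones for conflicts.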
The prior art also includes resume parsing methods based on deep learning. Chinese invention patent application CN106569998A discloses a text named entity recognition method based on Bi-LSTM, CNN and CRF: the character-level information of each word is encoded and converted into a character vector; the character vector and the word vector are combined and passed as input to a bidirectional LSTM neural network, which models the context information of each word; at the output of the LSTM neural network, a continuous conditional random field decodes the labels of the entire sentence and marks the entities in it.

Chinese invention patent application CN108664474A discloses a resume analysis method based on deep learning, which includes the following steps. Data preprocessing: uniformly convert the resume into text format, determine the content segment labels of the resume, and label the resume text data row by row. Model training: use a neural network to represent each row of the resume text as a fixed-length vector, then segment the resume content according to the row vectors. Information extraction: after the content is segmented, extract the label fields from the specified content segments to obtain the relevant information.

Because the context information of each word is modeled in the bidirectional LSTM neural network, the word segmentation of the text affects the accuracy of the context information and, in turn, the results of the subsequent named entity recognition.
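The decoding step mentioned for the CRF layer can be sketched as Viterbi decoding, assuming a linear-chain CRF over B/I/O labels. The emission scores stand in for the per-token outputs of the LSTM, and all scores and the single transition penalty are toy values chosen for illustration, not taken from either cited application.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token, mapping label -> score
    transitions: dict mapping (previous_label, current_label) -> score
    """
    # best[t][y] is the best score of any path ending in label y at token t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t-1][y] is the best previous label for y at token t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["B", "I", "O"]
# Toy transition scores: everything is neutral except O -> I, which is penalized.
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("O", "I")] = -10.0
# Toy per-token scores; picking the best label per token would give O, I, O.
emissions = [
    {"B": 0.4, "I": 0.1, "O": 0.5},
    {"B": 0.1, "I": 0.8, "O": 0.1},
    {"B": 0.1, "I": 0.2, "O": 0.7},
]

print(viterbi_decode(emissions, transitions, labels))  # ['B', 'I', 'O']
```

Scoring the whole sentence at once lets the transition penalty override the locally best first label, which is why the decoded sequence begins with B rather than O.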



Embodiment Construction

[0021] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the application. It can be understood that the terms "first", "second" and the like used in the present invention can be used to describe various elements herein, but these elements are not limited by these terms. These terms are only used to distinguish one element from another element.

[0022] Figure 1 shows that the resume extraction method based on a deep neural network in this embodiment includes the following steps: Step S1, data preprocessing: obtaining resume data text, characterizing the obtained resume data text, obtaining word vector features and word sequence features, and obtaining the word vector data s...
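As a rough illustration of what a sequence data set built during preprocessing might look like, the sketch below maps tokens to integer IDs and pads each sequence to a fixed length. The visible text does not specify the tokenization, the vocabulary, or the vector lookup, so whitespace splitting, the `<pad>` and `<unk>` entries, and the helper names `build_vocab` and `encode` are all assumptions.

```python
def build_vocab(token_lists):
    """Assign an integer ID to every distinct token (IDs 0 and 1 are reserved)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to IDs, truncate to max_len, and right-pad with the <pad> ID."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Invented resume lines; whitespace splitting is a placeholder for real tokenization.
lines = ["Name Li Lei", "Education BSc Computer Science"]
token_lists = [line.split() for line in lines]
vocab = build_vocab(token_lists)

print(encode(token_lists[0], vocab, 5))      # [2, 3, 4, 0, 0]
print(encode(["Name", "Zhang"], vocab, 3))   # [2, 1, 0]
```

In a full pipeline each ID would typically index a row of an embedding matrix to produce the vector features fed to the network.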


Abstract

The invention relates to a resume extraction method based on a deep neural network. The method comprises the following steps. Data preprocessing: obtaining resume data text, performing word division, obtaining word vector features and word sequence features, and obtaining a word vector data set and a word sequence data set. Deep neural network training: training to obtain a deep neural network model, taking the word vector data set and the word sequence data set together as the feature input of the model, taking the semantic features obtained through training as output features, and performing entity labeling with the output semantic features to obtain entity labels. Label matching analysis: according to the corresponding extraction rules in a pre-trained information element extraction rule base, matching information element phrases in the labeled resume data text, and returning pairs of information element labels and information element phrases for the resume data text. The method uses word vectors and word sequences as input features and improves recognition accuracy by combining a deep neural network with rule-based text analysis.
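The label matching analysis step can be sketched as follows, assuming the entity labels use a BIO scheme and the rule base holds one pattern per label. The tag names, the two regular expressions, and the helper names `collect_entities` and `match_labels` are hypothetical; the abstract does not say how the rule base is represented or trained.

```python
import re


def collect_entities(tokens, tags):
    """Group BIO-tagged tokens into (label, phrase) pairs."""
    entities, label, phrase = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            phrase.append(token)  # continue the current entity
            continue
        if label:
            entities.append((label, " ".join(phrase)))  # close the open entity
        if tag.startswith("B-"):
            label, phrase = tag[2:], [token]
        else:
            label, phrase = None, []
    if label:
        entities.append((label, " ".join(phrase)))
    return entities


# Hypothetical rule base: one pattern per information element label.
RULE_BASE = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "NAME": re.compile(r"[A-Z][a-z]+( [A-Z][a-z]+)+"),
}


def match_labels(tokens, tags, rule_base=RULE_BASE):
    """Return the (label, phrase) pairs accepted by the rule for their label."""
    pairs = []
    for label, phrase in collect_entities(tokens, tags):
        rule = rule_base.get(label)
        if rule and rule.fullmatch(phrase):
            pairs.append((label, phrase))
    return pairs


# Invented example; the last tag imitates a mislabeled token from the network.
tokens = ["Li", "Lei", "email", "li.lei@example.com", "Dongguan"]
tags = ["B-NAME", "I-NAME", "O", "B-EMAIL", "B-NAME"]

print(match_labels(tokens, tags))
# [('NAME', 'Li Lei'), ('EMAIL', 'li.lei@example.com')]
```

Here the rules act as a filter on the network's output: the single-token NAME span is rejected because the hypothetical NAME rule requires at least two capitalized words.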

Description

Technical field

[0001] The invention relates to the technical field of text processing, in particular to a resume extraction method based on a deep neural network.

Background technique

[0002] With the rapid development of modern information and storage technology and the rapid spread of the Internet, people frequently come into contact with all kinds of text information in daily life, and text has become the most widely transmitted kind of data on the Internet. In the era of big data, what people lack is not information but the ability to obtain the useful information they care about from massive and complicated information. For resume information element extraction, most existing methods rely on rule templates, which have the following defects: 1. Early-stage word segmentation performs poorly, and the quality of the word representation directly affects the final labeling and recognition of information elements. At presen...


Application Information

IPC(8): G06F17/27, G06N3/04, G06N3/08
CPC: G06F40/279, Y02D10/00
Inventor: 张剑章志
Owner: DONGGUAN UNIV OF TECH