Resume information extraction method and system

A resume information extraction technology in the field of information extraction. It addresses three problems: unsupervised methods give no clear classes or exact answers, supervised classification loses accuracy when labeled data is scarce or poor, and word segmentation quality affects extraction results. It aims to save training time, reduce manual workload, and achieve good semi-supervised training results.

Active Publication Date: 2020-03-17
DONGGUAN UNIV OF TECH +1

AI Technical Summary

Problems solved by technology

[0007] 1. Traditional deep learning is a form of batch learning: all data must be prepared before each training run, and the model must be retrained every time, which consumes considerable time and space and reduces efficiency;
[0008] 2. Conventional information extraction must first segment the text into words and then vectorize those words as features before proceeding. However, Chinese text has no explicit boundary between words, so the quality of word segmentation affects the final extraction results;
[0009] 3. Supervised classification is currently the mainstream classification technique, but its accuracy depends directly on the number of training samples and the quality of their labels. Labeling and training data takes considerable manpower and time, and too little data or poorly labeled data greatly reduces accuracy. Unsupervised classification instead compares the underlying structure or distribution of the samples and groups similar objects together, without clear classes or exact answers; it saves manpower and time but carries great uncertainty;
Moreover, the features learned by existing semi-supervised training methods may not be the features that the model actually needs, which leads to overfitting.

Examples

Detailed Description of the Embodiments

[0032] The present invention will be described in further detail below in conjunction with accompanying drawings and examples.

[0033] As shown in Figures 1-2, the present invention relates to a resume information extraction method comprising the following steps:

[0034] A. Obtain resume data;

[0035] B. Use the BERT Chinese pre-trained model and data augmentation to convert the resume data into resume text, and classify the text according to its sentence features;

[0036] C. Use the BERT+BiGRU+CNN+CRF model to perform named entity recognition on the classified resume text sentences, and then extract the required information elements;

[0037] D. Store the extracted information elements in the database, and output the corresponding information in a structured manner.
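
Steps A to D can be sketched end to end as a small pipeline. The following is a minimal illustration only, not the patented implementation: the keyword rules in `classify_sentence` stand in for the BERT sentence classifier, the regular expressions in `PATTERNS` stand in for the BERT+BiGRU+CNN+CRF named entity recognizer, and the category names, field names, and the `resume` table are all hypothetical.

```python
import json
import re
import sqlite3


def split_sentences(text):
    """Step B (preprocessing): split resume text into sentences/clauses."""
    parts = re.split(r"[。；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def classify_sentence(sentence):
    """Stand-in for the BERT sentence classifier (keyword rules, illustration only)."""
    if "大学" in sentence or "学院" in sentence:
        return "education"
    if "公司" in sentence:
        return "work"
    return "basic"


# Stand-in for the named entity recognition model of step C.
# Hypothetical patterns, keyed by sentence category and output field.
PATTERNS = {
    "basic": {"name": r"姓名[:：]\s*(\S+)", "phone": r"电话[:：]\s*(\d{11})"},
    "education": {"school": r"学校[:：]\s*(\S+)"},
    "work": {"company": r"公司[:：]\s*(\S+)"},
}


def extract_elements(sentence, category):
    """Step C: pull information elements out of one classified sentence."""
    found = {}
    for field, pattern in PATTERNS[category].items():
        match = re.search(pattern, sentence)
        if match:
            found[field] = match.group(1)
    return found


def run_pipeline(resume_text, conn):
    """Steps B-D: classify sentences, extract elements, store them, return JSON."""
    record = {}
    for sentence in split_sentences(resume_text):
        record.update(extract_elements(sentence, classify_sentence(sentence)))
    # Step D: store in a database and output in structured form.
    conn.execute("CREATE TABLE IF NOT EXISTS resume (field TEXT, value TEXT)")
    conn.executemany("INSERT INTO resume VALUES (?, ?)", list(record.items()))
    conn.commit()
    return json.dumps(record, ensure_ascii=False)
```

Calling `run_pipeline(text, sqlite3.connect(":memory:"))` on a short resume text returns the extracted fields as a JSON string and leaves the same fields in the in-memory table. Classifying sentences before extraction lets each category use its own extractor, which mirrors the order of steps B and C.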

[0038] In order to reduce the impact of word segmentation on the processing results of the above technical solution, in step B, the sentences in the resume text are directly converted into vectors as the inpu...
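
One way to avoid word segmentation is to feed the model character-level input, so that every Chinese character becomes its own token. The sketch below assumes that reading; the source text is truncated and does not spell out the exact encoding. The vocabulary is built from the sample sentences for illustration (a real BERT model ships with a fixed vocabulary), and `build_vocab` and `encode` are hypothetical helper names.

```python
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]


def build_vocab(sentences):
    """Assign an integer id to every distinct non-space character."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for sentence in sentences:
        for ch in sentence:
            if not ch.isspace() and ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    """Turn a sentence into fixed-length token ids plus an attention mask.

    No word segmentation is performed: each character is one token.
    """
    chars = [ch for ch in sentence if not ch.isspace()]
    # Reserve two positions for [CLS] and [SEP], truncating if needed.
    tokens = ["[CLS]"] + chars[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * padding, mask + [0] * padding
```

The resulting id sequence is what an embedding layer would turn into vectors. Characters missing from the vocabulary map to `[UNK]` rather than causing an error, and the mask marks which positions hold real tokens.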

Abstract

The invention relates to a resume information extraction method and system. The method comprises the following steps: A, obtaining resume data; B, converting the resume data into resume text using a BERT Chinese pre-trained model and data augmentation, and classifying the text according to its sentence features; C, performing named entity recognition on the classified resume sentences with a BERT + BiGRU + CNN + CRF model, and then extracting the required information elements; and D, storing the extracted information elements in a database and outputting the corresponding information in structured form. The system mainly consists of a resume acquisition module, an input module, a classification module, an information element extraction module, a storage module, and an output module. The method uses incremental learning and splits text into clauses during data preprocessing for the classification model, so that the language model can adjust its parameters from newly supplied training data while inheriting its previous parameters, which gives the method better continuity and generalization ability.
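
The incremental learning idea in the abstract, updating a model from new data while keeping its earlier parameters, can be illustrated with a toy example. The sketch below uses a small logistic regression in place of the BERT-based language model, purely to show the parameter inheritance; the class name, learning rate, and epoch count are assumptions and do not come from the patent.

```python
import math


class IncrementalLogisticRegression:
    """Toy binary classifier whose parameters persist across training calls."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        """Probability that x belongs to class 1."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y, epochs=50):
        """Update the current parameters with a new batch, without resetting them."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target
                # Stochastic gradient step on the log loss for one sample.
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self
```

Calling `partial_fit` a second time with a new batch continues from the weights learned on the first batch, so earlier data does not need to be reloaded or retrained. Batch learning would instead create a fresh model and train it on the combined data each time.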

Description

Technical field

[0001] The invention relates to the technical field of information extraction, and in particular to a resume information extraction method and system.

Background

[0002] With the rapid development of modern information and storage technology and the rapid spread of the Internet, people frequently encounter all kinds of text information in daily life, and text has become the most widely transmitted type of data on the Internet. Faced with massive data, how to extract and organize the useful parts is an urgent problem. Information extraction technology was therefore proposed: it uses automated techniques to find the information that is actually needed in massive data. Text information extraction generally refers to a text processing technique that extracts information such as entities, relationships, and events contained in natural language text and presents it in a structured and stor...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F16/35, G06F16/36
CPC: G06F16/36, G06F16/258, G06F16/35, Y02D10/00
Inventor: 张剑, 苏彦源, 章志
Owner: DONGGUAN UNIV OF TECH