Resume information extraction method based on cascading sequence annotation

A sequence labeling and information extraction technology, applicable to text database querying, unstructured text data retrieval, and similar tasks. It addresses the failure of existing methods to consider the block structure of resume text, and reduces the confusion that this causes during extraction.

Active Publication Date: 2020-11-20

THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP



AI Technical Summary

Problems solved by technology

[0003] Current information extraction techniques target only short text fragments. They cannot extract long text fragments in units of sentences, and they do not take the block structure of the resume text itself into account.


Embodiment Construction

[0051] As shown in Figure 1, the present invention provides a method for extracting resume information based on cascading sequence annotation, comprising the following steps:

[0052] Step 1: use pdfminer to parse the resume file in PDF format, converting the rich-text resume into a plain-text representation;

[0053] Step 2: label the training data, using distantly supervised data for back-labeling and merging similar items during the labeling process;

[0054] Step 3: divide the resume information into blocks. The resume is divided into 4 blocks, and a classifier is trained to assign the text to these blocks;

[0055] Step 4: use the two-layer sequence labeling model to extract information at the sentence level and at the short-text-fragment level.
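The back-labeling in Step 2 can be illustrated with a minimal sketch. It assumes that already-known field values (for example, from structured records) are matched against the tokenized resume text to produce BIO tags, and it reads "merging similar items" as mapping similar label names onto one canonical label. Both readings are assumptions, and the function name `back_label`, the label names, and the alias table are hypothetical rather than taken from the patent.

```python
def back_label(tokens, known_values, label_aliases=None):
    """Produce BIO tags by matching known (value_tokens, label) pairs in tokens.

    label_aliases maps similar labels onto one canonical label (assumed
    interpretation of "merging similar items").
    """
    label_aliases = label_aliases or {}
    tags = ["O"] * len(tokens)
    # Try longer values first so they are not shadowed by shorter matches.
    for value, label in sorted(known_values, key=lambda kv: -len(kv[0])):
        label = label_aliases.get(label, label)
        n = len(value)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == value and window_free:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
    return tags


tokens = "Zhang San graduated from Nanjing University in 2015".split()
known = [
    (["Zhang", "San"], "name"),
    (["Nanjing", "University"], "college"),
    (["2015"], "graduation_time"),
]
tags = back_label(tokens, known, label_aliases={"college": "school"})
```

The resulting tag sequence can serve as noisy training data for a sequence labeler; exact-match lookup is the simplest possible matcher and would miss paraphrased values.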

[0056] Step 1 includes:

[0057] PDF is a rich-text format that must first be parsed into ordinary plain text. The parsing process involves the issues of column division, section division, and line breaking. ...
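As a rough illustration of the line-breaking issue, the sketch below extracts text with pdfminer.six and then rejoins lines that appear to have been wrapped by the page layout. The width-based heuristic, the `wrap_width` and `slack` parameters, and the function names are assumptions made for this example; the source does not describe how the patent repairs line breaks.

```python
TERMINAL = "。！？.!?;；:："


def pdf_to_lines(path):
    """Extract raw text lines from a PDF using pdfminer.six (third-party)."""
    from pdfminer.high_level import extract_text  # imported lazily
    return extract_text(path).splitlines()


def merge_broken_lines(lines, wrap_width, slack=5):
    """Join a line to the previous one when the previous line looks wrapped.

    A line counts as wrapped (assumed heuristic) when it nearly fills the
    layout width and does not end with terminal punctuation.
    """
    merged = []
    prev_wrapped = False
    for raw in lines:
        line = raw.strip()
        if not line:
            prev_wrapped = False  # a blank line always ends a paragraph
            continue
        if merged and prev_wrapped:
            # A space suits English text; Chinese text would join without one.
            merged[-1] += " " + line
        else:
            merged.append(line)
        prev_wrapped = len(line) >= wrap_width - slack and line[-1] not in TERMINAL
    return merged


sample = [
    "Name: Zhang San",
    "Responsible for the design of the search",
    "module and its index.",
    "Gender: Male",
]
repaired = merge_broken_lines(sample, wrap_width=40)
```

In practice `pdf_to_lines("resume.pdf")` would supply the input, with `wrap_width` estimated from the longest lines on the page. Column and section ordering need separate handling that this sketch does not attempt.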


Abstract

The invention provides a resume information extraction method based on cascading sequence annotation. The method comprises four steps. Step 1: parse the PDF resume with pdfminer and convert the original PDF into a multi-line text representation; this step mainly addresses out-of-order text and incorrect line breaks. Step 2: label the training data, using distantly supervised data for back-labeling and merging similar items during labeling. Step 3: divide the resume into blocks; each sentence obtained through pdfminer is classified, and its class determines the block it belongs to. Step 4: extract information at the sentence level and at the short-text-fragment level with the two-layer sequence labeling model. The advantage of the method is that the resume block information is used for subsequent filtering, so recall is effectively improved while accuracy is not greatly reduced. Through these four stages, resume information can be extracted effectively.
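The block-based filtering described in the abstract can be sketched as follows. The two rule-based functions are deliberately simple stand-ins for the trained sentence classifier and the span-level sequence labeler; they are not the patent's models. The block names follow the four categories in the description, while the field names, the regular expressions, and the `BLOCK_FIELDS` table are hypothetical.

```python
import re

# Fields that may legitimately appear in each block (illustrative subset).
BLOCK_FIELDS = {
    "attribute": {"name", "phone"},
    "education": {"school", "degree"},
    "work": {"employer", "position"},
    "project": {"project_name"},
}


def classify_block(sentence):
    """Stand-in for the trained sentence-level block classifier."""
    s = sentence.lower()
    if "university" in s or "degree" in s:
        return "education"
    if "company" in s or "engineer" in s:
        return "work"
    if "project" in s:
        return "project"
    return "attribute"


def tag_spans(sentence):
    """Stand-in for a high-recall span labeler; it over-generates on purpose."""
    candidates = []
    for m in re.finditer(r"\b\d{11}\b", sentence):
        candidates.append(("phone", m.group()))
    for m in re.finditer(r"\b[A-Z]\w+ University\b", sentence):
        candidates.append(("school", m.group()))
    for m in re.finditer(r"\b[A-Z]\w+ (?:Company|University)\b", sentence):
        candidates.append(("employer", m.group()))
    return candidates


def extract(sentences):
    """Keep only the candidates whose field is allowed in the sentence's block."""
    results = []
    for sent in sentences:
        block = classify_block(sent)
        for field, text in tag_spans(sent):
            if field in BLOCK_FIELDS[block]:
                results.append((block, field, text))
    return results


resume = [
    "Phone 13800000000",
    "Graduated from Nanjing University with a bachelor degree",
    "Worked at Acme Company as an engineer",
]
extracted = extract(resume)
```

Here the span labeler proposes "Nanjing University" as both a school and an employer, and the block filter discards the employer reading because the sentence sits in the education block. This is one way a permissive labeler plus block filtering could raise recall without a large loss of accuracy.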

Description

Technical field

[0001] The invention relates to a resume information extraction method based on cascading sequence annotation.

Background

[0002] The key information extracted from a resume falls into four categories: attribute information, education experience, work experience, and project experience. Attribute information includes name, date of birth, gender, telephone number, highest education level, place of origin, city and county of residence, and political status. Education experience includes graduating institution, degree, and graduation date. Work experience includes work unit, work content, position, and work period. Project experience includes project name, project responsibility, and project period. Among these 18 types of information, work content and project responsibility are extracted at the key-sentence level, while the other attributes are extracted as relatively short text fragments.

[0003] The current information extraction technology is only aimed at ...
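The four categories and 18 field types listed in paragraph [0002] can be written down as a small schema, which also records which fields are extracted at the sentence level. The identifier names below are hypothetical English renderings chosen for this sketch.

```python
# The 18 field types grouped by category, following paragraph [0002].
SCHEMA = {
    "attribute_information": [
        "name", "date_of_birth", "gender", "telephone_number",
        "highest_education_level", "place_of_origin",
        "city_and_county_of_residence", "political_status",
    ],
    "education_experience": [
        "graduating_institution", "degree", "graduation_date",
    ],
    "work_experience": [
        "work_unit", "work_content", "position", "work_period",
    ],
    "project_experience": [
        "project_name", "project_responsibility", "project_period",
    ],
}

# Fields extracted as key sentences; every other field is a short fragment.
SENTENCE_LEVEL = {"work_content", "project_responsibility"}


def granularity(field):
    """Return the extraction granularity for a field name in SCHEMA."""
    all_fields = {f for fields in SCHEMA.values() for f in fields}
    if field not in all_fields:
        raise KeyError(field)
    return "sentence" if field in SENTENCE_LEVEL else "fragment"


total_fields = sum(len(fields) for fields in SCHEMA.values())
```

A table like this lets the two labeling layers share one source of truth about which fields each layer is responsible for.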


Application Information


IPC(8): G06F16/33, G06F16/35, G06F40/205, G06F40/289, G06K9/62

CPC: G06F16/3344, G06F16/35, G06F40/205, G06F40/289, G06F18/214

Inventor: 徐建, 郭培胜, 徐琳, 李晓冬

Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP
