SVM text classification-based accurate resume parsing method

A resume parsing method based on SVM text classification, applied in the field of accurate resume parsing. It addresses problems such as missing information in parsing results, loss of useful content, and parsing errors, and achieves high section-splitting accuracy while avoiding information loss and parsing errors.

Active Publication Date: 2017-11-24
INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 5; Cited by: 21

AI Technical Summary

Problems solved by technology

There are generally two major drawbacks to this approach. First, if a resume keyword appears in the middle of a large block of text, the method breaks the original content apart at that point; its parsing result is then completely wrong and information is missing from the output, so the algorithm is not robust.
Second, because all files are converted into plain text on upload, the original formatting of the resume is lost, and with it much useful content information.
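To make the first drawback concrete, here is a minimal Python sketch of the prior-art approach criticized here: plain text, a dictionary of field names, and a fixed-length window returned after each keyword. The keyword list, the 40-character window, and the name `naive_parse` are hypothetical, chosen only for illustration.

```python
# Hypothetical field-name dictionary and window length, for illustration only.
FIELD_KEYWORDS = ["Work Experience", "Education", "Skills"]
WINDOW = 40  # characters returned after each keyword


def naive_parse(text, keywords=FIELD_KEYWORDS, window=WINDOW):
    """Prior-art style parsing: find each keyword in the plain text and return
    the fixed-length run of characters that follows it as that field's value."""
    result = {}
    for keyword in keywords:
        pos = text.find(keyword)
        if pos != -1:
            start = pos + len(keyword)
            result[keyword] = text[start:start + window].strip()
    return result


text = ("Work Experience: Led the team that built the Skills Hub training portal. "
        "Education: BSc, 2015.")
for field, value in naive_parse(text).items():
    print(f"{field!r}: {value!r}")
```

Running it shows both failure modes: "Skills", which appears only inside a job description, is reported as a field whose value spills into the Education line, and the Work Experience value is cut off at "Skills Hub".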

Detailed Description of Embodiments

[0030] The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0031] As shown in Figure 1, the implementation steps of the present invention are as follows:

[0032] Each part of the method is introduced below:

[0033] ● Resume format conversion technology

[0034] To avoid the problems caused by current parsing techniques that rely purely on pattern matching, the present invention first cuts the uploaded resume into large sections. A typical resume is divided into several basic modules, such as basic information, education experience, work experience, and project experience, and the titles of these modules generally differ from the body content in font, font size, or color. These differences are reflected in the XML version of the resume: the XML file attaches a label to each line of the document, and the label records the font, font size, color, etc., which can be used ...
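The excerpt is cut off before it lists the exact features, but it states that each line's label carries font, font size, and color. The following is a minimal sketch of turning such labels into per-line feature vectors. It assumes an XML layout resembling the output of poppler's `pdftohtml -xml` (a `fontspec` element per font and a `text` element per line); the patent does not name this tool. The chosen features (size, bold, non-black color, character count) and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed input: pdftohtml-style XML with one <text> element per resume line.
SAMPLE_XML = """<pdf2xml>
<page number="1">
  <fontspec id="0" size="16" family="Arial" color="#1f4e79"/>
  <fontspec id="1" size="10" family="Arial" color="#000000"/>
  <text top="100" left="50" width="120" height="18" font="0"><b>Education</b></text>
  <text top="130" left="50" width="300" height="12" font="1">BSc in Computer Science, 2015</text>
</page>
</pdf2xml>"""


def extract_line_features(xml_text):
    """Return (line text, feature vector) pairs in document order.

    Hypothetical features: [font size, bold flag, non-black color flag, character count].
    """
    root = ET.fromstring(xml_text)
    # Map each font id to its size and color (font ids are shared across pages).
    fonts = {}
    for spec in root.iter("fontspec"):
        fonts[spec.get("id")] = (float(spec.get("size")), spec.get("color", "#000000").lower())
    lines = []
    for node in root.iter("text"):
        content = "".join(node.itertext()).strip()
        if not content:
            continue  # skip empty layout lines
        size, color = fonts.get(node.get("font"), (0.0, "#000000"))
        bold = 1.0 if node.find("b") is not None else 0.0
        colored = 0.0 if color == "#000000" else 1.0
        lines.append((content, [size, bold, colored, float(len(content))]))
    return lines


for content, features in extract_line_features(SAMPLE_XML):
    print(content, features)
```

In this toy input the title line differs from the body line in every feature, which is the kind of contrast the paragraph above relies on.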

Abstract

The invention relates to an accurate resume parsing method based on SVM text classification, comprising the following steps: (1) Microsoft Office is driven under the .NET Framework to convert resume files of all formats into PDF, and the PDF files are then converted into XML; (2) the label of every resume text line in the XML is extracted and a corresponding feature vector is generated; (3) every resume text line is annotated, and an SVM is trained for classification on the annotated values and the feature vectors of the text lines, yielding a classifier; (4) each resume is cut into sections with the obtained classifier and the extracted information is parsed section by section, so that every resume can be parsed accurately.
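Steps (3) and (4) can be sketched as follows. The abstract only says that an SVM is trained on annotated lines; the solver here is a plain dual coordinate descent for a linear SVM, in the style popularized by LIBLINEAR. The feature layout, the label convention (+1 for a section title, -1 for body text), `C`, the epoch count, the helper names, and the toy data are all hypothetical. A production system would more likely call an SVM library.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, C=10.0, epochs=100):
    """Train a linear SVM (L2-regularised hinge loss) by dual coordinate descent.

    X: list of feature vectors; y: labels, +1 for a title line, -1 for a body line.
    A constant 1.0 is appended to every vector so the bias is learned as a weight.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    alpha = [0.0] * len(data)            # dual variables, each kept in [0, C]
    q = [dot(x, x) for x in data]        # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, (x, label) in enumerate(zip(data, y)):
            g = label * dot(w, x) - 1.0  # dual gradient for alpha[i]
            # Skip coordinates sitting at a bound where the gradient points outward.
            if (alpha[i] == 0.0 and g >= 0.0) or (alpha[i] == C and g <= 0.0):
                continue
            old = alpha[i]
            alpha[i] = min(max(old - g / q[i], 0.0), C)
            step = (alpha[i] - old) * label
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w


def predict(w, features):
    """Return +1 (title) or -1 (body) for one line's feature vector."""
    return 1 if dot(w, list(features) + [1.0]) > 0.0 else -1


def segment(lines, w):
    """Cut a resume into sections: each predicted title line opens a new section.

    lines: (text, features) pairs in reading order. Lines before the first title
    (e.g. name and contact details) go into a leading section with title None.
    """
    sections = [{"title": None, "body": []}]
    for text, features in lines:
        if predict(w, features) == 1:
            sections.append({"title": text, "body": []})
        else:
            sections[-1]["body"].append(text)
    if not sections[0]["body"]:
        sections.pop(0)
    return sections


# Toy training data: [relative font size, bold, colored, relative line length].
X = [[2.0, 1, 1, 0.1], [1.8, 1, 0, 0.2], [2.0, 1, 1, 0.15], [1.8, 0, 1, 0.1],
     [1.0, 0, 0, 1.0], [1.0, 0, 0, 0.8], [1.0, 1, 0, 0.9], [0.9, 0, 0, 1.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)

resume = [("Zhang San  138-0000-0000", [1.0, 0, 0, 0.9]),
          ("Education", [2.0, 1, 1, 0.1]),
          ("BSc in Computer Science, 2015", [1.0, 0, 0, 0.8]),
          ("Work Experience", [2.0, 1, 1, 0.15]),
          ("Software engineer, 2015-2020", [1.0, 0, 0, 1.0])]
for section in segment(resume, w):
    print(section)
```

Keeping a leading untitled section is a design choice: it avoids discarding the basic-information lines that usually sit above the first module title, which matters given the method's goal of avoiding information loss.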

Description

Technical Field

[0001] The invention relates to an accurate resume parsing method based on SVM text classification. It integrates several technologies: natural language processing, pattern recognition, AC (Aho-Corasick) automaton search, and operation of Microsoft Word under .NET.

Background Art

[0002] At present, the common approach to parsing uploaded resumes on human resource websites is as follows: the files uploaded by users are converted into plain text, a dictionary of the field names to be parsed is compiled, and the words in this dictionary are looked up in the resume; when a word is found, the content within a certain range after it is returned as the parsing result for that field. There are generally two major drawbacks to this approach: first, if a resume keyword appears in the middle of a large block of text, it splits the original content, and the parsing result of this method wi...
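The description names AC (Aho-Corasick) automaton search among the combined techniques, but the excerpt does not say exactly where it is applied. One plausible use, assumed here, is locating many field keywords (phone, email, and so on) in a single pass over a section's text. The class below is a standard Aho-Corasick automaton; the class name and keyword list are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern string search: builds a trie with failure links once,
    then finds all occurrences of all patterns in one left-to-right scan."""

    def __init__(self, patterns):
        self.goto = [{}]   # trie edges for each state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns recognized on reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build(self):
        # Breadth-first, so a state's failure target is finished before the state.
        queue = deque(self.goto[0].values())  # depth-1 states keep fail = 0
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the longest proper suffix state.
                self.out[nxt] += self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start index, pattern) for every match, ordered by end index."""
        state = 0
        hits = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


ac = AhoCorasick(["phone", "email", "mail"])
print(ac.search("phone: 123, email: x@y.com"))
# [(0, 'phone'), (12, 'email'), (13, 'mail')]
```

Unlike repeated `str.find` calls, the scan cost does not grow with the number of keywords, and overlapping matches such as "email" and "mail" are both reported.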

Application Information

Patent type & authority: Application (China)
IPC(8): G06K9/00; G06F17/30; G06Q10/10
CPC: G06F16/35; G06Q10/1053; G06V30/414
Inventors: Bi Xiang (毕翔), Xue Yunzhi (薛云志), Liu Zhangyu (刘张宇)
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI