Paper fragmentation information extraction method based on machine learning
An information extraction and machine learning technology, applied in the field of information extraction, that can solve problems such as poor extraction results and the large number of complex academic paper formats.
Embodiment Construction
[0022] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.
[0023] As shown in Figure 1, the paper fragmentation information extraction method based on machine learning includes the following steps:
[0024] Step 10: use XPDF to extract the text content, images and tables from the PDF, and save them in XML format;
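The source does not give the XML layout produced in Step 10. The sketch below covers only the "save as XML" part. It assumes that XPDF's command-line tools (for example `pdftotext` and `pdfimages`) have already pulled out each page's text blocks, image file paths and table cells. The element names `paper`, `page`, `text`, `image`, `table`, `row` and `cell`, along with the input dictionary shape, are hypothetical illustrations and not the patent's schema.

```python
import xml.etree.ElementTree as ET

def build_paper_xml(pages):
    """Serialize already-extracted page content into one XML string.

    Hypothetical input: a list of dicts, one per page, with optional keys
    'text' (list of str), 'images' (list of file paths) and
    'tables' (list of tables, each a list of rows of cell strings).
    """
    root = ET.Element("paper")
    for number, page in enumerate(pages, start=1):
        page_el = ET.SubElement(root, "page", number=str(number))
        # Text blocks, in reading order as extracted
        for paragraph in page.get("text", []):
            ET.SubElement(page_el, "text").text = paragraph
        # Images are referenced by the file path the extractor wrote them to
        for path in page.get("images", []):
            ET.SubElement(page_el, "image", src=path)
        # Tables are stored row by row, cell by cell
        for table in page.get("tables", []):
            table_el = ET.SubElement(page_el, "table")
            for row in table:
                row_el = ET.SubElement(table_el, "row")
                for cell in row:
                    ET.SubElement(row_el, "cell").text = cell
    return ET.toString(root, encoding="unicode") if False else ET.tostring(root, encoding="unicode")
```

Keeping text, images and tables as sibling elements under each `page` preserves their page of origin. Later fragmentation steps can use that position to work out which figure or table belongs with which passage.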
[0025] First, the documents are unified into the library. Files in Word, PPT, PDF and other formats are all converted to PDF, so that every PDF in the database can then be converted to XML in the same way. Figure 2 shows the unified structure of the database. In this structure, the unique identifier attribute is the unique identifier of each article, the title is the title of each article, and the fragmentation task status records the identification status of the article. This algorithm mainly...
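The library described above can be sketched as a single table with the three attributes named in the text: unique identifier, title and fragmentation task status. The sketch below uses SQLite under several assumptions. The table name `articles`, the column names, and the status values `pending` and `done` are hypothetical, because the source lists no concrete values. The set of extensions that need conversion to PDF is also an assumption based on the "Word, PPT" formats mentioned.

```python
import sqlite3
from pathlib import Path

# Hypothetical status values for the fragmentation task (not given in the source)
PENDING, DONE = "pending", "done"

# Assumed Word/PowerPoint extensions that must be converted to PDF first
CONVERT_TO_PDF = {".doc", ".docx", ".ppt", ".pptx"}

def needs_pdf_conversion(filename):
    """Return True if the file must be converted to PDF before XML extraction."""
    return Path(filename).suffix.lower() in CONVERT_TO_PDF

def create_library(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id TEXT PRIMARY KEY,      -- unique identifier of each article
               title TEXT NOT NULL,      -- title of the article
               frag_status TEXT NOT NULL -- status of the fragmentation task
           )"""
    )

def add_article(conn, article_id, title):
    # New articles start with the fragmentation task not yet done
    conn.execute(
        "INSERT INTO articles (id, title, frag_status) VALUES (?, ?, ?)",
        (article_id, title, PENDING),
    )

def mark_fragmented(conn, article_id):
    conn.execute("UPDATE articles SET frag_status = ? WHERE id = ?", (DONE, article_id))

def pending_articles(conn):
    # Articles still waiting for fragmentation, in identifier order
    rows = conn.execute(
        "SELECT id FROM articles WHERE frag_status = ? ORDER BY id", (PENDING,)
    )
    return [row[0] for row in rows]
```

For example, after `add_article(conn, "a1", "Paper A")`, `add_article(conn, "a2", "Paper B")` and `mark_fragmented(conn, "a1")`, calling `pending_articles(conn)` returns `["a2"]`. A batch job could then convert and fragment exactly those articles.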