A Machine Learning-Based Method for Structuring Tabular Data

A machine learning technique for tabular data, applicable to structured data retrieval, unstructured text data retrieval, database management systems, and related fields. It addresses problems such as header misjudgment and failure to recognize table headers, and enables structured processing of table data.

Active Publication Date: 2021-04-06
南京烽火星空通信发展有限公司

AI Technical Summary

Problems solved by technology

[0016] This prior method also has problems. Vertical headers and tables that contain multiple headers are still misjudged, and headers cannot be recognized in tables whose data is not known in advance.



Detailed Description of Embodiments

[0053] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0054] The present invention provides a machine learning-based method for structuring table data, used to structure the data items in a spreadsheet to be processed. In practical application, the following steps A to J are performed.

[0055] Step A. Across a preset number of sample spreadsheets, count the objects in each cell to obtain each object and its corresponding count, and build a dictionary table. Then update the dictionary table using steps I to II below, and proceed to step B.

[0056] Step I. Obtain the maximum of the counts corresponding to the objects in the dictionary table, then proceed to step II.

[0057] Step II. For each object in the dictionary table, perform steps II-1 to II-2 below to update the count corresponding to that object...
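Step A and step I can be illustrated with a minimal Python sketch. It assumes that an "object" is the stripped text of a non-empty cell and that each sample spreadsheet is already loaded as a list of rows; the names `build_dictionary_table` and `max_count` are hypothetical. The update in steps II-1 to II-2 is truncated in this excerpt and is deliberately not reproduced.

```python
from collections import Counter


def build_dictionary_table(sample_sheets):
    """Count how often each cell object appears across all sample sheets.

    Assumption: an object is the stripped string form of a non-empty cell.
    """
    table = Counter()
    for sheet in sample_sheets:
        for row in sheet:
            for cell in row:
                text = "" if cell is None else str(cell).strip()
                if text:
                    table[text] += 1
    return table


def max_count(table):
    """Step I: the largest count in the dictionary table (0 if it is empty)."""
    return max(table.values(), default=0)


# Two tiny sample sheets, each given as a list of rows.
samples = [
    [["name", "age"], ["Alice", 30], ["Bob", 25]],
    [["name", "city"], ["Carol", "Nanjing"]],
]
dictionary_table = build_dictionary_table(samples)
largest = max_count(dictionary_table)
```

In this toy input, "name" appears in both sheets, so it receives the highest count, which is the kind of signal the later scoring steps rely on.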


Abstract

The invention relates to a machine learning-based method for structuring table data. It counts the objects in each cell across a large number of sample spreadsheets to form a dictionary table. It then combines the number of occurrences of each cell's object in the spreadsheet to be processed with that object's count in the dictionary table to obtain a score for each cell. Taking the cell score as the basic unit, the method identifies the header row or header column, thereby obtaining each header item, and then extracts and structures the data items based on the header items. This overcomes the shortcomings of the prior art, namely reliance on rules, recognition of horizontal headers only, and failure to recognize multiple headers, and structures spreadsheet data accurately and efficiently.
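The flow described in the abstract can be sketched in Python as follows. The scoring formula is not given in this excerpt, so the one used here (dictionary count divided by occurrences in the current sheet) is purely an assumption for illustration, as are the function names and the rule of comparing the best row mean against the best column mean. The sketch also assumes a rectangular, non-empty sheet.

```python
from collections import Counter


def cell_key(cell):
    """Assumed normalization: stripped string form, empty for None."""
    return "" if cell is None else str(cell).strip()


def score_cells(sheet, dictionary_table):
    """Score each cell. ASSUMED formula: dictionary count / occurrences in this sheet."""
    occurrences = Counter(cell_key(c) for row in sheet for c in row if cell_key(c))
    return [
        [dictionary_table.get(cell_key(c), 0) / occurrences[cell_key(c)]
         if cell_key(c) else 0.0
         for c in row]
        for row in sheet
    ]


def find_header(scores):
    """Pick the row or column with the highest mean score (assumed rule)."""
    row_means = [sum(r) / len(r) for r in scores]
    col_means = [sum(c) / len(c) for c in zip(*scores)]
    best_row = max(range(len(row_means)), key=row_means.__getitem__)
    best_col = max(range(len(col_means)), key=col_means.__getitem__)
    if row_means[best_row] >= col_means[best_col]:
        return "row", best_row
    return "column", best_col


def structure(sheet, kind, index):
    """Turn the sheet into records keyed by the detected header items."""
    if kind == "column":
        sheet = [list(col) for col in zip(*sheet)]  # transpose vertical tables
    header = sheet[index]
    return [dict(zip(header, row)) for i, row in enumerate(sheet) if i != index]


# Hypothetical dictionary table and two layouts of the same data.
dictionary_table = {"name": 40, "age": 35, "city": 20}
horizontal = [["name", "age"], ["Alice", 30], ["Bob", 25]]
vertical = [["name", "Alice", "Bob"], ["age", 30, 25]]

h_kind, h_index = find_header(score_cells(horizontal, dictionary_table))
v_kind, v_index = find_header(score_cells(vertical, dictionary_table))
h_records = structure(horizontal, h_kind, h_index)
v_records = structure(vertical, v_kind, v_index)
```

Because scores are computed per cell, the same logic handles a header laid out as a row or as a column, which is the property the abstract contrasts with rule-based methods that only recognize horizontal headers.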

Description

Technical field

[0001] The invention relates to a machine learning-based method for structuring table data, and belongs to the technical field of table data structuring.

Background

[0002] The spreadsheet is the most commonly used computer software tool. In the prior art, for a Sheet (spreadsheet) whose content is unknown, the data item in each cell can only be read after the file is opened. The steps are as follows:

[0003] (1) Open the Excel file using the interface;

[0004] (2) Use the interface to read the Sheet in the Excel file;

[0005] (3) Use the interface to read the cells in the Sheet.

[0006] When the above method is executed, the meaning of each data item is not known, so the data cannot be structured. The meaning of a data item is described by the table header, so the data cannot be understood without knowing the header. Therefore, in order to complete the structuring of tabular data, some work u...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/25, G06F16/35, G06F16/36
CPC: G06F16/258, G06F16/35, G06F16/374
Inventor: 廖闻剑, 李曙光, 宋万军, 姜广栋, 杨万刚, 尹若成
Owner: 南京烽火星空通信发展有限公司