Table data structuring method based on machine learning

Machine learning technology applied to tabular data, in fields such as structured data retrieval, unstructured text data retrieval, and database management systems. It addresses problems such as misjudged and unrecognized headers, and achieves structured processing of table data.

Active Publication Date: 2019-10-22
南京烽火星空通信发展有限公司

AI Technical Summary

Problems solved by technology

[0016] This method also has problems. Vertical headers and multi-level headers (several headers nested within one header) can still be misjudged, and headers cannot be recognized in tables whose data is not known in advance.



Examples


Detailed Description of Embodiments

[0053] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0054] The present invention provides a table data structuring method based on machine learning, used to structure the data items in a spreadsheet to be processed. In practical application, the following Steps A to J are performed.

[0055] Step A. Across a preset number of sample spreadsheets, count the occurrences of the objects in each cell to obtain each object and its corresponding count, and build a dictionary table. Then update the dictionary table using Steps I to II below, and go to Step B.

[0056] Step I. Obtain the maximum value among the counts corresponding to the objects in the dictionary table, then go to Step II.

[0057] Step II. For each object in the dictionary table, perform the following Steps II-1 to II-2 to update the count corresponding to that object...
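
Step A and Step I can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: an "object" is assumed to be the trimmed text of a cell, each sample sheet is assumed to be a list of rows, and the names `build_dictionary_table` and `max_count` are hypothetical. The update rule of Step II is elided in the text above, so it is not reproduced here.

```python
from collections import Counter


def build_dictionary_table(sample_sheets):
    """Step A (sketch): count how often each cell object occurs in the samples.

    Assumption: an "object" is the stripped string content of a cell,
    and empty cells are ignored.
    """
    table = Counter()
    for sheet in sample_sheets:
        for row in sheet:
            for cell in row:
                text = "" if cell is None else str(cell).strip()
                if text:
                    table[text] += 1
    return table


def max_count(table):
    """Step I (sketch): the largest count held by any object in the table."""
    return max(table.values(), default=0)


# Two small hypothetical sample sheets.
samples = [
    [["Name", "Age"], ["Alice", 30], ["Bob", 25]],
    [["Name", "City"], ["Carol", "Nanjing"]],
]
dictionary_table = build_dictionary_table(samples)
print(dictionary_table["Name"], max_count(dictionary_table))
```

Header terms such as "Name" recur across sample sheets, while data values rarely do, which is why the counts can later help separate header cells from data cells.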



Abstract

The invention relates to a table data structuring method based on machine learning. The method includes: counting the objects in each cell across a large number of sample spreadsheets and constructing a dictionary table; obtaining a score for each cell in the spreadsheet to be processed, according to the number of times the object in that cell appears in the spreadsheet and the count of that object in the dictionary table; taking the score of each cell as the minimum unit and comparing rows and columns to identify the header rows or header columns in the spreadsheet to be processed; and obtaining the header items, then extracting and structuring the data items on the basis of the header items. This overcomes the defects of the prior art, which relies on rules, recognizes only horizontal headers, and cannot recognize multiple headers, and it achieves accurate and efficient structured processing of spreadsheet data.
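
The flow described in the abstract (score cells, compare rows and columns, then structure the data items) can be sketched roughly as below. The abstract does not state the scoring formula, so the formula used here (dictionary count divided by the number of occurrences in the current sheet) is purely an assumption for illustration, as are the function names `cell_scores`, `find_header`, and `structure`. The sketch also assumes a rectangular sheet with a single header row or column, so it does not cover the multi-header case.

```python
from collections import Counter


def cell_scores(sheet, dictionary_table):
    """Score each cell. ASSUMED formula: dictionary count / occurrences in sheet."""
    occurrences = Counter(str(c).strip() for row in sheet for c in row)
    return [
        [dictionary_table.get(str(c).strip(), 0) / occurrences[str(c).strip()]
         for c in row]
        for row in sheet
    ]


def find_header(scores):
    """Compare mean scores of rows and columns; return (orientation, index)."""
    row_means = [sum(r) / len(r) for r in scores]
    col_means = [sum(c) / len(c) for c in zip(*scores)]
    best_row = max(range(len(row_means)), key=row_means.__getitem__)
    best_col = max(range(len(col_means)), key=col_means.__getitem__)
    if row_means[best_row] >= col_means[best_col]:
        return "row", best_row
    return "column", best_col


def structure(sheet, orientation, index):
    """Turn the data items into records keyed by the header items."""
    # For a header column, transpose so the header becomes a row.
    rows = sheet if orientation == "row" else [list(c) for c in zip(*sheet)]
    header = rows[index]
    return [dict(zip(header, r)) for i, r in enumerate(rows) if i != index]


# Hypothetical dictionary table and sheets, for illustration only.
dictionary_table = {"Name": 40, "Age": 35, "City": 30}
horizontal = [["Name", "Age", "City"],
              ["Alice", 30, "Nanjing"],
              ["Bob", 25, "Wuhan"]]
vertical = [list(col) for col in zip(*horizontal)]

for sheet in (horizontal, vertical):
    orientation, index = find_header(cell_scores(sheet, dictionary_table))
    print(orientation, index, structure(sheet, orientation, index))
```

Because the comparison is made over both rows and columns, the same code handles a horizontal header and a vertical header without a layout-specific rule.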

Description

Technical field

[0001] The invention relates to a table data structuring method based on machine learning, and belongs to the technical field of table data structuring.

Background technique

[0002] The spreadsheet is the most commonly used computer software tool. In the prior art, for a Sheet (spreadsheet) whose content is unknown, the data item in each cell can only be read after the file is opened. The steps are as follows:

[0003] (1) Open the Excel file using the interface;

[0004] (2) Use the interface to read the Sheet in the Excel file;

[0005] (3) Use the interface to read the cells in the Sheet.

[0006] During the execution of the above method, the meaning of each data item is not known, so the data structuring cannot be completed. The meaning of a data item is described by the table header, so the data cannot be understood without knowing the header. Therefore, in order to complete the structuring of tabular data, some work u...


Application Information

IPC(8): G06F16/25, G06F16/35, G06F16/36
CPC: G06F16/258, G06F16/35, G06F16/374
Inventor: 廖闻剑, 李曙光, 宋万军, 姜广栋, 杨万刚, 尹若成
Owner 南京烽火星空通信发展有限公司