Data extraction method and system of CSV-format (comma-separated value format) files

A CSV-format file data extraction technology, applied in the fields of electrical digital data processing, special data processing applications, and natural language data processing, that achieves strong scalability with a simple, efficient, and easy-to-understand method and system.

Inactive Publication Date: 2018-08-14
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0006] 1. A cell in any column of a record may contain a line break.
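To illustrate why a line break inside a cell is a problem, the following minimal sketch compares naive line splitting with quote-aware parsing. It uses Python's standard `csv` module only as a reference point, not as the method of the invention, and the sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample: the second field of the first data record contains a line break.
sample = 'id,note\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: treat every physical line as one record.
naive_records = sample.splitlines()

# Quote-aware approach: the quoted line break stays inside the cell.
parsed_records = list(csv.reader(io.StringIO(sample, newline="")))

print(len(naive_records))   # physical lines
print(len(parsed_records))  # logical records
```

The naive split yields one more "record" than the file actually contains, because the quoted cell is cut in two.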


Examples


Embodiment Construction

[0039] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0040] As shown in Figure 1, the present invention provides a method for extracting data from a UTF-8 encoded CSV-format file, comprising the following steps: S110, traversing each character of the CSV-format file in order; S120, determining that the CSV-format file has been extracted successfully only when the character is the file end character and the quotation mark flag is closed. If the character is the file end cha...
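Steps S110 and S120 can be sketched as a small character-by-character scanner. This is a simplified illustration under stated assumptions, not the patented implementation: the input is assumed to be already decoded from UTF-8 into a Python string, a doubled quotation mark inside a quoted field is assumed to stand for a literal quote, and the names `extract_records` and `CsvFormatError` are hypothetical.

```python
class CsvFormatError(ValueError):
    """Raised when the input ends while a quoted field is still open."""


def extract_records(text):
    """Traverse the text character by character and return a list of records."""
    records, record, field = [], [], []
    in_quotes = False  # the quotation mark flag: True while a quoted field is open
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # doubled quote: escaped literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and line breaks are ordinary data here
        elif ch == '"':
            in_quotes = True
        elif ch == ',':
            record.append(''.join(field))
            field = []
        elif ch == '\n' or ch == '\r':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1  # treat CRLF as a single record separator
            record.append(''.join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    # End of input reached: success only if the quotation mark flag is closed.
    if in_quotes:
        raise CsvFormatError("file ended inside a quoted field")
    if field or record:
        record.append(''.join(field))  # last record without a trailing line break
        records.append(record)
    return records


rows = extract_records('a,"b\nc"\r\nd,"e ""q"""\n')
```

Keeping a single flag for the quote state means the end-of-file check is the only place where success or a format error is decided, which mirrors the condition stated in S120.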


Abstract

The invention discloses a data extraction method and system of CSV-format (comma-separated value format) files. The method comprises the steps of traversing each character of a CSV-format file in sequence; confirming that the CSV-format file is successfully extracted only when a file end character occurs and a quotation mark identifier is closed; confirming that the format of the CSV-format file is incorrect if the file end character occurs but the quotation mark identifier is not closed. The method and system are simple, efficient and easily comprehensible; various special conditions are centrally handled in several steps, and the method and system are convenient to apply to the development of various software systems, may be applied across programming languages, and are extensible to other text coding formats.

Description

Technical field

[0001] The invention relates to the technical field of domain name hosting, and in particular to a method and system for extracting data from CSV-format files based on UTF-8 encoding.

Background

[0002] In the field of domain name hosting, registrars transmit CSV files to the registry, and the registry needs to extract and verify the field values of these CSV files; the file encoding is specified as UTF-8. UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. It was created in 1992 by Ken Thompson and has since been standardized as RFC 3629. UTF-8 encodes each Unicode character with 1 to 4 bytes. When used on a web page, it allows Simplified and Traditional Chinese characters and other languages (such as English, Japanese, and Korean) to be displayed on the same page.

[0003] Comma-Separated Values (CSV) files store tabular data (numbers and text) in ...
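The 1-to-4-byte property of UTF-8 mentioned in [0002] can be checked with a short sketch. The sample characters and the helper name `utf8_length` are illustrative choices, not taken from the patent.

```python
def utf8_length(ch):
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Sample characters: Latin letter, accented letter, CJK ideograph, emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]

# Every byte of a multi-byte sequence has its high bit set, so it can never
# be confused with an ASCII comma (0x2C) or quotation mark (0x22).
multi_byte_is_non_ascii = all(b >= 0x80 for b in "\u4e2d".encode("utf-8"))

print(lengths)
```

This is why a scanner that looks for commas, quotation marks, and line breaks can work safely on UTF-8 text: those delimiters are single ASCII bytes that never appear inside a multi-byte character.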


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22
CPC: G06F40/126
Inventor: 黄滟鸿, 熊家文, 史建琦, 何积丰, 李昂
Owner: EAST CHINA NORMAL UNIV