Data extraction method and device, server and storage medium

A data extraction and data value technology in the computer field. It addresses low data extraction efficiency and a complex data extraction process, simplifying the extraction and storage of data and improving the efficiency and accuracy of data extraction.

Status: Inactive; publication date: 2019-07-19
BEIJING BANGCLE TECH CO LTD
Cites: 4; cited by: 0

AI Technical Summary

Problems solved by technology

[0005] For the diverse raw data in JSON format collected by crawlers, the matching rules of regular expressions are complicated, so the extraction process is complex, extraction efficiency is low, and the error rate is high.
The JSONPath method can batch-extract only data that share the same name across multiple levels of the JSON data; it cannot batch-extract data with different names, so its extraction efficiency is also low.
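
To make the JSONPath limitation concrete, the following is a minimal sketch (plain Python, not a JSONPath library) of what a recursive-descent query such as `$..price` does: it walks every level of the JSON document and collects all values stored under one key name. The function `collect_by_name` and the sample record are hypothetical illustrations. Gathering fields with different names requires a separate query per name.

```python
import json
from typing import Any, List


def collect_by_name(node: Any, name: str) -> List[Any]:
    """Collect every value stored under `name` at any depth (like `$..name`)."""
    found: List[Any] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                found.append(value)
            # Keep descending into the value whether or not the key matched.
            found.extend(collect_by_name(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_by_name(item, name))
    return found


# Hypothetical crawler record: the name "price" appears at several levels.
raw = json.loads("""
{
  "price": 10,
  "shop": {"title": "A", "price": 12},
  "items": [{"price": 3, "sku": "x1"}, {"price": 4, "sku": "x2"}]
}
""")

print(collect_by_name(raw, "price"))  # [10, 12, 3, 4]: one query, one name
# Differently named fields each need their own query:
print(collect_by_name(raw, "title"), collect_by_name(raw, "sku"))  # ['A'] ['x1', 'x2']
```

One query handles one name across all depths; extracting `price`, `title`, and `sku` together still takes three queries, one per name.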

Embodiment Construction

[0025] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant application and do not limit it. Note also that, for ease of description, the drawings show only the parts relevant to the application.

[0026] Where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other. The application is described in detail below with reference to the accompanying drawings and embodiments.

[0027] It can be understood that data collected by crawlers is usually raw data in JSON format (KEY-VALUE pairs). However, only part of the raw data is valuable, and only part of that valuable data needs to be extracted, and corr...
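
Because the raw data is a nest of KEY-VALUE pairs, it can help to view it as a flat set of (name, value) pairs. The sketch below does this under an assumption the text does not state: nested names are joined into dotted paths, with list indices included (for example `data.goods.0.price`). The helper `flatten` and the sample record are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf of parsed JSON to a dotted path such as "data.goods.0.price"."""
    if isinstance(node, dict):
        children: Iterable[Tuple[str, Any]] = ((str(k), v) for k, v in node.items())
    elif isinstance(node, list):
        children = ((str(i), v) for i, v in enumerate(node))
    else:
        # Leaf value (string, number, bool, null): record it under the path so far.
        return {prefix: node}
    pairs: Dict[str, Any] = {}
    for key, value in children:
        path = f"{prefix}.{key}" if prefix else key
        pairs.update(flatten(value, path))
    return pairs


# Hypothetical crawler record: only some of these pairs are worth keeping.
raw = json.loads('{"code": 0, "data": {"name": "Widget", "goods": [{"price": 9.5}]}}')
for name, value in flatten(raw).items():
    print(name, "=", value)
# code = 0
# data.name = Widget
# data.goods.0.price = 9.5
```

Empty objects and arrays contribute no pairs, and dotted paths become ambiguous if a key itself contains a dot; a production version would need an escaping rule or tuple-valued paths.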

Abstract

The invention discloses a data extraction method and device, a server, and a storage medium. The method comprises: obtaining original data and a correspondence list between original data names and target data names, where the original data comprises the original data names and the data values corresponding to them; extracting, based on the correspondence list, the data values corresponding to the original data names from the original data; and storing the extracted data values, each stored data value serving as the target data corresponding to its target data name. In the data extraction method and device provided by the embodiments of the invention, the correspondence list between original data names and target data names enables batch extraction of the target data, improving both the efficiency and the accuracy of data extraction.

Description

[0001] Technical field

[0002] The present application relates generally to the field of computer technology, and specifically to a data extraction method, device, server, and storage medium.

Background art

[0003] When data is extracted from web pages, crawlers are used to extract, clean, and store useful target data from raw data in a structured manner. Commonly used extraction methods include regular expressions and JSONPath.

[0004] A regular expression is a logical formula for string operations: predefined special characters, and combinations of them, form a "rule string" that expresses the filtering logic applied to strings. By regular matching with added matching rules, the data to be extracted can be matched. JSONPath is an information extraction class library, a tool for extracting specified information from JavaScript Object Notation (JSON), and extracts JSON data in m...
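
For comparison with the regular-expression approach described in the background, the sketch below pulls values out of raw JSON text with Python's standard `re` module. The patterns and sample text are hypothetical. Each field needs its own hand-written rule string, and a naive rule breaks on variations such as escaped quotes, which illustrates why regex extraction of JSON is complicated and error-prone.

```python
import json
import re

# Hypothetical raw record; the title contains escaped quotes.
raw_text = '{"title": "Widget \\"Pro\\"", "price": 9.5}'

# One hand-written "rule string" per field.
title_rule = re.compile(r'"title"\s*:\s*"([^"]*)"')
price_rule = re.compile(r'"price"\s*:\s*(-?\d+(?:\.\d+)?)')

title_by_regex = title_rule.search(raw_text).group(1)
price_by_regex = float(price_rule.search(raw_text).group(1))  # text -> number by hand

# A JSON parser handles escaping and value types itself.
parsed = json.loads(raw_text)

print(repr(title_by_regex))   # 'Widget \\' (the naive rule stops at the escaped quote)
print(repr(parsed["title"]))  # 'Widget "Pro"'
print(price_by_regex == parsed["price"])  # True
```

The numeric field survives only because the rule was written for that exact format; the string field is silently truncated, an error that would pass into the stored data unnoticed.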

Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/951
Inventors: 阚志刚, 陈彪, 赵震, 邓凌峰, 吴杨, 彭文波
Owner: BEIJING BANGCLE TECH CO LTD