Big data cleaning method used for digital library

The invention applies big data technology to digital libraries in the field of data cleaning. It addresses error-prone processing, low efficiency, and incorrect attribute modification, and it aims to ensure correctness and accuracy, reduce cleaning costs, and reduce errors.

Status: Inactive; Publication Date: 2018-08-14
安徽千云度信息技术有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a big data cleaning method for digital libraries. By defining a data cleaning scheme, preprocessing data, filling missing values, repairing inconsistent data, correcting attribute errors, cleaning duplicate data, and returning the cleaned data to the target data source, the method addresses the following problems of existing database systems: during batch data cleaning the user cannot actively participate, and there is no interaction with the user; the user cannot control the process or handle exceptions that arise in it; the process is error-prone and inefficient; traditional systems struggle to meet user needs; and equipment costs are too high.




Detailed Description of Embodiments

[0018] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0019] As shown in Figure 1, the present invention is a big data cleaning method for digital libraries, applied to library big data. The method cleans library big data in the following steps: SS01, perform requirements analysis, big data category analysis, and task definition to obtain a data cleaning plan; SS02, preprocess the data, detect incomplete data, logically erroneous data, abnormal data, and redundant data, and compile statistics on the detection results; SS03 dete...
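The detection pass described in step SS02 could be sketched roughly as follows. This is an illustration only, not the patented implementation: the record fields (`isbn`, `reader_id`, `title`, `borrowed`, `returned`, `loan_days`), the 365-day threshold, and the function names `classify` and `detect` are all assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical required fields for a loan record (assumption, not from the patent).
REQUIRED_FIELDS = ("isbn", "title", "borrowed")


def classify(record, seen_keys):
    """Return the set of dirty-data labels that apply to one record."""
    labels = set()
    # Incomplete data: a required field is missing or empty.
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        labels.add("incomplete")
    # Logical error: the book was returned before it was borrowed.
    borrowed, returned = record.get("borrowed"), record.get("returned")
    if borrowed and returned and returned < borrowed:
        labels.add("logical_error")
    # Abnormal data: loan length outside an assumed plausible range.
    days = record.get("loan_days")
    if days is not None and not 0 <= days <= 365:
        labels.add("abnormal")
    # Redundant data: the same loan key was already seen.
    key = (record.get("isbn"), record.get("reader_id"), borrowed)
    if key in seen_keys:
        labels.add("redundant")
    seen_keys.add(key)
    return labels


def detect(records):
    """Count how many records fall under each label; unlabeled ones are 'clean'."""
    stats = Counter()
    seen = set()
    for record in records:
        stats.update(classify(record, seen) or {"clean"})
    return stats


SAMPLE = [
    {"isbn": "978-0", "reader_id": 1, "title": "A",
     "borrowed": date(2018, 1, 1), "returned": date(2018, 1, 10), "loan_days": 9},
    {"isbn": "978-0", "reader_id": 1, "title": "A",   # same loan again
     "borrowed": date(2018, 1, 1), "returned": date(2018, 1, 10), "loan_days": 9},
    {"isbn": "978-1", "reader_id": 2, "title": "",    # empty title
     "borrowed": date(2018, 2, 1), "returned": date(2018, 2, 5), "loan_days": 4},
    {"isbn": "978-2", "reader_id": 3, "title": "C",   # returned before borrowed
     "borrowed": date(2018, 3, 10), "returned": date(2018, 3, 1), "loan_days": 4000},
]

if __name__ == "__main__":
    print(detect(SAMPLE))
```

A record can carry more than one label, so the counts may sum to more than the number of records; the per-label totals are what a user would review before choosing a cleaning scheme.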


Abstract

The invention discloses a big data cleaning method used for a digital library, and relates to the technical field of data cleaning. The method comprises the following steps of defining data cleaning schemes; preprocessing data; determining the type of dirty data and the corresponding cleaning scheme; performing missing value filling; performing inconsistent data restoration; automatically detecting attribute errors in a data set; cleaning repeated data; and enabling clean data to flow back to a target data source. By defining the data cleaning schemes by a user, preprocessing the data, performing the missing value filling, performing the inconsistent data restoration, modifying the attribute errors, cleaning the repeated data and enabling the clean data to flow back to the target data source, the interaction between the data and the user in the cleaning process is improved; the user controls the cleaning process in real time; and exceptions in the cleaning process are processed, so that the errors are reduced, the efficiency is improved, the data cleaning cost is reduced, the data quality is improved, the correctness and accuracy of data mining are ensured, and a high-quality mining result is obtained.
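As a rough illustration of how a user-defined scheme might drive the later steps (missing value filling, inconsistent data repair, duplicate cleaning, and write-back to the target data source), consider the sketch below. The scheme layout (`defaults`, `canonical`, `key`), the `on_error` callback, and the function name `run_cleaning` are hypothetical; the attribute error detection step is omitted, and a plain list stands in for the target data source.

```python
def run_cleaning(records, scheme, target, on_error=None):
    """Apply a user-defined scheme record by record and append clean rows to target.

    on_error(record, exc) lets the user decide how to handle an exception:
    returning True skips the record, anything else re-raises.
    """
    seen = set()
    for record in records:
        try:
            row = dict(record)
            # Missing value filling: use the scheme's default for empty fields.
            for field, default in scheme["defaults"].items():
                if row.get(field) in (None, ""):
                    row[field] = default
            # Inconsistent data repair: map variant spellings to one canonical value.
            for field, mapping in scheme["canonical"].items():
                value = row[field]
                row[field] = mapping.get(value, value)
            # Duplicate cleaning: keep only the first row for each key.
            key = tuple(row[f] for f in scheme["key"])
            if key in seen:
                continue
            seen.add(key)
            # Write-back: clean rows flow to the target data source.
            target.append(row)
        except KeyError as exc:
            if on_error is None or not on_error(record, exc):
                raise
    return target


if __name__ == "__main__":
    scheme = {
        "defaults": {"language": "unknown"},
        "canonical": {"language": {"English": "en", "EN": "en"}},
        "key": ("isbn",),
    }
    records = [
        {"isbn": "1", "language": "English"},
        {"isbn": "1", "language": "EN"},          # duplicate ISBN
        {"isbn": "2", "language": ""},            # missing language
        {"title": "no isbn", "language": "en"},   # raises KeyError on the key
    ]
    skipped = []
    clean = run_cleaning(records, scheme, [],
                         on_error=lambda r, e: skipped.append(r) or True)
    print(clean, skipped)
```

Passing the exception to a user-supplied callback is one simple way to model the user controlling the process and handling exceptions as they occur; a real system would more likely surface these through an interactive interface.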

Description

Technical field

[0001] The invention belongs to the technical field of data cleaning, and in particular relates to a big data cleaning method for digital libraries.

Background

[0002] With the advent of the big data era, demand for intelligent information has become more urgent, which poses new challenges for library management and services. With the help of the Internet and mobile devices, the abundant electronic books available online have gradually gained readers' favor, while traditional paper books have gradually been left behind. Library construction and management should therefore keep pace with the times, combining the library's own rich data resources with the high degree of sharing offered by the Internet to advance the construction of the library's database.

[0003] Big data mining and analysis can discover correlations in the data and derive the relationships and patterns that give big data its value. The library’s big data comes from a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2465, G06F16/215, G06F2216/03
Inventor: 杨良军
Owner: 安徽千云度信息技术有限公司