
A data cleaning and indexing method

A data cleaning and indexing technology, applied to data cleaning in a cloud platform multi-dimensional data fusion analysis system. It addresses poor retrieval results and wasted resources, and achieves good cleaning and conversion results, clear data attributes, and improved efficiency.

Active Publication Date: 2020-05-01
河南信安通信技术股份有限公司

AI Technical Summary

Problems solved by technology

However, most of the data comes from various platforms or is aggregated from multiple parties. Because the data has not been organized beforehand, it is all fragmented. Ordinary data collection, cleaning, and retrieval rely on ready-made tools that cannot be further improved or modified, so this data basically cannot be consolidated with existing retrieval tools. This causes considerable trouble and wasted resources for the enterprise: only by trying the existing retrieval methods can one discover that the content is unusable or that the retrieval results are poor, and that trial process is itself a waste of resources. Targeted improvement is therefore needed.



Embodiment Construction

[0022] The present invention will be further described below through specific embodiments.

[0023] A data cleaning and indexing method, characterized in that: the specific process is as follows:

[0024] 1) Data cleaning and importing:

[0025] For the cleaning of structured data: during data cleaning and importing, the portion of the existing RDBMS database that holds a small volume of key data is synchronized online, through the cleaning tool, to the distributed database cluster of the data center; the portion that holds a large volume of data is transferred to the distributed database cluster in the data center as files, or is called in real time through an interface.

[0026] The information entered on the WEB platform is directly synchronized to the distributed data cluster in the data center after being processed by the cleaning tool.
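The routing described in [0025] and [0026] can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the names `Source`, `Transfer`, `choose_transfer`, and `clean_record`, the `"rdbms"`/`"web"` labels, and the `SMALL_TABLE_ROWS` cutoff are all assumptions, since the text gives no numeric threshold and does not describe the cleaning tool's rules. The real-time interface call mentioned as an alternative for large data is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Transfer(Enum):
    ONLINE_SYNC = "online_sync"      # small volume of key data from the RDBMS
    FILE_TRANSFER = "file_transfer"  # large volume, shipped to the cluster as files
    DIRECT_SYNC = "direct_sync"      # records entered on the web platform


@dataclass
class Source:
    name: str
    origin: str     # "rdbms" or "web" (hypothetical labels)
    row_count: int


# Hypothetical cutoff; the source text states no numeric threshold.
SMALL_TABLE_ROWS = 100_000


def choose_transfer(src: Source) -> Transfer:
    """Pick how a cleaned source reaches the distributed cluster."""
    if src.origin == "web":
        return Transfer.DIRECT_SYNC
    if src.row_count <= SMALL_TABLE_ROWS:
        return Transfer.ONLINE_SYNC
    return Transfer.FILE_TRANSFER


def clean_record(record: dict) -> dict:
    """Stand-in for the cleaning tool: trim strings, drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned
```

Keeping the routing decision separate from the cleaning step means the same cleaning rules apply regardless of which transfer path a source takes.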

[0027] For the cleaning of unstructured data, related audio, video, pictures, etc. and other large files in each system are exchanged and pro...



Abstract

The invention discloses a data cleaning and indexing method, which belongs to the field of data retrieval. The specific process is: 1) data cleaning and importing, including the cleaning of structured data and the cleaning of unstructured data; 2) establishment of metadata: (1) metadata acquisition, in which different metadata from multiple sources in the data center is integrated through the metadata acquisition process, with a database serving as the metadata knowledge base for unified storage and management; (2) metadata publishing, in which a metadata release process is established to manage the release of metadata; (3) metadata access, in which a mechanism for granting and managing metadata access permissions is established to control legitimate users' access to metadata; 3) index construction: the data attributes on the distributed data cluster of the data center are obtained through metadata access. The data attributes of the invention are clearly defined, and structured and unstructured data are clearly distinguished. Selecting cleaning tools according to the data structure type achieves the best cleaning and conversion effect and greatly improves the efficiency of data integration and conversion.
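The three metadata steps and the index construction step can be sketched with an in-memory SQLite database standing in for the metadata knowledge base. Everything here is an assumption made for illustration: the table layout, the function names (`open_knowledge_base`, `acquire`, `publish`, `grant`, `read_attributes`, `build_index`), and the choice of an attribute-to-dataset mapping as the index. The abstract states only that data attributes are obtained through metadata access.

```python
import sqlite3
from collections import defaultdict


def open_knowledge_base() -> sqlite3.Connection:
    """Create the (hypothetical) metadata knowledge base schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadata (
            source TEXT, dataset TEXT, attribute TEXT, dtype TEXT,
            published INTEGER DEFAULT 0,
            PRIMARY KEY (source, dataset, attribute)
        );
        CREATE TABLE grants (
            grantee TEXT, dataset TEXT,
            PRIMARY KEY (grantee, dataset)
        );
        """
    )
    return conn


def acquire(conn, source: str, dataset: str, attributes: dict) -> None:
    """Metadata acquisition: merge one source's attributes into the store."""
    conn.executemany(
        "INSERT OR REPLACE INTO metadata (source, dataset, attribute, dtype) "
        "VALUES (?, ?, ?, ?)",
        [(source, dataset, name, dtype) for name, dtype in attributes.items()],
    )


def publish(conn, dataset: str) -> None:
    """Metadata publishing: mark a dataset's metadata as released."""
    conn.execute("UPDATE metadata SET published = 1 WHERE dataset = ?", (dataset,))


def grant(conn, grantee: str, dataset: str) -> None:
    """Record that a user may read a dataset's metadata."""
    conn.execute("INSERT OR IGNORE INTO grants VALUES (?, ?)", (grantee, dataset))


def read_attributes(conn, grantee: str, dataset: str) -> list:
    """Metadata access: only granted users see published attributes."""
    allowed = conn.execute(
        "SELECT 1 FROM grants WHERE grantee = ? AND dataset = ?",
        (grantee, dataset),
    ).fetchone()
    if allowed is None:
        raise PermissionError(f"{grantee} has no grant on {dataset}")
    return conn.execute(
        "SELECT attribute, dtype FROM metadata "
        "WHERE dataset = ? AND published = 1 ORDER BY attribute",
        (dataset,),
    ).fetchall()


def build_index(conn, grantee: str, datasets: list) -> dict:
    """Index construction: map each attribute name to the datasets carrying it."""
    index = defaultdict(list)
    for dataset in datasets:
        for attribute, _dtype in read_attributes(conn, grantee, dataset):
            if dataset not in index[attribute]:
                index[attribute].append(dataset)
    return dict(index)
```

Because `build_index` reads only through `read_attributes`, the index can never expose attributes that are unpublished or that the requesting user has not been granted.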

Description

Technical field

[0001] The invention belongs to the field of data retrieval, and in particular relates to a data cleaning and indexing method based on a cloud platform multi-dimensional data fusion analysis system.

Background

[0002] At present, enterprise platform construction relies mainly on multi-party data: manually imported data (in txt, Excel, csv, and other formats), synchronized data, data obtained in real time from other business systems, and so on. The data is analyzed and summarized along specific directions for later retrieval. As the enterprise scales up and its business volume keeps growing, the multi-party data it generates also increases sharply, and enterprises generally either apply traditional methods or purchase a ready-made retrieval method to perform data cleaning and retrieval. However, most of the data comes from various platforms or the...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06F16/22, G06F16/25, G06F16/28, G06F16/31
CPC: G06F16/215, G06F16/2228, G06F16/25, G06F16/284, G06F16/31
Inventor: 张国杰, 邵晓艳, 郭晓丽, 郭学明
Owner: 河南信安通信技术股份有限公司