Data cleaning and indexing method

A data cleaning and indexing technology, applied in text database indexing, structured data retrieval, and electronic digital data processing. It addresses wasted resources and the difficulty of summarizing and using fragmented data, and offers an intuitive interface and convenient operation.

Active Publication Date: 2017-05-03
河南信安通信技术股份有限公司

AI Technical Summary

Problems solved by technology

However, most of the data comes from various platforms or is aggregated from multiple parties. Because the data was never organized beforehand, all of it is fragmented. Ordinary data collection, cleaning, and retrieval rely on ready-made tools that cannot be further improved or modified, so this data basically cannot be summarized with existing retrieval tools. This causes enterprises considerable trouble and wasted resources: only by trying the existing retrieval methods can one discover that the content is unusable or that retrieval performs poorly, and that trial process is itself a waste of resources. Targeted improvements are therefore needed.


Examples


Embodiment Construction

[0022] The present invention will be further described below through specific embodiments.

[0023] A data cleaning and indexing method, characterized in that: the specific process is as follows:

[0024] 1) Data cleaning and importing:

[0025] For structured data, during cleaning and importing from the existing RDBMS database, the cleaning tool synchronizes the portion consisting of a small amount of key data online to the distributed database cluster in the data center. The large-volume portion is transferred to the distributed database cluster in the data center as files, or is called in real time through an interface.
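The routing rule in [0025] can be sketched as follows. This is only an illustration under stated assumptions: the patent text does not say where "small" ends and "large" begins, so the row threshold, the `Table` type, and the `needs_realtime` flag are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source only distinguishes "small" from "large" data.
ONLINE_SYNC_MAX_ROWS = 100_000


@dataclass
class Table:
    """Illustrative description of one table in the existing RDBMS."""
    name: str
    row_count: int
    needs_realtime: bool = False  # assumed flag, not named in the source


def choose_transfer_mode(table: Table) -> str:
    """Pick how a cleaned table reaches the distributed database cluster."""
    if table.row_count <= ONLINE_SYNC_MAX_ROWS:
        return "online_sync"         # small key data: synchronize online
    if table.needs_realtime:
        return "realtime_interface"  # large data called through an interface
    return "file_transfer"           # large data shipped as files


if __name__ == "__main__":
    for t in (Table("users", 5_000),
              Table("call_logs", 80_000_000),
              Table("positions", 9_000_000, needs_realtime=True)):
        print(t.name, choose_transfer_mode(t))
```

Keeping the decision in one small function makes the threshold easy to tune once real data volumes are known.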

[0026] Information entered on the web platform is processed by the cleaning tool and then synchronized directly to the distributed data cluster in the data center.
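A minimal sketch of the clean-then-synchronize flow in [0026] might look like this. The cleaning rules shown (trimming whitespace, dropping empty fields) are assumptions, since the source does not specify what the cleaning tool does, and a plain Python list stands in for the distributed data cluster.

```python
def clean_record(record: dict) -> dict:
    """Trim string values and drop empty fields (assumed cleaning rules)."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue  # skip fields that carry no information
        cleaned[key] = value
    return cleaned


def sync_web_record(record: dict, cluster: list) -> dict:
    """Clean a web-submitted record, then append it to the stand-in cluster."""
    cleaned = clean_record(record)
    cluster.append(cleaned)
    return cleaned


if __name__ == "__main__":
    cluster = []  # hypothetical stand-in for the distributed data cluster
    sync_web_record({"name": "  Li Wei ", "phone": "", "age": 30}, cluster)
    print(cluster)
```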

[0027] For unstructured data, the related audio, video, pictures, and other large files in each system are exchanged and pro...


Abstract

The invention discloses a data cleaning and indexing method, belonging to the field of data retrieval. The method comprises the following steps: (1) cleaning and importing data, covering the cleaning of structured data and the cleaning of unstructured data; (2) establishing metadata, which consists of (a) metadata acquisition: integrating the different metadata from multiple sources in the data center through a metadata acquisition process, and storing and managing it uniformly in a database that serves as the metadata knowledge base; (b) metadata publishing: managing the release of metadata through an established metadata publishing process; and (c) metadata access: creating a mechanism for granting and managing metadata access permissions, so that valid users' access to the metadata is controlled; and (3) constructing an index: obtaining the data attributes on the distributed data cluster of the data center through metadata access. The overall data attributes are clear, and structured and unstructured data are explicitly distinguished, so a cleaning tool can be selected for each data structure type. This achieves the best cleaning and conversion effect and greatly increases the efficiency of data integration and conversion.
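The metadata steps in the abstract (acquisition, publishing, permission-controlled access) and the index built from the attributes obtained that way can be sketched as below. This is a toy in-memory model, not the patented implementation: `MetadataStore`, `build_index`, and the attribute-to-dataset index layout are hypothetical, and a dictionary stands in for the database that serves as the metadata knowledge base.

```python
class MetadataStore:
    """In-memory stand-in for the metadata knowledge base."""

    def __init__(self):
        self._entries = {}    # dataset -> {"attrs", "sources", "published"}
        self._grants = set()  # (user, dataset) pairs allowed to read

    def acquire(self, source: str, dataset: str, attrs: dict) -> None:
        """Merge metadata for one dataset coming from one source."""
        entry = self._entries.setdefault(
            dataset, {"attrs": {}, "sources": [], "published": False})
        entry["attrs"].update(attrs)
        entry["sources"].append(source)

    def publish(self, dataset: str) -> None:
        """Mark a dataset's metadata as released."""
        self._entries[dataset]["published"] = True

    def grant(self, user: str, dataset: str) -> None:
        """Give a user permission to read a dataset's metadata."""
        self._grants.add((user, dataset))

    def access(self, user: str, dataset: str) -> dict:
        """Return attributes only for published metadata and granted users."""
        entry = self._entries.get(dataset)
        if entry is None or not entry["published"]:
            raise KeyError(f"no published metadata for {dataset!r}")
        if (user, dataset) not in self._grants:
            raise PermissionError(f"{user!r} may not read {dataset!r}")
        return dict(entry["attrs"])


def build_index(store: MetadataStore, user: str, datasets: list) -> dict:
    """Map each attribute name to the datasets that carry it (assumed layout)."""
    index = {}
    for dataset in datasets:
        for attr in store.access(user, dataset):
            index.setdefault(attr, []).append(dataset)
    return index


if __name__ == "__main__":
    store = MetadataStore()
    store.acquire("rdbms", "orders", {"order_id": "int", "customer": "str"})
    store.acquire("files", "videos", {"customer": "str", "duration": "float"})
    for name in ("orders", "videos"):
        store.publish(name)
        store.grant("alice", name)
    print(build_index(store, "alice", ["orders", "videos"]))
```

Routing every index lookup through `access` means the index can only be built from metadata that has been published and that the caller is permitted to read.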

Description

Technical field

[0001] The invention belongs to the field of data retrieval, and in particular relates to a data cleaning and indexing method based on a cloud platform multi-dimensional data fusion analysis system.

Background

[0002] At present, enterprise platform construction mainly relies on multi-party data: manually imported data (in txt, Excel, csv, and other formats), synchronized data, data obtained from other business systems in real time, and so on. The data is analyzed and summarized by topic for later retrieval. As enterprises continue to expand and business volume keeps growing, the multi-party data they generate is also increasing sharply, while data cleaning and retrieval are still performed with traditional general-purpose methods or purchased ready-made retrieval methods. However, most of the data comes from various platforms or the...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/215, G06F16/2228, G06F16/25, G06F16/284, G06F16/31
Inventor: 张国杰, 邵晓艳, 郭晓丽, 郭学明
Owner: 河南信安通信技术股份有限公司