Improved method for sorting a data set by sorting keywords
A sorting keyword and data set technology, applied in the field of big data, that addresses cumbersome data cleaning steps and the low efficiency of cleaning duplicate records. It increases the chance that duplicate records are initially clustered into adjacent positions, raises the probability that they are identified as duplicates, and thereby improves cleaning efficiency.
Examples
Embodiment
[0029] An improved method for sorting a data set by sorting keywords comprises the following steps:
[0030] Step 1, preprocessing;
[0031] Step 2, duplicate record detection: duplicate records are detected through field matching and record matching;
[0032] Step 3, database-level clustering of duplicate records: a database-level duplicate detection algorithm clusters the duplicate records across the entire data set;
[0033] Step 4, correcting errors in the sorting keywords and unifying the data format by using external source files;
[0034] Step 5, sorting the words within the sorting keywords;
[0035] Step 6, conflict handling: the duplicate records detected within the same duplicate record cluster are merged or deleted according to rules, and only the correct record is kept.
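Steps 2, 3, 5 and 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the similarity metric (`difflib.SequenceMatcher`), the 0.85 threshold, the sliding window over the sorted records, the union-find clustering, and the "keep the most complete record" rule are all assumptions chosen for illustration, and every function name is hypothetical. Step 4 (correction against external source files) is omitted.

```python
from difflib import SequenceMatcher


def normalize_key(key):
    """Step 5 (sketch): lowercase the sorting keyword and sort its words."""
    return " ".join(sorted(key.lower().split()))


def field_similarity(a, b):
    """Field matching (assumed metric): difflib ratio on word-sorted values."""
    return SequenceMatcher(None, normalize_key(a), normalize_key(b)).ratio()


def records_match(r1, r2, fields, threshold=0.85):
    """Record matching (assumed rule): mean field similarity >= threshold."""
    score = sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)
    return score >= threshold


def cluster_duplicates(records, key_field, fields, window=3, threshold=0.85):
    """Steps 2-3 (sketch): sort by normalized key, compare each record with
    the next window - 1 records, and merge matches with union-find."""
    order = sorted(range(len(records)),
                   key=lambda i: normalize_key(records[i][key_field]))
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + window]:
            if records_match(records[i], records[j], fields, threshold):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def resolve(records, clusters):
    """Step 6 (assumed rule): keep the record with the most non-empty fields."""
    kept = []
    for cluster in clusters:
        best = max(cluster,
                   key=lambda i: sum(1 for v in records[i].values() if v))
        kept.append(records[best])
    return kept


records = [
    {"name": "John Smith", "city": "Boston", "phone": ""},
    {"name": "Alice Wong", "city": "Denver", "phone": ""},
    {"name": "Smith John", "city": "Boston", "phone": "555-0100"},
]
clusters = cluster_duplicates(records, "name", ["name", "city"])
kept = resolve(records, clusters)
```

Because the words of each keyword are sorted before the records are ordered, "John Smith" and "Smith John" receive the same normalized key and land next to each other, so the windowed comparison can find them.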
[0036] The preprocessing of step 1 includes:
[0037] Step 11, attribute selection: an attribute is selected for record matching;
[0038] Step 12, preliminary clustering: the records in the database are sorted so that potential duplicates fall into adjacent positions;
[0...
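The two preprocessing sub-steps listed above can be sketched in Python as follows. The source does not say how the attribute is chosen, so the heuristic used here (prefer the attribute with the most distinct non-empty values) is purely an assumption, as are the function names and the requirement that all field values are strings.

```python
def select_attribute(records, candidates):
    """Step 11 (assumed heuristic): pick the candidate attribute with the
    most distinct non-empty values."""
    def distinct_count(attr):
        values = {r.get(attr, "").strip().lower() for r in records}
        values.discard("")  # ignore missing values
        return len(values)
    return max(candidates, key=distinct_count)


def preliminary_cluster(records, attr):
    """Step 12 (sketch): sort records by the chosen attribute so that
    similar keys end up in neighbouring positions."""
    return sorted(records, key=lambda r: r.get(attr, "").strip().lower())


sample = [
    {"name": "Bob", "city": "Boston"},
    {"name": "alice", "city": "Boston"},
    {"name": "Carol", "city": ""},
]
chosen = select_attribute(sample, ["city", "name"])
ordered = preliminary_cluster(sample, chosen)
```

Sorting case-insensitively keeps "alice" ahead of "Bob", which a plain string sort would not do.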