Patent data cleaning method and system based on AdaBoost algorithm

A patent data cleaning technology applied in the field of data processing. It addresses the inability of existing methods to process large data sets effectively, performance that falls short of requirements, and declining data quality. Its stated effects are avoiding overfitting, a low error rate, and a high accuracy rate.

Status: Inactive
Publication Date: 2018-04-13
HEBEI UNIV OF ENG
AI Technical Summary

Problems solved by technology

With the explosive growth of data volume, data cleaning has become increasingly difficult. As the country develops rapidly, data collection, the variety of data sources, real-time data updates, and data aggregation keep growing in scale, so errors can easily enter the data from many directions, raising the error rate and lowering data quality.
[0005] (2) Traditional data cleaning methods do not perform well enough on today's large data volumes and therefore cannot process huge data sets effectively.


Examples

Embodiment Construction

[0037] To explain the technical content, structural features, objectives, and effects of the technical solution in detail, the following description refers to specific embodiments and the accompanying drawings.

[0038] Figure 1 is a flow chart of a preferred embodiment of the present invention, a patent data cleaning method based on the AdaBoost algorithm, which includes the following steps.

[0039] S1. Collect patent data from the patent database and put the collected patent data sources into the database to be cleaned.

[0040] In this embodiment, the Derwent database is the basic data source, and the field of data collection is the iron and steel industry. Search terms, IPC classification numbers, Derwent manual codes, and similar criteria are compiled as the basic search methods, a search strategy is formulated, and patent data related to the iron and steel industry is extracted. In total, about 270,000 patent records are retrieved for data index...
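The search strategy in [0040] combines search terms, IPC classification numbers, and Derwent manual codes. A minimal sketch of such a combined filter might look like the following. The `PatentRecord` fields, the function name, and the example keywords and code prefixes are illustrative placeholders, not the strategy actually used in the patent.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatentRecord:
    # Hypothetical record layout; the patent does not specify field names.
    title: str
    ipc_codes: List[str] = field(default_factory=list)
    manual_codes: List[str] = field(default_factory=list)


def matches_strategy(record, keywords, ipc_prefixes, manual_code_prefixes):
    """Return True if the record hits a keyword, an IPC prefix, or a manual-code prefix."""
    # Match whole words so that, for example, "environment" does not hit "iron".
    words = set(re.findall(r"[a-z]+", record.title.lower()))
    if any(k.lower() in words for k in keywords):
        return True
    if any(c.startswith(p) for c in record.ipc_codes for p in ipc_prefixes):
        return True
    return any(c.startswith(p) for c in record.manual_codes for p in manual_code_prefixes)


# Placeholder strategy values, chosen only for illustration.
KEYWORDS = ["steel", "iron"]
IPC_PREFIXES = ["C21"]
MANUAL_CODE_PREFIXES = ["M24"]

records = [
    PatentRecord("Method for refining molten steel", ["C21C7/00"]),
    PatentRecord("Yarn covering machine", ["D02G3/36"]),
]
selected = [r for r in records
            if matches_strategy(r, KEYWORDS, IPC_PREFIXES, MANUAL_CODE_PREFIXES)]
print([r.title for r in selected])
```

Treating the three criteria as alternatives (a logical OR) is one possible reading; a real strategy could equally require several of them at once.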


Abstract

The invention belongs to the technical field of data processing, and more specifically relates to a patent data cleaning method and a patent data cleaning system based on the AdaBoost algorithm. The method comprises the following steps: S1, collecting patent data from a patent database and putting the collected patent data into a database to be cleaned; S2, performing data analysis on the patent data source in the database to be cleaned and determining the attribute information of the patent data; S3, defining cleaning rules, with different rules for the different error types of the patent data source; S4, performing a primary cleaning of the patent data source according to the cleaning rules; S5, using the AdaBoost algorithm to deep-clean the primarily cleaned patent data; S6, verifying the cleaning result and judging whether the cleaning requirement is met, proceeding to step S7 if it is and returning to step S2 otherwise; and S7, writing the clean data back, replacing the original patent data with the cleaned patent data.
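The control flow of steps S2 to S7 can be sketched as a loop. The sketch below is one reading of the abstract, in which a passing verification leads on to the write-back step S7 and a failing one returns to S2. Every function name is a hypothetical placeholder, `max_rounds` is an added safety limit that the abstract does not mention, and feeding the partially cleaned data back into S2 is an assumption.

```python
def run_cleaning_pipeline(records, analyze, make_rules, deep_clean, is_clean, max_rounds=5):
    """Loop over S2-S6 until verification passes.

    All callables are hypothetical placeholders supplied by the caller;
    max_rounds is an added safety limit, not part of the abstract.
    """
    for round_no in range(1, max_rounds + 1):
        attributes = analyze(records)          # S2: determine attribute information
        rules = make_rules(attributes)         # S3: one rule per error type
        for rule in rules:                     # S4: primary, rule-based cleaning
            records = [rule(rec) for rec in records]
        records = deep_clean(records)          # S5: deep cleaning (AdaBoost in the patent)
        if is_clean(records):                  # S6: verify the cleaning result
            return records, round_no           # the caller performs S7 (write back)
    raise RuntimeError("cleaning requirement not met after max_rounds")


# Toy stand-ins so the control flow can be run end to end.
def analyze(records):
    return sorted({key for rec in records for key in rec})


def make_rules(attributes):
    def normalize(rec):
        # Example rule: trim whitespace and unify letter case.
        return {k: v.strip().upper() for k, v in rec.items()}
    return [normalize]


def drop_duplicates(records):
    # Placeholder for S5 only; this is NOT AdaBoost.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def no_duplicates(records):
    return len(records) == len({tuple(sorted(r.items())) for r in records})


raw = [{"assignee": " Hebei Univ "}, {"assignee": "HEBEI UNIV"}, {"assignee": "hebei univ"}]
cleaned, rounds = run_cleaning_pipeline(raw, analyze, make_rules, drop_duplicates, no_duplicates)
print(cleaned, rounds)
```

Passing the stages in as callables keeps the loop independent of any particular rule set or classifier, so the S5 placeholder could be swapped for a trained model without touching the control flow.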

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and more specifically relates to a patent data cleaning method and system based on the AdaBoost algorithm.

Background

[0002] With the advent of the information age, the demand for data has increased, making data processing more and more complex. The most important step in mining and analyzing massive amounts of data is data cleaning. Different types of errors need to be identified during the cleaning process, and manual work alone yields half the result for twice the effort. Using the AdaBoost algorithm to identify and classify data for analysis and detection can greatly improve efficiency and make large-volume data cleaning tasks feasible.

[0003] At present, the main problems facing patent data cleaning are:

[0004] (1) As China has gradually become a patent powerhouse, there is more and more patent data, and the demand of various enterprises is also gra...
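To make the classification idea concrete, here is a minimal sketch of discrete AdaBoost with one-feature threshold stumps in plain Python. It is a textbook version of the algorithm, not the patent's implementation. The two features (number of missing fields, title length) and the labels (+1 for a record flagged as dirty, -1 for clean) are invented for illustration.

```python
import math


def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with threshold stumps; labels in y must be -1 or +1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # entries: (alpha, feature index, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] <= thr else -pol for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        if err <= 1e-10:
            break  # a single stump already fits the weighted data
        # Raise the weight of misclassified rows, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[f] <= thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical features per record: [missing_field_count, title_length].
X = [[0, 40], [0, 55], [1, 38], [3, 12], [4, 50], [0, 8], [1, 5], [1, 45]]
y = [-1, -1, -1, 1, 1, 1, 1, -1]  # +1 = dirty, -1 = clean (illustrative labels)

model = train_adaboost(X, y, n_rounds=3)
print([predict(model, row) for row in X])
```

No single stump separates this toy set, because a record can be dirty either through many missing fields or through a very short title. Three boosting rounds are enough to fit all eight rows, which shows how reweighting lets weak rules cover each other's mistakes.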

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62, G06N3/08, G06Q50/18
CPC: G06F16/215, G06F16/2455, G06N3/08, G06Q50/184, G06F18/24147
Inventor: 郎利影, 王田雨
Owner: HEBEI UNIV OF ENG