Public resource transaction data-oriented cleaning and duplicate removal method and system

A data cleaning and deduplication technology for public resource transaction data, applied in the field of data processing, that reduces data volume, improves accuracy, and improves algorithm performance

Active Publication Date: 2019-09-03
GLODON CO LTD

AI Technical Summary

Problems solved by technology

At present, the data processing and analysis industry lacks data cleaning technology for the characteristics of public resourc


Examples

Example 1

[0072] Using the data processing method provided by the present invention, as shown in Figures 2-3, two bidding announcements were extracted, one from Henan Bidding Network (http://www.hnzbw.cn/newshow1.asp?id=1920507&l=1) and one from Xuchang Municipal Government Procurement Network (http://www.hngp.gov.cn/henan/content?infoId=1540448944097797&channelCode=H710202&bz=0). Both titles refer to the transaction result announcement for the "Xuchang Caowei Ancient City South Gate, South Street Central Axis Wireless Wi-Fi and Block Security" project of Xuchang Caowei Ancient City Development and Construction Co., Ltd. The contents of the two bidding announcements were recorded as text data records.

[0073] With the two text data records above (the bidding announcement contents) as the input to steps S120-S130, the resulting similarity value is 0.81, which is greater than the preset threshold of 0.6. After the NER processing in step S140, the respective outputs are "Person name: Wang Qiang, Q...
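The two-stage decision described in this example (similarity threshold first, then named entity comparison) can be sketched as below. This is a minimal illustration, not the patent's implementation: the function names, the dictionary layout for NER output, and the placeholder entity values are assumptions made for the sketch. Only the similarity value 0.81 and the threshold 0.6 come from the text.

```python
def same_entities(entities_a, entities_b):
    """Return True when both records have identical entity sets for every type.

    Each argument is assumed to map an entity type (e.g. "person",
    "organization") to an iterable of entity strings; this layout is
    hypothetical and not specified by the source.
    """
    entity_types = set(entities_a) | set(entities_b)
    return all(
        set(entities_a.get(t, ())) == set(entities_b.get(t, ()))
        for t in entity_types
    )


def is_duplicate(similarity, entities_a, entities_b, threshold=0.6):
    """Judge two records as duplicates only if both checks pass."""
    # Stage 1: the similarity must be strictly greater than the threshold.
    if similarity <= threshold:
        return False
    # Stage 2: the named entity information must match.
    return same_entities(entities_a, entities_b)


# Placeholder entities; the real NER output is truncated in the source.
record_a = {"person": {"P1"}, "organization": {"O1"}}
record_b = {"person": {"P1"}, "organization": {"O1"}}
record_c = {"person": {"P2"}, "organization": {"O1"}}

print(is_duplicate(0.81, record_a, record_b))  # same entities, above threshold
print(is_duplicate(0.81, record_a, record_c))  # entities differ
```

Requiring both checks means that a pair of announcements with very similar wording but different people or organizations is kept as two distinct records.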

Example 2

[0075] Using the data processing method provided by the present invention, two data records were extracted from Beijing Construction Engineering Information Network (http://www.bcactc.com/home/gcxx/zbjggs_show.aspx?gcbh=230F0SG201800046 and http://www.bcactc.com/home/gcxx/zbjggs_show.aspx?gcbh=230F0JL201800020) as text data records. Both titles refer to the "Special Service Fire Station Project in the Southern New District of Beijing Economic-Technological Development Zone".

[0076] With the two text data records above (the winning-bid publicity contents) as the input to steps S120-S130, the similarity value is 0.875, which is greater than the preset threshold of 0.6. After the NER processing in step S140, the output is "personal name:; organization Name: Beijing Economic-Technological Development Zone Infrastructure Office Beijing Economic-Technological Development Zone...

Abstract

The invention relates to a cleaning and deduplication method and system for public resource transaction data. The texts corresponding to public resource transaction data are stored in a dataset as text data records. The dataset is grouped according to a preset rule so that the number of text data records in each group is controlled, and the similarity between the text data records within each group is calculated based on the longest common subsequence. When the similarity between two text data records is greater than a preset threshold, their named entity information is further compared: if the named entity information is the same, the two records are judged to be duplicates; otherwise they are judged to be non-duplicates. Determining duplicate information in public resource transaction data through this multi-dimensional cross-validation further prevents misjudgment of duplicates while improving text processing performance.
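The grouping and similarity steps in the abstract can be illustrated with a short sketch. Several details are assumptions, because the abstract does not specify them: the normalization (LCS length divided by the length of the longer text), the grouping key passed in as `key`, the `max_group_size` cap, and all function names are hypothetical. The threshold default of 0.6 follows the examples.

```python
from collections import defaultdict
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer text, keep the DP row short
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length over the longer text's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def group_records(records, key, max_group_size):
    """Bucket records by a hypothetical key, then cap each group's size."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[key(rec)].append(rec)
    groups = []
    for bucket in buckets.values():
        for i in range(0, len(bucket), max_group_size):
            groups.append(bucket[i:i + max_group_size])
    return groups


def candidate_pairs(records, key, max_group_size=100, threshold=0.6):
    """Yield in-group pairs whose similarity is greater than the threshold."""
    for group in group_records(records, key, max_group_size):
        for a, b in combinations(group, 2):
            score = lcs_similarity(a, b)
            if score > threshold:
                yield a, b, score


# Toy records grouped by length; real records would be announcement texts.
pairs = list(candidate_pairs(["abcd", "abxd", "zzzz"], key=len))
print(pairs)
```

Pairs are compared only within a group, so the number of quadratic-time LCS computations is bounded by the group size rather than by the size of the whole dataset. A side effect of capping group size by simple slicing is that two duplicates may land in different groups, so a real grouping rule would need to be chosen with that in mind. The pairs yielded here are only candidates; per the abstract, they would still need the named entity comparison before being judged duplicates.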

Description

Technical field

[0001] The present invention relates to data processing technology, in particular to a transaction data cleaning method and system, and more specifically to a data cleaning and deduplication method and system for public resource transaction data. It covers the processing, deduplication and cleaning of bid announcements and other information before transactions.

Background

[0002] Public resource transactions refer to the franchise of municipal public utilities, the right to operate logistics social services of administrative institutions, the right to operate outdoor billboards, the auction of public property for anti-smuggling fines and confiscations, the leasing of real estate and office buildings, operating licenses for rental cars, and the auspicious number of automobiles. Trade and provide consulting and services for non-profit, monopolistic and proprietary social public resources controlled by public resource management departments such as license pla...

Application Information

IPC(8): G06F16/215, G06F16/951
CPC: G06F16/215, G06F16/951
Inventors: 刘全超, 祝华令, 付永晖
Owner: GLODON CO LTD