
Mass spatio-temporal data cleaning method and mass spatio-temporal data cleaning device

Spatio-temporal data and mass-data technology, applied in the field of data communication, addresses problems such as the difficulty of ensuring the consistency of time attributes, the difficulty of checking the legitimacy of spatio-temporal attributes, and data cleaning methods and devices that cannot adapt to mass data processing. Its effects are to ensure time consistency and to guarantee legitimacy.

Active Publication Date: 2017-04-19
NORTH CHINA UNIVERSITY OF TECHNOLOGY
13 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve the problems that current data cleaning methods and devices cannot adapt to the processing of massive data, that the consistency of time attributes is difficult to ensure, and that the legitimacy of spatio-temporal attributes is difficult to check, by providing a data cleaning method and device for massive spatio-temporal data.



Examples


Embodiment 1

[0034] A method for cleaning massive spatio-temporal data according to the present invention mainly comprises three steps, described in detail with reference to Figure 1, wherein:

[0035] Time-based clustering calculation obtains the timestamp distribution of the data items in the original data. This step can be designed and implemented using a distributed computing framework such as Hadoop MapReduce or Spark.

[0036] In a specific implementation, obtaining the timestamp distribution can be implemented as a computing job. Taking card-swipe data from the Beijing municipal transportation card as an example, each data item contains two time attributes: the boarding timestamp and the alighting timestamp. Each time attribute obtains its own time distribution through a time-based clustering job, which can be implemented as a Hadoop MapReduce job; the input of the job is the file storing the...

Embodiment 2

[0041] A device for cleaning massive spatio-temporal data is described with reference to Figure 2. The device comprises three parts: a time-based clustering calculation module, a rule-based filtering module, and a distributed file system, wherein:

[0042] The time-based clustering calculation module performs the time-based clustering calculation and the value-range judgment at the given confidence level, thereby determining the value range of the data item timestamps in the original data. This module can be built on servers running a Hadoop MapReduce or Spark distributed computing environment. In a specific embodiment, a Hadoop cluster can be set up as follows:

[0043] (1) Plan the machines that form the cluster, with 1 or 2 machines as management nodes and at least 3 machines as computing nodes;

[0044] (2) Configure network names. For each machine, set a host name that is unique within the network, and make sure the machines can ping each other; it...

Embodiment 3

[0052] A method for cleaning massive spatio-temporal data is described with reference to Figure 3, as an example of a preferred embodiment of the present invention. The time-based clustering calculation here is a calculation step for massive bus card-swipe data, wherein:

[0053] (1) Scan the file storing the original data; if the scan is not finished, go to (2), otherwise go to (8);

[0054] (2) Scan the next data item;

[0055] (3) Extract the date from the boarding timestamp;

[0056] (4) Extract the date from the alighting timestamp;

[0057] (5) Count the boarding date;

[0058] (6) Count the alighting date;

[0059] (7) Compile the date distribution statistics;

[0060] (8) End.

[0061] Steps (2)-(4) can be implemented as Map tasks using the Hadoop MapReduce distributed computing framework; steps (5)-(7) can be implemented as Reduce tasks using the Hadoop MapReduce distributed computing fra...
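The split between Map and Reduce tasks in steps (2)-(7) can be illustrated with a minimal single-process sketch in plain Python. It is not the patent's Hadoop job: the comma-separated record layout, the timestamp format, the sample rows, and the names `map_record`, `reduce_counts`, and `date_distribution` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout (an assumption, not from the patent):
# card_id,boarding_timestamp,alighting_timestamp
SAMPLE_LINES = [
    "1001,2015-03-02 07:58:11,2015-03-02 08:31:40",
    "1002,2015-03-02 08:05:27,2015-03-02 08:44:03",
    "1003,2015-03-03 18:12:09,2015-03-03 18:40:55",
    "1004,2015-03-03 23:50:00,2015-03-04 00:20:00",
]
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def map_record(line):
    """Steps (2)-(4): parse one data item and emit ((attribute, date), 1) pairs."""
    _card_id, board_ts, alight_ts = line.split(",")
    board_date = datetime.strptime(board_ts, TS_FORMAT).date().isoformat()
    alight_date = datetime.strptime(alight_ts, TS_FORMAT).date().isoformat()
    yield ("board", board_date), 1
    yield ("alight", alight_date), 1


def reduce_counts(pairs):
    """Steps (5)-(7): sum the counts per key to obtain the date distribution."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def date_distribution(lines):
    """Step (1): scan every line, then run the map and reduce phases in order."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_counts(pairs)


distribution = date_distribution(SAMPLE_LINES)
for (attribute, day), count in sorted(distribution.items()):
    print(attribute, day, count)
```

Keying each emitted pair by both the attribute name and the date keeps the boarding and alighting distributions separate while still using a single reduce pass. In a real cluster the framework would group pairs by key between the two phases; here a `Counter` stands in for that grouping.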


Abstract

The invention provides a method and a device for cleaning massive spatio-temporal data. The method comprises: performing time-based clustering calculation on the spatio-temporal data to obtain the timestamp distribution of the data items in the original data; determining, at a preset confidence level, the value range of the data item timestamps in the original data; and performing rule-based data filtering, in which the validity of each data item is judged according to the spatio-temporal rules of the business domain, and a valid data item is extracted into the result while an invalid one is eliminated. The method and device ensure the time consistency of massive data, provide a simple and reliable way to verify the validity of spatio-temporal attributes, and improve data processing efficiency.
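The last two steps of the abstract, choosing a timestamp value range at a preset confidence and then filtering by business rules, can be sketched as follows. The abstract does not say how the range is derived from the distribution, so this sketch assumes one plausible reading: take the shortest run of observed dates that covers at least the given fraction of data items. The record fields, the stop identifiers, the concrete rules, and the function names are likewise hypothetical.

```python
from datetime import date, datetime


def confidence_range(date_counts, confidence=0.95):
    """Shortest run of observed dates covering >= `confidence` of all items.

    Assumed interpretation of the "preset-confidence value domain".
    """
    if not date_counts or not 0 < confidence <= 1:
        raise ValueError("need a non-empty distribution and 0 < confidence <= 1")
    days = sorted(date_counts)
    needed = confidence * sum(date_counts.values())
    best = None
    left = 0
    covered = 0
    for right, day in enumerate(days):
        covered += date_counts[day]
        # Drop dates from the left while the window still covers enough items.
        while covered - date_counts[days[left]] >= needed:
            covered -= date_counts[days[left]]
            left += 1
        if covered >= needed:
            span = (days[right] - days[left]).days
            if best is None or span < best[0]:
                best = (span, days[left], days[right])
    return best[1], best[2]


def is_valid(record, valid_range, known_stops):
    """Hypothetical spatio-temporal rules for a bus card-swipe record."""
    start, end = valid_range
    board, alight = record["board"], record["alight"]
    return (
        start <= board.date() <= end          # boarding date inside the range
        and start <= alight.date() <= end     # alighting date inside the range
        and board <= alight                   # cannot alight before boarding
        and record["board_stop"] in known_stops
        and record["alight_stop"] in known_stops
    )


def clean(records, valid_range, known_stops):
    """Keep valid data items and eliminate the rest."""
    return [r for r in records if is_valid(r, valid_range, known_stops)]


# Illustrative distribution with two outlier dates (made-up numbers).
date_counts = {
    date(2015, 3, 1): 1,
    date(2015, 3, 2): 40,
    date(2015, 3, 3): 38,
    date(2015, 3, 4): 20,
    date(2030, 1, 1): 1,
}
valid_range = confidence_range(date_counts, confidence=0.95)

records = [
    {"card_id": "1001", "board": datetime(2015, 3, 2, 8, 0),
     "alight": datetime(2015, 3, 2, 8, 30), "board_stop": "A", "alight_stop": "B"},
    {"card_id": "1002", "board": datetime(2030, 1, 1, 8, 0),
     "alight": datetime(2030, 1, 1, 8, 30), "board_stop": "A", "alight_stop": "B"},
    {"card_id": "1003", "board": datetime(2015, 3, 3, 9, 0),
     "alight": datetime(2015, 3, 3, 8, 0), "board_stop": "A", "alight_stop": "B"},
    {"card_id": "1004", "board": datetime(2015, 3, 3, 9, 0),
     "alight": datetime(2015, 3, 3, 9, 20), "board_stop": "A", "alight_stop": "Z"},
]
cleaned = clean(records, valid_range, known_stops={"A", "B"})
```

With these made-up numbers the range excludes the two single-item outlier dates, and only the first record passes every rule. The rules are kept in one predicate so that a different business domain can swap in its own checks without touching the range computation.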

Description

Technical field

[0001] The invention relates to the technical field of data communication, in particular to a method and device for cleaning massive spatio-temporal data.

Background art

[0002] With the continuous development of big data and Internet of Things technology, massive real-time data generated in many business fields keeps accumulating. Data analysis is currently a research hotspot in many fields, and its first step is data preprocessing. Data preprocessing can effectively improve data quality and provide more targeted, usable data for the core data mining process, which not only saves considerable time and space but also lets the mining results play a better role in decision-making and prediction. Data from sensors in the Internet of Things environment is a typical type of spatio-temporal data because it usually contains timestamp and geographic location attributes. At the same time, since the data in the real world is often incomplete, noisy and ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/1748, G06F16/182
Inventor: 丁维龙, 赵卓峰, 曹娅琪
Owner: NORTH CHINA UNIVERSITY OF TECHNOLOGY