Mass spatio-temporal data cleaning method and mass spatio-temporal data cleaning device
A technology for cleaning massive spatio-temporal data, applied in the field of data communication. It addresses three problems: time attributes are hard to keep consistent, the legitimacy of spatio-temporal attributes is hard to check, and existing data cleaning methods and devices cannot cope with massive data processing. Its effects are consistent time attributes and guaranteed legitimacy of the data.
Examples
Embodiment 1
[0034] The method of the present invention for cleaning massive spatio-temporal data mainly comprises three steps, described in detail with reference to Figure 1, where:
[0035] Time-based clustering calculation: obtain the timestamp distribution of the data items in the original data. This step can be designed and implemented on a distributed computing framework such as Hadoop MapReduce or Spark.
[0036] In a specific implementation, acquiring the timestamp distribution can be implemented as a computation job. Taking the card-swiping data of the Beijing municipal transportation card as an example, each data item contains two time attributes: the boarding timestamp and the alighting timestamp. The time distribution of each attribute can be obtained through its own time-based clustering calculation job, which can be implemented as a Hadoop MapReduce job; the input of the job is the file storing the...
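The idea of a per-attribute time distribution can be illustrated on a single machine before it is distributed. The Python sketch below is not the patented implementation: the record layout `(card_id, boarding_ts, alighting_ts)`, the `"YYYY-MM-DD HH:MM:SS"` timestamp format, the `GRANULARITY` table, the helper names `bucket` and `time_distribution`, and the sample records are all invented for illustration. It treats "time-based clustering" as truncating each timestamp to a bucket (a day by default, as in Embodiment 3) and counting records per bucket.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; the real card-swiping file format is not given.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed bucket granularities: strftime patterns that truncate a timestamp.
GRANULARITY = {"day": "%Y-%m-%d", "hour": "%Y-%m-%d %H"}


def bucket(ts, granularity="day"):
    """Map one timestamp to its time cluster (a truncated time string)."""
    return datetime.strptime(ts, TS_FORMAT).strftime(GRANULARITY[granularity])


def time_distribution(records, field_index, granularity="day"):
    """Count records per time bucket for one time attribute."""
    return Counter(bucket(r[field_index], granularity) for r in records)


# Invented sample records: (card_id, boarding_ts, alighting_ts).
records = [
    ("c1", "2013-05-01 08:01:12", "2013-05-01 08:30:40"),
    ("c2", "2013-05-01 09:15:00", "2013-05-01 09:40:05"),
    ("c3", "2013-05-02 23:50:00", "2013-05-03 00:10:00"),
]

# Each time attribute gets its own distribution, as in [0036].
boarding_by_day = time_distribution(records, 1)
alighting_by_day = time_distribution(records, 2)
boarding_by_hour = time_distribution(records, 1, "hour")
print(dict(boarding_by_day))   # {'2013-05-01': 2, '2013-05-02': 1}
print(dict(alighting_by_day))  # {'2013-05-01': 2, '2013-05-03': 1}
```

Because each bucket count is a plain sum, the same computation splits naturally into a map step (timestamp to bucket) and a reduce step (sum per bucket), which is what makes it suitable for Hadoop MapReduce or Spark.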
Embodiment 2
[0041] A device for cleaning massive spatio-temporal data is described with reference to Figure 2. The device comprises three parts: a time-based clustering calculation module, a rule-based filtering module and a distributed file system, where:
[0042] The time-based clustering calculation module performs the time-based clustering calculation, judges the value range at a given confidence degree, and thereby determines the value range of the data-item timestamps in the original data. This module can be built on servers running a Hadoop MapReduce or Spark distributed computing environment. In a specific embodiment, a Hadoop cluster can be built as follows:
[0043] (1) Plan the machines that form the cluster, with 1 or 2 machines as management nodes and at least 3 machines as computing nodes;
[0044] (2) Configure the network names: give each machine a unique host name on the network and make sure the machines can ping one another; it...
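Paragraph [0042] does not state how the value range is derived from the timestamp distribution at a given confidence degree, nor which rules the rule-based filtering module applies. The Python sketch below assumes one plausible reading and should not be taken as the patented procedure: sort the per-day counts chronologically, trim at most (1 - confidence)/2 of the records from each end, and reject any record whose timestamp fails to parse or whose date falls outside the resulting range. The trimming rule, the timestamp format, the record layout, the function names `value_range`, `is_legal` and `filter_records`, and the sample counts are all hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, as in a "YYYY-MM-DD HH:MM:SS" export.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def value_range(dist, confidence):
    """Return (low, high) dates so that at least `confidence` of the counted
    records lie in [low, high]; at most (1 - confidence) / 2 of the records
    are trimmed from each end of the date-sorted distribution (assumed rule)."""
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    total = sum(dist.values())
    if total == 0:
        raise ValueError("empty distribution")
    dates = sorted(dist)  # ISO dates sort chronologically as strings
    tail = (1.0 - confidence) / 2.0 * total  # records allowed in each tail

    def first_over_tail(ordered):
        cumulative = 0
        for day in ordered:
            cumulative += dist[day]
            if cumulative > tail:
                return day
        return ordered[-1]  # not reached when confidence > 0; kept as a guard

    return first_over_tail(dates), first_over_tail(dates[::-1])


def is_legal(ts, low, high):
    """Assumed rule: the timestamp must parse and its date must be in range."""
    try:
        day = datetime.strptime(ts, TS_FORMAT).date().isoformat()
    except ValueError:
        return False
    return low <= day <= high


def filter_records(records, ranges):
    """Split records into (kept, rejected); `ranges` maps field index -> (low, high)."""
    kept, rejected = [], []
    for record in records:
        ok = all(is_legal(record[i], lo, hi) for i, (lo, hi) in ranges.items())
        (kept if ok else rejected).append(record)
    return kept, rejected


# Invented daily counts containing a few implausible dates.
dist = {"1970-01-01": 3, "2013-05-01": 480, "2013-05-02": 510, "2037-12-31": 2}
low, high = value_range(dist, 0.99)
print(low, high)  # 2013-05-01 2013-05-02

records = [
    ("c1", "2013-05-01 08:01:12", "2013-05-01 08:30:40"),  # legal
    ("c2", "1970-01-01 00:00:00", "2013-05-01 09:40:05"),  # out of range
    ("c3", "2013-05-02 23:50:00", "2013-13-45 00:10:00"),  # unparseable
]
kept, rejected = filter_records(records, {1: (low, high), 2: (low, high)})
```

For brevity the example reuses one range for both attributes; in practice each time attribute would presumably get its own range from its own distribution.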
Embodiment 3
[0052] A method for cleaning massive spatio-temporal data is described with reference to Figure 3, taken as an example of a preferred embodiment of the present invention. Here the time-based clustering calculation is a calculation step applied to massive bus card-swiping data, where:
[0053] (1) Scan the file storing the original data; if the end of the file has not been reached, go to (2), otherwise go to (8);
[0054] (2) Scan the next data item;
[0055] (3) Extract the date from the boarding timestamp;
[0056] (4) Extract the date from the alighting timestamp;
[0057] (5) Count the boarding dates;
[0058] (6) Count the alighting dates;
[0059] (7) Compute the date distribution;
[0060] (8) END.
[0061] Of these, steps (2)-(4) can be implemented as Map tasks designed in the Hadoop MapReduce distributed computing framework, and steps (5)-(7) can be implemented as Reduce tasks designed in the Hadoop MapReduce distributed computing fra...
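To make the Map/Reduce split of [0061] concrete, the following single-process Python sketch imitates one MapReduce job over the card-swiping records. It does not use the Hadoop API: the function names `map_task`, `reduce_task` and `run_job`, the record layout and the timestamp format are hypothetical, and sorting plus `itertools.groupby` stands in for Hadoop's shuffle phase. For brevity the final assembly of step (7) happens after the reducers return, whereas [0061] places it on the Reduce side.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def map_task(record):
    """Map side, steps (2)-(4): emit ((attribute, date), 1) per timestamp."""
    _card_id, board_ts, alight_ts = record  # hypothetical record layout
    yield ("board", datetime.strptime(board_ts, TS_FORMAT).date().isoformat()), 1
    yield ("alight", datetime.strptime(alight_ts, TS_FORMAT).date().isoformat()), 1


def reduce_task(key, counts):
    """Reduce side, steps (5)-(6): sum the counts sharing one (attribute, date) key."""
    return key, sum(counts)


def run_job(records):
    """Local stand-in for one MapReduce job: map, shuffle (sort + group), reduce."""
    # Step (1): scan the input; each data item goes through the map task.
    pairs = [kv for record in records for kv in map_task(record)]
    pairs.sort(key=itemgetter(0))  # shuffle: bring equal keys together
    reduced = [reduce_task(key, (count for _, count in group))
               for key, group in groupby(pairs, key=itemgetter(0))]
    # Step (7): collect the per-attribute date distribution from reducer output.
    distribution = {"board": {}, "alight": {}}
    for (attribute, day), count in reduced:
        distribution[attribute][day] = count
    return distribution


# Invented sample records: (card_id, boarding_ts, alighting_ts).
records = [
    ("c1", "2013-05-01 08:01:12", "2013-05-01 08:30:40"),
    ("c2", "2013-05-01 09:15:00", "2013-05-01 09:40:05"),
    ("c3", "2013-05-02 23:50:00", "2013-05-03 00:10:00"),
]
result = run_job(records)
print(result["board"])   # {'2013-05-01': 2, '2013-05-02': 1}
print(result["alight"])  # {'2013-05-01': 2, '2013-05-03': 1}
```

Keying on the pair (attribute, date) lets one job produce both the boarding and the alighting distributions; because the reduction is a plain sum, a real Hadoop job could also reuse the reducer as a combiner to cut shuffle traffic.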