Large-scale parallel coordinate data simplification method based on a document embedding model

A parallel coordinate data simplification technology, applied in the information field. It addresses problems such as the difficulty of maintaining continuous features in parallel coordinate systems, the difficulty of observing hierarchical category features of the data, and the loss of visual continuity. Its effects are simplified visual abstraction, reduced visual confusion, and reduced data size.

Pending Publication Date: 2021-03-16
ZHEJIANG UNIV OF FINANCE & ECONOMICS

AI Technical Summary

Problems solved by technology

However, traditional sampling algorithms have difficulty preserving the continuous features in a parallel coordinate system, which limits their usefulness.
For example, the sampling lens method proposed by Ellis et al. can alleviate data overlap in visually cluttered areas, but it makes the hierarchical category features of the data hard to observe. In particular, the contextual features formed as data lines pass across the coordinate axes are easily hidden, and visual continuity is lost.

Examples


Embodiment Construction

[0017] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0018] The large-scale parallel coordinate data simplification method based on the document embedding model of the present invention specifically includes the following steps:

[0019] Step (1): Cluster the data on each attribute axis in the parallel coordinate system. Treat the same clusters on different coordinate axes as the same word, treat each data line running through the parallel coordinate system as a sentence composed of words, and combine the sentences corresponding to all data lines into a corpus based on contextual feature perception. The details are as follows:

[0020] In the present invention, methods such as kernel density estimation (KDE), K-means, and the Gaussian mixture model (GMM) can be sele...
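Step (1) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it uses K-means (one of the clustering options named above) on each axis, written by hand with only the standard library. The function names `kmeans_1d` and `build_corpus`, the quantile initialisation, the value of `k`, and the token format `a{axis}_c{cluster}` are all assumptions made for the example. In particular, tokens here carry the axis index, which is an illustrative choice; the rule for when clusters on different axes share a word is not spelled out in this excerpt.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm) with deterministic quantile initialisation."""
    ordered = sorted(values)
    n = len(ordered)
    # Initial centres: evenly spaced quantiles of the sorted values.
    centres = [ordered[min(n - 1, (2 * i + 1) * n // (2 * k))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centre for every value.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: mean of each cluster (empty clusters keep their centre).
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def build_corpus(rows, k=3):
    """Turn each data line (row) into a sentence of per-axis cluster tokens."""
    n_axes = len(rows[0])
    # Cluster every attribute axis independently.
    axis_labels = [kmeans_1d([row[a] for row in rows], k) for a in range(n_axes)]
    # One sentence per data line, one word per axis (hypothetical token format).
    return [
        [f"a{a}_c{axis_labels[a][i]}" for a in range(n_axes)]
        for i in range(len(rows))
    ]


# Six data lines on two attribute axes (made-up values for illustration).
rows = [(1.0, 10.0), (1.2, 50.0), (5.0, 11.0), (5.1, 52.0), (9.0, 90.0), (9.3, 91.0)]
corpus = build_corpus(rows, k=3)
for sentence in corpus:
    print(" ".join(sentence))
```

Data lines that fall in the same clusters on every axis produce identical sentences, which is what lets a document embedding model place them close together later.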


Abstract

The invention discloses a large-scale parallel coordinate data simplification method based on a document embedding model. The method comprises the following steps: clustering the data on each attribute axis in a parallel coordinate system, regarding the same clusters on different coordinate axes as the same word, regarding each data line running through the parallel coordinate system as a sentence composed of words, and combining the sentences corresponding to all data lines into a corpus; training a Doc2Vec document embedding model on the corpus, so that each sentence in the corpus is expressed as a high-dimensional vector; projecting the resulting high-dimensional vectors into a two-dimensional space and sampling them; and finally drawing the data lines corresponding to the sampled points in a parallel coordinate system to obtain a simplified parallel coordinate plot. The document embedding model captures the continuous semantic association features between the data in the parallel coordinate system, and these features are preserved during simplification, so that the simplified parallel coordinates both reduce visual redundancy and display the implicit continuous association features in the data to the greatest extent possible.
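The final stage described in the abstract (sample in the two-dimensional projection, then draw only the corresponding data lines) can be sketched as below. This excerpt does not name the sampling scheme, so the grid-stratified sampling used here is an assumption chosen for illustration, as are the function name `grid_sample` and its parameters `cells`, `per_cell`, and `seed`. The Doc2Vec training and the 2-D projection are not shown; `points` stands in for their output, with one 2-D point per data line.

```python
import random
from collections import defaultdict


def grid_sample(points, cells=4, per_cell=1, seed=0):
    """Keep up to `per_cell` randomly chosen points from each occupied grid cell.

    Returns the sorted indices of the kept points; each index identifies the
    data line that would be redrawn in the simplified parallel coordinates.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def cell_of(v, lo, hi):
        # Map a coordinate to a cell index in [0, cells - 1].
        if hi == lo:
            return 0
        return min(cells - 1, int((v - lo) / (hi - lo) * cells))

    # Group point indices by the grid cell they fall into.
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(points):
        buckets[(cell_of(x, x_lo, x_hi), cell_of(y, y_lo, y_hi))].append(i)

    kept = []
    for key in sorted(buckets):
        members = buckets[key]
        kept.extend(rng.sample(members, min(per_cell, len(members))))
    return sorted(kept)


# Hypothetical 2-D projection: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.02), (10.0, 10.0), (9.9, 9.8), (5.0, 0.0)]
kept = grid_sample(points, cells=4, per_cell=1)
print(kept)  # indices of the data lines to draw
```

Sampling per cell thins dense regions while still keeping a representative from sparse ones, so isolated data lines are not dropped by chance.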

Description

Technical field

[0001] The invention relates to a document representation method based on a document embedding model and a method for simplifying large-scale parallel coordinates, and belongs to the field of information technology.

Background art

[0002] Parallel coordinates use a geometric layout of line segments to present multi-dimensional attribute data. Their distinctive geometric distribution and strong visual expressiveness have made them widely used in the exploration and analysis of multi-dimensional data. However, as the scale of multi-dimensional data increases, a large number of data lines in the parallel coordinate system intersect and overlap, which seriously interferes with the user's understanding of the original multi-dimensional data.

[0003] Filtering, bundling, and sampling are the main methods for resolving visual confusion in large-scale parallel coordinates. Filtering can flexibly select the attribute range of the coordinate axis, t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/216, G06F40/30, G06F16/33, G06F16/35, G06K9/62
CPC: G06F40/211, G06F40/216, G06F40/30, G06F16/3346, G06F16/3344, G06F16/35, G06F18/23
Inventor: 周志光, 马煜明, 汤馥莲, 刘玉华
Owner: ZHEJIANG UNIV OF FINANCE & ECONOMICS