A Pivot-Based Routine for Improved Parent-Finding in Hybrid MDS

Abstract

Exploring and visualising high-dimensional data is central to many information visualisation tools. By representing a data set in terms of inter-object proximities, multidimensional scaling (MDS) can generate a configuration of objects in a low-dimensional space that preserves the high-dimensional relationships. This paper presents an algorithm for a heuristic hybrid model that generates such configurations. Building on a model introduced in 2002, the algorithm proceeds through sampling, spring-model and interpolation phases. The most computationally expensive stage of the original algorithm was a series of nearest-neighbour searches. We describe how the complexity of this phase has been reduced by treating all high-dimensional relationships as a set of discretised distances to a constant number of randomly selected items, called pivots. Removing this computational bottleneck reduces the algorithmic complexity from O(N√N) to O(N^(5/4)). As well as documenting this improvement, the paper describes an evaluation with a data set of 108,000 13-dimensional items and a set of 23,141 17-dimensional items. The results show that the reduction in complexity is reflected in significantly improved run times, with no negative impact on the quality of the layout produced.
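The pivot idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_pivot_buckets`, `find_parent`), the equal-width bucketing scheme, the Euclidean metric and the parameters `num_pivots` and `num_buckets` are all assumptions made for illustration. Each sampled item is filed into a bucket according to its discretised distance to every pivot. To find a parent for a new item, only the bucket matching the item's own distance to each pivot is searched, rather than the whole sample.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_buckets(sample, num_pivots, num_buckets, rng):
    """Pre-process the sample: for each randomly chosen pivot, file every
    sample item into a bucket by its discretised distance to that pivot."""
    pivot_indices = rng.sample(range(len(sample)), num_pivots)
    tables = []
    for p in pivot_indices:
        dists = [euclidean(sample[p], s) for s in sample]
        # Equal-width buckets; fall back to width 1.0 if all items coincide.
        width = (max(dists) / num_buckets) or 1.0
        buckets = [[] for _ in range(num_buckets)]
        for idx, d in enumerate(dists):
            buckets[min(int(d / width), num_buckets - 1)].append(idx)
        tables.append((p, width, buckets))
    return tables


def find_parent(item, sample, tables):
    """Return the index of an approximate nearest sample item (the 'parent').
    Only one bucket per pivot is examined, so the result is heuristic."""
    best_idx, best_dist = None, math.inf
    for p, width, buckets in tables:
        d_pivot = euclidean(sample[p], item)
        # The pivot itself is always a candidate parent.
        if d_pivot < best_dist:
            best_idx, best_dist = p, d_pivot
        b = min(int(d_pivot / width), len(buckets) - 1)
        for idx in buckets[b]:
            d = euclidean(sample[idx], item)
            if d < best_dist:
                best_idx, best_dist = idx, d
    return best_idx
```

With a constant number of pivots, the cost of one query depends on the sizes of the buckets inspected, not on the size of the full sample. Under these assumptions, that is the source of the saving over a brute-force search. The trade-off is that the true nearest neighbour can be missed when it lies in a bucket that is not examined.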

Keywords

multidimensional scaling; MDS; hybrid algorithms; force-directed placement; spring models; pivots; near-neighbour search
