Accelerating Density-Based Subspace Clustering in High-Dimensional Data

Lauer, Tobias; Prinzbach, Jürgen; Kiefer, Nicolas

doi:10.1109/ICDMW53433.2021.00064

Subspace clustering aims to find all clusters in all subspaces of a high-dimensional data space. We present a massively data-parallel approach that can be run on graphics processing units. It extends a previous density-based method that scales well with the number of dimensions. Its main computational bottleneck consists of (sequentially) generating a large number of minimal cluster candidates in each dimension and using hash collisions in order to find matches of such candidates across multiple dimensions. Our approach parallelizes this process by removing previous interdependencies between consecutive steps in the sequential generation process and by applying a very efficient parallel hashing scheme optimized for GPUs. This massive parallelization gives up to 70x speedup for the bottleneck computation when it is replaced by our approach and run on current GPU hardware. We note that depending on data size and choice of parameters, the parallelized part of the algorithm can take different percentages of the overall runtime of the clustering process, and thus, the overall clustering speedup may vary significantly between different cases. However, even in our "worst-case" test, a small dataset where the computation makes up only a small fraction of the overall clustering time, our parallel approach still yields a speedup of more than 3x for the complete run of the clustering process. Our method could also be combined with parallelization of other parts of the clustering algorithm, with an even higher potential gain in processing speed.
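
The bottleneck step described in the abstract (generating small cluster candidates per dimension, then matching them across dimensions through a hash table) can be illustrated with a minimal sequential sketch. This is not the authors' implementation and not the GPU hashing scheme: the function names, the parameters `eps` and `size`, the window rule used to form candidates, and the use of a Python dictionary as the hash table are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def candidates_1d(values, eps, size):
    """Hypothetical helper: sets of `size` point IDs whose values in one
    dimension all lie within `eps` of the smallest value in the set."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    found = set()
    for start in range(len(order)):
        # Collect the points within eps of the current starting point.
        window = [order[start]]
        for j in range(start + 1, len(order)):
            if values[order[j]] - values[order[start]] > eps:
                break
            window.append(order[j])
        if len(window) >= size:
            # Every candidate from this window contains the starting point.
            for rest in combinations(window[1:], size - 1):
                found.add(frozenset((window[0],) + rest))
    return found


def match_across_dimensions(data, eps, size):
    """Hash every per-dimension candidate and keep those that recur
    in more than one dimension (assumed matching rule)."""
    table = defaultdict(set)  # candidate point set -> dimensions it occurs in
    for d in range(len(data[0])):
        column = [point[d] for point in data]
        for cand in candidates_1d(column, eps, size):
            table[cand].add(d)
    return {cand: dims for cand, dims in table.items() if len(dims) > 1}


# Toy data: 4 points in 3 dimensions (made up for this example).
data = [
    [0.0, 0.0, 5.0],
    [0.1, 0.1, 9.0],
    [0.2, 0.2, 1.0],
    [9.0, 0.25, 1.1],
]
result = match_across_dimensions(data, eps=0.3, size=3)
# Here only the candidate {0, 1, 2} recurs, in dimensions 0 and 1.
```

In this sketch each dimension is processed without reference to the others, so the outer loop is the kind of work that could in principle be distributed across parallel threads; how the paper actually restructures the generation and hashing for GPUs is not specified in this record.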

Metadata
Document Type:	Conference Proceeding
Conference Type:	Conference article
Citation link:	https://opus.hs-offenburg.de/5154
Bibliographic information
Title (English):	Accelerating Density-Based Subspace Clustering in High-Dimensional Data
Conference:	IEEE International Conference on Data Mining Workshops (ICDMW), 7-10 December 2021, Auckland, New Zealand
Author:	Tobias Lauer, Jürgen Prinzbach, Nicolas Kiefer
Date of Publication (online):	2022/01/20
Year of first Publication:	2021
Publisher:	IEEE
First Page:	474
Last Page:	481
Parent Title (English):	Proceedings : 21st IEEE International Conference on Data Mining Workshops : ICDMW 2021
ISBN:	978-1-6654-2427-1 (Electronic)
ISBN:	978-1-6654-2428-8 (Print on Demand)
ISSN:	2375-9259 (Online)
ISSN:	2375-9232 (Print on Demand)
DOI:	https://doi.org/10.1109/ICDMW53433.2021.00064
Language:	English
Content information
Institutes:	Fakultät Elektrotechnik, Medizintechnik und Informatik (EMI) (from 04/2019)
Institutes:	Bibliography
DDC classes:	600 Technology, medicine, applied sciences
Tag:	Clustering; Data Mining; GPU Computing; Parallelization; Subspace Clustering; machine learning
Formal information
Relevance:	Conference contribution: h5-index < 30
Open Access:	Closed Access
Licence:	Protected by copyright
