Refine
Document Type
Conference Type
- Konferenzartikel (3)
Language
- English (3)
Has Fulltext
- no (3)
Is part of the Bibliography
- yes (3)
Keywords
- Clustering (1)
- Data Mining (1)
- GPU Computing (1)
- Parallelization (1)
- Subspace Clustering (1)
- machine learning (1)
Institute
Open Access
- Open Access (2)
- Closed Access (1)
Subspace clustering aims to find all clusters in all subspaces of a high-dimensional data space. We present a massively data-parallel approach that can be run on graphics processing units. It extends a previous density-based method that scales well with the number of dimensions. Its main computational bottleneck consists of (sequentially) generating a large number of minimal cluster candidates in each dimension and using hash collisions in order to find matches of such candidates across multiple dimensions. Our approach parallelizes this process by removing previous interdependencies between consecutive steps in the sequential generation process and by applying a very efficient parallel hashing scheme optimized for GPUs. This massive parallelization gives up to 70x speedup for
the bottleneck computation when it is replaced by our approach and run on current GPU hardware. We note that depending on data size and choice of parameters, the parallelized part of the algorithm can take different percentages of the overall runtime of the clustering process, and thus, the overall clustering speedup may vary significantly between different cases. However, even
in our ”worst-case” test, a small dataset where the computation makes up only a small fraction of the overall clustering time, our parallel approach still yields a speedup of more than 3x for the complete run of the clustering process. Our method could also be combined with parallelization of other parts of the clustering algorithm, with an even higher potential gain in processing speed.
This paper describes the concept and some results of the project "Menschen Lernen Maschinelles Lernen" (Humans Learn Machine Learning, ML2) of the University of Applied Sciences Offenburg. It brings together students of different courses of study and practitioners from companies on the subject of Machine Learning. A mixture of blended learning and practical projects ensures a tight coupling of machine learning theory and application. The paper details the phases of ML2 and mentions two successful example projects.
In public transportation, the motor pool often consists of various different vehicles bought over a duration of many years. Sometimes, they even differ within one batch bought at the same time. This poses a considerable challenge in the storage and allocation of spare parts, especially in the event of damage to a vehicle. Correctly assigning these parts before the vehicle reaches the workshop could significantly reduce both the downtime and, therefore, the actual costs for companies. In order to achieve this, the current software uses a simple probability calculation. To improve the performance, the data of specific companies was analysed, preprocessed and used with several modelling techniques to classify and, therefore, predict the spare parts to be used in the event of a faulty vehicle. We summarize our experience running through the steps of the Cross Industry Standard Process for Data Mining and compare the performance to the previously used probability. Gradient Boosting Trees turned out to be the best modeling technique for this special case.