Refine
Document Type
- Conference Proceeding (6)
- Article (reviewed) (2)
- Report (1)
Conference Type
- Konferenzartikel (6)
Has Fulltext
- no (9)
Is part of the Bibliography
- yes (9)
Keywords
- Data Mining (2)
- GPU Computing (2)
- Subspace Clustering (2)
- Analytical Query (1)
- Clustering (1)
- Data mining (1)
- Datenbank (1)
- Datenbanksystem (1)
- Filter Dimension (1)
- GPU computing (1)
Institute
Open Access
- Closed Access (4)
- Open Access (4)
- Bronze (1)
Machine learning (ML) has become highly relevant in applications across all industries, and specialists in the field are sought urgently. As it is a highly interdisciplinary field, requiring knowledge in computer science, statistics and the relevant application domain, experts are hard to find. Large corporations can sweep the job market by offering high salaries, which makes the situation for small and medium enterprises (SME) even worse, as they usually lack the capacities both for attracting specialists and for qualifying their own personnel. In order to meet the enormous demand in ML specialists, universities now teach ML in specifically designed degree programs as well as within established programs in science and engineering. While the teaching almost always uses practical examples, these are somewhat artificial or outdated, as real data from real companies is usually not available. The approach reported in this contribution aims to tackle the above challenges in an integrated course, combining three independent aspects: first, teaching key ML concepts to graduate students from a variety of existing degree programs; second, qualifying working professionals from SME for ML; and third, applying ML to real-world problems faced by those SME. The course was carried out in two trial periods within a government-funded project at a university of applied sciences in south-west Germany. The region is dominated by SME many of which are world leaders in their industries. Participants were students from different graduate programs as well as working professionals from several SME based in the region. The first phase of the course (one semester) consists of the fundamental concepts of ML, such as exploratory data analysis, regression, classification, clustering, and deep learning. In this phase, student participants and working professionals were taught in separate tracks. Students attended regular classes and lab sessions (but were also given access to e-learning materials), whereas the professionals learned exclusively in a flipped classroom scenario: they were given access to e-learning units (video lectures and accompanying quizzes) for preparation, while face-to-face sessions were dominated by lab experiments applying the concepts. Prior to the start of the second phase, participating companies were invited to submit real-world problems that they wanted to solve with the help of ML. The second phase consisted of practical ML projects, each tackling one of the problems and worked on by a mixed team of both students and professionals for the period of one semester. The teams were self-organized in the ways they preferred to work (e.g. remote vs. face-to-face collaboration), but also coached by one of the teaching staff. In several plenary meetings, the teams reported on their status as well as challenges and solutions. In both periods, the course was monitored and extensive surveys were carried out. We report on the findings as well as the lessons learned. For instance, while the program was very well-received, professional participants wished for more detailed coverage of theoretical concepts. A challenge faced by several teams during the second phase was a dropout of student members due to upcoming exams in other subjects.
This paper describes the concept and some results of the project "Menschen Lernen Maschinelles Lernen" (Humans Learn Machine Learning, ML2) of the University of Applied Sciences Offenburg. It brings together students of different courses of study and practitioners from companies on the subject of Machine Learning. A mixture of blended learning and practical projects ensures a tight coupling of machine learning theory and application. The paper details the phases of ML2 and mentions two successful example projects.
Subspace clustering aims to find all clusters in all subspaces of a high-dimensional data space. We present a massively data-parallel approach that can be run on graphics processing units. It extends a previous density-based method that scales well with the number of dimensions. Its main computational bottleneck consists of (sequentially) generating a large number of minimal cluster candidates in each dimension and using hash collisions in order to find matches of such candidates across multiple dimensions. Our approach parallelizes this process by removing previous interdependencies between consecutive steps in the sequential generation process and by applying a very efficient parallel hashing scheme optimized for GPUs. This massive parallelization gives up to 70x speedup for
the bottleneck computation when it is replaced by our approach and run on current GPU hardware. We note that depending on data size and choice of parameters, the parallelized part of the algorithm can take different percentages of the overall runtime of the clustering process, and thus, the overall clustering speedup may vary significantly between different cases. However, even
in our ”worst-case” test, a small dataset where the computation makes up only a small fraction of the overall clustering time, our parallel approach still yields a speedup of more than 3x for the complete run of the clustering process. Our method could also be combined with parallelization of other parts of the clustering algorithm, with an even higher potential gain in processing speed.
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
This paper describes the use of the single-linkage hierarchical clustering method in outlier detection for manufactured metal work pieces. The main goal of the study is to group defects that occur 5 mm into a work piece from the edge, i.e., the border of the metal work piece. The goal is to remove defects outside the area of interest as outliers. According to the assumptions made for the performance criteria, the single-linkage method has achieved better results compared to other agglomeration methods.
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset where, a subspace is the subset of dimensions of the data. But exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, thus, parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage, firstly, the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation has shown linear speedup. Secondly, we are developing an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
In online analytical processing (OLAP), filtering elements of a given dimensional attribute according to the value of a measure attribute is an essential operation, for example in top-k evaluation. Such filters can involve extremely large amounts of data to be processed, in particular when the filter condition includes “quantification” such as ANY or ALL, where large slices of an OLAP cube have to be computed and inspected. Due to the sparsity of OLAP cubes, the slices serving as input to the filter are usually sparse as well, presenting a challenge for GPU approaches which need to work with a limited amount of memory for holding intermediate results. Our CUDA solution involves a hashing scheme specifically designed for frequent and parallel updates, including several optimizations exploiting architectural features of Nvidia’s Fermi and Kepler GPUs.