Refine
Document Type
- Conference Proceeding (45)
- Article (unreviewed) (17)
- Article (reviewed) (8)
- Report (1)
Conference Type
- Konferenzartikel (45)
Language
- English (71)
Is part of the Bibliography
- yes (71)
Keywords
- Deep Leaning (11)
- Machine Learning (8)
- Robustness (4)
- Generative Adversarial Network (3)
- deep learning (3)
- image classification (3)
- Aliasing (2)
- CNNs (2)
- Computer Vision (2)
- Geophysik (2)
Institute
Open Access
- Open Access (51)
- Bronze (13)
- Closed Access (12)
- Closed (8)
- Diamond (8)
- Gold (3)
- Grün (2)
- Hybrid (1)
Generative convolutional deep neural networks, e.g. popular GAN architectures, are relying on convolution based up-sampling methods to produce non-scalar outputs like images or video sequences. In this paper, we show that common up-sampling methods, i.e. known as up-convolution or transposed convolution, are causing the inability of such models to reproduce spectral distributions of natural training data correctly. This effect is independent of the underlying architecture and we show that it can be used to easily detect generated data like deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose to add a novel spectral regularization term to the training optimization objective. We show that this approach not only allows to train spectral consistent GANs that are avoiding high frequency errors. Also, we show that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking by detection paradigm either require some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method is based on straight-forward spatio-temporal cues that can be extracted from neighboring frames in an image sequences without superivison. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and train an autoencoder to generate suitable latent representation. Thus, the resulting latent representations can serve as robust appearance cues for tracking even over large temporal distances where no reliable spatio-temporal features could be extracted. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
Deep generative models have recently achieved impressive results for many real-world applications, successfully generating high-resolution and diverse samples from complex datasets. Due to this improvement, fake digital contents have proliferated growing concern and spreading distrust in image content, leading to an urgent need for automated ways to detect these AI-generated fake images.
Despite the fact that many face editing algorithms seem to produce realistic human faces, upon closer examination, they do exhibit artifacts in certain domains which are often hidden to the naked eye. In this work, we present a simple way to detect such fake face images - so-called DeepFakes. Our method is based on a classical frequency domain analysis followed by basic classifier. Compared to previous systems, which need to be fed with large amounts of labeled data, our approach showed very good results using only a few annotated training samples and even achieved good accuracies in fully unsupervised scenarios. For the evaluation on high resolution face images, we combined several public datasets of real and fake faces into a new benchmark: Faces-HQ. Given such high-resolution images, our approach reaches a perfect classification accuracy of 100% when it is trained on as little as 20 annotated samples. In a second experiment, in the evaluation of the medium-resolution images of the CelebA dataset, our method achieves 100% accuracy supervised and 96% in an unsupervised setting. Finally, evaluating a low-resolution video sequences of the FaceForensics++ dataset, our method achieves 91% accuracy detecting manipulated videos.
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant m argin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient 1×1 convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning 3×3 convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
Harnessing the overall benefits of the latest advancements in artificial intelligence (AI) requires the extensive collaboration of academia and industry. These collaborations promote innovation and growth while enforcing the practical usefulness of newer technologies in real life. The purpose of this article is to outline the challenges faced during cross-collaboration between academia and industry. These challenges are also inspected with the help of an ongoing project titled “Quality Assurance of Machine Learning Applications” (Q-AMeLiA), in which three universities cooperate with five industry partners to make the product risk of AI-based products visible. Further, we discuss the hurdles and the key challenges in machine learning (ML) technology transformation from academia to industry based on robustness, simplicity, and safety. These challenges are an outcome of the lack of common standards, metrics, and missing regulatory considerations when state-of-the-art (SOTA) technology is developed in academia. The use of biased datasets involves ethical concerns that might lead to unfair outcomes when the ML model is deployed in production. The advancement of AI in small and medium sized enterprises (SMEs) requires more in terms of common tandardization of concepts rather than algorithm breakthroughs. In this paper, in addition to the general challenges, we also discuss domain specific barriers for five different domains i.e., object detection, hardware benchmarking, continual learning, action recognition, and industrial process automation, and highlight the steps necessary for successfully managing the cross-sectoral collaborations between academia and industry.
Diffracted waves carry high resolution information that can help interpreting fine structural details at a scale smaller than the seismic wavelength. Because of the low signal-to-noise ratio of diffracted waves, it is challenging to preserve them during processing and to identify them in the final data. It is, therefore, a traditional approach to pick manually the diffractions. However, such task is tedious and often prohibitive, thus, current attention is given to domain adaptation. Those methods aim to transfer knowledge from a labeled domain to train the model, and then infer on the real unlabeled data. In this regard, it is common practice to create a synthetic labeled training dataset, followed by testing on unlabeled real data. Unfortunately, such procedure may fail due to the existing gap between the synthetic and the real distribution since quite often synthetic data oversimplifies the problem, and consequently the transfer learning becomes a hard and non-trivial procedure. Furthermore, deep neural networks are characterized by their high sensitivity towards cross-domain distribution shift. In this work, we present deep learning model that builds a bridge between both distributions creating a semi-synthetic datatset that fills in the gap between synthetic and real domains. More specifically, our proposal is a feed-forward, fully convolutional neural network for imageto-image translation that allows to insert synthetic diffractions while preserving the original reflection signal. A series of experiments validate that our approach produces convincing seismic data containing the desired synthetic diffractions.
In this preliminary report, we present a simple but very effective technique to stabilize the training of CNN based GANs. Motivated by recently published methods using frequency decomposition of convolutions (e.g. Octave Convolutions), we propose a novel convolution scheme to stabilize the training and reduce the likelihood of a mode collapse. The basic idea of our approach is to split convolutional filters into additive high and low frequency parts, while shifting weight updates from low to high during the training. Intuitively, this method forces GANs to learn low frequency coarse image structures before descending into fine (high frequency) details. Our approach is orthogonal and complementary to existing stabilization methods and can simply plugged into any CNN based GAN architecture. First experiments on the CelebA dataset show the effectiveness of the proposed method.
In this preliminary report, we present a simple but very effective technique to stabilize the training of CNN based GANs. Motivated by recently published methods using frequency decomposition of convolutions (eg Octave Convolutions), we propose a novel convolution scheme to stabilize the training and reduce the likelihood of a mode collapse. The basic idea of our approach is to split convolutional filters into additive high and low frequency parts, while shifting weight updates from low to high during the training. Intuitively, this method forces GANs to learn low frequency coarse image structures before descending into fine (high frequency) details. Our approach is orthogonal and complementary to existing stabilization methods and can simply plugged into any CNN based GAN architecture. First experiments on the CelebA dataset show the effectiveness of the proposed method.
Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable against so-called adversarial attacks: Small, crafted perturbations in the input images can lead to false predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods: Our first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of Fourier coefficients of feature-maps at different layers of the network. With this extension, we are able to improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods. The code for the methods proposed in the paper is available at github.com/paulaharder/SpectralAdversarialDefense
Due to the rapidly increasing storage consumption worldwide, as well as the expectation of continuous availability of information, the complexity of administration in today’s data centers is growing permanently. Integrated techniques for monitoring hard disks can increase the reliability of storage systems. However, these techniques often lack intelligent data analysis to perform predictive maintenance. To solve this problem, machine learning algorithms can be used to detect potential failures in advance and prevent them. In this paper, an unsupervised model for predicting hard disk failures based on Isolation Forest is proposed. Consequently, a method is presented that can deal with the highly imbalanced datasets, as the experiment on the Backblaze benchmark dataset demonstrates.
Recent studies have shown remarkable success in image-to-image translation for attribute transfer applications. However, most of existing approaches are based on deep learning and require an abundant amount of labeled data to produce good results, therefore limiting their applicability. In the same vein, recent advances in meta-learning have led to successful implementations with limited available data, allowing so-called few-shot learning.
In this paper, we address this limitation of supervised methods, by proposing a novel approach based on GANs. These are trained in a meta-training manner, which allows them to perform image-to-image translations using just a few labeled samples from a new target class. This work empirically demonstrates the potential of training a GAN for few shot image-to-image translation on hair color attribute synthesis tasks, opening the door to further research on generative transfer learning.
Seismic data processing relies on multiples attenuation to improve inversion and interpretation. Radon-based algorithms are often used for multiples and primaries discrimination. Deep learning, based on convolutional neural networks (CNNs), has shown encouraging applications for demultiple that could mitigate Radon-based challenges. In this work, we investigate new strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. Moreover, we investigate two distinctive training methods for all the strategies: UNet based on minimum absolute error (L1) training, and adversarial training (GAN-UNet). We test the trained models with the different strategies and methods on 400 synthetic data. We found that training to predict multiples, including the primaries …
Most machine learning methods require careful selection of hyper-parameters in order to train a high performing model with good generalization abilities. Hence, several automatic selection algorithms have been introduced to overcome tedious manual (try and error) tuning of these parameters. Due to its very high sample efficiency, Bayesian Optimization over a Gaussian Processes modeling of the parameter space has become the method of choice. Unfortunately, this approach suffers from a cubic compute complexity due to underlying Cholesky factorization, which makes it very hard to be scaled beyond a small number of sampling steps. In this paper, we present a novel, highly accurate approximation of the underlying Gaussian Process. Reducing its computational complexity from cubic to quadratic allows an efficient strong scaling of Bayesian Optimization while outperforming the previous approach regarding optimization accuracy. First experiments show speedups of a factor of 162 in single node and further speed up by a factor of 5 in a parallel environment.
Engineering, construction and operation of complex machines involves a wide range of complicated, simultaneous tasks, which potentially could be automated. In this work, we focus on perception tasks in such systems, investigating deep learning approaches for multi-task transfer learning with limited training data. We show an approach that takes advantage of a technical systems’ focus on selected objects and their properties. We create focused representations and simultaneously solve joint objectives in a system through multi-task learning with convolutional autoencoders. The focused representations are used as a starting point for the data-saving solution of the additional tasks. The efficiency of this approach is demonstrated using images and tasks of an autonomous circular crane with a grapple.
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the trainingset. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions, even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Adversarial training (AT) is often considered as a remedy to train more robust networks. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise ($1\times 1$) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM-problems, most currently available EM-algorithms solely rely on (textual) meta data. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which can not sufficiently be solved using standard image based classification and retrieval algorithms. Instead, novel approaches which allow to transfer example based visual equivalent classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are provided under https://www.retail-786k.org/.