Refine
Document Type
- Conference Proceeding (45)
- Article (unreviewed) (16)
- Article (reviewed) (5)
- Book (2)
- Part of a Book (2)
- Doctoral Thesis (1)
- Report (1)
Conference Type
- Konferenzartikel (43)
- Konferenz-Abstract (1)
- Sonstiges (1)
Has Fulltext
- no (72) (remove)
Is part of the Bibliography
- yes (72) (remove)
Keywords
- Deep Leaning (9)
- Machine Learning (8)
- Robustness (4)
- Data Science (3)
- Generative Adversarial Network (3)
- image classification (3)
- Aliasing (2)
- CNNs (2)
- Computer Vision (2)
- Künstliche Intelligenz (2)
Institute
- IMLA - Institute for Machine Learning and Analytics (72) (remove)
Open Access
- Open Access (49)
- Bronze (15)
- Closed Access (11)
- Closed (10)
- Diamond (9)
- Grün (2)
- Gold (1)
- Hybrid (1)
In many application areas, Deep Reinforcement Learning (DRL) has led to breakthroughs. In Curriculum Learning, the Machine Learning algorithm is not randomly presented with examples, but in a meaningful order of increasing difficulty. This has been used in many application areas to further improve the results of learning systems or to reduce their learning time. Such approaches range from learning plans created manually by domain experts to those created automatically. The automated creation of learning plans is one of the biggest challenges.In this work, we investigate an approach in which a trainer learns in parallel and analogously to the student to automatically create a learning plan for the student during this Double Deep Reinforcement Learning (DDRL). Three Reward functions, Friendly, Adversarial, and Dynamic based on the learner’s reward are compared. The domain for evaluation is kicking with variable distance, direction and relative ball position in the SimSpark simulated soccer environment.As a result, Statistic Curriculum Learning (SCL) performs better than a random curriculum with respect to training time and result quality. DDRL reaches a comparable quality as the baseline and outperforms it significantly in shorter trainings in the distance-direction subdomain reducing the number of required training cycles by almost 50%.
The mathematical representations of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. In extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3d vector fields on spheres.
Team description papers of magmaOffenburg are incremental in the sense that each year we address a different topic of our team and the tools around our team. In this year’s team description paper we focus on the architecture of the software. It is a main factor for being able to keep the code maintainable even after 15 years of development. We also describe how we make sure that the code follows this architecture.
Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack.
Project website: https://github.com/paulgavrikov/adversarial_solarization
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
(2023)
Convolutional neural networks encode images through a sequence of convolutions, normalizations and non-linearities as well as downsampling operations into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack in robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, short ASAP. While only introducing a few modifications to FLC pooling, networks using ASAP as downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high and low resolution data while maintaining similar clean accuracy or even outperforming the baseline.
Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality
(2023)
Diffusion models recently have been successfully applied for the visual synthesis of strikingly realistic appearing images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which has been originally developed in context of the detection of adversarial examples, for the automatic detection of synthetic images and the identification of the according generator networks. In contrast to many existing detection approaches, which often only work for GAN-generated images, the proposed method provides close to perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification.Since the empirical evaluations of recent publications on the detection of generated images are often mainly focused on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes.The code for our experiments is provided at https://github.com/deepfake-study/deepfake-multiLID.
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM-problems, most currently available EM-algorithms solely rely on (textual) meta data. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which can not sufficiently be solved using standard image based classification and retrieval algorithms. Instead, novel approaches which allow to transfer example based visual equivalent classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are provided under https://www.retail-786k.org/.
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient 1×1 convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning 3×3 convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise ($1\times 1$) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.
Neural networks have a number of shortcomings. Amongst the severest ones is the sensitivity to distribution shifts which allows models to be easily fooled into wrong predictions by small perturbations to inputs that are often imperceivable to humans and do not have to carry semantic meaning. Adversarial training poses a partial solution to address this issue by training models on worst-case perturbations. Yet, recent work has also pointed out that the reasoning in neural networks is different from humans. Humans identify objects by shape, while neural nets mainly employ texture cues. Exemplarily, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to favorably increase the shift toward shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common L_2-and L_-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
Seismic data processing relies on multiples attenuation to improve inversion and interpretation. Radon-based algorithms are often used for multiples and primaries discrimination. Deep learning, based on convolutional neural networks (CNNs), has shown encouraging applications for demultiple that could mitigate Radon-based challenges. In this work, we investigate new strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. Moreover, we investigate two distinctive training methods for all the strategies: UNet based on minimum absolute error (L1) training, and adversarial training (GAN-UNet). We test the trained models with the different strategies and methods on 400 synthetic data. We found that training to predict multiples, including the primaries …
Due to its performance, the field of deep learning has gained a lot of attention, with neural networks succeeding in areas like Computer Vision (CV), Neural Language Processing (NLP), and Reinforcement Learning (RL). However, high accuracy comes at a computational cost as larger networks require longer training time and no longer fit onto a single GPU. To reduce training costs, researchers are looking into the dynamics of different optimizers, in order to find ways to make training more efficient. Resource requirements can be limited by reducing model size during training or designing more efficient models that improve accuracy without increasing network size.
This thesis combines eigenvalue computation and high-dimensional loss surface visualization to study different optimizers and deep neural network models. Eigenvectors of different eigenvalues are computed, and the loss landscape and optimizer trajectory are projected onto the plane spanned by those eigenvectors. A new parallelization method for the stochastic Lanczos method is introduced, resulting in faster computation and thus enabling high-resolution videos of the trajectory and secondorder information during neural network training. Additionally, the thesis presents the loss landscape between two minima along with the eigenvalue density spectrum at intermediate points for the first time.
Secondly, this thesis presents a regularization method for Generative Adversarial Networks (GANs) that uses second-order information. The gradient during training is modified by subtracting the eigenvector direction of the biggest eigenvalue, preventing the network from falling into the steepest minima and avoiding mode collapse. The thesis also shows the full eigenvalue density spectra of GANs during training.
Thirdly, this thesis introduces ProxSGD, a proximal algorithm for neural network training that guarantees convergence to a stationary point and unifies multiple popular optimizers. Proximal gradients are used to find a closed-form solution to the problem of training neural networks with smooth and non-smooth regularizations, resulting in better sparsity and more efficient optimization. Experiments show that ProxSGD can find sparser networks while reaching the same accuracy as popular optimizers.
Lastly, this thesis unifies sparsity and neural architecture search (NAS) through the framework of group sparsity. Group sparsity is achieved through ℓ2,1-regularization during training, allowing for filter and operation pruning to reduce model size with minimal sacrifice in accuracy. By grouping multiple operations together, group sparsity can be used for NAS as well. This approach is shown to be more robust while still achieving competitive accuracies compared to state-of-the-art methods
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant m argin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
An important step in seismic data processing to improve inversion and interpretation is multiples attenuation. Radon-based algorithms are often used for discriminating primaries and multiples. Recently, deep learning (DL), based on convolutional neural networks (CNNs) has shown promising results in demultiple that could mitigate the challenges of Radon-based methods. In this work, we investigate new different strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. We evaluate the performance of the CNNs trained with the different strategies on 400 clean and noisy synthetic data, considering 3 metrics. We found that training a CNN to predict the multiples and then subtracting them from the input image is the most effective strategy for demultiple. Furthermore, including the primaries labels as a constraint during the training of multiples prediction improves the results. Finally, we test the strategies on a field dataset. The CNNs trained with different strategies report competitive results on real data compared with Radon demultiple. As a result, effectively trained CNN models can potentially replace Radon-based demultiple in existing workflows.
Neural networks tend to overfit the training distribution and perform poorly on out-ofdistribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about the potential attacks or inference time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training enforces models to shift the focus to low-frequency details during training. By mimicking this behavior through frequency regularization in learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method leads to more favorable shifts in decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
In this paper, we describe a first publicly available fine-grained product recognition dataset based on leaflet images. Using advertisement leaflets, collected over several years from different European retailers, we provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three different approaches for this fine-grained product classification task, Classification by Image, by Text, as well as by Image and Text. The approach "Classification by Text" uses the text extracted directly from the leaflet product images. We show, that the combination of image and text as input improves the classification of visual difficult to distinguish products. The final model leads to an accuracy of 96.4% with a Top-3 score of 99.2%. We release our code at https://github.com/ladwigd/Leaflet-Product-Classification.
Seismic data processing involves techniques to deal with undesired effects that occur during acquisition and pre-processing. These effects mainly comprise coherent artefacts such as multiples, non-coherent signals such as electrical noise, and loss of signal information at the receivers that leads to incomplete traces. In this work, we employ a generative solution, since it can explicitly model complex data distributions and hence, yield to a better decision-making process. In particular, we introduce diffusion models for multiple removal. To that end, we run experiments on synthetic and on real data, and we compare the deep diffusion performance with standard algorithms. We believe that our pioneer study not only demonstrates the capability of diffusion models, but also opens the door to future research to integrate generative models in seismic workflows.
It is common practice to apply padding prior to convolution operations to preserve the resolution of feature-maps in Convolutional Neural Networks (CNN). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
This study focuses on the autonomous navigation and mapping of indoor environments using a drone equipped only with a monocular camera and height measurement sensors. A visual SLAM algorithm was employed to generate a preliminary map of the environment and to determine the drone's position within the map. A deep neural network was utilized to generate a depth image from the monocular camera's input, which was subsequently transformed into a point cloud to be projected into the map. By aligning the depth point cloud with the map, 3D occupancy grid maps were constructed by using ray tracing techniques to get a precise depiction of obstacles and the surroundings. Due to the absence of IMU data from the low-cost drone for the SLAM algorithm, the created maps are inherently unscaled. However, preliminary tests with relative navigation in unscaled maps have revealed potential accuracy issues, which can only be overcome by incorporating additional information from the given sensors for scale estimation.
Artificial Intelligence (AI) can potentially transform many aspects of modern society in various ways, including automation of tasks, personalization of products and services, diagnosis of diseases and their treatment, transportation, safety, and security in public spaces, etc. Recently, AI technology has been transforming the financial industry, offering new ways to analyse data and automate processes, reduce costs, increase efficiency, and provide more personalized services to customers. However, it also raised important ethical and regulatory questions that need to be addressed by the industry and society as a whole. The aim of the Erasmus+ project Transversal Skills in Applied Artificial Intelligence - TSAAI (KA220-HED - Cooperation Partnerships in higher education) has been to establish a training platform that will incorporate teaching guidelines based on a curriculum covering different areas of application of AI technology. In this work, we will focus on applying AI models in the financial and insurance sectors.
Following their success in visual recognition tasks, Vision Transformers(ViTs) are being increasingly employed for image restoration. As a few recent works claim that ViTs for image classification also have better robustness properties, we investigate whether the improved adversarial robustness of ViTs extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network" which are both simplified versions of a Restormer. We use Projected Gradient Descent (PGD) and CosPGD for our robustness evaluation. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that contrary to as advocated by ViTs in image classification works, these models are highly susceptible to adversarial attacks. We attempt to find an easy fix and improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on other networks are less promising. Interestingly, we find that the design choices in NAFNet and Baselines, which were based on iid performance, and not on robust generalization, seem to be at odds with the model robustness.
In this paper we present the concept of the "KI-Labor Südbaden" to support regional companies in the use of AI technologies. The approach is based on the "Periodic Table of AI" and is extended with both new dimensions for sustainability, and the impact of AI on the working environment. It is illustrated on the basis of three real-world use cases: 1. The detection of humans with lowresolution infrared (IR) images for collaborative robotics; 2. The use of machine data from specifically designed vehicles; 3. State-of-the-art Large Language Models (LLMs) applied to internal company documents. We explain the use cases, thereby demonstrating how to apply the Periodic Table of AI to structure AI applications.
Motivated by the recent trend towards the usage of larger receptive fields for more context-aware neural networks in vision applications, we aim to investigate how large these receptive fields really need to be. To facilitate such study, several challenges need to be addressed, most importantly: (i) We need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To facilitate such models, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state-of-the-art on large image classification benchmarks while executing convolutions solely in the frequency domain and can be employed within any CNN architecture. They allow us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters practically translate into well-localized and relatively small convolution kernels in the spatial domain.
State-of-the-art models for pixel-wise prediction tasks such as image restoration, image segmentation, or disparity estimation, involve several stages of data resampling, in which the resolution of feature maps is first reduced to aggregate information and then sequentially increased to generate a high-resolution output. Several previous works have investigated the effect of artifacts that are invoked during downsampling and diverse cures have been proposed that facilitate to improve prediction stability and even robustness for image classification. However, equally relevant, artifacts that arise during upsampling have been less discussed. This is significantly relevant as upsampling and downsampling approaches face fundamentally different challenges. While during downsampling, aliases and artifacts can be reduced by blurring feature maps, the emergence of fine details is crucial during upsampling. Blurring is therefore not an option and dedicated operations need to be considered. In this work, we are the first to explore the relevance of context during upsampling by employing convolutional upsampling operations with increasing kernel size while keeping the encoder unchanged. We find that increased kernel sizes can in general improve the prediction stability in tasks such as image restoration or image segmentation, while a block that allows for a combination of small-size kernels for fine details and large-size kernels for artifact removal and increased context yields the best results.
Currently, many theoretical as well as practically relevant questions towards the transferability and robustness of Convolutional Neural Networks (CNNs) remain unsolved. While ongoing research efforts are engaging these problems from various angles, in most computer vision related cases these approaches can be generalized to investigations of the effects of distribution shifts in image data. In this context, we propose to study the shifts in the learned weights of trained CNN models. Here we focus on the properties of the distributions of dominantly used 3×3 convolution filter kernels. We collected and publicly provide a dataset with over 1.4 billion filters from hundreds of trained CNNs, using a wide range of datasets, architectures, and vision tasks. In a first use case of the proposed dataset, we can show highly relevant properties of many publicly available pre-trained models for practical applications: I) We analyze distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like visual category of the dataset, task, architecture, or layer depth. Based on these results, we conclude that model pre-training can succeed on arbitrary datasets if they meet size and variance conditions. II) We show that many pre-trained models contain degenerated filters which make them less robust and less suitable for fine-tuning on target applications. Data & Project website: https://github.com/paulgavrikov/cnn-filter-db.
Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising as the inherent spatial pyramid design of most CNNs is apparently violating basic signal processing laws, i.e. Sampling Theorem in their down-sampling operations. However, since poor sampling appeared not to affect model accuracy, this issue has been broadly neglected until model robustness started to receive more attention. Recent work in the context of adversarial attacks and distribution shifts, showed after all, that there is a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing free down-sampling operation which can easily be plugged into any CNN architecture: FrequencyLowCut pooling. Our experiments show, that in combination with simple and Fast Gradient Sign Method (FGSM) adversarial training, our hyper-parameter free operator substantially improves model robustness and avoids catastrophic overfitting. Our code is available at https://github.com/GeJulia/flc_pooling
Deep learning models are intrinsically sensitive to distribution shifts in the input data. In particular, small, barely perceivable perturbations to the input data can force models to make wrong predictions with high confidence. An common defense mechanism is regularization through adversarial training which injects worst-case perturbations back into training to strengthen the decision boundaries, and to reduce overfitting. In this context, we perform an investigation of 3 × 3 convolution filters that form in adversarially- trained models. Filters are extracted from 71 public models of the ℓ ∞ -RobustBench CIFAR-10/100 and ImageNet1k leaderboard and compared to filters extracted from models built on the same architectures but trained without robust regularization. We observe that adversarially-robust models appear to form more diverse, less sparse, and more orthogonal convolution filters than their normal counterparts. The largest differences between robust and normal models are found in the deepest layers, and the very first convolution layer, which consistently and predominantly forms filters that can partially eliminate perturbations, irrespective of the architecture.
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the trainingset. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions, even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
Many commonly well-performing convolutional neural network models have shown to be susceptible to input data perturbations, indicating a low model robustness. To reveal model weaknesses, adversarial attacks are specifically optimized to generate small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained by using adversarial examples during training, which in most cases reduces the measurable model attackability. Unfortunately, this technique can lead to robust overfitting, which results in non-robust models. In this paper, we analyze adversarially trained, robust models in the context of a specific network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from downsampling artifacts, aka. aliasing, than baseline models. In the case of robust overfitting, we observe a strong increase in aliasing and propose a novel early stopping approach based on the measurement of aliasing.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail due to computational constraints. To represent key processes, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM (European Center for Medium-Range Weather Forecast-Hamburg-Hamburg) global climate aerosol model using the M7 microphysics, but high computational costs make it very expensive to run with finer resolution or for a longer time. We aim to use machine learning to emulate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input–output pairs to train a neural network (NN) on it. We are able to learn the variables’ tendencies achieving an average R² score of 77.1%. We further explore methods to inform and constrain the NN with physical knowledge to reduce mass violation and enforce mass positivity. On a Graphics processing unit (GPU), we achieve a speed-up of up to over 64 times faster when compared to the original model.
Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space
(2022)
Over the last decade, the development of deep image classification networks has mostly been driven by the search for the best performance in terms of classification accuracy on standardized benchmarks like ImageNet. More recently, this focus has been expanded by the notion of model robustness, \ie the generalization abilities of models towards previously unseen changes in the data distribution. While new benchmarks, like ImageNet-C, have been introduced to measure robustness properties, we argue that fixed testsets are only able to capture a small portion of possible data variations and are thus limited and prone to generate new overfitted solutions. To overcome these drawbacks, we suggest to estimate the robustness of a model directly from the structure of its learned feature-space. We introduce robustness indicators which are obtained via unsupervised clustering of latent representations from a trained classifier and show very high correlations to the model performance on corrupted test data.
Many commonly well-performing convolutional neural network models have shown to be susceptible to input data perturbations, indicating a low model robustness. Adversarial attacks are thereby specifically optimized to reveal model weaknesses, by generating small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained for example by using adversarial examples during training, which effectively reduces the measurable model attackability. In contrast, research on analyzing the source of a model’s vulnerability is scarce. In this paper, we analyze adversarially trained, robust models in the context of a specifically suspicious network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from aliasing than baseline models.
Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image
classification networks. In it’s most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l∞ perturbations limited to ϵ = 8/255. With leading scores of the currently best performing models of around 60% of the baseline, it is fair to characterize this benchmark to be quite challenging. Despite it’s general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness which could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by excessive experiments presented in this paper: We argue that I) the alternation of data by AutoAttack with l∞, ϵ = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers.
We also show that other attack methods are much harder to detect while achieving similar success rates. II) That results on low resolution data sets like CIFAR10 do not generalize well to higher resolution images as gradient based attacks appear to become even more detectable with increasing resolutions.
Recent work has investigated the distributions of learned convolution filters through a large-scale study containing hundreds of heterogeneous image models. Surprisingly, on average, the distributions only show minor drifts in comparisons of various studied dimensions including the learned task, image domain, or dataset. However, among the studied image domains, medical imaging models appeared to show significant outliers through "spikey" distributions, and, therefore, learn clusters of highly specific filters different from other domains. Following this observation, we study the collected medical imaging models in more detail. We show that instead of fundamental differences, the outliers are due to specific processing in some architectures. Quite the contrary, for standardized architectures, we find that models trained on medical data do not significantly differ in their filter distributions from similar architectures trained on data from other domains. Our conclusions reinforce previous hypotheses stating that pre-training of imaging models can be done with any kind of diverse image data.
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Adversarial training (AT) is often considered as a remedy to train more robust networks. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences.
In this paper, we propose a unified approach for network pruning and one-shot neural architecture search (NAS) via group sparsity. We first show that group sparsity via the recent Proximal Stochastic Gradient Descent (ProxSGD) algorithm achieves new state-of-the-art results for filter pruning. Then, we extend this approach to operation pruning, directly yielding a gradient-based NAS method based on group sparsity. Compared to existing gradient-based algorithms such as DARTS, the advantages of this new group sparsity approach are threefold. Firstly, instead of a costly bilevel optimization problem, we formulate the NAS problem as a single-level optimization problem, which can be optimally and efficiently solved using ProxSGD with convergence guarantees. Secondly, due to the operation-level sparsity, discretizing the network architecture by pruning less important operations can be safely done without any performance degradation. Thirdly, the proposed approach finds architectures that are both stable and well-performing on a variety of search spaces and datasets.
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
The term “attribute transfer” refers to the tasks of altering images in such a way, that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator. In this paper, we present a novel method for the common sub-task of local attribute transfers, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfers as an inpainting problem. Removing and regenerating only parts of images, our “Attribute Transfer Inpainting Generative Adversarial Network” (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified resulting in visually sound results.
Data Science
(2021)
Know-how für Data Scientists
• Übersichtliche und anwendungsbezogene Einführung
• Zahlreiche Anwendungsfälle und Praxisbeispiele aus unterschiedlichen Branchen
• Potenziale, aber auch mögliche Fallstricke werden aufgezeigt
Data Science steht derzeit wie kein anderer Begriff für die Auswertung großer Datenmengen mit analytischen Konzepten des Machine Learning oder der künstlichen Intelligenz. Nach der bewussten Wahrnehmung der Big Data und dabei insbesondere der Verfügbarmachung in Unternehmen sind Technologien und Methoden zur Auswertung dort gefordert, wo klassische Businss Intelligence an ihre Grenzen stößt.
Dieses Buch bietet eine umfassende Einführung in Data Science und deren praktische Relevanz für Unternehmen. Dabei wird auch die Integration von Data Science in ein bereits bestehendes Business-Intelligence-Ökosystem thematisiert. In verschiedenen Beiträgen werden sowohl Aufgabenfelder und Methoden als auch Rollen- und Organisationsmodelle erläutert, die im Zusammenspiel mit Konzepten und Architekturen auf Data Science wirken.
Diese 2., überarbeitete Auflage wurde um neue Themen wie Feature Selection und Deep Reinforcement Learning sowie eine neue Fallstudie erweitert.
Recently, adversarial attacks on image classification networks by the AutoAttack (Croce and Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches are focusing on network hardening and robustness enhancements, like adversarial training. This way, the currently best-reported method can withstand about 66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference, rejecting manipulated inputs. Based on a rather simple and fast analysis in the frequency domain, we introduce two different detection algorithms. First, a black box detector that only operates on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet, for epsilon = 8/255 in both cases. Second, a whitebox detector using an analysis of CNN feature maps, leading to a detection rate of also 100% and 98.7% on the same benchmarks.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail. In order to achieve higher accuracy, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM global climate aerosol model using the M7 microphysics model, but increased computational costs make it very expensive to run at higher resolutions or for a longer time. We aim to use machine learning to approximate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input-output pairs to train a neural network on it. By using a special logarithmic transform we are able to learn the variables tendencies achieving an average score of . On a GPU we achieve a speed-up of 120 compared to the original model.
Correlation Clustering, also called the minimum cost Multicut problem, is the process of grouping data by pairwise similarities. It has proven to be effective on clustering problems, where the number of classes is unknown. However, not only is the Multicut problem NP-hard, an undirected graph G with n vertices representing single images has at most edges, thus making it challenging to implement correlation clustering for large datasets. In this work, we propose Multi-Stage Multicuts (MSM) as a scalable approach for image clustering. Specifically, we solve minimum cost Multicut problems across multiple distributed compute units. Our approach not only allows to solve problem instances which are too large to fit into the shared memory of a single compute node, but it also achieves significant speedups while preserving the clustering accuracy at the same time. We evaluate our proposed method on the CIFAR10 …
A fundamental and still largely unsolved question in the context of Generative Adversarial Networks is whether they are truly able to capture the real data distribution and, consequently, to sample from it. In particular, the multidimensional nature of image distributions leads to a complex evaluation of the diversity of GAN distributions. Existing approaches provide only a partial understanding of this issue, leaving the question unanswered. In this work, we introduce a loop-training scheme for the systematic investigation of observable shifts between the distributions of real training data and GAN generated data. Additionally, we introduce several bounded measures for distribution shifts, which are both easy to compute and to interpret. Overall, the combination of these methods allows an explorative investigation of innate limitations of current GAN algorithms. Our experiments on different data-sets and multiple state-of-the-art GAN architectures show large shifts between input and output distributions, showing that existing theoretical guarantees towards the convergence of output distributions appear not to be holding in practice.
An Empirical Investigation of Model-to-Model Distribution Shifts in Trained Convolutional Filters
(2021)
We present first empirical results from our ongoing investigation of distribution shifts in image data used for various computer vision tasks. Instead of analyzing the original training and test data, we propose to study shifts in the learned weights of trained models. In this work, we focus on the properties of the distributions of dominantly used 3x3 convolution filter kernels. We collected and publicly provide a data set with over half a billion filters from hundreds of trained CNNs, using a wide range of data sets, architectures, and vision tasks. Our analysis shows interesting distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like data type, task, architecture, or layer depth. We argue, that the observed properties are a valuable source for further investigation into a better understanding of the impact of shifts in the input data to the generalization abilities of CNN models and novel methods for more robust transfer-learning in this domain.
Engineering, construction and operation of complex machines involves a wide range of complicated, simultaneous tasks, which potentially could be automated. In this work, we focus on perception tasks in such systems, investigating deep learning approaches for multi-task transfer learning with limited training data. We show an approach that takes advantage of a technical systems’ focus on selected objects and their properties. We create focused representations and simultaneously solve joint objectives in a system through multi-task learning with convolutional autoencoders. The focused representations are used as a starting point for the data-saving solution of the additional tasks. The efficiency of this approach is demonstrated using images and tasks of an autonomous circular crane with a grapple.
In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet loss formulations for K-means and correlation clustering on the CIFAR-10 image classification dataset.
Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable against so-called adversarial attacks: Small, crafted perturbations in the input images can lead to false predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods: Our first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of Fourier coefficients of feature-maps at different layers of the network. With this extension, we are able to improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods. The code for the methods proposed in the paper is available at github.com/paulaharder/SpectralAdversarialDefense
We demonstrate how to exploit group sparsity in order to bridge the areas of network pruning and neural architecture search (NAS). This results in a new one-shot NAS optimizer that casts the problem as a single-level optimization problem and does not suffer any performance degradation from discretizating the architecture.
Interpreting seismic data requires the characterization of a number of key elements such as the position of faults and main reflections, presence of structural bodies, and clustering of areas exhibiting a similar amplitude versus angle response. Manual interpretation of geophysical data is often a difficult and time-consuming task, complicated by lack of resolution and presence of noise. In recent years, approaches based on convolutional neural networks have shown remarkable results in automating certain interpretative tasks. However, these state-of-the-art systems usually need to be trained in a supervised manner, and they suffer from a generalization problem. Hence, it is highly challenging to train a model that can yield accurate results on new real data obtained with different acquisition, processing, and geology than the data used for training. In this work, we introduce a novel method that combines generative neural networks with a segmentation task in order to decrease the gap between annotated training data and uninterpreted target data. We validate our approach on two applications: the detection of diffraction events and the picking of faults. We show that when transitioning from synthetic training data to real validation data, our workflow yields superior results compared to its counterpart without the generative network.