Refine
Document Type
- Conference Proceeding (46)
- Article (unreviewed) (16)
- Article (reviewed) (10)
- Book (2)
- Part of a Book (2)
- Contribution to a Periodical (1)
- Doctoral Thesis (1)
- Report (1)
Conference Type
- Konferenzartikel (44)
- Konferenz-Abstract (1)
- Sonstiges (1)
Is part of the Bibliography
- yes (79)
Keywords
- Deep Leaning (11)
- Machine Learning (9)
- Robustness (4)
- Data Science (3)
- Generative Adversarial Network (3)
- image classification (3)
- Aliasing (2)
- Benutzererlebnis (2)
- CNNs (2)
- Computer Vision (2)
- Deep Learning (2)
- Electronic Commerce (2)
- Electronic Shopping (2)
- Künstliche Intelligenz (2)
- Stability (2)
- adversarial attacks (2)
- autoattack (2)
- convolutional neural networks (2)
- deep learning (2)
- neural networks (2)
- Adversarial Attacks (1)
- Adversarial Robustness (1)
- Adversarial examples (1)
- Adversarial robustness (1)
- Artificial Intelligence (1)
- Business Intelligence (1)
- CNN (1)
- COVID-19 pandemic (1)
- Deep Reinforcement Learning (1)
- Deep diffusion models (1)
- Deep learning (1)
- Digital Identity (1)
- Digitale Identität (1)
- E-Commerce (1)
- E-grocery (1)
- Eigenvalues (1)
- Elderly consumers (1)
- Generative models (1)
- Germany (1)
- Identität (1)
- Image restoration (1)
- InceptionTime (1)
- KI-Labor Südbaden (1)
- Kaufentscheidung (1)
- Kaufverhalten (1)
- Maschinelles Lernen (1)
- Mode Collapse (1)
- Model Calibration (1)
- Monocular Depth Estimation (1)
- Neural Architecture Search (1)
- Nyquist-Shannon (1)
- Octave Convolution (1)
- Online Grocery Shopping (1)
- Online-Shopping (1)
- Optimization (1)
- PPS (1)
- Pattern Recognition (1)
- Periodic Table of AI (1)
- Produktionsplanung und -steuerung (1)
- Regularization (1)
- Representation Learning (1)
- ResNet (1)
- Road-Quality Prediction (1)
- RoboCup (1)
- Roubst overfitting (1)
- Sampling (1)
- Second-order Optimization (1)
- Seismic processing (1)
- Selbstbestimmte Identität (1)
- Self-Sovereign Identity (1)
- Time-series Classification (1)
- Unsupervised Conditional Training (1)
- Use Case (1)
- User Experience (1)
- Wallet (1)
- adversarial (1)
- adversarial detection (1)
- aerosol modeling (1)
- artificial intelligence (1)
- attribute manipulation (1)
- autoML (1)
- cifar (1)
- climate emulation (1)
- correlation (1)
- curriculum learning (1)
- deep reinforcement learning (1)
- defense (1)
- detection (1)
- face editing (1)
- face recognition (1)
- fourier (1)
- gan (1)
- generative adversarial networks (1)
- hair (1)
- image color analysis (1)
- imagenet (1)
- interpretation (1)
- lid (1)
- mahalanobis (1)
- neural architecture search (1)
- noise measurement (1)
- nose (1)
- pattern recognition (1)
- physics-informed ML (1)
- pruning (1)
- robustness (1)
- seismic (1)
- semantics (1)
- spectral defense (1)
- spectraldefense (1)
- style transfer (1)
- transversal skills (1)
Institute
- IMLA - Institute for Machine Learning and Analytics (79) (remove)
Open Access
- Open Access (56)
- Bronze (16)
- Closed Access (11)
- Closed (10)
- Diamond (9)
- Gold (4)
- Hybrid (3)
- Grün (2)
In diesem Beitrag werden grundlegende Aspekte und Methoden der Data Science erläutert. Nach dem Vorgehensmodell CRISP-DM sind in den Phasen Data Unterstanding und Data Preparation vor allem Verfahren der Datenselektion, Datenvorverarbeitung und der explorativen Datenanalyse anzuwenden. Beim Modeling, der Hauptaufgabe der Data Science, kann man überwachte und unüberwachte Methoden sowie Reinforcement Learning unterscheiden. Auf die Evaluation der Güte eines Modells anhand von Qualitätsmaßen wird anschließend eingegangen. Der Beitrag schließt mit einem Ausblick auf weitere Themen wie Cognitive Computing.
Machine Learning als Schlüsseltechnologie für Digitalisierung: Wie funktioniert maschinelles Lernen?
(2019)
Maschinelles Lernen verändert zusehends die Arbeitswelt. Auch in der Produktionsplanung und -steuerung finden sich vielversprechende Anwendungsfälle. In diesem Beitrag sollen ausgewählte Anwendungsbereiche und Ansätze vorgestellt werden, die anhand einer umfangreichen Untersuchung wissenschaftlicher Veröffentlichungen identifiziert wurden. Als Strukturierungshilfe dient das Aachener PPS-Modell.
Artificial Intelligence (AI) can potentially transform many aspects of modern society in various ways, including automation of tasks, personalization of products and services, diagnosis of diseases and their treatment, transportation, safety, and security in public spaces, etc. Recently, AI technology has been transforming the financial industry, offering new ways to analyse data and automate processes, reduce costs, increase efficiency, and provide more personalized services to customers. However, it also raised important ethical and regulatory questions that need to be addressed by the industry and society as a whole. The aim of the Erasmus+ project Transversal Skills in Applied Artificial Intelligence - TSAAI (KA220-HED - Cooperation Partnerships in higher education) has been to establish a training platform that will incorporate teaching guidelines based on a curriculum covering different areas of application of AI technology. In this work, we will focus on applying AI models in the financial and insurance sectors.
Apache Hadoop is a well-known open-source framework for storing and processing huge amounts of data. This paper shows the usage of the framework within a project of the university in cooperation with a semiconductor company. The goal of this project was to supplement the existing data landscape by the facilities of storing and analyzing the data on a new Apache Hadoop based platform.
Neural networks tend to overfit the training distribution and perform poorly on out-ofdistribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about the potential attacks or inference time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training enforces models to shift the focus to low-frequency details during training. By mimicking this behavior through frequency regularization in learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method leads to more favorable shifts in decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image
classification networks. In it’s most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l∞ perturbations limited to ϵ = 8/255. With leading scores of the currently best performing models of around 60% of the baseline, it is fair to characterize this benchmark to be quite challenging. Despite it’s general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness which could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by excessive experiments presented in this paper: We argue that I) the alternation of data by AutoAttack with l∞, ϵ = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers.
We also show that other attack methods are much harder to detect while achieving similar success rates. II) That results on low resolution data sets like CIFAR10 do not generalize well to higher resolution images as gradient based attacks appear to become even more detectable with increasing resolutions.
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant m argin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small “detector” is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks’ local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
Recently, adversarial attacks on image classification networks by the AutoAttack (Croce and Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches are focusing on network hardening and robustness enhancements, like adversarial training. This way, the currently best-reported method can withstand about 66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference, rejecting manipulated inputs. Based on a rather simple and fast analysis in the frequency domain, we introduce two different detection algorithms. First, a black box detector that only operates on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet, for epsilon = 8/255 in both cases. Second, a whitebox detector using an analysis of CNN feature maps, leading to a detection rate of also 100% and 98.7% on the same benchmarks.
Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality
(2023)
Diffusion models recently have been successfully applied for the visual synthesis of strikingly realistic appearing images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which has been originally developed in context of the detection of adversarial examples, for the automatic detection of synthetic images and the identification of the according generator networks. In contrast to many existing detection approaches, which often only work for GAN-generated images, the proposed method provides close to perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification.Since the empirical evaluations of recent publications on the detection of generated images are often mainly focused on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes.The code for our experiments is provided at https://github.com/deepfake-study/deepfake-multiLID.
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM-problems, most currently available EM-algorithms solely rely on (textual) meta data. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which can not sufficiently be solved using standard image based classification and retrieval algorithms. Instead, novel approaches which allow to transfer example based visual equivalent classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are provided under https://www.retail-786k.org/.
In this paper, we describe a first publicly available fine-grained product recognition dataset based on leaflet images. Using advertisement leaflets, collected over several years from different European retailers, we provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three different approaches for this fine-grained product classification task, Classification by Image, by Text, as well as by Image and Text. The approach "Classification by Text" uses the text extracted directly from the leaflet product images. We show, that the combination of image and text as input improves the classification of visual difficult to distinguish products. The final model leads to an accuracy of 96.4% with a Top-3 score of 99.2%. We release our code at https://github.com/ladwigd/Leaflet-Product-Classification.
In many application areas, Deep Reinforcement Learning (DRL) has led to breakthroughs. In Curriculum Learning, the Machine Learning algorithm is not randomly presented with examples, but in a meaningful order of increasing difficulty. This has been used in many application areas to further improve the results of learning systems or to reduce their learning time. Such approaches range from learning plans created manually by domain experts to those created automatically. The automated creation of learning plans is one of the biggest challenges.In this work, we investigate an approach in which a trainer learns in parallel and analogously to the student to automatically create a learning plan for the student during this Double Deep Reinforcement Learning (DDRL). Three Reward functions, Friendly, Adversarial, and Dynamic based on the learner’s reward are compared. The domain for evaluation is kicking with variable distance, direction and relative ball position in the SimSpark simulated soccer environment.As a result, Statistic Curriculum Learning (SCL) performs better than a random curriculum with respect to training time and result quality. DDRL reaches a comparable quality as the baseline and outperforms it significantly in shorter trainings in the distance-direction subdomain reducing the number of required training cycles by almost 50%.
Extracting horizon surfaces from key reflections in a seismic image is an important step of the interpretation process. Interpreting a reflection surface in a geologically complex area is a difficult and time-consuming task, and it requires an understanding of the 3D subsurface geometry. Common methods to help automate the process are based on tracking waveforms in a local window around manual picks. Those approaches often fail when the wavelet character lacks lateral continuity or when reflections are truncated by faults. We have formulated horizon picking as a multiclass segmentation problem and solved it by supervised training of a 3D convolutional neural network. We design an efficient architecture to analyze the data over multiple scales while keeping memory and computational needs to a practical level. To allow for uncertainties in the exact location of the reflections, we use a probabilistic formulation to express the horizons position. By using a masked loss function, we give interpreters flexibility when picking the training data. Our method allows experts to interactively improve the results of the picking by fine training the network in the more complex areas. We also determine how our algorithm can be used to extend horizons to the prestack domain by following reflections across offsets planes, even in the presence of residual moveout. We validate our approach on two field data sets and show that it yields accurate results on nontrivial reflectivity while being trained from a workable amount of manually picked data. Initial training of the network takes approximately 1 h, and the fine training and prediction on a large seismic volume take a minute at most.
Diffracted waves carry high resolution information that can help interpreting fine structural details at a scale smaller than the seismic wavelength. Because of the low signal-to-noise ratio of diffracted waves, it is challenging to preserve them during processing and to identify them in the final data. It is, therefore, a traditional approach to pick manually the diffractions. However, such task is tedious and often prohibitive, thus, current attention is given to domain adaptation. Those methods aim to transfer knowledge from a labeled domain to train the model, and then infer on the real unlabeled data. In this regard, it is common practice to create a synthetic labeled training dataset, followed by testing on unlabeled real data. Unfortunately, such procedure may fail due to the existing gap between the synthetic and the real distribution since quite often synthetic data oversimplifies the problem, and consequently the transfer learning becomes a hard and non-trivial procedure. Furthermore, deep neural networks are characterized by their high sensitivity towards cross-domain distribution shift. In this work, we present deep learning model that builds a bridge between both distributions creating a semi-synthetic datatset that fills in the gap between synthetic and real domains. More specifically, our proposal is a feed-forward, fully convolutional neural network for imageto-image translation that allows to insert synthetic diffractions while preserving the original reflection signal. A series of experiments validate that our approach produces convincing seismic data containing the desired synthetic diffractions.
Due to the rapidly increasing storage consumption worldwide, as well as the expectation of continuous availability of information, the complexity of administration in today’s data centers is growing permanently. Integrated techniques for monitoring hard disks can increase the reliability of storage systems. However, these techniques often lack intelligent data analysis to perform predictive maintenance. To solve this problem, machine learning algorithms can be used to detect potential failures in advance and prevent them. In this paper, an unsupervised model for predicting hard disk failures based on Isolation Forest is proposed. Consequently, a method is presented that can deal with the highly imbalanced datasets, as the experiment on the Backblaze benchmark dataset demonstrates.
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking by detection paradigm either require some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method is based on straight-forward spatio-temporal cues that can be extracted from neighboring frames in an image sequences without superivison. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and train an autoencoder to generate suitable latent representation. Thus, the resulting latent representations can serve as robust appearance cues for tracking even over large temporal distances where no reliable spatio-temporal features could be extracted. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
The mathematical representations of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. In extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3d vector fields on spheres.
This paper describes the concept and some results of the project "Menschen Lernen Maschinelles Lernen" (Humans Learn Machine Learning, ML2) of the University of Applied Sciences Offenburg. It brings together students of different courses of study and practitioners from companies on the subject of Machine Learning. A mixture of blended learning and practical projects ensures a tight coupling of machine learning theory and application. The paper details the phases of ML2 and mentions two successful example projects.
Engineering, construction and operation of complex machines involves a wide range of complicated, simultaneous tasks, which potentially could be automated. In this work, we focus on perception tasks in such systems, investigating deep learning approaches for multi-task transfer learning with limited training data. We show an approach that takes advantage of a technical systems’ focus on selected objects and their properties. We create focused representations and simultaneously solve joint objectives in a system through multi-task learning with convolutional autoencoders. The focused representations are used as a starting point for the data-saving solution of the additional tasks. The efficiency of this approach is demonstrated using images and tasks of an autonomous circular crane with a grapple.
In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet loss formulations for K-means and correlation clustering on the CIFAR-10 image classification dataset.
Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space
(2022)
Over the last decade, the development of deep image classification networks has mostly been driven by the search for the best performance in terms of classification accuracy on standardized benchmarks like ImageNet. More recently, this focus has been expanded by the notion of model robustness, \ie the generalization abilities of models towards previously unseen changes in the data distribution. While new benchmarks, like ImageNet-C, have been introduced to measure robustness properties, we argue that fixed testsets are only able to capture a small portion of possible data variations and are thus limited and prone to generate new overfitted solutions. To overcome these drawbacks, we suggest to estimate the robustness of a model directly from the structure of its learned feature-space. We introduce robustness indicators which are obtained via unsupervised clustering of latent representations from a trained classifier and show very high correlations to the model performance on corrupted test data.
In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet loss formulations for K-means and correlation clustering on the CIFAR-10 image classification dataset.
Correlation Clustering, also called the minimum cost Multicut problem, is the process of grouping data by pairwise similarities. It has proven to be effective on clustering problems, where the number of classes is unknown. However, not only is the Multicut problem NP-hard, an undirected graph G with n vertices representing single images has at most edges, thus making it challenging to implement correlation clustering for large datasets. In this work, we propose Multi-Stage Multicuts (MSM) as a scalable approach for image clustering. Specifically, we solve minimum cost Multicut problems across multiple distributed compute units. Our approach not only allows to solve problem instances which are too large to fit into the shared memory of a single compute node, but it also achieves significant speedups while preserving the clustering accuracy at the same time. We evaluate our proposed method on the CIFAR10 …
Design and Implementation of a Camera-Based Tracking System for MAV Using Deep Learning Algorithms
(2023)
In recent years, the advancement of micro-aerial vehicles has been rapid, leading to their widespread utilization across various domains due to their adaptability and efficiency. This research paper focuses on the development of a camera-based tracking system specifically designed for low-cost drones. The primary objective of this study is to build up a system capable of detecting objects and locating them on a map in real time. Detection and positioning are achieved solely through the utilization of the drone’s camera and sensors. To accomplish this goal, several deep learning algorithms are assessed and adopted because of their suitability with the system. Object detection is based upon a single-shot detector architecture chosen for maximum computation speed, and the tracking is based upon the combination of deep neural-network-based features combined with an efficient sorting strategy. Subsequently, the developed system is evaluated using diverse metrics to determine its performance for detection and tracking. To further validate the approach, the system is employed in the real world to show its possible deployment. For this, two distinct scenarios were chosen to adjust the algorithms and system setup: a search and rescue scenario with user interaction and precise geolocalization of missing objects, and a livestock control scenario, showing the capability of surveying individual members and keeping track of number and area. The results demonstrate that the system is capable of operating in real time, and the evaluation verifies that the implemented system enables precise and reliable determination of detected object positions. The ablation studies prove that object identification through small variations in phenotypes is feasible with our approach.
This study focuses on the autonomous navigation and mapping of indoor environments using a drone equipped only with a monocular camera and height measurement sensors. A visual SLAM algorithm was employed to generate a preliminary map of the environment and to determine the drone's position within the map. A deep neural network was utilized to generate a depth image from the monocular camera's input, which was subsequently transformed into a point cloud to be projected into the map. By aligning the depth point cloud with the map, 3D occupancy grid maps were constructed by using ray tracing techniques to get a precise depiction of obstacles and the surroundings. Due to the absence of IMU data from the low-cost drone for the SLAM algorithm, the created maps are inherently unscaled. However, preliminary tests with relative navigation in unscaled maps have revealed potential accuracy issues, which can only be overcome by incorporating additional information from the given sensors for scale estimation.
The present work ties in with the problem of bicycle road assessment that is currently done using expensive special measuring vehicles. Our alternative approach for road condition assessment is to mount a sensor device on a bicycle which sends accelerometer and gyroscope data via WiFi to a classification server. There, a prediction model determines road type and condition based on the sensor data. For the classification task, we compare different machine learning methods with each other, whereby validation accuracies of 99% can be achieved with deep residual networks such as InceptionTime. The main contribution of this work with respect to comparable work is that we achieve excellent accuracies on a realistic dataset classifying road conditions into nine distinct classes that are highly relevant for practice.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail. In order to achieve higher accuracy, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM global climate aerosol model using the M7 microphysics model, but increased computational costs make it very expensive to run at higher resolutions or for a longer time. We aim to use machine learning to approximate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input-output pairs to train a neural network on it. By using a special logarithmic transform we are able to learn the variables tendencies achieving an average score of . On a GPU we achieve a speed-up of 120 compared to the original model.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail due to computational constraints. To represent key processes, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM (European Center for Medium-Range Weather Forecast-Hamburg-Hamburg) global climate aerosol model using the M7 microphysics, but high computational costs make it very expensive to run with finer resolution or for a longer time. We aim to use machine learning to emulate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input–output pairs to train a neural network (NN) on it. We are able to learn the variables’ tendencies achieving an average R² score of 77.1%. We further explore methods to inform and constrain the NN with physical knowledge to reduce mass violation and enforce mass positivity. On a Graphics processing unit (GPU), we achieve a speed-up of up to over 64 times faster when compared to the original model.
Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable against so-called adversarial attacks: Small, crafted perturbations in the input images can lead to false predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods: Our first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of Fourier coefficients of feature-maps at different layers of the network. With this extension, we are able to improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods. The code for the methods proposed in the paper is available at github.com/paulaharder/SpectralAdversarialDefense
Einleitung
(2019)
We introduce an open source python framework named PHS-Parallel Hyperparameter Search to enable hyperparameter optimization on numerous compute instances of any arbitrary python function. This is achieved with minimal modifications inside the target function. Possible applications appear in expensive to evaluate numerical computations which strongly depend on hyperparameters such as machine learning. Bayesian optimization is chosen as a sample efficient method to propose the next query set of parameters.
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
(2023)
Convolutional neural networks encode images through a sequence of convolutions, normalizations and non-linearities as well as downsampling operations into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack in robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, short ASAP. While only introducing a few modifications to FLC pooling, networks using ASAP as downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high and low resolution data while maintaining similar clean accuracy or even outperforming the baseline.
Motivated by the recent trend towards the usage of larger receptive fields for more context-aware neural networks in vision applications, we aim to investigate how large these receptive fields really need to be. To facilitate such study, several challenges need to be addressed, most importantly: (i) We need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To facilitate such models, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state-of-the-art on large image classification benchmarks while executing convolutions solely in the frequency domain and can be employed within any CNN architecture. They allow us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters practically translate into well-localized and relatively small convolution kernels in the spatial domain.
Many commonly well-performing convolutional neural network models have shown to be susceptible to input data perturbations, indicating a low model robustness. To reveal model weaknesses, adversarial attacks are specifically optimized to generate small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained by using adversarial examples during training, which in most cases reduces the measurable model attackability. Unfortunately, this technique can lead to robust overfitting, which results in non-robust models. In this paper, we analyze adversarially trained, robust models in the context of a specific network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from downsampling artifacts, aka. aliasing, than baseline models. In the case of robust overfitting, we observe a strong increase in aliasing and propose a novel early stopping approach based on the measurement of aliasing.
Many commonly well-performing convolutional neural network models have shown to be susceptible to input data perturbations, indicating a low model robustness. Adversarial attacks are thereby specifically optimized to reveal model weaknesses, by generating small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained for example by using adversarial examples during training, which effectively reduces the measurable model attackability. In contrast, research on analyzing the source of a model’s vulnerability is scarce. In this paper, we analyze adversarially trained, robust models in the context of a specifically suspicious network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from aliasing than baseline models.
Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising as the inherent spatial pyramid design of most CNNs is apparently violating basic signal processing laws, i.e. Sampling Theorem in their down-sampling operations. However, since poor sampling appeared not to affect model accuracy, this issue has been broadly neglected until model robustness started to receive more attention. Recent work in the context of adversarial attacks and distribution shifts, showed after all, that there is a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing free down-sampling operation which can easily be plugged into any CNN architecture: FrequencyLowCut pooling. Our experiments show, that in combination with simple and Fast Gradient Sign Method (FGSM) adversarial training, our hyper-parameter free operator substantially improves model robustness and avoids catastrophic overfitting. Our code is available at https://github.com/GeJulia/flc_pooling
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the trainingset. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions, even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real-world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Adversarial training (AT) is often considered as a remedy to train more robust networks. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions even on clean data than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences.
Neural networks have a number of shortcomings. Amongst the severest ones is the sensitivity to distribution shifts which allows models to be easily fooled into wrong predictions by small perturbations to inputs that are often imperceivable to humans and do not have to carry semantic meaning. Adversarial training poses a partial solution to address this issue by training models on worst-case perturbations. Yet, recent work has also pointed out that the reasoning in neural networks is different from humans. Humans identify objects by shape, while neural nets mainly employ texture cues. Exemplarily, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to favorably increase the shift toward shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common L_2-and L_-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack.
Project website: https://github.com/paulgavrikov/adversarial_solarization
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient 1×1 convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning 3×3 convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise ($1\times 1$) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
It is common practice to apply padding prior to convolution operations to preserve the resolution of feature-maps in Convolutional Neural Networks (CNN). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
Currently, many theoretical as well as practically relevant questions towards the transferability and robustness of Convolutional Neural Networks (CNNs) remain unsolved. While ongoing research efforts are engaging these problems from various angles, in most computer vision related cases these approaches can be generalized to investigations of the effects of distribution shifts in image data. In this context, we propose to study the shifts in the learned weights of trained CNN models. Here we focus on the properties of the distributions of dominantly used 3×3 convolution filter kernels. We collected and publicly provide a dataset with over 1.4 billion filters from hundreds of trained CNNs, using a wide range of datasets, architectures, and vision tasks. In a first use case of the proposed dataset, we can show highly relevant properties of many publicly available pre-trained models for practical applications: I) We analyze distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like visual category of the dataset, task, architecture, or layer depth. Based on these results, we conclude that model pre-training can succeed on arbitrary datasets if they meet size and variance conditions. II) We show that many pre-trained models contain degenerated filters which make them less robust and less suitable for fine-tuning on target applications. Data & Project website: https://github.com/paulgavrikov/cnn-filter-db.
Deep learning models are intrinsically sensitive to distribution shifts in the input data. In particular, small, barely perceivable perturbations to the input data can force models to make wrong predictions with high confidence. An common defense mechanism is regularization through adversarial training which injects worst-case perturbations back into training to strengthen the decision boundaries, and to reduce overfitting. In this context, we perform an investigation of 3 × 3 convolution filters that form in adversarially- trained models. Filters are extracted from 71 public models of the ℓ ∞ -RobustBench CIFAR-10/100 and ImageNet1k leaderboard and compared to filters extracted from models built on the same architectures but trained without robust regularization. We observe that adversarially-robust models appear to form more diverse, less sparse, and more orthogonal convolution filters than their normal counterparts. The largest differences between robust and normal models are found in the deepest layers, and the very first convolution layer, which consistently and predominantly forms filters that can partially eliminate perturbations, irrespective of the architecture.
Recent work has investigated the distributions of learned convolution filters through a large-scale study containing hundreds of heterogeneous image models. Surprisingly, on average, the distributions only show minor drifts in comparisons of various studied dimensions including the learned task, image domain, or dataset. However, among the studied image domains, medical imaging models appeared to show significant outliers through "spikey" distributions, and, therefore, learn clusters of highly specific filters different from other domains. Following this observation, we study the collected medical imaging models in more detail. We show that instead of fundamental differences, the outliers are due to specific processing in some architectures. Quite the contrary, for standardized architectures, we find that models trained on medical data do not significantly differ in their filter distributions from similar architectures trained on data from other domains. Our conclusions reinforce previous hypotheses stating that pre-training of imaging models can be done with any kind of diverse image data.
An Empirical Investigation of Model-to-Model Distribution Shifts in Trained Convolutional Filters
(2021)
We present first empirical results from our ongoing investigation of distribution shifts in image data used for various computer vision tasks. Instead of analyzing the original training and test data, we propose to study shifts in the learned weights of trained models. In this work, we focus on the properties of the distributions of dominantly used 3x3 convolution filter kernels. We collected and publicly provide a data set with over half a billion filters from hundreds of trained CNNs, using a wide range of data sets, architectures, and vision tasks. Our analysis shows interesting distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like data type, task, architecture, or layer depth. We argue, that the observed properties are a valuable source for further investigation into a better understanding of the impact of shifts in the input data to the generalization abilities of CNN models and novel methods for more robust transfer-learning in this domain.
Seismic data processing relies on multiples attenuation to improve inversion and interpretation. Radon-based algorithms are often used for multiples and primaries discrimination. Deep learning, based on convolutional neural networks (CNNs), has shown encouraging applications for demultiple that could mitigate Radon-based challenges. In this work, we investigate new strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. Moreover, we investigate two distinctive training methods for all the strategies: UNet based on minimum absolute error (L1) training, and adversarial training (GAN-UNet). We test the trained models with the different strategies and methods on 400 synthetic data. We found that training to predict multiples, including the primaries …