Refine
Document Type
- Conference Proceeding (10)
Conference Type
- Conference article (10)
Language
- English (10)
Has Fulltext
- no (10)
Is part of the Bibliography
- yes (10)
Keywords
- Deep Learning (3)
- CNN (1)
- Image restoration (1)
- KI-Labor Südbaden (1)
- Künstliche Intelligenz (1)
- Monocular Depth Estimation (1)
- Periodic Table of AI (1)
- Robustness (1)
- Use Case (1)
- curriculum learning (1)
Institute
- IMLA - Institute for Machine Learning and Analytics (10)
Open Access
- Closed (10)
Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality
(2023)
Diffusion models have recently been successfully applied to the visual synthesis of strikingly realistic images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which was originally developed in the context of detecting adversarial examples, for the automatic detection of synthetic images and the identification of the corresponding generator networks. In contrast to many existing detection approaches, which often work only for GAN-generated images, the proposed method provides close-to-perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification. Since the empirical evaluations of recent publications on the detection of generated images often focus mainly on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes. The code for our experiments is provided at https://github.com/deepfake-study/deepfake-multiLID.
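The multiLID features above build on per-sample estimates of local intrinsic dimensionality. As a rough illustration of the underlying quantity, not the paper's exact pipeline, the classic maximum-likelihood LID estimator of a query point from its k nearest neighbours can be sketched as:

```python
import numpy as np

def lid_mle(query, reference, k=5):
    """Maximum-likelihood estimate of Local Intrinsic Dimensionality (LID)
    of a query point with respect to a reference sample.
    Uses the k nearest-neighbour distances r_1 <= ... <= r_k:
    LID = -( (1/k) * sum_i log(r_i / r_k) )^(-1)."""
    # Euclidean distances from the query to every reference point
    dists = np.sort(np.linalg.norm(reference - query, axis=1))
    r = dists[:k]                  # k nearest-neighbour distances
    r_max = r[-1]
    # small epsilon guards against log(0) for duplicate points
    return -1.0 / np.mean(np.log(r / r_max + 1e-12))
```

In a detector such as multiLID, estimates like this are computed across network layers and fed to a lightweight classifier; the feature extraction and classifier above are left out of this sketch.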
Following their success in visual recognition tasks, Vision Transformers (ViTs) are being increasingly employed for image restoration. As a few recent works claim that ViTs for image classification also have better robustness properties, we investigate whether this improved adversarial robustness extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network", which are both simplified versions of Restormer. We use Projected Gradient Descent (PGD) and CosPGD for our robustness evaluation. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that, contrary to what is advocated in works on ViTs for image classification, these models are highly susceptible to adversarial attacks. We attempt to find an easy fix and improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on the other networks are less promising. Interestingly, we find that the design choices in NAFNet and the Baseline network, which were based on i.i.d. performance rather than robust generalization, seem to be at odds with model robustness.
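PGD, one of the attacks used in this evaluation, iteratively ascends the loss while projecting the perturbed input back into an ℓ∞ ball around the original. A minimal numpy sketch, where the gradient function, step size, and budget are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.03, alpha=0.01, steps=10):
    """Projected Gradient Descent (PGD) under an L-infinity budget eps.
    grad_fn(x_adv) must return the loss gradient w.r.t. the input."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv
```

In practice `grad_fn` is the backpropagated gradient of the restoration loss through the network; variants such as CosPGD differ mainly in how the per-pixel loss is weighted.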
In this paper we present the concept of the "KI-Labor Südbaden" to support regional companies in the use of AI technologies. The approach is based on the "Periodic Table of AI" and is extended with new dimensions for both sustainability and the impact of AI on the working environment. It is illustrated on the basis of three real-world use cases: 1. The detection of humans with low-resolution infrared (IR) images for collaborative robotics; 2. The use of machine data from specifically designed vehicles; 3. State-of-the-art Large Language Models (LLMs) applied to internal company documents. We explain the use cases, thereby demonstrating how to apply the Periodic Table of AI to structure AI applications.
This study focuses on the autonomous navigation and mapping of indoor environments using a drone equipped only with a monocular camera and height measurement sensors. A visual SLAM algorithm was employed to generate a preliminary map of the environment and to determine the drone's position within the map. A deep neural network was utilized to generate a depth image from the monocular camera's input, which was subsequently transformed into a point cloud to be projected into the map. By aligning the depth point cloud with the map, 3D occupancy grid maps were constructed by using ray tracing techniques to get a precise depiction of obstacles and the surroundings. Due to the absence of IMU data from the low-cost drone for the SLAM algorithm, the created maps are inherently unscaled. However, preliminary tests with relative navigation in unscaled maps have revealed potential accuracy issues, which can only be overcome by incorporating additional information from the given sensors for scale estimation.
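The depth-to-point-cloud step described above amounts to back-projecting each pixel through the pinhole camera model. A minimal sketch, where the intrinsics fx, fy, cx, cy are hypothetical placeholders for the drone camera's calibration:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3-D point cloud using the
    pinhole camera model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # one (X, Y, Z) row per pixel
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Without a metric scale (e.g. from an IMU), both the depth values and the resulting cloud are only defined up to the unknown scale factor mentioned in the abstract.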
Seismic data processing relies on multiples attenuation to improve inversion and interpretation. Radon-based algorithms are often used for multiples and primaries discrimination. Deep learning, based on convolutional neural networks (CNNs), has shown encouraging applications for demultiple that could mitigate Radon-based challenges. In this work, we investigate new strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. Moreover, we investigate two distinctive training methods for all the strategies: UNet based on minimum absolute error (L1) training, and adversarial training (GAN-UNet). We test the trained models with the different strategies and methods on 400 synthetic data. We found that training to predict multiples, including the primaries …
Seismic data processing involves techniques to deal with undesired effects that occur during acquisition and pre-processing. These effects mainly comprise coherent artefacts such as multiples, non-coherent signals such as electrical noise, and loss of signal information at the receivers that leads to incomplete traces. In this work, we employ a generative solution, since it can explicitly model complex data distributions and hence yield a better decision-making process. In particular, we introduce diffusion models for multiples removal. To that end, we run experiments on synthetic and on real data, and we compare the deep diffusion performance with standard algorithms. We believe that our pioneering study not only demonstrates the capability of diffusion models, but also opens the door to future research to integrate generative models in seismic workflows.
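For context, the forward process of a diffusion model corrupts a clean sample with Gaussian noise in closed form; the model is then trained to invert this corruption. A minimal sketch of the standard DDPM forward-noising step, with an illustrative noise schedule that is an assumption, not the paper's configuration:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Closed-form DDPM forward noising:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I),
    where abar_t is the cumulative product of (1 - beta) up to step t."""
    alphas = 1.0 - betas
    abar_t = np.cumprod(alphas)[t]          # cumulative signal retention
    eps = rng.standard_normal(x0.shape)     # the injected Gaussian noise
    xt = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return xt, eps
```

For a demultiple application, the clean target would be a seismic gather and the reverse (denoising) network, omitted here, is what learns to recover it.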
An important step in seismic data processing to improve inversion and interpretation is multiples attenuation. Radon-based algorithms are often used for discriminating primaries and multiples. Recently, deep learning (DL), based on convolutional neural networks (CNNs), has shown promising results in demultiple that could mitigate the challenges of Radon-based methods. In this work, we investigate new strategies to train a CNN for multiples removal based on different loss functions. We propose combined primaries and multiples labels in the loss for training a CNN to predict primaries, multiples, or both simultaneously. We evaluate the performance of the CNNs trained with the different strategies on 400 clean and noisy synthetic data samples, considering three metrics. We found that training a CNN to predict the multiples and then subtracting them from the input image is the most effective strategy for demultiple. Furthermore, including the primaries labels as a constraint during the training of multiples prediction improves the results. Finally, we test the strategies on a field dataset. The CNNs trained with the different strategies report competitive results on real data compared with Radon demultiple. As a result, effectively trained CNN models can potentially replace Radon-based demultiple in existing workflows.
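A loss combining primaries and multiples labels, as described above, can be sketched as a weighted sum of two L1 terms. The weights and the exact combination are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def combined_l1_loss(pred_primaries, pred_multiples,
                     true_primaries, true_multiples,
                     w_p=1.0, w_m=1.0):
    """Weighted L1 loss over both label types: the network is penalized
    for errors on the primaries and on the multiples simultaneously."""
    l_p = np.mean(np.abs(pred_primaries - true_primaries))
    l_m = np.mean(np.abs(pred_multiples - true_multiples))
    return w_p * l_p + w_m * l_m
```

Under the most effective strategy reported above, the final demultiple result would then be obtained by subtracting the predicted multiples from the input gather.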
In many application areas, Deep Reinforcement Learning (DRL) has led to breakthroughs. In Curriculum Learning, the Machine Learning algorithm is not presented with examples at random, but in a meaningful order of increasing difficulty. This has been used in many application areas to further improve the results of learning systems or to reduce their learning time. Such approaches range from learning plans created manually by domain experts to those created automatically; the automated creation of learning plans is one of the biggest challenges. In this work, we investigate an approach in which a trainer learns in parallel and analogously to the student, automatically creating a learning plan for the student during this Double Deep Reinforcement Learning (DDRL). Three reward functions, Friendly, Adversarial, and Dynamic, based on the learner's reward, are compared. The domain for evaluation is kicking with variable distance, direction, and relative ball position in the SimSpark simulated soccer environment. As a result, Statistic Curriculum Learning (SCL) performs better than a random curriculum with respect to training time and result quality. DDRL reaches a quality comparable to the baseline and outperforms it significantly with shorter training in the distance-direction subdomain, reducing the number of required training cycles by almost 50%.
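The three trainer rewards above can be thought of as different transformations of the student's reward. The following formulas are illustrative assumptions only, chosen to convey the intuition behind each mode, not the definitions used in the paper:

```python
def trainer_reward(student_reward, mode):
    """Candidate trainer rewards derived from the student's reward
    (all three formulas are hypothetical sketches)."""
    if mode == "friendly":
        # trainer is rewarded when the student succeeds
        return student_reward
    if mode == "adversarial":
        # trainer is rewarded when the student struggles
        return -student_reward
    if mode == "dynamic":
        # trainer prefers tasks near the edge of the student's ability,
        # peaking where the student succeeds about half the time
        return 1.0 - abs(student_reward - 0.5) * 2.0
    raise ValueError(f"unknown mode: {mode}")
```

The "dynamic" shape illustrates the common curriculum-learning heuristic of sampling tasks that are neither trivially easy nor hopelessly hard.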
Currently, many theoretical as well as practically relevant questions about the transferability and robustness of Convolutional Neural Networks (CNNs) remain unsolved. While ongoing research efforts are tackling these problems from various angles, in most computer-vision-related cases these approaches can be generalized to investigations of the effects of distribution shifts in image data. In this context, we propose to study the shifts in the learned weights of trained CNN models. Here we focus on the properties of the distributions of the dominantly used 3×3 convolution filter kernels. We collected and publicly provide a dataset with over 1.4 billion filters from hundreds of trained CNNs, using a wide range of datasets, architectures, and vision tasks. In a first use case of the proposed dataset, we show highly relevant properties of many publicly available pre-trained models for practical applications: I) We analyze distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, such as the visual category of the dataset, task, architecture, or layer depth. Based on these results, we conclude that model pre-training can succeed on arbitrary datasets if they meet size and variance conditions. II) We show that many pre-trained models contain degenerated filters which make them less robust and less suitable for fine-tuning on target applications. Data & project website: https://github.com/paulgavrikov/cnn-filter-db.
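A distribution shift between two banks of learned filters can be quantified in many ways; as one simple, hypothetical proxy (not the analysis used above), the total-variation distance between the coefficient histograms of the two filter sets can be sketched as:

```python
import numpy as np

def filter_shift(filters_a, filters_b, bins=20):
    """Total-variation distance between the coefficient histograms of two
    banks of 3x3 filters: 0 means identical histograms, 1 means disjoint.
    Binning and metric choice are illustrative assumptions."""
    a = filters_a.ravel()
    b = filters_b.ravel()
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    # histogram both sets over a shared range, then normalize
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi))
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = ha / ha.sum()
    pb = hb / hb.sum()
    return 0.5 * np.abs(pa - pb).sum()
```

Applied per layer or per meta-parameter axis (dataset category, task, depth), a statistic like this makes "shift vs. no shift" comparisons between filter populations concrete.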
Deep learning models are intrinsically sensitive to distribution shifts in the input data. In particular, small, barely perceivable perturbations of the input data can force models to make wrong predictions with high confidence. A common defense mechanism is regularization through adversarial training, which injects worst-case perturbations back into training to strengthen the decision boundaries and to reduce overfitting. In this context, we perform an investigation of the 3×3 convolution filters that form in adversarially-trained models. Filters are extracted from 71 public models of the ℓ∞-RobustBench CIFAR-10/100 and ImageNet1k leaderboards and compared to filters extracted from models built on the same architectures but trained without robust regularization. We observe that adversarially-robust models appear to form more diverse, less sparse, and more orthogonal convolution filters than their normal counterparts. The largest differences between robust and normal models are found in the deepest layers and in the very first convolution layer, which consistently and predominantly forms filters that can partially eliminate perturbations, irrespective of the architecture.
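The sparsity and orthogonality properties discussed above can be made concrete with simple diagnostics over a filter bank. The thresholds and metrics below are illustrative assumptions, not the paper's exact measures:

```python
import numpy as np

def filter_stats(filters, sparsity_tol=1e-3):
    """Diagnostics for a bank of 3x3 convolution filters:
    - sparsity: fraction of filters whose norm is near zero
    - mean_abs_cos: mean absolute cosine similarity between the remaining
      flattened filters (lower values indicate a more orthogonal bank)."""
    flat = filters.reshape(len(filters), -1).astype(float)
    norms = np.linalg.norm(flat, axis=1)
    sparsity = np.mean(norms < sparsity_tol)
    # normalize the non-degenerate filters and compare pairwise
    keep = norms >= sparsity_tol
    unit = flat[keep] / norms[keep, None]
    cos = unit @ unit.T
    n = len(unit)
    off_diag = cos[~np.eye(n, dtype=bool)]
    mean_abs_cos = float(np.abs(off_diag).mean()) if n > 1 else 0.0
    return sparsity, mean_abs_cos
```

Under this reading, the robust models above would show a lower sparsity fraction and a lower mean absolute cosine similarity than their normally trained counterparts.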