Refine
Document Type
- Conference Proceeding (213)
- Article (reviewed) (128)
- Bachelor Thesis (64)
- Master's Thesis (34)
- Article (unreviewed) (27)
- Patent (23)
- Letter to Editor (16)
- Book (12)
- Contribution to a Periodical (12)
- Doctoral Thesis (12)
Conference Type
- Konferenzartikel (183)
- Konferenz-Abstract (21)
- Sonstiges (5)
- Konferenz-Poster (2)
- Konferenzband (2)
Language
- English (373)
- German (189)
- Other language (1)
- Multiple languages (1)
- Russian (1)
Keywords
- Machine Learning (15)
- RoboCup (13)
- Deep Leaning (11)
- Götz von Berlichingen (11)
- neuroprosthetics (8)
- Deep learning (7)
- Medizintechnik (7)
- Blockchain (6)
- Internet of Things (6)
- Regelungstechnik (6)
Institute
- Fakultät Elektrotechnik, Medizintechnik und Informatik (EMI) (ab 04/2019) (565) (remove)
Open Access
- Open Access (251)
- Closed Access (181)
- Closed (121)
- Bronze (54)
- Diamond (31)
- Gold (29)
- Grün (3)
- Hybrid (3)
Introduction: Subjects with mild to moderate hearing loss today often receive hearing aids (HA) with open-fitting (OF). In OF, direct sound reaches the eardrums with minimal damping. Due to the required processing delay in digital HA, the amplified HA sound follows some milliseconds later. This process occurs in both ears symmetrically in bilateral HA provision and is likely to have no or minor detrimental effect on binaural hearing. However, the delayed and amplified sound are only present in one ear in cases of unilateral hearing loss provided with one HA. This processing alters interaural timing differences in the resulting ear signals.
Methods: In the present study, an experiment with normal-hearing subjects to investigate speech intelligibility in noise with direct and delayed sound was performed to mimic unilateral and bilateral HA provision with OF.
Results: The outcomes reveal that these delays affect speech reception thresholds (SRT) in the unilateral OF simulation when presenting speech and noise from different spatial directions. A significant decrease in the median SRT from –18.1 to –14.7 dB SNR is observed when typical HA processing delays are applied. On the other hand, SRT was independent of the delay between direct and delayed sound in the bilateral OF simulation.
Discussion: The significant effect emphasizes the development of rapid processing algorithms for unilateral HA provision.
Virtual-Reality
(2023)
Die Virtual-Reality (VR) Technologie ermöglicht Unternehmen eine Produktpräsentation, die weit über traditionelle Darstellungsmethoden hinausgeht. Obgleich die Integration der VR-Technologie für Unternehmen viele Chancen eröffnet, ist deren Einsatz auch mit Risiken verbunden. Insbesondere der Mangel an empirisch gesicherten Erkenntnissen zur Kundenakzeptanz, zu den Auswirkungen der Nutzung sowie zu Kannibalisierungseffekten ist ein wesentlicher Grund, der die Verbreitung von VR in der Kundenkommunikation noch hemmt. Das Buch adressiert diese Forschungslücken und identifiziert mittels eines nutzerzentrierten, quantitativen Forschungsdesigns konkrete Chancen und Risiken, die mit dem Einsatz von VR-Produktpräsentationen verbunden sind.
This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.
The use of artificial intelligence continues to impact a broad variety of domains, application areas, and people. However, interpretability, understandability, responsibility, accountability, and fairness of the algorithms' results - all crucial for increasing humans' trust into the systems - are still largely missing. The purpose of this seminar is to understand how these components factor into the holistic view of trust. Further, this seminar seeks to identify design guidelines and best practices for how to build interactive visualization systems to calibrate trust.
With the rising necessity of explainable artificial intelligence (XAI), we see an increase in task-dependent XAI methods on varying abstraction levels. XAI techniques on a global level explain model behavior and on a local level explain sample predictions. We propose a visual analytics workflow to support seamless transitions between global and local explanations, focusing on attributions and counterfactuals on time series classification. In particular, we adapt local XAI techniques (attributions) that are developed for traditional datasets (images, text) to analyze time series classification, a data type that is typically less intelligible to humans. To generate a global overview, we apply local attribution methods to the data, creating explanations for the whole dataset. These explanations are projected onto two dimensions, depicting model behavior trends, strategies, and decision boundaries. To further inspect the model decision-making as well as potential data errors, a what-if analysis facilitates hypothesis generation and verification on both the global and local levels. We constantly collected and incorporated expert user feedback, as well as insights based on their domain knowledge, resulting in a tailored analysis workflow and system that tightly integrates time series transformations into explanations. Lastly, we present three use cases, verifying that our technique enables users to (1)~explore data transformations and feature relevance, (2)~identify model behavior and decision boundaries, as well as, (3)~the reason for misclassifications.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise ($1\times 1$) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
In the past ten years, applications of artificial neural networks have changed dramatically. outperforming earlier predictions in domains like robotics, computer vision, natural language processing, healthcare, and finance. Future research and advancements in CNN architectures, Algorithms and applications are expected to revolutionize various industries and daily life further. Our task is to find current products that resemble the given product image and description. Deep learning-based automatic product identification is a multi-step process that starts with data collection and continues with model training, deployment, and continuous improvement. The caliber and variety of the dataset, the design selected, and ongoing testing and improvement all affect the model's effectiveness. We achieved 81.47% training accuracy and 72.43% validation accuracy for our combined text and image classification model. Additionally, we have discussed the outcomes from the other dataset and numerous methods for creating an appropriate model.
An international study summarizes the threat situation in the OT environment under the heading "Growing security threats" [1]. According to this study, attacks on automation systems are likely to increase in the future. Accordingly, an automation system must be able to protect the integrity of the transmitted information in the future. This requirement is motivated, among other things, by the fact that the network-side isolation of industrial communication systems is no longer considered sufficient as the sole protective measure. This paper uses the example of PROFINET to show how the future requirements for a real-time communication protocol can be met and how they can be derived from the IEC 62443 standard.
Analysing and predicting the advance rate of a tunnel boring machine (TBM) in hard rock is integral to tunnelling project planning and execution. It has been applied in the industry for several decades with varying success. Most prediction models are based on or designed for large-diameter TBMs, and much research has been conducted on related tunnelling projects. However, only a few models incorporate information from projects with an outer diameter smaller than 5 m and no penetration prediction model for pipe jacking machines exists to date. In contrast to large TBMs, small-diameter TBMs and their projects have been considered little in research. In general, they are characterised by distinctive features, including insufficient geotechnical information, sometimes rather short drive lengths, special machine designs and partially concurring lining methods like pipe jacking and segment lining. A database which covers most of the parameters mentioned above has been compiled to investigate the performance of small-diameter TBMs in hard rock. In order to provide sufficient geological and technical variance, this database contains 37 projects with 70 geotechnically homogeneous areas. Besides the technical parameters, important geotechnical data like lithological information, unconfined compressive strength, tensile strength and point load index is included and evaluated. The analysis shows that segment lining TBMs have considerably higher penetration rates in similar geological and technical settings mostly due to their design parameters. Different methodologies for predicting TBM penetration, including state-of-the-art models from the literature as well as newly derived regression and machine learning models, are discussed and deployed for backward modelling of the projects contained in the database. New ranges of application for small-diameter tunnelling in several industry-standard penetration models are presented, and new approaches for the penetration prediction of pipe jacking machines in hard rock are proposed.
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient 1×1 convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning 3×3 convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
We have developed a methodology for the systematic generation of a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This is the basis for a substantial approach to automate, for the first time, the identification of hardwood species in microscopic images of fibrous materials by deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly well to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
(2023)
Convolutional neural networks encode images through a sequence of convolutions, normalizations and non-linearities as well as downsampling operations into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack in robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, short ASAP. While only introducing a few modifications to FLC pooling, networks using ASAP as downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high and low resolution data while maintaining similar clean accuracy or even outperforming the baseline.
Motivated by the recent trend towards the usage of larger receptive fields for more context-aware neural networks in vision applications, we aim to investigate how large these receptive fields really need to be. To facilitate such study, several challenges need to be addressed, most importantly: (i) We need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To facilitate such models, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state-of-the-art on large image classification benchmarks while executing convolutions solely in the frequency domain and can be employed within any CNN architecture. They allow us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters practically translate into well-localized and relatively small convolution kernels in the spatial domain.
Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack.
Project website: https://github.com/paulgavrikov/adversarial_solarization
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM-problems, most currently available EM-algorithms solely rely on (textual) meta data. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following on a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which can not sufficiently be solved using standard image based classification and retrieval algorithms. Instead, novel approaches which allow to transfer example based visual equivalent classes to new data are needed to address the proposed problem. The aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are provided under https://www.retail-786k.org/.
Detecting Images Generated by Deep Diffusion Models using their Local Intrinsic Dimensionality
(2023)
Diffusion models recently have been successfully applied for the visual synthesis of strikingly realistic appearing images. This raises strong concerns about their potential for malicious purposes. In this paper, we propose using the lightweight multi Local Intrinsic Dimensionality (multiLID), which has been originally developed in context of the detection of adversarial examples, for the automatic detection of synthetic images and the identification of the according generator networks. In contrast to many existing detection approaches, which often only work for GAN-generated images, the proposed method provides close to perfect detection results in many realistic use cases. Extensive experiments on known and newly created datasets demonstrate that the proposed multiLID approach exhibits superiority in diffusion detection and model identification.Since the empirical evaluations of recent publications on the detection of generated images are often mainly focused on the "LSUN-Bedroom" dataset, we further establish a comprehensive benchmark for the detection of diffusion-generated images, including samples from several diffusion models with different image sizes.The code for our experiments is provided at https://github.com/deepfake-study/deepfake-multiLID.
Die Erfindung betrifft in einem ersten Aspekt eine Vorrichtung zur transkutanen Aufbringung eines elektrischen Stimulationsreizes auf ein Ohr. Die Vorrichtung umfasst einen Schaltungsträger, mindestens zwei Elektroden sowie eine Steuerungseinheit, wobei die Steuerungseinheit dazu konfiguriert ist, anhand von Stimulationsparametern ein elektrisches Stimulationssignal an den Elektroden zu erzeugen. Dabei ist die Vorrichtung, insbesondere eine Oberfläche des Schaltungsträgers der Vorrichtung, auf eine anatomische Form eines Ohres angepasst, sodass Elektroden auf der Oberfläche des Schaltungsträgers aufgebracht sind und ausgewählte Bereiche des Ohres kontaktieren Die Vorrichtung ist dadurch kennzeichnet, dass diese weiterhin einen Sensor zur Erkennung mindestens eines physiologischen Parameter umfasst und eine Steuerungseinheit dazu konfiguriert ist, anhand des mindestens einen physiologischen Parameters die Stimulationsparameter für den Stimulationsreiz anzupassen.In einem weiteren Aspekt betrifft die Erfindung ein Verfahren zur Herstellung der erfindungsgemäßen Vorrichtung.
Die Erfindung betrifft ein Verfahren zum Maximieren der von einer analogen Entropiequelle abgeleiteten Entropie, wobei das Verfahren folgende Schritte aufweist:- Bereitstellen von Eingabedaten für die analoge Entropiequelle (2);- Erzeugen von Rückgabewerten durch die analoge Entropiequelle basierend auf den Eingabedaten (3); und- Gruppieren der Rückgabewerte, wobei das Gruppieren der Rückgabewerte ein Anwenden von Versätzen auf Rückgabewerte aufweist (4).