With the expansion of IoT devices into many aspects of our lives, the security of such systems has become an important challenge. Unlike conventional computer systems, any IoT security solution must consider the constraints of these systems, such as limited computational capability, memory, connectivity, and power consumption. Physical Unclonable Functions (PUFs), with their special characteristics, were introduced to satisfy these security needs while respecting the mentioned constraints. They exploit uncontrollable but reproducible variations of an underlying component for security applications such as identification, authentication, and communication security. Since IoT devices are typically low cost, it is important to reuse existing elements of their hardware (for instance sensors or ADCs) instead of adding extra cost for dedicated PUF hardware. Micro-electromechanical system (MEMS) devices are widely used in IoT systems as sensors and actuators. In this thesis, a comprehensive study of the potential application of MEMS devices as PUF primitives is provided. A MEMS PUF leverages the uncontrollable variations in the parameters of MEMS elements to derive secure keys for cryptographic applications. Experimental and simulation results show that the proposed MEMS PUFs can generate enough entropy for complex key generation, while their responses show low fluctuation under different environmental conditions.
Keeping in mind that PUF responses are prone to change in the presence of noise and environmental variations, it is critical to derive reliable keys from the PUF while exploiting the maximum available entropy. In the second part of this thesis, we elaborate on different key generation schemes and their advantages and drawbacks. We propose the PUF output positioning (POP) and integer linear programming (ILP) methods, two novel methods for grouping the PUF outputs in order to maximize the extracted entropy. To implement these methods, the key enrollment and key generation algorithms are presented. The proposed methods are then evaluated on the responses of the MEMS PUF, where it is shown in practice that they outperform other existing PUF key generation methods.
The final part of this thesis is dedicated to the application of the MEMS PUF as a security solution for IoT systems. We select the mutual authentication of IoT devices and their backend system, and propose two lightweight authentication protocols based on MEMS PUFs. The presented protocols undergo a comprehensive security analysis to show their suitability for IoT systems. As a result, the output of this thesis is a lightweight security solution based on MEMS PUFs which introduces very low hardware cost overhead.
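The entropy and stability claims above are commonly quantified with two standard PUF metrics based on fractional Hamming distances: uniqueness (mean inter-device distance, ideally 0.5) and reliability (one minus the mean intra-device distance under noise, ideally 1.0). A minimal sketch of these standard metrics in Python (illustrative helper names; not the evaluation code used in the thesis):

```python
import numpy as np

def hamming_frac(a, b):
    """Fractional Hamming distance between two binary response vectors."""
    return float(np.mean(a != b))

def reliability(reference, noisy_responses):
    """1 - mean intra-device distance under noise (ideal: 1.0)."""
    d = [hamming_frac(reference, r) for r in noisy_responses]
    return 1.0 - float(np.mean(d))

def uniqueness(responses):
    """Mean pairwise inter-device distance (ideal: 0.5)."""
    n = len(responses)
    d = [hamming_frac(responses[i], responses[j])
         for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(d))
```

For example, a device whose repeated noisy readouts equal its reference response has reliability 1.0, while two devices with complementary responses have (excessive) uniqueness 1.0.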
Ultra-low-power passive telemetry systems for industrial and biomedical applications have gained much popularity lately. Reducing the power consumption and size of the circuits poses critical challenges in ultra-low-power circuit design. Biotelemetry applications such as leakage detection in silicone breast implants require small, low-power electronics. In this doctoral thesis, the design, simulation, and measurement of a programmable mixed-signal System-on-Chip (SoC) called General Application Passive Sensor Integrated Circuit (GAPSIC) is presented. Owing to its low power consumption, GAPSIC is capable of completely passive operation. Such a batteryless passive system has lower maintenance complexity and is also free from battery-related health hazards. With a die area of 4.92 mm² and a maximum analog power consumption of 592 µW, GAPSIC has one of the best figures of merit compared to similar state-of-the-art SoCs. Regarding possible applications, GAPSIC can read out and digitally transmit the signals of resistive sensors for pressure or temperature measurements. Additionally, GAPSIC can measure electrocardiogram (ECG) signals and conductivity.
The design of GAPSIC complies with the International Organization for Standardization (ISO) 15693 / NFC (near field communication) Type 5 standard for radio frequency identification (RFID), corresponding to a carrier frequency of 13.56 MHz. A passive transponder developed with GAPSIC comprises an external memory and very few other external components, such as an antenna and sensors. The passive tag antenna and the reader antenna use inductive coupling for communication and energy transfer, which enables passive operation. A passive tag developed with GAPSIC can communicate with an NFC-compatible smart device or an ISO 15693 RFID reader. The external memory contains the programmable application-specific firmware.
As a mixed-signal SoC, GAPSIC includes both analog and digital circuitry. The analog block of GAPSIC includes a power management unit, an RFID/NFC communication unit, and a sensor readout unit. The digital block includes an integrated 32-bit microcontroller, developed by the Hochschule Offenburg ASIC design center, and digital peripherals. A 16-kilobyte random-access memory and a 16-kilobyte read-only memory constitute the GAPSIC internal memory. GAPSIC is fabricated in a one-poly, six-metal 0.18 µm CMOS process.
The design of GAPSIC comprises two stages. In the first stage, a standalone RFID/NFC frontend chip with a power management unit, an RFID/NFC communication unit, a clock regenerator unit, and a field detector unit was designed. In the second stage, the remaining functional blocks were integrated with the blocks of the RFID/NFC frontend chip to complete GAPSIC. To reduce the power consumption, conventional low-power design techniques such as multiple power supplies and the operation of complementary metal-oxide-semiconductor (CMOS) transistors in the sub-threshold region were applied extensively, complemented by further innovative circuit designs.
An overvoltage protection circuit, a power rectifier, a bandgap reference circuit, and two low-dropout (LDO) voltage regulators constitute the power management unit of GAPSIC. The overvoltage protection circuit uses a novel method where three stacked transistor pairs shunt the extra voltage. In the power rectifier, four rectifier units are arranged in parallel, which is a unique approach. The four parallel rectifier units provide the optimal choice in terms of voltage drop and the area required.
The communication unit is responsible for RFID/NFC communication and incorporates demodulation and load modulation circuitry. The demodulator circuit comprises an envelope detector, a high-pass filter, and a comparator. Following a new approach, the bandgap reference circuit itself acts as the load for the envelope detector circuit, which minimizes circuit complexity and area. For the communication between the reader and the RFID/NFC tag, amplitude-shift keying (ASK) is used to modulate the signals, where the modulation index can be as low as 10%. A novel technique involving a comparator with a preset offset voltage effectively demodulates the ASK signal. With an effective die area of 0.7 mm² and a power consumption of 107 µW, the standalone RFID/NFC frontend chip has the best figure of merit compared to the state-of-the-art frontend chips reported in the relevant literature. A passive RFID/NFC tag developed with the standalone frontend chip together with temperature and pressure sensors demonstrates the fully passive operational capability of the frontend chip. An NFC reader device using custom-built Android-based application software reads out the sensor data from the passive tag.
The sensor readout circuit consists of a channel selector with two differential and four single-ended inputs, followed by a programmable-gain instrumentation amplifier. The entire sensor readout part remains deactivated when not in use. The internal memory stores the measured offset voltage of the instrumentation amplifier, and firmware removes this offset from the measured sensor signal. A 12-bit successive approximation register (SAR) analog-to-digital converter (ADC) based on a charge-redistribution architecture converts the measured sensor signal to a digital value. The digital peripherals include a serial peripheral interface, four timers, RFID/NFC interfaces, sensor readout unit interfaces, and the 12-bit SAR logic.
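The successive-approximation principle behind such an ADC is a bitwise binary search: starting from the most significant bit, each bit is tentatively set and kept only if the internal DAC output does not exceed the input voltage. A behavioural sketch in Python (an idealized model of the generic SAR algorithm, not the charge-redistribution circuit itself):

```python
def sar_adc(v_in, v_ref=1.0, bits=12):
    """Idealized SAR conversion: binary search from MSB to LSB."""
    code = 0
    for i in reversed(range(bits)):
        trial = code | (1 << i)              # tentatively set bit i
        v_dac = trial * v_ref / (1 << bits)  # ideal DAC output for trial code
        if v_in >= v_dac:                    # keep the bit if input is above
            code = trial
    return code
```

For example, a mid-scale input of 0.5 V with a 1 V reference yields the 12-bit code 2048 after exactly twelve comparison steps, one per bit.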
Two sets of studies with custom-made NFC tag antennas for biomedical applications were conducted to ascertain their compatibility with GAPSIC. The first study involved link efficiency measurements of NFC tag antennas and an NFC reader antenna with porcine tissue. In a separate experiment, the effect of a ferrite core compared to an air core on the antenna coupling factor was investigated. With the ferrite core, the coupling factor increased by a factor of four.
Among the state-of-the-art SoCs published in recent scientific articles, GAPSIC is the only passive programmable SoC with a power management unit, an RFID/NFC communication interface, a sensor readout circuit, a 12-bit SAR ADC, and an integrated 32-bit microcontroller. This doctoral research includes the preliminary study of three passive RFID tags designed with discrete components for biomedical and industrial applications like measurements of temperature, pH, conductivity, and oxygen concentration, along with leakage detection in silicone breast implants. Besides its small size and low power consumption, GAPSIC is suitable for each of the biomedical and industrial applications mentioned above due to the integrated high-performance microcontroller, the robust programmable instrumentation amplifier, and the 12-bit analog-to-digital converter. Furthermore, the simulation and measurement data show that GAPSIC is well suited for the design of a passive tag to monitor arterial blood pressure in patients experiencing Peripheral Artery Disease (PAD), which is proposed in this doctoral thesis as an exemplary application of the developed system.
Team description papers of magmaOffenburg are incremental in the sense that each year we address a different aspect of our team and the tools around it. In this year's team description paper we focus on the architecture of the software. It is a main factor in keeping the code maintainable even after 15 years of development. We also describe how we make sure that the code follows this architecture.
This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.
A circuit arrangement of a motor vehicle includes a high-voltage battery for storing electrical energy, an electric machine for driving the motor vehicle, a converter via which high-voltage direct current voltage provided by the high-voltage battery is convertible into high-voltage alternating current voltage for operating the electric machine, and a charging connection for providing electrical energy for charging the high-voltage battery. The converter is a three-stage converter having a first switch unit which is assigned to a first phase of the electric machine. The first switch unit has two switch groups connected in series which each have two insulated-gate bipolar transistors (IGBTs) connected in series, where a connection is disposed between the IGBTs of one of the two switch groups, which connection is electrically connected directly to a line of the charging connection.
Printed electrolyte-gated oxide electronics is an emerging electronic technology in the low voltage regime (≤1 V). Whereas in the past mainly dielectrics have been used for gating the transistors, many recent approaches employ the advantages of solution-processable, solid polymer electrolytes or ion gels that provide high gate capacitances produced by a Helmholtz double layer, allowing for low-voltage operation. Herein, with special focus on work performed at KIT, recent advances in building electronic circuits based on indium oxide n-type electrolyte-gated field-effect transistors (EGFETs) are reviewed. When integrated into ring oscillator circuits, a digital performance ranging from 250 Hz at 1 V up to 1 kHz is achieved. Sequential circuits such as memory cells are also demonstrated. More complex circuits are feasible but remain challenging, partly because of the high variability of the printed devices. However, this device-inherent variability can even be exploited in security circuits such as physically unclonable functions (PUFs), which output a reliable and unique, device-specific digital response signal. As an overall advantage of the technology, all the presented circuits can operate at very low supply voltages (0.6 V), which is crucial for low-power printed electronics applications.
Due to its performance, the field of deep learning has gained a lot of attention, with neural networks succeeding in areas like Computer Vision (CV), Natural Language Processing (NLP), and Reinforcement Learning (RL). However, high accuracy comes at a computational cost, as larger networks require longer training time and no longer fit onto a single GPU. To reduce training costs, researchers are looking into the dynamics of different optimizers in order to find ways to make training more efficient. Resource requirements can be limited by reducing model size during training or designing more efficient models that improve accuracy without increasing network size.
This thesis combines eigenvalue computation and high-dimensional loss surface visualization to study different optimizers and deep neural network models. Eigenvectors of different eigenvalues are computed, and the loss landscape and optimizer trajectory are projected onto the plane spanned by those eigenvectors. A new parallelization method for the stochastic Lanczos method is introduced, resulting in faster computation and thus enabling high-resolution videos of the trajectory and second-order information during neural network training. Additionally, the thesis presents the loss landscape between two minima along with the eigenvalue density spectrum at intermediate points for the first time.
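The projection step itself is plain linear algebra: once two Hessian eigenvectors are available (in the thesis via the stochastic Lanczos method; here, for illustration, via a dense eigendecomposition of a toy Hessian), trajectory points are mapped into the plane they span. A hedged sketch in Python:

```python
import numpy as np

# toy 3-parameter quadratic loss with a known diagonal Hessian
H = np.diag([10.0, 1.0, 0.1])
evals, evecs = np.linalg.eigh(H)   # eigenvalues in ascending order

# plane spanned by the eigenvectors of the two largest eigenvalues
plane = evecs[:, [-1, -2]]         # shape (3, 2), columns are unit vectors

# hypothetical optimizer trajectory: one row per parameter-space point
traj = np.array([[1.0, 2.0, 3.0],
                 [0.5, 1.0, 1.5]])

coords = traj @ plane              # 2-D coordinates for plotting
```

For real networks the trajectory and eigenvectors live in millions of dimensions, but the final projection is exactly this matrix product.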
Secondly, this thesis presents a regularization method for Generative Adversarial Networks (GANs) that uses second-order information. The gradient during training is modified by subtracting the direction of the eigenvector belonging to the largest eigenvalue, preventing the network from falling into the steepest minima and avoiding mode collapse. The thesis also shows the full eigenvalue density spectra of GANs during training.
Thirdly, this thesis introduces ProxSGD, a proximal algorithm for neural network training that guarantees convergence to a stationary point and unifies multiple popular optimizers. Proximal gradients are used to find a closed-form solution to the problem of training neural networks with smooth and non-smooth regularizations, resulting in better sparsity and more efficient optimization. Experiments show that ProxSGD can find sparser networks while reaching the same accuracy as popular optimizers.
Lastly, this thesis unifies sparsity and neural architecture search (NAS) through the framework of group sparsity. Group sparsity is achieved through ℓ2,1-regularization during training, allowing for filter and operation pruning to reduce model size with minimal sacrifice in accuracy. By grouping multiple operations together, group sparsity can be used for NAS as well. This approach is shown to be more robust while still achieving accuracies competitive with state-of-the-art methods.
In this paper, we propose a unified approach for network pruning and one-shot neural architecture search (NAS) via group sparsity. We first show that group sparsity via the recent Proximal Stochastic Gradient Descent (ProxSGD) algorithm achieves new state-of-the-art results for filter pruning. Then, we extend this approach to operation pruning, directly yielding a gradient-based NAS method based on group sparsity. Compared to existing gradient-based algorithms such as DARTS, the advantages of this new group sparsity approach are threefold. Firstly, instead of a costly bilevel optimization problem, we formulate the NAS problem as a single-level optimization problem, which can be optimally and efficiently solved using ProxSGD with convergence guarantees. Secondly, due to the operation-level sparsity, discretizing the network architecture by pruning less important operations can be safely done without any performance degradation. Thirdly, the proposed approach finds architectures that are both stable and well-performing on a variety of search spaces and datasets.
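A key ingredient of such group-sparse training is the closed-form proximal operator of the ℓ2,1 (group-lasso) norm, which shrinks each group of weights toward zero and prunes the group entirely when its norm falls below the threshold. A minimal sketch of this standard operator (illustrative only; the full ProxSGD update in the paper combines it with momentum and step-size schedules):

```python
import numpy as np

def prox_group_l21(w_group, thresh):
    """Block soft-thresholding: scale the group down, or zero it out."""
    norm = np.linalg.norm(w_group)
    if norm <= thresh:
        return np.zeros_like(w_group)      # group (filter/operation) pruned
    return w_group * (1.0 - thresh / norm)  # group uniformly shrunk
```

For example, a filter with weights [3, 4] (norm 5) and threshold 1 is shrunk to [2.4, 3.2], whereas a threshold of 6 removes it entirely; applying this per filter or per candidate operation yields pruning and operation selection, respectively.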
We demonstrate how to exploit group sparsity in order to bridge the areas of network pruning and neural architecture search (NAS). This results in a new one-shot NAS optimizer that casts the problem as a single-level optimization problem and does not suffer any performance degradation from discretizing the architecture.
The use of artificial intelligence continues to impact a broad variety of domains, application areas, and people. However, interpretability, understandability, responsibility, accountability, and fairness of the algorithms' results - all crucial for increasing humans' trust in these systems - are still largely missing. The purpose of this seminar is to understand how these components factor into a holistic view of trust. Further, this seminar seeks to identify design guidelines and best practices for how to build interactive visualization systems to calibrate trust.
Dissertation D. Dongol
Sweaty has already participated several times in RoboCup soccer competitions (Adult Size). Now the work is focused on stabilizing the gait. Moreover, we would like to overcome the constraints of a ZMP algorithm that requires a horizontal footplate as a precondition for the simplification of the equations. In addition, we would like to switch between impedance and position control with a fuzzy-like algorithm that might help to minimize jerks when Sweaty's feet touch the ground.
Sweaty has already participated four times in RoboCup soccer competitions (Adult Size) and came second three times. While in 2016 Sweaty needed a lot of luck to reach the final, in 2017 it was a serious adversary already in the preliminary rounds. In 2018 Sweaty reached the final with some lack of experience and room for improvement, but not without a chance. This paper describes the intended improvements of the humanoid adult size robot Sweaty in order to qualify for the RoboCup 2019 adult size competition.
Generative adversarial networks (GANs) provide state-of-the-art results in image generation. However, despite being so powerful, they still remain very challenging to train. This is in particular caused by their highly non-convex optimization space leading to a number of instabilities. Among them, mode collapse stands out as one of the most daunting ones. This undesirable event occurs when the model can only fit a few modes of the data distribution, while ignoring the majority of them. In this work, we combat mode collapse using second-order gradient information. To do so, we analyse the loss surface through its Hessian eigenvalues, and show that mode collapse is related to the convergence towards sharp minima. In particular, we observe how the eigenvalues of the generator are directly correlated with the occurrence of mode collapse. Finally, motivated by these findings, we design a new optimization algorithm called nudged-Adam (NuGAN) that uses spectral information to overcome mode collapse, leading to empirically more stable convergence properties.
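The nudging idea can be illustrated on a toy quadratic: the top Hessian eigenvector is estimated with power iteration on Hessian-vector products, and its component is removed from the gradient, steering the update away from the sharpest direction. A hedged numerical sketch (a toy illustration of the principle, not the actual NuGAN update rule):

```python
import numpy as np

# toy Hessian of a quadratic loss; the sharpest direction is the x-axis
H = np.array([[3.0, 0.0],
              [0.0, 1.0]])

def hvp(v):
    """Hessian-vector product (for a real network: via double backprop)."""
    return H @ v

# power iteration to estimate the eigenvector of the largest eigenvalue
v = np.array([1.0, 1.0])
for _ in range(50):
    v = hvp(v)
    v /= np.linalg.norm(v)

# nudge: remove the sharpest-direction component from a gradient
g = np.array([2.0, 2.0])
g_nudged = g - (g @ v) * v
```

After convergence v points along the x-axis, so the nudged gradient keeps only the flat-direction component, biasing the optimizer away from the sharp minimum.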
Generative adversarial networks are the state-of-the-art approach towards learned synthetic image generation. Although early successes were mostly unsupervised, bit by bit this trend has been superseded by approaches based on labelled data. These supervised methods allow a much finer-grained control of the output image, offering more flexibility and stability. Nevertheless, the main drawback of such models is the necessity of annotated data. In this work, we introduce a novel framework that benefits from two popular learning techniques, adversarial training and representation learning, and takes a step towards unsupervised conditional GANs. In particular, our approach exploits the structure of a latent space (learned by the representation learning) and employs it to condition the generative model. In this way, we break the traditional dependency between condition and label, substituting the latter by unsupervised features coming from the latent space. Finally, we show that this new technique is able to produce samples on demand while keeping the quality of its supervised counterpart.
Facial image manipulation is a generation task where the output face is shifted towards an intended target direction in terms of facial attributes and styles. Recent works have achieved great success in various editing techniques such as style transfer and attribute translation. However, current approaches either focus on pure style transfer, or on the translation of predefined sets of attributes with restricted interactivity. To address this issue, we propose FacialGAN, a novel framework enabling simultaneous rich style transfers and interactive facial attribute manipulation. While preserving the identity of a source image, we transfer the diverse styles of a target image to the source image. We then incorporate the geometry information of a segmentation mask to provide fine-grained manipulation of facial attributes. Finally, a multi-objective learning strategy is introduced to optimize the loss of each specific task. Experiments on the CelebA-HQ dataset, with CelebAMask-HQ as semantic mask labels, show our model's capacity to produce visually compelling results in style transfer, attribute manipulation, diversity, and face verification. For reproducibility, we provide an interactive open-source tool to perform facial manipulations, and the PyTorch implementation of the model.
A fundamental and still largely unsolved question in the context of Generative Adversarial Networks is whether they are truly able to capture the real data distribution and, consequently, to sample from it. In particular, the multidimensional nature of image distributions leads to a complex evaluation of the diversity of GAN distributions. Existing approaches provide only a partial understanding of this issue, leaving the question unanswered. In this work, we introduce a loop-training scheme for the systematic investigation of observable shifts between the distributions of real training data and GAN-generated data. Additionally, we introduce several bounded measures for distribution shifts, which are both easy to compute and to interpret. Overall, the combination of these methods allows an explorative investigation of innate limitations of current GAN algorithms. Our experiments on different datasets and multiple state-of-the-art GAN architectures reveal large shifts between input and output distributions, indicating that existing theoretical guarantees towards the convergence of output distributions appear not to hold in practice.
Generative convolutional deep neural networks, e.g. popular GAN architectures, rely on convolution-based up-sampling methods to produce non-scalar outputs like images or video sequences. In this paper, we show that common up-sampling methods, known as up-convolution or transposed convolution, cause the inability of such models to reproduce spectral distributions of natural training data correctly. This effect is independent of the underlying architecture, and we show that it can be used to easily detect generated data like deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose to add a novel spectral regularization term to the training optimization objective. We show that this approach not only allows training spectrally consistent GANs that avoid high-frequency errors, but also that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.
Deep generative models have recently achieved impressive results for many real-world applications, successfully generating high-resolution and diverse samples from complex datasets. Due to this improvement, fake digital content has proliferated, raising concern and spreading distrust in image content, and leading to an urgent need for automated ways to detect these AI-generated fake images.
Despite the fact that many face editing algorithms seem to produce realistic human faces, upon closer examination they do exhibit artifacts in certain domains which are often hidden to the naked eye. In this work, we present a simple way to detect such fake face images - so-called DeepFakes. Our method is based on a classical frequency-domain analysis followed by a basic classifier. Compared to previous systems, which need to be fed with large amounts of labeled data, our approach showed very good results using only a few annotated training samples and even achieved good accuracies in fully unsupervised scenarios. For the evaluation on high-resolution face images, we combined several public datasets of real and fake faces into a new benchmark: Faces-HQ. Given such high-resolution images, our approach reaches a perfect classification accuracy of 100% when it is trained on as little as 20 annotated samples. In a second experiment, on the medium-resolution images of the CelebA dataset, our method achieves 100% accuracy supervised and 96% in an unsupervised setting. Finally, evaluating the low-resolution video sequences of the FaceForensics++ dataset, our method achieves 91% accuracy in detecting manipulated videos.
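The frequency-domain analysis behind this kind of detector is often implemented as an azimuthally averaged power spectrum: the 2-D FFT power is averaged over rings of equal radius, yielding a compact 1-D feature vector for the classifier. A sketch in Python (illustrative of the general technique; the paper's exact pipeline may differ):

```python
import numpy as np

def radial_power_spectrum(img):
    """Azimuthal average of the 2-D FFT power spectrum of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))     # move DC component to the center
    psd = np.abs(f) ** 2
    cy, cx = psd.shape[0] // 2, psd.shape[1] // 2
    y, x = np.indices(psd.shape)
    r = np.hypot(y - cy, x - cx).astype(int)  # integer radius of each pixel
    sums = np.bincount(r.ravel(), weights=psd.ravel())
    counts = np.bincount(r.ravel())
    return sums / counts                      # mean power per frequency ring
```

For a constant image all power sits in the radius-0 (DC) bin; real vs. generated faces differ mainly in the high-radius tail of this curve, which is what the downstream classifier inspects.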
The term attribute transfer refers to the task of altering images in such a way that the semantic interpretation of a given input image is shifted towards an intended direction, which is quantified by semantic attributes. Prominent example applications are photo-realistic changes of facial features and expressions, like changing the hair color, adding a smile, enlarging the nose, or altering the entire context of a scene, like transforming a summer landscape into a winter panorama. Recent advances in attribute transfer are mostly based on generative deep neural networks, using various techniques to manipulate images in the latent space of the generator.
In this paper, we present a novel method for the common sub-task of local attribute transfer, where only parts of a face have to be altered in order to achieve semantic changes (e.g. removing a mustache). In contrast to previous methods, where such local changes have been implemented by generating new (global) images, we propose to formulate local attribute transfer as an inpainting problem. By removing and regenerating only parts of images, our Attribute Transfer Inpainting Generative Adversarial Network (ATI-GAN) is able to utilize local context information to focus on the attributes while keeping the background unmodified, producing visually sound results.
Recent studies have shown remarkable success in image-to-image translation for attribute transfer applications. However, most existing approaches are based on deep learning and require an abundant amount of labeled data to produce good results, therefore limiting their applicability. In the same vein, recent advances in meta-learning have led to successful implementations with limited available data, allowing so-called few-shot learning.
In this paper, we address this limitation of supervised methods by proposing a novel approach based on GANs. These are trained in a meta-training manner, which allows them to perform image-to-image translations using just a few labeled samples from a new target class. This work empirically demonstrates the potential of training a GAN for few-shot image-to-image translation on hair color attribute synthesis tasks, opening the door to further research on generative transfer learning.
In this preliminary report, we present a simple but very effective technique to stabilize the training of CNN-based GANs. Motivated by recently published methods using frequency decomposition of convolutions (e.g. Octave Convolutions), we propose a novel convolution scheme to stabilize the training and reduce the likelihood of a mode collapse. The basic idea of our approach is to split convolutional filters into additive high- and low-frequency parts, while shifting weight updates from low to high during the training. Intuitively, this method forces GANs to learn low-frequency coarse image structures before descending into fine (high-frequency) details. Our approach is orthogonal and complementary to existing stabilization methods and can simply be plugged into any CNN-based GAN architecture. First experiments on the CelebA dataset show the effectiveness of the proposed method.
Most eCommerce applications, such as web shops, have millions of products. In this context, the identification of similar products is a common sub-task, which can be utilized in the implementation of recommendation systems, product search engines, and internal supply logistics. By providing this data set, we aim to boost the evaluation of machine learning methods for predicting the category of retail products from tuples of images and descriptions.
Spatially Distributed Wireless Networks (SDWN) are one of the basic technologies for Internet of Things (IoT) and Industrial Internet of Things (IIoT) applications. For many of these applications, SDWNs must meet strict requirements such as low cost, simple installation and operation, and high flexibility and mobility. Among the different Narrowband Wireless Wide Area Networking (NBWWAN) technologies introduced to address this category of wireless networking requirements, Narrowband Internet of Things (NB-IoT) is gaining traction due to its attractive system parameters, its energy-saving mode of operation with low data rates and bandwidth, and its applicability in 5G use cases. Since several technologies are available and the underlying use cases come with various requirements, a systematic comparative analysis of competing technologies is essential to choose the right one. It is also important to perform testing during the different phases of the system development life cycle. This paper describes a systematic test environment for automated testing of radio communication and systematic measurements of NB-IoT performance.
The manufacturing of conventional electronics has become a highly complicated process that requires intensive investment. In this context, printed electronics keeps attracting attention from both academia and industry. The primary reason is the simplification of the manufacturing process via additive printing technologies such as ink-jet printing. Consequently, advantages such as on-demand fabrication, minimal material waste, and a versatile choice of substrate materials are realized. Central to the development of printed electronic circuits are printed transistors. Recently, metal oxide semiconductors such as indium oxide have become promising materials for the fabrication of printed transistors due to their high charge mobility. Furthermore, electrolyte gating provides benefits such as low-voltage operation in the sub-1 V regime due to the large gate capacitance provided by electrical double layers. This opens new possibilities to fabricate printed devices and circuits for niche applications.
To facilitate the design and fabrication of printed circuits, the development of compact models is necessary. However, most current works have focused on the static behavior of transistors, while an in-depth understanding of other characteristics such as the dynamic or noise behavior is missing. The purpose of this work is therefore a comprehensive study of the capacitance and noise properties of inkjet-printed electrolyte-gated thin-film transistors (EGTs) based on indium oxide semiconductors. Proper modeling approaches are also proposed to accurately capture the electrical behavior, which can be further utilized to enable advanced analysis of digital, analog, and mixed-signal circuits.
In this work, the capacitance of EGTs is characterized using voltage-dependent impedance spectroscopy. Intrinsic and extrinsic effects are carefully separated by using de-embedding test structures. A dedicated equivalent circuit model is established to offer accurate simulations of the measured frequency response of the gate impedance. Based on this, it is revealed that top-gated EGTs have the potential to reach operation frequencies in the kHz regime with proper optimization of materials and printing process. Furthermore, a Meyer-like model is proposed to accurately capture the capacitance-voltage characteristics of the lumped terminal capacitances. Both parasitic and non-quasi-static effects are considered. This further enables the AC and transient analysis of complex circuits in circuit simulators.
Subsequently, the noise properties of printed electronics are studied. The low-frequency noise of EGTs is characterized using a reliable experimental setup. By examining the measured noise spectra of the drain current at various gate voltages, number fluctuation with correlated mobility fluctuation is determined to be the primary noise mechanism. Based on this, the normalized flat-band voltage noise can be used as the key performance metric; at only 1.08 × 10⁻⁷ V²µm², it is significantly lower than in other thin-film technologies based on dielectric gating and semiconductors such as IZO and IGZO. A plausible reason is the large gate capacitance offered by the electrical double layers. This renders EGT technology useful for low-noise and sensitive applications such as sensor periphery circuits.
Last but not least, various circuit designs based on EGT technology are proposed, including basic digital circuits such as inverters and ring oscillators. Their performance metrics, such as propagation delay and power consumption, are extensively characterized. Also, the first design of a printed full-wave rectifier is presented, using diode-connected EGTs that feature a near-zero threshold voltage. As a consequence, the presented rectifier can effectively process an input voltage with a small amplitude of 100 mV and achieves a cut-off frequency of 300 Hz, which is particularly attractive for the application domain of energy harvesting. Additionally, the previously established capacitance models are verified on these circuits, showing satisfactory agreement between simulation and measurement data.
Electrode modelling and simulation of diagnostic and pulmonary vein isolation in atrial fibrillation
(2022)
Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail compromises in attack variety or constraint levels, and sometimes both. As a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet, independent of its intensity, avoids jeopardizing the global structure of natural images. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to our specific attack.
Project website: https://github.com/paulgavrikov/adversarial_solarization
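Solarization itself is a one-line transform. A minimal sketch for 8-bit-style intensity values (the repository above contains the authors' actual attack; this only illustrates the base operation):

```python
import numpy as np

def solarize(img, threshold=128):
    """Classic image solarization: invert every pixel at or above the
    threshold, leave darker pixels untouched (values in [0, 255])."""
    img = np.asarray(img)
    return np.where(img >= threshold, 255 - img, img)

img = np.array([[10, 100],
                [200, 250]])
print(solarize(img))  # dark pixels kept, bright pixels inverted
```

As an attack, the threshold becomes the single intensity parameter to search over, which is why a model-independent black-box variant is possible.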
Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with the according spatial inductive bias, we question the significance of learned convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient 1×1 convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning 3×3 convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (i.i.d.) nature of default initialization techniques.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise (1×1) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
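The reinterpretation of 1×1 convolutions as linear combinations of frozen filters can be illustrated with plain linear algebra. In this toy sketch, least squares stands in for gradient training of the 1×1 coefficients; the filter bank size and the Sobel target are illustrative choices, not taken from the papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen bank of 16 random 3x3 spatial filters, flattened to rows (16, 9).
bank = rng.standard_normal((16, 9))

# A filter the network might need to express, e.g. a Sobel edge detector.
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], float).ravel()

# A learned 1x1 convolution corresponds to one coefficient vector mixing
# the frozen filters; least squares replaces gradient training here.
coeffs, *_ = np.linalg.lstsq(bank.T, sobel, rcond=None)

# With >= 9 random filters, the bank almost surely spans all 3x3 filters,
# so the learned linear combination reproduces the target exactly.
print(np.allclose(coeffs @ bank, sobel))
```

This is the key intuition: once the random bank spans the filter space, training only the mixing coefficients loses no expressivity for that layer.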
It is common practice to apply padding prior to convolution operations to preserve the resolution of feature-maps in Convolutional Neural Networks (CNN). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
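The padding modes compared in such an analysis can be illustrated with `np.pad` on a 1-D toy signal; in a CNN the chosen mode is applied along both spatial axes before each convolution:

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Three common padding modes, one element on each side.
zeros     = np.pad(row, 1, mode="constant")  # border of zeros (the default in most CNNs)
reflect   = np.pad(row, 1, mode="reflect")   # mirror, excluding the edge pixel
replicate = np.pad(row, 1, mode="edge")      # repeat the edge pixel

print(zeros)      # zero padding introduces an artificial dark border
print(reflect)    # reflection keeps local image statistics at the boundary
print(replicate)  # replication keeps the edge value constant
```

Zero padding is the mode that creates a statistical discontinuity at the image boundary, which is where the perturbation anomalies described above concentrate.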
Recent work has investigated the distributions of learned convolution filters through a large-scale study containing hundreds of heterogeneous image models. Surprisingly, on average, the distributions only show minor drifts across the studied dimensions, including the learned task, image domain, or dataset. However, among the studied image domains, medical imaging models appeared to be significant outliers with "spikey" distributions and, therefore, to learn clusters of highly specific filters different from other domains. Following this observation, we study the collected medical imaging models in more detail. We show that instead of fundamental differences, the outliers are due to specific processing in some architectures. On the contrary, for standardized architectures, we find that models trained on medical data do not significantly differ in their filter distributions from similar architectures trained on data from other domains. Our conclusions reinforce previous hypotheses stating that pre-training of imaging models can be done with any kind of diverse image data.
An Empirical Investigation of Model-to-Model Distribution Shifts in Trained Convolutional Filters
(2021)
We present first empirical results from our ongoing investigation of distribution shifts in image data used for various computer vision tasks. Instead of analyzing the original training and test data, we propose to study shifts in the learned weights of trained models. In this work, we focus on the properties of the distributions of the dominantly used 3x3 convolution filter kernels. We collected and publicly provide a data set with over half a billion filters from hundreds of trained CNNs, using a wide range of data sets, architectures, and vision tasks. Our analysis shows interesting distribution shifts (or the lack thereof) between trained filters along different axes of meta-parameters, like data type, task, architecture, or layer depth. We argue that the observed properties are a valuable source for further investigation into a better understanding of the impact of shifts in the input data on the generalization abilities of CNN models, and for novel methods for more robust transfer learning in this domain.
Neural networks have a number of shortcomings. Among the most severe is their sensitivity to distribution shifts, which allows models to be easily fooled into wrong predictions by small perturbations of the input that are often imperceptible to humans and do not have to carry semantic meaning. Adversarial training poses a partial solution to this issue by training models on worst-case perturbations. Yet, recent work has also pointed out that the reasoning in neural networks is different from that of humans: humans identify objects by shape, while neural nets mainly employ texture cues. For example, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to favorably increase the shift toward shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common L2- and L∞-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
The importance of machine learning (ML) has been increasing dramatically for years. From assistance systems to production optimisation to the health sector, almost every area of daily life and industry comes into contact with machine learning. Besides all the benefits that ML brings, the lack of transparency and the difficulty of establishing traceability pose major risks. While there are solutions that make the training of machine learning models more transparent, traceability remains a major challenge. Ensuring the identity of a model is another challenge, and unnoticed modification of a model is a further danger when using ML. One solution is to create an ML birth certificate and an ML family tree secured by blockchain technology. Important information about training and about changes to the model through retraining can be stored in a blockchain and accessed by any user, creating more security and traceability for an ML model.
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer vision tasks, their application in the real world is still facing fundamental challenges. One of these open problems is the inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and ideally a better model generalization ability by including adversarial samples in the training set. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks, and we show that AT has an interesting side-effect: it leads to models that are significantly less overconfident in their decisions, even on clean data, than non-robust models. Further, our analysis of robust models shows that not only AT but also the model's building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising, as the inherent spatial pyramid design of most CNNs apparently violates basic signal processing laws, i.e. the sampling theorem, in their down-sampling operations. However, since poor sampling appeared not to affect model accuracy, this issue had been broadly neglected until model robustness started to receive more attention. Recent work in the context of adversarial attacks and distribution shifts showed, after all, that there is a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing-free down-sampling operation which can easily be plugged into any CNN architecture: FrequencyLowCut pooling. Our experiments show that, in combination with simple Fast Gradient Sign Method (FGSM) adversarial training, our hyper-parameter free operator substantially improves model robustness and avoids catastrophic overfitting. Our code is available at https://github.com/GeJulia/flc_pooling
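The core low-cut idea can be sketched with a plain FFT: remove all frequency content above the new Nyquist limit before subsampling, so no frequency can alias. This is a simplified single-channel version for even sizes, not the authors' exact implementation:

```python
import numpy as np

def flc_pool(x):
    """Downsample a 2-D feature map by 2x by cropping the spectrum to its
    central (low-frequency) quarter and inverse-transforming at the smaller
    size. Sketch of the FrequencyLowCut idea for even h, w."""
    h, w = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x))
    ch, cw = h // 2, w // 2
    low = spec[ch - h // 4: ch + h // 4, cw - w // 4: cw + w // 4]
    # ifft2 normalizes by the smaller size, so rescale by 4 to keep amplitudes.
    return np.real(np.fft.ifft2(np.fft.ifftshift(low))) / 4

x = np.random.default_rng(0).standard_normal((8, 8))
y = flc_pool(x)
print(y.shape)  # (4, 4)
```

Compared to strided convolution or max pooling, nothing above the output's Nyquist frequency survives the crop, which is exactly what the sampling theorem requires.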
Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
(2023)
Convolutional neural networks encode images through a sequence of convolutions, normalizations, and non-linearities as well as downsampling operations into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack of robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, ASAP for short. While introducing only a few modifications to FLC pooling, networks using ASAP as their downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high- and low-resolution data while maintaining similar clean accuracy or even outperforming the baseline.
Motivated by the recent trend towards the usage of larger receptive fields for more context-aware neural networks in vision applications, we aim to investigate how large these receptive fields really need to be. To facilitate such a study, several challenges need to be addressed, most importantly: (i) we need to provide an effective way for models to learn large filters (potentially as large as the input data) without increasing their memory consumption during training or inference, (ii) the study of filter sizes has to be decoupled from other effects such as the network width or the number of learnable parameters, and (iii) the employed convolution operation should be a plug-and-play module that can replace any conventional convolution in a Convolutional Neural Network (CNN) and allow for an efficient implementation in current frameworks. To enable such models, we propose to learn not spatial but frequency representations of filter weights as neural implicit functions, such that even infinitely large filters can be parameterized by only a few learnable weights. The resulting neural implicit frequency CNNs are the first models to achieve results on par with the state of the art on large image classification benchmarks while executing convolutions solely in the frequency domain, and they can be employed within any CNN architecture. This allows us to provide an extensive analysis of the learned receptive fields. Interestingly, our analysis shows that, although the proposed networks could learn very large convolution kernels, the learned filters practically translate into well-localized and relatively small convolution kernels in the spatial domain.
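The idea of parameterizing a filter as an implicit function over frequency coordinates can be sketched with a tiny hand-rolled "network". All names and sizes below are illustrative, not taken from the paper; the point is that the same few weights define a filter at any spatial resolution:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny implicit function: 2-D frequency coordinate -> complex response,
# parameterized by only 2*8 real + 8 complex "learnable" weights.
W = rng.standard_normal((2, 8))                               # hidden layer
V = rng.standard_normal(8) + 1j * rng.standard_normal(8)      # output layer

def filter_spectrum(h, w):
    """Evaluate the implicit filter on the (h, w) frequency grid."""
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    coords = np.stack([fy, fx], axis=-1)   # (h, w, 2) frequency coordinates
    feats = np.tanh(coords @ W)            # (h, w, 8) nonlinear features
    return feats @ V                       # (h, w) complex filter spectrum

# The same weights yield a spatial filter at ANY size, even "infinitely" large:
small = np.fft.ifft2(filter_spectrum(8, 8)).real
large = np.fft.ifft2(filter_spectrum(64, 64)).real
print(small.shape, large.shape)
```

Multiplying an input's spectrum by `filter_spectrum(h, w)` then corresponds to a (circular) convolution with a filter as large as the input, while only the few weights of the implicit function are trained.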
Many commonly well-performing convolutional neural network models have been shown to be susceptible to input data perturbations, indicating a low model robustness. To reveal model weaknesses, adversarial attacks are specifically optimized to generate small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained by using adversarial examples during training, which in most cases reduces the measurable model attackability. Unfortunately, this technique can lead to robust overfitting, which results in non-robust models. In this paper, we analyze adversarially trained, robust models in the context of a specific network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from downsampling artifacts, a.k.a. aliasing, than baseline models. In the case of robust overfitting, we observe a strong increase in aliasing and propose a novel early stopping approach based on the measurement of aliasing.
Many commonly well-performing convolutional neural network models have been shown to be susceptible to input data perturbations, indicating a low model robustness. Adversarial attacks are thereby specifically optimized to reveal model weaknesses by generating small, barely perceivable image perturbations that flip the model prediction. Robustness against attacks can be gained, for example, by using adversarial examples during training, which effectively reduces the measurable model attackability. In contrast, research on analyzing the source of a model's vulnerability is scarce. In this paper, we analyze adversarially trained, robust models in the context of a specifically suspicious network operation, the downsampling layer, and provide evidence that robust models have learned to downsample more accurately and suffer significantly less from aliasing than baseline models.
We introduce an open-source Python framework named PHS (Parallel Hyperparameter Search) that enables hyperparameter optimization of any arbitrary Python function on numerous compute instances. This is achieved with minimal modifications inside the target function. Possible applications are expensive-to-evaluate numerical computations that strongly depend on hyperparameters, such as machine learning. Bayesian optimization is chosen as a sample-efficient method to propose the next query set of parameters.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail due to computational constraints. To represent key processes, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM (European Center for Medium-Range Weather Forecast-Hamburg-Hamburg) global climate aerosol model using the M7 microphysics, but high computational costs make it very expensive to run with finer resolution or for a longer time. We aim to use machine learning to emulate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input-output pairs to train a neural network (NN) on it. We are able to learn the variables' tendencies, achieving an average R² score of 77.1%. We further explore methods to inform and constrain the NN with physical knowledge to reduce mass violation and enforce mass positivity. On a graphics processing unit (GPU), we achieve a speed-up of over 64× compared to the original model.
Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail. In order to achieve higher accuracy, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM global climate aerosol model using the M7 microphysics model, but increased computational costs make it very expensive to run at higher resolutions or for a longer time. We aim to use machine learning to approximate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input-output pairs to train a neural network on it. By using a special logarithmic transform, we are able to learn the variables' tendencies, achieving an average score of . On a GPU, we achieve a speed-up of 120× compared to the original model.
Restoring hand motion to people experiencing amputation, paralysis, and stroke is a critical area of research and development. While electrode-based systems that use input from the brain or muscle have proven successful, these systems tend to be expensive and difficult to learn. One group of researchers is exploring the use of augmented reality (AR) as a new way of controlling hand prostheses. A camera mounted on eyeglasses tracks LEDs on a prosthetic to execute opening and closing commands using one of two different AR systems. One system uses a rectangular command window to control motion: crossing horizontally signals “open” along one direction and “close” in the opposite direction. The second system uses a circular command window: once control is enabled, gripping strength can be controlled by the direction of head motion. While the visual system remains to be tested with patients, its low cost, ease of use, and lack of electrodes make the device a promising solution for restoring hand motion.
Neuroprosthetics 2.0
(2019)
The present invention relates to open-loop and closed-loop control units for extracorporeal circulatory support, to systems comprising such an open-loop and closed-loop control unit, and to corresponding methods. An open-loop and closed-loop control unit (10) for extracorporeal circulatory support is proposed, which is configured to receive a measurement of an ECG signal (12) of a supported patient over a predefined period of time, wherein the ECG signal (12) comprises multiple data points for each time point within a heart cycle. The open-loop and closed-loop control unit (10) comprises an evaluation unit (100) which is configured to evaluate the data points for at least one time point in a spatial and/or temporal manner and to determine at least one amplitude change (14) within the heart cycle based on the evaluated data points. The open-loop and closed-loop control unit (10) is further configured to output an open-loop and/or closed-loop signal (16) for extracorporeal circulatory support at a predefined point in time after the at least one amplitude change (14).
Device and method for monitoring and optimising a temporal trigger stability (WO2023094554A1)
(2023)
The present invention relates to devices for monitoring and optimising a temporal trigger stability of an extracorporeal circulatory support means, and to open-loop and closed-loop control units for the extracorporeal circulatory support means comprising such a device, and to corresponding methods. A device (10) for monitoring a temporal trigger stability of an extracorporeal circulatory support means is accordingly proposed, which device is designed to receive a first dataset (14) of a measurement of an ECG signal of a supported patient over a predefined period of time. The device (10) comprises an evaluation unit (16), which is designed to determine or identify a plurality of R triggers (26) from the first dataset (14), wherein the evaluation unit (16) is also designed to receive or provide a second dataset (20) having evaluated ECG signals and a plurality of R triggers (28) and to selectively map the second dataset (20) on the first dataset (14). The device is also designed to emit a signal (22) that characterises a temporal gap between successive R triggers (26) from the first dataset (14) and successive R triggers (28) from the second dataset (20) which are mapped on the first dataset.
Sweaty has already participated several times in RoboCup soccer competitions (Adult Size). Now the work is focused on coordinating the play of two robots. Moreover, we are working on stabilizing the gait by adding additional sensor information. Ongoing work is the optimization of the control strategy by balancing between impedance and position control. By minimizing jerk, the gait and overall gameplay should improve significantly.
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking-by-detection paradigm either require some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present a self-supervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method is based on straightforward spatio-temporal cues that can be extracted from neighboring frames in an image sequence without supervision. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and to train an AutoEncoder to generate suitable latent representations. The resulting latent representations can thus serve as robust appearance cues for tracking, even over large temporal distances where no reliable spatio-temporal features can be extracted. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
In this work, we evaluate two different image clustering objectives, k-means clustering and correlation clustering, in the context of Triplet Loss induced feature space embeddings. Specifically, we train a convolutional neural network to learn discriminative features by optimizing two popular versions of the Triplet Loss in order to study their clustering properties under the assumption of noisy labels. Additionally, we propose a new, simple Triplet Loss formulation, which shows desirable properties with respect to formal clustering objectives and outperforms the existing methods. We evaluate all three Triplet Loss formulations for k-means and correlation clustering on the CIFAR-10 image classification dataset.
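The standard margin-based variant of the Triplet Loss referenced above can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation or their proposed new formulation; the margin value and the use of squared Euclidean distances are assumptions for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based Triplet Loss for one (anchor, positive, negative) triple.

    Pulls the anchor towards the positive sample and pushes it away from
    the negative sample until their distance gap exceeds the margin.
    """
    d_ap = np.sum((anchor - positive) ** 2)  # squared anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2)  # squared anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)

# A satisfied triplet (negative already far away) yields zero loss.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([5.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0
```

During training, this per-triple term is averaged over mined triplets from a batch and minimized, shaping the embedding space so that same-class samples cluster together.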
Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space
(2022)
Over the last decade, the development of deep image classification networks has mostly been driven by the search for the best performance in terms of classification accuracy on standardized benchmarks like ImageNet. More recently, this focus has been expanded by the notion of model robustness, i.e., the generalization abilities of models towards previously unseen changes in the data distribution. While new benchmarks, like ImageNet-C, have been introduced to measure robustness properties, we argue that fixed test sets are only able to capture a small portion of possible data variations and are thus limited and prone to generate new overfitted solutions. To overcome these drawbacks, we suggest estimating the robustness of a model directly from the structure of its learned feature space. We introduce robustness indicators which are obtained via unsupervised clustering of latent representations from a trained classifier and show very high correlations to the model performance on corrupted test data.
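The core idea, that the structure of the latent space correlates with robustness, can be illustrated with a toy proxy. The sketch below computes a simple between-class versus within-class scatter ratio over latent features; the actual indicators in the paper are obtained via unsupervised clustering, so this supervised scatter ratio is only an illustrative stand-in.

```python
import numpy as np

def separation_score(latents, labels):
    """Illustrative structural indicator: ratio of between-class to
    within-class scatter in a classifier's latent space. Higher values
    mean better-separated class regions, which, per the hypothesis
    above, should correlate with robustness to corruptions."""
    classes = np.unique(labels)
    centroids = np.array([latents[labels == c].mean(axis=0) for c in classes])
    global_mean = latents.mean(axis=0)
    # mean squared distance of class centroids to the global mean
    between = np.mean(np.sum((centroids - global_mean) ** 2, axis=1))
    # mean squared distance of samples to their own class centroid
    within = np.mean([np.sum((latents[labels == c] - centroids[i]) ** 2, axis=1).mean()
                      for i, c in enumerate(classes)])
    return between / within
```

Applied to the penultimate-layer activations of two trained classifiers, a higher score would, under this proxy, suggest the more robust model without ever touching a corrupted test set.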
Engineering, construction and operation of complex machines involve a wide range of complicated, simultaneous tasks, which could potentially be automated. In this work, we focus on perception tasks in such systems, investigating deep learning approaches for multi-task transfer learning with limited training data. We show an approach that takes advantage of a technical system's focus on selected objects and their properties. We create focused representations and simultaneously solve joint objectives in a system through multi-task learning with convolutional autoencoders. The focused representations are used as a starting point for the data-efficient solution of the additional tasks. The efficiency of this approach is demonstrated using images and tasks of an autonomous circular crane with a grapple.
A versatile liquid metal (LM) printing process is presented, enabling the fabrication of various fully printed devices such as intra- and interconnect wires, resistors, diodes, transistors, and basic circuit elements such as inverters, which are process-compatible with other digital printing and thin-film structuring methods for integration. For this, a glass capillary-based direct-write method for printing LMs such as eutectic gallium alloys is demonstrated, exploring the potential for fully printed LM-enabled devices. Examples of successful device fabrication include resistors, p–n diodes, and field-effect transistors. The device functionality and the ease of one integrated fabrication flow show that the potential of LM printing far exceeds the mere interconnection of conventional electronic devices in printed electronics.
This paper describes the concept and some results of the project "Menschen Lernen Maschinelles Lernen" (Humans Learn Machine Learning, ML2) of the University of Applied Sciences Offenburg. It brings together students of different courses of study and practitioners from companies on the subject of Machine Learning. A mixture of blended learning and practical projects ensures a tight coupling of machine learning theory and application. The paper details the phases of ML2 and mentions two successful example projects.
In this study, a facile method to fabricate a cohesive ion-gel based gate insulator for electrolyte-gated transistors is introduced. The adhesive and flexible ion-gel can be laminated easily on the semiconducting channel and electrode manually by hand. The ion-gel is synthesized by a straightforward technique without complex procedures and shows a remarkable ionic conductivity of 4.8 mS cm⁻¹ at room temperature. When used as a gate insulator in electrolyte-gated transistors (EGTs), an on/off current ratio of 2.24 × 10⁴ and a subthreshold swing of 117 mV dec⁻¹ can be achieved. This performance is roughly equivalent to that of ink drop-casted ion-gels in electrolyte-gated transistors, indicating that the film-attachment method might represent a valuable alternative to ink drop-casting for the fabrication of gate insulators.
The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. These methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3D vector fields on spheres.
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking-by-detection paradigm either require some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method is based on straightforward spatio-temporal cues that can be extracted from neighboring frames in an image sequence without supervision. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and to train an autoencoder to generate suitable latent representations. Thus, the resulting latent representations can serve as robust appearance cues for tracking even over large temporal distances where no reliable spatio-temporal features can be extracted. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
Diffracted waves carry high-resolution information that can help interpret fine structural details at a scale smaller than the seismic wavelength. However, the diffraction energy tends to be weak compared to the reflected energy and is also sensitive to inaccuracies in the migration velocity, making the identification of its signal challenging. In this work, we present an innovative workflow to automatically detect scattering points in the migration dip angle domain using deep learning. By taking advantage of the different kinematic properties of reflected and diffracted waves, we separate the two types of signals by migrating the seismic amplitudes to dip angle gathers using prestack depth imaging in the local angle domain. Convolutional neural networks are a class of deep learning algorithms able to learn to extract spatial information from the data in order to identify its characteristics, and they have become the method of choice for supervised pattern recognition problems. In this work, we use wave equation modelling to create a large and diversified dataset of synthetic examples to train a network to identify the probable positions of scattering objects in the subsurface. After giving an intuitive introduction to diffraction imaging and deep learning and discussing some of the pitfalls of the methods, we evaluate the trained network on field data and demonstrate the validity and good generalization performance of our algorithm. We successfully identify diffraction points with high accuracy and high resolution, including those with a low signal-to-noise and reflection ratio. We also show how our method allows us to quickly scan through high-dimensional data consisting of several versions of a dataset migrated with a range of velocities, to overcome the strong effect of an incorrect migration velocity on the diffraction signal.
Extracting horizon surfaces from key reflections in a seismic image is an important step of the interpretation process. Interpreting a reflection surface in a geologically complex area is a difficult and time-consuming task, and it requires an understanding of the 3D subsurface geometry. Common methods to help automate the process are based on tracking waveforms in a local window around manual picks. Those approaches often fail when the wavelet character lacks lateral continuity or when reflections are truncated by faults. We have formulated horizon picking as a multiclass segmentation problem and solved it by supervised training of a 3D convolutional neural network. We design an efficient architecture to analyze the data over multiple scales while keeping memory and computational needs to a practical level. To allow for uncertainties in the exact location of the reflections, we use a probabilistic formulation to express the horizons' positions. By using a masked loss function, we give interpreters flexibility when picking the training data. Our method allows experts to interactively improve the results of the picking by fine-tuning the network in the more complex areas. We also determine how our algorithm can be used to extend horizons to the prestack domain by following reflections across offset planes, even in the presence of residual moveout. We validate our approach on two field data sets and show that it yields accurate results on nontrivial reflectivity while being trained from a workable amount of manually picked data. Initial training of the network takes approximately 1 h, and the fine-tuning and prediction on a large seismic volume take a minute at most.
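The masked loss idea, evaluating the loss only where the interpreter actually provided picks, can be sketched in a few lines. This is an illustrative NumPy sketch under assumed array shapes, not the network's actual loss implementation.

```python
import numpy as np

def masked_cross_entropy(probs, targets, mask):
    """Cross-entropy evaluated only where manual picks exist.

    probs   : (N, C) predicted class probabilities per sample
    targets : (N,)   integer class labels (horizon ids / background)
    mask    : (N,)   1 where the interpreter provided a pick, 0 elsewhere

    Unlabeled samples contribute nothing to the loss, which is what
    lets interpreters pick the training data sparsely and flexibly.
    """
    eps = 1e-12  # numerical guard against log(0)
    picked = mask.astype(bool)
    if not picked.any():
        return 0.0
    ce = -np.log(probs[picked, targets[picked]] + eps)
    return ce.mean()
```

In the 3D segmentation setting described above, the same principle applies voxel-wise: gradients flow only from picked voxels, so unpicked regions never penalize the network.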
Significant progress in the development and commercialization of electrically conductive adhesives has been made. This makes shingling a very attractive approach for solar cell interconnection. In this study, we investigate the shading tolerance of two types of solar modules based on shingle interconnection: first, the already commercialized string approach, and second, the matrix technology where solar cells are intrinsically interconnected in parallel and in series. An experimentally validated LTspice model predicts major advantages for the power output of the matrix layout under partial shading. Diagonal as well as random shading of a 1.6 m² solar module is examined. Power gains of up to 73.8 % for diagonal shading and up to 96.5 % for random shading are found for the matrix technology compared to the standard string approach. The key factor is an increased current extraction due to lateral current flows. Especially under minor shading, the matrix technology benefits from an increased fill factor as well. Under diagonal shading, we find the probability of parts of the matrix module being bypassed to be reduced by 40 % in comparison to the string module. In consequence, the overall risk of hotspot occurrence in matrix modules is decreased significantly.
In this paper, we describe the first publicly available fine-grained product recognition dataset based on leaflet images. Using advertisement leaflets, collected over several years from different European retailers, we provide a total of 41.6k manually annotated product images in 832 classes. Further, we investigate three different approaches for this fine-grained product classification task: Classification by Image, by Text, as well as by Image and Text. The approach "Classification by Text" uses the text extracted directly from the leaflet product images. We show that the combination of image and text as input improves the classification of visually difficult-to-distinguish products. The final model leads to an accuracy of 96.4% with a Top-3 score of 99.2%. We release our code at https://github.com/ladwigd/Leaflet-Product-Classification.
Entity Matching (EM) defines the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM-problems, most currently available EM-algorithms solely rely on (textual) meta data. In this paper, we introduce the first publicly available large-scale dataset for "visual entity matching", based on a production level use case in the retail domain. Using scanned advertisement leaflets, collected over several years from different European retailers, we provide a total of ~786k manually annotated, high resolution product images containing ~18k different individual retail products which are grouped into ~3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. Following a first baseline evaluation, we show that the proposed "visual entity matching" constitutes a novel learning problem which cannot be sufficiently solved using standard image-based classification and retrieval algorithms. Instead, novel approaches are needed which allow transferring example-based visual equivalence classes to new data, and the aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are provided under https://www.retail-786k.org/.
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
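The parallelization opportunity described above comes from the fact that SUBSCALE's per-dimension processing steps are independent of each other. The sketch below illustrates this with a simplified 1-D dense-unit search run in parallel over dimensions; the grouping rule, `eps`, and `min_pts` are illustrative assumptions, not the paper's exact algorithm or its GPU kernel.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dense_units_1d(column, eps, min_pts):
    """Simplified per-dimension step: maximal runs of points whose
    sorted 1-D gaps stay within eps, kept if they reach min_pts
    members. Assumes a non-empty column."""
    order = np.argsort(column)
    vals = column[order]
    groups, current = [], [order[0]]
    for i in range(1, len(vals)):
        if vals[i] - vals[i - 1] <= eps:
            current.append(order[i])
        else:
            if len(current) >= min_pts:
                groups.append(sorted(current))
            current = [order[i]]
    if len(current) >= min_pts:
        groups.append(sorted(current))
    return groups

def dense_units_all(data, eps=0.5, min_pts=3):
    """Run the per-dimension step concurrently: no cross-dimension
    data dependency, so the work is embarrassingly parallel."""
    with ThreadPoolExecutor() as ex:
        return list(ex.map(lambda d: dense_units_1d(data[:, d], eps, min_pts),
                           range(data.shape[1])))
```

Swapping the thread pool for a process pool, or mapping each dimension to a GPU block, changes only the scheduling layer, which is exactly why the authors report linear speedup on multi-core CPUs.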
Recently, adversarial attacks on image classification networks by the AutoAttack (Croce and Hein, 2020b) framework have drawn a lot of attention. While AutoAttack has shown a very high attack success rate, most defense approaches are focusing on network hardening and robustness enhancements, like adversarial training. This way, the currently best-reported method can withstand about 66% of adversarial examples on CIFAR10. In this paper, we investigate the spatial and frequency domain properties of AutoAttack and propose an alternative defense. Instead of hardening a network, we detect adversarial attacks during inference, rejecting manipulated inputs. Based on a rather simple and fast analysis in the frequency domain, we introduce two different detection algorithms: first, a black-box detector that operates only on the input images and achieves a detection accuracy of 100% on the AutoAttack CIFAR10 benchmark and 99.3% on ImageNet, for ε = 8/255 in both cases; second, a white-box detector using an analysis of CNN feature maps, leading to detection rates of 100% and 98.7% on the same benchmarks.
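The flavor of such an input-only, frequency-domain check can be sketched as follows. This is a generic high-frequency-energy heuristic under assumed parameters (disc radius, grayscale input), not the paper's detector, which achieves its reported accuracies with a more careful spectral analysis.

```python
import numpy as np

def high_freq_ratio(image, radius=8):
    """Fraction of spectral energy outside a low-frequency disc.

    Adversarial perturbations tend to add energy at high spatial
    frequencies, so thresholding this ratio gives a cheap black-box
    style detection heuristic operating on the input image alone.
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = spec.shape
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return spec[~low].sum() / spec.sum()
```

A smooth natural image concentrates energy near the spectrum's center; adding a high-frequency perturbation raises the ratio, and a threshold calibrated on clean data turns the ratio into an accept/reject decision at inference time.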
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
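The LID quantity that the detector's features build on can be estimated with the standard maximum-likelihood (Levina-Bickel) estimator over nearest-neighbour distances. The sketch below is a minimal NumPy version of that estimator only; the detector itself, and the adaptations the paper proposes, are not reproduced here.

```python
import numpy as np

def lid_mle(x, reference, k=10):
    """Maximum-likelihood estimate of the local intrinsic
    dimensionality (LID) of point x from its k nearest neighbours
    in `reference`. Adversarial inputs tend to exhibit elevated LID
    in a network's intermediate feature spaces, which is what makes
    the measure useful as a detection feature."""
    dists = np.sqrt(((reference - x) ** 2).sum(axis=1))
    dists = np.sort(dists[dists > 0])[:k]  # k smallest positive distances
    # -1 / mean(log(r_i / r_k)); the i = k term contributes log(1) = 0
    return -1.0 / np.mean(np.log(dists / dists[-1]))
```

Collecting such estimates per layer for a given input, and feeding them to a small binary classifier, yields the kind of light-weight genuine-vs-adversarial detector described above.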
Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l∞ perturbations limited to ϵ = 8/255. With leading scores of the currently best performing models at around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness that could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper: We argue that I) the alteration of data by AutoAttack with l∞, ϵ = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers. We also show that other attack methods are much harder to detect while achieving similar success rates. II) Results on low-resolution datasets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.
Neural networks tend to overfit the training distribution and perform poorly on out-of-distribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about the potential attacks or inference time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training enforces models to shift the focus to low-frequency details during training. By mimicking this behavior through frequency regularization in learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method leads to more favorable shifts in decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
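One way to regularize the frequency content of a convolution kernel is to penalize its spectral energy weighted by distance from the zero-frequency bin. The sketch below illustrates that idea in NumPy on a single kernel; the actual regularizer in the paper may be formulated differently, and the weighting scheme here is an assumption for illustration.

```python
import numpy as np

def high_freq_penalty(kernel):
    """Penalty term favoring low-frequency convolution filters.

    Computes the kernel's 2-D power spectrum and weights each bin by
    its distance from the zero-frequency (DC) bin, so high-frequency
    energy costs more. Adding a scaled version of this term to the
    training loss nudges filters towards the low-frequency,
    shape-oriented behaviour described above.
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(kernel))) ** 2
    h, w = spec.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return float((spec * dist).sum() / spec.sum())
```

As a sanity check, a 3×3 averaging (low-pass) kernel concentrates its spectrum at DC and is penalized far less than a Laplacian-style edge detector of the same size.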
Harnessing the overall benefits of the latest advancements in artificial intelligence (AI) requires the extensive collaboration of academia and industry. These collaborations promote innovation and growth while enforcing the practical usefulness of newer technologies in real life. The purpose of this article is to outline the challenges faced during cross-collaboration between academia and industry. These challenges are also inspected with the help of an ongoing project titled "Quality Assurance of Machine Learning Applications" (Q-AMeLiA), in which three universities cooperate with five industry partners to make the product risk of AI-based products visible. Further, we discuss the hurdles and the key challenges in machine learning (ML) technology transformation from academia to industry based on robustness, simplicity, and safety. These challenges are an outcome of the lack of common standards, metrics, and missing regulatory considerations when state-of-the-art (SOTA) technology is developed in academia. The use of biased datasets involves ethical concerns that might lead to unfair outcomes when the ML model is deployed in production. The advancement of AI in small and medium sized enterprises (SMEs) requires more in terms of common standardization of concepts rather than algorithm breakthroughs. In this paper, in addition to the general challenges, we also discuss domain-specific barriers for five different domains, i.e., object detection, hardware benchmarking, continual learning, action recognition, and industrial process automation, and highlight the steps necessary for successfully managing the cross-sectoral collaborations between academia and industry.
Apache Hadoop is a well-known open-source framework for storing and processing huge amounts of data. This paper shows the usage of the framework within a project of the university in cooperation with a semiconductor company. The goal of this project was to supplement the existing data landscape with facilities for storing and analyzing data on a new Apache Hadoop based platform.
Printed systems spark immense interest in industry, and for several parts such as solar cells or radio frequency identification antennas, printed products are already available on the market. This has led to intense research; however, printed field-effect transistors (FETs) and logics derived thereof still have not been sufficiently developed to be adapted by industry. Among others, one of the reasons for this is the lack of control of the threshold voltage during production. In this work, we show an approach to adjust the threshold voltage (Vth) in printed electrolyte-gated FETs (EGFETs) with high accuracy by doping indium-oxide semiconducting channels with chromium. Despite high doping concentrations achieved by a wet chemical process during precursor ink preparation, good on/off-ratios of more than five orders of magnitude could be demonstrated. The synthesis process is simple, inexpensive, and easily scalable and leads to depletion-mode EGFETs, which are fully functional at operation potentials below 2 V and allows us to increase Vth by approximately 0.5 V.
We have developed a methodology for the systematic generation of a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This is the basis for a substantial approach to automate, for the first time, the identification of hardwood species in microscopic images of fibrous materials by deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly well to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.
Artificial intelligence (AI), and in particular machine learning algorithms, are of increasing importance in many application areas but interpretability and understandability as well as responsibility, accountability, and fairness of the algorithms' results, all crucial for increasing the humans' trust into the systems, are still largely missing. Big industrial players, including Google, Microsoft, and Apple, have become aware of this gap and recently published their own guidelines for the use of AI in order to promote fairness, trust, interpretability, and other goals. Interactive visualization is one of the technologies that may help to increase trust in AI systems. During the seminar, we discussed the requirements for trustworthy AI systems as well as the technological possibilities provided by interactive visualizations to increase human trust in AI.
eLetter on the article "Plague Through History" by Nils Chr. Stenseth, published in Science, Vol. 321, Issue 5890, pp. 773-774 (doi.org/10.1126/science.1161496)
(1) Background: Little is known about the baroque composer Domenico Scarlatti (1685-1757), whose life was centred behind closed doors at the royal court in Spain. There are no reports about his illnesses. From his compositions, mainly for harpsichord, an outstanding virtuosity can be read. (2) Case Presentation: In this case report, the only known oil painting of Domenico Scarlatti is presented, on which he is about 50 years old. In it, one recognizes conspicuous hands with hints of watch-glass nails and drumstick fingers. (3) Discussion: Whether Scarlatti had chronic hypoxia of peripheral body regions as a sign of, e.g., bronchial cancer or a severe heart disease is not known. (4) Conclusions: The above-mentioned signs recorded in the oil painting, even if they were not interpretable at that time, are clearly represented and preserved for us and are open to diagnostic discussion from today's point of view.
In this entry, the 3D CAD reconstructions and 3D multi-material polymer replica printings of knight Götz von Berlichingen's first "Iron Hand," which were developed over the last few years at Offenburg University, are presented. Even by today's standards, the first "Iron Hand" (as could be shown in the replicas) demonstrates sophisticated mechanics and well thought-out functionality and still offers inspiration and food for discussion when it comes to the question of an artificial prosthetic replacement for a hand.
eLetter on the article "The Hannes hand prosthesis replicates the key biological properties of the human hand" by Matteo Laffranchi et al., published in Science Robotics, Vol. 5, Issue 46, eabb0467 (doi.org/10.1126/scirobotics.abb0467)
eLetter: "The ancient Capua leg from 300 BC and the 1941 air raid on the Royal College of Surgeons"
(2021)
eLetter on the article "The College of Surgeons, London", published in Science, Vol. 93, Issue 2425, p. 587 (DOI: 10.1126/science.93.2425.587).
eLetter on the article "Condiciones neuropsiquiátricas y probable causa de muerte de Maurice Ravel" by Gómez-Carvajal AM, Botero-Meneses JS, Palacios-Espinosa X, and Palacios-Sánchez L., published in Iatreia 35(3), pp. 341-8 (DOI: https://doi.org/10.17533/udea.iatreia.154).
In this paper, pathophysiologically interrelated deactivation/activation phenomena are set out using the example of whiplash injury. These phenomena may have been underestimated in previous positron emission tomography studies, as their focus was on hypoperfusion rather than hyperperfusion. In addition, statistical parametric mapping analysis of cerebral studies is normally tuned to obvious clusters of difference rather than to specific areas of interest.
Comment on the article "Arthur Willis Goodspeed" by Otto Glasser, published in Science, Vol. 98, Issue 2540, p. 219 (doi.org/10.1126/science.98.2536.125).
A disturbed synchronization of the ventricular contraction can cause severe systolic heart failure in affected patients, with a reduction of the left ventricular ejection fraction, which can often be explained by a left bundle branch block (LBBB). If medication remains ineffective, the concerned patients are treated with a cardiac resynchronization therapy (CRT) system. The aim of this study was to integrate His-bundle pacing into the Offenburg heart rhythm model in order to visualize the electrical pacing field generated by His-bundle pacing. Modelling and electrical field simulation activities were performed with the software CST (Computer Simulation Technology) from Dassault Systèmes. CRT with biventricular pacing is achieved by an apical right ventricular electrode and an additional left ventricular electrode, which is floated into the coronary sinus. The non-responder rate of CRT therapy is about one third of the CRT patients. His-bundle pacing represents a physiological alternative to conventional cardiac pacing and cardiac resynchronization. An electrode implanted in the His bundle emits a stronger electrical pacing field than that of conventional cardiac pacemakers. The pacing of the His bundle was performed with the Medtronic SelectSecure 3830 electrode at pacing voltage amplitudes of 3 V, 2 V and 1.5 V in combination with a pacing pulse duration of 1 ms. Compared to conventional pacemaker pacing, His-bundle pacing is capable of bridging LBBB conduction disorders in the left ventricle. The His-bundle pacing electrical field is able to spread via the physiological pathway in the right and left ventricles for CRT with a narrow QRS complex in the surface ECG.