This study introduces EmbeddedTrain, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. EmbeddedTrain refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets (CIFAR10, CIFAR100, Flower, Food, Speech Commands, MNIST, HAR, and DCASE2020) reveals that EmbeddedTrain achieves near-parity with full training, with an average accuracy drop of only around 1% in most cases. Compared with full training, for example, EmbeddedTrain's accuracy drop is only 0.82% on CIFAR10 and 1.07% on CIFAR100. In terms of computational effort, EmbeddedTrain shows a marked reduction, requiring as little as 10% of the computational effort needed for full training in some scenarios, and it consistently outperforms other sparse training methods. These findings underscore EmbeddedTrain's capacity to manage computational resources efficiently while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.
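The following is a minimal sketch of the general idea described above (dynamic sparse backpropagation with optional step skipping), assuming a PyTorch-style training loop. The function names, the loss-based sparsity heuristic, and the thresholds are illustrative assumptions, not the published EmbeddedTrain implementation; on a real MCU the deselected gradient entries would simply not be computed, whereas here they are masked after the fact for clarity.

```python
# Illustrative sketch: dynamic sparse backpropagation with step skipping.
# Heuristics and names are assumptions, not the published EmbeddedTrain code.
import torch
import torch.nn as nn

def sparsity_from_loss(loss, lo=0.1, hi=0.9, scale=2.0):
    """Map the current loss to a keep-ratio in [lo, hi]: the smaller the
    loss, the fewer gradient entries are kept (hypothetical heuristic)."""
    return min(hi, max(lo, float(loss) / scale))

def sparse_step(model, optimizer, loss_fn, x, y, skip_threshold=0.05):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    if loss.item() < skip_threshold:          # skip the whole training step
        return loss.item(), True
    loss.backward()
    keep_ratio = sparsity_from_loss(loss.item())
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(keep_ratio * g.numel()))
        # keep only the k largest-magnitude gradient entries of this tensor
        threshold = torch.topk(g, k).values.min()
        p.grad.mul_((p.grad.abs() >= threshold).float())
    optimizer.step()
    return loss.item(), False

# usage sketch with a toy model and random data
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss_val, skipped = sparse_step(model, opt, nn.CrossEntropyLoss(), x, y)
```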
Objective: Dickkopf 3 (DKK3) has been identified as a urinary biomarker. Values above 4000 pg/mg creatinine (Cr) have been linked with a higher risk of short-term decline in kidney function (J Am Soc Nephrol 29: 2722–2733). However, there is as yet little experience with DKK3 as a risk marker in everyday clinical practice. We used algorithm-based data analysis to evaluate the potential of DKK3 as a risk marker in a cohort from a large single center in Germany.
Method: DKK3 was measured in all CKD patients in our center from October 1st, 2018 to December 31st, 2019, together with the calculated GFR (eGFR) and the urinary albumin/creatinine ratio (UACR). Kidney transplant patients were excluded. Until the end of follow-up on December 31st, 2021, repeated measurements were performed for all parameters. Data analysis was performed using MD-Explorer (BioArtProducts, Rostock, Germany) and Python with multiple libraries. Linear regression models were fitted for DKK3, eGFR and UACR in each patient. The models were compared with a two-sided Kolmogorov-Smirnov test.
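As a rough illustration of the kind of analysis described (per-patient linear regression of eGFR over time, compared between DKK3 groups with a two-sided Kolmogorov-Smirnov test), the sketch below uses pandas and SciPy. The DataFrame layout, column names, and the grouping by baseline DKK3 are assumptions for illustration, not the actual MD-Explorer pipeline.

```python
# Sketch: fit a linear eGFR slope per patient and compare slope
# distributions between DKK3 groups with a two-sided KS test.
# Column names (patient_id, date, egfr, dkk3) are assumed; `date`
# is expected to be a datetime column.
import numpy as np
import pandas as pd
from scipy import stats

def egfr_slopes(df):
    """Return one eGFR slope (ml/min/1.73 m^2 per day) per patient."""
    slopes = {}
    for pid, grp in df.groupby("patient_id"):
        if len(grp) < 2:
            continue
        days = (grp["date"] - grp["date"].min()).dt.days.to_numpy()
        slope, _intercept = np.polyfit(days, grp["egfr"].to_numpy(), 1)
        slopes[pid] = slope
    return pd.Series(slopes, name="egfr_slope")

def compare_by_dkk3(df, cutoff=4000.0):
    slopes = egfr_slopes(df)
    baseline_dkk3 = df.sort_values("date").groupby("patient_id")["dkk3"].first()
    baseline_dkk3 = baseline_dkk3.reindex(slopes.index)
    high = slopes[baseline_dkk3 > cutoff]
    low = slopes[baseline_dkk3 <= cutoff]
    return stats.ks_2samp(high, low)  # two-sided by default
```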
Results: 1206 DKK3 measurements were performed in 1103 patients (621 male, age 70 years, eGFR 29.41 ml/min/1.73 m², UACR 800 mg/g). 134 patients died during follow-up. Mean DKK3 was 2905 pg/mg Cr (maximum 20000, 75th percentile 3800). 121 patients had DKK3 > 4000. At the end of follow-up, 7% of the patients with DKK3 < 4000 (initial eGFR 17.6) versus 39.6% of the patients with DKK3 > 4000 (initial eGFR 15.7) had undergone dialysis. Compared with eGFR and UACR at baseline, DKK3 > 4000 performed best in predicting eGFR loss over the next 12 months.
Conclusion: In this cohort of CKD patients, DKK3 > 4000 at baseline predicted the eGFR slope better than eGFR or UACR at baseline. DKK3 > 4000 reflected a higher risk of progression towards ESRD in patients with similar baseline eGFR levels.
Training deep neural networks with backpropagation is very memory- and compute-intensive. This makes it difficult to run on-device learning or to fine-tune neural networks on tiny embedded devices such as low-power microcontroller units (MCUs). Sparse backpropagation algorithms reduce the computational load of on-device learning by training only a subset of the weights and biases. Existing approaches train a static number of weights. A poor choice of this so-called backpropagation ratio either limits the computational gain or can lead to severe accuracy losses. In this paper we present TinyProp, the first sparse backpropagation method that dynamically adapts the backpropagation ratio during on-device training for each training step. TinyProp introduces a small computational overhead for sorting the elements of the gradient, which does not significantly affect the computational gains. TinyProp works particularly well for fine-tuning already trained networks on MCUs, a typical use case for embedded applications. On three typical datasets, MNIST, DCASE2020 and CIFAR10, TinyProp is 5 times faster than non-sparse training with an average accuracy loss of 1%. On average, TinyProp is 2.9 times faster than existing static sparse backpropagation algorithms, and the accuracy loss is reduced on average by 6% compared to a typical static setting of the backpropagation ratio.
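To make the per-step selection concrete, the toy NumPy sketch below back-propagates only the largest-magnitude error terms through one dense layer, with the kept fraction (the backpropagation ratio) re-chosen every step. The adaptation rule shown is a placeholder heuristic, not TinyProp's published formula.

```python
# Toy sketch of per-step sparse backpropagation for one dense layer
# y = x @ w + b: only the largest-magnitude output errors are propagated,
# and the kept fraction is re-chosen each step from the current loss.
import numpy as np

def adapt_ratio(loss, min_ratio=0.1, max_ratio=1.0):
    # hypothetical rule: propagate more when the loss is large
    return float(np.clip(loss, min_ratio, max_ratio))

def sparse_dense_backward(x, w, b, grad_out, ratio):
    """Backward pass that updates only the output units whose error
    magnitude falls in the top `ratio` fraction for this step."""
    k = max(1, int(ratio * grad_out.shape[1]))
    # sort the error terms once per step and keep the k largest per sample
    idx = np.argsort(-np.abs(grad_out), axis=1)[:, :k]
    mask = np.zeros_like(grad_out)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    sparse_grad = grad_out * mask
    grad_w = x.T @ sparse_grad      # only selected columns receive updates
    grad_b = sparse_grad.sum(axis=0)
    grad_x = sparse_grad @ w.T      # error signal passed to earlier layers
    return grad_w, grad_b, grad_x
```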
In recent years, embedded machine learning has become a very popular topic in AI research. With the help of compression techniques such as pruning, quantization and others, it has become possible to run neural networks on embedded devices. These techniques have opened up a whole new application area for machine learning, ranging from smart products such as voice assistants to smart sensors needed in robotics. Despite these achievements, efficient algorithms for training neural networks in constrained domains are still lacking. Training on embedded devices would open up further fields of application: efficient training algorithms would enable federated learning on embedded devices, in which the data remains where it was collected, or retraining of neural networks in different domains. In this paper, we summarize techniques that make training on embedded devices possible. We first describe the need for and the requirements of such algorithms. We then examine existing techniques that address training in resource-constrained environments, as well as techniques that are also suitable for training on embedded devices, such as incremental learning. Finally, we discuss the problems and open questions that still need to be solved in these areas.
To demonstrate how deep learning can be applied to industrial applications with limited training data, deep learning methodologies are used in three different applications. In this paper, we perform unsupervised deep learning using variational autoencoders and demonstrate that federated learning is a communication-efficient concept for machine learning that protects data privacy. As an example, variational autoencoders are used to cluster and visualize data from a microelectromechanical systems (MEMS) foundry, and federated learning is applied in a predictive maintenance scenario using the C-MAPSS dataset.
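For reference, a minimal variational autoencoder sketch in PyTorch is given below to illustrate the kind of model used for clustering and visualizing sensor data. The layer sizes and the 2-D latent space (convenient for plotting) are illustrative choices, not the architecture used in the paper.

```python
# Minimal variational autoencoder sketch (PyTorch). Layer sizes and the
# 2-D latent space are illustrative, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=64, latent_dim=2):
        super().__init__()
        self.enc = nn.Linear(in_dim, 32)
        self.mu = nn.Linear(32, latent_dim)
        self.logvar = nn.Linear(32, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                 nn.Linear(32, in_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="sum")           # reconstruction term
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    return rec + kld

# the 2-D means `mu` can be scattered directly to cluster/visualize samples
```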
A new algorithm for incremental learning in the context of Tiny Machine Learning (TinyML) is presented, optimized for low-performance and energy-efficient embedded devices. TinyML is an emerging field that deploys machine learning models on resource-constrained devices such as microcontrollers, enabling intelligent applications like voice recognition, anomaly detection, predictive maintenance, and sensor data processing in environments where traditional machine learning models are not feasible. The algorithm addresses the challenge of catastrophic forgetting through the use of knowledge distillation to create a small, distilled dataset. The novelty of the method is that the size of the model can be adjusted dynamically, so that the complexity of the model can be adapted to the requirements of the task. This offers a solution for incremental learning in resource-constrained environments, where both model size and computational efficiency are critical factors. The results show that the proposed algorithm is a promising approach for TinyML incremental learning on embedded devices. The algorithm was tested on five datasets: CIFAR10, MNIST, CORE50, HAR, and Speech Commands. The findings indicate that, despite using only 43% of the floating-point operations (FLOPs) of a larger fixed model, the algorithm suffers a negligible accuracy loss of just 1%. In addition, the presented method is memory-efficient: while state-of-the-art incremental learning is usually very memory-intensive, the method requires only 1% of the original dataset.
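The schematic sketch below shows the general rehearsal pattern such a method relies on: new-task batches are mixed with a tiny stored dataset that stands in for old tasks, so old classes keep being rehearsed. The buffer construction shown (simple per-class mean prototypes) is only a stand-in for the paper's knowledge-distillation procedure, and all names are illustrative.

```python
# Schematic rehearsal sketch for incremental learning with a tiny stored
# dataset. The per-class-mean buffer is a stand-in for the paper's
# distillation step; names and sizes are illustrative.
import torch
import torch.nn as nn

def build_buffer(xs, ys, per_class=1):
    """Condense old data into a few prototype samples per class."""
    buf_x, buf_y = [], []
    for c in ys.unique():
        cls = xs[ys == c]
        buf_x.append(cls.mean(dim=0, keepdim=True).repeat(per_class, 1))
        buf_y.append(torch.full((per_class,), int(c)))
    return torch.cat(buf_x), torch.cat(buf_y)

def incremental_step(model, opt, loss_fn, new_x, new_y, buf_x, buf_y):
    """Train on a new-task batch mixed with the distilled buffer so that
    old classes are rehearsed and catastrophic forgetting is reduced."""
    x = torch.cat([new_x, buf_x])
    y = torch.cat([new_y, buf_y])
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```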