Functional Error Correction for Robust Neural Networks

When neural networks (NNs) are implemented in hardware, their weights must be stored in memory devices. As noise accumulates in the stored weights, the NN's performance degrades. This paper studies how to use error-correcting codes (ECCs) to protect the weights. Unlike classic error correction in data storage, the objective is to optimize the NN's performance after error correction, rather than to minimize the uncorrectable bit error rate of the protected bits.
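
To make the distinction concrete, here is a minimal sketch (hypothetical code, not the paper's coding scheme) that quantizes the weights of a toy classifier to 8 bits, injects random bit flips, and reports classification accuracy; under functional error correction, an ECC is judged by how much of the clean accuracy it restores per redundant bit, not by the residual bit error rate it achieves.

    # Hypothetical illustration, not the paper's scheme: measure the quantity that
    # functional error correction cares about, i.e. accuracy after bit errors in
    # the stored weights, rather than the bit error rate itself.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "trained" linear classifier: label = argmax(W x).
    X = rng.normal(size=(1000, 20))
    W = rng.normal(size=(5, 20))
    y = (X @ W.T).argmax(axis=1)

    def accuracy(weights):
        return float(((X @ weights.T).argmax(axis=1) == y).mean())

    def quantize(w, bits=8):
        scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
        return np.round(w / scale).astype(np.int8), scale

    def flip_bits(q, p):
        """Flip each stored bit independently with probability p."""
        bits = np.unpackbits(q.view(np.uint8))
        flips = rng.random(bits.shape) < p
        return np.packbits(bits ^ flips).view(np.int8).reshape(q.shape)

    q, scale = quantize(W)
    q_noisy = flip_bits(q, p=0.005)                 # raw bit error rate of 0.5%
    print("accuracy, clean weights:", accuracy(q.astype(np.float32) * scale))
    print("accuracy, noisy weights:", accuracy(q_noisy.astype(np.float32) * scale))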

Extracting Robust and Accurate Features via a Robust Information Bottleneck

We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier that is more robust to small perturbations in the input space. Our method builds on the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features, parametrized by the inputs, to be small. We present two formulations in which the relevance of the features to the output labels is measured using either mutual information or the minimum mean-squared error (MMSE).
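
Schematically (our notation, intended as a plausible reading of the abstract rather than the paper's exact objective), the first formulation takes the form

    \max_{p(t \mid x)} \; I(Y;T) \;-\; \beta \, \Phi(T \mid X),
    \qquad
    \Phi(T \mid X) \;=\; \mathbb{E}_{X,T}\!\left[ \big\| \nabla_x \log p(T \mid x) \big\|^2 \right],

where I(Y;T) measures the relevance of the features T to the label Y (replaced by an MMSE-based term in the second formulation), and the Fisher-information penalty Φ(T|X) is small precisely when the feature distribution is insensitive to small perturbations of the input x.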

Physical Layer Communication via Deep Learning

Reliable digital communication is a primary workhorse of the modern information age. The disciplines of communication, coding, and information theory drive this innovation by designing efficient codes that allow transmissions to be robustly and efficiently decoded. Progress toward near-optimal codes has been driven by individual human ingenuity, and breakthroughs have been, befittingly, sporadic and spread over several decades. Deep learning, meanwhile, has become a part of daily life, and its successes have come largely in domains that lack a (mathematical) generative model.

Expression of Fractals Through Neural Network Functions

To help understand the underlying mechanisms of neural networks (NNs), several groups have studied the number of linear regions ℓ of the piecewise linear (PwL) functions generated by deep neural networks (DNNs). In particular, they showed that ℓ can grow exponentially with the number of network parameters p, a property often used to explain the advantages of deep over shallow NNs.
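
A standard example of this exponential growth (not necessarily the construction analyzed in the paper) is the sawtooth obtained by composing the tent map with itself:

    g(x) \;=\; 2\,\mathrm{ReLU}(x) \;-\; 4\,\mathrm{ReLU}\!\left(x - \tfrac{1}{2}\right)
    \;=\;
    \begin{cases}
      2x, & x \in [0, \tfrac12],\\[2pt]
      2(1-x), & x \in (\tfrac12, 1],
    \end{cases}

so each copy of g costs only two ReLU units, yet the n-fold composition g ∘ g ∘ … ∘ g is piecewise linear on [0,1] with 2^n pieces. A depth-n network therefore realizes ℓ = 2^n linear regions with only p = O(n) parameters, i.e., ℓ grows exponentially in p.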

MaxiMin Active Learning in Overparameterized Model Classes

Generating labeled training datasets has become a major bottleneck in Machine Learning (ML) pipelines. Active ML aims to address this issue by designing learning algorithms that automatically and adaptively select the most informative examples for labeling so that human time is not wasted labeling irrelevant, redundant, or trivial examples. This paper proposes a new approach to active ML with nonparametric or overparameterized models such as kernel methods and neural networks.
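
For orientation, the sketch below shows a generic pool-based active learning loop with simple margin-based uncertainty sampling; it is a hypothetical illustration of adaptive example selection, not the paper's MaxiMin criterion for overparameterized models.

    # Generic pool-based active learning with margin-based uncertainty sampling.
    # Hypothetical illustration only; not the MaxiMin selection rule proposed in
    # the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # synthetic labels

    # Seed set with both classes represented; everything else goes in the pool.
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    labeled = list(rng.choice(pos, 5, replace=False)) + list(rng.choice(neg, 5, replace=False))
    pool = [i for i in range(len(X)) if i not in labeled]

    for _ in range(20):                                  # budget of 20 label queries
        clf = LogisticRegression().fit(X[labeled], y[labeled])
        margin = np.abs(clf.predict_proba(X[pool])[:, 1] - 0.5)
        query = pool.pop(int(np.argmin(margin)))         # most uncertain example
        labeled.append(query)                            # oracle provides y[query]

    print("accuracy after active queries:", clf.score(X, y))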

The Information Bottleneck Problem and its Applications in Machine Learning

Inference capabilities of machine learning (ML) systems have skyrocketed in recent years, and they now play a pivotal role in various aspects of society. The goal in statistical learning is to use data to obtain simple algorithms for predicting a random variable Y from a correlated observation X. Since the dimension of X is typically huge, computationally feasible solutions should summarize it into a lower-dimensional feature vector T, from which Y is predicted.
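
For reference, the classical information bottleneck (which the paper surveys) formalizes this trade-off as

    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),

where the minimization is over (possibly stochastic) feature maps p(t|x) subject to the Markov chain Y - X - T, and β > 0 controls how much information about Y must be preserved while X is compressed into T.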

Understanding GANs in the LQG Setting: Formulation, Generalization and Stability

Generative Adversarial Networks (GANs) have become a popular method to learn a probability model from data. In this paper, we provide an understanding of basic issues surrounding GANs including their formulation, generalization and stability on a simple LQG benchmark where the generator is Linear, the discriminator is Quadratic and the data has a high-dimensional Gaussian distribution.
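
For reference, a GAN solves a minimax problem of the general form (the classical formulation; the paper's benchmark restricts the function classes rather than this particular loss)

    \min_{G \in \mathcal{G}} \; \max_{D \in \mathcal{D}} \;
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
    \;+\;
    \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right],

and in the LQG benchmark the generator class \mathcal{G} contains linear maps G(z) = Gz, the discriminator class \mathcal{D} contains quadratic functions of x, and p_data is a high-dimensional Gaussian.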

Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning

We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers, that is, slow or non-responsive. In this work we propose an approximate gradient coding scheme called Stochastic Gradient Coding (SGC), which works when the stragglers are random.
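
The following toy sketch conveys the flavor of the idea under stated assumptions (a replication factor r chosen here for illustration, least-squares gradients, and stragglers modeled as independent coin flips); it is a generic redundant-aggregation illustration, not the paper's exact data-assignment scheme or analysis.

    # Toy redundant gradient aggregation under random stragglers. Generic
    # illustration only; SGC's specific assignment and scaling differ.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, num_workers, r = 200, 10, 20, 3        # r = replication factor (assumed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true

    # Each sample is stored on r distinct, randomly chosen workers.
    assign = [rng.choice(num_workers, size=r, replace=False) for _ in range(n)]

    def sgc_like_gradient(w, p_straggle=0.3):
        """Sum the partial gradients returned by the responsive workers only."""
        alive = rng.random(num_workers) > p_straggle
        g = np.zeros(d)
        for i in range(n):
            copies_returned = sum(alive[k] for k in assign[i])
            # each surviving copy contributes 1/r of that sample's gradient
            g += (copies_returned / r) * X[i] * (X[i] @ w - y[i])
        return g / n

    w = np.zeros(d)
    for _ in range(300):
        w -= 0.1 * sgc_like_gradient(w)           # stragglers differ every round
    print("distance to w_true:", np.linalg.norm(w - w_true))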

Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks

Many modern neural network architectures are trained in an overparameterized regime where the number of parameters exceeds the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the capacity to fit any set of labels, including random noise. However, given the highly nonconvex nature of the training landscape, it is not clear what level and kind of overparameterization is required for first-order methods to converge to global optima that perfectly interpolate the labels.
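
A small numerical sketch of the interpolation phenomenon (our own illustrative setup, not the paper's exact model): a wide two-layer ReLU network, with output weights fixed and only the hidden layer trained by full-batch gradient descent, drives the training loss on purely random labels toward zero.

    # Illustrative toy experiment, assumed setup: two-layer ReLU net, fixed output
    # weights, hidden layer trained by gradient descent on random labels.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 20, 20, 1000                    # k*d parameters >> n samples
    X = rng.normal(size=(n, d)) / np.sqrt(d)  # inputs with roughly unit norm
    y = rng.choice([-1.0, 1.0], size=n)       # random labels: pure noise

    W = rng.normal(size=(k, d))               # trainable hidden-layer weights
    a = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)   # fixed output weights

    def predict(W):
        return np.maximum(X @ W.T, 0.0) @ a   # f(x_i) = sum_j a_j ReLU(w_j . x_i)

    lr = 0.5
    for _ in range(2000):
        resid = predict(W) - y                            # (n,)
        S = (X @ W.T > 0).astype(float)                   # ReLU derivatives, (n, k)
        grad = a[:, None] * ((S * resid[:, None]).T @ X)  # d(0.5*||resid||^2)/dW
        W -= lr * grad

    print("training MSE on random labels:", np.mean((predict(W) - y) ** 2))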