ICLR 2023: Unveiling the Latest Trends and Innovations in Machine Learning

Date: 30/04/2023

The International Conference on Learning Representations (ICLR) is one of the most prestigious and influential conferences in the field of machine learning. ICLR 2023, to be held in Kigali, Rwanda, from May 1 to May 5, promises to be an exciting event that will bring together leading researchers, practitioners, and industry experts to discuss the latest trends and innovations in machine learning.

The conference will feature a diverse range of topics, from reinforcement learning and generative models to graph representation learning and federated learning. Through keynote speeches, paper presentations, and workshops, attendees will have the opportunity to explore cutting-edge research, share insights and best practices, and network with peers.

ICLR 2023 is not only a platform for showcasing the latest research findings but also a forum for discussing the underlying trends and common aspects of the field. As machine learning continues to evolve and expand, the conference will provide a unique opportunity to gain a comprehensive understanding of the state-of-the-art in the field.

VOSviewer visualization of the conference

Reinforcement Learning Advances

Recent advances in reinforcement learning show a trend towards combining offline and online datasets, proposing new update rules and algorithms, and introducing new environments and settings for training.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Hybrid RL: Using both offline and online data can make RL efficient. The Hybrid Q-learning algorithm combines an offline dataset with online interaction and outperforms state-of-the-art RL baselines on challenging benchmarks (a minimal sketch of the offline-plus-online batching idea follows this list).
[2] Extreme Q-Learning: MaxEnt RL without Entropy. A new update rule for online and offline RL that models the maximal Q-value using Extreme Value Theory.
[3] Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning. The DiL-piKL planning algorithm achieves optimal performance in the complex strategy game No-press Diplomacy.
[4] On the Robustness of Safe Reinforcement Learning under Observational Perturbations. Two new approaches for designing effective observational adversarial attackers in safe reinforcement learning, evaluated through comprehensive experiments.
[5] Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning. HECOGrid, a suite of multi-agent RL environments, together with a Centralized Training Decentralized Execution approach that outperforms baselines.
[6] VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training. A self-supervised pre-trained visual representation that uses human videos to generate dense and smooth reward functions for unseen robotic tasks.
[7] Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics. A novel algorithm analyzes deep reinforcement learning by designing exploration incentives via learnable representations of neural dynamics.
[8] In-context Reinforcement Learning with Algorithm Distillation. Algorithm Distillation distills reinforcement learning algorithms into neural networks by modeling their training histories.
[9] Dichotomy of Control: Separating What You Can Control from What You Cannot. The dichotomy of control (DoC) is a future-conditioned supervised learning framework that separates mechanisms within a policy's control from those outside it, improving RL in highly stochastic environments.
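
To make the hybrid offline-online idea in [1] concrete, here is a minimal sketch of how a Q-learning-style learner might draw minibatches from both an offline dataset and an online replay buffer. It is only an illustration of the general recipe; the function name, the fixed mixing ratio, and the transition format are assumptions, not the authors' algorithm.

    import random

    def sample_mixed_batch(offline_data, online_buffer, batch_size=256, offline_frac=0.5):
        """Draw a minibatch that mixes offline transitions with freshly collected ones.

        Illustrative only: the real Hybrid Q-learning update and its mixing
        strategy are specified in the paper, not here.
        """
        n_offline = int(batch_size * offline_frac)
        n_online = batch_size - n_offline
        batch = random.sample(offline_data, min(n_offline, len(offline_data)))
        batch += random.sample(online_buffer, min(n_online, len(online_buffer)))
        random.shuffle(batch)
        return batch

    # Each transition is assumed to be a (state, action, reward, next_state, done)
    # tuple; the mixed batch would feed a standard temporal-difference update.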

Improving Machine Learning Models

Machine learning models' accuracy and robustness are improved by new methods that adjust the confidence threshold, mitigate model bias, and achieve fairness generalization.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. TabPFN is a Prior-Data Fitted Network trained offline to approximate Bayesian inference on small tabular datasets, achieving competitive classification results.
[2] Conservative Prediction via Transductive Confidence Minimization. Transductive Confidence Minimization (TCM) is a new method for improving model accuracy in safety-critical settings.
[3] Delving into Semantic Scale Imbalance. The semantic scale of classes is defined and quantified to measure and mitigate model bias, as distinct from sample bias, on sample-balanced datasets.
[4] FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning. FreeMatch adjusts the confidence threshold in a self-adaptive manner, yielding superior semi-supervised learning performance.
[5] FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data. FIFA is a theoretically principled and flexible approach that achieves both classification and fairness generalization.
[6] Mutual Partial Label Learning with Competitive Label Noise. The Mutual Learning based PLL approach (ML-PLL) delivers state-of-the-art performance for partial label learning with competitive label noise.
[7] Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play. Introspective Self-play (ISP) improves the uncertainty estimation of deep neural networks and the sampling of underrepresented subgroups for active learning.
[8] Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations. Last-layer retraining improves classifier performance on spurious-correlation benchmarks while reducing computational cost and improving robustness (a sketch of the procedure follows this list).
[9] Free Lunch for Domain Adversarial Training: Environment Label Smoothing. Environment Label Smoothing (ELS) improves training stability and robustness to noisy environment labels in Domain Adversarial Training (DAT).
[10] On Pitfalls of Test-Time Adaptation. The new Test-Time Adaptation Benchmark, TTAB, identifies three common pitfalls in prior efforts to tackle distribution shifts.
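
The last-layer retraining idea in [8] is easy to picture in code: freeze the feature extractor and retrain only the final linear head on a suitably chosen dataset. The sketch below assumes a PyTorch classifier whose head parameters share a name prefix (here "fc"); the paper's exact procedure, including which data the head is retrained on, differs.

    import torch
    from torch import nn

    def retrain_last_layer(model, loader, head_prefix="fc", epochs=10, lr=1e-3):
        """Freeze everything except the final linear head, then retrain the head.

        Sketch of last-layer retraining; `head_prefix` is an assumed naming
        convention for the classifier head, not something from the paper.
        """
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith(head_prefix)
        head_params = [p for n, p in model.named_parameters() if n.startswith(head_prefix)]
        optimizer = torch.optim.SGD(head_params, lr=lr)
        criterion = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            for inputs, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(inputs), labels)
                loss.backward()
                optimizer.step()
        return model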

Novel Machine Learning Algorithms

Recent research in machine learning has focused on developing novel algorithms and models to improve optimization, representation learning, and meta-learning.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Accurate Bayesian Meta-Learning by Accurate Task Posterior Inference. GMM-NP, a novel Bayesian meta-learning model, combines variational inference with natural gradients and trust regions to improve epistemic uncertainty estimation and accuracy.
[2] Learning to Induce Causal Structure. A neural network architecture learns to infer the underlying graph structure in causal induction via supervised training on synthetic graphs.
[3] Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time. A new scheme based on Chebyshev polynomials accelerates Hamiltonian Monte Carlo sampling from distributions.
[4] Gromov-Wasserstein Autoencoders. Gromov-Wasserstein Autoencoders (GWAE) form a representation learning method that matches latent and data distributions directly and introduces meta-priors.
[5] Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning. MORBiT solves the generic min-max multi-objective bilevel optimization problem and converges to a first-order stationary point.
[6] Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization. Adaptive gradient methods such as Adam can converge to worse test errors than gradient descent because of non-convex optimization landscapes.
[7] A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta. A new analytic framework analyzes the noise-averaged properties of mini-batch SGD for linear models, revealing optimal convergence rates.
[8] Git Re-Basin: Merging Models modulo Permutation Symmetries. The success of deep learning rests on simple algorithms that are surprisingly effective at fitting large neural networks; the paper studies merging independently trained models modulo the permutation symmetries of their hidden units (a toy weight-matching sketch follows this list).
[9] Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics. A new method based on the Lagrangian Schrödinger bridge problem and a regularized neural SDE models population dynamics.
[10] Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive? Parallel neural networks can adaptively estimate functions with heterogeneous smoothness by tuning only the weight decay, achieving minimax rates.
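
As a toy illustration of the "merging models modulo permutation symmetries" idea behind [8], the sketch below aligns the hidden units of two one-hidden-layer MLPs with a linear assignment and then averages the aligned weights. The dot-product similarity, the single hidden layer, and the plain averaging are simplifying assumptions; the paper handles deep networks and several matching strategies.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def permute_and_average(W1_a, b1_a, W2_a, W1_b, b1_b, W2_b, alpha=0.5):
        """Align model B's hidden units to model A, then interpolate the weights.

        Shapes: W1 is (hidden, in), b1 is (hidden,), W2 is (out, hidden). Toy sketch only.
        """
        similarity = W1_a @ W1_b.T                    # how well unit i of A matches unit j of B
        _, perm = linear_sum_assignment(-similarity)  # maximize total similarity
        W1_b, b1_b, W2_b = W1_b[perm], b1_b[perm], W2_b[:, perm]
        return (alpha * W1_a + (1 - alpha) * W1_b,
                alpha * b1_a + (1 - alpha) * b1_b,
                alpha * W2_a + (1 - alpha) * W2_b)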

Self-Supervised Learning Benefits

Self-supervised learning is a powerful paradigm that leads to improved performance and efficiency in various computer vision tasks, including object detection, representation learning, and pre-training.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. DINO is a strong end-to-end object detector that improves the performance and efficiency of previous DETR-like models.
[2] The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning. Contrastive learning with linear probing is a prevalent pre-training paradigm, but there is a trade-off between label efficiency and universality (the probing protocol is sketched after this list).
[3] Simplicial Embeddings in Self-Supervised Learning and Downstream Classification. Simplicial embeddings learned through self-supervised learning lead to better generalization and improved performance on natural image datasets.
[4] More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity. Sparse Large Kernel Network (SLaK), a pure CNN architecture with sparse factorized 51x51 kernels, performs on par with Transformers and ConvNets.
[5] Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations. Recent advances in self-supervised representation learning enable unsupervised object discovery and semantic segmentation with state-of-the-art performance.
[6] Corrupted Image Modeling for Self-Supervised Visual Pre-Training. Corrupted Image Modeling (CIM) achieves compelling results on vision benchmarks for self-supervised pre-training with both ViT and CNN backbones.
[7] Masked Unsupervised Self-training for Label-free Image Classification. Masked Unsupervised Self-Training (MUST) improves pre-trained zero-shot classifiers through unsupervised fine-tuning on abundant unlabeled data.
[8] SimPer: Simple Self-Supervised Learning of Periodic Targets. SimPer is a novel SSL regime that uses customized augmentations and a generalized contrastive loss to learn efficient and robust periodic representations.
[9] PaLI: A Jointly-Scaled Multilingual Language-Image Model. PaLI generates text from visual and textual inputs and excels at many vision, language, and multimodal tasks across many languages.
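
The linear-probing protocol behind the trade-off studied in [2] is simple enough to show directly: freeze the pretrained encoder, extract features, and fit a linear classifier on top. The sketch below uses scikit-learn's logistic regression; the feature extraction itself is assumed to happen elsewhere.

    from sklearn.linear_model import LogisticRegression

    def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels):
        """Fit a linear classifier on frozen features and report test accuracy."""
        probe = LogisticRegression(max_iter=1000)
        probe.fit(train_feats, train_labels)
        return probe.score(test_feats, test_labels)

    # train_feats / test_feats are embeddings from a frozen self-supervised
    # encoder; only the linear head above ever sees the labels.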

Enhancing Language Models

Language models and NLP tasks are being improved through new pre-training frameworks and techniques, such as multimodal prompt engineering and pseudo-label training.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Large Language Models are Human-Level Prompt Engineers. Automatic Prompt Engineer (APE) generates and selects instructions for large language models, outperforming prior LLM baselines on 24 NLP tasks (a sketch of the selection loop follows this list).
[2] Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. Multimodal prompt engineering uses language as an intermediate representation to combine knowledge from different pretrained models across a variety of tasks.
[3] Pseudo-label Training and Model Inertia in Neural Machine Translation. Pseudo-label training (PLT) increases the stability, or inertia, of neural machine translation models under input perturbations and model updates.
[4] UL2: Unifying Language Learning Paradigms. A unified pre-training framework for NLP achieves state-of-the-art performance on 50 supervised NLP tasks.
[5] MPCFormer: Fast, Performant and Private Transformer Inference with MPC. MPCFormer combines Secure Multi-Party Computation with knowledge distillation to speed up Transformer inference in MPC settings while retaining comparable ML performance.
[6] LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval. LexMAE is a pre-training framework that bridges the gap between language modeling and lexicon-weighting retrieval.
[7] Language Modelling with Pixels. PIXEL, a pretrained language model that renders text as images, outperforms BERT on syntactic and semantic processing tasks in diverse languages.
[8] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models. DiffuSeq, a diffusion model designed for sequence-to-sequence text generation, performs comparably to or better than six established baselines.
[9] A Kernel-Based View of Language Model Fine-Tuning. The authors investigate whether the Neural Tangent Kernel describes fine-tuning of pre-trained language models and propose an explanation for subspace-based fine-tuning methods.
[10] Planning with Large Language Models for Code Generation. Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm to guide the Transformer toward generating better programs.
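
A rough sketch of the instruction-selection loop behind APE [1]: generate candidate instructions, score each on a small labeled development set, and keep the best. The `llm` callable below is a placeholder (prompt in, completion out), and the exact-match scoring is an assumption; the paper also uses the LLM to propose the candidates and applies more careful scoring.

    def select_instruction(llm, candidate_instructions, dev_examples):
        """Return the candidate instruction that scores best on a dev set.

        `llm` is a placeholder callable: prompt string -> completion string.
        """
        def score(instruction):
            correct = 0
            for question, answer in dev_examples:
                completion = llm(f"{instruction}\n\nInput: {question}\nOutput:")
                correct += int(answer.strip().lower() in completion.lower())
            return correct / len(dev_examples)

        return max(candidate_instructions, key=score)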

Advancements in Generative Models

Recent advancements in generative models are enabling novel approaches for image and audio synthesis, such as using diffusion models or transformers, and improving performance on various tasks.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Neural Groundplans: Persistent Neural Scene Representations from a Single Image. A new method maps 2D images to a 3D scene representation using conditional neural groundplans, enabling novel view synthesis and disentangled representations.
[2] Score-based Continuous-time Discrete Diffusion Models. A new stochastic jump process extends score-based modeling through stochastic differential equations to discrete variables.
[3] MeshDiffusion: Score-based Generative 3D Mesh Modeling. Deformable tetrahedral grids combined with a diffusion model generate realistic 3D meshes.
[4] Generative Modelling with Inverse Heat Dissipation. A diffusion-like model generates images by reversing the heat equation and shows emergent qualitative properties not seen in standard diffusion models.
[5] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model. The Denoising Diffusion Null-Space Model (DDNM) is a zero-shot framework for arbitrary linear image restoration problems that outperforms other state-of-the-art zero-shot IR methods (its null-space consistency step is sketched after this list).
[6] MaskViT: Masked Visual Pre-Training for Video Prediction. MaskViT pre-trains transformers via masked visual modeling for video prediction, enabling embodied agents to plan complex tasks.
[7] DiffEdit: Diffusion-based semantic image editing with mask guidance. DiffEdit uses text-conditioned diffusion models for semantic image editing and automatically generates the masks that guide the edits.
[8] Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images. The rarity score measures both image-wise uncommonness and model-wise diversity of generated images.
[9] Human Motion Diffusion Model. The Motion Diffusion Model (MDM) is a transformer-based generative model for human motion data that achieves state-of-the-art results.
[10] BigVGAN: A Universal Neural Vocoder with Large-Scale Training. BigVGAN is a universal vocoder that synthesizes high-fidelity audio for numerous speakers across varied recording environments without fine-tuning.
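
The consistency step at the heart of DDNM [5] can be written in a few lines: for a linear degradation A and observation y, keep the part of the current estimate that A cannot see (its null space) and replace the visible part with the pseudo-inverse solution. The sketch below shows that projection on flattened vectors with NumPy; in the full method it is applied at every step of a diffusion sampling loop.

    import numpy as np

    def ddnm_consistency(x_estimate, A, y):
        """Range/null-space decomposition: x_hat = A^+ y + (I - A^+ A) x_estimate."""
        A_pinv = np.linalg.pinv(A)
        range_part = A_pinv @ y                              # agrees with the observation y
        null_part = x_estimate - A_pinv @ (A @ x_estimate)   # component invisible to A
        return range_part + null_part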

Machine Learning for Physical Systems

Cutting-edge machine learning techniques are being applied to physical systems to improve predictions and simulations, from molecular structure optimization to nonlinear dynamics discovery.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Protein Representation Learning by Geometric Structure Pretraining. A new protein representation learning method that pretrains on 3D structures outperforms sequence-based methods on function prediction tasks.
[2] Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search. The Symbolic Physics Learner (SPL) discovers the mathematical structure of nonlinear dynamics from limited data.
[3] DiGress: Discrete Denoising diffusion for graph generation. DiGress is a discrete denoising diffusion model that generates graphs with categorical attributes, achieving state-of-the-art performance on molecular and non-molecular datasets.
[4] The Lie Derivative for Measuring Learned Equivariance. The success of vision transformers challenges the idea that equivariance must be encoded directly in the architecture; training data plays a significant role.
[5] The END: An Equivariant Neural Decoder for Quantum Error Correction. A data-efficient neural decoder exploits symmetries to achieve state-of-the-art accuracy in quantum error correction.
[6] DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. DiffDock, a diffusion generative model, outperforms traditional and deep-learning methods in molecular docking with a 38% success rate.
[7] Learning Controllable Adaptive Simulation for Multi-resolution Physics. LAMP is a deep-learning-based surrogate model that optimizes spatial resolution to simulate multi-resolution physical systems more efficiently.
[8] Clifford Neural Layers for PDE Modeling. Multivector fields and Clifford algebras improve neural PDE surrogates, yielding better generalization in physical simulations.
[9] Physics-empowered Molecular Representation Learning. A physics-driven energy prediction model based on a Transformer is trained with physical insight and self-supervision for molecular structure optimization.
[10] Competitive Physics Informed Networks. Competitive PINNs (CPINNs) take an adversarial approach to improving the accuracy of physics-informed neural networks (PINNs) in solving partial differential equations (the underlying PINN residual is sketched after this list).
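
Physics-informed neural networks, which CPINNs [10] build on, fit a network to a PDE by penalizing the equation residual at collocation points. The sketch below computes that residual for the 1D heat equation u_t = u_xx with PyTorch autograd; the choice of PDE and the network interface (a module mapping (x, t) pairs to u) are assumptions for illustration, and the adversarial CPINN game itself is not shown.

    import torch

    def heat_equation_residual(u_net, x, t):
        """Residual of u_t = u_xx at collocation points x, t (1-D tensors)."""
        x = x.clone().requires_grad_(True)
        t = t.clone().requires_grad_(True)
        u = u_net(torch.stack([x, t], dim=-1)).squeeze(-1)
        u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
        u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
        return u_t - u_xx

    # A plain PINN minimizes residual.pow(2).mean() together with boundary and
    # initial-condition terms; the CPINN paper replaces this plain minimization
    # with an adversarial game.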

Robustness and Efficiency in Deep Learning

Recent techniques in deep learning are focused on improving the robustness of models against adversarial attacks and reducing their model size while maintaining accuracy.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Transfer NAS with Meta-learned Bayesian Surrogates. A new surrogate for neural architecture search (NAS) combines Bayesian optimization with meta-learning and achieves state-of-the-art results on six computer vision datasets.
[2] Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks. A novel adversarial training method improves the robustness of monocular depth estimation models against physical-world attacks without requiring ground truth.
[3] Teacher Guided Training: An Efficient Framework for Knowledge Transfer. The teacher-guided training (TGT) framework improves the accuracy of compact models by leveraging knowledge from pretrained generative models (a generic distillation loss is sketched after this list for background).
[4] The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. Crossmodal knowledge distillation transfers knowledge across modalities, but its effectiveness depends on the modality relationships, as captured by the modality focusing hypothesis.
[5] Revisiting adapters with adversarial training. Adversarial training can act as a regularizer that improves classification accuracy by co-training a neural network on clean and adversarial inputs.
[6] CorruptEncoder: Data Poisoning Based Backdoor Attacks to Contrastive Learning. CorruptEncoder is a highly effective data-poisoning backdoor attack on contrastive learning pre-training for both single-modal and multi-modal CL.
[7] LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification. LilNetX is an end-to-end trainable technique that achieves up to 50% smaller model sizes with improved inference time.
[8] Logit Margin Matters: Improving Transferable Targeted Adversarial Attack by Logit Calibration. Two logit calibration methods enlarge the margins between targeted and untargeted classes, improving transferability in black-box targeted attacks.
[9] Towards Robustness Certification Against Universal Perturbations. A new method certifies neural networks' robustness against universal perturbations using linear-relaxation-based perturbation analysis.
[10] Robust Neural Architecture Search by Cross-Layer Knowledge Distillation. The RNAS-CL algorithm improves the robustness of NAS against adversarial attacks.
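
For background on the knowledge-transfer theme running through [3] and [4], the classic knowledge-distillation objective is shown below: a KL term toward softened teacher probabilities plus ordinary cross-entropy on hard labels. The cited papers study when and how such transfer works (and TGT's generative-teacher setup differs), so treat this only as a generic reference point, not their method.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Classic KD loss: KL to softened teacher probabilities + hard-label CE."""
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard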

Improving Graph Representation Learning

Graph representation learning is an active field, with recent works focusing on improving performance, scalability, and interpretability.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization. PARETOGNN, a multi-task self-supervised learning framework for node representation learning over graphs, achieves the best overall performance across tasks.
[2] MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning. MetaGL is a meta-learning approach that uses the prior performance of existing methods to automatically select effective models for graph learning.
[3] GraphEx: A User-Centric Model-Level Explainer for Graph Neural Networks. GraphEx is a model-level explainer that generates diverse explanation graphs without requiring another black-box deep model, improving accessibility.
[4] Sign and Basis Invariant Networks for Spectral Graph Representation Learning. SignNet and BasisNet are neural architectures that are invariant to two key symmetries of eigenvectors and outperform existing baselines.
[5] NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs. The Neighborhood Aggregation Graph Transformer (NAGphormer) treats each node as a sequence of tokens, letting it scale to large graphs and outperform existing graph Transformers and GNNs (the hop-based tokenization is sketched after this list).
[6] Graph Neural Networks for Link Prediction with Subgraph Sketching. ELPH and BUDDY are full-graph GNNs that efficiently approximate the key components of subgraph GNNs for link prediction.
[7] Revisiting Graph Adversarial Attack and Defense From a Data Distribution Perspective. Adversarial edges are not uniformly distributed in graph neural networks, which explains the effectiveness of gradient-based attack methods.
[8] Gradient Gating for Deep Multi-Rate Learning on Graphs. Gradient Gating improves graph neural networks by gating the output of GNN layers, achieving state-of-the-art performance.
[9] A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming. A predict-and-search framework combining machine learning with optimization efficiently solves mixed-integer linear programming instances.
[10] Random Laplacian Features for Learning with Hyperbolic Space. A new approach for hyperbolic networks combines a hyperbolic embedding, a mapping to Euclidean space, and a standard Euclidean network.
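
The tokenization idea in NAGphormer [5] is easy to sketch: aggregate each node's neighborhood over k hops and treat the k aggregates as that node's token sequence for a Transformer. The NumPy version below uses simple row-normalized propagation; the paper's Hop2Token module and its normalization details differ.

    import numpy as np

    def hop_tokens(adj, feats, num_hops=3):
        """Per-node token sequence: [own features, 1-hop aggregate, ..., k-hop aggregate].

        adj: (n, n) adjacency matrix, feats: (n, d) node features.
        Returns an array of shape (n, num_hops + 1, d).
        """
        degrees = adj.sum(axis=1, keepdims=True).clip(min=1)
        propagate = adj / degrees            # row-normalized one-hop averaging
        tokens = [feats]
        for _ in range(num_hops):
            tokens.append(propagate @ tokens[-1])
        return np.stack(tokens, axis=1)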

Collaborative and Privacy-Preserving Learning

Federated learning is a growing field that explores novel approaches for collaborative and privacy-preserving machine learning.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication. SWIFT is a wait-free decentralized federated learning algorithm that lets clients train at their own speed and achieves faster convergence.
[2] FRESCO: Federated Reinforcement Energy System for Cooperative Optimization. FRESCO is a reinforcement learning framework that implements energy markets using a hierarchical control architecture for a cleaner energy grid.
[3] Federated Semi-supervised Learning with Dual Regulator. FedDure is a framework for federated semi-supervised learning that optimizes and customizes model training for specific data distributions.
[4] Improved Convergence of Differential Private SGD with Gradient Clipping. DP-SGD-GC trains machine learning models with a privacy guarantee and achieves a vanishing utility bound without a bias term.
[5] Bias Propagation in Federated Learning. Participating in federated learning can propagate bias against under-represented groups, highlighting the need to audit federated systems and design fair algorithms.
[6] A Statistical Framework for Personalized Federated Learning and Estimation: Theory, Algorithms, and Privacy. A statistical framework unifies various personalization methods for federated learning, including novel private personalized estimation and learning algorithms.
[7] DepthFL: Depthwise Federated Learning for Heterogeneous Clients. DepthFL scales the depth of local models, allocates them according to client resources, and uses mutual self-distillation to improve accuracy.
[8] Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation. A novel factorization for many-to-many generative models improves synthetic data generation while preserving information within and across tables.
[9] Differentially Private Federated Few-shot Image Classification. Few-shot transfer learning is a parameter-efficient approach to federated learning that achieves state-of-the-art performance on the FLAIR benchmark.
[10] Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries. Adversarial tools improve membership inference algorithms, which trace training data back to its rightful owners in machine learning models.
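
For readers new to the area, the baseline most federated learning work starts from is weighted parameter averaging across clients (FedAvg). The sketch below shows that baseline only; it is not the algorithm of SWIFT, DepthFL, or the other papers above, each of which departs from plain averaging in its own way.

    def federated_average(client_states, client_sizes):
        """Weighted average of client parameter dicts, weighted by local dataset size."""
        total = float(sum(client_sizes))
        averaged = {}
        for key in client_states[0]:
            averaged[key] = sum(
                state[key] * (size / total)
                for state, size in zip(client_states, client_sizes)
            )
        return averaged

    # client_states: list of {parameter_name: array or tensor} from each client;
    # the server would broadcast `averaged` back for the next round.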

Latest Trends in Neural Network Architectures

Continual learning algorithms that are robust to arbitrary schedules and efficient Bayesian network-based methods are among the latest trends in neural network architectures.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Is Forgetting Less a Good Inductive Bias for Forward Transfer? Continual learning algorithms that retain past information yield better forward transfer and learning efficiency on new tasks.
[2] Schedule-Robust Online Continual Learning. A new continual learning algorithm is robust against arbitrary schedules and outperforms existing methods on image classification benchmarks.
[3] Efficient approximation of neural population structure and correlations with probabilistic circuits. A computationally efficient Bayesian network-based method accurately models large neural populations with high-order correlations.
[4] Actionable Neural Representations: Grid Cells from Minimal Constraints. To afford flexible behavior, the brain must represent the consistent meaning of actions across space; actionable representations emerge as the optimal representation of 2D space.
[5] Incremental Learning of Structured Memory via Closed-Loop Transcription. A minimal computational model enables incremental learning of structured memories of multiple object classes, achieving better performance with fewer resources.
[6] REPAIR: REnormalizing Permuted Activations for Interpolation Repair. Empirical investigation shows that rescaling pre-activations and finding permutations can reduce the linear interpolation barrier for deep networks (a helper for measuring this barrier follows the list).
[7] Neural Decoding of Visual Imagery via Hierarchical Variational Autoencoders. A novel architecture based on Hierarchical Variational Autoencoders reconstructs natural images from fMRI recordings better than the state of the art.
[8] Spikformer: When Spiking Neural Network Meets Transformer. The Spiking Transformer (Spikformer) combines spiking neural networks with self-attention to achieve state-of-the-art image classification.
[9] Sparse Distributed Memory is a Continual Learner. A modified multi-layer perceptron, obtained by connecting Sparse Distributed Memory (SDM) with the Transformer, is a strong continual learner.
[10] Scaling Forward Gradient With Local Losses. LocalMixer, an architecture for forward-gradient learning, uses local greedy loss functions to reduce variance and outperforms backprop-free algorithms.
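
REPAIR [6] is about shrinking the loss barrier seen when linearly interpolating between two trained networks. The helper below only measures that barrier curve for a pair of PyTorch models; the renormalization and permutation steps that REPAIR contributes are not shown, and `eval_fn` is whatever metric you care about.

    import copy

    def interpolation_curve(model_a, model_b, eval_fn, steps=11):
        """Evaluate eval_fn(model) along the straight line between two weight sets."""
        sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
        probe = copy.deepcopy(model_a)
        curve = []
        for i in range(steps):
            t = i / (steps - 1)
            mixed = {
                k: (1 - t) * v + t * sd_b[k] if v.is_floating_point() else v
                for k, v in sd_a.items()
            }
            probe.load_state_dict(mixed)
            curve.append(eval_fn(probe))  # e.g. test loss; a bump in the middle is the barrier
        return curve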

State-of-the-Art Time Series Modeling

State-of-the-art time series modeling techniques use novel architectures and attention mechanisms to improve forecasting accuracy and sequence modeling.
The 10 red nodes were picked as representatives of the cluster based on the diversity and the impact of each paper


[1] Multivariate Time-series Imputation with Disentangled Temporal Representations. TIDER, a scalable matrix-factorization-based method, provides disentangled temporal representations for multivariate time-series imputation with interpretability.
[2] Effectively Modeling Time Series with Simple Discrete State Spaces. SpaceTime, a state-space time-series architecture, improves expressivity, long-horizon forecasting, and training efficiency, achieving state-of-the-art results.
[3] GT-CausIn: a novel causal-based insight for traffic prediction. GT-CausIn integrates graph diffusion layers, TCN layers, and causal knowledge to outperform state-of-the-art traffic forecasting models.
[4] Mega: Moving Average Equipped Gated Attention. Mega, a single-head gated attention mechanism equipped with a moving average, improves sequence modeling benchmarks and outperforms Transformer variants.
[5] On the Role of Attention in Prompt-tuning. Prompt-tuning for one-layer attention architectures is studied, with provable effectiveness in attending to context-relevant information.
[6] Neural Networks and the Chomsky Hierarchy. An empirical study shows that grouping tasks according to the Chomsky hierarchy can predict neural network generalization, including negative results.
[7] Liquid Structural State-Space Models. Liquid-S4, a linear liquid time-constant state-space model, improves sequence modeling tasks with long-term dependencies, achieving state-of-the-art results.
[8] Simplified State Space Layers for Sequence Modeling. The S5 state-space layer achieves state-of-the-art performance on long-range sequence modeling tasks.
[9] A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. PatchTST, a model for multivariate time-series forecasting and self-supervised representation learning, significantly improves long-term forecasting accuracy (its patching step is sketched after this list).
[10] ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length. ChordMixer, a neural network building block, models attention for long sequences of variable length and outperforms other neural attention models.
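
The core preprocessing step behind PatchTST [9] is to cut a series into patches that become Transformer tokens, which is short enough to show directly. The patch length and stride below are arbitrary illustrative values; the paper applies this per channel and adds the forecasting model on top.

    import numpy as np

    def patchify(series, patch_len=16, stride=8):
        """Split a 1-D series into (possibly overlapping) patches used as tokens."""
        starts = range(0, len(series) - patch_len + 1, stride)
        return np.stack([series[s:s + patch_len] for s in starts])

    # A length-64 series with patch_len=16 and stride=8 yields 7 tokens of
    # dimension 16, instead of 64 single-step tokens.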

ICLR 2023 is a must-attend event for anyone interested in machine learning. With its diverse range of topics, world-class speakers, and opportunities for networking and collaboration, the conference promises to be an exciting and informative experience. Whether you are an experienced researcher or a newcomer to the field, ICLR 2023 is the perfect opportunity to stay up-to-date with the latest trends and innovations in machine learning.