Artificial intelligence (AI) research, particularly in the machine learning (ML) domain, continues to attract growing attention worldwide. To give you an idea of the scientific hype around AI and ML, the number of works uploaded to the open-access preprint archive arXiv has nearly doubled since late 2023, with over 30K AI-related papers accessible in the repository by the end of 2024. As you may guess, most of them are ML-focused; after all, deep learning architectures, generative AI solutions, and almost all of today's computer vision and natural language processing systems are, in essence, ML systems that learn from data to perform increasingly astonishing tasks.
This article lists 5 of the most influential ML papers that largely shaped AI research trends throughout 2024. While the links provided point to their versions in the arXiv repository, these papers have been published, or are in the process of being published, at top conferences or journals.
1. Vision Transformers Need Registers (T. Darcet et al.)
This paper received one of the latest Outstanding Paper Awards at the International Conference on Learning Representations (ICLR 2024) and, while it has only been available on arXiv for a few months, it is quickly attracting a wide audience and a growing number of citations.
The authors investigate vision transformers' tendency to occasionally generate high-norm artifact tokens in less informative image areas, such as backgrounds. They address this by appending extra learnable tokens, called register tokens, to the input sequence, thereby improving model performance and enabling better results in visual tasks like object detection.
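To make the mechanism concrete, here is a minimal sketch, assuming a standard PyTorch ViT-style encoder, of how learnable register tokens might be prepended to the patch sequence and then discarded at the output. The class name, dimensions, and layer choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy ViT encoder whose input sequence is padded with learnable register tokens."""

    def __init__(self, dim=768, depth=12, heads=12, num_patches=196, num_registers=4):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, dim)          # toy patch embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # The extra tokens proposed in the paper: they participate in attention
        # during the forward pass but are thrown away at the output.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patches):                                  # patches: (B, 196, 16*16*3)
        x = self.patch_embed(patches) + self.pos_embed
        b = x.shape[0]
        cls = self.cls_token.expand(b, -1, -1)
        reg = self.registers.expand(b, -1, -1)
        x = self.encoder(torch.cat([cls, reg, x], dim=1))        # [CLS] + registers + patches
        # Discard the register outputs: only [CLS] and patch tokens are used downstream.
        return x[:, 0], x[:, 1 + self.num_registers :]
```

The key design point is that the registers give the model extra "scratch space" for global computation, so it no longer needs to repurpose background patch tokens for that role.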
2. Why Larger Language Models Do In-context Learning Differently? (Z. Shi et al.)
This highly cited study, released in late spring 2024, shows that small language models (SLMs) are more robust to noise and "less easily distracted" than their larger counterparts (LLMs), because they place emphasis on a narrower selection of hidden features (the representations learned across the layers of their transformer architecture) than LLMs do. The study sheds new light on how these complex models operate and how their behavior can be interpreted.
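For readers unfamiliar with the setup, the toy snippet below (my own illustration, not code from the paper) shows one way noise can enter an in-context prompt: demonstrations whose labels have been flipped. The sentiment task and helper name are assumptions chosen purely for illustration.

```python
# Two in-context prompts for the same task: one with clean demonstrations,
# one where a demonstration label has been flipped (injected noise).
clean_demos = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
    ("A delightful surprise.", "positive"),
]
noisy_demos = [
    ("The movie was wonderful.", "negative"),   # flipped label = noise
    ("I hated every minute.", "negative"),
    ("A delightful surprise.", "positive"),
]

def build_prompt(demos, query):
    """Concatenate labeled demonstrations and the query into a single in-context prompt."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

# Comparing a model's predictions on the clean vs. noisy prompt probes how
# easily it is "distracted" by corrupted demonstrations.
print(build_prompt(noisy_demos, "An instant classic."))
```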
3. The Llama 3 Herd of Models (A. Grattafiori et al.)
With nearly 600 co-authors on a single paper, this massive study has gained thousands of citations, and arguably many more views, since its first version was published in July 2024. The paper introduces Meta's new herd of multilingual language models, whose 405B-parameter flagship matches GPT-4's performance across various tasks. It also integrates multimodal capabilities via a compositional approach, performing competitively in use cases like image, video, and speech recognition, although these multimodal variants had not been released to the public at the time of writing.
4. Gemma: Open Models Based on Gemini Research and Technology (T. Mesnard et al.)
Another highly co-authored paper, with over 100 contributors, published in spring 2024, this work presents two of Google's newest open models, sized at 2 billion and 7 billion parameters respectively. Built on technology similar to that of the Gemini models, the Gemma models outperform similarly sized models in nearly 70% of the language tasks investigated. The study also provides an analysis of, and reflection on, the safety and responsibility aspects of releasing these LLMs.
5. Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (K. Tian et al.)
This list couldn’t be wrapped up without mentioning the award-winning paper at the 2024 edition of one of the most prestigious conferences in the field: NeurIPS. The paper introduces Visual AutoRegressive modeling (VAR), a new image generation approach that predicts images in stages, progressing from coarse to fine resolutions, which yields efficient training and enhanced performance. VAR outperforms state-of-the-art diffusion transformers and generalizes to visual tasks like in-painting and editing, while showcasing scaling properties similar to those of LLMs.
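As a rough intuition, and not the paper's actual implementation, the sketch below mimics the coarse-to-fine loop: the unit of autoregressive prediction is an entire token map at each scale, and every new map is conditioned on all coarser ones. The ToyNextScalePredictor class and its interface are invented stand-ins for the real transformer and VQ-VAE components.

```python
import torch
import torch.nn as nn

class ToyNextScalePredictor(nn.Module):
    """Stand-in for the real transformer: maps the coarse-scale context to the next, finer token map."""

    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, context, out_tokens):
        # Summarize everything generated at coarser scales (zeros at the very first scale)...
        if context.shape[1] == 0:
            summary = context.new_zeros(context.shape[0], 1, context.shape[2])
        else:
            summary = context.mean(dim=1, keepdim=True)
        # ...and expand that summary into a prediction for the next-scale token map.
        return self.proj(summary).expand(-1, out_tokens, -1)


def generate_coarse_to_fine(predictor, scales=(1, 2, 4, 8, 16), dim=64, batch=1):
    """Next-scale autoregression: each step predicts a whole s x s token map,
    conditioned on all previously generated, coarser maps."""
    context = torch.zeros(batch, 0, dim)
    maps = []
    for s in scales:
        pred = predictor(context, out_tokens=s * s)        # (batch, s*s, dim)
        maps.append(pred.reshape(batch, s, s, dim))
        context = torch.cat([context, pred], dim=1)        # finer scales see all coarser ones
    return maps                                            # the finest map would go to a VQ-style decoder


maps = generate_coarse_to_fine(ToyNextScalePredictor())
print([tuple(m.shape) for m in maps])                      # 1x1, 2x2, 4x4, 8x8, 16x16 token maps
```

Predicting a full map per step, rather than one token at a time, is what lets this style of model generate high-resolution images in far fewer autoregressive steps.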