2025 ACM SIGPLAN International Symposium on Memory Management (ISMM 2025),
June 17, 2025,
Seoul, Republic of Korea
Frontmatter
Welcome from the Chairs
It is with great pleasure that we welcome you to the 2025 ACM SIGPLAN International Symposium on Memory Management (ISMM '25)! This is the 24th event in the ISMM series. Continuing the expanded scope from last year, we encouraged submissions and participation from related fields such as computer architecture and operating systems in addition to the programming languages community.
Papers
EMD: Fair and Efficient Dynamic Memory De-bloating of Transparent Huge Pages
Parth Gangar,
Ashish Panwar, and
K. Gopinath
(Fujitsu Research, India; Microsoft Research, India; Rishihood University, India)
Recent processors rely on huge pages to reduce the cost of virtual-to-physical address translation. However, huge pages are notorious for creating memory bloat, a phenomenon wherein the OS ends up allocating more physical memory to an application than it actually requires. This extra memory can be reclaimed by the OS via de-bloating at runtime. However, we find that current OS-level solutions either lack support for dynamic memory de-bloating or suffer from performance and fairness pathologies while de-bloating.
We address these issues with EMD (Efficient Memory De-bloating). The key insight in EMD is that different regions in an application's address space exhibit different amounts of memory bloat. Consequently, the tradeoff between memory efficiency and performance varies significantly within a given application. For example, we find that memory bloat is typically concentrated in specific regions, and de-bloating them has minimal performance impact. Building on this insight, EMD employs a prioritization scheme for fine-grained, efficient, and fair reclamation of memory bloat. EMD improves performance by up to 69% compared to HawkEye, a state-of-the-art OS-based huge page management system. EMD also eliminates the fairness concerns associated with dynamic memory de-bloating.
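To make the region-level insight concrete, here is a minimal sketch in Python of how a prioritized de-bloating pass might score regions. The Region fields, the hotness weighting, and the 2 MB huge-page granularity are our illustrative assumptions, not EMD's actual policy.

```python
# Illustrative sketch (not EMD's implementation): score each virtual-memory
# region by its estimated bloat and reclaim huge pages from the most
# bloated, least hot regions first.
from dataclasses import dataclass

@dataclass
class Region:
    name: str          # e.g., a VMA or heap segment (hypothetical granularity)
    huge_pages: int    # 2 MB huge pages currently backing the region
    resident_kb: int   # memory actually in use, in KB
    hotness: float     # access-frequency estimate in [0, 1]

    @property
    def bloat_kb(self) -> int:
        # Bloat = memory allocated via huge pages minus memory in use.
        return self.huge_pages * 2048 - self.resident_kb

def debloat(regions: list[Region], target_kb: int) -> list[str]:
    """Pick regions to de-bloat until target_kb of bloat is reclaimed,
    preferring high-bloat, cold regions to limit the performance hit."""
    plan, reclaimed = [], 0
    for r in sorted(regions, key=lambda r: r.bloat_kb * (1.0 - r.hotness),
                    reverse=True):
        if reclaimed >= target_kb:
            break
        if r.bloat_kb > 0:
            plan.append(r.name)   # break huge pages here, free unused 4 KB pages
            reclaimed += r.bloat_kb
    return plan

regions = [Region("heap-a", 64, 40000, 0.9), Region("heap-b", 64, 8000, 0.1)]
print(debloat(regions, 100000))   # de-bloats the cold, bloated region first
```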
Arborescent Garbage Collection: A Dynamic Graph Approach to Immediate Cycle Collection
Frédéric Lahaie-Bertrand,
Léonard Oest O'Leary,
Olivier Melançon,
Marc Feeley, and
Stefan Monnier
(Université de Montréal, Canada)
Reclaiming cyclic garbage has been a long-standing challenge in automatic memory management. Common approaches to this problem often involve extending reference counting with an asynchronous background task to reclaim cycles. While this ensures that cycles are eventually collected, it also introduces unpredictable behaviours, making these approaches unsuitable for applications where deterministic collection is required.
This paper introduces Arborescent Garbage Collection, a synchronous memory management algorithm that immediately reclaims unreachable memory objects, including cyclic structures. Inspired by single-source reachability algorithms on dynamic graphs, it extends the idea of embedding a spanning forest in a program's reference graph to track the reachability of any object from a root. When a reference is removed, the algorithm efficiently rebuilds the forest and immediately reclaims the memory of objects that are no longer reachable. The result is a garbage collection algorithm suitable for applications that require immediate memory reclamation and predictable behaviour.
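The spanning-forest idea can be sketched as follows. This is our simplified model of forest maintenance on a dynamic reference graph, not the paper's algorithm or its efficiency guarantees: each object keeps one parent edge anchoring it to a root; when a reference removal breaks that anchor, the affected subtree is re-adopted from outside edges where possible, and the rest, cycles included, is reclaimed immediately.

```python
# Our simplified model of the spanning-forest idea; the paper's actual
# algorithm and complexity bounds differ.
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = set()     # outgoing references in the reference graph
        self.parent = None    # the one incoming edge anchoring us in the forest

ROOT = Obj("root")            # stand-in for the GC roots

def add_ref(src, dst):
    src.refs.add(dst)
    if dst.parent is None and dst is not ROOT:
        dst.parent = src      # first incoming reference adopts dst into the forest

def anchored_through(o, anchor):
    while o is not None:      # walk parent edges toward a root
        if o is anchor:
            return True
        o = o.parent
    return False

def remove_ref(src, dst, all_objs):
    src.refs.discard(dst)
    if dst.parent is not src:
        return []             # a non-forest edge died; reachability is unchanged
    # dst lost its anchor: its whole subtree must be re-anchored or reclaimed.
    shaken = {o for o in all_objs if anchored_through(o, dst)}
    dst.parent = None
    changed = True
    while changed:            # fixpoint: adopt anything referenced from outside
        changed = False
        for o in list(shaken):
            for p in (all_objs - shaken) | {ROOT}:
                if o in p.refs:
                    o.parent = p          # re-attach o below a reachable object
                    shaken.discard(o)
                    changed = True
                    break
    for dead in shaken:
        dead.parent = None    # everything still shaken is garbage, cycles included
    return sorted(d.name for d in shaken)

a, b = Obj("a"), Obj("b")
objs = {ROOT, a, b}
add_ref(ROOT, a); add_ref(a, b); add_ref(b, a)   # a <-> b cycle hanging off the root
print(remove_ref(ROOT, a, objs))                 # ['a', 'b'] reclaimed immediately
```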
SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair
Huanting Wang,
Dejice Jacob,
David Kelly,
Yehia Elkhatib,
Jeremy Singer, and
Zheng Wang
(University of Leeds, UK; University of Glasgow, UK)
Large language models (LLMs) hold great promise for automating software vulnerability detection and repair, but ensuring their correctness remains a challenge. While recent work has developed benchmarks for evaluating LLMs in bug detection and repair, existing studies rely on hand-crafted datasets that quickly become outdated. Moreover, systematic evaluation of advanced reasoning-based LLMs using chain-of-thought prompting for software security is lacking.
We introduce SecureMind, an open-source framework for evaluating LLMs in vulnerability detection and repair, focusing on memory-related vulnerabilities. SecureMind provides a user-friendly Python interface for defining test plans, which automates data retrieval, preparation, and benchmarking across a wide range of metrics.
Using SecureMind, we assess 10 representative LLMs, including 7 state-of-the-art reasoning models, on 16K test samples spanning 8 Common Weakness Enumeration (CWE) types related to memory safety violations. Our findings highlight the strengths and limitations of current LLMs in handling memory-related vulnerabilities.
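The abstract describes a Python interface for defining test plans. The sketch below is hypothetical: TestPlan, run, and the example model names are our invented stand-ins showing the shape such a plan might take, not SecureMind's actual API.

```python
# Hypothetical sketch of a SecureMind-style test plan. All identifiers are
# invented for illustration; consult the SecureMind repository for the real
# interface.
from dataclasses import dataclass, field

@dataclass
class TestPlan:
    models: list[str]                    # LLMs under test (example names)
    cwes: list[str]                      # memory-safety CWE types to sample
    tasks: tuple = ("detect", "repair")  # evaluate both detection and repair
    metrics: list[str] = field(default_factory=lambda: ["accuracy", "fix-rate"])

def run(plan: TestPlan) -> dict:
    """Stub runner: fetch samples per CWE, query each model on each task,
    and score responses against ground truth (scoring omitted here)."""
    results = {}
    for model in plan.models:
        for cwe in plan.cwes:
            for task in plan.tasks:
                results[(model, cwe, task)] = None   # placeholder score
    return results

plan = TestPlan(models=["o3-mini", "deepseek-r1"],
                cwes=["CWE-416", "CWE-787"])   # use-after-free, OOB write
print(len(run(plan)))                          # 2 models x 2 CWEs x 2 tasks = 8
```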
Compiler-Assisted Crash Consistency for PMEM
Yun Joon Soh,
Sihang Liu,
Steven Swanson, and
Jishen Zhao
(University of California San Diego, USA; University of Waterloo, Canada)
Writing crash-consistent programs for memory-semantic storage such as persistent memory (PMEM) is error-prone and cumbersome. Programmers must implement both the main logic and the recovery logic to ensure data consistency after unexpected power failures. Prior work has reduced this burden using compiler-assisted logging techniques to enforce crash consistency. However, these techniques often apply persistence uniformly, limiting support for diverse programming models and incurring high logging overhead.
We present SSAPP (Statically and Systematically Automated Persistence is Possible), a compiler extension that transparently adds crash consistency to the main logic and automatically generates tailored recovery code. SSAPP persists transient state with low overhead during main logic execution and makes principled resumption decisions during post-failure recovery. Based on these decisions, the generated recovery code correctly completes the interrupted operation. This design supports a broader range of programming models — including lock-free data structures — while reducing crash consistency overhead.
We evaluate SSAPP on transactional benchmarks and on lock-based and lock-free data structures. With minimal developer effort, SSAPP converts volatile lock-free data structures into crash-consistent ones, achieving performance comparable to Mirror, a hand-optimized persistent data structure library. SSAPP also outperforms Clobber-NVM, a prior compiler-based PMEM system, achieving 1.8× higher throughput.
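As a loose illustration of the roll-forward idea, the following Python model logs an operation's intent before mutating state and lets recovery code decide whether to complete it. It stands in for compiler-generated logging and recovery on PMEM; the JSON log, the push example, and the recovery check are all our simplifications, not SSAPP's mechanism.

```python
# Abstract model (ours, not SSAPP's code) of compiler-assisted crash
# consistency: main logic persists an intent record and a commit marker
# around each step; recovery inspects the record and rolls forward.
import json, os

LOG = "oplog.json"   # stand-in for a persistent log region

def persist(record):                     # models an ordered persist (flush + fence)
    with open(LOG, "w") as f:
        json.dump(record, f)
        f.flush(); os.fsync(f.fileno())

def push(stack, value):                  # stack stands in for persistent state
    persist({"op": "push", "value": value, "done": False})   # log intent first
    stack.append(value)                                      # mutate state
    persist({"op": "push", "value": value, "done": True})    # commit marker

def recover(stack):
    if not os.path.exists(LOG):
        return
    rec = json.load(open(LOG))
    if rec["op"] == "push" and not rec["done"]:
        # Crash hit between intent and commit: decide whether the mutation
        # landed, then roll forward so the operation completes exactly once.
        if not stack or stack[-1] != rec["value"]:
            stack.append(rec["value"])
        persist({**rec, "done": True})

stack = []
push(stack, 42)      # crash anywhere inside push and...
recover(stack)       # ...recovery completes or confirms it exactly once
print(stack)         # [42]
```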
TierTrain: Proactive Memory Tiering for CPU-Based DNN Training
Sathvik Swaminathan,
Sandeep Kumar,
Aravinda Prasad, and
Sreenivas Subramoney
(Intel Labs, India)
Deep neural networks (DNNs) are among the most popular models for learning relationships in complex data. Training a DNN model is a compute- and memory-intensive operation. The size of modern DNN models has grown into the terabyte range, requiring multiple accelerators to train and driving up the training cost. Such enormous memory requirements shift the focus toward memory rather than computation.
CPU memory, on the other hand, can be scaled to several terabytes with emerging memory technologies such as HBM and CXL-attached memories. Furthermore, recent advancements in CPUs, such as dedicated instructions for DNN training and inference, are bridging the compute gap between CPUs and accelerators.
This paper presents exploratory work toward cost-effective DNN training on CPUs, aiming to alleviate the memory management challenges of DNN training. We propose TierTrain, a novel memory tiering solution based on a dynamic queuing system that leverages the periodic and deterministic memory access behavior of DNN training to manage data placement across memory tiers. TierTrain proactively manages tensors by aggressively offloading them to slow memory tiers (NVMM, CXL) and prefetching them back to fast memory tiers (HBM, DRAM) in time for their next use. Our evaluation of TierTrain on a tiered memory system, with real CXL-attached memory for memory expansion and NVMM as low-cost memory, shows an average fast-memory footprint reduction of 59–83% and a peak fast-memory footprint reduction of 25–74%, with a performance overhead of 1–16%. In a memory-constrained scenario, TierTrain outperforms state-of-the-art tiering, improving performance by 35–84% for a set of popular DNN training models.
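A minimal sketch of the underlying idea, assuming (as the abstract states) that tensor accesses repeat periodically across training iterations: a schedule profiled from one iteration drives prefetch-before-use and offload-after-use in later ones. The class below is our illustration, not TierTrain's queuing system.

```python
# Illustrative sketch (our simplification): DNN training touches tensors in
# the same order every iteration, so a schedule learned once can drive
# offloading and prefetching thereafter.
from collections import deque

class TierScheduler:
    def __init__(self, fast_capacity, lookahead=2):
        self.order = []              # tensor access order from a profiling pass
        self.fast = set()            # tensors currently in the fast tier (HBM/DRAM)
        self.capacity = fast_capacity
        self.lookahead = lookahead   # how many accesses ahead to prefetch

    def profile(self, access_trace):
        self.order = list(access_trace)

    def step(self, i):
        """Called at access i of the (periodic) trace."""
        n = len(self.order)
        need = {self.order[(i + k) % n] for k in range(self.lookahead + 1)}
        for t in need - self.fast:           # prefetch upcoming tensors
            self.fast.add(t)                 # models a CXL/NVMM -> DRAM copy
        victims = deque(t for t in self.fast if t not in need)
        while len(self.fast) > self.capacity and victims:
            self.fast.discard(victims.popleft())   # offload to the slow tier

sched = TierScheduler(fast_capacity=3)
sched.profile(["w1", "a1", "w2", "a2", "w3", "a3"])   # hypothetical tensor names
for i in range(6):
    sched.step(i)
print(sched.fast)   # only tensors needed soon remain in the fast tier
```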
Reconsidering Garbage Collection in Julia: A Practitioner Report
Luis Eduardo de Souza Amorim,
Yi Lin,
Stephen M. Blackburn,
Diogo Netto,
Gabriel Baraldi,
Nathan Daly,
Antony L. Hosking,
Kiran Pamnany, and
Oscar Smith
(Australian National University, Australia; Google, Australia; RelationalAI, USA; JuliaHub, USA)
Julia is a dynamically typed, garbage-collected language designed for high performance. Julia has a non-moving tracing collector, which, while performant, is subject to the same unavoidable fragmentation and lack of locality as all other non-moving collectors. In this work, we refactor the Julia runtime with the goal of supporting different garbage collectors, including copying collectors. Rather than integrate a specific collector implementation, we implement a third-party heap interface that allows Julia to work with various collectors, and use it to implement a series of increasingly advanced designs. Our description of this process sheds light on Julia's existing collector and the challenges of implementing copying garbage collection in a mature, high-performance runtime.
We have successfully implemented a third-party heap interface for Julia and demonstrated its utility through integration with the MMTk garbage collection framework. We hope that this account of our multi-year effort will be useful both to the Julia community and to the garbage collection research community, and that it will provide insights and guidance for future language implementers on how to achieve high-performance garbage collection in a highly tuned language runtime.
Lifetime Dispersion and Generational GC: An Intellectual Abstract
Stephen Dolan
(Jane Street, UK)
The effectiveness of generational garbage collection is usually explained through the generational hypothesis, that “most objects die young”.
Despite its simplicity, the generational hypothesis leaves some things to be desired: it is not obvious how it can be measured as a property of a program (independent of a particular GC strategy), it is not composable (it need not hold for a larger program even when it holds for the program's parts), and even its connection to the effectiveness of generational GC is murkier than it may first appear.
We propose instead lifetime dispersion as a measure of how generational a program's objects are, and explain how it can be quantified by the Gini coefficient. We show that this measure is both composable and directly connected to the effectiveness of generational collection.
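For readers unfamiliar with the Gini coefficient, the standard computation over a sample of object lifetimes looks as follows; the paper's precise formulation of lifetime dispersion may differ.

```python
# Textbook Gini computation over object lifetimes, for illustration only.
def gini(lifetimes):
    """Gini coefficient in [0, 1): 0 means all objects live equally long;
    values near 1 mean total lifetime is dominated by a few survivors."""
    xs = sorted(lifetimes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form for sorted samples: rank-weighted sum.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))      # 0.0: no dispersion, generational GC gains little
print(gini([1, 1, 1, 1000]))   # ~0.75: most objects die young relative to one survivor
```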
Fully Randomized Pointers
Sai Dhawal Phaye,
Gregory J. Duck,
Roland H. C. Yap, and
Trevor E. Carlson
(National University of Singapore, Singapore)
Memory errors continue to be a critical concern for programs written in low-level programming languages such as C and C++. Many different memory error defenses have been proposed, each with varying trade-offs in terms of overhead, compatibility, and attack resistance. Some defenses are highly compatible but only provide minimal protection, and can be easily bypassed by knowledgeable attackers. On the other end of the spectrum, capability systems offer very strong (unforgeable) protection, but require novel software and hardware implementations that are incompatible by definition. The challenge is to achieve both very strong protection and high compatibility.
In this paper, we propose Fully Randomized Pointers (FRP) as a strong memory error defense that also maintains compatibility with existing binary software. The key idea behind FRP is to design a new pointer encoding scheme that allows for the full randomization of most pointer bits, rendering even brute force attacks impractical. We design an FRP encoding that is: (1) compatible with existing binary code (recompilation not needed); and (2) decoupled from the underlying object layout. FRP is prototyped as: (i) a software implementation (BlueFat) to test security and compatibility; and (ii) a proof-of-concept hardware implementation (GreenFat) to evaluate performance. We show FRP is secure, practical, and compatible at the binary level, while our hardware implementation achieves low performance overheads (<4%).
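A conceptual model of the encoding idea: pointer bits are drawn at random and resolved through a table decoupled from the object layout, so a forged or corrupted pointer almost certainly fails to resolve. The 48-bit token and the Python dictionary are our illustrative stand-ins, not the actual BlueFat/GreenFat mechanisms described in the paper.

```python
# Conceptual model of fully randomized pointers, for illustration only.
import secrets

TABLE = {}   # random token -> (storage, size); models runtime/hardware state

def frp_alloc(size):
    token = secrets.randbits(48)   # most pointer bits are fully random
    TABLE[token] = (bytearray(size), size)
    return (token, 0)              # a "pointer" is (random token, offset)

def frp_deref(ptr, offset_delta=0):
    token, off = ptr
    if token not in TABLE:
        raise MemoryError("invalid pointer: random bits match no live object")
    buf, size = TABLE[token]
    off += offset_delta
    if not 0 <= off < size:
        raise MemoryError("out-of-bounds access detected")
    return buf, off

p = frp_alloc(16)
buf, off = frp_deref(p, 8)     # fine: within the 16-byte object
buf[off] = 0xFF
try:
    frp_deref(p, 16)           # one byte past the end
except MemoryError as e:
    print(e)                   # out-of-bounds access detected
```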
Reworking Memory Management in CRuby: A Practitioner Report
Kunshan Wang,
Stephen M. Blackburn,
Peter Zhu, and
Matthew Valentine-House
(Australian National University, China; Google, Australia; Australian National University, Australia; Shopify, Canada; Shopify, UK)
Ruby is a dynamic programming language that was first released in 1995 and remains heavily used today. Ruby underpins Ruby on Rails, one of the most widely deployed web application frameworks. The scale at which Rails is deployed has placed increasing pressure on the underlying CRuby implementation, and in particular on its approach to memory management. CRuby implements a mark-sweep garbage collector that, until recently, was non-moving and allocated only fixed-size 40-byte objects, falling back to malloc to manage all larger objects.
This paper reports on a multi-year academic-industrial collaboration to rework CRuby's approach to memory management, with the goal of introducing modularity and the ability to incorporate modern high-performance garbage collection algorithms. This required identifying and addressing deeply ingrained assumptions across many aspects of the CRuby runtime. We describe the longstanding CRuby implementation and enumerate the core challenges we faced and the lessons they offer. Our work has been embraced by the Ruby community, and the refactorings and new garbage collection interface we describe have been upstreamed. We look forward to this work being used to deploy a new class of garbage collectors for Ruby. We hope that this paper will provide important lessons and insights for Ruby developers, garbage collection researchers, and language designers.
Gray-in-Young: A Generational Garbage Collection for Processing-in-Memory
Ryu Morimoto,
Kazuki Ichinose, and
Tomoharu Ugawa
(University of Tokyo, Japan)
Processing-in-memory (PIM) is a promising approach to overcoming the performance bottleneck caused by the gap between CPU speed and memory speed, known as the memory wall problem. The UPMEM PIM-enabled memory is the first commercialized general-purpose PIM accelerator, to which a program running on the host CPU offloads computation kernels. Its DRAM Processing Units (DPUs) are general-purpose processors and have the flexibility to run various computation kernels. However, there is no support for programming them in managed languages with garbage collection (GC).
In this paper, we design a GC for DPUs, a key component of managed runtimes. Our GC is a parallel generational GC whose young space resides in scratchpad memory (SPM). To reduce DRAM accesses, the GC updates pointers in objects being promoted before copying them to the old space in DRAM. It also determines, at compile time, the class information needed for the minor GC of each computation kernel and caches it in SPM. The major GC routines are compiled into a separate binary so that the computation kernel binaries fit in the 24 KB of program memory. Evaluation with a microbenchmark showed that our proposed techniques reduced DRAM accesses by up to 85.9% and improved performance by 46.2%. The GC scaled up to 11 threads, and after separating out the 6.9 KB of major GC code, the remaining GC routine was only 4.3 KB.
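The "update pointers before copying" step can be illustrated with a small model. Here old-space addresses are modeled as list indices, our simplification rather than the DPU implementation: forwarding addresses are assigned first, pointer fields are patched while objects still sit in fast SPM, and each survivor is then written to DRAM exactly once.

```python
# Illustrative model (our simplification) of update-before-copy promotion.
class YoungObj:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)   # references to other young objects

def minor_gc(survivors, old_space):
    # Phase 1: assign forwarding addresses (old-space indices) up front.
    forwarding = {obj: len(old_space) + i for i, obj in enumerate(survivors)}
    # Phase 2: rewrite pointer fields while objects are still in fast SPM,
    # so no pointer fix-up ever touches the DRAM copies.
    for obj in survivors:
        obj.fields = [forwarding.get(f, f) for f in obj.fields]
    # Phase 3: copy each promoted object to the DRAM old space exactly once.
    for obj in survivors:
        old_space.append(obj)
    return forwarding

a = YoungObj("a"); b = YoungObj("b", fields=[a]); a.fields = [b]
old = []
minor_gc([a, b], old)
print(b.fields)   # [0]: a's new old-space slot, patched before the copy
```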