AIware 2026
3rd ACM International Conference on AI-Powered Software (AIware 2026)

3rd ACM International Conference on AI-Powered Software (AIware 2026), July 6–7, 2026, Montreal, QC, Canada

AIware 2026 – Preliminary Table of Contents


Frontmatter

Title Page
Article: fseaiware26foreword-fm000-p
Welcome from the Chairs
Article: fseaiware26foreword-fm001-p
AIware 2026 Organization
Article: fseaiware26foreword-fm002-p
Sponsors
Article: fseaiware26foreword-fm003-p

Papers

Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
Mohamed Almukhtar, Anwar Ghammam, and Hua Ming
(University of Michigan at Flint, USA; University of Michigan at Dearborn, USA)
Article: fseaiware26main-pp063-p
Configuring Agentic AI Coding Tools: An Exploratory Study
Matthias Galster, Seyedmoein Mohsenimofidi, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Sebastian Baltes
(Otto-Friedrich-Universität Bamberg, Germany; Ruprecht-Karls-Universität Heidelberg, Germany; Singapore Management University, Singapore)
Article: fseaiware26main-pp062-p
Can LLMs Really Reason about Code? Studying How Well LLMs Understand the Relation between Input, Code, and Output
Norman Becker, Tural Mammadov, and Andreas Zeller
(CISPA Helmholtz Center for Information Security, Germany)
Article: fseaiware26main-pp061-p
Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
Christoph Treude
(Singapore Management University, Singapore)
Article: fseaiware26main-pp056-p
Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
Fazle Rabbi, Soumit Kanti Saha, and Jinqiu Yang
(Concordia University, Canada)
Article: fseaiware26main-pp054-p
Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization
Naing Oo Lwin and Rajesh Kumar
(Bucknell University, USA)
Article: fseaiware26main-pp053-p
Using Mutation-Analysis to Examine an LLM’s Ability to Summarize Code
Lara Khatib, Michael Pu, Bogdan Vasilescu, and Meiyappan Nagappan
(University of Waterloo, Canada; Carnegie Mellon University, USA)
Article: fseaiware26main-pp051-p
Collaborator or Assistant? How AI Coding Agents Partition Work across Pull Request Lifecycles
Young Jo Chung and Safwat Hassan
(University of Toronto, Canada)
Article: fseaiware26main-pp050-p
Testing AIware Systems: A Software Engineering Survey
Karla Gonzalez and Mariam El Mezouar
(Royal Military College of Canada, Canada)
Article: fseaiware26main-pp049-p
Zombie Agents: Detecting Semantic Livelock in Long-Horizon Autonomous Software
Simarjot Khanna
(Independent Researcher, Canada)
Article: fseaiware26main-pp047-p
Executable but Unlearnable: Designing Code That Resists LLM-Based Learning
Viraaji Mothukuri and Reza M. Parizi
(Kennesaw State University, USA)
Article: fseaiware26main-pp044-p
From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines
Marcus Emmanuel Barnes, Taher A. Ghaleb, and Safwat Hassan
(University of Toronto, Canada; Trent University, Canada)
Article: fseaiware26main-pp043-p
From Code Review to Spec-Driven Contracts: A Vision for Auditable AIWare Systems
Mohammad Hamdaqa and Moataz Chouchen
(Polytechnique Montréal, Canada; Université de Montréal, Canada; Concordia University, Canada)
Article: fseaiware26main-pp042-p
Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files
Christoph Treude, Sebastian Baltes, and Marc Cheong
(Singapore Management University, Singapore; Ruprecht-Karls-Universität Heidelberg, Germany; University of Melbourne, Australia)
Article: fseaiware26main-pp041-p
Detecting Unsoundness in Neural Network Verifiers via Concrete–Abstract Consistency
Kaijie Liu and Yulei Sui
(UNSW, Australia)
Article: fseaiware26main-pp040-p
Artifact Readiness Gates with Saturation Stop Rules and Host-Parity Admissibility for FM Release Evaluation
Yanick Kanyiki
(InvarLock, Canada)
Article: fseaiware26main-pp039-p
When AI Coding Assistants Leak Training Data: A Study of LLM Memorization in Code Generation
Xiaoyu Cheng, Kundi Yao, Pengyu Nie, and Weiyi Shang
(University of Waterloo, Canada; Ontario Tech University, Canada)
Article: fseaiware26main-pp038-p
SOSecure: The Wisdom of the Crowd for Safer AI-Generated Code
Manisha Mukherjee and Vincent Josua Hellendoorn
(Carnegie Mellon University, USA; Google, USA)
Article: fseaiware26main-pp037-p
From Correctness to Consistency: Redefining Reliability for the Agentware Era
Xue Qin and Mauricio Gruppi
(Villanova University, USA)
Article: fseaiware26main-pp034-p
An Empirical Study of Reasoning Steps in Thinking Code LLMs
Haoran Xue, Gias Uddin, and Song Wang
(York University, Canada)
Article: fseaiware26main-pp033-p
VeriTrans: Fine-Tuned LLM-Assisted NL→PL Translation via a Deterministic Neuro-symbolic Pipeline
Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, and Mahfuza Farooque
(Pennsylvania State University, USA)
Article: fseaiware26main-pp032-p
Is Artificial Intelligence an Elixir to the Software Engineering Community? An Empirical Study among Managers
Xin Zhao, Brian Vu, and Sitesh Pattanaik
(Seattle University, USA; University of California at Irvine, USA)
Article: fseaiware26main-pp031-p
Kubernetes Misconfigurations in the Wild: Taxonomy, Evolution, and Automated Repair with Large Language Models
Mostafa Anouar Ghorab, Ahmad Abdel Latif, and Mohamed Aymen Saied
(Université Laval, Canada; University of Calgary, Canada)
Article: fseaiware26main-pp030-p
When Code Authors Are Agents: A Large-Scale Study of Human–Agent Collaboration in Pull Requests
Anthonia Oluchukwu Njoku, Zohreh Sharafi, and Foutse Khomh
(Polytechnique Montréal, Canada)
Article: fseaiware26main-pp028-p
VISOR: A Vision-Language Model-Based Test Oracle for Testing Robots
Prasun Saurabh, Pablo Valle, Aitor Arrieta, Shaukat Ali, and Paolo Arcaini
(Simula Research Laboratory, Norway; Oslo Metropolitan University, Norway; Mondragon University, Spain; Tokyo Institute of Technology, Japan)
Article: fseaiware26main-pp025-p
Wink: Recovering from Misbehaviors in Coding Agents
Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, and Satish Chandra
(Meta Platforms, USA)
Article: fseaiware26main-pp024-p
Co-located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Éric Jacopin
(Cosmic AI, France)
Article: fseaiware26main-pp023-p (Artifacts Available)
Towards AI as a Collaborative Partner: A Taxonomy of AI Agent Behavior in Software Engineering
Tao Dong, Sherry Shi, Harini Sampath, and Andrew Macvean
(Google, USA)
Article: fseaiware26main-pp020-p
Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development
Srijita Basu, Viktor Kjellberg, Simin Sun, Bengt Haraldsson, Md. Abu Ahammed Babu, Wilhelm Meding, Farnaz Fotrousi, and Miroslaw Staron
(University of Gothenburg, Sweden; Chalmers University of Technology, Sweden)
Article: fseaiware26main-pp019-p
Fixpad++: Automated Bug Fix Verification using LLM Agents
Mustafa Özkan İr, Mehmet Dedeler, Anıl Koyuncu, and Eray Tüzün
(Bilkent University, Türkiye)
Article: fseaiware26main-pp017-p
A Preliminary Study on Explaining Risk of Code Changes using LLM-Based Prediction Models
Yalin Liu, Kosay Jabre, Rui Abreu, Zachariah J. Carmichael, Vijayaraghavan Murali, Akshay Patel, Jun Ge, Weiyan Sun, Cong Zhang, Audris Mockus, David Khavari, Peter C. Rigby, and Nachiappan Nagappan
(Facebook, USA; Facebook, Canada; Rice University, USA; Southern Methodist University, USA; University of Tennessee at Knoxville, USA; Concordia University, Montreal, Canada)
Article: fseaiware26main-pp016-p
Towards Migrating Neural Network Implementations
Nadia Daoudi, Iván Alfonso, and Jordi Cabot
(Luxembourg Institute of Science and Technology, Luxembourg; University of Luxembourg, Luxembourg)
Article: fseaiware26main-pp014-p
Auditing Who Appears to Belong: A Large-Scale Empirical Study of Bias in Deployed Text-to-Image Systems for Software Engineering
Mohamad Kassab
(Boston University, USA)
Article: fseaiware26main-pp008-p
How Robustly Do LLMs Understand Execution Semantics?
Claudio Spiess, Prem Devanbu, and Earl T. Barr
(University of California at Davis, USA; University College London, UK)
Article: fseaiware26main-pp005-p
TriORM: Workload-Aware Neural-Symbolic Multi-objective Optimization for ORM Mapping Design
Sasan Azizian, Ayoub Hazrati, Artin Azizian, and Elham Rastegari
(Bellevue University, USA; McGill University, Canada; Creighton University, USA)
Article: fseaiware26main-pp004-p
Neural-Symbolic Multi-objective Optimization for Performance-Aware ORM Database Design
Sasan Azizian, Ayoub Hazrati, Artin Azizian, Elham Rastegari, Hamid Bagheri, and Juan Cui
(Bellevue University, USA; McGill University, Canada; Creighton University, USA; University of Nebraska-Lincoln, USA)
Article: fseaiware26main-pp002-p
A Dataset of Agentic AI Coding Tool Configurations
Matthias Galster, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Sebastian Baltes
(Otto-Friedrich-Universität Bamberg, Germany; Ruprecht-Karls-Universität Heidelberg, Germany; Universität Bayreuth, Germany; Singapore Management University, Singapore)
Article: fseaiware26main-pp028-data-p
AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
Daniel Ogenrwot and John Businge
(University of Nevada at Las Vegas, USA)
Article: fseaiware26main-pp027-data-p
SWE-Bench+: Enhanced LLM Coding Benchmark
Haoran Xue, Reem Aleithan, Nafid Enan, Gias Uddin, and Song Wang
(York University, Canada)
Article: fseaiware26main-pp024-data-p
ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation
Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, and Xiaodong Gu
(Shanghai Jiao Tong University, China; Hohai University, China; Chongqing University, China)
Article: fseaiware26main-pp023-data-p
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges
Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek, Roland Vízner, Arie van Deursen, and Maliheh Izadi
(Delft University of Technology, Netherlands)
Article: fseaiware26main-pp022-data-p
TOGBench: A Developer-Written Multi-variant Dataset and Benchmark Suite for Test Oracle Generation
Tasfia Tasnim, Matthew B. Dwyer, and Soneya Binta Hossain
(University of Texas at Dallas, USA; University of Virginia at Charlottesville, USA)
Article: fseaiware26main-pp018-data-p
CrossCommitVuln-Bench: A Dataset of Multi-commit Python Vulnerabilities Invisible to Per-Commit Static Analysis
Arunabh Majumdar
(Independent Researcher, India)
Article: fseaiware26main-pp012-data-p
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair
Fazle Rabbi and Jinqiu Yang
(Concordia University, Canada)
Article: fseaiware26main-pp011-data-p
RustBuildEq: A Benchmark for Binary Equivalence under Build Variability
Elliott Wen, Chenye Ni, Valerio Terragni, and Jens Dietrich
(University of Auckland, New Zealand; Massey University, New Zealand)
Article: fseaiware26main-pp010-data-p
AgentTelemetry: A Fault Detection Benchmark and Toolkit for LLM Agent Observability
Krishna Chaitanya Balusu
(Facebook, USA)
Article: fseaiware26main-pp009-data-p (Artifacts Available)
SecVulEval: Context-Aware Benchmarking of LLMs for Vulnerability Detection
Md Basim Uddin Ahmed, Nima Shiri Harzevili, Jiho Shin, Hung Viet Pham, and Song Wang
(York University, Canada; Queen's University, Canada)
Article: fseaiware26main-pp008-data-p
SecMutBench: Evaluating LLM-Generated Security Tests via Mutation-Based Vulnerability Detection
Mariam ALMutairi and Chang-Tien Lu
(Virginia Polytechnic Institute, USA)
Article: fseaiware26main-pp007-data-p
JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks
Yiran Wang, José Antonio Hernández López, Ulf Nilsson, and Dániel Varró
(Linköping University, Sweden; University of Murcia, Spain)
Article: fseaiware26main-pp006-data-p (Artifacts Available)
REBench: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names
Jun Yeon Won, Xin Jin, Shiqing Ma, and Zhiqiang Lin
(Ohio State University, Columbus, USA; Meta, USA; University of Massachusetts at Amherst, USA)
Article: fseaiware26main-pp003-data-p (Artifacts Available)
