SecDev 2026 – Proceedings

Formal Is Fast: Cryptographic Code in the Age of AI (Keynote)
Roderick Chapman
(Amazon Web Services, UK)
This talk will go over our approach to the development and verification of post-quantum cryptographic code at AWS. It will cover our approach to assembly-language verification, and how we are verifying C code within AWS LibCrypto. Proof also enables “fearless optimization” of crypto code, where proofs of correctness and/or equivalence preserve functional behaviour while allowing and inspiring non-trivial performance improvements. We’ll go on to talk about how AI agents are transforming our productivity and developer engagement without compromising our stratospheric quality bar. Our approach combines automated reasoning guardrails that constrain AI behaviour to known-good outcomes with aggressive use of agents to find proofs and optimizations of our most critical code.

Publisher's Version Article: fsesecdev26main-key1-p doi:10.1145/3805773.3828609

Mapping and Bridging the Software Understanding Gap (Keynote)
Sergey Bratus
(Dartmouth College, USA)
The gap between our capabilities to build software and to understand what we’ve built, reason about it, and anticipate its emergent behaviors, is tremendous. The joint “Closing the Software Understanding Gap” memorandum by US Government agencies recognized addressing this gap as a national priority. I will argue that the keys to bridging this gap lie in rethinking ostensibly mere-engineering tasks as truly first-class computer science challenges; changing the formats in which code and data are delivered based on this new understanding; and applying strong predictive theories of software’s emergent behaviors (typically witnessed via ‘hacking’ or exploitation) to all stages of software construction, delivery, and operation.

Publisher's Version Article: fsesecdev26main-key2-p doi:10.1145/3805773.3828610

Research Papers

SGX-MB: A Secure Framework for Middleboxes Leveraging Intel SGX
Mahmoud Hofny, Lianying Zhao, and Amr Youssef
(Concordia University, Canada; Carleton University, Canada)
Enterprises deploy middleboxes, such as firewalls, content filters, and intrusion detection systems, for security and policy enforcement. However, secure communication protocols, such as TLS, prevent middleboxes from accessing plaintext traffic, which hinders their functionality. On the other hand, involving middleboxes in the secure TLS channel (e.g., via TLS interception) exposes users’ communication to threats of manipulation and disclosure.
In this paper, we present a secure middlebox framework, SGX-MB, which avoids runtime access to plaintext traffic by any party by running a TLS interception proxy within Intel Software Guard Extensions (SGX). SGX-MB’s components, including TLS interception and middlebox functions (e.g., deep packet inspection), run inside SGX to securely process the entire content based on predefined rules. SGX-MB is compatible and interoperable with existing TLS implementations. It can be enabled by installing the SGX-attested certificate of the interception proxy as a root Certificate Authority (CA) on client machines to ensure only the desired trusted middlebox is in use. We developed a proof-of-concept implementation of SGX-MB and tested it with two use cases, HTTP filtering and an Intrusion Detection System (IDS). The experimental results demonstrate that SGX-MB introduces an acceptable overhead to the end-to-end TLS communication latency.

Publisher's Version Article: fsesecdev26main-p7-p doi:10.1145/3805773.3805992

Origin Story: A Comprehensive Lifecycle Analysis of Same-Origin Policy Bugs
Jakub Szymsza, Gertjan Franken, Vik Vanderlinden, Tom Van Goethem, Mathy Vanhoef, and Lieven Desmet
(DistriNet at KU Leuven, Belgium)
The Same-Origin Policy (SOP) serves as the web's fundamental security mechanism, isolating resources between different web origins to prevent malicious cross-site interactions. Despite its critical role, the SOP lacks a unified formal specification and has evolved informally over decades, resulting in a fragmented implementation landscape prone to bypasses and inconsistencies. While prior research has examined specific SOP flaws, no study has systematically analyzed how these bugs are introduced and resolved throughout their lifecycle. We address this gap by conducting the first comprehensive lifecycle analysis of 97 SOP security bugs across Chromium and Firefox, enabled by our extensions to BugHog—a framework previously used on Content Security Policy (CSP) bugs.
Our findings reveal that 94% of all SOP bugs stem from non-security code revisions, such as functional bug fixes or the addition of new web features, which contrasts sharply with previously identified CSP bug causes. This, compounded by inconsistent bug labeling, cross-vendor disagreement on what constitutes a SOP bug, and the vast scope of the SOP, makes it exceptionally difficult for developers to foresee the security implications of their changes. The severity of this issue is further underscored by our discovery of three publicly disclosed vulnerabilities that remained active across the latest versions of Chromium, Firefox, and Safari. Beyond showing that bug patterns are not generalizable across different browser security policies, we propose actionable recommendations for standardization and improved bug categorization.

Publisher's Version Article: fsesecdev26main-p9-p doi:10.1145/3805773.3805993

CFIghter: Automated Control-Flow Integrity Enablement and Evaluation for Legacy C/C++ Systems
Sabine Houy, Bruno Kreyssig, and Alexandre Bartel
(Umeå University, Sweden)
Compiler-based Control-Flow Integrity (CFI) offers strong forward-edge protection but remains challenging to deploy in large C/C++ software due to visibility mismatches, type inconsistencies, and unintended behavioral failures. We present CFIghter, the first fully automated system that enables strict, type-based CFI in real-world projects by detecting, classifying, and repairing unintended policy violations exposed by the test suite. CFIghter integrates whole-program analysis with guided runtime monitoring and iteratively applies the minimal necessary adjustments to CFI enforcement only where required, stopping once all tests pass or remaining failures are deemed unresolvable. We evaluate CFIghter on four GNU projects. It resolves all visibility-related build errors and automatically repairs 95.8% of unintended CFI violations in the large, multi-library util-linux codebase, while retaining strict enforcement at over 89% of indirect control-flow sites. Across all subjects, CFIghter preserves strict type-based CFI for the majority of the codebase without requiring manual source-code changes, relying only on automatically generated visibility adjustments and localized enforcement scopes where necessary. These results show that automated compatibility repair makes strict compiler CFI practically deployable in mature, modular C software.

Publisher's Version Article: fsesecdev26main-p10-p doi:10.1145/3805773.3805994

ASN1spect: Uncovering ASN.1 Compiler-Generated Vulnerabilities in Critical Infrastructure
Seaver Thorn, Nathaniel Bennett, Kevin Butler, Patrick Traynor, and William Enck
(North Carolina State University, USA; University of Florida, USA)
ASN.1 is widely used for communication protocols in critical infrastructure. Many projects avoid parsing vulnerabilities by using ASN.1 compilers to automatically generate parsing code directly from complex protocol specifications. However, ASN.1 compilers can themselves have vulnerabilities, propagating vulnerable parsing routines to projects that use them. This paper proposes ASN1spect, a novel approach to identifying known vulnerable code generated by vulnerable ASN.1 compilers. Our analysis of vulnerabilities related to ASN.1 type constraints shows that known and silently fixed vulnerabilities propagate to actively maintained downstream projects and remain undetected for years. While our primary focus is on silently fixed supply-chain vulnerabilities, our analysis also uncovered two previously unknown vulnerabilities in asn1c. We apply ASN1spect to 93 open-source projects containing asn1c-generated parsing code and detect vulnerable parsing code in 40 (43%). We further demonstrate proof-of-concept payloads that can cause denial-of-service and logic vulnerabilities in electrical grid and satellite communication projects. These results motivate managing code generators such as ASN.1 compilers as versioned components in the software supply chain.

Publisher's Version Article: fsesecdev26main-p11-p doi:10.1145/3805773.3805995

SoK: A Modularized Framework for Symbolic Execution and Application for Usable Tool Design
James Mattei, Andrew Lin, Jasper Geer, Jie Hu, Moritz Schloegel, Tiffany Bao, and Daniel Votipka
(Tufts University, USA; University of British Columbia, Canada; Arizona State University, USA; CISPA Helmholtz Center for Information Security, Germany)
Symbolic Execution (SE) is an important and foundational software testing technique that has grown and evolved in its use over the decades. Prior work has cataloged this evolution, but this paper seeks to identify opportunities to go beyond existing designs and push forward the boundaries of its use by breaking down critical components of SE and outlining current approaches to each. To this end, we performed a systemization of 225 SE papers from the last 15 years to identify common design patterns and use cases. From this review, we distill five distinct modules of the SE architecture and discuss current implementations for each. This division of SE into modules can highlight opportunities for future improvements to SE by helping research focus on individual components. To demonstrate the modules' utility, we use them to identify the changes each module needs to improve SE usability, building on a second systemization of 66 papers containing insights about tooling usability.

Publisher's Version Article: fsesecdev26main-p17-p doi:10.1145/3805773.3805996

On the Variability of Source Code in Maven Package Rebuilds
Jens Dietrich and Behnaz Hassanshahi
(Victoria University of Wellington, New Zealand; Oracle, Australia)
Rebuilding packages from open source is a common practice to improve the security of software supply chains, and is now done at an industrial scale. The basic principle is to acquire the source code used to build a package published in a repository such as Maven Central (for Java), rebuild the package independently with hardened security, and publish it in some alternative repository. In this paper we test the assumption that the same source code is being used by those alternative builds. To study this, we compare the sources released with packages on Maven Central, with the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We investigate the causes of non-equivalence, and find that the main cause is build extensions that generate code at build time, which are difficult to reproduce. We suggest strategies to address this issue.

Publisher's Version Article: fsesecdev26main-p19-p doi:10.1145/3805773.3805997

At the Precipice of Integrity Protection using Pointer Authentication
Viorel Preoteasa, Carlos Chinea Pérez, Hans Liljestrand, and Jan-Erik Ekberg
(Huawei Technologies, Finland)
Despite more than a decade of availability in CPUs, the adoption of fine-grained hardware-assisted memory safety in commercial platforms and computing systems has been slow and is today mainly available in closed mobile ecosystems. Likely reasons for this are impractical requirements on development and the emergence of compatibility issues between code that leverages the security hardware assistance and code that does not.
This work presents and implements a compile- and build-time compatibility solution for ARM Pointer Authentication (PA) that allows gradual introduction of PA-enabled code into a software stack that does not include such instrumentation from the outset. We highlight the compilation and build issues that need to be resolved in such a setting, and introduce a build methodology that automates much of this process. The methodology as presented has been successfully applied to the HarmonyOS mobile browser.

Publisher's Version Article: fsesecdev26main-p21-p doi:10.1145/3805773.3805998

Adversarially Mixed Secret Key Generation for Side-Channel Defense for the Cloud
Venkat Sai Suman Lamba Karanam, Zahmeeth Sayed Sakkaff, and Pasindu Balasooriyalage
(Bowling Green State University, USA; West Virginia University, USA)
Cryptographic key generation in virtualized and cloud environments is vulnerable to side-channel attacks that exploit shared resources via co-residency to infer information from entropy sources. In this work, we propose adversarially mixed secret key generation, which augments standard Operating System (OS) generated system entropy with entropy generated from an adversarial model.
We use fast adversarial perturbation methods (FGSM and PGD) to extract entropy from the internal gradient dynamics of a neural network model. These adversarial perturbations are mixed with OS-provided entropy to generate secret keys, which are harder to distinguish or predict under side-channel analyses. We evaluate our method on the Chameleon Cloud and the National Research Platform (NRP) testbeds to emulate an entropy-starved virtualized environment and potential side-channel exposure. Experimental results show that secret keys generated with adversarial entropy mixing are significantly harder to detect by co-resident side-channel analyses. Our findings point to a promising direction where feeding adversarially generated entropy into a cryptographic mixing function yields non-deterministic, non-replayable entropy that remains opaque to co-residents.
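The mixing step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the mixing function or how perturbations are serialized, so everything here is an assumption made for illustration: `fgsm_perturbation` and `mix_entropy` are hypothetical names, the gradient is a hard-coded placeholder rather than the output of a real model, and the mixer is an HKDF-style extract-then-expand built from HMAC-SHA-256.

```python
import hashlib
import hmac
import os
import struct


def fgsm_perturbation(grad, epsilon=0.01):
    """FGSM step: epsilon * sign(gradient), one value per input feature."""
    return [epsilon * ((g > 0) - (g < 0)) for g in grad]


def mix_entropy(os_entropy: bytes, perturbation, length: int = 32) -> bytes:
    """Mix OS entropy with model-derived bytes (HKDF-style, HMAC-SHA-256)."""
    if not 0 < length <= 32:
        raise ValueError("this sketch expands a single 32-byte block only")
    # Serialize the perturbation as little-endian doubles (an assumption).
    model_bytes = struct.pack(f"<{len(perturbation)}d", *perturbation)
    # Extract: the OS entropy keys the HMAC over the model-derived bytes.
    prk = hmac.new(os_entropy, model_bytes, hashlib.sha256).digest()
    # Expand: one HKDF-style block with a hypothetical context label.
    okm = hmac.new(prk, b"secret-key\x01", hashlib.sha256).digest()
    return okm[:length]


# Placeholder loss gradient with respect to the input (not from a real model).
grad = [0.3, -1.2, 0.0, 0.7]
delta = fgsm_perturbation(grad)
key = mix_entropy(os.urandom(32), delta)
```

Keying the HMAC with the OS entropy is a deliberate choice in this sketch: under the usual PRF assumption on HMAC, a predictable model contribution should not make the result weaker than the OS source alone.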

Publisher's Version Article: fsesecdev26main-p24-p doi:10.1145/3805773.3805999

RepliGuard: Policy-Driven Replica Management Framework for Protecting against Acoustic Attacks
Jennifer Sheldon, Yungwoo Ko, Sri Hrushikesh Varma Bhupathiraju, Sara Osmanovic, Weidong Zhu, Md Jahidul Islam, and Sara Rampazzi
(University of Florida, USA; Florida International University, USA)
With the rapid expansion of data center infrastructure driven by the AI boom and the growing energy demands, companies have begun deploying underwater data centers, leveraging water’s natural cooling properties. While promising in terms of sustainability and efficiency, underwater environments introduce unique security vulnerabilities, particularly susceptibility to acoustic attacks targeting storage systems. Proposed defense strategies are limited to disk-level detection, while system-level mitigations remain absent. In this work, we present RepliGuard, a novel vulnerability-aware policy-driven replica management framework integrated with CockroachDB. Our system combines OS-level detection using Extended Berkeley Packet Filter (eBPF) probes with a policy-driven replica relocation scheme to mitigate the attack impact without requiring low-level disk access or sacrificing node functionality. Our LSTM-based model achieves a true positive rate of 99.4% and a true negative rate of 99.1% when detecting subtle acoustic attacks against hybrid SSD cache-based architectures, and takes approximately 0.5 seconds to detect subtle attacks with an inference latency of 0.27 ms per prediction, compared to the 30-second detection of previous works. Although CockroachDB’s automatic replica rebalancing scheme fails to mitigate the attack and instead increases the cluster’s P99 SQL service latency by 215% in our testbed, our proposed mitigation dynamically reassigns replicas to unaffected nodes, reducing the attack impact by approximately 83% while preserving node functionality and activating within 2 seconds. This work demonstrates the first vulnerability-aware, system-level strategy to ensure resiliency against subtle acoustic attacks.

Publisher's Version Published Artifact Artifacts Available Article: fsesecdev26main-p25-p doi:10.1145/3805773.3806000

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
Thushari Hapuarachchi and Kaiqi Xiong
(University of South Florida, USA)
Deep Neural Network (DNN) training increasingly uses unauthorized data, which has become a serious issue. A clean-label generalization attack, a type of data poisoning attack, has been suggested to address this issue. The Neural Tangent Generalization Attack (NTGA) is considered the first well-known clean-label generalization attack under the black-box setting, and it provided an unprecedented step in data protection approaches. In this paper, we conduct a comprehensive analysis of the state of the art of NTGA; to the best of our knowledge, this is the first thorough analysis regarding NTGA. First, we provide a classification of attacks against DNNs with their explanations and relations to NTGA. Then, this paper presents a taxonomy of black-box attacks and demonstrates that the NTGA is the first clean-label generalization attack under the black-box setting. We further analyze the existing studies of NTGA and give a comprehensive comparison of their findings by conducting our own experiments to verify these findings. Moreover, our extensive experiments show that NTGA is vulnerable to adversarial training and image transformations, and applying linear separability to NTGA-generated images makes them more susceptible to such vulnerabilities. We present the pros and cons of NTGA and suggest ways to improve NTGA robustness based on our analysis. Our further experiments indicate that several recently proposed clean-label generalization attacks outperform NTGA on data protection. Finally, we unveil the necessity of further research with future research insights on NTGA.

Publisher's Version Article: fsesecdev26main-p26-p doi:10.1145/3805773.3806001

A Technology-Readiness Evaluation of Private Set Intersection
Wout Ceulemans, Pieter Philippaerts, Dimitri Van Landuyt, and Wouter Joosen
(KU Leuven, Belgium)
Private Set Intersection (PSI) refers to a class of privacy-preserving techniques to allow multiple data holders to collaboratively identify their shared data records without revealing any other information or dataset properties (set sizes, or data distributions). As a form of Secure Multi-Party Computation, PSI has been under active research for over two decades, and a wide range of techniques have emerged. This technology class represents a compelling means to implement collaborative data processing and analytics of highly sensitive data.
In this paper, we survey and study the existing PSI literature through the lens of technological readiness. We refine the 9-level Technology Readiness Level (TRL) scale and apply it to the PSI approaches and implementations from both academic and grey literature sources, including open source implementations and commercial service offerings. Our results highlight a significant TRL gap between academic PSI research and practical adoption. Although interest from practitioners is growing, real-world deployments remain rare and often lack technical documentation, while academic protocols seldom progress beyond controlled experiments. Moreover, the added complexity of integrating PSI into larger systems is poorly addressed, and current literature provides limited threat analysis for operational environments.
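For readers unfamiliar with PSI, the core idea can be sketched with a toy two-party protocol based on commutative blinding (the classic Diffie-Hellman-style construction). This is not taken from the paper and is not one of the surveyed implementations. The modulus, the hash-to-group mapping, and the function names are illustrative assumptions, the parameters are far too weak for real use, and this simple variant reveals set sizes, which stronger PSI protocols aim to hide.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). Illustrative only, not a secure group choice.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero residue modulo P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1


def blind(items, secret):
    """Hash each item and raise it to the party's secret exponent."""
    return [pow(hash_to_group(x), secret, P) for x in items]


def reblind(values, secret):
    """Apply a second secret exponent to already-blinded values."""
    return [pow(v, secret, P) for v in values]


def psi(alice_items, bob_items):
    """Simulate both parties in one process and return Alice's view."""
    a = secrets.randbelow(P - 2) + 1   # Alice's secret exponent
    b = secrets.randbelow(P - 2) + 1   # Bob's secret exponent
    alice_blinded = blind(alice_items, a)        # sent Alice -> Bob
    bob_blinded = blind(bob_items, b)            # sent Bob -> Alice
    alice_double = reblind(alice_blinded, b)     # Bob returns these in order
    bob_double = set(reblind(bob_blinded, a))    # computed locally by Alice
    # H(x)^(ab) matches on both sides exactly when the underlying items match.
    return [x for x, v in zip(alice_items, alice_double) if v in bob_double]


shared = psi(
    ["alice@example.com", "bob@example.com", "carol@example.com"],
    ["bob@example.com", "dave@example.com", "carol@example.com"],
)
```

Because exponentiation commutes, each party only ever sees the other's items under an exponent it does not know, yet both can compare the doubly blinded values.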

Publisher's Version Article: fsesecdev26main-p32-p doi:10.1145/3805773.3806002

SafeAIMerge: A Tool for Integrating DAST and LLM-Generated Security Feedback into GitHub Actions Workflows
Arpit Thool, Justin Smith, and Chris Brown
(Virginia Tech, USA; Lafayette College, USA)
Continuous integration and delivery (CI/CD) emphasizes automated workflows and rapid deployment, yet the integration of security practices into these workflows remains a persistent challenge. For instance, traditional Dynamic Application Security Testing (DAST) tools are effective for identifying web application vulnerabilities, yet these complex systems are difficult to integrate in CI/CD workflows and often produce lengthy and complex reports that are difficult to understand. To address this gap, we present SafeAIMerge, a tool that integrates DAST within GitHub pull requests by leveraging LLMs to generate actionable and developer-friendly security alert summaries. We conducted a two-phase study to motivate and evaluate SafeAIMerge. First, we distributed a formative survey to capture developers’ initial perceptions of our tool (𝑛 = 46). We observed favorable perceptions and received valuable feedback to enhance our tool in real-world settings. We refined our tool, then conducted a user study (𝑛 = 12) to assess SafeAIMerge in debugging tasks. We found SafeAIMerge reduces perceived developer workload and enhances vulnerability remediation compared to baseline DAST workflows. Based on our findings, we provide insights to improve the integration of DAST practices into modern development pipelines. SafeAIMerge is accessible on GitHub (github.com/arpitthool/SafeAIMerge).

Publisher's Version Article: fsesecdev26main-p35-p doi:10.1145/3805773.3806003

Reality Check: Independent Evaluation of Modern Grey-Box Fuzzing Techniques
Pavel Frolikov
(University of California at Irvine, USA)
Coverage-guided grey-box fuzzing has emerged as a practical and widely-adopted technique for automated software testing. The topic has been quite popular, with numerous new techniques, each claiming significant improvements over the established baselines.
This paper presents an independent evaluation of six state-of-the-art grey-box fuzzers across seven widely-used benchmarks. Our evaluation shows that AFL++, LibAFL, and Predictive Context-Sensitive Fuzzing achieve significant improvements over AFL (97%, 97%, and 95% vs 93% mean performance), while Darwin and MOPT show no meaningful improvement over AFL (92% each), and EcoFuzz consistently underperforms at 88%. We find that improvements are highly target-dependent, with complex targets showing the largest performance variations (62-99% across fuzzers) while simpler targets show convergence.

Publisher's Version Article: fsesecdev26main-p36-p doi:10.1145/3805773.3806004

Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation
Priscilla Kyei Danso, Mohammad Saqib Hasan, Niranjan Balasubramanian, and Omar Chowdhury
(Stony Brook University, USA)
Propositional Linear Temporal Logic (LTL) is a popular formalism for specifying desirable requirements and security and privacy policies for software, networks, and systems. Yet expressing such requirements and policies in LTL remains challenging because of its intricate semantics. Since many security and privacy analysis tools require LTL formulas as input, this difficulty places them out of reach for many developers and analysts. Large Language Models (LLMs) could broaden access to such tools by translating natural language fragments into LTL formulas. This paper evaluates that premise by assessing how effectively several representative LLMs translate assertive English sentences into LTL formulas. Using both human-generated and synthetic ground-truth data, we evaluate effectiveness along syntactic and semantic dimensions. The results reveal three findings: (1) in line with prior findings, LLMs perform better on syntactic aspects of LTL than on semantic ones; (2) they generally benefit from more detailed prompts; and (3) reformulating the task as a Python code-completion problem substantially improves overall performance. We also discuss challenges in conducting a fair evaluation on this task and conclude with recommendations for future work.
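The gap between syntactic and semantic correctness can be illustrated with a small sketch. It is not the paper's evaluation harness: the tuple encoding, the `holds` function, and the finite-trace semantics are simplifying assumptions made here (LTL is normally interpreted over infinite traces). Two well-formed formulas for "every request is eventually granted" are checked against the same trace, and only one captures the intended meaning.

```python
def holds(formula, trace, i=0):
    """Evaluate a formula at position i of a finite trace (a list of sets)."""
    op, *args = formula
    if op == "ap":        # atomic proposition
        return args[0] in trace[i]
    if op == "implies":
        return (not holds(args[0], trace, i)) or holds(args[1], trace, i)
    if op == "X":         # next: must hold at the following position
        return i + 1 < len(trace) and holds(args[0], trace, i + 1)
    if op == "F":         # eventually: holds at some position from i onward
        return any(holds(args[0], trace, j) for j in range(i, len(trace)))
    if op == "G":         # globally: holds at every position from i onward
        return all(holds(args[0], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")


request, grant = ("ap", "request"), ("ap", "grant")
eventually_granted = ("G", ("implies", request, ("F", grant)))  # G(request -> F grant)
granted_next_step = ("G", ("implies", request, ("X", grant)))   # G(request -> X grant)

# A request at step 0 that is granted two steps later.
trace = [{"request"}, set(), {"grant"}]

print(holds(eventually_granted, trace))  # True
print(holds(granted_next_step, trace))   # False
```

Both formulas parse and use the right propositions, so a purely syntactic check would accept either; only evaluation against traces separates them.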

Publisher's Version Article: fsesecdev26main-p38-p doi:10.1145/3805773.3806005

Practitioner Papers

A CNN-LSTM Security Model for SCADA Network
Olga Dye and Brian Dye
(University of Texas at Dallas, USA; University of Louisiana at Lafayette, USA)
This study proposes a lightweight neural network model to detect malicious payloads in ICS/SCADA network traffic to support zero-trust principles. We designed the model with two convolutional layers and one LSTM layer, trained on packet payloads from the 4SICS dataset. To ensure the model learned to correctly distinguish between attack and benign data, we stripped IP and MAC addresses and timestamps from packets. We performed shuffling, sorting, and interleaving of the payloads to ensure a balanced class distribution. In contrast to prior research that transforms packet payloads into images, we developed a method allowing the reshaping of packet data into 2D 8-bit numerical arrays. For quality assurance, we evaluated the model's performance on a reserved portion of the dataset. As a result, the model accurately differentiated benign from attack traffic, demonstrated by an F1 score of 0.99. Our approach minimizes storage and processing requirements, enabling real-time deep packet inspection in ICS/SCADA networks, thereby supporting implementation of zero-trust security of legacy components that are insecure by design.
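The reshaping of payload bytes into 2D 8-bit arrays might look like the following minimal sketch. The abstract does not give the array dimensions or say how short or long payloads are handled, so the 32x32 default, the zero-padding, the truncation, and the name `payload_to_grid` are all assumptions made for illustration.

```python
def payload_to_grid(payload: bytes, rows: int = 32, cols: int = 32):
    """Truncate or zero-pad a payload to rows*cols bytes, then reshape row-major.

    Returns a list of rows; every cell is an int in the range 0..255.
    """
    size = rows * cols
    flat = payload[:size].ljust(size, b"\x00")   # truncate, then pad with zeros
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# A 5-byte payload reshaped into a 2x4 grid; the tail is zero-padded.
grid = payload_to_grid(b"\x01\x02\x03\x04\x05", rows=2, cols=4)
print(grid)  # [[1, 2, 3, 4], [5, 0, 0, 0]]
```

Keeping the raw byte values avoids the image-encoding step mentioned in the abstract, and a fixed grid size gives the convolutional layers a constant input shape.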

Publisher's Version Article: fsesecdev26pract-p1-p doi:10.1145/3805773.3806637

Augment Mutual TLS Authentication with HW Rooted Identity: Simplified Device Lifecycle and Interoperability
Dhananjay Phadke and Xiling Sun
(Microsoft Corporation, USA)
Control-plane devices in cloud infrastructure, namely Baseboard Management Controllers (BMCs), Rack Management Controllers (RMCs), and SmartNICs/DPUs within a rack, are high-value targets that traditionally rely on password-based authentication, as demonstrated by CVE-2024-54085, a critical unauthenticated Redfish access bypass. This paper presents a practitioner architecture that replaces passwords with hardware-rooted certificate-based mutual TLS authentication. Built on DICE-style identity derivation and a minimal operator PKI, this design simplifies rack provisioning and device recommissioning after repairs using a zero-touch enrollment mechanism. We identify practitioner insights from the design process and the conditions under which the model applies.
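DICE-style identity derivation can be sketched as follows. This shows one common construction, in which a Compound Device Identifier (CDI) is derived from a Unique Device Secret (UDS) and a measurement of the first mutable code. It is not the paper's implementation: the function names, the label, and the placeholder secret are hypothetical, and a real device would perform this inside hardware or early boot code and then derive an asymmetric key pair and certificate from the seed.

```python
import hashlib
import hmac


def derive_cdi(uds: bytes, firmware_image: bytes) -> bytes:
    """CDI = HMAC-SHA-256(UDS, SHA-256(firmware)), a common DICE-style form."""
    measurement = hashlib.sha256(firmware_image).digest()
    return hmac.new(uds, measurement, hashlib.sha256).digest()


def derive_identity_seed(cdi: bytes, label: bytes = b"device-identity") -> bytes:
    """Derive a seed for the device identity key from the CDI (label is hypothetical)."""
    return hmac.new(cdi, label, hashlib.sha256).digest()


uds = bytes(32)  # placeholder; a real UDS is a per-device hardware secret
seed_v1 = derive_identity_seed(derive_cdi(uds, b"firmware v1"))
seed_v1_again = derive_identity_seed(derive_cdi(uds, b"firmware v1"))
seed_v2 = derive_identity_seed(derive_cdi(uds, b"firmware v2"))
```

The same device secret and firmware always reproduce the same seed, while any change to the measured firmware yields a different one, which is what lets a verifier tie an identity to both the hardware and the code it booted.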

Publisher's Version Article: fsesecdev26pract-p3-p doi:10.1145/3805773.3806639

Cloud Safety: A Hardware Perspective
Raghudeep Kannavara, Matthew Dickinson, and Monty Wiseman
(Oracle, USA)
Cloud safety discussions have largely focused on software rather than hardware (HW), even though hardware design choices, firmware (FW) behavior, and platform integration decisions can drive severe customer impact. These include service interruption, silent corruption, failed attestation, unsafe recovery, and loss of access to hardware-bound secrets. At scale, even narrow design or operational mistakes can propagate across the fleet and manifest as broad platform events. This paper examines cloud safety through the lens of hardware hazards, focusing on patterns of failure rather than exhaustively classifying individual issues. Effective mitigation typically requires layered controls across hardware design, firmware, platform software, and operational processes, rather than a single corrective mechanism. This paper makes three contributions: (1) the case for treating hardware safety as a distinct engineering concern alongside security and reliability, (2) a mitigation-oriented framework spanning design, operations, operator, and customer-centric hazards, and (3) illustrative cases grounded in real-world cloud infrastructure scenarios.

Publisher's Version Article: fsesecdev26pract-p4-p doi:10.1145/3805773.3806640

OpenClaw RedTeam Recon: A Local OSS-LLM-Powered Autonomous Reconnaissance Agent
Marcelo Garcia and Robson de Oliveira Albuquerque
(Federal Fluminense University, Brazil; Catholic University of Brasilia, Brazil)
Red team reconnaissance remains largely manual, as existing tools lack autonomous decision-making. Abundant empirical knowledge exists on conducting reconnaissance, suggesting that well-orchestrated autonomous AI agents, guided by this knowledge and structured methodologies, could perform effectively, particularly in initial reconnaissance and vulnerability assessment. Our architecture uses an OpenClaw agent running on a dedicated Linux machine (for security isolation), coupled with locally installed open-source LLMs served via Ollama on a GPU-equipped Linux machine (Nvidia's RTX 4060 Ti with 16GB VRAM). We selected OpenAI’s Gpt-oss-20B (20 billion parameters, mixture-of-experts architecture) and Alibaba’s Qwen3.5-9B (9 billion parameters) because they fit entirely in our GPU memory without paging, avoiding performance degradation from memory swapping.
This exploratory study aims to contribute technical insights into the practical application of controlled agentic software powered by local open-source LLMs within security automation contexts, with findings intended to inform future research directions in autonomous vulnerability assessment systems.

Publisher's Version Article: fsesecdev26pract-p5-p doi:10.1145/3805773.3806641
