| |
Balakrishnan, Gogul
|
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|
| |
Cambronero, José P.
|
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|
| |
Chandra, Satish |
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|
| |
Coyne, Brendan |
PAgE '26: "Testing LLM-Generated Distributed ..."
Testing LLM-Generated Distributed Protocol Code
Brendan Coyne and Ankush Das
(Boston University, USA)
Article Search
Article: pldiws26pagemain-p11-p
|
| |
Das, Ankush
|
PAgE '26: "Testing LLM-Generated Distributed ..."
Testing LLM-Generated Distributed Protocol Code
Brendan Coyne and Ankush Das
(Boston University, USA)
Article Search
Article: pldiws26pagemain-p11-p
|
| |
Ioannidis, Eleftherios
|
PAgE '26: "Gradual Validation and Self-Healing ..."
Gradual Validation and Self-Healing for Agentic Programs
Theodoros Tsampouris, Eleftherios Ioannidis, and Andreas Symeonidis
(Aristotle University of Thessaloniki, Greece; Microsoft Research, USA)
Article Search
Article: pldiws26pagemain-p14-p
|
| |
Martínez, Guido
|
PAgE '26: "The Next Frontier for AI-Generated ..."
The Next Frontier for AI-Generated Kernels: Correctness
Guido Martínez and Tyler Sorensen
(Microsoft Research, USA)
Article Search
Article: pldiws26pagemain-p6-p
|
| |
Rinard, Martin C.
|
PAgE '26: "Testing, Credible Compilation, ..."
Testing, Credible Compilation, and Verification in the Axon Verified Compiler in Lean and Claude Code
Martin C. Rinard
(National University of Singapore, Singapore; Massachusetts Institute of Technology, USA)
Article Search
Article: pldiws26pagemain-p5-p
|
| |
Rondon, Pat |
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|
| |
Sistla, Meghana
|
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|
| |
Sorensen, Tyler |
PAgE '26: "The Next Frontier for AI-Generated ..."
The Next Frontier for AI-Generated Kernels: Correctness
Guido Martínez and Tyler Sorensen
(Microsoft Research, USA)
Article Search
Article: pldiws26pagemain-p6-p
|
| |
Symeonidis, Andreas |
PAgE '26: "Gradual Validation and Self-Healing ..."
Gradual Validation and Self-Healing for Agentic Programs
Theodoros Tsampouris, Eleftherios Ioannidis, and Andreas Symeonidis
(Aristotle University of Thessaloniki, Greece; Microsoft Research, USA)
Article Search
Article: pldiws26pagemain-p14-p
|
| |
Tsampouris, Theodoros
|
PAgE '26: "Gradual Validation and Self-Healing ..."
Gradual Validation and Self-Healing for Agentic Programs
Theodoros Tsampouris, Eleftherios Ioannidis, and Andreas Symeonidis
(Aristotle University of Thessaloniki, Greece; Microsoft Research, USA)
Article Search
Article: pldiws26pagemain-p14-p
|
| |
Tufano, Michele |
PAgE '26: "Towards Verified Code Reasoning ..."
Towards Verified Code Reasoning by LLMs
Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José P. Cambronero, Michele Tufano, and Satish Chandra
(University of Texas at Austin, USA; Google DeepMind, USA; Google, USA; Meta Platforms, USA)
While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. As a result of this lack of trustworthiness, the agent's answers need to be manually verified before they can be trusted, which requires substantial effort and can result in lower developer productivity. We describe a method to automatically validate the answers provided by a code reasoning agent by extracting a formal representation of the agent's response, and using formal verification and program analysis tools to verify the reasoning steps. We applied this approach to a set of 20 program equivalence queries and found that the formal verification step successfully caught 6/8 incorrect agent judgments. This work lays the foundation for what may become a new class of high-precision, verifiable code agents, paving the way for their reliable use in critical software engineering workflows.
Article Search
Article: pldiws26pagemain-p4-p
|