Programming Languages
See recent articles
Showing new listings for Monday, 1 June 2026
- [1] arXiv:2605.31389 [pdf, html, other]
-
Title: Neuroforger: certified violation witnesses for smart contracts verification via LLMsSubjects: Programming Languages (cs.PL); Cryptography and Security (cs.CR)
Recent large language models (LLMs) incorporate reasoning capabilities that allow them to perform well in predicting whether a smart contract respects a certain property, suggesting a complementary approach to traditional formal-methods-based techniques for smart contract verification. However, the application of LLMs in such context has two major issues: 1) properties expressed in natural language are intrinsically ambiguous, and 2) answers returned by LLMs have no guarantee of correctness. In this paper, we address both issues simultaneously by: 1) introducing a new formal specification language that extends Solidity with abstract types, and 2) designing a workflow that combines LLMs with type checking and concrete execution to generate and validate violation witnesses (i.e., counterexamples). The key idea is to represent a specification as a Solidity test with (existentially quantified) variables of abstract type; finding an instantiation of these variables to concrete values (of the correct type) concretizes the test into an executable counterexample (PoC) for the target property. We implemented our procedure in the tool Neuroforger, experimentally evaluating it on a smart-contract verification dataset drawn from literature, obtaining promising results that demonstrate its potential applicability in the wild.
- [2] arXiv:2605.31517 [pdf, html, other]
-
Title: Practical Algebraic Stepping with Scoped FiltersComments: 30 pages, 10 figures, 2 tablesSubjects: Programming Languages (cs.PL)
Algebraic steppers help students learn functional programming by displaying evaluation as a sequence of small-step reductions, but even simple programs produce long traces in which key ideas are buried under mundane reductions. This paper presents the filtered stepper calculus, a formal framework that gives users scoped, pattern-based control over which reduction steps are shown or hidden. Users annotate programs with lightweight filter expressions that match on the structure of redexes. Filters compose via lexical scoping so that inner filters override outer ones. We prove preservation, progress, and a simulation theorem establishing that the filtered stepper agrees with the underlying unfiltered semantics, and mechanize all proofs in Agda. We implement the calculus in the Hazel live programming environment, including its support for stepping programs with holes and type errors. To do so, we reconcile Hazel's internal environment-based evaluator with the substitution-based presentation expected in the classroom. We deploy the system in a university programming languages course. Our evaluation shows that students adopt the stepper organically, though more advanced uses of filters may require further instruction, and that instructors can use filters to craft focused traces for use in lectures.
New submissions (showing 2 of 2 entries)
- [3] arXiv:2605.30507 (cross-list from cs.PF) [pdf, other]
-
Title: A Virtual Processor brings back the Free LunchComments: 10 pages + appendix (3 pages), 7 figures, 4 benchmarks at this https URL (GitHub) or this https URL (DOI Zenodo)Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
This work introduces a self-optimizing virtual processor (VP) for numerical array programs that shifts parallelization from a manual developer task to a cooperative, agent-like runtime mechanism. Instead of relying on centralized task-graph scheduling, static compiler optimization, or explicitly annotated parallel constructs, the VP uses a decentralized network of cooperative execution segments, derived from the stream of numerical instructions and their data dependencies at runtime. Each segment makes only local decisions about when, where, and how to prepare and execute its computation, including task placement, kernel preparation, and data movement. No central scheduler or mapper instance determines the execution globally; instead, scheduling itself is parallelized and distributed over time - asynchronously and strictly dependency driven. The overall execution strategy emerges from concurrently executing local segments, continuously responding to data availability, cost estimates, system state, hardware capabilities, and problem size. While preserving the sequential semantics of the program our VP automatically exploits parallelism across large program regions rather than being limited to individual loop bodies, modules, or explicitly marked parallel sections; developers are not required to design or encode a parallelization strategy. The current VP primarily targets low-latency strong scaling on local heterogeneous hardware, covering workloads from small, latency-sensitive array operations to large data-parallel computations. The current implementation targets the predefined array instruction set of the this http URL domain-specific language, while the underlying concept is applicable to general array-based numerical programming models such as MATLAB and NumPy.
- [4] arXiv:2605.31569 (cross-list from cs.DC) [pdf, other]
-
Title: A Datalog Framework for Conflict-Free Replicated Data TypesComments: Paper presented at the 42nd International Conference on Logic Programming (ICLP 2026), Lisbon, Portugal, July 20 to July 23, 2026Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global coordination. Such collaboration requires careful design to capture the intended semantics of the concurrent interactions.
We introduce a declarative framework for specifying and reasoning about the semantics of conflict-free replicated data types (CRDTs) and CRDT-based applications in Datalog. The framework models CRDT semantics as executable logic programs over operation contexts, making concurrency explicit and compositional, and thus amenable to automated analysis. As one application, we use property-based testing to compare implementations. To the best of our knowledge, this is the first work to systematically use Datalog as a foundation for prototyping and analyzing complex CRDTs and their compositions.
We evaluate our methodology using a collaborative graph data editing case study and report experimentation results assessing correctness validation and scalability with an increasing number of operations and replicas.
Cross submissions (showing 2 of 2 entries)
- [5] arXiv:2510.03415 (replaced) [pdf, html, other]
-
Title: LLMs Lean on Priors, Not Programming Language SemanticsComments: Accepted at ICML 2026Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
Recent work asks whether large language models (LLMs) condition their reasoning on explicit rules rather than statistical regularities from pretraining. Program execution provides a canonical instance: formal semantics define behavior through symbolic transition rules that can be systematically altered under distribution shift. We investigate whether LLMs can condition their reasoning on formal semantics through program execution and introduce PLSemanticsBench, pairing featherweight C programs with two semantic systems -- small-step operational semantics and K semantics -- and probing four capabilities: composing rules for final states, selecting rules when state is unmutated, sustaining such conditioning over long traces, and following supplied rules under novel semantics. To decouple semantic reasoning from syntactic familiarity, we redefine familiar operators to induce symbol-meaning conflict and introduce novel symbols defined only through the supplied rules, and stress-test models on Human-Written, LLM-Translated, and Fuzzer-Generated splits with increasing structural complexity. Across 11 frontier LLMs, strong final-state accuracy under standard semantics (up to 90%) drops sharply -- by as much as 40--60% points -- under semantic mutations and increasing structural complexity. Only a handful of models achieve non-zero long-horizon conditioning accuracy, and even the best systems reach just 35%. Together, these results suggest that contemporary LLMs often rely on pretrained lexical associations rather than systematically conditioning on supplied formal rules. PLSemanticsBench is publicly available at this https URL.
- [6] arXiv:2405.16708 (replaced) [pdf, other]
-
Title: Higher-order bialgebraic semanticsComments: Extended and updated version of arXiv:2210.13387Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Compositionality proofs in higher-order languages are notoriously involved, and general semantic frameworks guaranteeing compositionality are hard to come by. In particular, Turi and Plotkin's bialgebraic abstract GSOS framework, which provides off-the-shelf compositionality results for first-order languages, so far does not apply to higher-order languages. In the present work, we develop a theory of abstract GSOS specifications for higher-order languages, in effect transferring the core principles of Turi and Plotkin's framework to a higher-order setting. In our theory, the operational semantics of higher-order languages is represented by certain dinatural transformations that we term (pointed) higher-order GSOS laws. We give a general compositionality result that applies to all systems specified in this way and discuss how compositionality of combinatory logics and the lambda-calculus w.r.t. a strong variant of Abramsky's applicative bisimilarity are obtained as instances.
- [7] arXiv:2412.18134 (replaced) [pdf, other]
-
Title: Learning Randomized ReductionsFerhat Erata, Orr Paradise, Thanos Typaldos, Timos Antonopoulos, ThanhVu Nguyen, Shafi Goldwasser, Ruzica PiskacComments: Accepted at ICML 2026 (Spotlight). 9 pages main text + appendixJournal-ref: Proceedings of the 43rd International Conference on Machine Learning, PMLR 306, 2026Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Programming Languages (cs.PL); Software Engineering (cs.SE)
Randomized self-reductions (RSRs) express $f(x)$ using $f$ evaluated at random correlated points, enabling self-correcting programs, instance-hiding protocols, and applications in complexity theory and cryptography. Yet discovering RSRs has required manual expert derivation for over 40 years, limiting their practical use. We present Bitween for automated RSR learning. First, we formalize RSR learning with sample complexity analysis under correlated sampling. Second, we develop Vanilla Bitween, which integrates multiple backends (linear regression, genetic programming, symbolic regression, and mixed-integer programming). The linear regression backend outperforms the others, discovering RSRs for 43 of 80 functions (54%) in RSR-Bench, our benchmark suite, including the first known reduction for sigmoid. Third, we introduce Agentic Bitween, a neuro-symbolic approach where LLM agents propose novel query functions beyond the fixed set ($x+r$, $x-r$, $x \cdot r$, $x$, $r$) in prior work. Agentic Bitween discovers RSRs for 64 of 80 functions (80%), outperforming pure neural baselines in both RSR discovery and verification accuracy.
- [8] arXiv:2502.04671 (replaced) [pdf, html, other]
-
Title: ProofWala: A Framework for Multilingual Proof Data Synthesis and Theorem-ProvingSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Neural approaches to theorem proving require robust infrastructure for interfacing with interactive theorem provers (ITPs), extracting structured proof data, and executing proof search at scale. However, existing tooling is often assistant-specific and oriented toward file-level execution, making repository-scale analysis and parallel experimentation challenging. We present ProofWala, a multilingual proof engineering framework built around \texttt{itp-interface}, a reusable library for programmatic interaction with ITPs. For Lean 4, we implement a meta-programmed interaction layer executing inside the elaborator, enabling semantically faithful tactic-level tracing alongside declaration- and dependency-level extraction across entire repositories. This design extends beyond traditional REPL-style interaction by supporting project-wide analysis, environment cloning, and pooled execution of proof states. The same interface abstraction supports multiple versions of Rocq, yielding a unified cross-assistant pipeline.
Built on this infrastructure, ProofWala provides standardized multilingual proof datasets, model training utilities, and parallel proof search algorithms. Using the framework, we demonstrate that multilingual training across Lean and Rocq enables cross-lingual and cross-domain transfer. We observe statistically significant improvements on Lean Mathlib and in domain adaptation (CategoryTheory), while other settings exhibit consistent upward trends. We open-source the full framework, parallel proof search module, datasets, and models across two repositories: ProofWala (this https URL) and the itp-interface library (this https URL). - [9] arXiv:2510.05115 (replaced) [pdf, html, other]
-
Title: SAC-Opt: Semantic Anchors for Iterative Correction in Optimization ModelingComments: ICML 2026 acceptedSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Large language models (LLMs) have opened new paradigms in optimization modeling by enabling the generation of executable solver code from natural language descriptions. Despite this promise, existing approaches typically remain solver-driven: they rely on single-pass forward generation and apply limited post-hoc fixes based on solver error messages, leaving undetected semantic errors that silently produce syntactically correct but logically flawed models. To address this challenge, we propose SAC-Opt, a backward-guided correction framework that grounds optimization modeling in problem semantics rather than solver feedback. At each step, SAC-Opt aligns the original semantic anchors with those reconstructed from the generated code and selectively corrects only the mismatched components, driving convergence toward a semantically faithful model. This anchor-driven correction enables fine-grained refinement of constraint and objective logic, enhancing both fidelity and robustness without requiring additional training or supervision. Empirical results on seven public datasets demonstrate that SAC-Opt improves average modeling accuracy by 7.7%, with gains of up to 21.9% on the ComplexLP dataset. These findings highlight the importance of semantic-anchored correction in LLM-based optimization workflows to ensure faithful translation from problem intent to solver-executable code.
- [10] arXiv:2604.18587 (replaced) [pdf, html, other]
-
Title: Compile to Compress: Boosting Formal Theorem Provers by Compiler OutputsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured failure modes. We introduce a learning-to-refine framework that leverages this compression to perform efficient learning and proof exploration. We perform tree search that corrects errors locally conditioned on explicit verifier feedback, thereby circumventing the costs associated with accumulating a long history of proof attempts. Extensive evaluations show that our method consistently amplifies the reasoning capabilities of base provers across varying scales. Notably, our approach achieves state-of-the-art performance on PutnamBench among publicly reported $\sim$8B and $\sim$32B parameter models under comparable test-time budgets, offering a scalable paradigm for next-generation verifier-guided reasoning.