[10] viXra:2605.0120 [pdf] submitted on 2026-05-31 18:47:17
Authors: Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang
Comments: 9 Pages.
Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions. As a result, downstream reasoning may depend on failed attempts, dead ends, and private scratch work that cannot safely be relied on later. We recast this phenomenon as a new training objective, state commitment learning: training models to explicitly distinguish information that should be committed as persistent state from temporary computation that can be discarded. We define a counterfactual criterion, persistent-state sufficiency, which makes it both measurable and trainable whether an answer remains usable after hidden thoughts are erased. We then propose Counterfactual Erasure RL (CERL), which evaluates, under the same prefix, both a path that keeps hidden thoughts and a path that erases them, and gives reward only when the erasure path remains correct. We also introduce the Erasure Dependence Protocol and show across mathematics, long-chain logic, scientific QA, and multi-turn tool-use evaluation that CERL substantially reduces answer dependence on hidden thoughts without sacrificing accuracy, consistently outperforming correctness-only RL and long-answer SFT baselines.
Category: Artificial Intelligence
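As an illustration for entry [10], the reward rule stated in the abstract (reward only when the erasure path remains correct) might be sketched as below. This is a minimal sketch, not the authors' implementation: `cerl_reward`, `keep_correct`, and `erase_correct` are hypothetical names, and the binary 0/1 reward scale is an assumption.

```python
def cerl_reward(keep_correct: bool, erase_correct: bool) -> float:
    """Hypothetical CERL-style reward for one prompt.

    keep_correct:  answer was correct when hidden thoughts stayed in context.
    erase_correct: answer was correct after hidden thoughts were erased.
    """
    # Assumption: only the erasure path gates the reward, as the abstract
    # states; keep_correct is accepted for bookkeeping but does not affect it.
    return 1.0 if erase_correct else 0.0


# Outcomes for three hypothetical rollouts sharing the same prefix.
rollouts = [
    {"keep_correct": True, "erase_correct": True},
    {"keep_correct": True, "erase_correct": False},  # answer leaned on scratch work
    {"keep_correct": False, "erase_correct": False},
]
rewards = [cerl_reward(r["keep_correct"], r["erase_correct"]) for r in rollouts]
print(rewards)
```

The second rollout is the case the abstract targets: the answer is right only while the hidden thoughts remain visible, so it earns no reward.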
[9] viXra:2605.0119 [pdf] submitted on 2026-05-31 03:02:06
Authors: Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Huiming Yang
Comments: 9 Pages.
Post-training of large language models optimizes only parameters, while inference-time procedural scaffolds are typically designed independently of parameter training. This disconnect makes it difficult to automatically acquire and internalize complex strategies. We propose scaffold-mediated post-training: procedural scaffolds are organized into an evolvable graph structure that co-evolves with model parameters through discovery, distillation, and dynamic recompilation. We instantiate this paradigm as Skill Training. On FeatureBench, automatically discovered skills improve the pass rate by 8.1pp, and after progressive distillation the model still achieves a 27.7% pass rate without any external scaffold (distillation retention rate 85.2%, defined as the ratio of the post-distillation pass rate to the with-skill pass rate), significantly outperforming standard SFT on the same data.
Category: Artificial Intelligence
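For entry [9], the reported figures can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are hypothetical, and the implied baseline assumes the 8.1pp gain is measured against the same model without skills, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract of entry [9].
post_distill_rate = 27.7   # pass rate (%) after distillation, no scaffold
retention = 0.852          # post-distillation rate / with-skill rate
gain_pp = 8.1              # improvement from discovered skills, in points

# Retention is defined as post / with_skill, so invert it.
with_skill_rate = post_distill_rate / retention

# Assumption: the gain is relative to the no-skill baseline of the same model.
implied_baseline = with_skill_rate - gain_pp

print(f"with-skill pass rate: {with_skill_rate:.2f}%")
print(f"implied baseline:     {implied_baseline:.2f}%")
```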
[8] viXra:2605.0118 [pdf] submitted on 2026-05-31 03:04:00
Authors: Fei Ding, Yongkang Zhang, Yuhao Liao, Zijian Zeng, Huiming Yang
Comments: 9 Pages.
ECC memory embeds 8 parity bits for every 64 data bits and automatically detects and corrects errors on each read. The parity bits carry no data and only safeguard integrity, at ~12.5% overhead. Yet the reasoning chains of large language models lack such built-in self-verification: once an error occurs it propagates along the chain, and existing methods can only verify externally after generation completes. We propose the check token, establishing built-in self-verification for language model generation streams for the first time: a functional marker
Category: Artificial Intelligence
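To make the ECC analogy in entry [8] concrete, the sketch below shows parity bits detecting and correcting a single flipped bit on read. It deliberately substitutes the small textbook Hamming(7,4) code for the 64-data-bit, 8-parity-bit layout mentioned in the abstract; the function names are hypothetical, and nothing here describes the check token itself.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_read(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                      # simulate a single-bit memory error
recovered = hamming74_read(stored)
print(recovered == word)

# Overhead quoted in the abstract: 8 parity bits per 64 data bits.
overhead = 8 / 64
```

Hamming(7,4) spends 3 parity bits on 4 data bits; the wider layout in the abstract amortizes the parity cost to the quoted 12.5%.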
[7] viXra:2605.0117 [pdf] submitted on 2026-05-31 03:05:42
Authors: Fei Ding, Yongkang Zhang, Yuhao Liao, Zijian Zeng, Huiming Yang
Comments: 9 Pages.
Group Relative Policy Optimization (GRPO) is the dominant reinforcement learning algorithm for training reasoning capabilities in large language models, notably adopted by DeepSeek-R1. The recent improvement Dr. GRPO (COLM 2025) identifies the response-level length bias caused by per-trajectory length normalization in GRPO and proposes removing this normalization, claiming the resulting optimizer is "unbiased." We show that this claim is incomplete. Specifically, we establish an impossibility theorem: under the standard outcome reward + GRPO setting, no length-based weighting scheme can simultaneously achieve the following two properties. (P1) Gradient unbiasedness: the gradient estimator is an unbiased estimate of the true policy gradient. (P2) Length invariance: each trajectory's effective contribution to the gradient is independent of its token length. GRPO approximately satisfies P2 but violates P1; Dr. GRPO satisfies P1 but violates P2. We characterize the complete tradeoff spectrum via the parametric family f_alpha(L) = L^{alpha - 1}, where alpha = 0 recovers GRPO and alpha = 1 recovers Dr. GRPO, and we provide quantitative analysis showing that Dr. GRPO's length bias can cause longer trajectories to dominate gradient updates by a factor proportional to the length ratio. Our results reveal that neither algorithm is universally "done right"; they occupy opposite ends of a fundamental and unavoidable tradeoff.
Category: Artificial Intelligence
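The parametric family in entry [7] can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's analysis: the function names are hypothetical, and a trajectory's total contribution is taken to be its length times the per-token weight, L * f_alpha(L) = L^alpha, which presumes per-token gradient terms of comparable magnitude.

```python
def length_weight(length: int, alpha: float) -> float:
    """Per-token weight f_alpha(L) = L^(alpha - 1) from the abstract."""
    return float(length) ** (alpha - 1.0)


def trajectory_contribution(length: int, alpha: float) -> float:
    """Assumed total weight of one trajectory: L tokens, each weighted f_alpha(L)."""
    return length * length_weight(length, alpha)


short, long_ = 100, 400
for alpha in (0.0, 0.5, 1.0):
    ratio = trajectory_contribution(long_, alpha) / trajectory_contribution(short, alpha)
    print(f"alpha={alpha}: long/short contribution ratio = {ratio:.2f}")
```

Under this reading, alpha = 0 (the GRPO end) gives both trajectories equal total weight, while alpha = 1 (the Dr. GRPO end) gives the longer trajectory a weight equal to the length ratio, here 4.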
[6] viXra:2605.0109 [pdf] submitted on 2026-05-27 22:10:46
Authors: Lixiang Li, Anjan Goswami, Md Muksitul Haque, Bharat Bhargava
Comments: 7 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
Generating high-quality PowerPoint slides from natural language instructions is a complex task that demands not only deep semantic understanding, but also aesthetic design. The essential component for building a functional and visually rich slide is the XML object. Therefore, it is intuitive that the most direct path to creating a high-quality slide is to generate it directly from the foundational XML structure. However, previous "human instruction to slide generation" models typically rely on generating Python code, which serves as an intermediary to produce the final slide output rather than producing the XML object directly. As a result, these models lack the ability to precisely construct and control the building blocks required for a detailed slide composition. We introduce SlideTuner, a custom fine-tuned GPT-4o model specifically engineered to generate high-quality PowerPoint slides by generating the required XML files. Through extensive empirical experiments, we demonstrate that the fine-tuned GPT-4o model successfully and consistently produces visually coherent and aesthetically pleasing slides. SlideTuner employs a two-stage training approach: first, we apply SFT to the language model, enabling it to generate slide-rendering XML code directly from user instructions, utilizing XML data extracted from native PowerPoint slides. Second, we apply Direct Preference Optimization (DPO) to align the model's outputs with preferred visual styles, such as specific font choices. The slides produced by our model exhibit superior layout scores and style adherence. While this work focuses on font-level aesthetic control, it establishes a foundation for future research aimed at precisely guiding slide generation toward diverse visual or structural preferences.
Category: Artificial Intelligence
[5] viXra:2605.0079 [pdf] submitted on 2026-05-19 23:10:18
Authors: Yerassyl Tursynbek, Kassekeyeva Aislu Bisenovna
Comments: 5 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
Accurate demand forecasting is a critical component of supply chain optimization, inventory management, and strategic planning in modern enterprises. Conventional statistical forecasting approaches often struggle to represent nonlinear patterns and abrupt changes in consumer behavior, which reduces their effectiveness in volatile market conditions. This study explores data-driven forecasting techniques that integrate historical sales records with factors influencing purchasing behavior, including seasonality and promotional effects. A comparative experimental analysis is conducted between classical time-series approaches and advanced data-oriented predictive models. The results demonstrate that data-driven forecasting techniques achieve higher predictive accuracy and stability, particularly when long-term temporal dependencies and irregular demand fluctuations are present. The proposed approach supports improved decision-making by reducing forecasting errors and enhancing operational efficiency. The findings highlight the potential of intelligent forecasting systems for sustainable business growth and adaptive demand planning.
Category: Artificial Intelligence
[4] viXra:2605.0063 [pdf] submitted on 2026-05-16 20:29:09
Authors: Derrick Donkor
Comments: 7 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
We introduce a finite, structure-driven framework for neural population coding based on a fixed additive lattice of 2 × 2 matrices. The model defines a predetermined set of sixteen basis matrices whose admissible linear combinations generate stimulus representations in the real number space R. Information encoding is achieved by selecting lattice combinations that maximize entropy, consistent with efficient coding principles. In this proposal, population coding is formulated as a probabilistic inference problem governed by a Markov state-space model, where transitions occur over lattice states (distinct matrices in the lattice group), as shown in equation (1). Model parameters are inferred via maximum likelihood estimation. Since the compositional structure of the lattice is fixed a priori, the framework decouples the computational capacity of population coding from stringent network connectivity, enabling a fully computable and classical probabilistic formulation of population coding. Beyond the conventional role of population codes as output representations in [11], the proposed population-coded lattice is conceived as a structured input representation for deep learning architectures, with cross-entropy optimization serving as an objective for pattern classification. This work demonstrates a direct relation between random analog time-series data and its probability encoding, P(s_i(t)). This work unifies population coding and information theory within a finite matrix-based framework, offering a computational reinterpretation of neural representation and inference.
Category: Artificial Intelligence
[3] viXra:2605.0061 [pdf] submitted on 2026-05-16 20:22:38
Authors: Xiaohao Xie, Wenhua Jiao
Comments: 10 Pages. (Note by viXra Admin: Author name is required in the article after the article title and please submit article written with AI assistance to ai.viXra.org)
Person Re-Identification (ReID) struggles with discriminative feature learning due to extreme intra-class variance and ambiguous boundary samples. Existing metric losses are often constrained by local mini-batch mining or rigid distance margins that ignore contextual data structures. To address these issues, we propose Se-ReID, a unified framework that enhances feature space representation through instance-level and centroid-level innovations. At the instance level, we introduce TriHard+ Loss with dynamic routing to prevent manifold collapse, alongside an alternative TriWeight Loss utilizing hard-adapted soft weighting to preserve dense intra-class structures. At the centroid level, we propose CentroidM Loss, which leverages learnable global proxies to transcend mini-batch limitations and effectively soften inter-class boundaries. These core metric modules are further supported by first- and second-order mask techniques to eliminate sampling bias, and a streamlined cross-camera centroid retrieval strategy to filter gallery noise. Extensive experiments demonstrate that Se-ReID achieves remarkable performance on standard benchmarks (Market1501 and DukeMTMC-ReID) without relying on ReRank. Notably, it yields state-of-the-art (SOTA) results when integrated with the SOLIDER Transformer baseline, confirming its robust effectiveness and broad applicability across diverse architectures. The code will be released.
Category: Artificial Intelligence
[2] viXra:2605.0052 [pdf] submitted on 2026-05-13 19:21:40
Authors: Raghavendra Venkateshappa
Comments: 16 Pages. (Note by viXra Admin: Please submit article written with AI assistance to ai.viXra.org)
Non-deterministic agentic AI systems present fundamental challenges for traditional performance testing methodologies that rely on deterministic metrics and reproducible measurements. We propose a novel probabilistic performance profiling framework that models agent performance as probability distributions rather than point estimates. Our approach leverages Monte Carlo sampling to generate comprehensive performance distribution profiles across diverse execution contexts, while employing Bayesian inference for continuous model refinement based on observed system behavior. The framework provides confidence intervals, performance bounds, and probabilistic guarantees that enable robust decision-making under uncertainty. Extensive evaluation on multiple agent frameworks demonstrates that our approach captures performance variability more accurately than traditional methods, providing 95% confidence intervals with mean absolute errors below 8% across different task complexities. This work establishes the foundational framework for probabilistic performance assessment in agentic systems, enabling more reliable deployment and monitoring of non-deterministic AI agents.
Category: Artificial Intelligence
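The idea in entry [2] of reporting a distribution rather than a point estimate can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration: `run_agent_once` is a hypothetical stand-in that simulates latency with Gaussian noise, and the Bayesian refinement step described in the abstract is not modelled.

```python
import random
import statistics


def run_agent_once(rng: random.Random) -> float:
    """Hypothetical stand-in for one agent execution; returns latency in seconds."""
    # Assumption: latency is simulated as Gaussian noise around 2.0 s.
    return rng.gauss(2.0, 0.5)


def performance_profile(n_runs: int, seed: int = 0):
    """Sample repeated runs and summarize them as mean plus central 95% range."""
    rng = random.Random(seed)
    samples = [run_agent_once(rng) for _ in range(n_runs)]
    # quantiles(n=40) yields 39 cut points in 2.5% steps, so the first and
    # last cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(samples, n=40)
    return statistics.mean(samples), cuts[0], cuts[-1]


mean_latency, low, high = performance_profile(2000)
print(f"mean={mean_latency:.2f}s, central 95% range=[{low:.2f}s, {high:.2f}s]")
```

The range printed here is an empirical percentile interval of the sampled runs, which is one simple way to express performance bounds for a non-deterministic system.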
[1] viXra:2605.0027 [pdf] submitted on 2026-05-09 13:34:51
Authors: Han de Bruijn
Comments: 5 Pages.
An extremely simple single-layer feedforward 2 x 2 neural network is the subject of this article, because I feel it is important to understand some essential features of neural networks without the help of a computer. The network at hand can be completely described, mathematically, by elementary linear algebra. A working example with two inputs and one output leads to the general case. A counterexample with two outputs instead of one is presented as well. It is concluded that the network with one output has learning capability and the network with two outputs has not. The behaviour of the first network can be formulated in geometric terms: all points on a straight line through two given points in the input plane give the desired output. There are no other inputs that do the job. The network with two outputs, on the contrary, is not able to make any generalization. It does not learn from experience, so to speak. It's kind of surprising that the more intelligent network is characterized by a singular matrix, and the dumber network by a regular matrix of weights.
Category: Artificial Intelligence
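The geometric claim in entry [1] (every point on the line through two given input points gives the desired output) can be checked with elementary linear algebra. The sketch below assumes the one-output network is a bias-free linear unit y = w1*x1 + w2*x2 trained so that both given points map to the same target; that model and all names are assumptions, and the singular versus regular weight-matrix distinction from the abstract is not modelled here.

```python
def solve_weights(a, b, target):
    """Find (w1, w2) with w1*x1 + w2*x2 == target at both points a and b."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("points are collinear with the origin: no unique solution")
    # Cramer's rule for the 2 x 2 system with right-hand side (target, target).
    w1 = target * (b[1] - a[1]) / det
    w2 = target * (a[0] - b[0]) / det
    return w1, w2


def output(w, x):
    """Output of the assumed linear unit for input x."""
    return w[0] * x[0] + w[1] * x[1]


def point_on_line(a, b, lam):
    """Point a + lam * (b - a) on the straight line through a and b."""
    return (a[0] + lam * (b[0] - a[0]), a[1] + lam * (b[1] - a[1]))


a, b, target = (1.0, 2.0), (3.0, 1.0), 1.0
w = solve_weights(a, b, target)
for lam in (-1.0, 0.5, 2.5):
    print(lam, output(w, point_on_line(a, b, lam)))
```

Because the unit is linear, the output at a + lam * (b - a) is (1 - lam) * target + lam * target, which equals the target for every lam, while inputs off the line give a different value.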