KISTI Develops Innovative Technology to Evaluate AI Reasoning Processes

배혜림 2026-06-12 https://doi.org/10.48550/arXiv.2605.29656

- Accepted at ICML 2026, a top-tier international conference on artificial intelligence and machine learning

- Quantitatively assesses AI cognitive processes by analyzing reasoning structures rather than merely checking for correct answers

□ The Korea Institute of Science and Technology Information (President Sik Lee, hereafter KISTI) announced on June 12 that a research team led by Dr. Heyoung Yang at the Applied Agent Research Center has developed TRACE (Toulmin-based Reasoning Assessment through Constructive Elements), a novel evaluation technology capable of assessing the reasoning processes of artificial intelligence (AI). This research achievement has been accepted for presentation at the International Conference on Machine Learning (ICML) 2026, one of the world's most prestigious conferences in the fields of AI and machine learning.

□ ICML is one of the leading international conferences in artificial intelligence and machine learning, where research institutions and major technology companies from around the world present their latest AI research each year.

□ Recently, Large Language Models (LLMs) have demonstrated high performance using Chain-of-Thought (CoT) prompting to solve complex problems step-by-step. However, conventional evaluation methods focus heavily on the correctness of the final output. This makes it difficult to understand the path the AI took to arrive at its conclusion and limits evaluation capabilities in environments where ground-truth answers are unavailable.

□ To overcome these limitations, the research team developed an evaluation technology that simultaneously analyzes an AI’s reasoning structure and self-monitoring process. This was achieved by combining the Toulmin argumentation model—a representative framework in argumentation theory—with the metacognition theory of cognitive psychologist John Flavell.

□ TRACE deconstructs reasoning sentences generated by AI into eight distinct elements: Claim, Evidence, Warrant, Backing, Evaluation, Qualifier, Rebuttal, and Monitoring. It then analyzes the validity of each element as well as the logical connectivity between sentences.
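
As a rough illustration of the eight-element scheme described above, the labels can be represented as an enumeration and tallied over a labeled chain of thought. This is a minimal sketch, not the TRACE implementation: the names `ToulminElement` and `element_profile`, the example sentences, and their labels are all hypothetical, and the validity and connectivity analysis mentioned above is not modeled here.

```python
from collections import Counter
from enum import Enum


class ToulminElement(Enum):
    """The eight element names listed in the announcement."""
    CLAIM = "Claim"
    EVIDENCE = "Evidence"
    WARRANT = "Warrant"
    BACKING = "Backing"
    EVALUATION = "Evaluation"
    QUALIFIER = "Qualifier"
    REBUTTAL = "Rebuttal"
    MONITORING = "Monitoring"


def element_profile(labels):
    """Return the share of sentences carrying each element label.

    Hypothetical helper: it only counts labels and says nothing about
    whether each element is valid or logically connected to the others.
    """
    counts = Counter(labels)
    total = len(labels)
    return {e: (counts[e] / total if total else 0.0) for e in ToulminElement}


# Hand-labeled example chain (sentences and labels invented for illustration).
chain = [
    ("The train covers 120 km in 2 hours.", ToulminElement.EVIDENCE),
    ("Speed is distance divided by time.", ToulminElement.WARRANT),
    ("So the speed is 60 km/h.", ToulminElement.CLAIM),
    ("Let me double-check the division.", ToulminElement.MONITORING),
]
profile = element_profile([label for _, label in chain])
```

In this toy chain each of the four labels that appear accounts for a quarter of the sentences, and the four unused elements have a share of zero.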

□ The research team trained the TRACE-DeBERTa model using approximately 100,000 reasoning sentences and analyzed over 26,000 reasoning instances across seven major language models. The results demonstrated a high correlation (Pearson $r=0.741$) between TRACE scores and actual benchmark accuracy.
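
For readers unfamiliar with the statistic, the Pearson coefficient reported above measures how closely two sets of paired values follow a linear relationship, on a scale from -1 to 1. The sketch below computes it from first principles. The function name `pearson_r` and the paired values are placeholders invented for illustration; they are not the study's data and are not meant to reproduce the reported value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder pairs: one (evaluation score, benchmark accuracy) per setting.
# These numbers are made up and do not come from the paper.
example_scores = [0.52, 0.61, 0.58, 0.70, 0.75]
example_accuracy = [0.40, 0.55, 0.50, 0.68, 0.72]
r = pearson_r(example_scores, example_accuracy)
```

A value near 1 means that settings with higher evaluation scores also tend to have higher benchmark accuracy.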

□ TRACE demonstrated potential not only as an evaluation metric but also as an effective reward signal for LLM reinforcement learning. When TRACE was integrated into conventional Reinforcement Learning with Verifiable Rewards (RLVR) methods—which previously relied solely on the correctness of answers as a reward signal—the team confirmed that the reasoning performance of LLMs could be enhanced even further.
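
The announcement does not say how the two signals are combined. One simple possibility, shown here purely as an assumption, is to add a weighted reasoning-quality score to the usual correctness reward. The function `combined_reward`, the `weight` parameter, and the assumption that the reasoning score lies in [0, 1] are all hypothetical.

```python
def combined_reward(is_correct: bool, trace_score: float, weight: float = 0.5) -> float:
    """Correctness reward plus a weighted reasoning-quality bonus.

    Assumed form (not taken from the paper):
        reward = 1[answer correct] + weight * trace_score
    where trace_score is assumed to lie in [0, 1].
    """
    if not 0.0 <= trace_score <= 1.0:
        raise ValueError("trace_score is assumed to lie in [0, 1]")
    if weight < 0:
        raise ValueError("weight must be non-negative")
    correctness = 1.0 if is_correct else 0.0
    return correctness + weight * trace_score


# Two responses with the same final answer but different reasoning quality
# receive different rewards under this assumed scheme.
well_reasoned = combined_reward(True, 0.9)
poorly_reasoned = combined_reward(True, 0.2)
```

Under an answer-only reward both responses would score the same; the added term is what lets the training signal distinguish them.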

□ Dr. Heyoung Yang of KISTI stated, "TRACE can explain at which stage the AI reasoned logically and at which stage uncertainty or self-contradiction occurred. This can complement the limitations of existing black-box evaluation methods and those that depend entirely on ground-truth answers."

□ Recent AI research is moving beyond mere performance competition toward evaluating and understanding the specific evidence and logic AI uses to reach conclusions. The acceptance at ICML 2026 signifies that TRACE has been recognized for its academic and practical value in the field of AI reasoning evaluation.

□ Paper Overview

○ Title: TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation (https://doi.org/10.48550/arXiv.2605.29656)

○ Conference: The 43rd International Conference on Machine Learning (ICML 2026)

○ Authors: Yundong Kim (First Author), Heyoung Yang (Corresponding Author)

TRACE (Toulmin-based Reasoning Assessment through Constructive Elements)
