Strengthening LLM Reasoning: Chain-of-Thought and Advanced Prompting Techniques

Who this article is for

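Teams whose LLM gets complex math and logic problems wrong
Developers who want to see the reasoning behind an answer ("why did it reach that conclusion?")
Engineers who want to learn prompting techniques for better reasoning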

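Introduction

An LLM will sometimes get a question like "In a meeting-room booking system, if three people each use the room for 2 hours, how many hours is that in total?" wrong, even though the answer is just 3 × 2 = 6 hours. Chain-of-Thought (CoT) prompting, which makes the model "think first, then answer," substantially improves accuracy on multi-step problems like this.

This article draws on the prompt engineering guide at bluefoxdev.kr.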

1. Comparing reasoning techniques

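[Reasoning prompting methods]

Zero-shot CoT:
  Add a single line such as "Think step by step"
  Complexity: low, effect: medium
  Example: "Think it through first, then give your answer"

Few-shot CoT:
  Include 2-5 worked reasoning examples
  Complexity: medium, effect: high
  Example: show similar problems with step-by-step solutions

Tree-of-Thought (ToT):
  Explore several reasoning paths
  Prune weak branches and backtrack to find the best path
  Complexity: high, effect: very high
  Example: chess, puzzles, complex planning

ReAct (Reason + Act):
  Combine reasoning with tool use
  "Thought → Action → Observation → Thought..."
  Well suited to agent systems

Self-critique:
  Generate an answer → look for errors → revise
  "Find the mistakes in the answer you just gave"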

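[When to use which technique]
  Arithmetic/logic: zero-shot CoT is enough
  Specialized domains: few-shot CoT, with examples required
  Multi-step planning: ReAct
  Creative problems: ToT
  High precision required: self-critique + ensembling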
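
Of the techniques compared above, Tree-of-Thought is the one whose mechanics are hardest to picture from a one-line description. The following is a minimal sketch of the search loop only, not a full ToT implementation: in a real setup an LLM would propose and rate each "thought," whereas here the hypothetical `propose` and `make_scorer` functions are plain Python stand-ins operating on a toy number puzzle (reach a target using "+3" and "*2"). It shows breadth-first exploration with pruning; explicit backtracking is not shown.

```python
from typing import Callable

# A state is (current value, steps taken so far).
State = tuple[int, tuple[str, ...]]


def propose(state: State) -> list[State]:
    """Hypothetical thought generator; an LLM would propose next steps here."""
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def make_scorer(target: int) -> Callable[[State], int]:
    """Hypothetical evaluator; an LLM would rate each partial solution here."""
    return lambda state: -abs(target - state[0])


def tree_of_thought_search(
    start: int, target: int, beam_width: int, max_depth: int
) -> State:
    score = make_scorer(target)
    frontier: list[State] = [(start, ())]
    best = frontier[0]
    for _ in range(max_depth):
        # Expand every surviving path by one step.
        candidates = [child for state in frontier for child in propose(state)]
        # Prune: keep only the most promising paths.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if score(frontier[0]) > score(best):
            best = frontier[0]
        if score(best) == 0:  # target reached exactly
            break
    return best


print(tree_of_thought_search(1, 14, beam_width=2, max_depth=3))
print(tree_of_thought_search(1, 14, beam_width=1, max_depth=3))
```

With `beam_width=1` the search degenerates into a single greedy chain and overshoots the target, while keeping two paths alive lets it find the exact solution. That is the intuition behind exploring several reasoning paths instead of committing to the first one.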

2. Implementing the reasoning techniques

import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

def chain_of_thought(
    question: str,
    model: str = "claude-sonnet-4-6",
) -> dict:
    """Zero-shot CoT: ask for step-by-step reasoning before the answer."""

    response = client.messages.create(
        model=model,
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""{question}

Think through this step by step, then give the final answer in the last paragraph."""
        }]
    )

    full_text = response.content[0].text

    # Extract the final answer (last non-empty paragraph)
    paragraphs = [p.strip() for p in full_text.split("\n\n") if p.strip()]

    return {
        "reasoning": full_text,
        "answer": paragraphs[-1] if paragraphs else full_text,
    }

def few_shot_cot(
    question: str,
    examples: list[dict],
    model: str = "claude-sonnet-4-6",
) -> str:
    """Few-shot CoT: reason with worked examples in the prompt.

    Each example is a dict with "question", "reasoning", and "answer" keys.
    """

    examples_text = ""
    for ex in examples:
        examples_text += f"""
Example:
Question: {ex['question']}
Solution: {ex['reasoning']}
Answer: {ex['answer']}
---"""

    response = client.messages.create(
        model=model,
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Use the following examples as a guide and solve the problem step by step.
{examples_text}

Problem to solve:
Question: {question}
Solution:"""
        }]
    )

    return response.content[0].text

def self_critique_and_refine(
    question: str,
    n_iterations: int = 2,
) -> str:
    """Improve an answer through repeated self-critique."""
    # Plain def rather than async def: the client above is synchronous
    # and nothing in this function is awaited.

    # Initial answer
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1000,
        messages=[{"role": "user", "content": question}]
    )
    current_answer = response.content[0].text

    for _ in range(n_iterations):
        # Critique the current answer and replace it with the revision
        critique_response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            messages=[{
                "role": "user",
                "content": f"""Find any errors or possible improvements in the following answer.

Question: {question}
Current answer: {current_answer}

Review it critically, then reply with only the improved answer."""
            }]
        )

        current_answer = critique_response.content[0].text

    return current_answer

def consistency_vote(
    question: str,
    n_samples: int = 5,
) -> dict:
    """Self-consistency: pick the final answer by majority vote."""

    responses = []
    for _ in range(n_samples):
        r = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"{question}\n\nSolve step by step, and write only the final answer on the last line."
            }]
        )
        responses.append(r.content[0].text)

    # Extract each final answer (last line of the response)
    answers = [r.strip().split("\n")[-1] for r in responses]

    # Majority vote; answers must match exactly to count as the same vote
    from collections import Counter
    vote = Counter(answers)
    winner, count = vote.most_common(1)[0]

    return {
        "final_answer": winner,
        "confidence": count / n_samples,
        "all_answers": answers,
        "vote_distribution": dict(vote),
    }

def react_step(
    thought: str,
    available_tools: list[str],
) -> dict:
    """ReAct: turn the current thought into a decision about the next action."""

    tools_str = "\n".join(f"- {t}" for t in available_tools)

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Current thought: {thought}

Available tools:
{tools_str}

Decide the next action. Respond with JSON only:
{{"action": "tool name or 'answer'", "action_input": "input", "reasoning": "why"}}"""
        }]
    )

    # Raises json.JSONDecodeError if the reply contains text besides the JSON object
    import json
    return json.loads(response.content[0].text)
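
The functions above take the model's reply at face value: the vote compares last lines as exact strings, and the ReAct step parses the whole reply as JSON. In practice replies often carry small formatting differences or extra text. The helpers below are a hedged sketch of one way to tolerate that. The names `extract_json_object`, `normalize_answer`, and `majority` are hypothetical, and the normalization rules (dropping an "Answer:" prefix, a trailing period, and letter case) are assumptions that would need tuning for a given task.

```python
import json
import re
from collections import Counter


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span in text, ignoring anything around it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


def normalize_answer(line: str) -> str:
    """Reduce an answer line to a canonical form so equal answers vote together."""
    cleaned = re.sub(
        r"^(final answer|answer)\s*:\s*", "", line.strip(), flags=re.IGNORECASE
    )
    return cleaned.rstrip(".").strip().lower()


def majority(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and its share of the votes."""
    votes = Counter(normalize_answer(a) for a in answers)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(answers)


reply = 'Here is my decision: {"action": "answer", "action_input": "6"} Done.'
print(extract_json_object(reply))
print(majority(["Answer: 6 hours.", "6 hours", "final answer: 6 Hours", "8 hours"]))
```

Without normalization, the three "6 hours" variants in the last call would be counted as three different answers and the reported confidence would be misleading.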

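Conclusion

CoT is a technique for making the model "think more" before it answers. The easiest place to start is a single line: "Think step by step." For math calculations and code debugging, self-consistency voting can raise accuracy by roughly a further 5-10%, though the gain varies by model and task. When answer quality matters most, repeated self-critique is effective.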
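
To get a feel for why self-consistency voting helps, here is a small back-of-the-envelope calculation. It rests on a strong simplifying assumption that is not from any measurement: each sample is independently correct with the same probability `p`, and the vote succeeds only when more than half of the samples are correct. Real samples from one model are correlated, so actual gains are typically smaller than this idealized figure. The function name `majority_correct_probability` is hypothetical.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """P(more than half of n independent samples are correct).

    Assumes every sample is correct with the same probability p.
    """
    needed = n // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


# Compare different sample counts for a model that is right 70% of the time.
for n in (1, 3, 5, 9):
    print(n, round(majority_correct_probability(0.7, n), 3))
```

Under these assumptions, a model that is right 70% of the time reaches about 0.84 with five samples. The same formula shows that voting only helps when `p` is above 0.5; below that, more samples make the majority more likely to be wrong.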