Multi-Turn Memory for AI Chatbots: Maintaining Conversation Context and Implementing Long-Term Memory

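Who this article is for

Teams building an AI chatbot that remembers previous conversations
Developers who need to manage long conversations within token limits
Teams building a personalized chatbot that remembers each user's long-term preferences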

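Introduction

LLMs are stateless by default. Each request is independent, so the model remembers nothing from earlier turns unless the application sends them again. The conversation history therefore has to be included in the context, but the context is bounded by a token limit. The approach in this article is to summarize and compress older turns and to store important information separately.

This article was written with reference to the multi-turn AI chatbot memory guide at bluefoxdev.kr.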

1. Conversation Memory Architecture

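[Three-tier memory structure]

Short-term memory:
  Current conversation history (most recent N messages)
  Included directly in the context window
  Redis: 24-hour TTL

Mid-term memory:
  Summaries of earlier conversation
  Compressed automatically by an LLM
  Redis/DB: 30-day TTL

Long-term memory:
  User preferences and important facts
  Stored as embeddings in a vector DB
  Relevant memories retrieved by semantic search
  Kept permanently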
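
As a rough sketch of how the long-term tier could retrieve relevant memories, the functions below rank stored memories by cosine similarity to a query embedding. `MemoryRecord`, `cosineSimilarity`, and `searchMemories` are hypothetical names introduced for illustration, and the vectors are assumed to come from whichever embedding model you use. A real vector DB would run this search on its own side rather than in application code.

```typescript
// Hypothetical record shape: a remembered fact plus its embedding vector.
interface MemoryRecord {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK memories most similar to the query embedding.
function searchMemories(query: number[], records: MemoryRecord[], topK = 3): MemoryRecord[] {
  return records
    .map(record => ({ record, score: cosineSimilarity(query, record.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(scored => scored.record);
}
```

The retrieved texts would then be joined into the system prompt as the long-term memory summary.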

[Context composition]
  System: role + long-term memory summary
  Human: summary of earlier conversation (mid-term)
  Human/AI: 10 most recent messages (short-term)
  Human: current question
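
The layout above can be sketched as a small pure function. This is a minimal illustration, not the article's implementation: `buildContext`, `ChatMessage`, and `BuiltContext` are hypothetical names, and the sketch follows the layout literally by sending the summary as its own user message. Some APIs expect strictly alternating roles; in that case the summary can be appended to the system prompt instead.

```typescript
type Role = 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

interface BuiltContext {
  system: string;
  messages: ChatMessage[];
}

// Assemble the request context from the three memory tiers plus the new question.
function buildContext(
  rolePrompt: string,
  longTermMemory: string[],
  midTermSummary: string | null,
  recent: ChatMessage[],
  question: string,
  maxRecent = 10,
): BuiltContext {
  // System: role + long-term memory
  const systemParts = [rolePrompt];
  if (longTermMemory.length > 0) {
    systemParts.push(`Long-term memory:\n${longTermMemory.join('\n')}`);
  }

  const messages: ChatMessage[] = [];
  // Human: summary of the earlier conversation (mid-term), if one exists
  if (midTermSummary) {
    messages.push({ role: 'user', content: `Summary of the earlier conversation:\n${midTermSummary}` });
  }
  // Human/AI: the most recent messages (short-term)
  messages.push(...recent.slice(-maxRecent));
  // Human: the current question
  messages.push({ role: 'user', content: question });

  return { system: systemParts.join('\n\n'), messages };
}
```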

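[Token management]
  Claude Sonnet 4.6: 200K context
  Practical limit: keep the context at or below 10K tokens (for cost)
  History: 20 most recent messages
  Compression trigger: when the history exceeds 15 messages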
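
To make the budget concrete, here is a hedged sketch of trimming history to both a message cap and a token budget. `estimateTokens` and `trimToBudget` are hypothetical helpers, and the 4-characters-per-token ratio is only a rough assumption: real tokenizers differ, and non-English text such as Korean usually costs more tokens per character. For accurate numbers, use the provider's own token counting.

```typescript
interface BudgetMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Rough estimate only (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest message and keep as many as fit
// within both the message cap and the token budget.
function trimToBudget(
  history: BudgetMessage[],
  maxTokens = 10_000,
  maxMessages = 20,
): BudgetMessage[] {
  const kept: BudgetMessage[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0 && kept.length < maxMessages; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break; // the next-oldest message would exceed the budget
    kept.unshift(history[i]); // keep chronological order
    used += cost;
  }
  return kept;
}
```

Trimming from the newest message backwards means the most recent turns always survive, which matches the short-term tier described above.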

2. Multi-Turn Chatbot Implementation

import Anthropic from '@anthropic-ai/sdk';
import { Redis } from 'ioredis';

const client = new Anthropic();
const redis = new Redis(process.env.REDIS_URL!);

interface Message { role: 'user' | 'assistant'; content: string; timestamp: number; }
interface UserMemory { summary: string; preferences: string[]; importantFacts: string[]; }

class ChatbotWithMemory {
  private readonly maxRecentMessages = 20;
  private readonly summaryThreshold = 15;

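  async chat(userId: string, sessionId: string, userMessage: string): Promise<ReadableStream> {
    // 1. Load the session history and the user's long-term memory
    let history = await this.loadHistory(sessionId);
    const userMemory = await this.loadUserMemory(userId);

    // 2. Once the history exceeds the threshold, summarize the older messages,
    //    then reload so this request only carries the trimmed history
    if (history.length > this.summaryThreshold) {
      await this.summarizeOldMessages(userId, sessionId, history);
      history = await this.loadHistory(sessionId);
    }

    // 3. Build the context. The mid-term summary is appended to the system
    //    prompt here so that user and assistant turns keep alternating.
    const recentMessages = history.slice(-this.maxRecentMessages);
    const summary = await redis.get(`summary:${sessionId}`);
    const basePrompt = this.buildSystemPrompt(userMemory);
    const systemPrompt = summary
      ? `${basePrompt}\n\nSummary of the earlier conversation:\n${summary}`
      : basePrompt;

    // 4. Start the streaming response
    const stream = client.messages.stream({
      model: 'claude-sonnet-4-6',
      max_tokens: 2048,
      system: systemPrompt,
      messages: [
        ...recentMessages.map(m => ({ role: m.role, content: m.content })),
        { role: 'user', content: userMessage },
      ],
    });

    // 5. Create the caller's stream first; awaiting finalText() before returning
    //    would block until the whole response had been generated
    const readable = stream.toReadableStream();

    // 6. In the background: save both messages, then extract important facts
    void stream.finalText()
      .then(async (fullResponse) => {
        const now = Date.now();
        await this.saveMessages(sessionId, [
          { role: 'user', content: userMessage, timestamp: now },
          { role: 'assistant', content: fullResponse, timestamp: now },
        ]);
        await this.extractAndSaveMemory(userId, userMessage, fullResponse);
      })
      .catch((err) => console.error('Failed to persist chat turn:', err));

    return readable;
  }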

  private async loadHistory(sessionId: string): Promise<Message[]> {
    const data = await redis.get(`chat:${sessionId}`);
    return data ? JSON.parse(data) : [];
  }

  private async loadUserMemory(userId: string): Promise<UserMemory> {
    const data = await redis.get(`memory:${userId}`);
    return data ? JSON.parse(data) : { summary: '', preferences: [], importantFacts: [] };
  }

  private buildSystemPrompt(memory: UserMemory): string {
    return `You are a personalized AI assistant.

${memory.summary ? `About the user:\n${memory.summary}\n` : ''}
${memory.preferences.length ? `Preferences:\n${memory.preferences.join('\n')}\n` : ''}
${memory.importantFacts.length ? `Important memories:\n${memory.importantFacts.join('\n')}` : ''}

Keep the context of the earlier conversation in mind and answer naturally.`;
  }

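  private async summarizeOldMessages(userId: string, sessionId: string, history: Message[]) {
    const toSummarize = history.slice(0, -10); // everything except the 10 most recent messages
    const recent = history.slice(-10);
    if (toSummarize.length === 0) return;

    // Carry the previous summary forward; otherwise it is lost when the key is overwritten
    const previous = await redis.get(`summary:${sessionId}`);
    const transcript = toSummarize.map(m => `${m.role}: ${m.content}`).join('\n');

    const summary = await client.messages.create({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 500,
      messages: [{
        role: 'user',
        content: `Summarize the following conversation in 3-5 sentences.${previous ? ` Fold in this earlier summary:\n${previous}` : ''}\n\n${transcript}`,
      }],
    });

    const summaryText = summary.content[0].type === 'text' ? summary.content[0].text : '';
    await redis.set(`summary:${sessionId}`, summaryText, 'EX', 86400 * 30); // mid-term: 30 days
    await redis.set(`chat:${sessionId}`, JSON.stringify(recent), 'EX', 86400); // short-term: 24 hours
  }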

  private async saveMessages(sessionId: string, newMessages: Message[]) {
    const history = await this.loadHistory(sessionId);
    await redis.set(`chat:${sessionId}`, JSON.stringify([...history, ...newMessages]), 'EX', 86400);
  }

  private async extractAndSaveMemory(userId: string, userMsg: string, assistantMsg: string) {
    // Runs off the response path; failures are logged instead of surfacing to the caller
    try {
      const res = await client.messages.create({
        model: 'claude-haiku-4-5-20251001',
        max_tokens: 200,
        messages: [{
          role: 'user',
          content: `Extract any important user information or preferences worth remembering from this exchange. If there is nothing, reply with only "NONE".\n\nUser: ${userMsg}\nAssistant: ${assistantMsg}`,
        }],
      });
      const fact = res.content[0].type === 'text' ? res.content[0].text.trim() : '';
      if (fact && fact !== 'NONE') {
        const memory = await this.loadUserMemory(userId);
        memory.importantFacts.push(fact);
        if (memory.importantFacts.length > 20) memory.importantFacts.shift(); // keep the 20 newest facts
        // Stored in Redis with a 365-day TTL; the vector DB from section 1 is not wired in here
        await redis.set(`memory:${userId}`, JSON.stringify(memory), 'EX', 86400 * 365);
      }
    } catch (err) {
      console.error('Memory extraction failed:', err);
    }
  }
}
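
The summarization step in the class slices the history at a fixed offset. A minimal sketch of that split as a pure function, which can be tested without Redis or an API call, might look like the following. `planCompaction`, `Turn`, and `CompactionPlan` are hypothetical names and not part of the class above. The extra step that moves the cut forward to a user message is an assumption, added for APIs that expect the first message to come from the user.

```typescript
interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

interface CompactionPlan {
  toSummarize: Turn[]; // older turns to hand to the summarizer
  toKeep: Turn[];      // recent turns that stay in the short-term history
}

// Decide which turns to summarize once the history exceeds the threshold.
function planCompaction(history: Turn[], threshold = 15, keep = 10): CompactionPlan {
  if (history.length <= threshold) {
    return { toSummarize: [], toKeep: history };
  }
  let start = Math.max(0, history.length - keep);
  // Move the cut forward so the kept window begins with a user message.
  while (start < history.length && history[start].role !== 'user') {
    start++;
  }
  return { toSummarize: history.slice(0, start), toKeep: history.slice(start) };
}
```

With the defaults, a 16-message history is split into 6 turns to summarize and 10 to keep, which matches the threshold of 15 and the 10-message window used in the class.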

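Conclusion

The core of multi-turn chatbot memory is the three-tier structure. Separating short-term memory (Redis history), mid-term memory (LLM summaries), and long-term memory (extracted important facts) keeps token usage efficient while still delivering a personalized experience. Summarization runs on a smaller, cheaper model (claude-haiku) to cut cost, and important-fact extraction runs asynchronously in the background so it does not delay the response.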