Deploying Image Generation AI to Production: Running Stable Diffusion as a Live Service

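Who This Article Is For

Teams looking to integrate Stable Diffusion into their service
Teams that want to run image generation while minimizing GPU costs
Developers looking for a way to integrate with the ComfyUI API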

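Introduction

Image generation AI needs GPUs, and GPUs are expensive. Combining spot instances with queue-based asynchronous processing can cut GPU costs by roughly 70% while maintaining service quality, provided requests can tolerate a few minutes of queueing.

This article was written with reference to "AI Infrastructure Cost Optimization" on bluefoxdev.kr.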

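1. Designing the Image Generation Infrastructure

[Image Generation Service Architecture]

API server (CPU):
  Receives requests → forwards them to the queue
  Result lookup API
  User authentication, quota management

Message queue (Redis / SQS):
  Buffer for generation jobs
  Priority queue (paid users first)

GPU worker (Spot Instance):
  Receives jobs from the queue
  Generates images with ComfyUI / Diffusers
  Uploads results to S3
  Sends a job-completion notification

Storage:
  Generated images → S3
  Fast delivery through a CDN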
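
The "paid users first" queue in the architecture above can be sketched with an in-memory heap. This is only a stand-in to show the ordering rule; a real deployment would keep the queue in Redis or SQS as described. The names `Job`, `PriorityJobQueue`, and the `paid`/`free` tier labels are assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Lower number = served first. The tier names are illustrative assumptions.
TIER_PRIORITY = {"paid": 0, "free": 1}


@dataclass(order=True)
class Job:
    priority: int
    seq: int  # tie-breaker: FIFO order within the same tier
    prompt: str = field(compare=False)
    user_tier: str = field(compare=False)


class PriorityJobQueue:
    """In-memory stand-in for the buffer between the API server and GPU workers."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()

    def enqueue(self, prompt: str, user_tier: str) -> None:
        # The API server would call this after authentication and quota checks.
        job = Job(TIER_PRIORITY[user_tier], next(self._counter), prompt, user_tier)
        heapq.heappush(self._heap, job)

    def dequeue(self) -> Optional[Job]:
        # A GPU worker would call this to fetch its next job, or None if idle.
        return heapq.heappop(self._heap) if self._heap else None
```

The sequence counter keeps jobs within one tier in arrival order, so free-tier requests are delayed by paid ones but never reordered among themselves.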

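[GPU Cost Optimization]
  On-demand GPU: $3-5/hour
  Spot instance: $0.7-1.5/hour (roughly 70% savings)
  
  Spot instance caveats:
  - Can be interrupted at any time
  - Requires logic to requeue jobs on interruption
  - Save the state of jobs that were in progress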
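
The requeue-on-interruption caveat above can be sketched as follows. This is a minimal sketch, not a production worker: it assumes the queue is a local `deque`, jobs are plain dicts, and some other component sets the `interrupted` event when the provider announces a reclaim (on AWS, for example, Spot instances receive a two-minute interruption notice). `run_worker`, `process_step`, and the `attempts` field are hypothetical names.

```python
import threading
from collections import deque
from typing import Callable


def run_worker(
    queue: deque,
    process_step: Callable[[dict], bool],
    interrupted: threading.Event,
) -> list:
    """Process jobs until the queue is empty or an interruption is signalled.

    process_step(job) runs one unit of work and returns True once the job is
    finished. It may record progress inside the job dict. On interruption the
    in-flight job is pushed back to the FRONT of the queue with that state
    intact, so another worker can pick it up.
    """
    done = []
    while queue and not interrupted.is_set():
        job = queue.popleft()
        finished = False
        while not finished:
            if interrupted.is_set():
                # Count the failed attempt and hand the job back.
                job["attempts"] = job.get("attempts", 0) + 1
                queue.appendleft(job)
                return done
            finished = process_step(job)
        done.append(job)
    return done
```

With Redis or SQS the same idea usually maps to not acknowledging (or explicitly re-enqueueing) the message, so that the job becomes visible to other workers again.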

[Response Time Strategy]
  Needed immediately: on-demand GPU (expensive)
  Can run asynchronously: spot instance (cheap)
  Acceptable wait: set the SLA to within 5-10 minutes
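
The routing rule above, together with the price ranges quoted earlier, can be sketched as a small helper. The hourly prices are the midpoints of the quoted ranges and should be read as rough figures; `Route`, `choose_route`, and `cost_per_image` are hypothetical names, and the seconds-per-image value is something you would measure on your own hardware.

```python
from dataclasses import dataclass
from typing import Optional

# Midpoints of the hourly price ranges quoted above (USD); rough figures only.
HOURLY_USD = {"on_demand": 4.0, "spot": 1.1}


@dataclass
class Route:
    pool: str
    max_wait_seconds: Optional[int]  # None means "no queueing budget"


def choose_route(needs_immediate_result: bool, sla_minutes: int = 10) -> Route:
    """Send latency-sensitive requests to on-demand GPUs, the rest to spot."""
    if needs_immediate_result:
        return Route("on_demand", None)
    return Route("spot", sla_minutes * 60)


def cost_per_image(pool: str, seconds_per_image: float) -> float:
    """GPU cost of one image, given how long a single generation takes."""
    return HOURLY_USD[pool] * seconds_per_image / 3600


def spot_savings() -> float:
    """Fraction saved by running on spot instead of on-demand."""
    return 1 - HOURLY_USD["spot"] / HOURLY_USD["on_demand"]
```

With these midpoints the saving comes out at about 72%, which is in line with the "roughly 70%" figure.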

2. Integrating with the ComfyUI API

import asyncio
import httpx
import json
import uuid
from typing import AsyncGenerator

class ComfyUIClient:
    """Client for the ComfyUI API (HTTP for queueing and downloads, WebSocket for progress)"""
    
    def __init__(self, host: str = "localhost", port: int = 8188):
        self.base_url = f"http://{host}:{port}"
        self.ws_url = f"ws://{host}:{port}"
        self.client_id = str(uuid.uuid4())
    
    async def generate_image(
        self,
        prompt: str,
        negative_prompt: str = "",
        width: int = 1024,
        height: int = 1024,
        steps: int = 20,
        cfg_scale: float = 7.0,
        model: str = "v1-5-pruned.safetensors",
    ) -> bytes:
        """Generate an image and return its raw bytes"""
        
        workflow = self._build_workflow(
            prompt=prompt,
            negative_prompt=negative_prompt,
            width=width,
            height=height,
            steps=steps,
            cfg_scale=cfg_scale,
            model=model,
        )
        
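        # Add the prompt to ComfyUI's queue
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.base_url}/prompt",
                json={"prompt": workflow, "client_id": self.client_id},
            )
            response.raise_for_status()  # surface workflow validation errors early
            prompt_id = response.json()["prompt_id"]
        
        # Wait for completion over WebSocket (asyncio.timeout requires Python 3.11+)
        async with asyncio.timeout(300):  # 5-minute timeout
            async for image_data in self._wait_for_completion(prompt_id):
                return image_data
        # The socket closed without delivering an image for this prompt
        raise RuntimeError(f"No image returned for prompt {prompt_id}")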
    
    def _build_workflow(self, **kwargs) -> dict:
        """Build the ComfyUI workflow JSON (node id -> node definition)"""
        return {
            "4": {
                "class_type": "CheckpointLoaderSimple",
                "inputs": {"ckpt_name": kwargs["model"]},
            },
            "6": {
                "class_type": "CLIPTextEncode",
                "inputs": {
                    "text": kwargs["prompt"],
                    "clip": ["4", 1],
                },
            },
            "7": {
                "class_type": "CLIPTextEncode",
                "inputs": {
                    "text": kwargs["negative_prompt"],
                    "clip": ["4", 1],
                },
            },
            "3": {
                "class_type": "KSampler",
                "inputs": {
                    # KSampler rejects negative seeds, so derive a random non-negative one
                    "seed": uuid.uuid4().int % 2**32,
                    "steps": kwargs["steps"],
                    "cfg": kwargs["cfg_scale"],
                    "sampler_name": "euler",
                    "scheduler": "normal",
                    "denoise": 1,
                    "model": ["4", 0],
                    "positive": ["6", 0],
                    "negative": ["7", 0],
                    "latent_image": ["5", 0],
                },
            },
            "5": {
                "class_type": "EmptyLatentImage",
                "inputs": {
                    "batch_size": 1,
                    "height": kwargs["height"],
                    "width": kwargs["width"],
                },
            },
            "8": {
                "class_type": "VAEDecode",
                "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
            },
            "9": {
                "class_type": "SaveImage",
                "inputs": {
                    "filename_prefix": "output",
                    "images": ["8", 0],
                },
            },
        }
    
    async def _wait_for_completion(self, prompt_id: str) -> AsyncGenerator[bytes, None]:
        import websockets  # third-party package: pip install websockets
        
        async with websockets.connect(
            f"{self.ws_url}/ws?clientId={self.client_id}"
        ) as ws:
            async for message in ws:
                if not isinstance(message, str):
                    continue  # skip binary frames such as previews
                data = json.loads(message)
                
                if data.get("type") == "executed" and data.get("data", {}).get("prompt_id") == prompt_id:
                    # Download the image file
                    output = data["data"]["output"]
                    if "images" in output:
                        image_info = output["images"][0]
                        async with httpx.AsyncClient() as client:
                            img_response = await client.get(
                                f"{self.base_url}/view",
                                params={
                                    "filename": image_info["filename"],
                                    "subfolder": image_info.get("subfolder", ""),
                                    "type": image_info.get("type", "output"),
                                },
                            )
                            img_response.raise_for_status()
                        yield img_response.content
                        return

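# Image generation service
# (is_nsfw_prompt and upload_to_s3 are assumed to be defined elsewhere in the service)
async def generate_and_upload(request: dict) -> str:
    """Generate an image, then upload it to S3"""
    
    # Filter NSFW prompts
    if await is_nsfw_prompt(request["prompt"]):
        raise ValueError("NSFW prompts are not allowed")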
    
    comfy = ComfyUIClient()
    image_data = await comfy.generate_image(
        prompt=request["prompt"],
        negative_prompt=request.get("negative_prompt", ""),
        width=request.get("width", 1024),
        height=request.get("height", 1024),
    )
    
    # Upload to S3
    image_key = f"generated/{uuid.uuid4()}.png"
    await upload_to_s3(image_data, image_key)
    
    return f"https://cdn.example.com/{image_key}"
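
The service function above awaits an `is_nsfw_prompt` helper that the article does not show. A minimal sketch of what such a helper could look like, assuming a simple keyword blocklist, follows. The terms in `BLOCKED_TERMS` are placeholders chosen for illustration, and a keyword match is only a first line of defense; a real service would likely add a trained classifier and check the generated images as well.

```python
import asyncio
import re

# Placeholder terms for illustration only; maintain a real list (or use a
# classifier) in an actual deployment.
BLOCKED_TERMS = frozenset({"nsfw", "nude", "explicit"})


async def is_nsfw_prompt(prompt: str) -> bool:
    """Return True if any blocked term appears as a whole word in the prompt.

    Declared async so it matches the `await is_nsfw_prompt(...)` call site and
    could later be swapped for a remote moderation call.
    """
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return not BLOCKED_TERMS.isdisjoint(words)
```

Matching on whole words rather than substrings avoids false positives on harmless words that merely contain a blocked term.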

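Conclusion

The core cost driver of an image generation AI service is GPU time. Spot instances combined with queue-based asynchronous processing can cut that cost by roughly 70% while still targeting responses within the 5-10 minute SLA. NSFW filtering is a legal and ethical requirement, and when using a model commercially you must check its license.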