Stream OpenAI API Responses in Next.js Safely

Building an AI chat application with Next.js can quickly run into trouble on serverless platforms such as Vercel or AWS Lambda. A standard request handler sends nothing until the entire payload has been generated. If the AI response takes longer than the function's execution limit (which can be as short as 10 to 15 seconds on hobby tiers), the platform cuts the request off and the client receives an abrupt 504 timeout error.

To fix this, stream the response instead of returning a single JSON payload: send text chunks to the client as soon as the LLM produces them, either over a streamed HTTP response such as Server-Sent Events (SSE) or through the Vercel AI SDK.
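For reference, SSE is a plain-text framing on top of a streamed HTTP response: each event is one or more `data:` lines followed by a blank line. The sketch below shows that framing with two hypothetical helpers, `toSseEvent` and `parseSseEvent`, which are illustrative names and not part of any library. It is only meant to show what "SSE" means on the wire; a stream of raw text chunks read with a stream reader does not need this framing.

```javascript
// Hypothetical helper (not a library API): frame one text chunk as an SSE event.
function toSseEvent(text) {
    // Every line of the payload gets its own "data:" field;
    // an empty line terminates the event.
    const lines = String(text).split(/\r\n|\r|\n/);
    return lines.map((line) => `data: ${line}`).join('\n') + '\n\n';
}

// Hypothetical counterpart: recover the payload from one framed event.
function parseSseEvent(rawEvent) {
    return rawEvent
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        // Drop the field name and the single optional space after the colon
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n');
}

// A multi-line chunk survives the round trip unchanged
const framed = toSseEvent('Hello\nworld');
console.log(JSON.stringify(framed));
console.log(parseSseEvent(framed) === 'Hello\nworld');
```

Framing like this matters mainly when the client is an SSE-aware consumer that expects discrete events rather than a continuous text stream.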

In this guide, we will implement a clean architecture to handle streaming effortlessly.

1. The Route Handler (Backend Stream Setup)

Instead of returning a standard NextResponse.json(), we configure our Next.js App Router handler to return a custom ReadableStream with a text/event-stream header.

Create or update your route file (app/api/chat/route.js):

JavaScript

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = 'edge'; // Optional: the Edge runtime usually reduces cold-start latency

export async function POST(req) {
    const { messages } = await req.json();

    // Trigger a streaming completion from OpenAI
    const response = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        stream: true,
        messages,
    });

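    // Transform the OpenAI stream into an HTTP response stream.
    // Note: chunks are sent as raw text, not as SSE-framed "data:" lines,
    // so the client reads them with a stream reader.
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
        async start(controller) {
            try {
                for await (const chunk of response) {
                    const text = chunk.choices[0]?.delta?.content ?? '';
                    if (text) {
                        controller.enqueue(encoder.encode(text));
                    }
                }
                controller.close();
            } catch (err) {
                // Propagate upstream failures so the client does not hang
                controller.error(err);
            }
        },
    });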

    return new Response(stream, {
        headers: {
            'Content-Type': 'text/event-stream; charset=utf-8',
            'Cache-Control': 'no-cache, no-transform',
            'Connection': 'keep-alive',
        },
    });
}
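The handler above drives its loop inside start(), so chunks are enqueued as fast as OpenAI produces them regardless of how quickly the client reads. As a variation, the sketch below uses pull() so that the next chunk is requested only when the consumer has room, and cancel() so the upstream iterator is closed if the client disconnects. The names `textDeltas` and `iterableToStream` are hypothetical helpers introduced for illustration; the chunk shape (`choices[0].delta.content`) is the one used in the handler above.

```javascript
// Hypothetical helper: map OpenAI-style stream chunks to plain text deltas.
async function* textDeltas(completion) {
    for await (const chunk of completion) {
        yield chunk.choices[0]?.delta?.content ?? '';
    }
}

// Hypothetical helper: wrap any async iterable of strings in a byte stream.
function iterableToStream(iterable) {
    const encoder = new TextEncoder();
    const iterator = iterable[Symbol.asyncIterator]();
    return new ReadableStream({
        // pull() is called only when the consumer is ready for more data
        async pull(controller) {
            try {
                // Skip empty deltas so each pull either enqueues data or closes
                while (true) {
                    const { value, done } = await iterator.next();
                    if (done) {
                        controller.close();
                        return;
                    }
                    if (value) {
                        controller.enqueue(encoder.encode(value));
                        return;
                    }
                }
            } catch (err) {
                controller.error(err);
            }
        },
        // cancel() runs when the reader goes away; stop the upstream source
        async cancel() {
            await iterator.return?.();
        },
    });
}
```

Under these assumptions, the route handler could return `new Response(iterableToStream(textDeltas(response)), { headers })` with the same headers as before. Whether the extra backpressure is worth it depends on how slow your clients are relative to the model.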

2. Consuming the Stream on the Frontend Component

On the client side, we cannot use a simple await response.json(), because that waits for the whole body. Instead, we get a reader from the response body stream and loop over the chunks as they arrive.

JavaScript

'use client';
import { useState } from 'react';

export default function ChatComponent() {
    const [input, setInput] = useState('');
    const [chatLog, setChatLog] = useState('');

    const handleSubmit = async (e) => {
        e.preventDefault();
        setChatLog(''); // Clear and prepare log

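        const response = await fetch('/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ messages: [{ role: 'user', content: input }] }),
        });

        // Stop early if the request failed or there is no body to read
        if (!response.ok || !response.body) {
            setChatLog(`Request failed with status ${response.status}`);
            return;
        }

        const reader = response.body.getReader();
        const decoder = new TextDecoder();

        while (true) {
            const { value, done } = await reader.read();
            if (done) break;
            // stream: true keeps multi-byte characters intact across chunk boundaries
            const chunkValue = decoder.decode(value, { stream: true });
            // Append incoming tokens to state for the typing effect
            setChatLog((prev) => prev + chunkValue);
        }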
    };

    return (
        <div className="p-4 max-w-xl mx-auto">
            <form onSubmit={handleSubmit} className="flex gap-2 mb-4">
                <input 
                    value={input} 
                    onChange={(e) => setInput(e.target.value)}
                    className="border p-2 w-full rounded" 
                    placeholder="Ask something..."
                />
                <button type="submit" className="bg-blue-600 text-white p-2 rounded">Send</button>
            </form>
            <div className="bg-gray-100 p-4 rounded min-h-[150px] whitespace-pre-wrap">
                {chatLog || "AI response will stream here..."}
            </div>
        </div>
    );
}
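The read loop in the component can also be pulled out into a standalone function, which makes it easier to reuse and to test outside React. The sketch below assumes a hypothetical helper named `readTextStream` that takes any byte stream (such as `response.body`) and a callback. One detail it illustrates: passing `{ stream: true }` to `TextDecoder.decode()` makes the decoder hold back an incomplete multi-byte character until the rest of its bytes arrive, so characters split across chunks are not corrupted.

```javascript
// Hypothetical helper: read a byte stream as text, reporting each decoded piece.
async function readTextStream(stream, onText) {
    const reader = stream.getReader();
    const decoder = new TextDecoder();
    let full = '';
    try {
        while (true) {
            const { value, done } = await reader.read();
            if (done) break;
            // Incomplete multi-byte sequences are buffered until the next chunk
            const text = decoder.decode(value, { stream: true });
            if (text) {
                full += text;
                onText(text);
            }
        }
        // Flush anything still buffered in the decoder
        const tail = decoder.decode();
        if (tail) {
            full += tail;
            onText(tail);
        }
        return full;
    } finally {
        reader.releaseLock();
    }
}

// Demo: the two bytes of "é" (0xC3 0xA9) arrive in different chunks
const demo = new ReadableStream({
    start(controller) {
        controller.enqueue(new Uint8Array([0x68, 0xc3]));
        controller.enqueue(new Uint8Array([0xa9, 0x21]));
        controller.close();
    },
});
readTextStream(demo, (piece) => console.log('piece:', piece));
```

Inside the component, a call such as `await readTextStream(response.body, (t) => setChatLog((prev) => prev + t))` would take the place of the manual loop.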

Why does my OpenAI request time out on Vercel?

Standard Vercel serverless functions have a strict execution time limit (roughly 10–15 seconds on hobby tiers; the exact value depends on plan and configuration). Because long AI completions can take more time than that to generate in full, waiting for a complete JSON response triggers a 504 Gateway Timeout.

What does export const runtime = 'edge' do for AI streaming?

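Setting runtime to 'edge' runs the route on the Edge runtime instead of a standard Node.js serverless function. Edge functions typically start faster, and a streamed response can keep delivering chunks once it has begun, which makes them a good fit for long-running LLM streams. This does not remove limits entirely: the platform still bounds how quickly a response must start and how long it may run, so check your provider's current limits.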

Can I use standard Axios to parse a server-sent stream?

Axios can be configured to handle streamed responses, but native fetch() with the browser's stream reader (ReadableStreamDefaultReader) is more straightforward and lightweight for rendering tokens as they arrive in frontend components.
