Building an AI chat application with Next.js can quickly run into trouble on serverless platforms such as Vercel or AWS Lambda. A standard HTTP handler waits for the entire payload to be generated before it responds. If the AI response takes longer than the platform's execution limit (roughly 10 to 15 seconds on hobby tiers), the function is cut off and the client receives an abrupt 504 Gateway Timeout.
To fix this, switch from a single JSON payload to a streamed response, using Server-Sent Events (SSE) or the Vercel AI SDK, so that text chunks reach the client as soon as the LLM produces them.
In this guide, we will build a streaming route handler and a client component that renders tokens as they arrive.
1. The Route Handler (Backend Stream Setup)
Instead of returning a standard NextResponse.json(), we configure our Next.js App Router handler to return a ReadableStream that forwards each text chunk as soon as OpenAI produces it. Because the handler sends raw text rather than SSE-framed events, it uses a text/plain content type; text/event-stream would only be accurate if each chunk were wrapped in data: lines.
Create or update your route file (app/api/chat/route.js):
JavaScript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = 'edge'; // Optional: the Edge runtime typically reduces cold-start latency

export async function POST(req) {
  const { messages } = await req.json();

  // Trigger a streaming completion from OpenAI
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    stream: true,
    messages,
  });

  // Transform the OpenAI stream into an HTTP response stream
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of response) {
          const text = chunk.choices[0]?.delta?.content || '';
          if (text) {
            controller.enqueue(encoder.encode(text));
          }
        }
        controller.close();
      } catch (err) {
        // Propagate upstream failures so the client sees an aborted stream
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: {
      // Raw text chunks, not SSE-framed "data:" events, so text/plain is the accurate type
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'no-cache, no-transform',
    },
  });
}
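A response labeled text/event-stream is expected to carry SSE-framed messages rather than bare text: every line of the payload is prefixed with data: and each event ends with a blank line. The following is a minimal sketch of that framing, assuming two hypothetical helpers, toSseEvent and sseStreamFrom, which are not part of Next.js or the OpenAI SDK.

```javascript
// Hypothetical helper: frame one text chunk as a Server-Sent Events message.
// Each line of the payload gets its own "data:" field, and a blank line
// terminates the event.
function toSseEvent(text) {
  const lines = text.split(/\r\n|\r|\n/).map((line) => `data: ${line}`);
  return lines.join('\n') + '\n\n';
}

// Hypothetical helper: wrap any async iterable of strings in an SSE byte stream.
function sseStreamFrom(chunks) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      try {
        for await (const text of chunks) {
          // Skip empty chunks so no empty events are emitted
          if (text) controller.enqueue(encoder.encode(toSseEvent(text)));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
}
```

In the route handler, you could pass an async generator that yields each chunk's delta content to sseStreamFrom and return the result with a text/event-stream content type. The client would then need to strip the data: prefixes (or use an SSE parser) instead of appending the raw bytes.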
2. Consuming the Stream on the Frontend Component
On the client side, we cannot use a simple await response.json(). Instead, we obtain a reader from the response body and loop over the chunks as they arrive, decoding each one and appending it to state.
JavaScript
'use client';

import { useState } from 'react';

export default function ChatComponent() {
  const [input, setInput] = useState('');
  const [chatLog, setChatLog] = useState('');

  const handleSubmit = async (e) => {
    e.preventDefault();
    setChatLog(''); // Clear the previous answer

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: [{ role: 'user', content: input }] }),
    });

    // Surface HTTP errors instead of trying to read a missing body
    if (!response.ok || !response.body) {
      setChatLog(`Request failed with status ${response.status}`);
      return;
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact when they span two chunks
      const chunkValue = decoder.decode(value, { stream: true });
      // Append incoming tokens to state for the typing effect
      setChatLog((prev) => prev + chunkValue);
    }
  };

  return (
    <div className="p-4 max-w-xl mx-auto">
      <form onSubmit={handleSubmit} className="flex gap-2 mb-4">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="border p-2 w-full rounded"
          placeholder="Ask something..."
        />
        <button type="submit" className="bg-blue-600 text-white p-2 rounded">Send</button>
      </form>
      <div className="bg-gray-100 p-4 rounded min-h-[150px] whitespace-pre-wrap">
        {chatLog || "AI response will stream here..."}
      </div>
    </div>
  );
}
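The read loop in the component can also be pulled out into a reusable async generator, which keeps the stream handling separate from the React state updates. The sketch below assumes a hypothetical helper name, readTextChunks. It passes { stream: true } to TextDecoder.decode so that a multi-byte UTF-8 character split across two network chunks is not decoded as garbage.

```javascript
// Hypothetical helper: yield decoded text chunks from a fetch() Response body.
async function* readTextChunks(response) {
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // stream: true buffers incomplete multi-byte sequences until the next chunk
      const text = decoder.decode(value, { stream: true });
      if (text) yield text;
    }
    // Flush any bytes still buffered inside the decoder
    const tail = decoder.decode();
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}
```

Inside handleSubmit, the loop would then reduce to: for await (const text of readTextChunks(response)) setChatLog((prev) => prev + text);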
Standard Vercel serverless functions have a strict execution time limit (10–15 seconds on hobby tiers). Because long AI completions can take more time than that to generate in full, waiting for a complete JSON response triggers a 504 Gateway Timeout.
Switching the route to the Edge runtime reduces cold-start delays and relaxes the usual serverless timeout for streamed responses, which makes it a good fit for long-running LLM streams. It does not remove limits entirely: on Vercel, an Edge function must still begin sending its response within a fixed window (documented as 25 seconds at the time of writing), after which it can keep streaming.
Can I use standard Axios to parse a server-sent stream?
Axios can be configured to handle streamed responses, but native fetch() with the browser's stream reader (ReadableStreamDefaultReader) is more straightforward and lightweight for driving token-by-token typing animations in frontend components.
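For reference, the runtime and the time limit are both declared through route segment config exports at the top of the route file. The fragment below is a sketch only: the maxDuration value is an illustrative assumption, and the ceiling you can actually set depends on your Vercel plan.

```javascript
// app/api/chat/route.js (route segment config, read by Next.js at build time)

// Selects the runtime for this route: 'edge' or 'nodejs'
export const runtime = 'edge';

// Maximum execution time in seconds; 60 is an assumed example value,
// and the platform caps it according to your plan
export const maxDuration = 60;
```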