Introduction
This article walks through streaming chat with Spring AI, hands-on, in three flavors: streaming a plain string result, streaming the full SSE payload, and streaming the result of a manually invoked HTTP API.
Note: basic (non-streaming) Spring AI chat was covered in an earlier article: Spring Ai–快速入门1:对话实战 – 自学精灵
How does streaming chat work?
The core of a streaming conversation is that the server keeps pushing data to the client. There are two common ways to implement this: WebSocket and SSE.
WebSocket: widely known; a full-duplex, long-lived connection in which the client and the server can both keep sending data to each other.
SSE: short for Server-Sent Events. The server keeps pushing data to the client, but the client sends only a single one-shot request to the server.
A detailed comparison of the two is here: WebSocket与SSE的对比 – 自学精灵
The comparison shows that SSE is the better fit for streaming AI responses: the client asks once, and the model's tokens flow back incrementally.
Spring AI's streaming support is built on SSE under the hood.
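On the wire, each SSE event is plain text: an optional "event:" line, one or more "data:" lines, and a terminating blank line. As a minimal sketch (the class and method names here are illustrative, not part of any library), the framing a server performs looks like this:

```java
public class SseFrame {
    // Build one SSE frame: an optional event name, the data lines,
    // then a blank line that terminates the event.
    static String frame(String event, String data) {
        StringBuilder sb = new StringBuilder();
        if (event != null) {
            sb.append("event: ").append(event).append('\n');
        }
        // Multi-line payloads become multiple "data:" lines per the SSE spec.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n'); // blank line ends the event
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(frame("message", "hello"));
        // prints:
        // event: message
        // data: hello
    }
}
```

This framing is exactly what Spring produces automatically when a controller returns a Flux with produces = MediaType.TEXT_EVENT_STREAM_VALUE, as in the controller below.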
Code
The rest of the code is identical to that earlier article; only the parts that differ (the core code) are shown here:
package com.knife.example.ai.chat.controller;

import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Can be tested with Apifox.
 */
@Slf4j
@Tag(name = "Streaming chat")
@RequestMapping("streamChat")
@RestController
public class StreamChatController {
    @Autowired
    private ChatClient chatClient;

    @Value("${spring.ai.openai.base-url}")
    private String baseUrl;

    @Value("${spring.ai.openai.api-key}")
    private String apiKey;

    @Value("${spring.ai.openai.chat.completions-path}")
    private String completionsPath;

    @Value("${spring.ai.openai.chat.options.model}")
    private String model;

    @Operation(summary = "Stream output: string")
    @GetMapping(value = "streamString", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@Parameter(description = "Message") String message) {
        return chatClient.prompt(message).stream().content();
    }

    @Operation(summary = "Stream output: full SSE")
    @GetMapping(value = "streamFullSse", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<ChatResponse>> streamFullSse(@Parameter(description = "Message") String message) {
        // Generate the conversation via chatClient
        Prompt prompt = new Prompt(message);
        return chatClient
                .prompt(prompt)
                .stream()
                .chatResponse()
                .map(chatResponse -> {
                    return ServerSentEvent
                            .builder(chatResponse)
                            .event("message")
                            .build();
                });
    }

    @Operation(summary = "Stream output: manual API call")
    @GetMapping(value = "streamManualApi", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> streamManualApi(@Parameter(description = "Message") String message) {
        WebClient webClient = WebClient.builder()
                .build();

        AtomicBoolean firstResponseFlag = new AtomicBoolean(false);

        String requestBody = String.format(
                "{" +
                "  \"model\": \"%s\"," +
                "  \"messages\": [" +
                "    {\"role\": \"user\", \"content\": \"%s\"}" +
                "  ]," +
                "  \"stream\": true," +
                "  \"stream_options\": {\"include_usage\": true}" +
                "}",
                model,
                message);

        return webClient.post()
                .uri(baseUrl + completionsPath)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .body(BodyInserters.fromValue(requestBody))
                .retrieve()
                .bodyToFlux(String.class)
                .map(chunk -> {
                    if (!firstResponseFlag.get()) {
                        firstResponseFlag.set(true);
                        log.info("First response: {}", chunk);
                    }
                    // Wrap the raw chunk in an SSE event
                    return ServerSentEvent.<String>builder()
                            .data(chunk)
                            .build();
                });
    }
}
Test
Apifox is used for the tests.
1. Streaming a string result
As the output shows, the Flux<String> variant returns just the answer text, with no extra data.
It suits business scenarios that only need the result.
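Because the endpoint produces text/event-stream, each string element arrives at the client as a "data:" line. A client can reassemble the answer by collecting those lines; a minimal sketch, assuming the raw stream text is already in hand (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SseParser {
    // Extract the payload of each "data:" line from a raw SSE stream.
    static List<String> dataLines(String raw) {
        List<String> out = new ArrayList<>();
        for (String line : raw.split("\n")) {
            if (line.startsWith("data:")) {
                out.add(line.substring(5).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated stream from /streamChat/streamString
        String raw = "data: Hel\n\ndata: lo\n\n";
        System.out.println(String.join("", dataLines(raw))); // Hello
    }
}
```

A real client (browser EventSource, or WebClient's bodyToFlux) does this framing work for you; the sketch only shows what happens underneath.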


2. Streaming the full SSE payload
As the output shows, the Flux<ServerSentEvent<ChatResponse>> variant returns the full SSE payload, which contains the AI's answer. Note that in the first and the last response, the text field that carries the actual content is empty.
It suits scenarios that need the complete SSE payload. The result can also be customized, e.g. to return only one value from the ChatResponse, by changing the return type's generic parameter and this mapping:
    .map(chatResponse -> {
        return ServerSentEvent
                .builder(chatResponse)
                .event("message")
                .build();
    });
The first response:

A middle response:

The last response:

The complete body of a single response:
{
  "result": {
    "output": {
      "messageType": "ASSISTANT",
      "metadata": {
        "role": "ASSISTANT",
        "messageType": "ASSISTANT",
        "finishReason": "",
        "refusal": "",
        "index": 0,
        "annotations": [],
        "id": "chatcmpl-e20dbdd1-4e7d-9f48-9ea3-eecdf987d3c9",
        "reasoningContent": ""
      },
      "toolCalls": [],
      "media": [],
      "text": "我可以"
    },
    "metadata": {
      "finishReason": "",
      "contentFilters": [],
      "empty": true
    }
  },
  "results": [
    {
      "output": {
        "messageType": "ASSISTANT",
        "metadata": {
          "role": "ASSISTANT",
          "messageType": "ASSISTANT",
          "finishReason": "",
          "refusal": "",
          "index": 0,
          "annotations": [],
          "id": "chatcmpl-e20dbdd1-4e7d-9f48-9ea3-eecdf987d3c9",
          "reasoningContent": ""
        },
        "toolCalls": [],
        "media": [],
        "text": "我可以"
      },
      "metadata": {
        "finishReason": "",
        "contentFilters": [],
        "empty": true
      }
    }
  ],
  "metadata": {
    "id": "chatcmpl-e20dbdd1-4e7d-9f48-9ea3-eecdf987d3c9",
    "model": "qwen-plus",
    "rateLimit": {
      "tokensLimit": 0,
      "requestsReset": "PT0S",
      "tokensRemaining": 0,
      "tokensReset": "PT0S",
      "requestsLimit": 0,
      "requestsRemaining": 0
    },
    "usage": {
      "promptTokens": 0,
      "nativeUsage": {},
      "completionTokens": 0,
      "totalTokens": 0
    },
    "promptMetadata": [],
    "empty": false
  }
}
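In the response above, the only field most frontends care about is the incremental text at result.output.text. As a rough sketch of client-side extraction (the field name is taken from the sample above; a real client should use a proper JSON library, not a regex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkText {
    // Naive extraction of the first "text" field from a ChatResponse chunk.
    // For illustration only; use a JSON parser in production code.
    static String extractText(String json) {
        Matcher m = Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String chunk = "{\"result\":{\"output\":{\"messageType\":\"ASSISTANT\",\"text\":\"我可以\"}}}";
        System.out.println(extractText(chunk)); // 我可以
    }
}
```

Doing this extraction on the server instead, inside the .map(...) shown earlier, is usually cleaner, since ChatResponse exposes the same data as typed getters.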
3. Streaming the result of a manual API call
As the output shows, the Flux<ServerSentEvent<String>> variant returns the raw SSE chunk data, which contains the AI's answer. In the first response and the second-to-last response, the content field that carries the actual text is empty; the last response is a statistics chunk that has no content field at all and carries usage data such as token counts.
It suits scenarios that need heavy customization: this approach calls the API directly with WebFlux's WebClient, which gives the most freedom.
The first response:

The last two responses:

Here is the detailed data.
The first response:
{
  "choices": [
    {
      "delta": {
        "content": "",
        "role": "assistant"
      },
      "index": 0,
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "object": "chat.completion.chunk",
  "usage": null,
  "created": 1768920120,
  "system_fingerprint": null,
  "model": "qwen-plus",
  "id": "chatcmpl-174ca428-8fa3-9129-8567-69ccc39199d9"
}
A middle response:
{
  "choices": [
    {
      "finish_reason": null,
      "logprobs": null,
      "delta": {
        "content": "你好"
      },
      "index": 0
    }
  ],
  "object": "chat.completion.chunk",
  "usage": null,
  "created": 1768920120,
  "system_fingerprint": null,
  "model": "qwen-plus",
  "id": "chatcmpl-174ca428-8fa3-9129-8567-69ccc39199d9"
}
The second-to-last response:
{
  "choices": [
    {
      "finish_reason": "stop",
      "delta": {
        "content": ""
      },
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion.chunk",
  "usage": null,
  "created": 1768920120,
  "system_fingerprint": null,
  "model": "qwen-plus",
  "id": "chatcmpl-174ca428-8fa3-9129-8567-69ccc39199d9"
}
The last response:
{
  "choices": [],
  "object": "chat.completion.chunk",
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 278,
    "total_tokens": 289,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  },
  "created": 1768920120,
  "system_fingerprint": null,
  "model": "qwen-plus",
  "id": "chatcmpl-174ca428-8fa3-9129-8567-69ccc39199d9"
}
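The chunk sequence above can be stitched back into the full answer client-side: concatenate choices[0].delta.content from each chunk and skip the final usage-only chunk, which has an empty choices array. A rough sketch (regex-based for brevity; the class and method names are illustrative, and the inputs mimic the chunks shown above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaAssembler {
    private static final Pattern CONTENT =
            Pattern.compile("\"content\"\\s*:\\s*\"([^\"]*)\"");

    // Append this chunk's delta.content, if any, to the answer so far.
    // The final usage-only chunk has no "content" field and is skipped.
    static String append(String answer, String chunk) {
        Matcher m = CONTENT.matcher(chunk);
        return m.find() ? answer + m.group(1) : answer;
    }

    public static void main(String[] args) {
        String answer = "";
        answer = append(answer, "{\"choices\":[{\"delta\":{\"content\":\"你\"}}]}");
        answer = append(answer, "{\"choices\":[{\"delta\":{\"content\":\"好\"}}]}");
        answer = append(answer, "{\"choices\":[],\"usage\":{\"total_tokens\":289}}");
        System.out.println(answer); // 你好
    }
}
```

Note that the usage numbers in the last chunk (prompt_tokens + completion_tokens = total_tokens, 11 + 278 = 289 above) are only present because the request set "stream_options": {"include_usage": true}.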
