2 min read By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Spring AI — Build AI-Powered Applications

Q: What is RAG and why would you use it over fine-tuning?

Answer: RAG (Retrieval Augmented Generation) gives the AI relevant context from your documents at query time, without modifying the model. Fine-tuning permanently alters the model's weights. Use RAG when: Your data changes frequently, you need citations/sources, you want to control what the AI knows, or you have a small dataset. Use fine-tuning when: You need the AI to learn a specific style/format, perform a specialized task the base model struggles with, or you need faster inference (no retrie

Q: How does Spring AI handle provider switching?

Answer: Spring AI uses a common ChatClient interface. Switching providers means changing the starter dependency and configuration properties — your application code doesn't change. This is the same abstraction pattern as Spring Data (switch from PostgreSQL to MongoDB by changing the driver, not the repository code).

Q: What are the security concerns with function calling?

Answer: The AI decides WHICH functions to call and with WHAT arguments based on user input. Risks: Prompt injection — malicious user input tricks the AI into calling destructive functions Over-permissioning — giving the AI access to functions it shouldn't have Input validation — the AI may pass unexpected arguments Mitigations: Validate all function inputs, use least-privilege (only expose safe functions), implement approval workflows for destructive actions, and log all function calls.

Integrate LLMs, embeddings, vector databases, and AI agents into Spring Boot with the same familiar programming model you already know.

Real-World Analogy

Spring AI does for AI what Spring Data JPA did for databases. Instead of manually crafting HTTP calls to OpenAI, managing token limits, parsing JSON responses, and implementing retry logic — you get a clean abstraction: inject a ChatClient, call .prompt("..."), and get a response. Switch from OpenAI to Anthropic by changing one property.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph App["Your Spring Boot App"]
        CC["ChatClient"]
        EM["EmbeddingClient"]
        VS["VectorStore"]
    end

    subgraph Providers["AI Providers (swappable)"]
        OAI["OpenAI (GPT-4)"]
        ANT["Anthropic (Claude)"]
        OLL["Ollama (local)"]
    end

    subgraph Storage["Vector Databases"]
        PG["PGVector"]
        PIN["Pinecone"]
        CHR["Chroma"]
    end

    CC --> OAI & ANT & OLL
    EM --> OAI & ANT
    VS --> PG & PIN & CHR

    style CC fill:#EFF6FF,stroke:#DBEAFE,color:#1E40AF
    style ANT fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF

Getting Started

Dependencies

XML

<!-- Spring AI BOM -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- Choose your provider -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- OR -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

Configuration

YAML

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 2000
    # OR Anthropic
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-sonnet-4-20250514
          max-tokens: 4096

ChatClient — Conversational AI

Basic Usage

Java

@RestController
@RequiredArgsConstructor
public class AIChatController {

    private final ChatClient chatClient;

    @GetMapping("/ai/chat")
    public String chat(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .call()
            .content();
    }
}

System Prompts & Context

Java

@Service
public class CodeReviewService {

    private final ChatClient chatClient;

    public CodeReviewResponse reviewCode(String code, String language) {
        return chatClient.prompt()
            .system("""
                You are a senior {language} code reviewer. 
                Focus on: security vulnerabilities, performance issues, 
                and adherence to SOLID principles.
                Respond in JSON format with fields: issues[], suggestions[], score (1-10).
                """)
            .user("Review this code:\n```%s\n%s\n```".formatted(language, code))
            .call()
            .entity(CodeReviewResponse.class);  // auto-maps JSON to Java object!
    }
}

Streaming Responses

Java

@GetMapping(value = "/ai/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String message) {
    return chatClient.prompt()
        .user(message)
        .stream()
        .content();  // tokens streamed as they're generated
}

RAG — Retrieval Augmented Generation

The killer pattern: give the AI context from YOUR data (docs, code, databases) so it answers accurately instead of hallucinating.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph Ingestion["1. Ingestion (one-time)"]
        D["Documents<br/>(PDF, code, wiki)"] --> SP["Split into chunks"]
        SP --> EM["Generate embeddings"]
        EM --> VS[("Vector Store<br/>(PGVector)")]
    end

    subgraph Query["2. Query (runtime)"]
        Q["User Question"] --> QE["Embed question"]
        QE --> SR["Search similar chunks"]
        SR --> VS
        VS --> CTX["Top-K relevant chunks"]
        CTX --> LLM["LLM + Context → Answer"]
    end

    style VS fill:#FEF3C7,stroke:#FCD34D,color:#1E40AF
    style LLM fill:#ECFDF5,stroke:#6EE7B7,color:#1E40AF

Implementation

Java

@Service
@RequiredArgsConstructor
public class DocumentQAService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    // Ingest documents (run once or on upload)
    public void ingestDocuments(List<Document> documents) {
        // Spring AI handles chunking + embedding + storage
        vectorStore.add(documents);
    }

    // Answer questions using your documents
    public String askQuestion(String question) {
        // 1. Find relevant documents
        List<Document> relevant = vectorStore.similaritySearch(
            SearchRequest.query(question).withTopK(5)
        );

        // 2. Build context from retrieved docs
        String context = relevant.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));

        // 3. Ask LLM with context
        return chatClient.prompt()
            .system("""
                Answer the user's question based ONLY on the provided context.
                If the context doesn't contain the answer, say "I don't know."

                Context:
                {context}
                """)
            .user(question)
            .call()
            .content();
    }
}

Vector Store Configuration (PGVector)

YAML

spring:
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536          # matches OpenAI embedding dimensions
  datasource:
    url: jdbc:postgresql://localhost:5432/myapp

Function Calling — AI That Takes Actions

Let the AI call YOUR Java methods when it needs to perform actions.

Java

// Define functions the AI can call
@Configuration
public class AIFunctionConfig {

    @Bean
    @Description("Get the current weather for a city")
    public Function<WeatherRequest, WeatherResponse> getWeather() {
        return request -> weatherService.getWeather(request.city());
    }

    @Bean
    @Description("Search the product catalog by keyword and category")
    public Function<ProductSearchRequest, List<Product>> searchProducts() {
        return request -> productService.search(request.keyword(), request.category());
    }

    @Bean
    @Description("Place an order for a customer")
    public Function<OrderRequest, OrderConfirmation> placeOrder() {
        return request -> orderService.create(request);
    }
}

Java

// The AI decides WHEN to call functions based on user intent
@GetMapping("/ai/assistant")
public String assistant(@RequestParam String message) {
    return chatClient.prompt()
        .user(message)
        .functions("getWeather", "searchProducts", "placeOrder")
        .call()
        .content();
}

// User: "What's the weather in NYC and find me a rain jacket under $50"
// AI calls: getWeather("NYC") → searchProducts("rain jacket", "outerwear")
// AI responds: "It's 45°F and rainy in NYC. Here are rain jackets under $50: ..."

AI Agents with Spring AI

Autonomous agents that plan, reason, and execute multi-step tasks.

Java

@Service
public class CustomerSupportAgent {

    private final ChatClient chatClient;
    private final OrderRepository orderRepo;
    private final RefundService refundService;

    public String handleTicket(String customerMessage, String customerId) {
        return chatClient.prompt()
            .system("""
                You are a customer support agent for an e-commerce platform.
                You can look up orders, initiate refunds, and escalate issues.
                Be empathetic and concise. Follow company policy.
                Customer ID: {customerId}
                """)
            .user(customerMessage)
            .functions("lookupOrder", "initiateRefund", "escalateToHuman")
            .advisors(new MessageChatMemoryAdvisor(chatMemory))  // conversation memory
            .call()
            .content();
    }
}

Structured Output — Type-Safe AI Responses

Java

// Define your response type
public record SentimentAnalysis(
    String sentiment,       // "positive", "negative", "neutral"
    double confidence,      // 0.0 - 1.0
    List<String> keywords,  // extracted keywords
    String summary          // one-line summary
) {}

// Get structured, type-safe output
@GetMapping("/ai/analyze")
public SentimentAnalysis analyzeSentiment(@RequestParam String review) {
    return chatClient.prompt()
        .user("Analyze the sentiment of this review: " + review)
        .call()
        .entity(SentimentAnalysis.class);  // auto-maps to Java record!
}

Provider Comparison

Feature	OpenAI	Anthropic (Claude)	Ollama (Local)
Best model	GPT-4o	Claude Opus 4	Llama 3, Mistral
Streaming	Yes	Yes	Yes
Function calling	Yes	Yes	Model-dependent
Vision (images)	Yes	Yes	Some models
Embeddings	Yes	No (use separate)	Yes
Cost	$$$	$$$	Free (your hardware)
Privacy	Cloud	Cloud	Fully local
Spring AI support	Full	Full	Full

Interview Questions

What is RAG and why would you use it over fine-tuning?

Answer: RAG (Retrieval Augmented Generation) gives the AI relevant context from your documents at query time, without modifying the model. Fine-tuning permanently alters the model's weights.

Use RAG when: Your data changes frequently, you need citations/sources, you want to control what the AI knows, or you have a small dataset.

Use fine-tuning when: You need the AI to learn a specific style/format, perform a specialized task the base model struggles with, or you need faster inference (no retrieval step).

How does Spring AI handle provider switching?

Answer: Spring AI uses a common ChatClient interface. Switching providers means changing the starter dependency and configuration properties — your application code doesn't change. This is the same abstraction pattern as Spring Data (switch from PostgreSQL to MongoDB by changing the driver, not the repository code).

What are the security concerns with function calling?

Answer: The AI decides WHICH functions to call and with WHAT arguments based on user input. Risks:

Prompt injection — malicious user input tricks the AI into calling destructive functions
Over-permissioning — giving the AI access to functions it shouldn't have
Input validation — the AI may pass unexpected arguments

Mitigations: Validate all function inputs, use least-privilege (only expose safe functions), implement approval workflows for destructive actions, and log all function calls.

Spring AI — Build AI-Powered Applications

Getting Started

Dependencies

Configuration

ChatClient — Conversational AI

Basic Usage

System Prompts & Context

Streaming Responses

RAG — Retrieval Augmented Generation

Implementation

Vector Store Configuration (PGVector)

Function Calling — AI That Takes Actions

AI Agents with Spring AI

Structured Output — Type-Safe AI Responses

Provider Comparison

Interview Questions

5-Minute System Design — Weekly