
AI & Machine Learning

From linear regression to GPT — everything a software engineer needs to know about AI/ML, explained so a fresher gets it and a 10-year vet finds value.


The AI Landscape

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    AI(["Artificial<br/>Intelligence"]) --> ML{{"Machine<br/>Learning"}}
    AI --> ES[["Expert<br/>Systems"]]
    ML --> SL[/"Supervised"/]
    ML --> UL[/"Unsupervised"/]
    ML --> RL[/"Reinforcement"/]
    SL --> DL{{"Deep Learning"}}
    UL --> DL
    DL --> CNN(("CNN"))
    DL --> RNN(("RNN/LSTM"))
    DL --> TR{{"Transformers"}}
    TR --> LLM(["LLMs<br/>(GPT, Claude)"])
    TR --> DIF(["Diffusion<br/>(DALL-E, Stable Diffusion)"])
    TR --> MM(["Multimodal<br/>(Gemini, GPT-4o)"])

    style AI fill:#FEF3C7,stroke:#D97706,color:#000
    style ML fill:#DBEAFE,stroke:#2563EB,color:#000
    style DL fill:#EDE9FE,stroke:#7C3AED,color:#000
    style TR fill:#FCE7F3,stroke:#DB2777,color:#000
    style LLM fill:#D1FAE5,stroke:#059669,color:#000
    style DIF fill:#ECFDF5,stroke:#059669,color:#000
    style MM fill:#ECFDF5,stroke:#059669,color:#000

AI = machines that mimic human intelligence. ML = AI that learns from data instead of explicit rules. Deep Learning = ML with neural networks (many layers). Generative AI = models that create new content.

Think of it like cooking: AI is the kitchen, ML is learning recipes by tasting food, Deep Learning is a sous chef with incredible memory, GenAI is a chef that invents new dishes.


What's in This Section

| # | Page | What You'll Learn | Prerequisites | Who It's For |
|---|------|-------------------|---------------|--------------|
| 1 | Neural Networks & Deep Learning | How neurons compute, backpropagation, CNN for images, RNN for sequences | Basic math (matrices, derivatives) | Everyone starting ML |
| 2 | Transformers & LLMs | Attention mechanism, GPT vs BERT, how LLMs are trained, prompt engineering | Page 1 (neural network basics) | Anyone working with AI APIs |
| 3 | RAG & Vector Databases | Retrieval-augmented generation, chunking, embeddings, Pinecone/pgvector | Page 2 (understand embeddings) | Building AI-powered apps |
| 4 | AI Agents & Tools | ReAct loop, function calling, multi-agent systems, LangChain/CrewAI | Page 2 + REST API knowledge | Building autonomous systems |
| 5 | MLOps & Production AI | Model serving, drift detection, monitoring, deployment patterns | Docker, CI/CD basics | Deploying ML to production |
| 6 | Fine-Tuning Guide | LoRA, QLoRA, RLHF, when to fine-tune vs RAG, practical walkthrough | Pages 1-2 + Python | Customizing models |

Recommended Reading Order

If you're a backend engineer new to AI: Pages 1 → 2 → 3 → 4. Skip 5 and 6 until you've built something.

If you're building AI features into an app: Start at Page 3 (RAG) or Page 4 (Agents) — refer back to Pages 1-2 when concepts feel unfamiliar.

If you're prepping for an AI/ML interview: All 6, in order. Budget 2-3 hours per page.


Supervised vs Unsupervised vs Reinforcement

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph SL["Supervised Learning"]
        direction LR
        S1[/"Labeled Data<br/>(Input → Output)"/] --> S2{{"Learn Mapping<br/>Function"}}
        S2 --> S3(["Predict on<br/>New Data"])
    end
    subgraph UL["Unsupervised Learning"]
        direction LR
        U1[/"Unlabeled<br/>Data"/] --> U2{{"Find Hidden<br/>Patterns"}}
        U2 --> U3(["Clusters /<br/>Anomalies"])
    end
    subgraph RL["Reinforcement Learning"]
        direction LR
        R1[/"Agent in<br/>Environment"/] --> R2{{"Take Action,<br/>Get Reward"}}
        R2 --> R3(["Maximize<br/>Total Reward"])
    end

    style SL fill:#D1FAE5,stroke:#059669,color:#000
    style UL fill:#DBEAFE,stroke:#2563EB,color:#000
    style RL fill:#FEF3C7,stroke:#D97706,color:#000

| Aspect | Supervised | Unsupervised | Reinforcement |
|--------|------------|--------------|---------------|
| Data | Labeled (input → output) | Unlabeled (just input) | Environment + rewards |
| Goal | Predict output for new input | Find hidden patterns | Maximize cumulative reward |
| Analogy | Studying with answer key | Sorting a messy closet | Training a dog with treats |
| Algorithms | Linear Regression, SVM, Random Forest, XGBoost | K-Means, DBSCAN, PCA, Autoencoders | Q-Learning, PPO, DQN, A3C |
| Use Cases | Spam detection, price prediction, medical diagnosis | Customer segmentation, anomaly detection, recommendation | Game AI, robotics, ad bidding, autonomous driving |

Semi-Supervised Learning

Small labeled dataset + large unlabeled dataset. Model learns patterns from unlabeled data and fine-tunes with labels. Real-world: medical imaging (labeling X-rays costs radiologist time — use 100 labeled + 10,000 unlabeled).

Self-Supervised Learning

Model creates its own labels from the data. GPT predicts the next token. BERT masks random tokens and predicts them. This is how LLMs pretrain on web-scale text without human labeling. Arguably the most important paradigm shift in modern AI.
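
To make "the model creates its own labels" concrete, here is a minimal sketch of how training pairs could be derived from raw text alone. The function names, the whitespace tokenizer, and the fixed masking stride are illustrative assumptions, not how any particular model is implemented (BERT, for instance, masks roughly 15% of tokens at random).

```python
def next_token_pairs(text, context_size=3):
    """GPT-style: every prefix of the text becomes an input, the next token its label."""
    tokens = text.split()  # assumption: whitespace split stands in for a real tokenizer
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


def masked_pairs(text, mask_every=3, mask_token="[MASK]"):
    """BERT-style: hide some tokens and keep the originals as labels.

    A fixed stride is used here only to keep the example deterministic.
    """
    tokens = text.split()
    masked, targets = list(tokens), {}
    for i in range(mask_every - 1, len(tokens), mask_every):
        targets[i] = tokens[i]   # label = the token we are about to hide
        masked[i] = mask_token
    return masked, targets


pairs = next_token_pairs("the cat sat on the mat")
masked, targets = masked_pairs("the cat sat on the mat")

for context, target in pairs:
    print(context, "->", target)
print(masked, targets)
```

No human wrote a single label here: both the inputs and the targets come from the same sentence.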


Core ML Algorithms

Classification (Predicting Categories)

| Algorithm | How It Works | When to Use | Fun Analogy |
|-----------|--------------|-------------|-------------|
| Logistic Regression | Draws a line. Above = class A, below = class B. | Binary classification, baseline | Sorting mail: spam or not |
| Decision Tree | Asks yes/no questions in sequence | Interpretable results needed | 20 Questions game |
| Random Forest | 100+ decision trees vote. Majority wins. | Most tabular problems | Asking 100 experts, majority rules |
| XGBoost | Trees built sequentially, each fixing prior mistakes | Kaggle competitions, structured data | Student learning from wrong answers |
| SVM | Finds widest gap between classes | Small datasets, text classification | Building a wall between two groups |
| KNN | Looks at K nearest neighbors, majority class wins | Simple problems, baseline | "You are the average of your 5 closest friends" |
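
KNN is simple enough to write out in full, which makes it a useful first look at what "classification" means in code. The sketch below assumes Euclidean distance and a plain majority vote; the function name and the toy points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.8)))
```

There is no training step at all: the "model" is the stored data, which is why KNN gets slow as the dataset grows.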

Which Algorithm Should I Pick?

Start with Logistic Regression (baseline). Try Random Forest next (usually good enough). XGBoost for competitions. Deep Learning for images/text/audio. Most production tabular ML is XGBoost or LightGBM.

Regression (Predicting Numbers)

| Algorithm | Best For | Gotcha |
|-----------|----------|--------|
| Linear Regression | Simple relationships (price ~ area) | Assumes linear relationship |
| Polynomial Regression | Curved relationships | Overfits easily with high degree |
| Ridge (L2) | Too many features | Shrinks all weights, keeps all features |
| Lasso (L1) | Feature selection needed | Pushes some weights to exactly zero |
| XGBoost Regressor | Complex tabular data | Needs more tuning |
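
The Ridge and Lasso gotchas are easiest to see in the special case where the features are orthonormal, because both penalties then have a closed form per weight. The sketch below assumes that setting, with Ridge minimizing ½‖y − Xw‖² + (λ/2)‖w‖² and Lasso minimizing ½‖y − Xw‖² + λ‖w‖₁; the starting weights are invented for illustration.

```python
import math


def ridge_shrink(ols_weights, lam):
    """Ridge under orthonormal features: every weight is scaled by 1 / (1 + lam)."""
    return [w / (1.0 + lam) for w in ols_weights]


def lasso_shrink(ols_weights, lam):
    """Lasso under orthonormal features: soft-thresholding at lam."""
    return [
        0.0 if abs(w) <= lam else math.copysign(abs(w) - lam, w)
        for w in ols_weights
    ]


ols_weights = [4.0, -2.0, 0.3, -0.1]  # hypothetical unregularized weights
ridge = ridge_shrink(ols_weights, lam=0.5)
lasso = lasso_shrink(ols_weights, lam=0.5)

print("ridge:", [round(w, 3) for w in ridge])
print("lasso:", [round(w, 3) for w in lasso])
```

Ridge keeps all four features with smaller weights, while Lasso drops the two weak ones entirely. With correlated features there is no such closed form, and a library solver is the practical choice.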

Clustering (Finding Groups)

| Algorithm | How It Works | When to Use | Gotcha |
|-----------|--------------|-------------|--------|
| K-Means | Assign each point to the nearest of K centroids, recompute centroids, repeat | General-purpose clustering | Must choose K upfront |
| DBSCAN | Groups dense regions, marks sparse points as noise | Irregular cluster shapes | Struggles when clusters have varying densities |
| Hierarchical | Builds tree of clusters (dendrogram) | When you want to visualize hierarchy | Slow on large datasets |
| Gaussian Mixture | Soft clustering (probability per cluster) | Overlapping clusters | Sensitive to initialization |
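
The K-Means loop ("assign to K centroids, repeat") can be sketched in a few lines. This version assumes the caller supplies the starting centroids, which keeps the example deterministic; real implementations usually pick them with a scheme such as k-means++ and run several restarts.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The "must choose K upfront" gotcha shows up directly in the signature: the number of starting centroids is the K.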

Bias-Variance Tradeoff

The fundamental tension in all of machine learning.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph UF["Underfitting (High Bias)"]
        A>"Model too simple<br/>Misses real patterns"]
    end
    subgraph JR["Just Right"]
        B(["Captures patterns<br/>Generalizes well"])
    end
    subgraph OF["Overfitting (High Variance)"]
        C>"Model too complex<br/>Memorizes noise"]
    end

    UF --> JR --> OF

    style UF fill:#FEE2E2,stroke:#DC2626,color:#000
    style JR fill:#D1FAE5,stroke:#059669,color:#000
    style OF fill:#FEE2E2,stroke:#DC2626,color:#000

| Aspect | High Bias (Underfitting) | High Variance (Overfitting) |
|--------|--------------------------|-----------------------------|
| What | Model too simple. Misses patterns. | Model too complex. Memorizes noise. |
| Training accuracy | Low | Very high |
| Test accuracy | Low | Low (well below training accuracy) |
| Analogy | Studying only chapter titles | Memorizing answers without understanding |
| Fix | More features, more complex model | More data, regularization, dropout |

Regularization

  • L1 (Lasso) — pushes some weights to zero. Built-in feature selection.
  • L2 (Ridge) — shrinks all weights. Prevents any single feature from dominating.
  • Elastic Net — L1 + L2 combined. Best of both.
  • Dropout — randomly disable neurons during training. Forces redundancy.
  • Early stopping — stop training when validation loss starts increasing.
  • Data augmentation — artificially increase dataset (flip, rotate, crop images).
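
Of the techniques above, early stopping is the easiest to show without a training framework. The sketch below assumes a `patience` parameter (how many non-improving epochs to tolerate), which is a common convention but not something this page defines; the validation losses are invented.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    stopped_epoch = None
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            bad_epochs = 0           # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                # validation loss has stopped improving
    return best_epoch, stopped_epoch


# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.48]
best_epoch, stopped_epoch = early_stopping(val_losses, patience=2)
print(best_epoch, stopped_epoch)
```

Training halts at epoch 5 and the weights from epoch 3 are the ones to keep. Note that the loop never sees the lower loss at epoch 6, which is the usual cost of a small patience value.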

Evaluation Metrics

Classification

| Metric | What It Measures | When to Use |
|--------|------------------|-------------|
| Accuracy | % correct overall | Balanced classes only |
| Precision | Of predicted positives, how many correct? | False positives costly (spam filter) |
| Recall | Of actual positives, how many found? | False negatives costly (cancer detection) |
| F1 Score | Harmonic mean of precision & recall | Imbalanced classes |
| AUC-ROC | Area under ROC curve | Comparing models, threshold-independent |
| Confusion Matrix | TP, FP, TN, FN breakdown | Understanding error types |

Accuracy Trap

Dataset with 95% negative, 5% positive. A model that always predicts "negative" gets 95% accuracy. Useless. Always check precision, recall, F1 for imbalanced datasets.
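
The trap is easy to reproduce. The sketch below builds the 95/5 dataset described above, scores an "always negative" model, and computes the metrics from the confusion-matrix counts. Reporting precision and recall as 0.0 when their denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # convention: 0.0 when undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1] * 5 + [0] * 95   # 5% positive, 95% negative
y_pred = [0] * 100            # model that always predicts "negative"

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy comes out at 0.95 while recall and F1 are both 0.0: the model never finds a single positive case.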

Regression

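| Metric | Formula | Interpretation |
|--------|---------|----------------|
| MAE | Mean of abs(predicted - actual) | Average error size, in the same units as the target |
| MSE | Mean of (predicted - actual)² | Penalizes large errors more |
| RMSE | √MSE | Same units as target variable |
| R² | 1 - (SS_res / SS_tot) | Fraction of variance explained (1.0 = perfect) |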
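
The four regression metrics can be computed directly from their definitions. This is a small sketch with invented numbers, where SS_res is the sum of squared errors and SS_tot is the sum of squared deviations of the actual values from their mean.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R² for two equal-length lists of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]
scores = regression_metrics(actual, predicted)
print(scores)
```

For this data MAE is 0.5 and MSE is 0.375, while RMSE is about 0.61. RMSE exceeds MAE because the single error of 1.0 is weighted more heavily once squared.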

LLM-Specific

| Metric | What It Measures |
|--------|------------------|
| Perplexity | How surprised the model is. Lower = better. |
| BLEU | N-gram overlap with reference (translation) |
| ROUGE | Recall-oriented overlap (summarization) |
| Human eval | Human judges rate quality. Gold standard. |
| LLM-as-judge | Stronger model evaluates. Scalable alternative. |
| MMLU | Multi-task benchmark across 57 subjects |
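
Perplexity has a compact definition: the exponential of the average negative log-probability the model assigned to each correct token. The sketch below assumes you already have those per-token probabilities; the numbers are invented.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to the correct tokens."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


confident = perplexity([0.9, 0.8, 0.95])   # model usually expected the right token
surprised = perplexity([0.2, 0.1, 0.3])    # model was often caught off guard
coin_flip = perplexity([0.5, 0.5, 0.5, 0.5])

print(round(confident, 3), round(surprised, 3), round(coin_flip, 3))
```

A useful reading: a perplexity of about 2 means the model was, on average, as uncertain as if it were choosing between two equally likely tokens.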

AI for Software Engineers

When to Use What

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    A(["What's your problem?"]) --> B{"Structured<br/>tabular data?"}
    B -->|Yes| C(["XGBoost / LightGBM<br/>Skip deep learning"])
    B -->|No| D{"Images / Video?"}
    D -->|Yes| E{{"CNN / Vision Transformer"}}
    D -->|No| F{"Text / Language?"}
    F -->|Yes| G{"Need custom<br/>behavior?"}
    G -->|No| H(["LLM API<br/>(GPT, Claude)"])
    G -->|Yes| I{"Your own<br/>data needed?"}
    I -->|Facts| J[["RAG"]]
    I -->|Style/Skill| K[["Fine-tune"]]
    F -->|No| L{"Sequential /<br/>Time Series?"}
    L -->|Yes| M{{"LSTM / Transformer"}}
    L -->|No| N[/"Start with<br/>simpler ML"/]

    style A fill:#FEF3C7,stroke:#D97706,color:#000
    style C fill:#D1FAE5,stroke:#059669,color:#000
    style H fill:#D1FAE5,stroke:#059669,color:#000
    style J fill:#DBEAFE,stroke:#2563EB,color:#000
    style K fill:#EDE9FE,stroke:#7C3AED,color:#000

Integration Patterns

| Pattern | When | Tools |
|---------|------|-------|
| API call | Quick integration, hosted model | OpenAI API, Anthropic API, Google Vertex AI |
| Self-hosted | Data privacy, cost control, customization | Ollama, vLLM, TGI, llama.cpp |
| Fine-tune | Domain-specific behavior | LoRA + Hugging Face, OpenAI fine-tuning |
| RAG | Your data + LLM intelligence | LangChain, LlamaIndex, Spring AI |
| Agents | Multi-step autonomous tasks | LangGraph, CrewAI, Claude Code |

Essential Libraries

| Library | Language | Purpose |
|---------|----------|---------|
| scikit-learn | Python | Classical ML (the first thing you learn) |
| PyTorch | Python | Deep learning, research favorite |
| Hugging Face | Python | Model hub, transformers, tokenizers |
| LangChain | Python/JS | LLM app framework, chains, agents, RAG |
| LlamaIndex | Python | Data ingestion, indexing, RAG pipelines |
| Ollama | CLI | Run LLMs locally (Llama, Mistral, Phi) |
| Spring AI | Java | Spring Boot integration for LLM APIs |
| TensorFlow | Python | Deep learning, production deployment |

Quick Interview Q&A

1. Supervised vs unsupervised learning?

Supervised = labeled data, predict outputs (spam detection, price prediction). Unsupervised = unlabeled data, find patterns (clustering, anomaly detection). Semi-supervised = small labeled + large unlabeled. Self-supervised = model creates own labels (how LLMs train).

2. What is the bias-variance tradeoff?

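Bias = error from a model that is too simple (underfitting). Variance = error from a model that is too sensitive to its training set (overfitting, memorizes noise). Reducing one tends to increase the other, so the goal is the lowest total error on unseen data, not zero of each. Tools: regularization (L1/L2), cross-validation, ensemble methods, more data.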

3. Precision vs Recall — when does each matter?

Precision: "of my predictions, how many were right?" Matters when false positives are costly (spam filter marking real email as spam). Recall: "of all actual positives, how many did I catch?" Matters when false negatives are costly (missing a cancer diagnosis).

4. XGBoost vs Random Forest vs Neural Network?

XGBoost: sequential trees, best for structured/tabular data, wins Kaggle. Random Forest: parallel trees, more robust, less tuning needed. Neural Networks: best for unstructured data (images, text, audio), need more data and compute. For tabular data, XGBoost usually beats neural nets.

5. When would you NOT use deep learning?

Small datasets (<1000 rows). Tabular/structured data (XGBoost is better). Need interpretability (healthcare, finance). Latency constraints (<1ms). Limited compute budget. When a simple model solves the problem.

6. What is transfer learning?

Use a model pre-trained on a large dataset, adapt it to your task. Example: take ResNet pre-trained on ImageNet (about 1.28M training images in the standard ImageNet-1k subset; the full dataset has about 14M), freeze early layers, retrain last layers on your 500 X-ray images. Works because early layers learn general features (edges, textures).

7. How do you handle imbalanced datasets?

Metrics: use F1/AUC, not accuracy. Data: oversample minority (SMOTE), undersample majority. Model: class weights, cost-sensitive learning. Ensemble: balanced bagging. Threshold: adjust classification threshold.
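
Two of these levers, class weights and threshold adjustment, fit in a few lines. The weighting formula below, n_samples / (n_classes * class_count), is the common "balanced" heuristic; the function names and the scores are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def predict_with_threshold(scores, threshold=0.5):
    """Turn predicted positive-class probabilities into labels at a chosen cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)  # the rare class gets a much larger weight

scores = [0.20, 0.35, 0.60]  # hypothetical model outputs
print(predict_with_threshold(scores))                 # default cutoff
print(predict_with_threshold(scores, threshold=0.3))  # lower cutoff catches more positives
```

Lowering the threshold trades precision for recall, so the cutoff should be chosen on a validation set against whichever error is more expensive.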

8. K-Means — how do you choose K?

Elbow method: plot inertia vs K, find the "elbow" where improvement slows. Silhouette score: measures cluster cohesion vs separation (-1 to 1, higher is better). Domain knowledge: sometimes K is obvious (customer tiers: bronze/silver/gold).

9. What is feature engineering?

Creating new input features from raw data. Examples: extract hour/day from timestamp, one-hot encode categories, log-transform skewed features, create ratios (price/sqft). Often matters more than algorithm choice. "Applied ML is 80% feature engineering."
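
The four examples above can be sketched on a single record. The field names (`timestamp`, `income`, `price`, `sqft`, `city`) and the category list are hypothetical, chosen only to illustrate each transformation.

```python
import math
from datetime import datetime


def engineer_features(row, city_categories):
    """Derive model-ready features from one raw record (a dict)."""
    ts = datetime.fromisoformat(row["timestamp"])
    features = {
        "hour": ts.hour,                                # extract hour from timestamp
        "day_of_week": ts.weekday(),                    # Monday = 0 ... Sunday = 6
        "log_income": math.log1p(row["income"]),        # log-transform a skewed value
        "price_per_sqft": row["price"] / row["sqft"],   # ratio feature
    }
    # One-hot encode the categorical column
    for city in city_categories:
        features[f"city_{city}"] = 1 if row["city"] == city else 0
    return features


row = {
    "timestamp": "2024-03-15T14:30:00",
    "income": 85000,
    "price": 450000,
    "sqft": 1500,
    "city": "austin",
}
features = engineer_features(row, city_categories=["austin", "dallas", "houston"])
print(features)
```

In practice the category list and any scaling statistics should come from the training set only, so that nothing about the test data leaks into the features.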

10. Explain cross-validation.

Split data into K folds. Train on K-1, test on the remaining fold. Repeat K times so every fold serves as the test set once, then average the scores. Gives a more reliable performance estimate than a single train-test split, which can be lucky or unlucky. Standard: 5-fold or 10-fold CV.
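
The fold mechanics can be sketched without any ML library. The example below assumes a deliberately trivial "model" that predicts the mean of the training targets, scored by mean absolute error; the helper names and data are invented, and no shuffling is done so the folds stay easy to follow.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def mean_baseline_mae(ys, train_idx, test_idx):
    """'Train' by taking the mean of the training targets, then score MAE on the test fold."""
    train_mean = sum(ys[i] for i in train_idx) / len(train_idx)
    return sum(abs(ys[i] - train_mean) for i in test_idx) / len(test_idx)


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_scores = [mean_baseline_mae(ys, tr, te) for tr, te in k_fold_indices(len(ys), k=3)]
cv_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, cv_score)
```

The fold scores differ a lot because the data is sorted and unshuffled, which is the split-dependence that averaging over folds is meant to smooth out. Shuffling before splitting is the usual default, except for time series.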