By Vamsi Karuturi · Senior Backend Engineer at Salesforce

AWS (Amazon Web Services) for Backend Engineers

A comprehensive reference of core AWS services, architecture patterns, and interview-ready knowledge for Java/backend engineers targeting FAANG roles.

Why AWS Matters in Interviews

Most FAANG system design interviews assume cloud deployment. You are expected to pick appropriate AWS services, reason about trade-offs (cost, latency, durability), and sketch architectures using real service names rather than generic "cloud storage" boxes.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph AWS["AWS Core Services"]
        C{{"Compute<br/>EC2, Lambda, ECS"}}
        S[/"Storage<br/>S3, EBS, EFS"/]
        D[["Database<br/>RDS, DynamoDB, Aurora"]]
        M(["Messaging<br/>SQS, SNS, Kinesis"])
        N(("Networking<br/>VPC, Route 53, ELB"))
        SEC{{"Security<br/>IAM, KMS, WAF"}}
    end

    C --> N
    C --> D
    C --> M
    S --> C
    SEC --> C
    SEC --> S
    SEC --> D

    style C fill:#E3F2FD,stroke:#1565C0,color:#000
    style S fill:#E8F5E9,stroke:#2E7D32,color:#000
    style D fill:#FFF3E0,stroke:#E65100,color:#000
    style M fill:#F3E5F5,stroke:#6A1B9A,color:#000
    style N fill:#FEF3C7,stroke:#D97706,color:#000
    style SEC fill:#FFCDD2,stroke:#C62828,color:#000

Compute

EC2 (Elastic Compute Cloud)

Virtual servers in the cloud. You pick instance type, OS, and networking.

Instance Family	Use Case	Example
t3/t4g	Burstable, dev/test, small apps	t3.medium
m6i/m7g	General purpose, web servers	m6i.xlarge
c6i/c7g	Compute-intensive, batch processing	c6i.2xlarge
r6i/r7g	Memory-intensive, in-memory caches	r6i.4xlarge
i3/i4i	Storage-optimized, data warehouses	i3.large

Bash

# Launch an EC2 instance
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t3.medium \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0bb1c79de3EXAMPLE \
  --count 1

Auto Scaling

Automatically adjusts the number of EC2 instances based on demand.

Target Tracking - Keep average CPU at 60%
Step Scaling - Add 2 instances when CPU > 80%, remove 1 when CPU < 30%
Scheduled Scaling - Scale up at 9 AM, scale down at 6 PM
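The three policies differ in how they turn a metric into a capacity change. The Java sketch below (class and method names are hypothetical) models target tracking with a proportional formula, desired = ceil(current * actual / target), and step scaling with the thresholds from the list. AWS does not publish target tracking's exact algorithm, so treat the formula as an approximation; it is the same idea Kubernetes' Horizontal Pod Autoscaler uses.

```java
// Simplified models of two ASG scaling policies (hypothetical names).
// Assumption: target tracking is approximated by a proportional formula;
// AWS does not document its exact algorithm.
final class ScalingSketch {

    // Target tracking: scale capacity in proportion to how far the metric is from target.
    static int targetTracking(int current, double avgCpu, double targetCpu, int min, int max) {
        int desired = (int) Math.ceil(current * avgCpu / targetCpu);
        return clamp(desired, min, max);
    }

    // Step scaling with the thresholds from the list: +2 above 80% CPU, -1 below 30%.
    static int stepScaling(int current, double avgCpu, int min, int max) {
        int delta = 0;
        if (avgCpu > 80.0) {
            delta = 2;
        } else if (avgCpu < 30.0) {
            delta = -1;
        }
        return clamp(current + delta, min, max);
    }

    // The ASG never goes below its minimum or above its maximum size.
    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}
```

For example, 4 instances averaging 90% CPU against a 60% target give ceil(4 * 90 / 60) = 6 instances, while the step policy would only add 2 at a time.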

Interview Insight

Always mention Auto Scaling Groups (ASG) with health checks. The ASG replaces unhealthy instances automatically, providing self-healing. Combine with ELB for zero-downtime deployments.

Elastic Load Balancing (ELB)

Type	Layer	Use Case
ALB (Application)	Layer 7	HTTP/HTTPS, path-based routing, WebSocket
NLB (Network)	Layer 4	TCP/UDP, ultra-low latency, static IP
GWLB (Gateway)	Layer 3	Third-party virtual appliances

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    U["Users"] --> ALB["ALB<br/>Layer 7"]
    ALB -->|"/api/orders"| TG1["Target Group 1<br/>Order Service"]
    ALB -->|"/api/users"| TG2["Target Group 2<br/>User Service"]
    ALB -->|"/api/payments"| TG3["Target Group 3<br/>Payment Service"]
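Path-based routing is driven by listener rules. A minimal sketch, assuming only trailing '*' path patterns (ALB also supports '?' and '*' elsewhere in a pattern): rules are checked from the lowest priority number upward, the first match forwards to its target group, and unmatched requests fall through to the listener's default action. The class and target-group names are illustrative.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one ALB listener rule: a path pattern and the target group it forwards to.
record ListenerRule(int priority, String pathPattern, String targetGroup) {

    // Supports exact paths and a single trailing '*' wildcard only.
    boolean matches(String path) {
        if (pathPattern.endsWith("*")) {
            return path.startsWith(pathPattern.substring(0, pathPattern.length() - 1));
        }
        return path.equals(pathPattern);
    }
}

final class AlbRouter {
    private final List<ListenerRule> rules;
    private final String defaultTargetGroup;

    AlbRouter(List<ListenerRule> rules, String defaultTargetGroup) {
        // Lower priority numbers are evaluated first.
        this.rules = rules.stream()
                .sorted(Comparator.comparingInt(ListenerRule::priority))
                .toList();
        this.defaultTargetGroup = defaultTargetGroup;
    }

    // First matching rule wins; otherwise the default action applies.
    String route(String path) {
        for (ListenerRule rule : rules) {
            if (rule.matches(path)) {
                return rule.targetGroup();
            }
        }
        return defaultTargetGroup;
    }
}
```

Giving a more specific rule (for example "/api/orders/admin*") a lower priority number than the broader "/api/orders*" rule is what keeps it from being shadowed.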

Storage

S3 (Simple Storage Service)

Object storage with 11 9s of durability (99.999999999%).
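Eleven nines is an annual, per-object design target. A worked example, assuming 10 million stored objects and independent losses, shows what it means in practice:

```latex
\mathbb{E}[\text{objects lost per year}] = N\,(1 - d)
  = 10^{7} \times (1 - 0.99999999999)
  = 10^{7} \times 10^{-11}
  = 10^{-4}
```

That is roughly one lost object every 10,000 years, the same illustration AWS uses for this figure.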

Storage Class	Use Case	Retrieval
S3 Standard	Frequently accessed data	Immediate
S3 Intelligent-Tiering	Unknown or changing access patterns	Immediate
S3 Standard-IA	Infrequent access, but needs fast retrieval	Immediate
S3 One Zone-IA	Re-creatable, infrequently accessed data (single AZ)	Immediate
S3 Glacier Instant Retrieval	Archive that is rarely read but needs instant access	Immediate (milliseconds)
S3 Glacier Flexible Retrieval	Archive, minutes-to-hours retrieval	1-5 minutes (expedited) to 5-12 hours (bulk)
S3 Glacier Deep Archive	Long-term archive	Within 12 hours (standard) to 48 hours (bulk)

Lifecycle Policy Example (CloudFormation):

YAML

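Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-data          # bucket names are globally unique across all accounts
      LifecycleConfiguration:
        Rules:
          - Id: TransitionToIA
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA
                TransitionInDays: 30   # Standard-IA transitions need objects at least 30 days old
              - StorageClass: GLACIER  # GLACIER = S3 Glacier Flexible Retrieval
                TransitionInDays: 90
            ExpirationInDays: 365      # delete objects one year after creation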

EBS vs EFS

Feature	EBS (Elastic Block Store)	EFS (Elastic File System)
Type	Block storage	File storage (NFS)
Attachment	Single EC2 instance (one AZ)	Multiple instances (cross-AZ)
Performance	Up to 256,000 IOPS (io2 Block Express)	Throughput scales with stored data (Bursting mode); Elastic and Provisioned modes decouple it
Use Case	Databases, boot volumes	Shared content, CMS, ML training
Pricing	Pay for provisioned size	Pay for what you use

EBS Multi-Attach

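io1 and io2 volumes support Multi-Attach (up to 16 Nitro-based instances), but only within a single AZ, and the application or a cluster-aware file system must coordinate concurrent writes. For a shared file system across AZs, use EFS.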

Serverless

Lambda

Run code without provisioning servers. Pay per invocation and duration.

Max execution: 15 minutes
Memory: 128 MB to 10,240 MB (10 GB)
Concurrency: 1,000 concurrent executions per Region by default (a soft quota that can be raised)
Cold start: often 100-500 ms (Java can take 1-3 s without mitigation)

Bash

# Create a Lambda function
aws lambda create-function \
  --function-name OrderProcessor \
  --runtime java17 \
  --handler com.example.OrderHandler::handleRequest \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --zip-file fileb://function.zip \
  --memory-size 512 \
  --timeout 30

Java Cold Starts

Java Lambdas have notoriously high cold starts due to JVM initialization. Mitigations: use SnapStart (checkpoint/restore), GraalVM native image, or provisioned concurrency for latency-sensitive workloads.
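Most cold-start cost is one-time initialization, so it pays to do expensive setup (SDK clients, configuration, caches) once per execution environment and reuse it on warm invocations; SnapStart snapshots the environment after this init phase, and provisioned concurrency runs it before traffic arrives. The sketch below shows the pattern with a static initializer. It is a hypothetical plain class: a real handler, like the com.example.OrderHandler referenced above, would typically implement RequestHandler from the aws-lambda-java-core library, which is left out so the example compiles with the JDK alone.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler illustrating "initialize once, reuse on warm invocations".
final class OrderHandlerSketch {
    // Counts how many times the expensive initialization ran (declared first so it
    // is initialized before PRICE_CACHE below).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Built once per execution environment during the cold start, then reused.
    private static final Map<String, Double> PRICE_CACHE = loadPrices();

    private static Map<String, Double> loadPrices() {
        INIT_COUNT.incrementAndGet(); // stands in for creating SDK clients, reading config, etc.
        return Map.of("sku-1", 9.99, "sku-2", 24.50);
    }

    // Per-invocation work stays small: it only reads the pre-built state.
    String handleRequest(String sku) {
        Double price = PRICE_CACHE.get(sku);
        return price == null ? "unknown sku" : sku + "=" + price;
    }
}
```

However many times handleRequest runs in the same environment, the initialization counter stays at 1, which is exactly the work SnapStart and provisioned concurrency move off the request path.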

API Gateway

Fully managed API front door. Handles throttling, auth, caching, and request transformation.

REST API - Full featured, request validation, WAF integration
HTTP API - Lower latency, lower cost, simpler; a good default unless you need REST-only features such as API keys and usage plans, request validation, or WAF
WebSocket API - Real-time two-way communication
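API Gateway throttles requests with a token bucket configured by a steady-state rate and a burst size; requests over the limit receive HTTP 429. A minimal Java sketch of the algorithm follows, with the clock passed in so it is deterministic. It illustrates the technique and is not API Gateway's implementation.

```java
// Token bucket: 'ratePerSecond' tokens are added per second, up to 'burst' tokens.
// Each admitted request consumes one token.
final class TokenBucket {
    private final double ratePerSecond;
    private final double burst;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double ratePerSecond, double burst, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.burst = burst;
        this.tokens = burst;            // start full so an initial burst is admitted
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if the request is admitted, false if it should be throttled (429).
    boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(burst, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a rate of 10 and a burst of 5, five simultaneous requests pass, the sixth is throttled, and one more token becomes available every 100 ms.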

Step Functions

Orchestrate multiple Lambda functions and AWS services into serverless workflows.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    Start(["Order Received"]) --> Validate{{"Validate Order"}}
    Validate -->|Valid| Reserve[["Reserve Inventory"]]
    Validate -->|Invalid| Reject(("Reject Order"))
    Reserve --> Payment{"Process Payment"}
    Payment -->|Success| Ship[/"Ship Order"/]
    Payment -->|Failure| Release[["Release Inventory"]]
    Ship --> Notify(["Send Notification"])
    Release --> Notify
    Notify --> End(("Complete"))
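Step Functions expresses this flow declaratively in Amazon States Language, using Task, Choice, and Catch states. To make the control flow explicit, the sketch below walks the same diagram in plain Java and records each step; the step names are hypothetical, and a real state machine would invoke a Lambda function or service integration at each Task. Note the compensation path: a payment failure releases the reservation instead of leaving inventory locked.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java walk through the order workflow diagram (hypothetical step names).
final class OrderWorkflowSketch {
    record Order(String id, boolean valid, boolean paymentSucceeds) {}

    static List<String> run(Order order) {
        List<String> steps = new ArrayList<>();
        steps.add("Validate");
        if (!order.valid()) {
            steps.add("Reject");               // terminal branch in the diagram
            return steps;
        }
        steps.add("ReserveInventory");
        steps.add("ProcessPayment");
        if (order.paymentSucceeds()) {
            steps.add("ShipOrder");
        } else {
            steps.add("ReleaseInventory");     // compensating action undoes the reservation
        }
        steps.add("SendNotification");         // both branches notify the customer
        steps.add("Complete");
        return steps;
    }
}
```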

Containers

ECS (Elastic Container Service)

AWS-native container orchestration. Two launch types:

Feature	EC2 Launch Type	Fargate Launch Type
Infrastructure	You manage EC2 instances	AWS manages infrastructure
Pricing	Pay for EC2 instances	Pay per vCPU/memory per second
Control	Full access to host OS	No host access
Best For	Persistent workloads, GPU	Variable workloads, simplicity

EKS (Elastic Kubernetes Service)

Managed Kubernetes control plane. Use when you need Kubernetes-specific features or portability across clouds.

ECS vs EKS - Interview Answer

Choose ECS if you are all-in on AWS and want simpler operations. Choose EKS if you need Kubernetes ecosystem tools (Helm, Istio, ArgoCD), multi-cloud portability, or your team already knows Kubernetes.

Fargate

Serverless compute engine for containers (works with both ECS and EKS). No need to manage underlying EC2 instances.

YAML

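# ECS Task Definition (CloudFormation)
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: order-service
      Cpu: "512"                    # 0.5 vCPU
      Memory: "1024"                # MiB; must be a valid Fargate pairing for the CPU value
      NetworkMode: awsvpc           # required for Fargate
      RequiresCompatibilities:
        - FARGATE
      # Execution role lets ECS pull the image from ECR and write to CloudWatch Logs.
      # Placeholder ARN; ecsTaskExecutionRole is the conventional role name.
      ExecutionRoleArn: arn:aws:iam::123456789012:role/ecsTaskExecutionRole
      ContainerDefinitions:
        - Name: order-service
          Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/order-service:latest
          PortMappings:
            - ContainerPort: 8080
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/order-service   # log group must already exist
              awslogs-region: us-east-1
              awslogs-stream-prefix: ecs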

Databases

RDS (Relational Database Service)

Managed relational databases (MySQL, PostgreSQL, Oracle, SQL Server).

Automated backups, Multi-AZ failover, read replicas
Multi-AZ: Synchronous replication to a standby for HA (automatic failover, typically 60-120 seconds)
Read Replicas: Asynchronous replication for read scaling (up to 15 for Aurora)
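Read replicas lag the primary because replication is asynchronous, so a common pattern is to send writes, and a client's immediate follow-up reads, to the primary while spreading other reads across replicas. A sketch of that routing decision; the endpoint names and the read-your-writes window are hypothetical tuning choices, and in practice the split lives in your data-access layer or driver.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read/write splitter for an RDS primary plus read replicas.
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final long readYourWritesWindowMillis;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas, long readYourWritesWindowMillis) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
        this.readYourWritesWindowMillis = readYourWritesWindowMillis;
    }

    // All writes go to the primary.
    String forWrite() {
        return primary;
    }

    // millisSinceClientLastWrite: Long.MAX_VALUE if this client has never written.
    // Recent writers read from the primary so they see their own data despite replica lag.
    String forRead(long millisSinceClientLastWrite) {
        if (replicas.isEmpty() || millisSinceClientLastWrite < readYourWritesWindowMillis) {
            return primary;
        }
        int index = Math.floorMod(next.getAndIncrement(), replicas.size()); // round-robin
        return replicas.get(index);
    }
}
```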

Amazon Aurora

MySQL- and PostgreSQL-compatible; AWS cites up to 5x the throughput of standard MySQL (3x for PostgreSQL).

Storage auto-scales from 10 GB to 128 TB
6 copies of data across 3 AZs
Aurora Serverless v2 - Auto-scales compute capacity
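The six copies are two per AZ, and per AWS's published design Aurora acknowledges a write once 4 of 6 copies have it, while recovery reads need 3 of 6. The sketch below checks the two overlap rules that make this safe and what survives a failure. Treat it as availability arithmetic rather than Aurora's normal read path, which reads from a single up-to-date copy; the class name is hypothetical.

```java
// Quorum arithmetic for Aurora's storage layout (hypothetical helper).
final class QuorumSketch {
    static final int COPIES = 6;        // 2 copies in each of 3 AZs
    static final int WRITE_QUORUM = 4;
    static final int READ_QUORUM = 3;

    // W + R > N: every read quorum overlaps every write quorum, so it includes the latest write.
    static boolean readsSeeLatestWrite(int n, int w, int r) {
        return w + r > n;
    }

    // 2W > N: any two write quorums overlap, so two conflicting writes cannot both succeed.
    static boolean writeQuorumsOverlap(int n, int w) {
        return 2 * w > n;
    }

    // Losing a whole AZ removes 2 copies; "AZ + 1" removes 3.
    static boolean canWrite(int copiesLost) {
        return COPIES - copiesLost >= WRITE_QUORUM;
    }

    static boolean canRead(int copiesLost) {
        return COPIES - copiesLost >= READ_QUORUM;
    }
}
```

Losing one AZ leaves 4 copies, so writes continue; losing an AZ plus one more copy leaves 3, so data remains readable and repairable even though writes pause.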

DynamoDB

Fully managed NoSQL (key-value + document). Single-digit millisecond latency at any scale.

Feature	Description
Partition Key	Required. Determines data distribution
Sort Key	Optional. Enables range queries within a partition
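GSI	Global Secondary Index - different partition (and optional sort) key; can be added later; eventually consistent reads only
LSI	Local Secondary Index - alternate sort key with the same partition key; must be defined at table creation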
DynamoDB Streams	Change data capture (CDC), triggers Lambda
DAX	In-memory cache for DynamoDB (microsecond reads)

DynamoDB Design

Design your table for your access patterns first. Unlike RDS, DynamoDB cannot efficiently serve ad hoc queries or joins; a new access pattern usually needs a new GSI or a table redesign. Think: "What questions will I ask?" before defining keys and indexes.

Bash

# Create a DynamoDB table
aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions \
    AttributeName=customerId,AttributeType=S \
    AttributeName=orderId,AttributeType=S \
  --key-schema \
    AttributeName=customerId,KeyType=HASH \
    AttributeName=orderId,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
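This key schema groups items by customerId and keeps them sorted by orderId within each customer, so "orders for customer X in a range" is a single Query, while anything not keyed by customerId needs a Scan or a GSI. An in-memory model of that layout (hypothetical; it assumes orderIds are lexicographically sortable, for example date-prefixed like 2024-02-10#b):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of the Orders table: partition key -> items sorted by sort key.
final class OrdersTableModel {
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    // Like PutItem: the partition key picks the partition, the sort key orders items inside it.
    void put(String customerId, String orderId, String item) {
        partitions.computeIfAbsent(customerId, k -> new TreeMap<>()).put(orderId, item);
    }

    // Like Query with "customerId = :c AND orderId BETWEEN :from AND :to":
    // touches one partition and reads a contiguous sorted range.
    List<String> query(String customerId, String fromOrderId, String toOrderId) {
        TreeMap<String, String> partition = partitions.get(customerId);
        if (partition == null) {
            return List.of();
        }
        NavigableMap<String, String> range = partition.subMap(fromOrderId, true, toOrderId, true);
        return new ArrayList<>(range.values());
    }
}
```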

Messaging

SQS (Simple Queue Service)

Fully managed message queue. Decouples producers from consumers.

Feature	Standard Queue	FIFO Queue
Throughput	Nearly unlimited	300 API calls/s per action (3,000 msg/s with batching); higher in high-throughput mode
Ordering	Best-effort	Strict FIFO
Delivery	At-least-once	Exactly-once processing
Use Case	High throughput, order not critical	Financial transactions, commands
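Because standard queues deliver at least once (and FIFO deduplication only covers a 5-minute window), consumers should be idempotent. A minimal sketch, assuming the producer attaches an idempotency key such as an order or payment ID; the in-memory set stands in for a durable store, for example a DynamoDB conditional write, which a production consumer would need. Names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent consumer: applies each idempotency key at most once.
final class IdempotentConsumer {
    // Assumption: in production this set lives in a durable store shared by all consumers.
    private final Set<String> processedKeys = new HashSet<>();
    private long balanceCents;

    // Returns true if the message changed state, false if it was a duplicate delivery.
    boolean handle(String idempotencyKey, long amountCents) {
        if (!processedKeys.add(idempotencyKey)) {
            return false; // already applied: acknowledge (delete) without reprocessing
        }
        balanceCents += amountCents;
        return true;
    }

    long balanceCents() {
        return balanceCents;
    }
}
```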

SNS (Simple Notification Service)

Pub/sub messaging. Fan-out to multiple subscribers (SQS, Lambda, HTTP, email).

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    P["Order Service"] --> SNS["SNS Topic<br/>order-events"]
    SNS --> Q1["SQS: Inventory Queue"]
    SNS --> Q2["SQS: Notification Queue"]
    SNS --> Q3["SQS: Analytics Queue"]
    SNS --> L1["Lambda: Audit Logger"]
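The key property of fan-out is that every subscriber gets its own copy, so a slow or failing consumer does not hold up the others. A minimal in-memory model (hypothetical class; real SNS delivers asynchronously, retries failed deliveries, and can apply per-subscription filter policies):

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// SNS -> SQS fan-out model: one publish, one independent copy per subscribed queue.
final class TopicSketch {
    private final Map<String, Queue<String>> subscriptions = new LinkedHashMap<>();

    Queue<String> subscribe(String queueName) {
        return subscriptions.computeIfAbsent(queueName, k -> new ArrayDeque<>());
    }

    void publish(String message) {
        for (Queue<String> queue : subscriptions.values()) {
            queue.add(message); // each consumer drains its own queue at its own pace
        }
    }
}
```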

EventBridge

Serverless event bus for building event-driven architectures. Compared with SNS, it offers richer content-based filtering (event patterns), a schema registry, and native integrations with third-party SaaS applications.
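EventBridge rules select events with JSON event patterns: every field named in the pattern must be present in the event, each leaf lists the allowed values, and nested objects match recursively. The sketch below implements only that exact-value subset (no prefix, numeric, or anything-but operators, and no array-valued event fields), using Maps in place of JSON; the class and event fields are illustrative.

```java
import java.util.List;
import java.util.Map;

// Exact-value subset of EventBridge event-pattern matching (hypothetical helper).
final class EventPatternSketch {
    static boolean matches(Map<?, ?> pattern, Map<?, ?> event) {
        for (Map.Entry<?, ?> entry : pattern.entrySet()) {
            Object expected = entry.getValue();
            Object actual = event.get(entry.getKey());
            if (actual == null) {
                return false; // every field in the pattern must exist in the event
            }
            if (expected instanceof Map<?, ?> nestedPattern) {
                // Nested object: the event value must be an object that matches recursively.
                if (!(actual instanceof Map<?, ?> nestedEvent)) {
                    return false;
                }
                if (!matches(nestedPattern, nestedEvent)) {
                    return false;
                }
            } else if (expected instanceof List<?> allowed) {
                if (!allowed.contains(actual)) {
                    return false; // value must equal one of the listed values
                }
            } else {
                throw new IllegalArgumentException("pattern leaves must be lists: " + entry.getKey());
            }
        }
        return true;
    }
}
```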

Kinesis

Real-time data streaming for high-volume, continuous data.

Service	Use Case
Kinesis Data Streams	Real-time processing with custom consumers
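Kinesis Data Firehose (now Amazon Data Firehose)	Load streaming data into S3, Redshift, OpenSearch Service
Kinesis Data Analytics (now Amazon Managed Service for Apache Flink)	Real-time analytics with Apache Flink (the older SQL applications are deprecated)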

SQS vs Kinesis

SQS: Message queue, messages deleted after consumption, simpler. Kinesis: Stream, data retained (1-365 days), multiple consumers read same data independently, ordering within shard. Choose Kinesis when you need replay, multiple independent consumers, or real-time analytics.
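Ordering within a shard works because the partition key fully determines the shard: Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains it, so every record for a key lands on the same shard, in order. The sketch assumes the shards split the hash space into equal ranges (as when a stream is created with N shards; resharding can make ranges uneven) and that the key is encoded as UTF-8; names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Partition key -> shard mapping under an equal-split assumption (hypothetical helper).
final class ShardRouter {
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is available in every standard JDK", e);
        }
    }

    static int shardFor(String partitionKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // last shard absorbs the division remainder
    }
}
```

A hot key therefore means a hot shard: choosing a high-cardinality partition key (such as customerId rather than country) is what spreads load.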

Networking

VPC (Virtual Private Cloud)

Isolated network within AWS. You control IP ranges, subnets, route tables, and gateways.

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
    subgraph VPC["VPC 10.0.0.0/16"]
        subgraph Public["Public Subnets"]
            PS1(["10.0.1.0/24<br/>AZ-a"])
            PS2(["10.0.2.0/24<br/>AZ-b"])
        end
        subgraph Private["Private Subnets"]
            PR1[["10.0.3.0/24<br/>AZ-a"]]
            PR2[["10.0.4.0/24<br/>AZ-b"]]
        end
        subgraph Data["Data Subnets"]
            DS1[/"10.0.5.0/24<br/>AZ-a"/]
            DS2[/"10.0.6.0/24<br/>AZ-b"/]
        end
    end

    IGW{{"Internet Gateway"}} --> Public
    Public -->|"NAT GW"| Private
    Private --> Data

    style Public fill:#E8F5E9,stroke:#2E7D32,color:#000
    style Private fill:#E3F2FD,stroke:#1565C0,color:#000
    style Data fill:#FFF3E0,stroke:#E65100,color:#000

Component	Purpose
Public Subnet	Resources with direct internet access (ALB, NAT GW)
Private Subnet	Application tier (EC2, ECS tasks), no direct internet
Data Subnet	Databases, ElastiCache (most restricted)
Internet Gateway	Allows VPC to communicate with the internet
NAT Gateway	Allows private subnet instances to reach internet (outbound only)
Route Table	Rules that determine where traffic is directed
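Subnet sizes follow from the prefix length: a /16 holds 2^(32-16) = 65,536 addresses and a /24 holds 256. AWS reserves five addresses in every subnet (the network address, the VPC router, the DNS server, one reserved for future use, and the last address), so each /24 above has 251 usable IPs. A small helper for this arithmetic (hypothetical, IPv4 only, no input validation):

```java
// IPv4 CIDR arithmetic for VPC planning (hypothetical helper, no validation).
final class CidrSketch {
    // Dotted quad -> unsigned 32-bit value held in a long.
    static long toLong(String ipv4) {
        long value = 0;
        for (String part : ipv4.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    static long totalAddresses(String cidr) {
        int prefix = Integer.parseInt(cidr.split("/")[1]);
        return 1L << (32 - prefix);
    }

    // AWS reserves the first four addresses and the last one in each subnet.
    static long usableInAwsSubnet(String cidr) {
        return totalAddresses(cidr) - 5;
    }

    // True if ipv4 falls inside the CIDR block.
    static boolean contains(String cidr, String ipv4) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(parts[0]) & mask) == (toLong(ipv4) & mask);
    }
}
```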

Security Groups vs NACLs

Feature	Security Group	NACL
Level	Instance (ENI) level	Subnet level
Rules	Allow only	Allow and Deny
State	Stateful (return traffic auto-allowed)	Stateless (must define inbound and outbound)
Evaluation	All rules evaluated together	Rules evaluated in order (lowest number first)
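The evaluation difference matters when a NACL mixes allow and deny rules. The sketch below (hypothetical, inbound only, matching on port alone; real rules also match protocol and CIDR) applies NACL rules in ascending rule-number order with the first match winning and a final implicit deny, and treats a security group as a set of allow rules. Because NACLs are stateless, a real NACL also needs an outbound rule for the ephemeral return ports, which this sketch does not model.

```java
import java.util.Comparator;
import java.util.List;

// Contrast of NACL and security group evaluation (hypothetical, port-only model).
final class FirewallSketch {
    record NaclRule(int ruleNumber, int port, boolean allow) {}

    // NACL: lowest rule number first; first match decides; no match means the "*" rule denies.
    static boolean naclAllows(List<NaclRule> rules, int port) {
        return rules.stream()
                .sorted(Comparator.comparingInt(NaclRule::ruleNumber))
                .filter(rule -> rule.port() == port)
                .findFirst()
                .map(NaclRule::allow)
                .orElse(false);
    }

    // Security group: allow rules only, so traffic passes if any rule permits it.
    static boolean securityGroupAllows(List<Integer> allowedPorts, int port) {
        return allowedPorts.contains(port);
    }
}
```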

Route 53

DNS service with routing policies:

Simple - Single resource
Weighted - Split traffic by percentage (A/B testing)
Latency-based - Route to lowest-latency region
Failover - Active-passive DR
Geolocation - Route based on user location
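With weighted routing, Route 53 returns each record with probability weight / (sum of weights), so weights of 90 and 10 send roughly 90% and 10% of DNS answers to each target. A deterministic sketch with the random draw passed in (record names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Weighted record selection: 'draw' is a uniform value in [0, total weight).
final class WeightedRouting {
    static String pick(Map<String, Integer> weights, long draw) {
        long cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (draw < cumulative) {
                return entry.getKey(); // zero-weight records never satisfy this test
            }
        }
        throw new IllegalArgumentException("draw must be in [0, total weight)");
    }
}
```

In real use the draw would come from ThreadLocalRandom.current().nextLong(totalWeight) and the map would be a LinkedHashMap so iteration order is stable. Resolvers cache answers for the record's TTL, so the observed traffic split is only approximate.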

IAM (Identity and Access Management)

Core Concepts

Concept	Description
User	Person or application (long-term credentials)
Group	Collection of users with shared permissions
Role	Identity with no long-term credentials; services or users assume it to get temporary credentials
Policy	JSON document defining permissions

Policy Structure

JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*",
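      "Condition": {
        "IpAddress": {
          "aws:VpcSourceIp": "10.0.0.0/16"
        }
      }
    }
  ]
}

The condition limits access to callers inside the 10.0.0.0/16 VPC range. aws:VpcSourceIp is only present on requests that arrive through a VPC endpoint; aws:SourceIp would instead see the caller's public address (for example, the NAT gateway's), so a private CIDR there would not match typical requests.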

Best Practices

Least Privilege - Grant only permissions required for the task
Use Roles, Not Users - EC2 instances, Lambda, ECS tasks should assume roles
No Root Account for Daily Use - Enable MFA on root and use individual identities (preferably IAM Identity Center users) for daily work
Policy Conditions - Restrict by IP, time, MFA status, tags
Service Control Policies (SCPs) - Guardrails across AWS Organization accounts
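Policies combine by a fixed precedence: an explicit Deny anywhere wins, otherwise a matching Allow grants access, otherwise the request is implicitly denied. The sketch below applies that rule to action names only, with '*' wildcards matched case-insensitively as IAM does for actions. It deliberately ignores resources, conditions, SCPs, permission boundaries, and resource-based policies, so it is a teaching model rather than a full evaluator; all names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Simplified IAM evaluation for identity-based statements in one account.
final class IamEvaluationSketch {
    enum Effect { ALLOW, DENY }

    record Statement(Effect effect, String actionPattern) {}

    // '*' matches any run of characters; everything else is literal; case-insensitive.
    static boolean actionMatches(String pattern, String action) {
        String[] literals = pattern.toLowerCase().split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return action.toLowerCase().matches(regex.toString());
    }

    static boolean isAllowed(List<Statement> statements, String action) {
        boolean allowed = false;
        for (Statement statement : statements) {
            if (!actionMatches(statement.actionPattern(), action)) {
                continue;
            }
            if (statement.effect() == Effect.DENY) {
                return false; // explicit deny overrides any allow
            }
            allowed = true;
        }
        return allowed; // false here is the implicit (default) deny
    }
}
```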

Common Mistake

Never embed AWS access keys in application code or config files. Use IAM roles for EC2/ECS/Lambda. For local development, use IAM Identity Center (formerly AWS SSO) or named profiles in ~/.aws/config and ~/.aws/credentials.

CloudFormation (Infrastructure as Code)

Declare your entire infrastructure in YAML/JSON templates. Supports drift detection, change sets, and rollback.

YAML

AWSTemplateFormatVersion: "2010-09-09"
Description: Microservice infrastructure

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]

Resources:
  OrderQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: !Sub "${Environment}-order-queue"
      VisibilityTimeout: 60
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt OrderDLQ.Arn
        maxReceiveCount: 3

  OrderDLQ:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: !Sub "${Environment}-order-dlq"
      MessageRetentionPeriod: 1209600  # 14 days

Outputs:
  QueueUrl:
    Value: !Ref OrderQueue
    Export:
      Name: !Sub "${Environment}-order-queue-url"
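The RedrivePolicy above moves a message to the DLQ once its receive count exceeds maxReceiveCount. The simulation below (hypothetical class, in-memory queues) models that with maxReceiveCount = 3: a message that keeps failing is attempted three times and then parked in the DLQ, while healthy messages are unaffected. Re-queuing a failed message at the tail stands in for it becoming visible again after the visibility timeout.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// In-memory model of SQS redrive to a dead-letter queue (hypothetical).
final class RedriveSketch {
    static final class Message {
        final String body;
        int receiveCount;

        Message(String body) {
            this.body = body;
        }
    }

    final Deque<Message> sourceQueue = new ArrayDeque<>();
    final List<Message> deadLetterQueue = new ArrayList<>();
    private final int maxReceiveCount;

    RedriveSketch(int maxReceiveCount) {
        this.maxReceiveCount = maxReceiveCount;
    }

    void send(String body) {
        sourceQueue.addLast(new Message(body));
    }

    // Drains the queue; the handler returns true on success (message deleted).
    // Returns the number of processing attempts made.
    int drain(Predicate<String> handler) {
        int attempts = 0;
        while (!sourceQueue.isEmpty()) {
            Message message = sourceQueue.pollFirst();
            if (message.receiveCount >= maxReceiveCount) {
                deadLetterQueue.add(message); // next receive would exceed the limit
                continue;
            }
            message.receiveCount++;
            attempts++;
            if (!handler.test(message.body)) {
                sourceQueue.addLast(message); // not deleted: visible again after the timeout
            }
        }
        return attempts;
    }
}
```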

CloudFormation vs Terraform

CloudFormation is AWS-native with deep integration (drift detection, stack sets). Terraform is cloud-agnostic with a larger provider ecosystem. In interviews, mention you are comfortable with both but pick one based on whether the system is multi-cloud.

Common Interview Questions

Q: How would you design a highly available web application on AWS?

Multi-AZ deployment with ALB, Auto Scaling Group across 2+ AZs, RDS Multi-AZ for database, S3 for static assets with CloudFront CDN, Route 53 for DNS failover.

Q: How does S3 achieve 11 9s of durability?

S3 stores objects redundantly across a minimum of 3 Availability Zones (except in the One Zone storage classes). It uses checksums to detect corruption and automatically repairs lost redundancy. Cross-Region Replication (CRR) adds another layer for DR.

Q: SQS vs Kafka - when do you choose which?

SQS: Fully managed, zero ops, per-request pricing, ideal for decoupling microservices. Kafka (MSK): Higher throughput, message replay, per-partition ordering, event sourcing, stream processing. If you need a simple task queue, use SQS. If you need an event log, choose Kafka or Kinesis.

Q: How do you handle secrets in AWS?

Use AWS Secrets Manager (auto-rotation, RDS integration) or SSM Parameter Store (simpler, free for standard parameters). Never store secrets in environment variables baked into container images or CloudFormation templates.

Q: Explain the shared responsibility model.

AWS is responsible for security of the cloud (physical infrastructure, hypervisor, managed services). You are responsible for security in the cloud (data encryption, IAM policies, network configuration, OS patching for EC2).

Q: How would you reduce Lambda cold starts for a Java service?

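(1) Enable SnapStart (restores from a snapshot of the initialized JVM). (2) Use provisioned concurrency for latency-critical paths. (3) Keep the deployment package small. (4) Consider a GraalVM native image on a custom runtime. (5) Set JAVA_TOOL_OPTIONS to "-XX:+TieredCompilation -XX:TieredStopAtLevel=1" so the JVM uses only the C1 JIT compiler, which cuts startup work.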

Q: Design a fan-out notification system.

SNS topic receives the event. Multiple SQS queues subscribe (email service, push notification service, analytics pipeline). Each consumer processes independently. DLQs catch failures. This provides decoupling, independent scaling, and retry isolation.

Key Takeaways for Interviews

Always design for multi-AZ availability (at minimum) and discuss multi-region for DR.
Use managed services over self-hosted when possible (RDS over self-managed MySQL on EC2).
Apply least privilege IAM policies and prefer roles over access keys.
Decouple services with SQS/SNS to improve resilience and independent scaling.
For serverless: Lambda + API Gateway + DynamoDB is the canonical stack.
Cost optimization: right-size instances, use Savings Plans, and apply S3 lifecycle rules to move data to cheaper storage classes.
