Key Value Databases (Redis, Memcached)

Brief Summary of Database/Technology: Key-value databases are structured like JavaScript objects or Python dictionaries: each key is unique and maps to a value. Stores such as Redis and Memcached hold all data in the machine’s memory rather than on disk, which enables sub-millisecond response times. Because reads and writes avoid disk I/O, this in-memory architecture makes them exceptionally fast.

Redis

Real-World Example: Instagram Feed Caching. Instagram uses Redis to cache user feeds and reduce load on its primary databases. When a user opens Instagram, building their feed could require aggregating posts from hundreds of accounts they follow, applying ranking algorithms, and filtering content, an operation that would be too slow if performed on demand against disk-based databases. Instead, Instagram pre-computes and caches feeds in Redis. When a user with millions of followers posts a photo, that content is immediately pushed to the Redis caches of their followers’ feeds. This allows Instagram to serve feeds in under 100ms even during peak traffic, handling billions of feed requests daily. Without Redis, its PostgreSQL databases would be overwhelmed with read queries, and users would experience multi-second load times.

Redis - Instagram Feed Cache

Key | Value (JSON Array) | TTL
feed:user:123456 | [{post_id: abc123, user: 789, likes: 1250}, {post_id: def456, user: 321, likes: 890}] | 300s
feed:user:789012 | [{post_id: xyz999, user: 111, likes: 3400}, {post_id: lmn888, user: 222, likes: 567}] | 300s
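
The key, value, and TTL columns above can be sketched as a small cache-aside helper. This is an illustrative in-memory stand-in rather than Redis itself: `TTLCache`, `get_feed`, and the `build_feed` callback are hypothetical names, and the 300-second TTL simply mirrors the table.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Toy in-memory key-value store with per-key expiry (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


def get_feed(cache: TTLCache, user_id: int, build_feed: Callable[[int], list]) -> list:
    """Cache-aside read: serve from the cache, else rebuild and cache for 300 s."""
    key = f"feed:user:{user_id}"
    feed = cache.get(key)
    if feed is None:
        feed = build_feed(user_id)  # hypothetical slow path (primary database)
        cache.set(key, feed, ttl_seconds=300)
    return feed
```

Only the first request for a given user within the TTL window reaches the slow path; later requests are answered from memory until the entry expires.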

Example Cases for the Database

  • Caching layer to reduce database load and improve application performance
  • Message queues for asynchronous task processing
  • Pub/sub systems for real-time messaging and notifications
  • Session storage for web applications

When not to Use the Database: Avoid key-value databases when you need complex queries, data relationships, or strong durability guarantees. They’re not suitable for data that requires ACID transactions or for datasets that may outgrow available memory.

Pros | Cons
Extremely fast (sub-millisecond) | Limited to available memory
Simple data structure | No complex querying capabilities
High throughput | Volatile without persistence config
Easy to implement | No built-in relationships

Wide Column Databases (Cassandra, HBase)

Brief Summary of Database/Technology: Wide column databases extend the key-value model by adding a second dimension. A keyspace holds one or more column families, and each column family holds a set of rows whose columns can vary from row to row, allowing for flexible data organization. These databases sacrifice the ability to perform joins in exchange for exceptional horizontal scalability and replication capabilities, making them well suited to distributed systems handling massive datasets.

Cassandra

Real-World Example: Netflix Viewing History. Netflix uses Apache Cassandra to store viewing history and preferences for over 200 million subscribers worldwide. Every time you pause, rewind, or finish watching a show, that event is recorded. Netflix needs to track not just what you watched, but also timestamps, device information, watch duration, and playback positions across every episode. This generates billions of write operations daily. With Cassandra’s wide column structure, Netflix can use a user ID as the partition key and store time-series data (viewing events) as timestamp-ordered columns within that partition. When you open Netflix, the app queries your viewing history from the geographically nearest Cassandra cluster so that “Continue Watching” loads quickly. Cassandra’s masterless architecture means there is no single point of failure: if a data center goes down, Netflix can still serve your viewing data from replicas in other regions, ensuring 99.99% uptime for this critical feature.

Cassandra - Netflix Viewing History

user_id | watch_timestamp | title_id | device_type | watch_duration | playback_position
550e8400-e29b | 2025-10-15 20:30:00 | stranger-things-s4e1 | smart_tv | 3600 | 3245
550e8400-e29b | 2025-10-15 19:15:00 | breaking-bad-s1e1 | mobile | 2800 | 2800
550e8400-e29b | 2025-10-14 21:00:00 | the-crown-s5e3 | laptop | 3200 | 1500
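
The partition-key-plus-ordered-rows layout in the table above can be sketched in plain Python. This is a toy model of the data structure only, not Cassandra’s storage engine or driver API; `ViewingHistoryTable` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class ViewingHistoryTable:
    """Toy wide-column table: partition key -> rows sorted by clustering key."""

    def __init__(self) -> None:
        # user_id (partition key) -> list of (watch_timestamp, columns), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, user_id: str, watch_timestamp: str, **columns) -> None:
        rows = self._partitions[user_id]
        keys = [ts for ts, _ in rows]
        idx = bisect.bisect_left(keys, watch_timestamp)
        if idx < len(rows) and rows[idx][0] == watch_timestamp:
            rows[idx] = (watch_timestamp, columns)  # same primary key: overwrite
        else:
            rows.insert(idx, (watch_timestamp, columns))

    def latest(self, user_id: str, limit: int = 10) -> list:
        """Read the newest rows from one partition; no other partition is touched."""
        rows = self._partitions.get(user_id, [])
        return [{"watch_timestamp": ts, **cols} for ts, cols in reversed(rows[-limit:])]


table = ViewingHistoryTable()
table.insert("550e8400-e29b", "2025-10-14 21:00:00", title_id="the-crown-s5e3")
table.insert("550e8400-e29b", "2025-10-15 20:30:00", title_id="stranger-things-s4e1")
table.insert("550e8400-e29b", "2025-10-15 19:15:00", title_id="breaking-bad-s1e1")
print([row["title_id"] for row in table.latest("550e8400-e29b", limit=2)])
# prints ['stranger-things-s4e1', 'breaking-bad-s1e1']
```

Because a “Continue Watching” style read needs only one user’s partition, it stays cheap no matter how many other users exist, which is the property that lets this model scale horizontally.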

Example Cases for the Database

  • Time series data storage (Netflix watch history, user activity logs)
  • IoT sensor data collection and analysis
  • Event logging and analytics at massive scale
  • Real-time analytics and monitoring systems

When not to Use the Database: Avoid when your application requires complex joins between tables, strict ACID transactions across multiple rows, or when data is highly normalized and interconnected.

Pros | Cons
Excellent horizontal scalability | No join operations
Handles massive datasets | Complex query limitations
High write throughput | Eventual consistency model
Built for replication | Denormalized data required

Document Oriented Databases (MongoDB, CouchDB)

Brief Summary of Database/Technology: Document-oriented databases store data as documents (typically JSON-like structures) without requiring a predefined schema. Documents are grouped into collections and can be indexed and organized hierarchically. This flexibility allows each document to have different fields, making these databases ideal for evolving data models. However, the lack of joins means data is often denormalized, leading to faster reads but more complex write operations when related data needs updating across multiple documents.

Real-World Example: Uber Trip Data. Uber uses MongoDB to store trip information because each ride has vastly different attributes depending on the service type and region. A standard UberX ride might store pickup/dropoff locations, driver ID, passenger ID, fare breakdown, and route taken. An UberEats delivery, by contrast, includes restaurant details, food items, preparation time, and delivery instructions, and an UberPool ride has multiple passengers with different pickup/dropoff points. In a relational database, this would require either many nullable columns or complex join tables. With MongoDB, each trip is a flexible document that contains exactly the fields needed for that specific ride type. When displaying trip history in the Uber app, MongoDB can quickly retrieve all of a user’s trips without joins. The trade-off: when a driver updates their profile photo, Uber must update that information in every trip document where that driver appears, potentially many thousands, rather than in just one row of a drivers table.

// MongoDB - Uber Trip Data
{
  "_id": "trip_789xyz",
  "tripType": "uberEats",
  "timestamp": "2025-10-16T14:30:00Z",
  "customer": {
    "id": "user_123",
    "name": "John Doe",
    "phone": "+1234567890"
  },
  "driver": {
    "id": "driver_456",
    "name": "Jane Smith",
    "photo": "https://...",
    "rating": 4.8
  },
  "restaurant": {
    "name": "Pizza Palace",
    "location": {"lat": 37.7749, "lng": -122.4194}
  },
  "items": [
    {"name": "Pepperoni Pizza", "quantity": 1, "price": 18.99},
    {"name": "Garlic Bread", "quantity": 2, "price": 5.99}
  ],
  "delivery": {
    "pickup": {"lat": 37.7749, "lng": -122.4194, "time": "2025-10-16T14:45:00Z"},
    "dropoff": {"lat": 37.7849, "lng": -122.4094, "time": "2025-10-16T15:10:00Z"},
    "instructions": "Leave at door, ring bell"
  },
  "fare": {
    "subtotal": 24.98,
    "delivery_fee": 3.99,
    "service_fee": 2.50,
    "tip": 5.00,
    "total": 36.47
  },
  "status": "completed"
}
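
The write-side cost of embedding driver details in each trip, as in the document above, can be illustrated with plain dictionaries standing in for a collection. This is a sketch of the trade-off rather than MongoDB client code; `update_driver_photo` and the sample trips are hypothetical.

```python
def update_driver_photo(trips: list, driver_id: str, new_photo: str) -> int:
    """Rewrite the embedded driver photo in every trip document that copies it.

    Returns the number of documents touched, which grows with the driver's
    trip count; a normalized schema would update a single drivers row instead.
    """
    touched = 0
    for trip in trips:
        driver = trip.get("driver", {})
        if driver.get("id") == driver_id and driver.get("photo") != new_photo:
            driver["photo"] = new_photo
            touched += 1
    return touched


# Hypothetical sample data: two trips embed driver_456, one embeds another driver.
trips = [
    {"_id": "trip_1", "tripType": "uberX",
     "driver": {"id": "driver_456", "photo": "old.jpg"}},
    {"_id": "trip_2", "tripType": "uberEats",
     "driver": {"id": "driver_456", "photo": "old.jpg"},
     "items": [{"name": "Garlic Bread", "quantity": 1}]},
    {"_id": "trip_3", "tripType": "uberPool",
     "driver": {"id": "driver_999", "photo": "x.jpg"}},
]
print(update_driver_photo(trips, "driver_456", "new.jpg"))  # prints 2
```

Note that the three documents carry different fields (only the delivery has `items`), which is the schema flexibility on the read side that this duplication pays for.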

Example Cases for the Database

  • Content management systems with varying content types
  • Product catalogs with diverse attributes
  • User profiles and settings
  • Mobile applications requiring offline sync
  • Flexible data models that change frequently

When not to Use the Database: Avoid for highly interconnected data that changes frequently, such as social media interactions where comments, likes, and relationships need constant updates across multiple entities. Not ideal when data consistency across related documents is critical.

Pros | Cons
Flexible schema | Complex writes with denormalized data
Fast read operations | No native join support
Easy to scale horizontally | Data duplication and inconsistency risk
Natural fit for JSON/object data | Difficult with highly relational data

Relational Databases (PostgreSQL, MySQL, CockroachDB)

Brief Summary of Database/Technology: Relational databases organize data into tables where each row has a primary key and can reference other tables through foreign keys. Data is normalized, meaning it is split across tables so that each fact is stored once, to minimize redundancy. These databases are ACID compliant, ensuring that transactions maintain data consistency and integrity. The trade-off for this reliability is that they require a predefined schema and are traditionally more difficult to scale horizontally, though modern solutions like CockroachDB address scalability concerns.

Real-World Example: Stripe Payment Processing. Stripe uses PostgreSQL to handle payment transactions because financial data demands absolute consistency and ACID guarantees. When a customer makes a purchase, Stripe must: (1) verify the customer’s payment method, (2) create a charge record, (3) deduct the processing fee from the amount owed to the merchant, (4) create a transfer record, and (5) update account balances, all within a single atomic transaction. If any step fails (network issue, insufficient funds, etc.), the entire transaction must roll back to prevent data inconsistencies. Imagine a charge succeeding without the merchant’s balance being credited: that is money lost in the system. PostgreSQL’s foreign keys ensure referential integrity, so you can’t delete a customer who has associated charges, and its strict schema prevents accidentally storing invalid data types. The cost is that scaling PostgreSQL requires sophisticated replication strategies, but for Stripe, correctness is non-negotiable: a 0.01% error rate on billions of dollars in transactions would be catastrophic. Every penny must be accounted for, which is why relational databases remain the gold standard for financial systems.

PostgreSQL - Stripe Transactions

Customers Table

customer_id | email | name | created_at
cust_123abc | alice@example.com | Alice Johnson | 2024-03-15
cust_456def | bob@example.com | Bob Smith | 2024-05-22

Charges Table

charge_id | customer_id | payment_method_id | amount | currency | status | created_at
ch_789xyz | cust_123abc | pm_111aaa | 99.99 | USD | succeeded | 2025-10-16 14:30
ch_012uvw | cust_456def | pm_222bbb | 149.50 | USD | succeeded | 2025-10-16 15:45

Transfers Table

transfer_id | charge_id | merchant_id | amount | fee | net_amount | status
tr_333ccc | ch_789xyz | merch_999 | 99.99 | 2.90 | 97.09 | paid
tr_444ddd | ch_012uvw | merch_888 | 149.50 | 4.34 | 145.16 | paid
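
The all-or-nothing behaviour and foreign-key checks described for these tables can be tried locally with Python’s built-in `sqlite3` module. SQLite is used here purely as a stand-in so the sketch runs without a server; the reduced schema, the `record_payment` helper, and the choice to store amounts as integer cents are assumptions for illustration, not Stripe’s actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE charges (
        charge_id    TEXT PRIMARY KEY,
        customer_id  TEXT NOT NULL REFERENCES customers(customer_id),
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
    );
    CREATE TABLE transfers (
        transfer_id TEXT PRIMARY KEY,
        charge_id   TEXT NOT NULL REFERENCES charges(charge_id),
        net_cents   INTEGER NOT NULL
    );
    INSERT INTO customers VALUES ('cust_123abc', 'alice@example.com');
""")


def record_payment(charge_id, customer_id, amount_cents, fee_cents, transfer_id):
    """Insert the charge and its transfer atomically: both rows or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO charges VALUES (?, ?, ?)",
                         (charge_id, customer_id, amount_cents))
            conn.execute("INSERT INTO transfers VALUES (?, ?, ?)",
                         (transfer_id, charge_id, amount_cents - fee_cents))
        return True
    except sqlite3.IntegrityError:
        return False


print(record_payment("ch_789xyz", "cust_123abc", 9999, 290, "tr_333ccc"))   # True
# The second step fails (duplicate transfer_id), so the new charge is rolled back too.
print(record_payment("ch_dup", "cust_123abc", 5000, 145, "tr_333ccc"))      # False
# The foreign key rejects a charge for a customer that does not exist.
print(record_payment("ch_orphan", "cust_missing", 5000, 145, "tr_555eee"))  # False
print(conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0])           # 1
```

The second call is the interesting one: the charge insert succeeds on its own, but because the transfer insert fails inside the same transaction, neither row survives.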

Example Cases for the Database

  • Financial transactions and banking systems
  • E-commerce order management and inventory
  • Booking and reservation systems
  • Enterprise resource planning (ERP) systems
  • Any application requiring strict data consistency

When not to Use the Database: Avoid when you need massive horizontal scaling and cannot adopt a distributed SQL solution, when your schema changes extremely frequently, or when eventual consistency is acceptable. Not ideal for unstructured data or when sub-millisecond latency is required.

Pros | Cons
ACID compliance | Harder to scale horizontally
Strong data consistency | Requires predefined schema
Mature ecosystem and tooling | Can be slower for massive scale
Support for complex queries | Schema migrations can be complex
Data integrity guarantees | Traditional vertical scaling

Graph Databases (Neo4j, Amazon Neptune)

Brief Summary of Database/Technology: Graph databases model data as nodes (entities) with edges (relationships) connecting them. Instead of using join tables to establish relationships, as relational databases do, graph databases store relationships as first-class citizens, with edges pointing directly between nodes. This structure makes traversing complex relationships much more efficient and eliminates the need for expensive join operations when querying connected data.

Real-World Example: LinkedIn Connection Recommendations. LinkedIn uses a graph database to power its “People You May Know” feature and calculate network distances between users. Each LinkedIn member is a node, and connections between members are edges. When LinkedIn suggests people you may know, it traverses your graph: it looks at your first-degree connections (friends), then their connections (friends-of-friends), identifies mutual connections, and ranks suggestions by the number of shared connections, companies, schools, and skills. In a relational database, this would require multiple self-joins on a connections table. Finding third-degree connections, for instance, would need a query joining the connections table to itself three times, and the cost grows steeply with every additional hop and with the size of the network. With LinkedIn’s 900+ million users, these queries would be impractically slow. A graph database such as Neo4j can traverse these relationship paths in milliseconds because edges are stored as direct pointers between nodes. When you view someone’s profile, LinkedIn instantly shows “How you’re connected” (e.g., “You → John Smith → Jane Doe → This Person”), a query that would strain a traditional SQL database but is native to graph operations.

// Neo4j - LinkedIn Connection Recommendations
 
// Nodes
CREATE (u1:User {id: 'user_123', name: 'Alice Johnson', title: 'Software Engineer'})
CREATE (u2:User {id: 'user_456', name: 'Bob Smith', title: 'Product Manager'})
CREATE (u3:User {id: 'user_789', name: 'Carol Wang', title: 'Data Scientist'})
CREATE (u4:User {id: 'user_101', name: 'David Lee', title: 'Engineering Manager'})
CREATE (c1:Company {name: 'TechCorp'})
CREATE (c2:Company {name: 'StartupXYZ'})
CREATE (s1:School {name: 'MIT'})
 
// Relationships
CREATE (u1)-[:CONNECTED_TO {since: '2020-03-15'}]->(u2)
CREATE (u2)-[:CONNECTED_TO {since: '2019-07-22'}]->(u3)
CREATE (u3)-[:CONNECTED_TO {since: '2021-01-10'}]->(u4)
CREATE (u1)-[:WORKS_AT]->(c1)
CREATE (u4)-[:WORKS_AT]->(c1)
CREATE (u1)-[:STUDIED_AT]->(s1)
CREATE (u3)-[:STUDIED_AT]->(s1)
 
// Query: Find 2nd degree connections for Alice with shared context
MATCH (alice:User {id: 'user_123'})-[:CONNECTED_TO]-(friend)-[:CONNECTED_TO]-(suggestion)
WHERE NOT (alice)-[:CONNECTED_TO]-(suggestion) AND alice <> suggestion
WITH alice, suggestion, friend,
     [(alice)-[:WORKS_AT]->(c)<-[:WORKS_AT]-(suggestion) | c.name] as sharedCompanies,
     [(alice)-[:STUDIED_AT]->(s)<-[:STUDIED_AT]-(suggestion) | s.name] as sharedSchools
RETURN suggestion.name, suggestion.title, 
       collect(DISTINCT friend.name) as mutualConnections,
       sharedCompanies, sharedSchools
ORDER BY size(mutualConnections) DESC
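
For readers unfamiliar with Cypher, the core of the query above (friends-of-friends ranked by mutual connections) can be sketched over an adjacency map in Python. This is a simplified illustration, not how Neo4j executes the query: shared companies and schools are omitted, and the members Erin and Frank are hypothetical additions so that the ranking has more than one result.

```python
from collections import Counter, defaultdict

# Undirected connection edges. Alice, Bob, Carol, and David mirror the sample
# nodes above; Erin and Frank are hypothetical extras.
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "David"),
    ("Alice", "Erin"), ("Erin", "Carol"), ("Erin", "Frank"),
]


def build_adjacency(edges):
    """Store each member's neighbours directly on the member (adjacency sets)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def suggest_connections(adjacency, member):
    """Rank friends-of-friends by how many mutual connections they share."""
    direct = adjacency.get(member, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in adjacency[friend]:
            if candidate != member and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by count descending, then by name, so ties are deterministic.
    return sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))


adjacency = build_adjacency(EDGES)
print(suggest_connections(adjacency, "Alice"))  # prints [('Carol', 2), ('Frank', 1)]
```

Each hop is a lookup on the current node’s own neighbour set, so the work depends on how many connections the members involved have rather than on the total number of edges in the network.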

Example Cases for the Database

  • Social network platforms (friend connections, recommendations)
  • Fraud detection systems analyzing transaction patterns
  • Recommendation engines
  • Network and IT infrastructure topology
  • Knowledge graphs and semantic data
  • Supply chain and logistics optimization

When not to Use the Database: Avoid when your data is primarily tabular with few relationships, when you need simple CRUD operations on independent entities, or when the learning curve and specialized tooling are concerns for your team.

Pros | Cons
Efficient for highly connected data | Steeper learning curve
Eliminates complex joins | Less mature ecosystem than RDBMS
Intuitive relationship modeling | Can be overkill for simple data
Fast traversal of relationships | Specialized query language (Cypher)

Search Engines (Elasticsearch, Apache Lucene)

Brief Summary of Database/Technology: Search engines like Elasticsearch (built on Apache Lucene) are similar to document-based databases but include sophisticated text processing and indexing capabilities. Under the hood, they analyze and tokenize text content, creating inverted indexes that enable fast full-text searches. They include ranking algorithms, fuzzy matching for typos, synonyms, and other features that dramatically improve search user experience. However, these capabilities come at the cost of higher computational and storage requirements.

Real-World Example: GitHub Code Search. GitHub uses Elasticsearch to enable developers to search across billions of lines of code in millions of repositories. When you search for a function name like “getUserProfile” across GitHub, Elasticsearch must handle typos (“getUserProifle”) and case variations (“getuserprofile”), and rank results by relevance, prioritizing exact matches in popular repositories over partial matches in obscure projects. A traditional database would rely on exact or pattern-based string matching and could not handle fuzzy searches efficiently. Elasticsearch tokenizes code during indexing, breaking “getUserProfile” into searchable terms and creating an inverted index that maps each term to the documents containing it. It applies relevance scoring based on factors like term frequency, repository stars, and recency. Elasticsearch also powers GitHub’s filter system, allowing searches like “language:Python stars:>1000 created:>2023” that combine full-text search with structured filters. The cost: GitHub runs massive Elasticsearch clusters that consume significant resources, but the same developer experience would be impractical with a standard database. Imagine trying to find a specific function across all of GitHub using SQL LIKE queries.

// Elasticsearch - GitHub Code Search Index
 
// Index Mapping ("code_analyzer" is a custom analyzer that must be defined in the index settings, not shown here)
{
  "mappings": {
    "properties": {
      "repository": {"type": "keyword"},
      "file_path": {"type": "keyword"},
      "code_content": {
        "type": "text",
        "analyzer": "code_analyzer",
        "fields": {
          "exact": {"type": "keyword"}
        }
      },
      "language": {"type": "keyword"},
      "stars": {"type": "integer"},
      "last_updated": {"type": "date"}
    }
  }
}
 
// Sample Document
{
  "_index": "github-code",
  "_id": "repo123_file456",
  "_source": {
    "repository": "facebook/react",
    "file_path": "packages/react/src/ReactHooks.js",
    "code_content": "export function useState(initialState) {\n  const dispatcher = resolveDispatcher();\n  return dispatcher.useState(initialState);\n}",
    "language": "JavaScript",
    "stars": 215000,
    "last_updated": "2025-10-15T08:30:00Z"
  }
}
 
// Search Query
GET /github-code/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "code_content": {
              "query": "getUserProfile",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {"term": {"language": "Python"}},
        {"range": {"stars": {"gte": 1000}}}
      ]
    }
  }
}
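
To make the tokenize-then-index idea concrete, here is a small inverted index in Python. It is a teaching sketch, not Lucene’s implementation: `tokenize` and `InvertedIndex` are hypothetical names, relevance scoring is omitted, and typo tolerance uses `difflib`’s similarity ratio as a stand-in for the edit-distance matching that Elasticsearch’s `fuzziness` option performs.

```python
import difflib
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split on non-alphanumerics and camelCase humps; lowercase every term."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        tokens.append(word.lower())  # whole identifier, e.g. "getuserprofile"
        parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", word)
        if len(parts) > 1:
            tokens.extend(p.lower() for p in parts)  # "get", "user", "profile"
    return tokens


class InvertedIndex:
    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str, fuzzy_cutoff: float = 0.8) -> set:
        """Return ids of documents matching the term exactly or approximately."""
        term = query.lower()
        if term in self._postings:
            return set(self._postings[term])
        # Fuzzy fallback: closest indexed terms by similarity ratio (typo tolerance).
        close = difflib.get_close_matches(term, list(self._postings), n=3,
                                          cutoff=fuzzy_cutoff)
        return set().union(*(self._postings[t] for t in close))


index = InvertedIndex()
index.add("repo_a/users.py", "def getUserProfile(user_id): return load(user_id)")
index.add("repo_b/hooks.js", "export function useState(initialState) {}")
print(index.search("getUserProifle"))  # prints {'repo_a/users.py'}
```

A lookup touches only the posting set for the matching term instead of scanning every document, which is why an inverted index answers term queries far faster than a SQL LIKE scan over the same text.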

Example Cases for the Database

  • Full-text search for e-commerce products
  • Log aggregation and analysis (ELK stack)
  • Application search functionality
  • Document management systems
  • Real-time analytics dashboards

When not to Use the Database: Avoid when simple exact-match queries suffice, when budget constraints are tight (expensive to run at scale), or when you don’t need advanced search features. Not ideal as a primary data store for transactional data.

Pros | Cons
Powerful full-text search | Expensive to run at scale
Ranking and relevance scoring | Complex to configure and tune
Handles typos and fuzzy matching | High resource consumption
Real-time indexing capabilities | Not suitable as primary database
Rich query DSL | Eventual consistency issues

Multimodel Databases (ArangoDB, FaunaDB)

Brief Summary of Database/Technology: Multimodel databases support multiple data models (document, graph, key-value, etc.) within a single database system. They allow developers to work with different data paradigms without managing multiple database technologies. Many modern multimodel databases integrate with GraphQL, enabling developers to describe exactly how they want to retrieve data using a flexible query language that spans different data models.

Real-World Example: E-Commerce Product Catalog with Recommendations. An online retailer could use ArangoDB to handle both product catalogs (document model) and recommendation engines (graph model) in one system. Each product is stored as a flexible document containing different attributes: a laptop has specs like RAM and processor, while a shirt has size and fabric. At the same time, the graph model tracks relationships such as “customers who bought X also bought Y,” “this product is similar to that product,” and “users who viewed this item.” When a customer views a laptop, the system retrieves the product document (document query) and traverses the graph to find related products and personalized recommendations (graph query), all in a single database query written in AQL, ArangoDB’s query language. Without a multimodel database, you’d need MongoDB for products and Neo4j for recommendations, requiring complex data synchronization and multiple queries. The trade-off: while ArangoDB handles both workloads adequately, a specialized graph database like Neo4j might perform relationship traversals faster, and MongoDB might handle larger document collections more efficiently. Multimodel databases excel when you need “good enough” performance across multiple paradigms without operational complexity.

// ArangoDB - E-Commerce Multimodel
 
// Document Collection: Products
{
  "_key": "laptop_001",
  "_id": "products/laptop_001",
  "name": "MacBook Pro 16-inch",
  "category": "Electronics",
  "price": 2499.99,
  "specs": {
    "ram": "32GB",
    "processor": "M3 Pro",
    "storage": "1TB SSD",
    "screen": "16-inch Retina"
  },
  "inStock": true
}
 
{
  "_key": "shirt_042",
  "_id": "products/shirt_042",
  "name": "Cotton T-Shirt",
  "category": "Clothing",
  "price": 24.99,
  "attributes": {
    "size": "M",
    "color": "Navy Blue",
    "fabric": "100% Cotton"
  },
  "inStock": true
}
 
// Graph Collection: Relationships
// Edge: purchased_together
{
  "_from": "products/laptop_001",
  "_to": "products/mouse_015",
  "weight": 145,
  "type": "purchased_together"
}
 
// Edge: similar_product
{
  "_from": "products/laptop_001",
  "_to": "products/laptop_002",
  "weight": 89,
  "type": "similar_product"
}
 
// AQL Query: Get product with recommendations
FOR product IN products
  FILTER product._key == 'laptop_001'
  LET recommendations = (
    FOR v, e IN 1..1 OUTBOUND product purchased_together
      SORT e.weight DESC
      LIMIT 5
      RETURN {product: v.name, score: e.weight}
  )
  RETURN {
    product: product,
    recommendations: recommendations
  }
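
The AQL query above combines a document read with a one-hop graph traversal. The same logic can be sketched in Python over in-memory stand-ins for the two collections; this shows what the query computes, not how ArangoDB executes it. The `mouse_015` and `sleeve_007` documents, the second edge, and the helper name are hypothetical.

```python
# Document collection: product documents keyed by _key (hypothetical sample data).
PRODUCTS = {
    "laptop_001": {"name": "MacBook Pro 16-inch", "category": "Electronics", "price": 2499.99},
    "mouse_015": {"name": "Wireless Mouse", "category": "Electronics", "price": 49.99},
    "sleeve_007": {"name": "Laptop Sleeve", "category": "Accessories", "price": 29.99},
}

# Edge collection: purchased_together edges carrying a co-purchase weight.
PURCHASED_TOGETHER = [
    {"_from": "laptop_001", "_to": "sleeve_007", "weight": 60},
    {"_from": "laptop_001", "_to": "mouse_015", "weight": 145},
]


def product_with_recommendations(key: str, limit: int = 5) -> dict:
    """Combine a document lookup with a one-hop outbound edge traversal."""
    product = PRODUCTS[key]  # document-model read
    edges = sorted(
        (e for e in PURCHASED_TOGETHER if e["_from"] == key),  # graph-model read
        key=lambda e: e["weight"],
        reverse=True,
    )[:limit]
    recommendations = [
        {"product": PRODUCTS[e["_to"]]["name"], "score": e["weight"]} for e in edges
    ]
    return {"product": product, "recommendations": recommendations}


result = product_with_recommendations("laptop_001")
print([r["product"] for r in result["recommendations"]])
# prints ['Wireless Mouse', 'Laptop Sleeve']
```

In a split MongoDB-plus-Neo4j setup, the two reads inside this function would be separate round trips to separate systems that must be kept in sync; the multimodel pitch is that one query covers both.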
 

Example Cases for the Database

  • Applications requiring multiple data paradigms (documents + graphs)
  • Microservices architectures needing flexible data access
  • Projects wanting to reduce database infrastructure complexity
  • Applications with evolving data requirements

When not to Use the Database: Avoid when a single specialized database would suffice and perform better, when you need the absolute best performance for a specific paradigm, or when your team lacks experience with multiple data models.

Pros | Cons
Flexibility across data models | Jack of all trades, master of none
Reduced infrastructure complexity | Potentially lower performance per model
Single query language across models | Steeper learning curve
Good for evolving requirements | Less specialized optimization
