Designing a system that supports millions of users is challenging, and it is a journey of continuous refinement. In this chapter, we will take a system that supports a small set of users and scale it to support millions.

Single Server Setups

The absolute first step in designing a system is a single-server setup: the content or system you are serving is hosted on one server. Let's take a look at the flow of information in this setup:

  1. Users will access websites through domain names, which are resolved by the Domain Name System (DNS). Example: https://sfparks.info
  2. The DNS lookup returns an Internet Protocol (IP) address to the browser or mobile app; this address identifies the web server.
  3. Once the IP address is obtained, Hypertext Transfer Protocol (HTTP) requests are sent directly to your web server.
  4. The web server then returns the HTML pages or JSON response for rendering.
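This flow can be traced with a few lines of Python using only the standard library. A minimal sketch (the domain is just the example from above; any reachable host would do):

```python
import socket
import http.client

# Steps 1-2: DNS resolves the domain name to the server's IP address.
ip_address = socket.gethostbyname("sfparks.info")
print(f"DNS returned: {ip_address}")

# Step 3: an HTTP request is sent directly to the web server.
conn = http.client.HTTPSConnection("sfparks.info")
conn.request("GET", "/")

# Step 4: the server returns HTML pages or a JSON response for rendering.
response = conn.getresponse()
print(response.status, response.getheader("Content-Type"))
conn.close()
```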

This is a simple setup without much business logic or user interaction. To increase the complexity, let's add a database.

Adding a Database

With the growth of the user base, one server is not enough, and we need two: one for web/mobile traffic, the other for the database. Separating web/mobile traffic from database traffic lets us scale each tier independently.

What Database Should We Use?

There are many types of databases to choose from, and the right choice ultimately depends on your use case. In our simple example, we would probably choose a relational database such as MySQL, PostgreSQL, or Oracle. There are also various non-relational databases such as CouchDB, Neo4j, Cassandra, and DynamoDB. The main difference is that non-relational databases usually do not support join operations.

To choose a database, take a look at What Database to Use When?. As a guideline, start with PostgreSQL and adjust based on your requirements.
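As a rough sketch of what the web tier's database access might look like once the database lives on its own server (assuming PostgreSQL with the psycopg2 driver; the hostname, credentials, and users table are hypothetical):

```python
import psycopg2

# The database now lives on its own server, so the web tier connects
# over the network. Hostname and credentials are hypothetical.
conn = psycopg2.connect(
    host="db.internal.example.com",
    dbname="app",
    user="web",
    password="secret",
)

with conn.cursor() as cur:
    # A read query served by the dedicated database server.
    cur.execute("SELECT id, name FROM users WHERE id = %s", (42,))
    print(cur.fetchone())

conn.close()
```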

Scaling - Vertical vs. Horizontal

  • Vertical Scaling means adding more power (CPU, RAM, etc.) to your servers; in other words, beefing up your existing machines.
    • Usually used when traffic is low, because it is simple to accomplish.
    • Has a hard limit, since it is impossible to add unlimited resources to a single server.
  • Horizontal Scaling lets you scale out by adding more servers to your pool of resources.
    • More desirable for large-scale applications because of the hard limits of vertical scaling.

Looking at our current example, we could vertically scale our single server. As we gain more users, however, we will have to scale horizontally and introduce a load balancer.

Load Balancing

A load balancer evenly distributes incoming traffic among web servers that are defined in a load-balanced set. Here’s our new flow of information:

  1. Users will access websites through domain names, which are resolved by the Domain Name System (DNS). Example: https://sfparks.info
  2. The DNS lookup returns an Internet Protocol (IP) address to the browser or mobile app; this time it is the load balancer's public IP address.
  3. Once the IP address is obtained, HTTP requests go to the load balancer, which forwards each one over a private IP address to a web server that has capacity.
  4. The web server then returns the HTML pages or JSON response for rendering.
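To make the distribution step concrete, here is a minimal round-robin sketch. It is illustrative only: real load balancers such as NGINX, HAProxy, or AWS ELB also handle health checks and connection draining, and the private IP addresses below are hypothetical.

```python
import itertools

# Private IP addresses of the web servers in the load-balanced set
# (hypothetical values).
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round-robin: hand servers out in a repeating cycle so traffic is
# spread evenly across the pool.
_pool = itertools.cycle(SERVERS)

def pick_server() -> str:
    """Return the private IP of the next server to receive a request."""
    return next(_pool)

# Each incoming request is forwarded to a different server in turn.
for request_id in range(5):
    print(f"request {request_id} -> {pick_server()}")
```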

Database Replication

As the user base continues to grow, we want to implement database replication to sustain the traffic coming into our databases and to provide redundancy in case a database goes down. Database replication usually follows a master/slave pattern:

  • The master database generally supports only write operations (INSERT, UPDATE, DELETE).
  • The slave databases get copies of the data from the master and support only read operations.

There are inherent advantages to having database replication within your environment:

  • Better performance, since read queries are spread across the slave databases and can be processed in parallel with writes on the master.
  • Better reliability, since the data is replicated across multiple servers; losing one server no longer means losing the data.
  • High availability, since your product stays in operation even if one database goes offline.

Recovery Situations:

  • If a slave DB goes offline, a new slave DB is brought into the pool to take its place; in the meantime, the master may temporarily serve read requests as well.
  • If the master DB goes offline, a slave DB is promoted to master, and a new slave takes the place of the promoted one.
    • Recovery scripts then have to be run to bring the promoted slave fully up to date, since there may be a gap between its data and the old master's (a rough sketch of this promotion logic follows the list).
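A rough sketch of that promotion logic, heavily simplified: a real failover system must also deal with replication lag, split-brain, and fencing, and every name below is hypothetical.

```python
def handle_master_failure(slaves: list[str]) -> tuple[str, list[str]]:
    """Promote one slave to master and rebuild the pool.

    Naive sketch: a production system would promote the slave with the
    most up-to-date replication position and run recovery scripts to
    close any data gap with the old master.
    """
    if not slaves:
        raise RuntimeError("no slave available for promotion")
    new_master = slaves[0]   # naive choice; real systems compare log positions
    remaining = slaves[1:]
    remaining.append(provision_new_slave(new_master))  # backfill the pool
    return new_master, remaining

def provision_new_slave(master: str) -> str:
    """Stand up a replacement slave replicating from the given master (stub)."""
    return f"replica-of-{master}"

# Usage with hypothetical slave names:
master, slaves = handle_master_failure(["db-slave-1", "db-slave-2"])
print(master, slaves)  # db-slave-1 ['db-slave-2', 'replica-of-db-slave-1']
```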

New Flow of Information:

  1. A user gets the IP address of the load balancer from DNS.
  2. A user connects to the load balancer using this IP address.
  3. The HTTP request is routed to a server in the pool of resources.
  4. A web server reads requested user data from a slave database.
  5. A web server routes any data-modifying operations to the master database; this includes insert, update, and delete operations (see the routing sketch below).
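Steps 4 and 5 amount to routing each query by its type. A minimal sketch of that routing decision (the connection endpoints and the query classification are simplified assumptions; real drivers and proxies do this far more robustly):

```python
import random

WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

def route_query(sql: str, master: str, slaves: list[str]) -> str:
    """Send data-modifying statements to the master, spread reads across slaves."""
    verb = sql.lstrip().split()[0].upper()
    if verb in WRITE_VERBS:
        return master                 # all writes go to the single master
    return random.choice(slaves)      # any slave copy can serve a read

# Usage with hypothetical connection endpoints:
master_db = "master-db:5432"
slave_dbs = ["slave-db-1:5432", "slave-db-2:5432"]
print(route_query("SELECT * FROM parks", master_db, slave_dbs))           # a slave
print(route_query("INSERT INTO parks VALUES (1)", master_db, slave_dbs))  # the master
```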

Improving Client-Side Operations

Now that the server-side operations are reasonably well handled, it is time to improve client-side response and load times. We can start by adding a cache layer and by shifting static content to a content delivery network (CDN).

Caches

A cache is a temporary storage area that keeps the results of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly.
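The most common way to use such a cache is the cache-aside pattern: check the cache first, and on a miss, read from the database and store the result for next time. A minimal sketch, with an in-memory dict standing in for a real cache server like Redis or Memcached and a hypothetical get_user_from_db helper:

```python
import time

CACHE: dict = {}      # stands in for a cache server such as Redis or Memcached
TTL_SECONDS = 60      # how long an entry stays fresh

def get_user(user_id: int) -> dict:
    """Cache-aside read: serve from the cache, fall back to the DB on a miss."""
    entry = CACHE.get(user_id)
    if entry and time.time() - entry["cached_at"] < TTL_SECONDS:
        return entry["value"]                  # cache hit: fast path

    value = get_user_from_db(user_id)          # cache miss: expensive DB read
    CACHE[user_id] = {"value": value, "cached_at": time.time()}
    return value

def get_user_from_db(user_id: int) -> dict:
    """Hypothetical stand-in for the real (slow) database query."""
    time.sleep(0.1)  # simulate query latency
    return {"id": user_id, "name": f"user-{user_id}"}

print(get_user(42))  # slow: hits the database
print(get_user(42))  # fast: served from the cache
```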

