How to Stay Online, Scale Up, and Keep Latency Down
Launching a video game may be among the most challenging operational tasks today. Demand can be hard to predict.
Launches are binary events that defy load testing. Setting up the right cloud infrastructure takes foresight and planning, and while over-provisioning wastes money, under-provisioning is probably even worse. Expect upset customers screaming at you on Twitter, Discord, and Reddit when a game starts glitching under the massive weight of millions of users.
Here’s a quickstart guide to best practices for planning your cloud infrastructure to support a successful game launch.
Plan and diagram your game application architecture
Sketch what your application will look like, including process flows, service interactions, and endpoints.
- List each cloud platform you plan to use and the data centers on that platform. If you are using a serverless or a Backend-as-a-Service platform (e.g. Firebase), make sure to note those.
- List each software package you intend to use along with the version number.
- List every third-party service and related API you will be connecting with, along with links. A key consideration is your content delivery network (more on that later).
- Also include any software appliance components, such as application delivery controllers (ADCs) or load balancers (LBs), that you expect to leverage as part of your application delivery and performance process.
- Map your security components, including your web application firewall, your distributed denial-of-service (DDoS) protection, and other services to protect your application.
- Finally, map your monitoring and orchestration components to ensure that you can visualize the load (RPS), capacity (utilization), and performance (latency) of all critical systems and initiate the infrastructure expansion (scale up or scale out) ahead of any performance problems.
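One way to keep this inventory checkable is to store it as data and scan it for gaps. The following is a minimal sketch, not a prescribed format: the `Component` fields, the kind names, and the example entries are all hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass

# Hypothetical component kinds mirroring the checklist above.
REQUIRED_KINDS = {
    "cloud", "software", "third_party",
    "load_balancer", "security", "monitoring",
}

@dataclass
class Component:
    name: str
    kind: str          # one of REQUIRED_KINDS
    version: str = ""  # expected for software packages
    link: str = ""     # expected for third-party services and APIs

def find_gaps(components):
    """Return a list of human-readable gaps in the inventory."""
    gaps = []
    present = {c.kind for c in components}
    # Report every kind from the checklist that has no entry at all.
    for kind in sorted(REQUIRED_KINDS - present):
        gaps.append(f"no component of kind '{kind}' listed")
    # Report entries that lack the detail the checklist asks for.
    for c in components:
        if c.kind == "software" and not c.version:
            gaps.append(f"{c.name}: missing version number")
        if c.kind == "third_party" and not c.link:
            gaps.append(f"{c.name}: missing link")
    return gaps

# Example inventory with made-up names.
inventory = [
    Component("example-cloud eu-west", "cloud"),
    Component("example-db", "software"),
    Component("example-cdn", "third_party", link="https://example.com/status"),
]

for gap in find_gaps(inventory):
    print(gap)
```

Running a check like this before each planning review makes it harder for a missing version number or an unlisted security layer to slip through.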
Create a backup and recovery playbook for your game
Make a playbook so you can recover quickly if things go bad on launch day.
- Determine where your primary backup will reside (in one or multiple clouds).
- Create a planned recovery sequence to follow in case of an outage.
- Create the scripts to enable your DevOps platform to redeploy quickly (within minutes).
- Determine which of the third-party services will need to be restarted in case of failure.
- Have a list of contact numbers and emails of all your third-party providers so that you can reach them quickly in case of a crash.
- Run a test of your backup plan in a staging environment that replicates your production architecture in part or in whole.
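A planned recovery sequence can be written down as an ordered list of steps and rehearsed against a time budget. The sketch below assumes each step is a Python callable that returns `True` on success; the step names, the `run_playbook` helper, and the five-minute budget are illustrative assumptions, and real steps would call your own redeploy scripts.

```python
import time

def run_playbook(steps, budget_seconds):
    """Run (name, action) steps in order and stop at the first failure.

    Returns (completed_names, failed_name_or_None, within_budget).
    """
    start = time.monotonic()
    completed = []
    for name, action in steps:
        if not action():
            within_budget = (time.monotonic() - start) <= budget_seconds
            return completed, name, within_budget
        completed.append(name)
    within_budget = (time.monotonic() - start) <= budget_seconds
    return completed, None, within_budget

# Placeholder steps; each lambda stands in for a real script.
steps = [
    ("restore primary backup", lambda: True),
    ("redeploy game servers", lambda: True),
    ("restart third-party integrations", lambda: True),
]

completed, failed, within_budget = run_playbook(steps, budget_seconds=300)
print(completed, failed, within_budget)
```

Rehearsing the sequence in staging with a stopwatch built in shows whether "within minutes" is realistic and which step fails first.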
Forecast the capacity requirements of your application (including scaling events)
Without a capacity forecast, you are in the dark about how and what to provision to properly balance performance and cost.
- Estimate the number of users you expect to play your game in each geographic region. To be safe, use double that number as your core capacity requirement.
- Have some users beta-test your game to help you measure the load requirements per user.
- Tune asset delivery and asset packaging to reduce latency and the application load for players. This may include using better compression or batching downloads of assets. Deploy the update and measure the revised load per user.
- Add ADCs or LBs to your test setup to measure how much they improve or accelerate performance.
- Apply those load requirements to calculate the amount of CPU, bandwidth, and backup you will require. Where appropriate, make sure you plan to distribute those geographically to match demand.
- Run scaling tests on your infrastructure to understand how long it takes to add capacity to, or scale, each element of your game architecture. This may influence key choices, especially for LBs and ADCs, which can differ widely in the time they need to scale out or scale up.
- Create proactive programmatic rules and policies in your monitoring and orchestration platforms that trigger scale-out or scale-up actions ahead of capacity or performance problems. Make sure to consider whether your application will be hosted by one cloud or span multiple clouds. This is a critical factor in monitoring and scaling because it may add complexity to parts of your deployment that are not natively designed for multi-cloud application architectures.
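The arithmetic behind these steps can be sketched in a few lines. This is an illustration under stated assumptions, not a sizing tool: the user counts, the per-user and per-server request rates, and the 70% target utilization are made-up numbers, and "users" is taken to mean peak concurrent users. Only the doubling factor comes from the guidance above.

```python
import math

def servers_needed(expected_users, rps_per_user, rps_per_server,
                   safety_factor=2.0, target_utilization=0.7):
    """Servers required for one region.

    safety_factor=2.0 doubles the user estimate, as suggested above.
    target_utilization is an assumed headroom setting, not a standard.
    """
    peak_rps = expected_users * safety_factor * rps_per_user
    return math.ceil(peak_rps / (rps_per_server * target_utilization))

def should_scale_out(current_rps, growth_rps_per_min, capacity_rps,
                     scale_out_minutes, target_utilization=0.7):
    """Trigger early: project load to the moment new capacity would be ready."""
    projected = current_rps + growth_rps_per_min * scale_out_minutes
    return projected > capacity_rps * target_utilization

# Hypothetical forecast per region (peak concurrent users).
forecast = {"eu": 50_000, "na": 80_000}
for region, users in forecast.items():
    count = servers_needed(users, rps_per_user=0.5, rps_per_server=2000)
    print(region, count)

# With a measured 10-minute scale-out time, act before the limit is reached.
print(should_scale_out(1000, 100, 2800, scale_out_minutes=10))
```

The second function shows why the measured scale-out time matters: the slower a component scales, the earlier the rule has to fire.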
Final pre-flight checklist
The launch is imminent. Here’s a quick list of final checks.
- Validate that all your monitoring and game analytics systems are working as expected. You want to make sure you measure all system performance parameters but also capture all user interactions to fine-tune gameplay.
- Check all your images and assets to ensure that everything is compressed and in the format you need. You’d be amazed at how one or two improperly formatted elements can slow performance.
- Test your recovery plan one last time to practice and to make sure that it works as you anticipate.
- Load test your game again just as a final check. This includes load testing to induce scaling events.
- If you are planning to run your game on multiple CDNs or multiple clouds, make sure that your orchestration and management layer is configured to properly prioritize, sequence, or round-robin between the different providers.
- Tell your contacts at the big cloud companies and other major services you plan to use when the launch date is, and ask them to be on alert in case you need help.
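For the multi-CDN or multi-cloud check, it helps to be explicit about which selection policy the management layer is supposed to apply. The sketch below contrasts two simple policies, priority failover and a basic weighted round robin; the provider names, weights, and function names are hypothetical, and a real orchestration layer would have its own configuration for this.

```python
import itertools

def pick_by_priority(providers, healthy):
    """Return the first provider in priority order that is healthy, else None."""
    for name in providers:
        if name in healthy:
            return name
    return None

def weighted_round_robin(weights):
    """Cycle through providers in proportion to their integer weights.

    This simple variant emits each provider's turns back to back.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)

# Priority failover: cdn-a is preferred, but only cdn-b is healthy.
print(pick_by_priority(["cdn-a", "cdn-b"], healthy={"cdn-b"}))

# Weighted rotation: cdn-a gets two turns for every one of cdn-b.
rotation = weighted_round_robin({"cdn-a": 2, "cdn-b": 1})
print(list(itertools.islice(rotation, 6)))
```

Confirming before launch which of these behaviors is actually configured avoids surprises when one provider degrades under load.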
Launching a game is always a stressful experience. Things never go quite as planned. But by planning out all the important elements of your infrastructure and playbook, you can minimize the negative impact of unforeseen events and make sure the gamers who flock to your launch have an excellent experience: no blocked access to user accounts or game servers, lightning-fast load times, and no lag during gameplay.