Designing for Distributed Systems: Lessons from Digital Healthcare

by Bethany Hill on Tips and Tricks • July 30, 2020

Our first blog of this series outlined the various challenges posed to DevOps and IT leaders resulting from the COVID-19 pandemic. In this blog we take a closer look at designing distributed systems for digital healthcare.

If you run DevOps or infrastructure for a company in the healthcare space, your world likely changed forever in March 2020. The COVID-19 pandemic immediately disrupted the normal delivery of health services and shifted a massive burden of service delivery into the virtual world. 

In many parts of the world, hospitals and general practices were closed to anything but COVID-19 cases, antenatal and birth care, or the most serious health emergencies. Overnight, millions of patients in locked-down zones who had never used telemedicine began to access their doctor and other medical services online rather than visit in person.

For businesses and organizations involved in telemedicine, this Black Swan event forced massive scale-ups of cloud infrastructure – and a rapid lesson in the importance of building applications that can scale quickly and take advantage of distributed computing.

Pandemic Taxes Health-Startup Infrastructure

Nearly every participant in the online health space was impacted by the rapid and wholesale shift in consumer behavior towards telemedicine. This included direct-to-consumer urgent care startups like PlushCare, drug price comparison engine GoodRx, tech-forward national healthcare practices like One Medical, and major publicly traded healthcare providers such as Teladoc. Startups for reproductive health and mental health services also experienced a spike.

In fact, the spike was so large that it might be unprecedented in modern digital commerce. According to FAIR Health, a non-profit organization that tracks anonymized health insurance claims data in the United States, telemedicine insurance claims rose by 4,347% in March 2020 compared to the same month in 2019. In April 2020, the increase was a staggering 8,336% compared to April 2019.
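
To put those percentages in capacity-planning terms, it helps to convert them into volume multipliers. The short sketch below does only that arithmetic; the function name is ours, and it assumes nothing beyond the two figures cited above. A 4,347% increase works out to roughly 44 times the prior-year claim volume, and 8,336% to roughly 84 times.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a volume multiplier.

    A 100% increase means volume doubled, so the multiplier is 1 + pct / 100.
    """
    return 1 + percent_increase / 100


# Figures cited above (year-over-year increase in telemedicine claims).
for month, pct in [("March 2020", 4347), ("April 2020", 8336)]:
    print(f"{month}: {growth_multiplier(pct):.2f}x the prior-year claim volume")
```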

Not surprisingly, the surge initially created infrastructure capacity and application delivery problems for many telemedicine service providers. As reported in March by STAT, the respected health publication, “Telehealth services are sagging under the weight of an unprecedented surge in patients as hospitals scramble to shift routine care online in response to the coronavirus pandemic. The crisis is stressing major telehealth providers’ technical infrastructure...”

Things could get worse. Even though telemedicine usage soared, it still accounts for less than 10% of all doctor visits. There is plenty of room in the adoption curve for a big second spike should COVID-19 or the next pandemic cause consumers to take their healthcare online.

Designing a Distributed Infrastructure for the Next Pandemic

To pandemic-proof an application, DevOps teams ought to pursue a distributed infrastructure to improve availability, scalability, performance and resilience. 

For the purpose of this article, we will focus on scenarios in the United States. For the most part, due to different regulations, laws and insurance situations, healthcare startups do not provide service across international borders. However, for other applications, internationally distributed infrastructure might be a viable option.

In addition, COVID-19 mainly induced a spike in patient-facing digital health services. With this in mind, here are some specific considerations for application delivery and architecture design:

  • Geographically Aware Load Balancing: This helps ensure users receive an optimal experience by routing them to infrastructure that is geographically close to them. Granted, geographic distribution is a basic requirement, even for applications delivered within the United States. That said, building geo-awareness to enable smart scaling-up and scaling-down is a bit more complicated, because demand shifts between regions over time. As we saw over the course of the pandemic, some parts of the United States, such as Texas and Arizona, remained less affected in March but became virus centers in June. The opposite was true in New York and Massachusetts. Under these circumstances, it’s a benefit to have infrastructure that is geographically aware and can automatically adjust application delivery resources.
  • Graceful Multi-Site Failover: This means your application will automatically route users to other datacenters in the event of an outage. Chances are, if you are running a healthcare application, you are running HA already and have plans for failover. The real questions you want to consider are: how graceful will that failover be, and will your users experience any outages or service interruptions during failover? Because a failover shifts one site's traffic onto the remaining sites, it should also work in conjunction with your scale-up capabilities so those sites can absorb the extra load.
  • Timezone-Aware Automated Scaling: This means you can elastically scale up or down in a matter of minutes to support a variable number of users. Rapid scale-out is not only useful during emergencies or unexpected bursts. Scale-out on planned intervals can save money and shift capacity to match needs. For example, doctors’ offices tend to open at specific hours of the day – this means capacity requirements are likely to go down after working hours are over. This reduction would follow the sun across time zones, so a planned, staggered scale-up on business days followed by a staggered scale-down at the end of the day might make sense. With slow scaling capabilities, healthcare companies might end up paying for an extra hour per day of unneeded capacity, so rapid scaling is essential for controlling costs. 
  • Simultaneous Multi-Cloud Capability: Locating all application delivery in a single cloud introduces a single point of failure, which is an unacceptable risk for critical services like healthcare. Putting everything in one cloud also introduces a business risk: price increases from the cloud provider could blow up your cost structure. That said, maintaining two sets of identical application stacks running simultaneously in two different clouds is expensive and complicated. Also, passing state from one cloud to the other can be complex. For these reasons, it is critical to use containers and other technologies that allow for more cloud-agnostic deployments and loosely coupled systems with seamless shifting of compute resources between multiple clouds.
  • Regulatory Compliance (e.g., HIPAA): This goes without saying for healthcare startups. Every component of a health company's tech stack that touches personal health data must comply with this strict U.S. law. Fines for violations can be stiff, running into the seven figures.
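
To make the first two considerations above more concrete, here is a minimal Python sketch of geographically aware routing with a simple failover rule. It is an illustration under stated assumptions, not a production load balancer: the region names, coordinates, the `Region` class, and the 85% load threshold are all hypothetical, and a real deployment would normally rely on a DNS-based or anycast traffic manager rather than application code.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical deployment site; all values here are illustrative."""
    name: str
    lat: float
    lon: float
    healthy: bool = True
    load: float = 0.0  # fraction of capacity in use, 0.0 to 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_region(user_lat, user_lon, regions, max_load=0.85):
    """Route to the nearest healthy region that still has headroom.

    If every healthy region is above max_load, fall back to the nearest
    healthy one rather than rejecting the user outright.
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    candidates = [r for r in healthy if r.load < max_load] or healthy
    return min(
        candidates,
        key=lambda r: haversine_km(user_lat, user_lon, r.lat, r.lon),
    )


# Hypothetical sites, roughly Virginia, Texas, and Oregon.
REGIONS = [
    Region("us-east", 38.9, -77.0),
    Region("us-central", 32.8, -96.8),
    Region("us-west", 45.5, -122.7),
]
```

With these assumed sites, a patient in Houston would be routed to `us-central`; if that site is marked unhealthy or runs above the load threshold, the same call returns the next-closest site instead. The load threshold is where failover and scaling meet: a site that absorbs failover traffic will cross it sooner, which is the signal to scale up.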

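The staggered, follow-the-sun scaling described in the timezone-aware scaling item above can be sketched as a small scheduling function. This is a simplified illustration: the region names, replica counts, and office hours are hypothetical, and the fixed UTC offsets deliberately ignore daylight saving time. A real scheduler would use a proper time zone database and would feed the result to whatever autoscaling mechanism the platform provides.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical regions with fixed standard-time UTC offsets.
# Assumption: daylight saving time is ignored to keep the sketch small.
REGION_UTC_OFFSET_HOURS = {"us-east": -5, "us-central": -6, "us-west": -8}


def desired_replicas(offset_hours, now_utc, peak=20, off_peak=4,
                     open_hour=8, close_hour=18):
    """Return the replica count for one region at a given moment.

    now_utc must be a timezone-aware datetime. Capacity is at its peak
    value only during local office hours on business days (Mon-Fri).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=offset_hours)))
    is_business_day = local.weekday() < 5
    in_office_hours = open_hour <= local.hour < close_hour
    return peak if (is_business_day and in_office_hours) else off_peak


def scaling_plan(now_utc):
    """Compute the desired replica count for every region."""
    return {
        region: desired_replicas(offset, now_utc)
        for region, offset in REGION_UTC_OFFSET_HOURS.items()
    }


if __name__ == "__main__":
    print(scaling_plan(datetime.now(timezone.utc)))
```

Evaluated every few minutes, a plan like this scales the East Coast up first in the morning and the West Coast down last in the evening, so no region carries peak capacity outside its own office hours.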

COVID-19 has put digital health companies on high alert that their infrastructure will likely be subject to big bursts in usage. Designing systems to accommodate rapid shifts in demand, both in terms of capacity and geography, is critical to making sure that digital health consumers receive top-grade experiences, regardless of what challenges may be looming behind the home screen of the app.