Our first blog of this series outlined the various challenges posed to DevOps and IT leaders resulting from the COVID-19 pandemic. In this blog we take a closer look at designing application infrastructure for intelligence with AI and automation.
When COVID-19 struck, one Snapt client providing live streaming to many millions of customers was faced with two challenges: how to leverage agile DevOps methodologies to scale up and scale down more intelligently, and how to maintain both resilience and security in the face of huge spikes in demand. Scaling up more instances in the cloud delivers more capacity, but it does so in a crude and potentially expensive manner.
Security poses similar problems: what works at lower traffic volumes might not work at higher volumes or in a shifting environment. Many savvy cyber-attackers are targeting industries experiencing big bumps in demand precisely because they can exploit the chaos associated with scaling. The rapid growth in users also creates a richer target environment.
Hiring more DevOps engineers or boosting the Site Reliability Team is not an option on short notice – and might be overkill. Agile DevOps means being nimble, after all. Staff turnover creates even more risk because tribal knowledge walking out the door during this critical and less forgiving period can create larger disruptions.
To address these challenges effectively, DevOps and IT leaders need to leverage intelligence rather than blunt instruments.
ML and AI Now Viable for ADCs and Application Intelligence
Machine Learning (ML) and Artificial Intelligence (AI) are increasingly included in agile DevOps methodologies and agile software development. ML and AI are addressing problems of resilience, security, scale and performance in today’s modern IT infrastructure. We see a growing use of ML and AI in a variety of disciplines. Development teams are using AI in test automation for test creation, regression testing and code debugging. AI and automation are accelerating the Continuous Integration (CI) / Continuous Delivery (CD) process by automating cloud and virtual environment provisioning, sniffing out bad and unauthorized code, and enforcing CI/CD deployment rules, milestones and policies.
For delivering applications, ML and AI can play an important role in managing and automating distributed containers and application delivery controllers (ADCs) on multiple clouds. While this might seem daunting, modern ADCs are baking intelligence layers into their core functionality and features. Even better, networked ADCs can generate ML insights and guidance based on observations and data from many more deployed containers. This capacity for collective insight makes possible more proactive risk and capacity management, improved security, and automatic resiliency and agility that previously required human hands or one-off scripts.
ML and AI 101: How It Works and Why More Data is Better
Broadly speaking, ML uses algorithms to identify patterns in very large datasets. ML algorithms can be designed to ingest data continuously to automatically “train” the algorithm on verified data. For example, an algorithm intended to identify different breeds of dogs in a video will be able to identify variation among dogs with greater accuracy as it analyzes more videos containing dogs of different breeds. If the dog recognition algorithm analyzes only videos with Labradors, it will only be able to identify dogs that look like Labradors. However, as its “learning logic” is applied to an expanded array of videos with many different breeds of dogs, it will expand its general understanding of dogs to include the traits of all the breeds it is exposed to. The algorithm will change its behavior. It will identify collies as dogs, chihuahuas as dogs, and bullmastiffs as dogs. And if it sees a cat, it will not recognize the cat as a dog.
By seeing more and more images of dogs, the algorithm will get better at recognizing each breed of dog. Initially, most algorithms require some human training – called reinforcement. A person labels images of each breed, for example. Over time, ML algorithms can truly teach themselves and build on past learnings. That said, edge cases can stump the algorithms. An image of a hairless dog might be unrecognizable to an algorithm trained to recognize dogs with hair. This is why human oversight at a high level remains important for algorithmic application delivery.
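The effect of broader training data can be illustrated with a toy sketch. Here, "images" are reduced to made-up (size, coat length) feature pairs and a simple nearest-centroid check stands in for a real ML model; all numbers and thresholds are invented for illustration.

```python
# Toy sketch: a model trained on one breed rejects unfamiliar dogs,
# while one trained on many breeds generalizes. All values are invented.

def centroid(samples):
    """Average feature vector of a list of (size, coat_length) samples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(2))

def is_dog(example, dog_centroid, threshold=3.0):
    """Classify as 'dog' if close enough to the learned dog centroid."""
    dist = sum((a - b) ** 2 for a, b in zip(example, dog_centroid)) ** 0.5
    return dist <= threshold

# Trained only on Labrador-like samples ...
labradors = [(6.0, 2.0), (6.2, 2.1), (5.9, 1.9)]
narrow = centroid(labradors)

chihuahua = (1.5, 1.0)
print(is_dog(chihuahua, narrow))  # False: too far from the Labrador centroid

# ... versus trained on many breeds: the centroid (and tolerance) now
# reflect a broader notion of "dog", so more breeds are recognized.
many_breeds = labradors + [(1.5, 1.0), (9.0, 3.0), (4.0, 6.0)]
broad = centroid(many_breeds)
print(is_dog(chihuahua, broad, threshold=5.0))  # True with the wider model
```

Real systems learn far richer representations, but the principle is the same: more diverse verified data shifts what the model considers "normal."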
The rates of false positives and false negatives tend to decline sharply with exposure to more data. This is why ML that ingests data from multiple ADCs – or even from many thousands of ADCs sharing information in a collaborative network – can achieve near real-time performance with sub-second latencies for responses to threats, demand surges and more.
AI takes the learnings from ML algorithms and applies additional logic and intelligence to drive decisions. For example, if an ML algorithm detects that the same IP address has tried a SQL injection attack on 10 different ADC hosts and recognizes a pattern, that information will be used in an AI-powered policy engine to automatically define a rule stating “If IP address X requests access to a host, deny access automatically and block the request.” This is a crude example but both ML and AI are particularly effective at automating things humans are very bad at. Humans are terrible at identifying patterns or noticing changes in large seas of data. Humans are terrible at creating and managing policies and rules that must frequently change and involve calculating complex dependencies. In application delivery and agile DevOps methodologies, both of these capabilities can alleviate grunt work, reduce error, reduce security risk, save money, and improve customer experience.
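The SQL injection scenario above can be sketched in a few lines. This is a hedged illustration, not a real policy engine: the event format, the attack label, and the 10-host threshold are assumptions made up for the example.

```python
# Sketch: turning a cross-host attack pattern into an automatic block rule.
# Event shapes, labels, and the threshold are illustrative assumptions.

SQLI_HOST_THRESHOLD = 10  # hosts attacked by one IP before auto-blocking

def derive_block_rules(events, threshold=SQLI_HOST_THRESHOLD):
    """events: (source_ip, target_host, attack_type) tuples reported by ADCs.
    Return the IPs seen attempting SQL injection on >= threshold hosts."""
    hosts_hit = {}
    for ip, host, attack in events:
        if attack == "sql_injection":
            hosts_hit.setdefault(ip, set()).add(host)
    return {ip for ip, hosts in hosts_hit.items() if len(hosts) >= threshold}

# One IP probes ten different ADC hosts; another probes only one.
events = [("203.0.113.7", f"adc-{n}", "sql_injection") for n in range(10)]
events.append(("198.51.100.2", "adc-1", "sql_injection"))

blocked = derive_block_rules(events)
print(blocked)  # {'203.0.113.7'} - the single-host source is not blocked
```

In practice the ML side would detect the pattern statistically rather than by a hard count, and the AI policy engine would weigh confidence, context and existing rules before blocking.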
Real Applications for ML and AI In Application Delivery
Going back to our live streaming customer, let’s consider how ML and AI can be deployed for improved application delivery. Here are some real achievable ways that application delivery can be improved through the use of intelligence and automation via ML and AI.
Keep in mind that these examples are only possible if a DevOps or SecOps team has ADCs that share intelligence and data with very low latency (on the order of one to five seconds). For this to be possible, you need a true control plane/data plane application and network architecture, with containerized ADCs acting as data planes and sharing a networked control plane where user and application state is captured and preserved, machine learning operates, and policies are propagated to the ADC nodes on the network. These examples are broadly applicable beyond live streaming.
Self-Driving ADC Network Management
An ML algorithm can study how network engineers manage their resources: routing rules, bandwidth metering, QoS policies and SLAs. From these learnings, the ML can feed an AI system insights on how to direct application layer traffic based on past patterns. For example, if ML noticed traffic increasing on certain days of the week and decreasing on others, it could recommend rules to the AI deployment system that increase capacity or allocated bandwidth for applications on the up days and decrease them on the down days.
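The day-of-week recommendation could look something like the sketch below. The observed peaks, headroom factor and baseline are all invented for illustration; a real system would learn these rather than hard-code them.

```python
# Sketch: emitting per-weekday capacity recommendations from observed
# traffic peaks. All numbers here are illustrative assumptions.
from statistics import mean

def recommend_capacity(history, baseline, headroom=1.2):
    """history: {weekday: [observed peak requests/sec, ...]}.
    Recommend capacity as mean observed peak * headroom,
    never dropping below the configured baseline."""
    return {
        day: max(baseline, mean(peaks) * headroom)
        for day, peaks in history.items()
    }

history = {
    "mon": [900, 950, 1000],  # consistently busy start of week
    "sat": [300, 280, 320],   # quiet weekends
}
plan = recommend_capacity(history, baseline=500)
print(plan)  # mon scales up (~1140), sat stays at the 500 baseline
```

An AI deployment system would then translate these targets into concrete scale-up and scale-down actions on the relevant days.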
Networked and Geographically-Aware ADC Load Balancing
Global Server Load Balancing is a well-known technology for high availability and consistent application performance. With an ML engine studying patterns in the background, the AI could surface recurring progressions in load that track traffic volume, geography, agent type and other data points, and prepare for them in advance. This allows an AI agent to warm up additional load balancing capacity in anticipation of the coming shift in demand and geography.
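A minimal sketch of the warm-up idea, assuming the ML engine has learned that load peaks progress through regions in a fixed "follow the sun" order. The region names, ordering and scaling factor are assumptions for illustration only.

```python
# Sketch: pre-warming capacity in the region predicted to peak next.
# Region order and numbers are illustrative, not a real learned model.
FOLLOW_THE_SUN = ["eu-west", "us-east", "us-west", "ap-southeast"]

def next_region_to_warm(current_peak_region):
    """Assume load progresses along a learned regional sequence."""
    i = FOLLOW_THE_SUN.index(current_peak_region)
    return FOLLOW_THE_SUN[(i + 1) % len(FOLLOW_THE_SUN)]

def warmup_plan(current_peak_region, current_capacity, factor=1.5):
    """Return (region, target_capacity) for the anticipated next hotspot."""
    region = next_region_to_warm(current_peak_region)
    return region, int(current_capacity * factor)

print(warmup_plan("eu-west", 200))  # ('us-east', 300)
```

A production system would learn the progression and the scaling factor from historical GSLB telemetry instead of a hard-coded list.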
Intelligent Web Application Firewalls and DDoS Mitigation
Most ADCs today have in-line Web Application Firewalls (WAFs) and Distributed Denial of Service (DDoS) mitigation to block attacks. Legacy ADCs not designed for cloud-native deployments have big footprints and cannot be deployed in wide distributions of hundreds of networked nodes. Modern distributed ADCs can collect attack information from hundreds or thousands of nodes, or even share information across organizations.
For example, if a particular host in a cloud data center in China attempts to install ransomware on five different endpoints using an attack signature common to the prominent Chinese Advanced Persistent Threat group APT40, the pattern of that attack can be instantly broadcast to all networked ADCs. These ADCs will be able to recognize and block the attack before it happens, or they can raise their confidence rating requirements on traffic from IP addresses at that particular cloud data center. Or if one ADC node identifies a packet storm as a DDoS attempt, it can alert all other ADC nodes to warn their upstream ISPs or CDNs to block that traffic.
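The broadcast mechanism can be sketched as a tiny in-memory model. The class and field names are invented for illustration; a real implementation would propagate signatures through the networked control plane rather than direct peer calls.

```python
# Sketch: one node detecting an attack and broadcasting the signature
# to its peers. Names and the in-memory "network" are illustrative.

class ADCNode:
    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.blocked_signatures = set()
        network.append(self)

    def observe_attack(self, signature):
        """Local detection: block here, then alert every peer."""
        self.blocked_signatures.add(signature)
        for peer in self.network:
            if peer is not self:
                peer.receive_alert(signature)

    def receive_alert(self, signature):
        self.blocked_signatures.add(signature)

    def allows(self, signature):
        return signature not in self.blocked_signatures

network = []
nodes = [ADCNode(f"adc-{i}", network) for i in range(3)]
nodes[0].observe_attack("apt40-ransomware-sig")
# Every node now blocks the signature, not just the one that saw it:
print(all(not n.allows("apt40-ransomware-sig") for n in nodes))  # True
```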
Capturing and Codifying Tribal Knowledge
ADCs with an ML capability can record deployment patterns and details formerly locked in obscure config files, scripts or – worse – inside the heads of DevOps engineers. By identifying the patterns that lead to good outcomes and past deployment successes, ML can provide AI applications with the information to create or suggest “golden recipes” or to generate semi-complete or fully functional scripts that codify the tribal knowledge and turn it into institutional and programmatic knowledge.
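One simple form of "golden recipe" mining is frequency analysis over recorded deployments. The record format and configuration fields below are assumptions for the sake of illustration.

```python
# Sketch: mining deployment records for the configuration most often
# associated with success. Record shapes and fields are illustrative.
from collections import Counter

def golden_recipe(deployments):
    """deployments: list of (config_as_tuple_of_pairs, succeeded) records.
    Return the configuration most frequently seen in successful deploys."""
    wins = Counter(cfg for cfg, ok in deployments if ok)
    if not wins:
        return None
    return dict(wins.most_common(1)[0][0])

history = [
    ((("replicas", 4), ("timeout_s", 30)), True),
    ((("replicas", 4), ("timeout_s", 30)), True),
    ((("replicas", 2), ("timeout_s", 10)), False),
]
print(golden_recipe(history))  # {'replicas': 4, 'timeout_s': 30}
```

A real system would correlate far more signals (versions, environment, traffic conditions) and generate full deployment scripts from the winning patterns.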
Conclusion: Why Distributed ADCs Have ML/AI Superpowers
The four use cases above are only a handful; there are many more possible applications. Pretty much everything that an ADC touches can be improved by a well-tuned ML algorithm and smart AI-powered applications of policy and business logic.
The catch? If your ADC is not distributed, then you will struggle to collect sufficient data to properly train your ML systems. Or it will take a really, really long time. Likewise, if your ADCs are not distributed, are not tightly networked, do not have low latency inter-node communication and information sharing – then you will not be able to trust AI to make real-time decisions.
Badly tuned ML can deliver poor customer experiences, for example by blocking legitimate connections unnecessarily. The upside of a well-tuned ML backend for networked ADCs is smart, adaptive automation and intelligence that makes agile DevOps and agile SecOps even more agile, responsive and effective.
When infrastructure is stressed by a crisis like a pandemic, these advantages could make the difference between success or failure.
Want to learn more about The Practical Application of Machine Learning and AI to Application Delivery Controllers?