Developing incident response (IR) playbooks specifically for the cloud is critical for all organizations with assets deployed in PaaS and IaaS cloud environments. Because cloud environments have unique assets, services, event types and controls, security teams need detection and response playbooks that align with the actual environment, which often differs significantly from on-premises infrastructure. This piece details why cloud IR is important and recommends ways to get started building a strong cloud IR program.
Why is Cloud IR Important?
As organizations grow their cloud deployments, their security teams are realizing that many of their tried-and-true on-premises IR playbooks don’t work as well (or at all) in cloud environments. There are many reasons for this. For example, the cloud has:
- New workload types: Organizations run a number of workload types in the cloud that don’t necessarily reflect what they’ve been running on site (e.g., managed container services and serverless functions).
- New services: Many cloud services have no on-site equivalent, including new storage types, cloud-native monitoring services, cloud IAM and more.
- New event types: There is a vast array of new event types in the cloud that don’t exist on premises. The cloud is a magical land of APIs that seem to be constantly chatting with one another. What these events mean is a whole new area of study and experience for security teams that need to know what to look for and what to respond to.
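To make “what to look for” a little more concrete, here is a minimal Python sketch that triages AWS CloudTrail log records, one example of an event stream that has no on-premises equivalent. The record fields it reads (`eventSource`, `eventName`, `eventTime`, `additionalEventData.MFAUsed`) follow CloudTrail’s log format; the short list of “sensitive” API calls is an illustrative assumption, not a vetted detection rule set, and other providers use different event schemas.

```python
import json
from typing import List, Optional, Tuple

# Illustrative (eventSource, eventName) pairs worth a closer look.
# This is an assumption for the example, not a complete detection list.
SENSITIVE_CALLS = {
    ("cloudtrail.amazonaws.com", "StopLogging"),
    ("cloudtrail.amazonaws.com", "DeleteTrail"),
    ("guardduty.amazonaws.com", "DeleteDetector"),
    ("iam.amazonaws.com", "CreateAccessKey"),
}


def flag_event(record: dict) -> Optional[str]:
    """Return a reason string if a CloudTrail record deserves triage, else None."""
    name = record.get("eventName")
    if (record.get("eventSource"), name) in SENSITIVE_CALLS:
        return f"sensitive API call: {name}"
    # Console sign-in events record MFA usage in additionalEventData.
    if name == "ConsoleLogin":
        extra = record.get("additionalEventData") or {}
        if extra.get("MFAUsed") == "No":
            return "console login without MFA"
    return None


def triage(log_file_body: str) -> List[Tuple[str, str]]:
    """Parse one CloudTrail log file ({"Records": [...]}) and list flagged events."""
    flagged = []
    for record in json.loads(log_file_body).get("Records", []):
        reason = flag_event(record)
        if reason is not None:
            flagged.append((record.get("eventTime", "unknown"), reason))
    return flagged
```

In practice, the list of sensitive calls would come from your own threat modeling, and flagged events would feed an alerting pipeline rather than a return value.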
With a new environment and new assets and services, it’s no surprise we need new playbooks to accommodate them. That’s not to say all the work we’ve done in traditional IR playbook development goes out the window. Those same concepts, event types and categories (e.g., unauthorized access, credential theft, etc.) can still apply, but a lot of adaptation is needed to bring existing playbooks up to speed, in addition to building new ones altogether.
Changes in IR Playbooks for the Cloud
Some differences between cloud IR and traditional IR in an on-premises environment include:
- Differences in responsibility: Cloud deployments necessarily involve the shared responsibility model, which means some assets and services in the cloud may be wholly or partially managed by the cloud service provider. If you experience an intrusion in a SaaS cloud, for example, there may be very little you can do in the way of investigation, and you may have little visibility into the events and indicators that would trigger an IR effort in the first place. In an IaaS cloud, by contrast, many objects and assets are entirely under your control and largely your responsibility.
- Differences in tooling: The tools and controls used within data centers aren’t always the best fit for cloud environments. Some aren’t compatible or have implementation or performance challenges, while others don’t understand cloud API calls and operating models well enough to detect cloud attack types and intrusion indicators in context.
- Focus on cloud-native services: Because the entire cloud fabric is software-based, there’s much more emphasis on using cloud-native services as “guardrails” and as critical elements of the IR workflow (in some cases, focused primarily on automation and orchestration). Relying on these services can also introduce new costs for log and event generation and for the security services themselves.
When building playbooks for cloud IR, be sure to include cloud-service API integration and automation capabilities in your workflows. It’s much easier to build highly automated if-then actions in the cloud than in on-premises data centers, and native tools for doing this are often readily available. For example, an alert from a cloud tool could trigger a change to a workload’s security group that isolates it from everything but the IR team for investigation, or it could even trigger a serverless function (such as an AWS Lambda function) or a cloud configuration rule that rebuilds the workload from an approved image. Similarly, automating the acquisition of evidence artifacts, such as disk snapshots and memory images (or even network packet captures in some cloud environments), can save the IR team a lot of time.
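The isolate-and-snapshot pattern described above might look like the following Python sketch. It assumes AWS and a boto3 EC2 client (`boto3.client("ec2")`), whose `modify_instance_attribute` and `create_snapshot` calls it uses; the function name `contain_instance`, the isolation security group ID and the `RecordingEC2` test double are hypothetical. Memory acquisition is left out because it needs tooling on the instance itself.

```python
from typing import Any, Dict, List


def contain_instance(ec2: Any, instance_id: str, volume_ids: List[str],
                     isolation_sg_id: str, alert_id: str) -> Dict[str, Any]:
    """Quarantine an EC2 instance and snapshot its volumes as evidence.

    `ec2` is expected to be a boto3 EC2 client; any object exposing the same
    two methods works, which keeps this playbook step testable.
    """
    # Replace the instance's security groups with a pre-built isolation
    # group that only admits the IR team.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])

    # Snapshot each listed volume so disk evidence survives a later rebuild.
    snapshot_ids = []
    for volume_id in volume_ids:
        resp = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"IR evidence: {instance_id}, alert {alert_id}",
        )
        snapshot_ids.append(resp["SnapshotId"])

    return {"instance": instance_id, "isolation_group": isolation_sg_id,
            "snapshots": snapshot_ids}


class RecordingEC2:
    """Hypothetical test double that records calls instead of changing anything."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def modify_instance_attribute(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("modify_instance_attribute", kwargs))
        return {}

    def create_snapshot(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("create_snapshot", kwargs))
        return {"SnapshotId": f"snap-dryrun-{len(self.calls)}"}
```

Because the client is passed in, the same step can be exercised as a dry run with `RecordingEC2` before it is wired to a real alert trigger such as a serverless function.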
Watch Out for API and Service Changes
Automation and cloud provider APIs are central to cloud IR playbooks, but providers change their services regularly, so build in a much more frequent playbook review cycle than you would for on-premises playbooks. Regularly review provider updates against your specific detection and response actions, and check any scripts and code in use for changes or incompatibilities as the provider updates the environment. This review is particularly important for runtime and language support in serverless functions.
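One part of that review, checking serverless functions for runtimes the provider has deprecated, is easy to script. This sketch assumes AWS Lambda, where each entry returned by the ListFunctions API carries `FunctionName` and `Runtime` keys; the deprecated-runtime list is example input that your team would maintain from the provider’s published deprecation schedule.

```python
from typing import Dict, Iterable, List


def find_stale_runtimes(functions: Iterable[Dict[str, str]],
                        deprecated_runtimes: Iterable[str]) -> List[str]:
    """Return, sorted, the names of functions whose runtime is in the deprecated list.

    Each item mirrors an entry of the "Functions" list returned by AWS Lambda's
    ListFunctions API. Container-image functions carry no "Runtime" key, so
    they are never flagged here and need a separate base-image review.
    """
    deprecated = set(deprecated_runtimes)
    return sorted(
        fn["FunctionName"] for fn in functions if fn.get("Runtime") in deprecated
    )


# Example input only; maintain this list from the provider's deprecation schedule.
EXAMPLE_DEPRECATED = ["python3.7", "nodejs12.x"]
```

With boto3, the input could be gathered using `boto3.client("lambda").get_paginator("list_functions")` and concatenating each page’s `Functions` list.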
Additionally, take the time to codify your playbooks in infrastructure as code as much as possible. The more you can build playbooks in actual cloud template formats, the more consistent and easily reviewable they will be down the line. Many organizations have spent a lot of time and effort building service-by-service components of their playbooks within the native interfaces of different cloud environments, only to have to start (almost) from scratch when the environment changes or functionality is updated by the provider.
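As an illustration of keeping a playbook component in a template format, this Python sketch emits an AWS CloudFormation template for a quarantine security group like the one used in an isolation step. The resource type and property names are CloudFormation’s; the SSH port, the logical IDs and the IR network CIDR are placeholder assumptions.

```python
import json


def isolation_sg_template(ir_cidr: str) -> str:
    """Return a CloudFormation template (JSON) defining an IR quarantine security group."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "IR playbook: quarantine security group",
        "Parameters": {"VpcId": {"Type": "AWS::EC2::VPC::Id"}},
        "Resources": {
            "IrIsolationGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Quarantine: IR team access only",
                    "VpcId": {"Ref": "VpcId"},
                    # Placeholder: admit only SSH from the IR network.
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                         "CidrIp": ir_cidr},
                    ],
                    # An explicit egress rule replaces the default allow-all
                    # outbound rule, so new outbound connections can only
                    # go to the IR network.
                    "SecurityGroupEgress": [
                        {"IpProtocol": "-1", "CidrIp": ir_cidr},
                    ],
                },
            }
        },
        "Outputs": {"IsolationGroupId": {"Value": {"Ref": "IrIsolationGroup"}}},
    }
    return json.dumps(template, indent=2)
```

Keeping the generated template in version control means changes to the quarantine rules go through the same review as any other code.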
Cloud IR Best Practices
It’s important to commit time and operational capacity to developing your cloud IR playbooks and processes. To ensure your cloud IR planning gets started on the right path:
- Send all cloud IR team members to cloud provider training, if at all possible: This doesn’t need to be security-specific training, but it should familiarize the team with the types of services, objects, APIs, commands and other cloud-centric concepts they’ll need to properly build out a comprehensive cloud IR function. Education is always an ongoing challenge due to the high rate of technology change in the cloud.
- Set up IAM and role-based access control for IR teams before they’re needed: This is a very important planning step. You don’t want to stop and spend the time needed to create a least-privilege model for IR analysts in the heat of battle. Create least-privilege accounts for the specific actions analysts will need to perform in the cloud, define a role for them (ideally one that supports cross-account access) and enable MFA for these accounts.
- Enable write-once storage for logs and evidence: This is good to do now, even if you aren’t currently storing evidence in the cloud. Storage features such as bucket versioning and object locking support secure retention and recovery.
- Enable cloud-wide logging, if available: Also enable metric-based alarms that trigger when monitored activity crosses defined thresholds.
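The IAM and storage items above can be prepared ahead of time as policy documents. This Python sketch, again assuming AWS, builds a trust policy that lets a dedicated IR account assume a response role only with MFA, a narrow permission set for containment and evidence capture, and a default Amazon S3 Object Lock retention rule for a write-once evidence bucket. The account ID, bucket name, action list and retention period are illustrative assumptions to adapt, not a recommended baseline.

```python
from typing import Any, Dict


def ir_role_trust_policy(ir_account_id: str) -> Dict[str, Any]:
    """Trust policy: principals in the IR account may assume the role only with MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ir_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }],
    }


def ir_role_permissions(evidence_bucket: str) -> Dict[str, Any]:
    """Illustrative permission set for containment and evidence capture."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeVolumes",
                    "ec2:ModifyInstanceAttribute",
                    "ec2:CreateSnapshot",
                ],
                "Resource": "*",
            },
            {
                # Analysts may write evidence objects but nothing else in the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{evidence_bucket}/*",
            },
        ],
    }


def evidence_lock_configuration(days: int) -> Dict[str, Any]:
    """Default Object Lock rule for a WORM evidence bucket.

    Intended as the ObjectLockConfiguration argument to boto3's
    put_object_lock_configuration; Object Lock requires bucket versioning.
    """
    if days <= 0:
        raise ValueError("retention period must be positive")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": days}},
    }
```

Compliance mode prevents any user, including the root user, from shortening retention or deleting locked object versions early, which suits evidence but leaves no room for mistakes; governance mode is the more forgiving option while a team is still testing.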
By performing all these actions upfront, you can cloud-enable your IR function. Be sure to revisit these types of actions periodically as part of the preparation phase of your IR workflow.
Although reasonable efforts will be made to ensure the completeness and accuracy of the information contained in our blog posts, no liability can be accepted by IANS or our Faculty members for the results of any actions taken by individuals or firms in connection with such information, opinions, or advice.