Homemade Threat Detection with Snowflake: Part Two
In the previous post, I showed how cloud activity logs can be loaded into Snowflake and combined with threat intelligence to detect “known bad” indicators of compromise in your environment. But what about threats that are not yet known? Reliably detecting those baddies requires custom detections tailored to the unique “known good” of your environment.
Activity Whitelisting: Tripwire for the Cloud
A classic example of detecting threats by watching for anything that’s not “known good” is the original Tripwire Linux agent. As described in its documentation, “A Tripwire check compares the current filesystem state against a known baseline state, and alerts on any changes it detects”. Security teams that managed to successfully roll out Tripwire across their Linux servers did not need antivirus updates or threat intelligence because any unexpected deviation would be flagged for investigation.
Can the Tripwire approach, known as whitelisting, work for detecting threats at the cloud infrastructure level?
What You’ll Need
Whitelisting is not a mainstream threat detection technique in cloud security. The main reason is probably that vendors would have a hard time selling you whitelisting for the cloud. Unlike signatures and anomaly-based detections, whitelisting policies need to be tailored to each environment using analytics. They also require weeks or months of activity data to be collected before getting started.
The good news is that once you have the data collected in Snowflake, it takes only a little bit of SQL and one to two days to follow the process detailed below.
Starting the Whitelist Policy Creation
Imagine that you’re tailoring a black Balenciaga fitted dress for a client’s red carpet event. Relatable, right?
Like fitting the dress to the wearer, the policy that you define needs to cover nearly all the activity in your environment and nothing else. If the policy is too “tight” then you’ll be bombarded with alert noise. Too “loose” and you’ll fail to detect the threats as they appear.
Start by selecting the cloud account for which you’ll be creating the policy. Ideally, this account is both highly sensitive and dedicated to a relatively specific purpose. Run a few SELECT statements on the past 60 days of activity logs from this account, using GROUP BY and COUNT to learn what the users are up to in there.
Once you understand the shape of the activity you’re seeing, you need to classify the event types into a manageable number of activities. For example, you might identify the following activities in your cloud account:
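As a rough sketch of that first exploratory pass, the query below counts events per user and action over the past 60 days. It reuses the table and column names from the examples later in this post. The event_time timestamp column is an assumption and may be named differently in your schema, and the account ID is a placeholder.

```sql
-- Sketch only: event_time is an assumed column name for the event timestamp.
SELECT user_name,
       event_name,
       COUNT(*) AS event_count
FROM logs.cloudtrail
WHERE account_id = '111111'  -- placeholder account ID
  AND event_time >= DATEADD(day, -60, CURRENT_TIMESTAMP())
GROUP BY user_name, event_name
ORDER BY event_count DESC;
```

Swapping event_name for source_ip in the SELECT and GROUP BY gives a similar view of where each user connects from.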
- Jane and Felicity manage users, performing actions such as creating new users and setting their password policies.
- Various users launch and terminate EC2 instances.
- Apps on servers use IAM instance roles to read from and write to S3 buckets.
- On a few occasions, some new S3 buckets were created.
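One way to check that a classification like this covers most of the activity is to bucket events with a CASE expression and count what remains unclassified. This is a sketch under assumptions: it uses the simplified event names from this post, write_object and create_bucket are hypothetical names added for illustration, and event_time is an assumed timestamp column.

```sql
-- Sketch only: event names are simplified placeholders, not real CloudTrail names.
SELECT
  CASE
    WHEN event_name IN ('create_user', 'delete_user', 'set_password_policy')
      THEN 'user administration'
    WHEN event_name IN ('launch_instance', 'start_instance',
                        'stop_instance', 'terminate_instance')
      THEN 'EC2 lifecycle'
    WHEN event_name IN ('read_object', 'write_object')  -- write_object is hypothetical
      THEN 'S3 object access'
    WHEN event_name IN ('create_bucket')                -- hypothetical name
      THEN 'S3 bucket creation'
    ELSE 'unclassified'
  END AS activity,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_name) AS distinct_users
FROM logs.cloudtrail
WHERE account_id = '111111'  -- placeholder account ID
  AND event_time >= DATEADD(day, -60, CURRENT_TIMESTAMP())  -- assumed column
GROUP BY 1
ORDER BY event_count DESC;
```

A large 'unclassified' count suggests there are activity sets you have not yet identified.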
Defining the Whitelist Policy
We’re going to create the whitelist policy as a SQL statement that defines what is allowed for the account. When we regularly run the SQL as a query against recent log data, we’ll be able to detect threat actors that don’t behave within the confines of the policy.
Start with a statement that excludes “read only” activity and expected behavior for the account. Define what’s expected in terms of the user, the action, and possibly other properties such as the source IP address. That will make it harder for attackers to bypass the control by compromising an authorized administrator.
To continue the example from above, you may start with a statement such as:
SELECT user_name, event_name, source_ip, account_id FROM logs.cloudtrail
WHERE account_id = '111111'
AND NOT ( -- we are expecting read only activity
  event_name IN ('list_instances', 'list_buckets', 'describe_bucket')
)
AND NOT ( -- we are expecting Jane and Felicity user administration
  user_name IN ('jane', 'felicity')
  AND event_name IN ('create_user', 'delete_user', 'set_password_policy')
  AND source_ip = '4.3.2.1'
);
You’ve just whitelisted Jane and Felicity’s user administration from their VPN IP address. But running that SQL against your cloud logs in Snowflake will likely return thousands of results, each of which would be an alert, so we’re not done yet!
Carving Out Exceptions
As you get to know the activity in the cloud account, you’ll gain a deep appreciation for just how much work your DevOps team is putting in. You’ll also notice the additional activity sets that need to be carved out as “known good” in your whitelist policy.
Extending the example, we collect the vast majority of events in the environment into a few definitions of expected behavior.
SELECT user_name, event_name, source_ip, account_id FROM logs.cloudtrail
WHERE account_id = '111111'
AND NOT ( -- allowing read only activity
  event_name IN ('list_instances', 'list_buckets', 'describe_bucket')
)
AND NOT ( -- allowing Jane and Felicity user administration
  user_name IN ('jane', 'felicity')
  AND event_name IN ('create_user', 'delete_user', 'set_password_policy')
  AND source_ip = '4.3.2.1'
)
AND NOT ( -- allowing users to do EC2 stuff
  event_name IN ('launch_instance', 'start_instance', 'stop_instance', 'terminate_instance')
)
AND NOT ( -- allowing assumed roles to read from S3
  user_name LIKE '%/assume-role/%'
  AND event_name IN ('read_object')
);
When Is The Policy Finished?
As you add more exceptions to the whitelist policy and rerun the SQL against the historical log data, you’ll receive fewer and fewer results. Since we don’t want to make a bunch of noise with our new whitelist policy, keep adding exceptions until you’re getting fewer than 60 results for the 60-day window that you’re querying, which averages out to less than one alert per day.
This relatively quiet level is better than it seems. Review the timestamps of the results that the query returns and you should see that the would-be alerts are clustering around a few time windows. Most days will receive no noise from this rule and when something unusual happens it will generate a number of related alerts that should be triaged together.
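To see that clustering, you can wrap the policy in a common table expression and count the would-be alerts per day. The sketch below reuses the example policy from this post. The event_time column is an assumed name for the event timestamp, and the account ID is a placeholder.

```sql
-- Sketch only: event_time is an assumed column name for the event timestamp.
WITH violations AS (
  SELECT event_time, user_name, event_name, source_ip
  FROM logs.cloudtrail
  WHERE account_id = '111111'  -- placeholder account ID
    AND event_time >= DATEADD(day, -60, CURRENT_TIMESTAMP())
    AND NOT ( -- allowing read only activity
      event_name IN ('list_instances', 'list_buckets', 'describe_bucket')
    )
    AND NOT ( -- allowing Jane and Felicity user administration
      user_name IN ('jane', 'felicity')
      AND event_name IN ('create_user', 'delete_user', 'set_password_policy')
      AND source_ip = '4.3.2.1'
    )
    AND NOT ( -- allowing users to do EC2 stuff
      event_name IN ('launch_instance', 'start_instance',
                     'stop_instance', 'terminate_instance')
    )
    AND NOT ( -- allowing assumed roles to read from S3
      user_name LIKE '%/assume-role/%'
      AND event_name IN ('read_object')
    )
)
SELECT DATE_TRUNC('day', event_time) AS alert_day,
       COUNT(*) AS would_be_alerts
FROM violations
GROUP BY 1
ORDER BY alert_day;
```

Days with no violations do not appear in the output at all, so a short result set with a few busy days is the pattern to look for.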
Hopefully you can extend this policy to other accounts that have similar sets of expected behaviors. While some tweaking is to be expected, this approach will cut the effort needed to achieve widespread coverage of your whitelist control. In the end, you may have some accounts that are too Wild West for whitelisting, but these aren’t where your Crown Jewels are kept, right?
Tips from Experience
Going through this exercise will get you intimately familiar with your cloud environment. Make sure that from the beginning, your query excludes benign, read-only activity, which would otherwise make it hard to get to know the changes in your environment.
Also, don’t automatically add every action you see to the whitelist policy. For example, you may see an action getting flagged because it’s not from one of the IP addresses in the approved list. That’s an opportunity to remind admins that they should be managing environments from behind the VPN.
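For instance, a query along these lines could surface otherwise-approved administration performed from outside the approved address. It is a sketch that reuses the example users, actions, and VPN address from this post. Your approved list will likely hold more than one address, in which case NOT IN would replace the inequality.

```sql
-- Sketch only: the users, actions, and IP address are the example values from this post.
SELECT user_name,
       source_ip,
       COUNT(*) AS event_count
FROM logs.cloudtrail
WHERE account_id = '111111'  -- placeholder account ID
  AND user_name IN ('jane', 'felicity')
  AND event_name IN ('create_user', 'delete_user', 'set_password_policy')
  AND source_ip <> '4.3.2.1'  -- anything not from the approved VPN address
GROUP BY user_name, source_ip
ORDER BY event_count DESC;
```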
On the other hand, don’t be too strict. If you see something that isn’t exactly best practice and you’re considering whether to include it in the approved actions, ask yourself what you would do if it showed as an alert. If you know you’ll just close out the alert ticket then put it in the whitelist.
Lockdown!
After a day or two of querying your account’s activity logs and building a whitelist policy that defines an approved baseline, you’re ready to start detecting threats. Load the query into a scheduled rules engine such as SnowAlert and investigate alerts as they arrive. Your policy may need some tuning initially but should remain fairly constant for a mature environment.
Is the manual effort of defining a whitelist worth it? For highly sensitive and well-architected cloud accounts, an activity whitelist for threat detection is a powerful control that spots threats more reliably than generic signatures or automated anomaly detection. It’s a great way to apply security analytics for cost-effective cloud security.