Okta Data in Snowflake for Account Compromise Detection

May 28, 2019

The user accounts controlling your critical cloud infrastructure should be at least as well protected as your Instagram account, right?

But consider this Instagram alert for “Suspicious Login Attempt” shared by a well-known influencer. Based on unusual login activity, he was alerted that someone had stolen his password. The hacked user could then take steps to restore his account security, including resetting his password and ensuring that multi-factor authentication (MFA) is in place. A win for the security analytics at Instagram that raised the alarm.

Unfortunately for security teams, AWS and Azure do not provide similar functionality. It’s left to us to spot unusual login attempts against our cloud infrastructure and lock compromised accounts before they’re used against us.

This is a challenge that most security teams are not equipped to address. At the same time, it’s becoming a more pressing issue as companies embrace multi-cloud architectures and new users are provisioned faster than configurations can be closely reviewed.

As usual (#1), there’s no silver bullet, but security analytics can reduce the risk of a compromised account going undetected. On Snowflake’s security team, we’re learning from our Okta data what is “usual” for our internal users, and that lets us catch unusual logins across various services.

As usual (#2), the first step is getting all the relevant data into Snowflake. For this capability, we collect AWS CloudTrail data, Azure Login data, and Okta System Log data. The Okta data collection requires fewer than 100 lines of open source code, and we run that code automatically at short intervals. Once in Snowflake, the data can be analyzed and joined with other datasets.
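To give a feel for what the collected Okta data looks like before it is loaded, here is a minimal Python sketch that flattens raw System Log events into login rows. It is not the open source collector mentioned above. The field names (`published`, `eventType`, `actor.alternateId`, `client.ipAddress`, `outcome.result`) follow the Okta System Log event format as we understand it, while the function names and output column names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def flatten_okta_event(event: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Reduce one Okta System Log event to the columns used for login analysis.

    The output column names are hypothetical; missing fields become None.
    """
    actor = event.get("actor") or {}
    client = event.get("client") or {}
    outcome = event.get("outcome") or {}
    return {
        "event_time": event.get("published"),
        "event_type": event.get("eventType"),
        "user": actor.get("alternateId"),
        "ip_address": client.get("ipAddress"),
        "result": outcome.get("result"),
    }


def flatten_okta_events(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Optional[str]]]:
    """Flatten a batch of events, preserving their order."""
    return [flatten_okta_event(event) for event in events]
```

Rows in this shape are easy to load into a single table and later join with CloudTrail or Azure login records on the IP address column.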

Here is where some data science comes into play, but there’s no machine learning magic involved. We run an R script (also available as open source within the SnowAlert project) to baseline the Okta logins and establish a statistical record of what is “normal” and expected for our internal users. This baseline is updated automatically every day so we don’t need to maintain a “whitelist” of known good login sources.
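As a rough illustration of the baselining idea, the Python sketch below counts successful logins per IP address and keeps the IPs that account for a meaningful share of the total. It is a simplified stand-in, not the R script from SnowAlert: the share threshold, the function names, and the row format (`ip_address`, `result`) are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Set


def login_share_by_ip(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, float]:
    """Return each IP's fraction of all successful logins."""
    counts = Counter(
        row["ip_address"]
        for row in rows
        if row.get("result") == "SUCCESS" and row.get("ip_address")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ip: count / total for ip, count in counts.items()}


def common_ips(shares: Mapping[str, float], min_share: float = 0.01) -> Set[str]:
    """IPs whose share of successful logins meets the (assumed) threshold."""
    return {ip for ip, share in shares.items() if share >= min_share}
```

Recomputing these shares over a recent window each day keeps the baseline current without anyone editing a list by hand.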

The result is a table in Snowflake where successful logins can be graphed by IP address, resulting in a clear picture of “normal”.

Normal on the extreme left, unusual sources to the right

Clearly, this data shows that a tiny portion of IP addresses is the source of the vast majority of our successful Okta logins. The IPs on the extreme left of the graph represent our offices and corporate VPN addresses. Since successful Okta logins originated from all of these, we can have some level of confidence that they cover all the “good” places that our internal users come from.

Having established what is normal, we can join this dataset with small subsets of data that we’re particularly concerned with: logins to cloud administration consoles. Only authorized admins should be logging into these consoles, so we have created SnowAlert rules to notify the security team if someone logs into the production AWS or Azure consoles from a source that’s not well within the statistical baseline.

Part of SnowAlert rule for “AWS Console from Uncommon Okta IP”

Example of a resulting ticket that required investigation

The result of this alert rule is an alert ticket for the team to investigate. Usually we find out that the flagged admin logged in from a remote location and forgot to connect through the VPN. However, this alert does not fire often, so we consider it to be high fidelity. If a cloud admin’s account is actually compromised, we have a shot at catching it before real harm is done. We’re also able to easily extend this capability to new infrastructure, whether it’s GCP or another SaaS solution that we want to monitor.
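The join between console logins and the Okta baseline can be sketched in a few lines of Python. This is an illustration of the logic only, not the SnowAlert rule itself, which runs as a query in Snowflake. The record fields (`user`, `source_ip`) and the function name are hypothetical.

```python
from typing import Iterable, List, Mapping, Set


def uncommon_console_logins(
    console_logins: Iterable[Mapping[str, str]],
    baseline_ips: Set[str],
) -> List[Mapping[str, str]]:
    """Return console logins whose source IP is absent from the Okta baseline.

    Logins with no recorded source IP are also returned, since they cannot
    be matched to a known-good location.
    """
    return [
        login
        for login in console_logins
        if login.get("source_ip") not in baseline_ips
    ]
```

Because the function only needs a source IP per login, the same check could be pointed at login records from another platform by mapping that platform's IP field to `source_ip`.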

By combining activity log data in Snowflake, we’re able to learn things about our users that we didn’t know when the data was siloed. These insights mean our cloud admin accounts are monitored and protected… at least as well protected as their Instagram accounts.

Written by Omer Singer