Homemade Threat Detection with Snowflake: Part One

5 min read · Oct 7, 2019

Homemade threat detection can be more effective and less costly than buying canned analytics from a vendor. Using Snowflake might seem scary at first, but if you can explain your strategy in plain English, you don’t even need advanced SQL skills to get started. Instead, partner up with your in-house data team to create threat detection with less noise from false positives. You’ll also be cutting down on trips to Finance for more security budget.

To take the guesswork out of getting started, I’ll share a series of posts with ways to use Snowflake for threat detection in the cloud. Each of these approaches can improve your ability to catch compromised accounts and insider threats. All can be built on your existing Snowflake data warehouse without an additional line item or SKU.

Homemade Threat Intelligence

Easy to create and few false positives

The recipe for threat intelligence is basic and the results are great for going on a security alert diet. Maybe that’s taking the cooking metaphor too far, but the truth is that threat intelligence applied to cloud API activity is less prone to false positives than threat intelligence applied at endpoints. Your laptop users might be browsing to questionable sites, but “known bad” IP addresses should never execute API calls in your cloud.

The only significant source of noise in this approach might stem from “dynamic” cloud IP addresses being used for something nasty and then later reused by your legitimate services. Still, alerts using threat intelligence are easy to create and bring great ROI.

Drawbacks

Threat intelligence has some limitations that mean it can’t be your only approach to threat detection. Threat feeds tend to lag days or weeks behind threat actor campaigns so by the time you get the intel it might be too late to act.

Each source of threat intel covers a different set of threats (malware, cybercrime groups, APTs, etc.) so results may vary depending on the sources you’ve integrated and your industry.

Despite these drawbacks, you want to know, for example, if someone is issuing API commands from behind a Tor anonymizer, as was the case in the Capital One breach. The threat actors aren’t expecting you to be watching for that.

Getting the Data

Collecting Cloud Activity Logs to Snowflake

AWS CloudTrail, Azure Activity Logs and Google Stackdriver data are easy to collect using SnowAlert Data Connectors. The open source project is available on GitHub and includes code for setting up Snowflake’s native data ingestion services. External tables, streams, tasks and Snowpipe can be combined to automatically load cloud log data into Snowflake in a schema that supports security analytics.

Many new connectors since launching in June

Getting Threat Intelligence in Snowflake

You can never collect enough threat intelligence but a good place to start is with a fresh list of Tor exit nodes. These are the IP addresses from which bad guys will emerge when they’re hiding their identity with the most popular anonymizer solution. While you won’t know who they are, you know they have no business issuing API commands in your environment.

Several researchers publish daily updates on the full set of Tor exit node IP addresses. One such list is on GitHub and can be fetched using a scheduled Lambda function. Your Lambda (or Azure Function) can use Python to download the list and insert it into Snowflake.
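As a rough illustration of the download-and-parse step, here is a minimal Python sketch. It assumes the list is plain text with one IP address per line; the URL is a placeholder, and the row fields (value, type, last_seen) are hypothetical names chosen for this example rather than a required schema. Loading the rows into Snowflake is left out, since that depends on the connector and table layout you use.

```python
import ipaddress
import urllib.request
from datetime import datetime, timezone

# Placeholder (assumption): substitute the URL of the exit-node list you trust.
TOR_LIST_URL = "https://example.com/tor-exit-nodes.txt"


def fetch_list(url=TOR_LIST_URL):
    """Download the raw list as text (one IP address per line is assumed)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_exit_nodes(raw_text):
    """Return the unique, valid IP addresses found in the text, in order."""
    seen = set()
    addresses = []
    for line in raw_text.splitlines():
        candidate = line.strip()
        if not candidate or candidate.startswith("#"):
            continue  # skip blank lines and comments
        try:
            normalized = str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # skip anything that is not an IP address
        if normalized not in seen:
            seen.add(normalized)
            addresses.append(normalized)
    return addresses


def to_ioc_rows(addresses, collected_at=None):
    """Shape each address as a dict ready to load into an IOC table.

    The field names are hypothetical; match them to your own table.
    """
    collected_at = collected_at or datetime.now(timezone.utc)
    return [
        {"value": ip, "type": "tor_exit_node", "last_seen": collected_at.isoformat()}
        for ip in addresses
    ]
```

In a scheduled function, the handler would call fetch_list(), pass the text through parse_exit_nodes() and to_ioc_rows(), and then write the rows to Snowflake.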

For those looking to take a shortcut to threat intelligence in Snowflake, several vendors offer fully curated datasets that are served over Snowflake Data Sharing. Both Recorded Future and IntSights can share bad IP addresses straight into your Snowflake so you don’t need to do any collection beyond a one-time data share SQL command.

Applying Threat Intelligence for Threat Detection

So you have the data, now what?

Joining the Datasets

Once your cloud logs and indicators of compromise (IOCs) are continuously streaming into your Snowflake database, join the datasets to detect active threats. As shown in the code below, this takes a basic JOIN statement on the IP address value in the two datasets. Any time an IP in your administration logs is also present in your threat intelligence, sound the alarm!

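-- The SELECT list is illustrative; return whichever columns your alerts need
SELECT cloudtrail.*, iocs.IOC
FROM data.cloudtrail AS cloudtrail
INNER JOIN threat_intel.public.iocs AS iocs
ON cloudtrail.SOURCE_IP_ADDRESS = iocs.IOC:value::string
WHERE 1=1
-- only indicators seen in the last 60 days
AND iocs.IOC:last_seen::timestamp > dateadd(day, -60, current_timestamp())
-- skip low-severity indicators
AND iocs.IOC:severity::string != 'Low'
-- only cloud activity from the last hour
AND cloudtrail.EVENT_TIME > dateadd(hour, -1, current_timestamp())
;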

Use your Snowflake worksheet to test this join and see that results are as expected. Excessive false positives can be trimmed by adjusting the severity threshold or the “last seen” time window. Threat intel gets stale like bagels left on the counter so make sure that you’re automatically collecting fresh data at least once a day.
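To decide what “as expected” looks like before running against real data, it can help to hand-check the logic on a few made-up rows. The sketch below mirrors the JOIN and its three filters in plain Python. It is not how Snowflake executes the query, and the sample events, IP addresses and field names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def match_iocs(events, iocs, now, max_ioc_age_days=60, lookback_hours=1):
    """Return (event, ioc) pairs that the JOIN and its filters would keep."""
    ioc_cutoff = now - timedelta(days=max_ioc_age_days)
    event_cutoff = now - timedelta(hours=lookback_hours)

    # Index the IOCs that pass the freshness and severity filters by IP.
    fresh_iocs = {}
    for ioc in iocs:
        if ioc["last_seen"] > ioc_cutoff and ioc["severity"] != "Low":
            fresh_iocs.setdefault(ioc["value"], []).append(ioc)

    # Keep recent events whose source IP appears in the filtered IOCs.
    matches = []
    for event in events:
        if event["event_time"] > event_cutoff:
            for ioc in fresh_iocs.get(event["source_ip_address"], []):
                matches.append((event, ioc))
    return matches


# Made-up sample data using documentation-only IP ranges.
now = datetime(2019, 10, 7, 12, 0, tzinfo=timezone.utc)
events = [
    {"source_ip_address": "203.0.113.7", "event_name": "ListBuckets",
     "event_time": now - timedelta(minutes=10)},   # recent, bad IP: match
    {"source_ip_address": "203.0.113.7", "event_name": "GetObject",
     "event_time": now - timedelta(hours=3)},      # outside the one-hour window
    {"source_ip_address": "198.51.100.23", "event_name": "DescribeInstances",
     "event_time": now - timedelta(minutes=5)},    # IOC is stale
    {"source_ip_address": "192.0.2.10", "event_name": "ListUsers",
     "event_time": now - timedelta(minutes=5)},    # no IOC for this IP
]
iocs = [
    {"value": "203.0.113.7", "severity": "High",
     "last_seen": now - timedelta(days=2)},
    {"value": "198.51.100.23", "severity": "High",
     "last_seen": now - timedelta(days=90)},
]

for event, ioc in match_iocs(events, iocs, now):
    print(event["event_name"], "from", event["source_ip_address"])
```

Only the ListBuckets call survives all three filters here, which is the kind of result the worksheet test should reproduce on equivalent rows.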

Running the Detection Logic

The JOIN statement above will return a result only if you have a “known bad” IP pulling the strings in your cloud environment. However, it needs to run regularly in order to catch threats when they show up. Luckily for us, Snowflake supports scheduled tasks for running queries on a regular basis. SnowAlert can also be used to automatically run alert rules on Snowflake from a Docker container.

How often you run the rules is a trade-off between cost and time to detection. Running the JOIN every few minutes means shorter time to detection but higher costs, because Snowflake credits are consumed whenever your warehouse is active running queries. Most organizations find hourly checks to be a good balance, with significant cost savings from suspending the warehouse between runs.
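The trade-off can be put into rough numbers with a back-of-the-envelope calculation. This sketch assumes an X-Small warehouse at 1 credit per hour, per-second billing with a 60-second minimum each time the warehouse resumes, and a warehouse that suspends right after each run. The 20-second query time is a made-up figure, and idle time before auto-suspend is ignored, so treat the output as an estimate only.

```python
def daily_credits(interval_minutes, query_seconds,
                  credits_per_hour=1.0, minimum_billed_seconds=60):
    """Estimate credits per day for a query that runs on a fixed schedule.

    Assumes the warehouse resumes for every run and suspends right after,
    so each run is billed for at least the minimum number of seconds.
    """
    runs_per_day = (24 * 60) // interval_minutes
    billed_seconds = max(query_seconds, minimum_billed_seconds)
    return runs_per_day * billed_seconds * credits_per_hour / 3600


# Compare a five-minute schedule with an hourly one for a 20-second query.
for interval in (5, 60):
    estimate = daily_credits(interval_minutes=interval, query_seconds=20)
    print(f"every {interval} min: about {estimate:.1f} credits per day")
```

Under these assumptions the five-minute schedule costs about 4.8 credits per day against about 0.4 for the hourly one, a twelvefold difference for the same query.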

The last piece to consider is how the results of the alert query will get tracked. Initially, a BI dashboard can reflect the results directly from your database. For better results, use a tool like SnowAlert to open a Jira ticket or send a Slack message when a threat is detected.
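SnowAlert handles the notification step for you. If you want to see roughly what that step involves, the sketch below turns one matching row into a Slack message using a Slack incoming webhook, which accepts a JSON body with a text field. The row field names and the webhook URL are hypothetical, and this is not SnowAlert’s own implementation.

```python
import json
import urllib.request


def build_slack_payload(match):
    """Format one alert row (a dict with hypothetical keys) as a Slack message."""
    text = (
        "Threat intel match: "
        f"{match['event_name']} called from {match['source_ip_address']} "
        f"at {match['event_time']} (severity: {match['severity']})"
    )
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """Send the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example row, as it might come back from the alert query (made-up values).
example_match = {
    "event_name": "ListBuckets",
    "source_ip_address": "203.0.113.7",
    "event_time": "2019-10-07T11:50:00Z",
    "severity": "High",
}
payload = build_slack_payload(example_match)
print(json.dumps(payload))
```

Calling post_to_slack() with your own webhook URL and this payload would deliver the message; it is left uncalled here so the example runs without network access.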

Conclusion

As you’ve just read, you don’t need another vendor in order to apply threat intelligence against your log data. Snowflake is a fast and cost-effective solution for joining datasets such as threat intelligence and activity logs. A number of Snowflake customers are already using it for security analytics and built-in features like “Streams and Tasks” make it easy to get started.

Thinking about bringing threat intel into your Snowflake? Tweet me at @snomersinger