SQL for CIS Compliance Validation: Why and How

No one wants to manually validate their cloud configurations. That doesn’t scale, it’s likely to miss bad configurations, and it’s mind-numbingly boring.

An alternative is to use a tool like AWS Security Hub or Azure Security Center that promises compliance reports out of the box. Unfortunately, buying the cloud provider’s generic solution means settling for “one size fits all” and little transparency into how findings are determined. It’s also tricky to get vendor findings from their siloed web console to the teams that need to use them: admins, compliance, and executives.

As Snowflake continues its explosive growth, we on the security team have had to look for a third path- or make it.

In-house configuration validation

The advantages of doing our own configuration validation were clear:

  • Reduce costs: Azure Security Center, for example, charges around $14/month per server. For organizations with thousands of servers, compliance reports are a six figure line item.
  • Tailored to our needs: we don’t have one kind of cloud environment- we have many and they’re all unique in their own special way. A development account needs to have certain controls in place but not the same ones as Gov cloud infrastructure.
  • High fidelity: the process of creating our own validations revealed unexpected gaps in some areas and other areas where exclusions needed to be contextually applied. Noisy reports perpetuate risk while accurate findings drives remediation.

And there’s another slightly scary but very exciting advantage to this approach. Doing configuration validation brings us into contact with raw data, not just findings in a report. This opens opportunities for data-driven risk reduction that go way beyond benchmark validation.

Converting compliance controls to SQL

Our in-house configuration validation starts with the basics. I’ve written about CIS Benchmarks as a great place to start with compliance controls that are required for a number of certifications but also improve cloud security posture and reduce breach risk.

To illustrate the process of converting compliance controls to SQL, I’ll walk you through our workflow step by step. Note that these steps assume that you have the raw data in a database. Configuration data collection is a parallel effort that will be covered in an upcoming post.

Step 1: Understand the control

Here is where the security team brings its domain expertise to the table. Security teams don’t need to be SQL experts if they’re collaborating with a data team but they do need to understand the control that they’re validating. Having to read and understand the CIS benchmarks made us better at using them for reducing risk.

In this example, the control that we’re converting to SQL is CIS AWS 1.22: Ensure IAM policies that allow full “*:*” administrative privileges are not created. The CIS for AWS PDF includes a detailed description:

Basically this control is telling us to not be lazy with our IAM policies. Rather than creating one policy that allows users to do anything anywhere, policies should be limited to well-scoped required permissions. Users can always be assigned multiple policies if needed, and if an AWS user is compromised then the impact to the organization will be limited.

Let’s not have permission policies that look like this

Step 2: Identify the data source

In the description for this CIS control, the audit process is described as a series of commands that leverage the AWS API. Since we want to have the raw data ourselves and then automate the analytics in a way that makes sense for our unique organization, we want to make a list of the APIs needed for validating this control.

It looks like we need the following AWS APIs:

  • aws iam list-policies
  • aws iam get-policy-version

If we have the active policy details for each policy, we’ll be able to check their values against the best practice recommendations.

Step 3: Get the data

Getting the data from each API endpoint requires looping through each of the organization’s accounts (a typical enterprise has hundreds) and calling the API endpoint for the account. When the data is returned, it needs to be loaded into a database table with a corresponding name.

The data collection code required for CIS validation will be released in the next release of SnowAlert and described in an upcoming post.

Once the raw configuration data is pulled into Snowflake, it looks like this:

Policy data pulled into Snowflake for analysis
Each JSON document specifies what the policy allows

Once the data is retrieved, the analytics party can begin.

Step 4: Translate recommendations to SQL

The CIS benchmark lays out the recommendation in plain English: IAM policies that have a statement with “Effect”: “Allow” with “Action”: “*” over “Resource”: “*” should be removed.

How would this be phrased in SQL? Here is where the collaboration between the security team and the data team shines. A data analyst with Snowflake experience would quickly arrive at the syntax for pulling the policies that violate the recommendation.

SELECT account_id, policy_arn
FROM "SNOWALERT"."DATA"."AWS_IAM_GET_POLICY_VERSION", lateral flatten( input => document:Statement ), latest_policies
effect = 'Allow'
action like '%"*"%'
OR action = '*'
resource like '%"*"%'
OR resource = '*'
AND policy_arn <> 'arn:aws:iam::aws:policy/AdministratorAccess'

Not all CIS rules require Snowflake’s “lateral flatten” feature but it demonstrates the JSON handling that Snowflake brings to the table. It would be hard to analyze cloud configurations in a database that doesn’t have strong JSON support.

You should approach this project as a collaboration between the data and security teams. But since the CIS Benchmarks are widely adopted and validating them is a top job for many compliance teams, we’re going to open-source our SQL implementation of CIS for AWS, Azure and GCP. That should save you at least 80% of the rule creation effort.

Step 5: Schedule validation

Rules defined as SQL queries need to be executed on a recurring basis. For most organizations, daily validation is sufficient but the scheduled runs can be as frequent as needed.

In our case, we use SnowAlert to run the queries on a regular basis but Snowflake Tasks or a script with a cronjob can be used as well.

Scheduled Violation Query in SnowAlert

Step 6: Review results

You’re going to start getting interesting results shortly after you’ve scheduled the validation queries. To make these findings actionable for all the stakeholders involved (security, compliance, infrastructure, executives), create a BI dashboard that represents color coded results:

  • Green: configurations meet the requirement
  • Red: configurations don’t meet the requirement
  • Yellow: requirement not yet implemented

In the same BI dashboard, enable users to drill down into why a specific control is failing by adding a table with violation details that can be filtered by rule name or other properties such as account ID.

Using validation results to drive action

Once findings are available in a BI visualization, use them to drive coordinated remediation action. Your compliance team can lead the enforcement effort but the reports should be seen as “self service”- no middlemen required. If certain findings need to be tuned based on your environment, you now have a flexible tool where these tweaks can be applied in SQL.

Too often there are dangerous misconfigurations hiding in a sea of flagged issues. Use your access to the raw configuration data to achieve a report that everyone believes in and metrics that everyone buys into. High fidelity automated compliance reports are a great way to save time and money on security control validation.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Omer Singer

I believe that better data is the key to better security. These are personal posts that don’t represent Snowflake.