Better Security Metrics with SLAs

Omer Singer
7 min read · Apr 21, 2020

The most powerful tool for creating actionable security metrics is the SLA. Unfortunately, most vendors don’t provide an SLA status feature. As a result, security teams fail to align cross-organizational efforts and continue to manually review risk findings.

Let’s change that! With live vendor data accessible on Snowflake Data Exchange, you can quickly create a layer of SLA insights on top of your security reports. Improved clarity and security posture are bound to follow.

Certain cloud configurations need urgent attention

Why SLAs provide clarity

Cloud infrastructure changes quickly and is inevitably affected by configuration drift. Misconfigurations put the company at risk, so security teams use a variety of solutions to identify issues like public S3 buckets, open security groups, and missing patches. These findings are typically spread across multiple vendors and might number in the hundreds or thousands.

The security team is responsible not just for identifying these issues but also for prioritizing them: deciding which ones should be addressed first. This triage process is time-consuming and repetitive. How can data-driven automation help?

If all of the company’s security findings are available in one data platform, they can be analyzed by logic that encodes the security considerations of the subject matter experts. An SLA, in this case, is a SQL definition of what the security team sees as most risky and urgent to fix. It also represents the amount of time that the organization considers acceptable for a given issue to persist in the environment.

For example, consider an issue that should have been fixed within 30 days but has been around for 72 days and counting. If you had a list of findings that reflected security expectations then everyone could get aligned on risk management status. From executives to IT admins, it becomes clear where preventative efforts are succeeding or failing. This clarity is super constructive yet absent in most security programs.

Now is the time

We’re all stuck at home and many of us are being told to make do with the solutions we’ve already purchased. So there’s never been a better time to squeeze more value from your vendor data. In the absence of in-person meetings, tools that improve alignment are especially valuable. By the time the office reopens, you might find you don’t need as many meetings.

Also, Snowflake’s Data Exchange team is reaching out to vendors on behalf of customers and helping them to list their data on Snowflake’s marketplace. If you have a vendor whose findings you want in your SLA calculations, please reach out to me or your Snowflake rep.

Getting the data

In my previous post on self-service compliance dashboards, I showed how Lacework customers can request their compliance findings via Snowflake Data Exchange. Risk findings from other vendors including Vulcan Cyber and Obsidian Security are also available at no extra charge.

Lacework customers can request their data via Snowflake Data Exchange

Once a Data Exchange connection is established, historical and new compliance findings are automatically available within your Snowflake account.

Setting Expectations

If you are not yet collaborating with your data team, this project can be a start. Your security engineers do not need to become SQL experts. To collaborate with your company’s data team, security engineers just need to specify their expectations in plain English.

For example, let’s start with an SLA formula that keeps things simple and relies only on Lacework data to determine severity and remediation expectations. You can define that critical violation types must be addressed within 60 days, high within 74 days, medium within 90 days, and everything else within 180 days. Baby steps.

Building the analytics

At this point, your friendly data team (the folks that use Snowflake every day) can take over. They’ll want to start with the full set of reports being shared by Lacework:

These reports are shared as a single VARIANT column, meaning they contain a JSON object with all the findings of each report. Snowflake makes it easy to extract these findings from the reports using the FLATTEN function:
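For readers who think in Python rather than SQL, the effect of FLATTEN can be sketched as below. This is only an illustration of the row-per-finding expansion, not the query from the post. The report layout and the field names (`report_time`, `account`, `findings`, `rec_id`, `status`) are assumptions made up for the example, not the actual Lacework schema.

```python
import json

# Two hypothetical reports, each a JSON document holding a list of findings.
# The field names are illustrative and not the real Lacework schema.
raw_reports = [
    '{"report_time": "2020-04-01", "account": "prod", "findings": ['
    '{"rec_id": "S3_LOGGING", "status": "NonCompliant"},'
    '{"rec_id": "MFA_ROOT", "status": "Compliant"}]}',
    '{"report_time": "2020-04-02", "account": "prod", "findings": ['
    '{"rec_id": "S3_LOGGING", "status": "NonCompliant"}]}',
]


def flatten_reports(reports):
    """Yield one flat row per finding, carrying report-level fields along."""
    for raw in reports:
        report = json.loads(raw)
        for finding in report["findings"]:
            yield {
                "report_time": report["report_time"],
                "account": report["account"],
                **finding,
            }


rows = list(flatten_reports(raw_reports))
print(len(raw_reports), "reports ->", len(rows), "findings")
```

The row count grows because every report contributes one row per finding, which is the same jump in result size that a flattening query produces.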

Notice how we went from 2,772 results (the number of reports) to 357,356 results (the number of findings across all reports). However, many of these findings are actually “negative” results from a check that ran and found no problems. So let’s create a new view that will return only results where problems were identified.
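The filtering step is a simple predicate over the flattened rows. Here is a minimal Python sketch of the idea, assuming each finding carries a `status` field and that `NonCompliant` marks a detected problem. Both are assumptions for illustration; the real field and its values depend on the vendor data.

```python
# Hypothetical flattened findings; "status" and its values are assumed.
findings = [
    {"rec_id": "S3_LOGGING", "resource": "bucket-a", "status": "NonCompliant"},
    {"rec_id": "S3_LOGGING", "resource": "bucket-b", "status": "Compliant"},
    {"rec_id": "SG_OPEN_22", "resource": "sg-123", "status": "NonCompliant"},
]

# Statuses that mean the check found a problem (assumed value).
PROBLEM_STATUSES = {"NonCompliant"}


def only_violations(rows):
    """Keep only the rows where the check found a problem."""
    return [row for row in rows if row["status"] in PROBLEM_STATUSES]


violations = only_violations(findings)
print(f"{len(violations)} of {len(findings)} findings are violations")
```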

Now that we have a complete set of cases where problems were found in our cloud configuration, it’s time to add a key ingredient that is missing in the raw data. For each type of issue (for example, an AWS S3 bucket where logging is not enabled) currently affecting our environment, we need to determine how long this has been happening.

SQL calculating the age for each finding
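One way to reason about the age calculation, sketched in Python rather than SQL: group violations by check and resource, take the first report that flagged each one, and keep only those still present in the most recent report. The row layout and identifiers are hypothetical. The sketch also makes a simplifying assumption that a finding has been continuously open since it was first seen, which may not match how the original view handles issues that were fixed and later recurred.

```python
from datetime import date

# Hypothetical violation rows: one per (report date, check, resource).
violations = [
    {"report_date": date(2020, 2, 1), "rec_id": "S3_LOGGING", "resource": "bucket-a"},
    {"report_date": date(2020, 3, 1), "rec_id": "S3_LOGGING", "resource": "bucket-a"},
    {"report_date": date(2020, 4, 1), "rec_id": "S3_LOGGING", "resource": "bucket-a"},
    {"report_date": date(2020, 3, 1), "rec_id": "SG_OPEN_22", "resource": "sg-123"},
    {"report_date": date(2020, 4, 1), "rec_id": "SG_OPEN_22", "resource": "sg-123"},
    {"report_date": date(2020, 2, 1), "rec_id": "MFA_ROOT", "resource": "root"},
]


def finding_ages(rows, as_of):
    """Return {(rec_id, resource): age in days} for findings still active.

    A finding counts as active if it appears in the most recent report;
    its age runs from the first report that flagged it to `as_of`.
    """
    latest = max(row["report_date"] for row in rows)
    first_seen, last_seen = {}, {}
    for row in rows:
        key = (row["rec_id"], row["resource"])
        seen = row["report_date"]
        first_seen[key] = min(first_seen.get(key, seen), seen)
        last_seen[key] = max(last_seen.get(key, seen), seen)
    return {
        key: (as_of - first_seen[key]).days
        for key in first_seen
        if last_seen[key] == latest
    }


ages = finding_ages(violations, as_of=date(2020, 4, 21))
for key, age in sorted(ages.items()):
    print(key, age, "days")
```

The `MFA_ROOT` row drops out because it is absent from the latest report, so it is treated as remediated.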

At this point we have a SQL view that we can query directly or in a BI tool that will reflect the age of each finding. However, we’re interested in automating as much of the analytics as possible. Let’s automatically calculate how much time each finding has before it becomes an SLA violation.

Defining SLA logic as code

Remember that the security team does not need to be proficient in SQL to get started with security analytics. Your strategy should be to establish communication and collaboration with the data analytics organization.

In this case, the SLA logic has been previously defined as an acceptable number of days depending on the finding’s Lacework severity. SLA logic can be more nuanced and can factor in additional data sets but for this example we’re keeping it simple.

In Snowflake, the security team’s requirements are encoded as shown in the screenshot below.

SLA definition and resulting days allowed for each finding type
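The same rule set can be written as a small lookup, shown here as a Python sketch of the logic rather than the Snowflake view itself. The day counts come from the plain-English expectations stated earlier; the severity labels as lowercase strings and the function name are assumptions for illustration.

```python
# Days allowed per severity, taken from the plain-English expectations.
SLA_DAYS = {"critical": 60, "high": 74, "medium": 90}
DEFAULT_SLA_DAYS = 180  # "everything else"


def sla_days_allowed(severity):
    """Return the number of days a finding of this severity may stay open."""
    return SLA_DAYS.get(severity.strip().lower(), DEFAULT_SLA_DAYS)


for severity in ("Critical", "High", "Medium", "Low", "Info"):
    print(f"{severity:<8} -> {sla_days_allowed(severity)} days")
```

Keeping the thresholds in one table means the security team can change a number without touching the rest of the analytics.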

Applying SLAs to active findings

Now that we have a SQL view that returns the age of each active finding, and another view that returns the expected SLA for each finding type, we can bring the two pieces together. We measure current status by comparing the actual results to the expected values.

The important concept here is “Days Remaining”. This value should guide remediation teams to fix those issues that are the worst offenders, while trying to prevent SLA breaches from happening by remediating those where time is almost running out.

SLA health represented by days remaining to SLA breach
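As a rough sketch of the comparison in Python (not the SQL behind the view), days remaining is the SLA allowance minus the finding's age, and a negative value means the SLA is already breached. The `ActiveFinding` class, the sample findings, and their ages are hypothetical; the day counts are the ones the security team specified.

```python
from dataclasses import dataclass


@dataclass
class ActiveFinding:
    rec_id: str      # hypothetical check identifier
    severity: str
    age_days: int    # days since the finding was first seen


SLA_DAYS = {"critical": 60, "high": 74, "medium": 90}
DEFAULT_SLA_DAYS = 180


def days_remaining(finding):
    """Days left before SLA breach; negative once the SLA is breached."""
    allowed = SLA_DAYS.get(finding.severity.lower(), DEFAULT_SLA_DAYS)
    return allowed - finding.age_days


findings = [
    ActiveFinding("S3_LOGGING", "medium", 80),
    ActiveFinding("SG_OPEN_22", "critical", 72),
    ActiveFinding("MFA_ROOT", "high", 10),
]

# Worst offenders first: breached findings, then those closest to breach.
sla_health = sorted(findings, key=days_remaining)
for f in sla_health:
    remaining = days_remaining(f)
    state = "BREACHED" if remaining < 0 else "within SLA"
    print(f"{f.rec_id:<12} {remaining:>4} days remaining ({state})")
```

Sorting by days remaining puts breached findings at the top and near-breaches right behind them, which matches the remediation order described above.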

Visualizing SLA health

The “sla_health” SQL view will return a list of each compliance violation and how many days remain within its SLA for remediation. In order to make these results easily digestible for all stakeholders, we’ll push them into a shiny visualization.

The dashboards below are built in Sigma Computing, a cloud-native solution comparable to BI tools like Tableau and Power BI. Building these dashboards took less than an hour using the SQL views from Snowflake, and they look great. They can summarize the remediation performance for one cloud account or a hundred, and your CISO can access them on her iPhone.

Live link (try it on mobile too): https://app.sigmacomputing.com/embed/1H08dE7OyBl68HXXJvet2F

Outcome

Reporting SLA status for security findings can provide a number of benefits for your security program, including:

  • Fewer issues to investigate: many findings will be remediated within SLA.
  • Clarity on persistent problems: these might warrant a dedicated meeting or should be suppressed at the source.
  • More flexibility for audit teams: SLA status can determine which subset of findings must be reported as violations.
  • Better visibility for leadership: a single visualization can illustrate whether findings are being remediated on time.

Lacework customers can get their data in Snowflake through the Data Exchange at no extra cost. Hopefully we’ll see more security teams taking advantage of SLAs and posting about their experience so that our whole community benefits from this powerful tool.

Omer Singer

I believe that better data is the key to better security. These are personal posts that don’t represent Snowflake.