Watch Security Data Lakes Branch Out in 2022

Omer Singer
5 min read · Jan 17, 2022

Having transformed threat detection and response, security data lakes will branch out this year to the rest of the enterprise security program.

About last year’s prediction

Last January I wrote that 2021 would be the year that security data lakes go mainstream. Two trends were colliding in a big way: enterprises getting comfortable with modern cloud data platforms like Snowflake, and modular detection and response solutions hitting the market. Together these would enable broad security data lake adoption among cybersecurity teams eager for better visibility and automation.

Did it happen? In terms of both adoption and investment, security data lakes went mainstream in a big way. For example:

All of these examples involve threat detection and response happening on a security data lake. For the first time, this approach is succeeding for security teams outside the financial and technology megacorps.

It started with open XDR

For many security teams tasked with protecting enterprises with a substantial cloud presence, SIEM has become a four-letter word. Ballooning costs limit visibility, weak analytics limit automation, and dashboards designed for operational monitoring hold back data-driven strategies.

So it’s no surprise that modular threat detection and response solutions running on the customer’s security data lake, referred to as open XDR, were a hit in 2021. A recent analyst survey found detection and response to be by far the top use case for SIEM. Hence the shift from standalone SIEM to a highly automated detection and response solution on the existing data cloud deployment.

More use cases will follow

Modern cloud data platforms play a variety of roles across the enterprise. That will also be the case when they’re used as a security data lake. So from threat detection and response, watch for security data lake implementations to branch out to a multitude of security use cases.

Based on what some early adopters are doing, I expect that security data lakes will increasingly support posture management, asset inventory, identity and access, appsec, and more. Far beyond what SIEM covers, the typical security data lake will drive all of these use cases from a single source of truth in the cloud data platform.

This will be a big shift. Most security data lake implementations, even those that are a few years old, are dedicated to threat detection and response. What will drive the multi-purpose trend in 2022?

  • Existing data: As open XDR solutions collect more logs to the data platform, security teams will find that they have all or almost all the data they need to take an analytics-based approach to other use cases.
  • Awareness: When a GRC analyst hears from her friend in SecOps about all the success they’re having with the data platform, she will ask herself why the same approach wouldn’t work for SoD validation or SOX automation.
  • Relationships: The security data lake model is bringing together cybersecurity and data analytics teams, and these new relationships will enable data science and BI projects for cyber defense.
  • Platform capabilities and content: The “cloud wars” are raging and new features, reference architectures and native integrations will find their way to security orgs. Snowflake’s recent support for running Python, for example, is relevant for security use cases that need to go beyond SQL.
  • Ecosystem expansion: The new generation of cybersecurity companies delivers solutions with an understanding that customers want flexibility, not more silos.
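To make the GRC example above concrete, here is a minimal sketch of what an SoD (segregation of duties) check could look like once entitlement data already sits in the data platform. Everything in it is an assumption for illustration: the user names, the role names, the conflicting pairs and the `find_sod_violations` helper are hypothetical, and the rows are hard-coded where a real implementation would query a table.

```python
from collections import defaultdict

# Hypothetical entitlement rows (user, role), standing in for an identity
# table that would normally be queried from the security data lake.
ENTITLEMENTS = [
    ("alice", "create_vendor"),
    ("alice", "approve_payment"),
    ("bob", "create_vendor"),
    ("carol", "approve_payment"),
    ("carol", "post_journal_entry"),
]

# Hypothetical pairs of roles that one person should not hold together.
CONFLICTING_PAIRS = [
    ("create_vendor", "approve_payment"),
    ("post_journal_entry", "approve_journal_entry"),
]


def find_sod_violations(entitlements, conflicting_pairs):
    """Return (user, role_a, role_b) for every user holding a conflicting pair."""
    roles_by_user = defaultdict(set)
    for user, role in entitlements:
        roles_by_user[user].add(role)

    violations = []
    for user in sorted(roles_by_user):
        for role_a, role_b in conflicting_pairs:
            if role_a in roles_by_user[user] and role_b in roles_by_user[user]:
                violations.append((user, role_a, role_b))
    return violations


violations = find_sod_violations(ENTITLEMENTS, CONFLICTING_PAIRS)
for user, role_a, role_b in violations:
    print(f"{user} holds both {role_a} and {role_b}")
```

The same logic could be written as a SQL self-join. Python is used here only because the platform bullet mentions it as an option for use cases that go beyond SQL.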

What this means for the industry

A world where terabytes of relevant datasets are readily available for a POC is an exciting one for cybersecurity vendors. Imagine a trial where an algorithm is given access to the prospect’s last two years of activity logs, inventory and configurations. The power of working within a data cloud instead of over dozens of APIs should not be underestimated. A solution with the right data model could show value in minutes instead of weeks. That level of visibility means expert systems can automate tasks that are solely manual today.

Challenges will still exist, of course. There is a growing need for standardization that will make it easier for multiple solutions to make sense of security data loaded in a warehouse by a third party. Over time, innovative vendors will find ways to delight customers with insights across data collected by (not just from!) a variety of solutions. These providers may prove to be the growth leaders in the otherwise crowded cybersecurity industry.

What this means for security teams

The trend towards security data lakes supporting multiple use cases will give a welcome boost to security teams. CISOs have no shortage of projects planned for 2022 and beyond. This emerging model means not starting from scratch on each project, with existing datasets and visualizations increasingly reused across initiatives.

Expect faster progress and better insights in key areas such as IaaS permission rightsizing and supply chain security, each of which relies on multi-purpose datasets such as asset inventory, user 360 and activity logs. Also expect to see increased interaction between the cyber and data orgs as security teams adopt concepts such as star schemas and materialized views.
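As a rough illustration of how permission rightsizing can reuse datasets that are already in the lake, the sketch below compares what each principal has been granted with what the activity logs show it actually used in a lookback window. The principals, the sample rows, the 90-day window and the `unused_permissions` helper are all assumptions made for this example, not a reference implementation.

```python
from datetime import date, timedelta

# Hypothetical granted permissions per principal, as an inventory or
# identity dataset in the lake might describe them.
GRANTED = {
    "svc-backup": {"s3:GetObject", "s3:PutObject", "s3:DeleteObject", "iam:PassRole"},
    "svc-report": {"s3:GetObject", "athena:StartQueryExecution"},
}

# Hypothetical activity log rows: (principal, action, day the action was seen).
ACTIVITY = [
    ("svc-backup", "s3:GetObject", date(2022, 1, 10)),
    ("svc-backup", "s3:PutObject", date(2022, 1, 11)),
    ("svc-backup", "s3:DeleteObject", date(2021, 6, 1)),  # outside the window
    ("svc-report", "s3:GetObject", date(2022, 1, 12)),
    ("svc-report", "athena:StartQueryExecution", date(2022, 1, 12)),
]


def unused_permissions(granted, activity, as_of, lookback_days=90):
    """Map each principal to the granted permissions it did not use in the window."""
    cutoff = as_of - timedelta(days=lookback_days)

    # Collect the actions each principal performed inside the lookback window.
    used = {}
    for principal, action, day in activity:
        if cutoff <= day <= as_of:
            used.setdefault(principal, set()).add(action)

    # Keep only principals that have at least one unused permission.
    result = {}
    for principal, permissions in granted.items():
        unused = permissions - used.get(principal, set())
        if unused:
            result[principal] = sorted(unused)
    return result


candidates = unused_permissions(GRANTED, ACTIVITY, as_of=date(2022, 1, 17))
for principal, permissions in candidates.items():
    print(principal, "could drop:", ", ".join(permissions))
```

The point of the sketch is the data reuse: neither input is collected specifically for rightsizing, since both would already exist for detection and response.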

This opportunity will be limited to security teams that have their own security data lake. Any backend dedicated to a particular solution won’t support a data reuse strategy. With many security teams considering a new architecture in 2022, this is worth taking into account. This year, the most successful security organizations won’t be asking whether to use a security data lake, but how many projects and workloads can run on it.



I believe that better data is the key to better security. These are personal posts that don’t represent Snowflake.