Edge Computing Brings Real-Time Alerts to Security Data Lakes

If you’re like me, you’ve heard some buzz about a concept called “edge computing” but are skeptical about another fluffy term in cybersecurity. Is this a trend worth our attention?

Read on for an application of edge computing that is addressing one of the key concerns for security data lakes.

The dark side of fast alerting on limited volumes

Machine Data Growth 50x Business Data

Security teams that budgeted for last year’s log volumes find that they can’t come close to covering what their systems now generate. Like a couple on a queen-size bed sharing a twin-size blanket, not everyone can be covered and some things will be exposed.

In recent conversations with security organizations, I’ve heard stories of teams having to choose the one key source that they’ll get to collect while the rest of their log data is left behind. Some log sources, such as network flow data, can be super valuable in the event of a breach but would hog the whole license, so they’re not analyzed.

While security teams are adapting to this situation by shifting analytics from disk-based solutions like Splunk and Elasticsearch to cloud storage-based solutions like Snowflake, they have run into a trade-off when it comes to alert latency.

Volume vs. Latency Trade-off

Security data lakes such as those built on the Snowflake platform support streaming data but were not designed for real-time analytics. Streaming to the data platform also tends to be batched to improve price-performance. As a result, there is usually a delay of several minutes before the data is available, and then a wait of several more minutes before the scheduled alert query runs.
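To make the two delays concrete, here is a rough back-of-the-envelope sketch. The interval values and the names PipelineTiming, worst_case_alert_latency and average_alert_latency are illustrative assumptions, not settings or APIs of any particular platform.

```python
from dataclasses import dataclass


@dataclass
class PipelineTiming:
    """Hypothetical timing parameters, all in seconds."""
    load_batch_interval: float  # how often batched data lands in the data lake
    query_interval: float       # how often the scheduled alert query runs
    query_runtime: float = 0.0  # how long the alert query itself takes


def worst_case_alert_latency(t: PipelineTiming) -> float:
    # An event that just misses a load batch waits a full load interval,
    # then can just miss a query run and wait a full query interval.
    return t.load_batch_interval + t.query_interval + t.query_runtime


def average_alert_latency(t: PipelineTiming) -> float:
    # Assumes events arrive uniformly over time and the two schedules are
    # independent, so each wait averages half of its interval.
    return t.load_batch_interval / 2 + t.query_interval / 2 + t.query_runtime


# Assumed example: 5-minute load batches, 5-minute query schedule, 20 s query.
timing = PipelineTiming(load_batch_interval=300, query_interval=300, query_runtime=20)
print("worst case (s):", worst_case_alert_latency(timing))
print("average (s):", average_alert_latency(timing))
```

Under these assumed numbers, shortening either interval lowers latency but usually raises cost, which is the trade-off described above.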

Forced to choose between volume and latency, security teams usually prefer to alert a little later rather than being blind to an entire data source. This is the right choice from a security perspective but is also an opportunity for innovation.

Alerting at the source

This is where innovation is happening in the form of anomaly detection with distributed technologies like federated learning. According to Hacker Noon,

In the traditional AI methods sensitive user data are sent to the servers where models are trained…

In contrast to the traditional AI methods, Federated Learning brings the models to the data source or client device for training and inferencing. The local copies of the model on the device eliminate network latencies and costs incurred due to continuously sharing data with the server.

Both real-time security alerts and operational monitoring can be enabled with this approach. By using “fast” edge computing technology to complement “deep” security data lake analytics, the volume vs. latency trade-off can be eliminated.
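For readers who want a feel for the mechanics, below is a minimal sketch of federated averaging, one common federated learning scheme. It is not how any specific vendor implements it. The tiny linear model, the learning rate, and the function names local_update, federated_average and federated_round are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Weights = List[float]
Sample = Tuple[List[float], float]  # (features, target)


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train a copy of the model on one device; raw data never leaves it."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Plain stochastic gradient step for squared error.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine device models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_round(global_w: Weights, devices: List[List[Sample]]) -> Weights:
    # Only model weights and sample counts travel to the server.
    updates = [(local_update(global_w, d), len(d)) for d in devices]
    return federated_average(updates)


# Two hypothetical devices whose local data both follow y = 2x.
devices = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0)],
]
model = [0.0]
for _ in range(10):
    model = federated_round(model, devices)
print(model)
```

The point of the design is that each device shares a handful of numbers per round instead of its log stream.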

Edge computing in action

In one deployment, a customer facing these volume limits rolled out Edge Delta agents to its servers, where logs and metrics would be pre-processed using federated learning and rule logic. Only when an anomaly or threat was detected would the relevant server’s logs be sent to the SIEM. This meant faster alerting and a significant drop in collected volumes.

Where data volume limits had caused blind spots, now all devices could be monitored.

What about the rest of the events, those not tied to anomaly detections? All event logs would be shipped in parallel to the customer’s security data lake. There they could be cost-effectively analyzed for compliance, threat hunting and incident response.
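The routing pattern described above can be sketched in a few lines. This is a simplified illustration and not Edge Delta’s agent or API: the EdgeAgent class, its parameters, the rolling z-score check that stands in for a learned model, and the keyword rule are all assumptions.

```python
import statistics
from collections import deque
from typing import Callable, List, Tuple


class EdgeAgent:
    """Hypothetical agent: everything goes to the lake, anomalies go to the SIEM."""

    def __init__(self, to_siem: Callable[[List[str]], None],
                 to_data_lake: Callable[[str], None],
                 window: int = 50, z_threshold: float = 3.0,
                 min_history: int = 10,
                 rule_keywords: Tuple[str, ...] = ()):
        self.to_siem = to_siem
        self.to_data_lake = to_data_lake
        self.z_threshold = z_threshold
        self.min_history = min_history
        self.rule_keywords = rule_keywords
        self.values = deque(maxlen=window)       # recent normal metric values
        self.recent_logs = deque(maxlen=window)  # context to ship on detection

    def is_anomalous(self, value: float) -> bool:
        # Rolling z-score as a stand-in for a trained anomaly model.
        if len(self.values) < self.min_history:
            return False
        mean = statistics.fmean(self.values)
        stdev = statistics.pstdev(self.values)
        if stdev == 0:
            return value != mean
        return abs(value - mean) / stdev > self.z_threshold

    def process(self, log_line: str, metric: float) -> bool:
        self.to_data_lake(log_line)  # every event is shipped to the data lake
        self.recent_logs.append(log_line)
        rule_hit = any(k in log_line for k in self.rule_keywords)
        detected = rule_hit or self.is_anomalous(metric)
        if detected:
            # Only on detection do this server's recent logs reach the SIEM.
            self.to_siem(list(self.recent_logs))
        else:
            # Keep anomalous values out of the baseline.
            self.values.append(metric)
        return detected


siem, lake = [], []
agent = EdgeAgent(to_siem=siem.append, to_data_lake=lake.append)
for i in range(20):
    agent.process(f"login ok #{i}", metric=10.0 + (i % 3))
agent.process("burst of requests", metric=500.0)
print(len(lake), "events in the lake;", len(siem), "batch sent to the SIEM")
```

In this sketch the SIEM receives one small batch of context around the detection, while the data lake keeps the full history for later hunting and compliance work.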

Security data lake ecosystem is evolving rapidly

I believe that better data is the key to better security. These are personal posts that don’t represent Snowflake.