Felix Schuster
Confidential computing is a powerful technology. It can protect a large variety of workloads in a range of settings. However, it doesn't address all cybersecurity problems, and not every confidential-computing solution is a good fit for every setting.
In this post, I'll give a brief introduction to what confidential computing can do and discuss what I call the "three levels of confidential computing". If you're interested in learning more about the basics of confidential computing, we have a whitepaper for that.
Figure: The three levels of confidential computing. The green box symbolizes a confidential-computing environment (CCE), that is, a secure enclave or a confidential VM.
From a data security perspective, confidential computing can shield workloads against the infrastructure they are running on. It is an effective tool to keep infrastructure-based threats like malicious co-tenants, cloud admins, and datacenter employees out. (In essence, all the threats that prevent people from moving sensitive workloads to the cloud.) In contrast, it doesn't help at all with protecting your "front door". For example, if your app has a vulnerability in its login form, confidential computing won't help. An attacker can still exploit the vulnerability.
Ok, so do all confidential-computing solutions protect against all infrastructure-based threats? No 🙂 To illustrate what different solutions provide, I like to divide the space into three levels. Let's take a closer look...
The simplest thing confidential computing can do is to protect key management systems (KMS). This is what I call level 1. For example, by running HashiCorp Vault inside a confidential-computing environment (CCE, a secure enclave or a confidential VM), you can get strong protection for your cryptographic keys against the infrastructure-based threats mentioned above.
In many cases, this approach doesn't require changes to actual business logic. In the best case, you can "hot swap" the KMS while touching your running applications only minimally, or not at all.
On the flip side, protecting a KMS with confidential computing doesn't improve the security of your actual workload. Your data still flows around unprotected and doesn't benefit from confidential computing.
The next step is to actually protect data during processing. For this, your workload needs to run within a CCE. This is what I call level 2. The approach here is to protect containers, services, or apps individually. Say you'd like to protect a database. Running it in a CCE and ensuring data is securely stored on disk is all that is required to keep the aforementioned infrastructure-based threats out.
But what happens when you need to add something like a separate web frontend to your database? The first idea could be to just put the web frontend into another CCE. Would that solve the problem already? Not really. The web frontend would still need to make sure that the database is trustworthy before sharing data, and vice versa. They would need to use remote attestation with a suitable policy for that.
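To make the idea of "remote attestation with a suitable policy" more tangible, here is a minimal Python sketch. It is a simulation under stated assumptions, not real attestation: an HMAC key stands in for the hardware root of trust, a SHA-256 hash of a byte string stands in for the measurement of the code loaded into a CCE, and all names (`Report`, `attest`, `verify`, `mutual_attestation`) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Assumption: stands in for the hardware root of trust. In a real CCE, reports
# are signed with keys rooted in the CPU vendor, not with a shared secret.
HARDWARE_KEY = b"simulated-hardware-key"


@dataclass(frozen=True)
class Report:
    measurement: str  # hash of the code loaded into the CCE
    signature: str    # simulated hardware signature over the measurement


def measure(code: bytes) -> str:
    """Return the (simulated) measurement of a piece of code."""
    return hashlib.sha256(code).hexdigest()


def attest(code: bytes) -> Report:
    """Produce a simulated attestation report for the given code."""
    m = measure(code)
    sig = hmac.new(HARDWARE_KEY, m.encode(), hashlib.sha256).hexdigest()
    return Report(m, sig)


def verify(report: Report, allowed_measurements: set) -> bool:
    """Accept a report only if it is genuine and its measurement is in the policy."""
    expected = hmac.new(
        HARDWARE_KEY, report.measurement.encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, report.signature):
        return False
    return report.measurement in allowed_measurements


FRONTEND_CODE = b"web frontend v1"
DATABASE_CODE = b"database v1"

# Each side's policy lists the measurements it is willing to talk to.
frontend_policy = {measure(DATABASE_CODE)}
database_policy = {measure(FRONTEND_CODE)}


def mutual_attestation(frontend_code: bytes, database_code: bytes) -> bool:
    """Both services must verify each other before any data is shared."""
    frontend_trusts_db = verify(attest(database_code), frontend_policy)
    db_trusts_frontend = verify(attest(frontend_code), database_policy)
    return frontend_trusts_db and db_trusts_frontend


ok = mutual_attestation(FRONTEND_CODE, DATABASE_CODE)                # True
tampered = mutual_attestation(FRONTEND_CODE, b"database with backdoor")  # False
```

Note that every pair of services needs its own policy in this style. With two services that is manageable; the number of pairwise policies grows quickly as services are added.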
It becomes more complicated once we think about the users of the web frontend. They would certainly like to verify that the system as a whole (i.e., the web frontend and the database) is "confidential" before sharing their sensitive data. For this, they would at least need to verify the web frontend and its policy for verifying the database. And of course we need to consider software updates...
Already quite complicated. But it gets even more complicated once we add more services and an orchestration framework like Kubernetes with access to sensitive data and control over services. If done in an ad-hoc manner, things become almost impossible to verify and reason about. It's highly likely that there will be gaps in the protection. This is where "level 3" comes into play.
In level 3, all parts of a distributed system (e.g., the web frontend, the DB, the KMS, and the underlying Kubernetes) are protected using confidential computing in a structured way and are verified using remote attestation according to a given policy. The users of the distributed system don't need to verify each part individually. Instead, they only verify a central entity, which in turn verifies all the other parts.
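The following Python sketch illustrates this transitive pattern. It is a simplified model, not the implementation of any particular product. Attestation is again simulated with an HMAC key in place of a hardware root of trust. The `Coordinator` class, the manifest format, and the idea of binding the manifest hash into the coordinator's report are assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Assumption: stands in for the hardware root of trust of real CCEs.
HARDWARE_KEY = b"simulated-hardware-key"


def measure(code: bytes) -> str:
    return hashlib.sha256(code).hexdigest()


def sign(data: str) -> str:
    return hmac.new(HARDWARE_KEY, data.encode(), hashlib.sha256).hexdigest()


def attest(code: bytes, user_data: str = "") -> dict:
    """Simulated report binding a code measurement to optional user data."""
    m = measure(code)
    return {"measurement": m, "user_data": user_data,
            "signature": sign(f"{m}|{user_data}")}


def report_is_genuine(report: dict) -> bool:
    expected = sign(f"{report['measurement']}|{report['user_data']}")
    return hmac.compare_digest(expected, report["signature"])


class Coordinator:
    """Hypothetical central entity that admits services according to a manifest."""

    CODE = b"coordinator v1"

    def __init__(self, manifest: dict):
        self.manifest = manifest  # service name -> expected measurement
        self.members = set()

    def manifest_hash(self) -> str:
        encoded = json.dumps(self.manifest, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

    def admit(self, name: str, report: dict) -> bool:
        """Verify a service's report against the manifest before admitting it."""
        ok = (report_is_genuine(report)
              and self.manifest.get(name) == report["measurement"])
        if ok:
            self.members.add(name)
        return ok

    def get_report(self) -> dict:
        """The coordinator's own report, bound to the manifest it enforces."""
        return attest(self.CODE, user_data=self.manifest_hash())


def user_verifies(report: dict, expected_coordinator: str,
                  expected_manifest_hash: str) -> bool:
    """The user checks one report: the coordinator's code and its manifest."""
    return (report_is_genuine(report)
            and report["measurement"] == expected_coordinator
            and report["user_data"] == expected_manifest_hash)


manifest = {"frontend": measure(b"web frontend v1"),
            "database": measure(b"database v1")}
coordinator = Coordinator(manifest)

admitted = coordinator.admit("frontend", attest(b"web frontend v1"))    # True
rejected = coordinator.admit("database", attest(b"tampered database"))  # False

trusted = user_verifies(coordinator.get_report(),
                        measure(Coordinator.CODE),
                        coordinator.manifest_hash())                    # True
```

The user performs a single verification regardless of how many services the manifest lists. Trust in the individual services follows from trust in the coordinator's code and in the manifest it enforces.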
In a way, level 3 is about putting entire distributed systems into one giant virtual CCE. This giant virtual CCE is composed of possibly many physical CCEs and may shrink and grow throughout its lifecycle. What is important is that, from a user perspective, the physical CCEs are hidden and that there are no gaps.
I've been in confidential computing for 10 years now and have always focused on level 3. In fact, the very first confidential-computing system my former colleagues at Microsoft Research and I designed in 2013 was already a level 3 system. It was called VC3 for verifiable confidential cloud computing. (The corresponding paper has been cited >700 times since then.) It wrapped an entire MapReduce computation into one giant virtual CCE. To verify the integrity of the computation, the user only needed to verify a single node. Another instance of an early level 3 system was Opaque by Zheng et al., which provided end-to-end confidential Spark SQL.
Fast forward to 2023: at Edgeless Systems, our products focus on level 3 and specifically on Kubernetes. Our flagship product Constellation wraps an entire Kubernetes cluster into one giant virtual CCE. Everything is end-to-end protected, runtime encrypted, and verified. In classic level 3 manner, the user only needs to verify a single node to ensure the integrity of the cluster. This is done automatically by the user's CLI during the setup of the cluster. After setup, the user can essentially install any Kubernetes-ready application on Constellation, using standard tooling and without ever having to deal with anything specific to confidential computing. From what I can tell, Constellation is the most advanced level 3 solution out there, and I'd like to thank our amazing team at Edgeless Systems for bringing it to life.
Our product MarbleRun takes a similar but distinct approach. It also creates a giant virtual CCE around a distributed app, but keeps infrastructure components like Kubernetes outside. This comes with certain trade-offs, which I'll discuss in a separate post.
The most important takeaway: different confidential-computing solutions serve different purposes. A solution that is designed to protect your keys (level 1) will not provide you with runtime protection for your data. A solution that is designed to protect single containers, nodes, or apps (level 2) won't magically give you end-to-end protection in a setting where multiple of those are combined. For this, a level 3 solution that is designed from the ground up for this setting is required.
It's important to note that most solutions are not level 3. For example, confidential containers on AKS or confidential nodes on GKE are still level 2, which means that you don't get end-to-end confidentiality for your workloads unless you are an expert and go to great lengths.
If you're interested in trying a true level 3 solution, check out Constellation. It's open source, supports Azure, AWS, and GCP, and can be set up in minutes within your cloud subscription.
Until next time ✌️
Author: Felix Schuster