Tutorial: “Encrypted LocalAI” and how to deploy your own LLM confidentially

What is LocalAI

LocalAI is a popular open source OpenAI alternative, compatible with OpenAI API specifications. Notably, it supports different types of open source AI models like Llama and Mistral, without the need for a GPU. It can be used for developing new AI-enabled applications using the OpenAI programming libraries or extending virtually any app which already integrates with it.

To accommodate production-scale applications, LocalAI can be deployed using scale-out replicas orchestrated through Docker or Kubernetes.

What is confidential computing?

Confidential computing is a technology that encrypts data in use, guarding against infrastructure threats in cloud applications. Together with encryption at rest and in transit it ensures data confidentiality, even against privileged individuals, and most importantly you can verify this remotely, for enhanced security assurance. For a deeper dive into confidential computing, read our whitepaper.

The concept of confidential AI - challenge and use cases

What is confidential AI? Simply put, it's the concept of employing confidential computing technology for verifiable protection of data throughout the AI lifecycle, including when the data and models are in use. Explore more in our blog post on "How confidential computing and AI fit together".

So, how can we make LocalAI confidential? Solution: Constellation

Constellation is a Kubernetes distribution that is designed for confidential computing. Any application that runs inside a Constellation cluster is runtime-encrypted and shielded from the infrastructure.

With Constellation, you can run LocalAI on the public cloud with the assurance that your inference data stays always encrypted and is inaccessible by the cloud provider or any attackers or third parties coming through the infrastructure.

This approach enables working with LocalAI or similar language model-based chatbot, ensuring data privacy and security, and facilitating the implementation of very large language models that would surpass the capacity of a typical local setup.

Result

Leveraging confidential computing with LocalAI on Constellation enables running in scalable, cost-efficient cloud environments with minimal maintenance, ensuring data privacy, and empowering organizations with efficient and secure language model processing tools.

Tutorial: how to deploy LocalAI on Constellation

LocalAI can be installed inside Kubernetes with a Helm chart.

You will require:

SSD storage class, or disable mmap to load the whole model in memory

Prerequisites and overview

In order to run LocalAI on Constellation, you will need:

A subscription and credits with one of the supported cloud providers (GCP, AWS, Azure)
A domain registrar to set up a domain name for your cluster
kubectl and helm installed on your machine

The process is composed of three key steps:

Setting up Constellation
Deploying infrastructure helm chart
Deploying LocalAI helm chart

For the sake of clarity, we have written the instructions below as someone using Azure with a GoDaddy registrar, however, this tutorial can be completed with any of the major cloud providers and a registrar of your choice. If you choose a different registrar, you will have to adapt the external-dns helm chart accordingly.

Set up Constellation

First, download and install the Constellation CLI.

Next, create the Constellation cluster. The process is described in detail in the Constellation docs.

constellation config generate azure

constellation iam create azure --region=westus --resourceGroup=constellTest --servicePrincipal=spTest --update-config

constellation create -y

constellation apply

export KUBECONFIG="$PWD/constellation-admin.conf"

You can now use the kubeconfig to query the cluster, e.g. with kubectl. The config ensures that the connection is confidential and terminates inside the correct cluster.

Deploy LocalAI

In the case of our example setup (Azure with GoDaddy) we've provided a Helm chart that installs and configures external-dns and ingress-nginx in the freshly created cluster.

To use the helm chart, you need to make a couple basic edits after cloning the repo:

Replace the values in .env with your GoDaddy API credentials and a fitting owner ID. The owner ID is used by external-dns to differentiate the DNS entries from different clusters at your DNS provider (GoDaddy). So you should use a different value for every cluster you use.
Replace your.domain oflocalAI.testing.your.domain in values.yaml with a domain you own. You can also change the subdomain if you want.
Create a file godaddy_creds with your godaddy API credentials:
apikey=
secretkey=

With your credentials in place, you can go ahead and run the necessary helm commands.

Run:

./install --install-infra -ns localai --hostname localai.your.domain

Installation complete

Your confidential LocalAI setup is now in place. When the process has completed you can go to your.domain and start firing queries to your 100% encrypted LocalAI.

“Encrypted LocalAI” and how to deploy your own LLM confidentially