Running Marquez on AWS
This guide helps you deploy and manage Marquez on AWS EKS.
PREREQUISITES
AWS EKS Cluster
To create an AWS EKS cluster, please follow the steps outlined in the AWS EKS documentation.
CONNECT TO AWS EKS CLUSTER
Make sure you have configured your AWS CLI, then create or update the kubeconfig file for your cluster:
$ aws eks --region <AWS-REGION> update-kubeconfig --name <AWS-EKS-CLUSTER>
Verify that the context has been switched:
$ kubectl config current-context
arn:aws:eks:<AWS-REGION>:<AWS-ACCOUNT-ID>:cluster/<AWS-EKS-CLUSTER>
Using kubectl, verify that you can connect to your cluster:
$ kubectl get svc
NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   1m
Note: If you're having issues connecting to your cluster, please see Why can't I connect to my AWS EKS cluster?
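The context check above can also be scripted. The following is a minimal sketch, assuming the kubeconfig context is named after the cluster ARN as in the sample output; the function names eks_context_arn and check_eks_context are hypothetical helpers, not part of the AWS CLI or kubectl.

```shell
# Build the context name that `aws eks update-kubeconfig` is expected to create.
# Arguments: <region> <account-id> <cluster-name>
eks_context_arn() {
  printf 'arn:aws:eks:%s:%s:cluster/%s\n' "$1" "$2" "$3"
}

# Compare the active kubectl context with the expected ARN.
# Returns 0 on a match, 1 otherwise.
check_eks_context() {
  expected=$(eks_context_arn "$1" "$2" "$3")
  current=$(kubectl config current-context)
  if [ "$current" = "$expected" ]; then
    echo "OK: kubectl is using $current"
  else
    echo "Mismatch: expected $expected, got $current" >&2
    return 1
  fi
}

# Example (hypothetical values):
#   check_eks_context us-east-1 123456789012 my-cluster
```

If the kubeconfig entry was created with a custom alias, the context name will differ from the ARN and this check would need to be adapted.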
AWS RDS
Next, create an AWS RDS instance as outlined in the AWS RDS documentation. This database will be used to store dataset, job, and run metadata collected as OpenLineage events via the Marquez HTTP API.
CREATE AWS RDS DATABASE
- Navigate to the AWS RDS page and create a PostgreSQL database, leaving the database template as Production.
- Use marquez as the database identifier and set the master username to marquez.
- Choose a master password to use later in your Helm deployment (see password in values.yaml).
- Leave public access to the database off.
- Choose the same VPC where your AWS EKS cluster resides.
- In a separate tab, navigate to the AWS EKS cluster page and make note of the security group attached to your cluster.
- Navigate back to the AWS RDS page and, in the security group section, add the AWS EKS cluster's security group that you noted in the previous step.
- Next, under the Additional Configuration tab, enter marquez as the initial database name.
- Finally, select Create Database.
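For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. This is a sketch under stated assumptions, not a replacement for the console workflow: the instance class (db.m5.large) and storage size (100 GiB) are placeholder choices, the DB subnet group is assumed to already exist in the cluster's VPC, and the cluster security group reported by EKS is assumed to be the one the steps refer to. The function names are hypothetical, and the sketch does not reproduce the other defaults that the Production template applies.

```shell
# Look up the security group that EKS created for the cluster.
# Argument: <cluster-name>
eks_cluster_security_group() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' \
    --output text
}

# Create the PostgreSQL instance with the identifiers used in this guide.
# Arguments: <master-password> <security-group-id> <db-subnet-group-name>
create_marquez_rds() {
  aws rds create-db-instance \
    --db-instance-identifier marquez \
    --engine postgres \
    --db-instance-class db.m5.large \
    --allocated-storage 100 \
    --master-username marquez \
    --master-user-password "$1" \
    --db-name marquez \
    --no-publicly-accessible \
    --vpc-security-group-ids "$2" \
    --db-subnet-group-name "$3"
}

# Example (hypothetical values):
#   sg=$(eks_cluster_security_group my-cluster)
#   create_marquez_rds '<AWS-RDS-PASSWORD>' "$sg" my-db-subnet-group
```

Passing the password as an argument keeps the sketch short; in practice you may prefer to read it from a secret store so it does not end up in shell history.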
CONNECT TO AWS RDS DATABASE
Create a marquez namespace:
$ kubectl create namespace marquez
Next, run the following command with your AWS RDS host, username, and password:
$ kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' \
--namespace marquez \
--image docker.io/bitnami/postgresql:12-debian-10 \
--env="PGPASSWORD=<AWS-RDS-PASSWORD>" \
--command -- psql marquez --host <AWS-RDS-HOST> -U <AWS-RDS-USERNAME> -d marquez -p 5432
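The command above opens an interactive psql session. For a scripted connectivity check, a non-interactive variant might look like the sketch below. It assumes the same image and namespace as the command above; the pod name pgsql-smoke-test and the function name rds_smoke_test are hypothetical.

```shell
# Run a single query against the RDS instance from a throwaway pod.
# Arguments: <rds-host> <rds-username> <rds-password>
rds_smoke_test() {
  kubectl run pgsql-smoke-test --rm -i --restart='Never' \
    --namespace marquez \
    --image docker.io/bitnami/postgresql:12-debian-10 \
    --env="PGPASSWORD=$3" \
    --command -- psql --host "$1" -U "$2" -d marquez -p 5432 -c 'SELECT 1;'
}

# Example (placeholders as used elsewhere in this guide):
#   rds_smoke_test <AWS-RDS-HOST> <AWS-RDS-USERNAME> <AWS-RDS-PASSWORD>
```

If the query succeeds, the pod can reach the database through the shared security group; a timeout usually points to a VPC or security group mismatch.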
Deploy Marquez on AWS EKS
INSTALLING MARQUEZ
Get Marquez:
$ git clone git@github.com:MarquezProject/marquez.git && cd marquez/charts/marquez
Install Marquez:
$ helm upgrade --install marquez . \
--set marquez.db.host=<AWS-RDS-HOST> \
--set marquez.db.user=<AWS-RDS-USERNAME> \
--set marquez.db.password=<AWS-RDS-PASSWORD> \
--namespace marquez \
--atomic \
--wait
Note: To avoid overriding deployment settings via the command line, update the marquez.db section of the Marquez Helm chart's values.yaml to include the AWS RDS host, username, and password in your deployment.
Verify all the pods have come up correctly:
$ kubectl get pods --namespace marquez
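As an alternative to editing the chart's values.yaml directly, the database settings can be kept in a separate override file and passed to Helm. The sketch below assumes the key names implied by the --set flags in the install command (marquez.db.host, marquez.db.user, marquez.db.password); the function names and the file name in the example are hypothetical.

```shell
# Write a values override file with the RDS connection settings.
# Arguments: <output-file> <rds-host> <rds-username> <rds-password>
write_marquez_db_values() {
  cat > "$1" <<EOF
marquez:
  db:
    host: "$2"
    user: "$3"
    password: "$4"
EOF
}

# Install or upgrade the release using that file instead of --set flags.
# Run from the chart directory. Argument: <values-file>
install_marquez_with_values() {
  helm upgrade --install marquez . \
    --values "$1" \
    --namespace marquez \
    --atomic \
    --wait
}

# Example (hypothetical file name):
#   write_marquez_db_values aws-values.yaml <AWS-RDS-HOST> <AWS-RDS-USERNAME> <AWS-RDS-PASSWORD>
#   install_marquez_with_values aws-values.yaml
```

Because the file contains the database password, keep it out of version control. Passwords containing double quotes or backslashes would need YAML escaping, which this sketch does not handle.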
UNINSTALLING MARQUEZ
$ helm uninstall marquez --namespace marquez