
Solutions

Possible Issues for spinning up VPC

  • If you get an error about --create-bucket-configuration, edit line 116 in the file cloud-automation/gen3/bin/workon.sh and replace that line with the following:
    gen3_aws_run aws s3api create-bucket --acl private --bucket "$GEN3_S3_BUCKET" --region $(aws configure get $GEN3_PROFILE.region)

  • If the error says there is an issue with the S3 backend, do the following:

  • After running the gen3 workon command, run gen3 cd and then ls. You should see a file named backend.tfvars.

  • Open that file and add the following line (a filled-in example follows this list):
    profile = <>

  • Once done, re-run the gen3 workon command
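For example, using this document's demo profile name (an assumption; substitute your own AWS profile name), the added line might look like:
profile = "demo-gen3-aws"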

How to Update the AWS CLI and kubectl

  • Windows:

  • Open Command Prompt or PowerShell:

  • Press Win + R, type cmd or powershell, and hit Enter.

  • Update kubectl:

  • Run kubectl version --client to check the current version.

  • If outdated, download the latest version from the Kubernetes website.

  • Replace the old kubectl.exe with the new one in your PATH environment variable.

  • Update AWS CLI:

  • Run aws --version to check the current version.

  • If outdated, download the latest version from the AWS website.

  • Install it and update your PATH if necessary.

  • macOS:

  • Open Terminal:

  • Use Spotlight (Cmd + Space) and search for "Terminal" to open it.

  • Update kubectl:

  • Run kubectl version --client to check the current version.

  • Use Homebrew to update: brew upgrade kubectl or specify a version like brew install kubectl@.

  • Update AWS CLI:

  • Run aws --version to check the current version.

  • Use Homebrew to update: brew upgrade awscli.

  • Ubuntu (Linux):

  • Open Terminal:

  • Use Ctrl + Alt + T or search for "Terminal" in your applications.

  • Update kubectl:

  • Run kubectl version --client to check the current version.

  • Use curl or wget to download the latest version from the Kubernetes GitHub releases.

  • Install the downloaded binary and update your PATH if necessary.

  • Update AWS CLI:

  • Run aws --version to check the current version.

  • Use pip to update AWS CLI v1: sudo pip install --upgrade awscli (AWS CLI v2 is distributed as a bundled installer from the AWS website rather than via pip).
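As a minimal sketch for Ubuntu (assuming a 64-bit x86 machine and that you want the latest stable kubectl and AWS CLI v2), the update commands might look like this:

# Download and install the latest stable kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

# Download and install (or update) AWS CLI v2
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
sudo ./aws/install --update
aws --version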


Variables/Setup needed before starting the process

  1. An AWS account number
  2. An AWS IAM user named helm_gen3_user with administrator privileges. Create an access key (Access Key ID and Secret Access Key) for this user.
  3. A common one-word name for the VM, VPC, and EKS that can be used to name these AWS resources
  4. An AWS profile name that you want to use on your local machine to identify this AWS account (items 3 and 4 can use the same base name)
  5. A name for your Gen3 Data Commons, including any subdomain (e.g. abc.com, def.org)
  6. DNS setup to assign your website name (e.g. abc.com, def.org), which will be used to route to your Gen3 Data Commons
  7. An ACM certificate for the main domain decided in item 5
  8. Google OAuth Client ID and Client secret for Gen3 Login
  9. Indexd prefix for Gen3 data commons (You can get more information here)
  10. VPC CIDR block for your commons to live on
  11. Make sure your Terraform is on version 0.11.15 (you can use tfenv to install this version of Terraform: https://github.com/tfutils/tfenv)
  12. VPC ID and CIDR of a new or existing VPC in your AWS account that can be used to access Kubernetes
  13. IP addresses that are allowed to SSH into the VM


Note: 

  • For item 2, create a new IAM user named helm_gen3_user with administrator privileges in the new AWS account. Make sure the user can execute CLI commands. Once the user is created, create an access key from the Security credentials tab. (A CLI sketch follows these notes.)
  • For items 3 and 4, I will use demo-gen3-aws as the AWS profile name, gen3-admin as the VM name, demo-gen3 as the VPC name, and demo-gen3_eks as the EKS name.
  • For items 5 and 6, I will use demo-gen3.occ-pla.net as the domain for this demonstration.
  • For item 7, I would request a public certificate for *.occ-pla.net in AWS Certificate Manager in the new AWS account. An example value is arn:aws:acm:us-east-1:XXXXXXXXX:certificate/asdaf-asfafa-asfca-af-asfada-da-v
  • For item 8, Gen3 can use any public or private OAuth client to authenticate users; generally, we use Google. We need the Client ID and Client secret for OAuth login. Information on how to set this up can be found here.
  • For item 9, I will use dg.D3M0 as the data GUID prefix for this example.
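As a sketch of item 2 using the AWS CLI instead of the console (assuming you already have an admin profile configured locally; the profile name demo-gen3-aws is this document's example), the commands might look like:

# Create the IAM user and attach administrator privileges
aws iam create-user --user-name helm_gen3_user
aws iam attach-user-policy --user-name helm_gen3_user --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Create an access key; note the AccessKeyId and SecretAccessKey in the output
aws iam create-access-key --user-name helm_gen3_user

# Configure a local profile with those credentials
aws configure --profile demo-gen3-aws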


Objective

This document describes how to implement Gen3 on AWS using Gen3 Helm and is shared collaboratively with the krum.io team. The goal is to facilitate joint efforts between OCC and the Krum team. The process of spinning up Gen3 is structured into four distinct steps for clarity and effective collaboration.

The Gen3 product team has documented the steps at https://github.com/uc-cdis/cloud-automation/blob/master/doc/csoc-free-commons-steps.md; we will use that documentation as the base for this process.


Step 1: Set up VM allowing access to AWS Account and Gen3 cloud-automation (Presented by Krum.io)

This is a preliminary step that can also be thought of as a setup stage. It provides the essential configuration details for initializing Gen3.

Step 2: Spin up VPC (Presented by Krum.io)

Step 2 involves utilizing Terraform and Gen3 scripts to spin up a virtual private cloud and databases. This process enables the establishment of the necessary infrastructure for your Gen3 environment.

Step 3: Spin up EKS (Presented by Krum.io)

In this step, EKS and additional configurations are initiated, paving the way for the subsequent deployment of Gen3 microservices. This sets the foundation for effectively establishing and utilizing Gen3 microservices within the system.

Step 4: Helm charts to implement Gen3 (Presented by Open Commons Consortium)

The final step in the Gen3 setup involves deploying a set of fundamental Gen3 microservices using Helm charts. This step specifically addresses the configuration of Helm values and guides initiating a basic data commons, completing the overall Gen3 implementation process.


Step 3: Spin up EKS

In this step, we execute the commands to set up EKS for our Gen3 deployment, with EC2 nodes and other components, using the cloud-automation scripts and Gen3 commands. This step requires the following variables from the variable/setup information above:

  1. Name of AWS profile (defined in step 1 above)
  2. Name of VPC (Point 3 in Variables asked)
  3. VPC CIDR (Point 10 in Variables asked)
  4. VPC ID from the output of Step 2

Note: Make sure your Terraform is on version 0.11.15 to use this script. To create the EKS cluster and resources, I will perform the following steps with demo-gen3_eks as the EKS name and demo-gen3-aws as the AWS profile.

  • Initialize the base module that runs Terraform to spin up EKS using the gen3 workon command. Gen3's workon command sets up a Terraform workspace folder for a given AWS profile and infrastructure type (for example, commons EKS, VPN server, data bucket). You can use this command anytime to access the Terraform backend. Ensure the ~/.bashrc changes made in step 1 are present and that ~/.bashrc has been sourced. Generally, the command follows the syntax gen3 workon <> <>_eks, but for this instance I execute the following command:
gen3 workon demo-gen3-aws demo-gen3_eks

Note: make sure the Kubernetes cluster name ends with _eks when executing the workon command.

  • Once Terraform is successfully initialized, run the following command to move into the Terraform workspace for the EKS cluster:
gen3 cd

Running ls in that directory will show the list of EKS configuration files, including config.tfvars and others.

  • Open the config.tfvars file. The variable vpc_name and some others will already be populated with the values we need. Change the other variables as shown below (a sketch of the resulting file follows this list):
  • ec2_keyname: demo-gen3
  • peering_cidr (point 12 in Variables asked): 172.X.X.0/16
  • peering_vpc_id (point 12 in Variables asked): vpc-XXXXXXXXXX
  • sns_topic_arn: remove the line with this variable
  • cidrs_to_route_to_gw: [""]
  • eks_version: 1.29
  • jupyter_asg_desired_capacity: 1
  • jupyter_asg_max_size: 10
  • jupyter_asg_min_size: 1
  • workflow_asg_desired_capacity: 3
  • workflow_asg_max_size: 50
  • workflow_asg_min_size: 3
  • deploy_workflow: true
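For reference, the edited lines in config.tfvars might look roughly like the sketch below (the CIDR, VPC ID, and key name are placeholders; use your own values):

ec2_keyname = "demo-gen3"
peering_cidr = "172.X.X.0/16"
peering_vpc_id = "vpc-XXXXXXXXXX"
cidrs_to_route_to_gw = [""]
eks_version = "1.29"
jupyter_asg_desired_capacity = 1
jupyter_asg_max_size = 10
jupyter_asg_min_size = 1
workflow_asg_desired_capacity = 3
workflow_asg_max_size = 50
workflow_asg_min_size = 3
deploy_workflow = true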
  • Once the variables are set, use the following commands to plan and apply the Terraform configuration:
gen3 tfplan 

This command is equivalent to terraform plan, which validates the configuration and creates an execution plan so you can preview the changes Terraform will make to your infrastructure.

gen3 tfapply  

This command is equivalent to terraform apply, which executes the actions proposed in the Terraform plan to create the Gen3 infrastructure.

  • If you run into any errors, fine-tune the values in the config.tfvars file and re-run the gen3 tfplan and gen3 tfapply commands.

  • This Terraform script can take about 20-40 minutes to execute depending on the network and system. Once Terraform runs successfully, note down all the values under Outputs for future reference.

  • Once your EKS cluster is deployed successfully, run the following command to update the kubeconfig file for your cluster. This lets you manage the EKS cluster from your system. The command format is aws eks --region <> update-kubeconfig --name <>_eks --profile <>. For the EKS cluster spun up in this step, the command is

aws eks --region us-east-1 update-kubeconfig --name demo-gen3_eks --profile demo-gen3-aws
  • If the following error is observed, follow steps a.) through d.) below
Error: Error applying plan:

  

1 error occurred:

        * module.eks.null_resource.config_setup: Error running command 'bash test-validate_output_EKS/init_cluster.sh': exit status 1. Output: error: flag needs an argument: 'f' in -f

See 'kubectl apply --help' for usage.

error: error validating "test-validate_output_EKS/aws-auth-cm.yaml": error validating data: failed to download openapi: Get "https://CD857734E2F2BC3EB155D0F2C0F34E45.yl4.us-east-1.eks.amazonaws.com/openapi/v2?timeout=32s": getting credentials: decoding stdout: no kind "ExecCredential" is registered for version "client.authentication.k8s.io/v1alpha1" in scheme "pkg/client/auth/exec/exec.go:62"; if you choose to ignore these errors, turn validation off with --validate=false

a.) Update your kubeconfig as shown in the previous step:

aws eks update-kubeconfig --region <cluster-region> --name <cluster-name>

b.) Take the cluster ARN and use it as your context in the following command:

kubectl config set-context <cluster-arn>

c.) From your EKS Terraform config directory, run the following command:

kubectl apply -f <outputs-directory>/aws-auth-cm.yaml

d.) Validate that the nodes have joined the cluster:

kubectl get nodes

The output should look like this:

❯ kubectl get nodes

NAME                           STATUS   ROLES    AGE   VERSION

ip-10-12-114-33.ec2.internal   Ready    <none>   27m   v1.29.0-eks-5e0fdde

ip-10-12-148-93.ec2.internal   Ready    <none>   27m   v1.29.0-eks-5e0fdde



Step 4: Helm charts to implement Gen3

This step describes how to deploy the Gen3 microservices that run Gen3 services on the EKS cluster we created. This document uses scripts and steps from the gen3-helm Git repository. This step requires the following variables from the variable/setup information above:

  1. Name of the EKS cluster from the Outputs of Step 3

Each microservice within the Gen3 Helm charts has its own values.yaml file located in the gen3-helm/helm directory of the Git repository. A Helm deployment can either configure charts for individual microservices or use the master Gen3 chart. Customizing values in the values.yaml file lets you tailor microservice configurations, influencing how the Helm charts are deployed or updated.

While there are various methods to deploy Gen3 using Helm charts, this document focuses on presenting the comprehensive master Gen3 Helm chart that includes all necessary microservices, using AWS external secrets for databases and credentials.

To update the Gen3 chart, edit the values.yaml file at gen3-helm/helm/gen3. Customize values for specific microservices by naming them. Consult the README file in the chart folder to find chart configuration variables.

Gen3 Helm Prerequisites

  1. The first step for our helm installation of Gen3 is to prepare our deployment to authenticate appropriately with AWS Secrets Manager.

  2. If you already have a basic Helm installation on your computer, execute the external_secrets.sh script from the support-scripts-gen3 repository to install the external_values Helm chart. This will allow us to use external secrets in Helm charts. Below are the input variables required for running the script.

  • <>: AWS account number in which the commons would spin up
  • <>: AWS region in which EKS is spun up.
  • <>: Name for an IAM Policy
  • <>: Name of IAM user who would be using the above IAM policy to spin up helm charts, the name of secrets_user created in cloud-automation
  • <>: AWS profile from the local/VM which can access the AWS Account number given, if the profile value is default

Note (To-do): We require the Access Key ID and Secret Access Key for the IAM user mentioned in this step. These credentials will be used in the next step.

  3. Our infrastructure deployment from cloud-automation will have provisioned a "secrets_user" in IAM, which is used in the previous step. We want to provide security credentials for this user ahead of time to enable our deployment. This allows us to authenticate with Secrets Manager without maintaining AWS credentials in the values file.
kubectl create secret generic prelim-aws-config \
    --from-literal=access-key='<<aws-access-key-id>>' \
    --from-literal=secret-access-key='<<aws-secret-access-key>>'
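To confirm the secret exists (optional sanity check), you can run:
kubectl get secret prelim-aws-config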
  4. The next step is to configure our authentication service to use our desired Identity Provider (Google):

  a.) Visit and modify the fence-config-template.yaml

  b.) Replace <<your-website-URL>>

  c.) Replace <<your-google-client-id>>

  d.) Replace <<your-google-client-secret>>

  e.) Store the updated configuration in AWS Secrets Manager (see the CLI sketch after this list)

  f.) Name the secret <<your-vpc-name>>-fence-config
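One way to store the updated configuration (a sketch assuming the edited file is saved locally as fence-config-template.yaml and that your VPC name is demo-gen3, the example used in this document) is with the AWS CLI:

aws secretsmanager create-secret --name demo-gen3-fence-config --secret-string file://fence-config-template.yaml --region us-east-1 --profile demo-gen3-aws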

Gen3 Helm Installation

  1. The very first thing we must do is add or clone the Gen3 Helm repository. This is how we package up all the components that make up Gen3 and make them accessible to the public.
git clone https://github.com/occ-data/gen3-helm
cd gen3-helm/helm
  2. Once you have the repository cloned, you can install it with the command
helm upgrade --install dev ./gen3 -f /path/to/updated/values.yaml
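You can then confirm the release and watch the pods come up with standard Helm and kubectl commands, for example:

helm list
kubectl get pods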
  3. Enable access to our deployments over HTTPS by setting up a load balancer that references a valid, signed certificate for your domain. This can be done via two options:

a.) (CDTS Official) Gen3 facilitates this process via cloud-automation with the following command:

  gen3 kube-setup-ingress  
This command will perform a helm installation of the alb-ingress-controller after pre-provisioning IAM roles/policies for the ingress controller, ACL rules for a WAF, etc.  
  
Your deployment should now be available over HTTPS.  
  
Note:  We can perform this with more standard tooling such as cert-manager and an ingress controller with a more robust solution/community behind it. (Explained in step b)  
  
  
b.) (Alternative to option a) Install an ingress controller (NGINX) on your cluster to provision a load balancer. Automate certificate signing requests by installing cert-manager and preparing issuers for your desired Certificate Authority.

  • Install NGINX. This will provision a load balancer resource in your AWS account

helm upgrade --install ingress-nginx ingress-nginx \  
      --repo https://kubernetes.github.io/ingress-nginx \  
      --namespace ingress-nginx \  
      --create-namespace  
  • Install cert-manager. This will automate certificate signing requests
  helm repo add jetstack https://charts.jetstack.io \  
     --force-update  
  helm install   cert-manager jetstack/cert-manager \  
     --namespace cert-manager \  
     --create-namespace \  
     --version v1.14.4 \  
     --set installCRDs=true  
  • Create your issuers. For our example, we will use LetsEncrypt

  • Staging certificate (useful for debugging certificate signing requests)

kubectl apply -f - <<EOF

apiVersion: cert-manager.io/v1

kind: ClusterIssuer

metadata:

  name: letsencrypt-staging

spec:

  acme:

    email: email@example.com

    preferredChain: ''

    privateKeySecretRef:

      name: letsencrypt-staging

    server: https://acme-staging-v02.api.letsencrypt.org/directory

    solvers:

      - http01:

          ingress:

            class: nginx

EOF
  • Production certificate (use this once your staging certificate has been successfully signed)
kubectl apply -f - <<EOF

apiVersion: cert-manager.io/v1

kind: ClusterIssuer

metadata:

  name: letsencrypt-prod

spec:

  acme:

    email: email@example.com

    preferredChain: ''

    privateKeySecretRef:

      name: letsencrypt-prod

    server: https://acme-v02.api.letsencrypt.org/directory

    solvers:

      - http01:

          ingress:

            class: nginx

EOF
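Once applied, you can check that the issuers are registered and, later, watch the certificate status with something like:

kubectl get clusterissuers
kubectl get certificates -A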
  4. Running these commands allows the Helm charts to update the ingress configuration for revproxy via Helm chart values.

(Note: If you are following the example Helm charts provided in the demonstration, the configuration below is already provided in the chart.)

  • For example, in the values YAML file, Values.revproxy.ingress has the class name nginx with the following annotations:

  • cert-manager.io/cluster-issuer: "letsencrypt-prod"

This annotation specifies the cluster issuer we deployed previously.

  • acme.cert-manager.io/http01-edit-in-place: "true"

revproxy:

  enabled: true

  ingress:

    enabled: true

    className: "nginx"

    annotations:

      acme.cert-manager.io/http01-edit-in-place: "true"

      cert-manager.io/cluster-issuer: "letsencrypt-prod"

    hosts:

    - host: gen3-demo.cloud.krum.io

      paths:

        - path: /

          pathType: Prefix

    tls:

      - secretName: cm-gen3-certs-prod

        hosts:

          - gen3-demo.cloud.krum.io

  5. Upon finishing this configuration, you can proceed to deploy the Gen3 Helm charts. We offer a simplified Helm chart with configurations for running a basic version of Gen3. You can find it in gen3_values.yaml within the support-scripts-gen3 repository. Some variables in the chart are explained in the README file here.

Details about each microservice deployed in the sample Helm chart are provided after completing step 6. You can customize the Helm chart according to your specific requirements. For more information about available microservices and their configuration, refer to CONFIGURATION.md, which contains detailed instructions and a comprehensive range of configuration options for each service.

  6. After setting up your Helm environment, either by following the provided example or by using the values.yaml file in the gen3-helm directory on your system (located at demo-gen3/gen3-helm/helm/gen3/values.yaml for the Gen3 Helm code), you can execute the helm upgrade commands to initiate the Helm process.

Note: To deploy Gen3 Helm beyond this document, ensure that you have the necessary prerequisites installed and configured. The following section outlines a set of microservices and instructions for updating the chart to run a basic version of Gen3. For detailed configuration of any unlisted microservices, consult the configuration documentation for further insights. 

After making the required updates in the Helm charts or if you wish to apply overrides, you can achieve this by supplying the values.yaml file. (Note: Before executing this command, ensure you are in the demo-gen3/gen3-helm/helm/gen3/ folder. You can confirm your current directory using the pwd command.)

helm upgrade --install dev gen3/gen3 -f ./values.yaml  

If you want to provide overrides you can do so by passing in one, or several values.yaml files. For example, if you want to pass in user.yaml and fence-config (NB! New format, check out sample files in this folder)

helm upgrade --install dev gen3/gen3  -f values.yaml -f fence-config.yaml -f user.yaml

Note: Each time you modify the values.yaml file, you need to execute the helm upgrade command to ensure that the changes are applied to both the Helm charts and Kubernetes.

Before we examine the pre-configured file in detail, let's look at the list of Gen3 sub-charts (one per Gen3 microservice) that are used to set up a basic Gen3 data commons as outlined in this script.

  • global (this is a helm chart variable, not a gen3 microservice)
  • arborist
  • aws-es-proxy
  • etl
  • fence
  • guppy
  • indexd
  • peregrine
  • portal
  • revproxy
  • sheepdog

Global

Global configuration to execute helm charts is defined in the global section. This part includes variables needed to connect to various AWS resources like the Postgres database, AWS Certificate, Secrets Manager, etc. It also involves tasks such as fetching the data dictionary, defining GitOps processes, and hosting configurations.

The first variable in global is aws, which specifies the secret holding the AWS credentials. The credentials are stored in the Kubernetes secret prelim-aws-config:

aws:

   enabled: true

   useLocalSecret:

     enabled: true

     localSecretName: prelim-aws-config  

Note: All database credentials are stored in AWS Secrets Manager with the naming convention {{environment}}-{{service}}-creds. These secrets are fetched and stored in Kubernetes secrets because external secrets are enabled. As shown in the configuration below, Postgres database creation and external secrets deployment are enabled.

postgres:

  dbCreate: true

externalSecrets:

  deploy: true

The host configuration is also part of the global chart, 

  • Dev: true or false; determines whether in-cluster Postgres and Elasticsearch should be deployed (for a development environment).
  • Environment:  name of the VPC/environment
  • Hostname: URL of gen3 data commons without http:// or https:// (point 5 in Variables asked)
  • revproxyArn: ARN of the certificate for the domain/website, (point 7 in Variables asked)
  • dictionaryUrl: S3 URL from which data dictionary will be fetched
dev: false

environment: <<your-vpc-name>>

hostname: <<your-website-URL>>

revproxyArn: arn:aws:acm:us-east-1:XXXXXXXXXX:certificate/XXXXXXXXX # <<your-website-aws-certificate-arn>>

dictionaryUrl: https://s3.amazonaws.com/XXXXXXXX/XXXXX/XXXXXXXX/XXXX.json  # <<your-data-dictionary-URL>>

portalApp: gitops

publicDataSets: true

Alongside these, the following values are also required for the global Gen3 Helm chart; a description of these values is provided here

tierAccessLevel: libre

tierAccessLimit: "1000"

netPolicy: true

dispatcherJobNum: 10

ddEnabled: false

manifestGlobalExtraValues:

  fence_url: https://<<your-website-URL>>/user

Arborist

Arborist is a key part of the Gen3 stack, managing access control through policies, roles, and resources. It organizes resources hierarchically and grants permissions based on defined roles. In the Gen3 stack, Arborist works closely with Fence, which handles user authentication and issues JWT tokens containing authorization policies. Other services can check a user's authorization by sending the JWT to Arborist, ensuring secure access control without storing user credentials. Arborist is an open-source microservice and its Git code base can be found here

In Helm, the Arborist chart can be deployed either on its own or as part of the Gen3 Helm chart. The list of values for the Arborist Helm chart can be found here

To use Arborist in Helm, the basic variables are to enable the service and give database credentials. Since we have database secrets stored in AWS Secrets Manager, the only thing needed in our version of the helm chart is to enable arborist.

arborist:

  enabled: true

Aws-es-proxy

aws-es-proxy is a small web server application that sits between your HTTP client (browser, curl, etc.) and the Amazon Elasticsearch service. This service connects Elasticsearch to Gen3.

aws-es-proxy can be deployed as an individual chart or as part of the Gen3 Helm chart. In the example Helm chart, aws-es-proxy needs to be enabled and the Elasticsearch endpoint provided:

aws-es-proxy:

  enabled: true

  esEndpoint: <>

  externalSecrets:

    awsCreds: <>-aws-es-proxy-creds

ETL

The ETL sub-chart in Gen3 Helm contains the ETL configuration that maps tables and fields from the graph model into Elasticsearch, including value transformations. As shown below, the ETL chart sets esEndpoint to the local cluster service (localized because of aws-es-proxy) and etlMapping defines the mappings from the graph model and the value transformations (e.g. gender and race).

etl:

  enabled: true

  esEndpoint: elasticsearch.default.svc.cluster.local

  etlMapping:

    mappings:

      - name: dev_case

        doc_type: case

        type: aggregator

        root: case

        props:

          - name: submitter_id

          - name: project_id

        flatten_props:

          - path: demographics

            props:

              - name: gender

                value_mappings:

                  - female: F

                  - male: M

              - name: race

                value_mappings:

                  - american indian or alaskan native: Indian

              - name: ethnicity

              - name: year_of_birth

        aggregated_props:

          - name: _samples_count

            path: samples

            fn: count

Fence

Fence microservice provides an authentication and authorization framework for all Gen3 services & resources. The fence separates protected resources from the outside world, allowing only trusted entities to enter. The fence is an open-source Gen3 microservice whose git code base can be found here

This document focuses only on deploying the basic configuration of Fence. The Helm deployment configuration for Fence can be found in the gen3-helm GitHub repository; that link contains a table with details about all the variables available for further development.

In the basic configuration of the Fence Helm chart shown below, the image of the microservice is defined. The image needs to be updated periodically, as the microservices are constantly changing. Alongside the external secret defined for the Fence database (naming convention {{environment}}-{{service}}-creds) and the fence-config secret, other externalSecrets flags are set, such as:

  • createK8sFenceConfigSecret: this will create the Helm fence-config secret
  • createK8sJwtKeysSecret: this will create the Helm fence-jwt-keys secret
  • createK8sGoogleAppSecrets: this will create the Helm fence-google-app-creds-secret and fence-google-storage-creds-secret secrets
fence:

  enabled: true

  image:

    tag: 2024.04

  externalSecrets:

    fenceConfig: <<your-vpc-name>>-fence-config

    createK8sFenceConfigSecret: false

    createK8sJwtKeysSecret: true

    createK8sGoogleAppSecrets: true

  USER_YAML: |

User YAML Here

Guppy

Guppy is a server that supports GraphQL queries on data from Elasticsearch. Guppy can be set up either as a standalone Helm chart or as part of the gen3-helm setup. The basic script we're showcasing here outlines the Guppy configuration below. The esEndpoint for Elasticsearch is given alongside the indices; make sure the index names match the mapping names in etlMapping.

guppy:

  enabled: true

  image:

    tag: 2024.04 

  esEndpoint: http://elasticsearch.default.svc.cluster.local:9200

  indices:

  - index: dev_case

    type: case

Indexd

Indexd is a hash-based data indexing and tracking service providing 128-bit globally unique identifiers. It is designed to be accessed via a REST-like API or a client. It supports distributed resolution with a central resolver talking to other Indexd servers.

Indexd can be a standalone helm chart or part of Gen3 helm configuration. In the provided script, createK8sServiceCredsSecret is given to create the indexd-service-creds secret. This service also uses external secret defined for db with the naming convention of  {{environment}}-{{service}}-creds.

indexd:

  enabled: true

  image:

    tag: 2024.04 

  externalSecrets:

    createK8sServiceCredsSecret: true

  defaultPrefix: <<your-index-prefix>>

  

Peregrine

Peregrine provides a query interface to get insights into data in Gen3 Commons. It mostly works with the Postgres database to perform queries. This microservice can be implemented as a stand-alone helm chart or part of the Gen3 Helm chart. The code snippet below shows how peregrine is included in the helm chart used for demonstration today. This script has arboristUrl which allows the peregrine to talk to the arborist.

peregrine:

  enabled: true

  image:

    tag: 2024.04

  arboristUrl: http://arborist-service

Portal

The Portal microservice is responsible for spinning up the UI front end of the website and also supports basic interaction with Gen3. You can customize the logo, the colors, which elements are shown in the UI, and more using the gitops configuration in the Portal microservice. The Portal can be an individual Helm chart or part of the Gen3 Helm chart. The code snippet below shows the configuration for the Portal. In gitops, you can give the base64-encoded content for the favicon image <> and logo <> for the Gen3 Commons. The json portion consists of the gitops JSON, which can have the format described in the link here

portal:

  enabled: true

  image:

    tag: 2023.04   

  gitops:

    favicon: <<base-64-of-favicon-image>>

    logo: <<base-64-of-logo-image>>

    json: | 
JSON HERE

Revproxy

Revproxy is a core service of a commons that handles networking within the Kubernetes cluster. It is an NGINX container that stores cluster endpoint data and manages the cluster endpoints; the endpoints must be configured to enable traffic routing and normal operation. Revproxy can be an independent Helm deployment or part of the Gen3 Helm chart, as shown below in the snippet for the demonstration Helm chart.

The configuration here has the nginx className, multiple annotations, hosts, and a tls configuration for the certificate used to serve the Gen3 Commons over HTTPS through the load balancer.


revproxy:

  enabled: true

  ingress:

    enabled: true

    className: "nginx"

    annotations:

      acme.cert-manager.io/http01-edit-in-place: "true"

      cert-manager.io/cluster-issuer: "letsencrypt-prod"

    hosts:

    - host: <<your-website-URL>>

      paths:

        - path: /

          pathType: Prefix

    tls:

      - secretName: cm-gen3-certs-prod        # pragma: allowlist secret

        hosts:

          - <<your-website-URL>>

Sheepdog

Sheepdog is a core service that handles data submission. Data is submitted to the commons using the dictionary as a schema, and it is reflected within the Sheepdog database. Sheepdog has a fenceUrl configuration to connect to Fence and check user permissions for data submission, and it has volumeMounts for volumes mounted into the container, which take their configuration from the wsgi.py file.

  

sheepdog:

  enabled: true

  image:

    tag: 2024.04 

  fenceUrl: https://<<your-website-URL>>/user

  volumeMounts:

  - name: "config-volume"

    readOnly: true

    mountPath: "/var/www/sheepdog/settings.py"

    subPath: "wsgi.py"


Step 1: Set up VM allowing access to AWS Account and Gen3 cloud-automation

Before we set up Gen3, a number of permissions need to be configured for access to AWS from the local or virtual system that will be used to spin up, monitor, and maintain all the resources. We create a VM that will be responsible for running the operations for Gen3.

OCC and the Krumware team have created a Git repository of supporting scripts that are useful for spinning up Gen3 commons. Clone the support-scripts-gen3 Git repository into the newly created folder on your local machine. Make sure you are inside the demo-gen3 folder before executing the command.

git clone --quiet https://github.com/occ-data/support-scripts-gen3.git
cd support-scripts-gen3/gen3-admin-vm

After you've cloned the Git repository, go into the gen3-admin-vm folder. There, you'll find the Terraform module to deploy an adminVM in your AWS account. Make sure to follow these steps to spin up the VM using Terraform.

  1. Setting up variables in the config.tf file: you need to set the following variables (example values follow this list)

  2. prefix = “the name of your VM” (VM name from Point 3 in Variables asked)

  3. ssh_allowed_ips = [“list of IP addresses allowed to SSH in the VM”]  (Point 13 in Variables asked)

  4. aws_cli_profile = “name of the aws profile on your local machine” (Point 4 in Variables asked, demo-gen3-aws for this example)

  5. aws_region = “aws region in which you want your VM in”

  6. Once the variables are configured, you can run the Terraform scripts to implement the VM. 

  7. The first command, run from the root directory of gen3-admin-vm, initializes Terraform with the required module configurations.

terraform init  
  8. Once Terraform is initialized, run the plan command to determine the status of the current resources and the list of new resources to be spun up.
terraform plan -out “plan” 
  9. Check the output of the Terraform plan on your screen, and if everything looks okay, apply the plan to spin up the resources for the VM
terraform apply plan
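As an example, the variables in config.tf might be set to the following (a sketch using this document's demo names; the IP address and region are placeholders you must replace):

prefix = "gen3-admin"
ssh_allowed_ips = ["203.0.113.10/32"]
aws_cli_profile = "demo-gen3-aws"
aws_region = "us-east-1"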

The Terraform script spins up a VPC for the adminVM, with an Ubuntu EC2 instance (configurable with options) and IAM roles, along with SSH keys for the ssh_allowed_ips IP addresses. Terraform also executes a provisioner script that installs the necessary packages and clones the Git repositories required for running Gen3 using Helm.

Note: For this demonstration, we will be using a fork of gen3-helm and cloud-automation created by OCC and the Krumware team. Once the changes made for this are merged into the master, we will update this documentation accordingly.

The Terraform output is stored in the outputs folder on your local machine in a text file (named <>-info.txt) with information about the VPC that will be used in further steps to connect the VM to Gen3. The value of VM Public IP in the text file is the IP address of your VM. The folder also contains the SSH keys for the VM.

Once Terraform has run successfully, run the following command from your local machine to connect to the VM (assuming you are in the support-scripts-gen3 folder):

ssh -o IdentitiesOnly=yes -i outputs/<<prefix>>.pem ubuntu@<<VM Public IP>>  

where outputs/<<prefix>>.pem is the location of the SSH key in the outputs folder, and <<VM Public IP>> is the IP address in <<prefix>>-info.txt.
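If SSH rejects the key because its permissions are too open, tighten them first:

chmod 400 outputs/<<prefix>>.pem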


Step 2: Spin up VPC 

In this step, we execute the commands to set up the VPC and squid instances for our Gen3 deployment, with EC2 as the main component and other required AWS resources, using the cloud-automation scripts and Gen3 commands. This step requires the following variables from the variable/setup information above:

  1. Name of AWS profile (defined in step 1 above, Point 4 in Variables asked)
  2. Name of VPC (Point 3 in Variables asked)
  3. ACM certificate ARN (Point 7 in Variables asked)
  4. OAuth Client ID and Client Secret (Point 8 in Variables asked)
  5. SSH key to access Kubernetes (Created in Step 1 above)
  6. VPC CIDR (Point 10 in Variables asked)
  7. ID and CIDR of another VPC that can be used to access the system (Point 12 in Variables asked)
  8. Name of Indexd prefix for Gen3 data commons (Point 9 in Variables asked)
  9. Name of the Gen3 Data commons (Point 5 in Variables asked)

Note: Make sure your Terraform is on version 0.11.15 to use this script. To create the VPC and resources, I will perform the following steps with demo-gen3 as the VPC name and demo-gen3-aws as the AWS profile.

  • Initialize the base module that runs Terraform to spin up the VPC using the gen3 workon command. Gen3's workon command sets up a Terraform workspace folder for a given AWS profile and infrastructure type (for example, commons VPC, VPN server, data bucket). Ensure the ~/.bashrc changes made in step 1 are present and that ~/.bashrc has been sourced. You can use this command anytime to access the Terraform backend. Generally, the command follows the syntax gen3 workon <> <>, but for this instance I execute the following command:
gen3 workon demo-gen3-aws demo-gen3

Once Terraform initializes successfully, you will see a message confirming that Terraform has been initialized.

Note: Please refer to the Solutions section if you have trouble getting a successful output

  • Run the following commands to access the Terraform files of the VPC workspace:
gen3 cd

& then

ls

The ls command will show the list of VPC configuration files, including config.tfvars and others.

  • Open config.tfvars and change the following variables. Each variable is explained here:
  • vpc_name (point 3 in Variables asked): demo-gen3
  • vpc_cidr_block (point 10 in Variables asked): 172.168.X.X/16
  • aws_cert_name (point 7 in Variables asked): arn:aws:acm:us-east-1:XXXXXXXXX:certificate/asdaf-asfafa-asfca-af-asfada-da-v
  • csoc_managed: false
  • csoc_account_id: remove the line with this variable
  • peering_cidr (point 12 in Variables asked): 172.168.X.0/16
  • peering_vpc_id (point 12 in Variables asked): vpc-XXXXXXXXXX
  • squid-nlb-endpointservice-name: remove the line with this variable
  • deploy_single_proxy: false
  • indexd_prefix (point 9 in Variables asked): dg.XXXX
  • hostname (point 5 in Variables asked): demo-gen3.occ-pla.net
  • kube_ssh_key (the SSH key created in step 1): ssh-rsa AJCI629-23rncbhzyca0irl23riaisk7301bhxtakmcirhkngwGHIUbj7ulIHugjpq3i973gqu hbfkncc
  • google_client_id: XXXXXXXX.apps.googleusercontent.com
  • google_client_secret: Hklihg-tkjkj-468bl-76
  • config_folder: demo-gen3
  • fence_engine_version: 13.7
  • sheepdog_engine_version: 13.7
  • indexd_engine_version: 13.7
  • gdcapi_indexd_password, db_password_fence, db_password_gdcapi, db_password_peregrine, db_password_sheepdog, db_password_indexd, hmac_encryption_key: if empty, create a random 32-character string for each of these, for example using cat /dev/urandom | base64 | head -c 32 (make sure the string has no special characters; a sketch follows this list)
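Since base64 output can include special characters (+, /, =), a sketch for generating a 32-character alphanumeric string instead is:

tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32; echo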
  • Once the variables are set, use the following commands to plan and apply the Terraform configuration:
gen3 tfplan

This command is equivalent to terraform plan, which validates the configuration and creates an execution plan so you can preview the changes Terraform will make to your infrastructure.

gen3 tfapply  

This command is equivalent to terraform apply, which executes the actions proposed in the Terraform plan to create the Gen3 infrastructure.

  • If you run into any errors, fine-tune the values in the config.tfvars file and re-run the gen3 tfplan and gen3 tfapply commands.

  • This Terraform script can take about 20-40 minutes to execute depending on the network and system. Once Terraform runs successfully, note down all the values under Outputs for future reference.


Assumption

Assuming you have some basic knowledge and tools ready:

  • AWS Basics: You know about basic AWS services like VPC (Virtual Private Cloud), EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), IAM (Identity and Access Management), and EKS (Elastic Kubernetes Service).
  • AWS CLI: You're familiar with the AWS Command Line Interface (CLI) and how to use it to interact with AWS services from your terminal. Make sure you have an up-to-date version of the AWS CLI.
  • Terraform: You understand how to use Terraform, a tool for automating infrastructure provisioning, to manage AWS resources. You should have Terraform version 0.11.15 
  • AWS Account: You have an AWS account with admin permissions to create and manage resources.
  • Helm: You have Helm installed on your system and know the basics of using it for deploying and managing applications on Kubernetes.
  • Kubernetes Knowledge: You know how to work with Kubernetes, including deploying applications and troubleshooting issues. Make sure you have an updated version of kubectl
  • Git Basics: You are familiar with basic Git commands for version control.
  • Python Installed: Your system has Python version 3.8 or later installed.
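A quick way to confirm these prerequisites before you start is to check the installed versions:

aws --version
terraform version      # should report 0.11.15
kubectl version --client
helm version
python3 --version      # should be 3.8 or later
git --version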

Given these assumptions, you're ready to work with Gen3 in a development or production environment. This foundation will help you deploy, manage, and troubleshoot applications effectively across Gen3 environments.
