Merge branch 'master' into feat/config-update-script
ajoaugustine authored Jul 17, 2024
2 parents 4a9bcbd + d4e2651 commit 756f75e
Showing 15 changed files with 649 additions and 42 deletions.
52 changes: 52 additions & 0 deletions doc/dbbackup.md
@@ -0,0 +1,52 @@
# TL;DR

This script facilitates the management of database backup and restore within the Gen3 environment. It can establish policies, service accounts, roles, and S3 buckets. Depending on the command provided, it can initiate a database dump, perform a restore, migrate databases to a new RDS instance on Aurora, or clone databases to an RDS Aurora instance.

## Usage

```sh
gen3 dbbackup [dump|restore|va-dump|create-sa|migrate-to-aurora|copy-to-aurora]
```

### Commands

#### dump

Initiates a database dump and pushes it to an S3 bucket, creating the essential AWS resources if they are absent. The dump operation is intended to be executed from the namespace/commons that requires the backup.

```sh
gen3 dbbackup dump
```
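
The dump runs as a Kubernetes job (`psql-db-prep-dump` in the script shipped with this commit), so its progress can usually be followed with the `gen3 job` helper:

```sh
gen3 job logs psql-db-prep-dump
```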

#### restore

Initiates a database restore from an S3 bucket, creating the essential AWS resources if they are absent. The restore operation is meant to be executed in the target namespace where the backup needs to be restored.

```sh
gen3 dbbackup restore
```
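
A typical end-to-end flow pairs the two commands across namespaces (a sketch; switch your kubectl context/namespace between steps as appropriate for your environment):

```sh
# in the namespace/commons being backed up
gen3 dbbackup dump

# later, in the target namespace receiving the data
gen3 dbbackup restore
```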

#### create-sa

Creates the necessary service account and roles for DB copy.

```sh
gen3 dbbackup create-sa
```
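
A quick way to verify the result (the resource names are taken from the script added later in this commit):

```sh
kubectl get serviceaccount psql-db-copy-sa
kubectl get clusterrole psql-db-copy-role
```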

#### migrate-to-aurora

Triggers a service account creation and a job to migrate a Gen3 commons to an AWS RDS Aurora instance.

```sh
gen3 dbbackup migrate-to-aurora
```

#### copy-to-aurora

Triggers a service account creation and a job that copies the Indexd, Sheepdog, and Metadata databases from another namespace (`<source-namespace>`) in the same RDS Aurora cluster into new databases for the current namespace.

```sh
gen3 dbbackup copy-to-aurora <source-namespace>
```
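
For example, to copy the databases from a hypothetical `staging` namespace into new databases for the current namespace:

```sh
gen3 dbbackup copy-to-aurora staging
```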

2 changes: 1 addition & 1 deletion files/scripts/ecr-access-job.md
@@ -59,7 +59,7 @@ Trust policy (allows Acct2):
}
```

- Policy in the account (Acct2) that contains the DynamoDB table (created automatically by `kube-setup-ecr-access-job.sh`):
- Policy in the account (Acct2) that contains the DynamoDB table (created automatically by `kube-setup-ecr-access-cronjob.sh`):
```
{
"Version": "2012-10-17",
1 change: 1 addition & 0 deletions files/squid_whitelist/web_whitelist
@@ -14,6 +14,7 @@ clinicaltrials.gov
charts.bitnami.com
ctds-planx.atlassian.net
data.cityofchicago.org
data.stage.qdr.org
dataguids.org
api.login.yahoo.com
apt.kubernetes.io
102 changes: 81 additions & 21 deletions gen3/bin/dbbackup.sh
@@ -1,26 +1,28 @@
#!/bin/bash

####################################################################################################
# Script: dbdump.sh
# Script: dbbackup.sh
#
# Description:
# This script facilitates the management of database backups within the gen3 environment. It is
# equipped to establish policies, service accounts, roles, and S3 buckets. Depending on the
# command provided, it will either initiate a database dump or perform a restore.
# equipped to establish policies, service accounts, roles, and S3 buckets. Depending on the
# command provided, it will either initiate a database dump, perform a restore, migrate to Aurora,
# or copy to Aurora.
#
# Usage:
# gen3 dbbackup [dump|restore]
# gen3 dbbackup [dump|restore|va-dump|create-sa|migrate-to-aurora|copy-to-aurora <source_namespace>]
#
# dump - Initiates a database dump, creating the essential AWS resources if they are absent.
# The dump operation is intended to be executed from the namespace/commons that requires
# the backup.
# restore - Initiates a database restore, creating the essential AWS resources if they are absent.
# The restore operation is meant to be executed in the target namespace, where the backup
# needs to be restored.
# dump - Initiates a database dump, creating the essential AWS resources if they are absent.
# The dump operation is intended to be executed from the namespace/commons that requires
# the backup.
# restore - Initiates a database restore, creating the essential AWS resources if they are absent.
# The restore operation is meant to be executed in the target namespace, where the backup
# needs to be restored.
# va-dump - Runs a va-testing DB dump.
# create-sa - Creates the necessary service account and roles for DB copy.
# migrate-to-aurora - Triggers a service account creation and a job to migrate a Gen3 commons to an AWS RDS Aurora instance.
# copy-to-aurora - Triggers a service account creation and a job to copy the databases Indexd, Sheepdog & Metadata to new databases within an RDS Aurora cluster.
#
# Notes:
# This script extensively utilizes the AWS CLI and the gen3 CLI. Proper functioning demands a
# configured gen3 environment and the availability of the necessary CLI tools.
#
####################################################################################################

@@ -49,7 +51,6 @@ gen3_log_info "namespace: $namespace"
gen3_log_info "sa_name: $sa_name"
gen3_log_info "bucket_name: $bucket_name"


# Create an S3 access policy if it doesn't exist
create_policy() {
# Check if policy exists
Expand Down Expand Up @@ -87,7 +88,6 @@ EOM
fi
}


# Create or update the Service Account and its corresponding IAM Role
create_service_account_and_role() {
cluster_arn=$(kubectl config current-context)
@@ -101,7 +101,6 @@ create_service_account_and_role() {
gen3_log_info "oidc_url: $oidc_url"
gen3_log_info "role_name: $role_name"


cat > ${trust_policy} <<EOF
{
"Version": "2012-10-17",
@@ -161,13 +160,11 @@ create_s3_bucket() {
fi
}


# Function to trigger the database backup job
db_dump() {
gen3 job run psql-db-prep-dump
}


# Function to trigger the database backup restore job
db_restore() {
gen3 job run psql-db-prep-restore
@@ -177,8 +174,55 @@ va_testing_db_dump() {
gen3 job run psql-db-dump-va-testing
}

# Function to create the psql-db-copy service account and roles
create_db_copy_service_account() {
  cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: psql-db-copy-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psql-db-copy-role
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psql-db-copy-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psql-db-copy-role
subjects:
  - kind: ServiceAccount
    name: psql-db-copy-sa
    namespace: ${namespace}
EOF
}

# Function to run the Aurora migration job
migrate_to_aurora() {
  create_db_copy_service_account
  sleep 30
  gen3 job run psql-db-aurora-migration
}

# Function to run the Aurora copy job
copy_to_aurora() {
  create_db_copy_service_account
  sleep 30
  gen3 job run psql-db-copy-aurora SOURCE_NAMESPACE "$1"
}

# main function to determine whether dump or restore
# main function to determine whether dump, restore, or create service account
main() {
case "$1" in
dump)
@@ -202,11 +246,27 @@ main() {
create_s3_bucket
va_testing_db_dump
;;
create-sa)
gen3_log_info "Creating service account for DB copy..."
create_db_copy_service_account
;;
migrate-to-aurora)
gen3_log_info "Migrating Gen3 commons to Aurora..."
migrate_to_aurora
;;
copy-to-aurora)
if [ -z "$2" ]; then
echo "Usage: $0 copy-to-aurora <source_namespace>"
exit 1
fi
gen3_log_info "Copying databases within Aurora..."
copy_to_aurora "$2"
;;
*)
echo "Invalid command. Usage: gen3 dbbackup [dump|restore|va-dump]"
echo "Invalid command. Usage: gen3 dbbackup [dump|restore|va-dump|create-sa|migrate-to-aurora|copy-to-aurora <source_namespace>]"
return 1
;;
esac
}

main "$1"
main "$@"
12 changes: 12 additions & 0 deletions gen3/bin/kube-setup-argo.sh
@@ -204,6 +204,18 @@ EOF
aws iam put-role-policy --role-name ${roleName} --policy-name ${internalBucketPolicy} --policy-document file://$internalBucketPolicyFile || true
fi

# Create a secret for the slack webhook
alarm_webhook=$(g3kubectl get cm global -o yaml | yq .data.slack_alarm_webhook | tr -d '"')

if [ -z "$alarm_webhook" ]; then
gen3_log_err "Please set a slack_alarm_webhook in the 'global' configmap. This is needed to alert for failed workflows."
exit 1
fi

g3kubectl -n argo delete secret slack-webhook-secret
g3kubectl -n argo create secret generic "slack-webhook-secret" --from-literal=SLACK_WEBHOOK_URL=$alarm_webhook
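# Example (placeholder values, not part of this commit): the slack_alarm_webhook key read above
# can be set in the 'global' configmap ahead of time with plain kubectl, e.g.:
#   g3kubectl patch configmap global --type merge \
#     -p '{"data":{"slack_alarm_webhook":"https://hooks.slack.com/services/REPLACE_ME"}}'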


## if new bucket then do the following
# Get the aws keys from secret
# Create and attach lifecycle policy
22 changes: 21 additions & 1 deletion gen3/bin/kube-setup-hatchery.sh
@@ -175,6 +175,8 @@ $assumeImageBuilderRolePolicyBlock
"Action": [
"batch:DescribeComputeEnvironments",
"batch:CreateComputeEnvironment",
"batch:UpdateComputeEnvironment",
"batch:ListJobs",
"batch:CreateJobQueue",
"batch:TagResource",
"iam:ListPolicies",
@@ -197,10 +199,28 @@ $assumeImageBuilderRolePolicyBlock
"iam:CreateInstanceProfile",
"iam:AddRoleToInstanceProfile",
"iam:PassRole",
"s3:CreateBucket"
"kms:CreateKey",
"kms:CreateAlias",
"kms:DescribeKey",
"kms:TagResource",
"s3:CreateBucket",
"s3:PutEncryptionConfiguration",
"s3:PutBucketPolicy",
"s3:PutLifecycleConfiguration"
],
"Resource": "*"
},
{
"Sid": "CreateSlrForNextflowBatchWorkspaces",
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam::*:role/aws-service-role/batch.amazonaws.com/*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "batch.amazonaws.com"
}
}
},
{
"Sid": "PassRoleForNextflowBatchWorkspaces",
"Effect": "Allow",
2 changes: 1 addition & 1 deletion kube/services/argo-events/workflows/configmap.yaml
@@ -84,7 +84,7 @@ data:
purpose: workflow
limits:
resources:
cpu: 2000
cpu: 4000
providerRef:
name: workflow-WORKFLOW_NAME
# Kill nodes after 30 days to ensure they stay up to date
@@ -25,7 +25,7 @@ spec:
valueFrom:
configMapKeyRef:
name: global
key: slack_webhook
key: slack_alarm_webhook

command: ["/bin/bash"]
args:
14 changes: 14 additions & 0 deletions kube/services/argo/values.yaml
@@ -61,6 +61,20 @@ controller:
  workflowDefaults:
    spec:
      archiveLogs: true
      onExit: alert-on-timeout
      templates:
        - name: alert-on-timeout
          script:
            image: quay.io/cdis/amazonlinux-debug:master
            command: [sh]
            envFrom:
              - secretRef:
                  name: slack-webhook-secret
            source: |
              failure_reason=$(echo {{workflow.failures}} | jq 'any(.[]; .message == "Step exceeded its deadline")' )
              if [ "$failure_reason" = "true" ]; then
                curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"ALERT: Workflow {{workflow.name}} has been killed due to timeout\"}" "$SLACK_WEBHOOK_URL"
              fi
  # -- [Node selector]
  nodeSelector: