
Commit

Merge branch 'master' of github.com:uc-cdis/cloud-automation into feat/ecr-access-job
paulineribeyre committed Feb 21, 2024
2 parents 637c39e + ff88b7b commit 1babb53
Showing 48 changed files with 892 additions and 71 deletions.
2 changes: 1 addition & 1 deletion Docker/jenkins/Jenkins-CI-Worker/Dockerfile
@@ -1,4 +1,4 @@
-FROM jenkins/inbound-agent:jdk11
+FROM jenkins/inbound-agent:jdk21

USER root

2 changes: 1 addition & 1 deletion Docker/jenkins/Jenkins-Worker/Dockerfile
@@ -1,4 +1,4 @@
-FROM jenkins/inbound-agent:jdk11
+FROM jenkins/inbound-agent:jdk21

USER root

2 changes: 1 addition & 1 deletion Docker/jenkins/Jenkins/Dockerfile
@@ -1,4 +1,4 @@
-FROM jenkins/jenkins:2.415-jdk11
+FROM jenkins/jenkins:2.426.3-lts-jdk21

USER root

2 changes: 1 addition & 1 deletion Docker/jenkins/Jenkins2/Dockerfile
@@ -1,4 +1,4 @@
-FROM jenkins/jenkins:2.415-jdk11
+FROM jenkins/jenkins:2.426.3-lts-jdk21

USER root

68 changes: 68 additions & 0 deletions doc/s3-to-google-replication.md
@@ -0,0 +1,68 @@
# S3 to Google Cloud Storage Replication Pipeline

This document guides you through setting up a replication pipeline from AWS S3 to Google Cloud Storage (GCS) using VPC Service Controls and Storage Transfer Service. The solution follows security best practices, ensuring that data transfer between AWS S3 and GCS is secure and efficient.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Step-by-step Guide](#step-by-step-guide)
- [Setup VPC Service Controls](#setup-vpc-service-controls)
- [Initiate Storage Transfer Service](#initiate-storage-transfer-service)
- [Compliance Benefits](#compliance-benefits)
- [Cost Benefit Analysis](#cost-benefit-analysis)

## Prerequisites

1. **AWS account** with access to the S3 bucket.
2. **Google Cloud account** with permissions to create buckets in GCS and set up VPC Service Controls and Storage Transfer Service.
3. Familiarity with AWS IAM for S3 bucket access and Google Cloud IAM for GCS access (a minimal S3 policy sketch follows this list).
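
As a rough illustration of the AWS side of the last prerequisite, the sketch below grants an IAM user read-only access to the source bucket so Storage Transfer Service can list and fetch objects. The bucket, user, and policy names are placeholders, not values from this repository.

```bash
# Hypothetical names; replace with your own bucket, IAM user, and policy.
SOURCE_BUCKET="my-source-bucket"
TRANSFER_USER="gcs-transfer-user"

# Minimal read-only policy for the source bucket.
aws iam put-user-policy \
  --user-name "$TRANSFER_USER" \
  --policy-name "AllowS3ReadForTransfer" \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": [
          "arn:aws:s3:::'"$SOURCE_BUCKET"'",
          "arn:aws:s3:::'"$SOURCE_BUCKET"'/*"
        ]
      }
    ]
  }'
```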

## Step-by-step Guide

### Setup VPC Service Controls

1. **Access the VPC Service Controls** in the Google Cloud Console.
2. **Create a new VPC Service Control perimeter**.
- Name the perimeter and choose the desired region.
- Add the necessary GCP services. Be sure to include `storagetransfer.googleapis.com` for the Storage Transfer Service.
3. **Setup VPC Service Control Policy** to allow connections from AWS.
- Follow the [documentation](https://cloud.google.com/vpc-service-controls/docs/set-up) for detailed setup steps; a CLI sketch follows this list.
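
As a rough illustration, creating such a perimeter from the command line might look like the sketch below. It assumes an existing access policy; the policy ID, project number, and perimeter name are placeholders, not values from this repository.

```bash
# Placeholders; look up your own access policy ID and project number first.
POLICY_ID="123456789012"        # from: gcloud access-context-manager policies list
PROJECT_NUMBER="987654321098"   # project that owns the destination GCS bucket

# Restrict GCS and Storage Transfer Service to resources inside the perimeter.
gcloud access-context-manager perimeters create s3_replication_perimeter \
  --policy="$POLICY_ID" \
  --title="S3 replication perimeter" \
  --resources="projects/$PROJECT_NUMBER" \
  --restricted-services="storage.googleapis.com,storagetransfer.googleapis.com"
```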

### Initiate Storage Transfer Service

1. Navigate to **Storage Transfer Service** in the Google Cloud Console.
2. Click **Create Transfer Job**.
3. **Select Source**: Choose Amazon S3 as the source and provide the required bucket details.
- Ensure you have the necessary permissions for the S3 bucket in AWS IAM (a CLI sketch for creating the job follows this list).
4. **Select Destination**: Choose your GCS bucket.
5. **Schedule & Advanced Settings**: Set the frequency and conditions for the transfer. Consider setting up notifications for job completion or errors.
6. **Review & Create**: Confirm the details and initiate the transfer job.
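
The same transfer job can also be created from the command line. The sketch below is illustrative only; the bucket names are placeholders, and `aws-creds.json` is assumed to contain the access key of an AWS IAM user with read access to the source bucket (see the policy sketch above).

```bash
# Placeholders; substitute your own bucket names.
# aws-creds.json is assumed to contain:
#   {"accessKeyId": "...", "secretAccessKey": "..."}
gcloud transfer jobs create \
  s3://my-source-bucket gs://my-destination-bucket \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=24h \
  --description="Replicate S3 to GCS daily"
```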

## Compliance Benefits

Setting up a secure replication pipeline from AWS S3 to GCS using VPC Service Controls and Storage Transfer Service offers the following compliance benefits:

1. **Data Security**: The VPC Service Controls provide an additional layer of security by ensuring that the transferred data remains within a defined security perimeter, reducing potential data leak risks.
2. **Auditability**: Both AWS and GCS offer logging and monitoring tools that can provide audit trails for data transfer. This can help in meeting regulatory compliance requirements.
3. **Consistent Data Replication**: The Storage Transfer Service ensures that data in GCS is up to date with the source S3 bucket, which is essential for consistent backup and disaster recovery strategies.

## Cost Benefit Analysis

**Benefits**:

1. **Data Redundancy**: Having data stored in multiple cloud providers can be a part of a robust disaster recovery strategy.
2. **Flexibility**: Replicating data to GCS provides flexibility in multi-cloud strategies, enabling seamless migrations or usage of GCP tools and services.
3. **Security**: Utilizing VPC Service Controls strengthens the security posture.

**Costs**:

1. **Data Transfer Costs**: Both AWS and Google Cloud may charge for data transfer (particularly egress from AWS). Analyze these costs carefully, especially for large transfers.
2. **Storage Costs**: Storing data redundantly incurs additional storage costs in GCS.

**Analysis**:

To stay in compliance, we require multiple copies of our data in separate data centers or clouds. Our security audit highlighted the importance of not keeping data in a single cloud. Transferring data from AWS to GCP and storing it in two clouds simultaneously can be expensive, but when that is required, this solution is a straightforward way to achieve compliance.

---

Please note that while this guide is based on Google Cloud documentation, you should refer to the original [documentation](https://cloud.google.com/architecture/transferring-data-from-amazon-s3-to-cloud-storage-using-vpc-service-controls-and-storage-transfer-service) for the most accurate and up-to-date information.
1 change: 0 additions & 1 deletion files/scripts/ci-env-pool-reset.sh
@@ -29,7 +29,6 @@ source "${GEN3_HOME}/gen3/gen3setup.sh"
cat - > jenkins-envs-services.txt <<EOF
jenkins-blood
jenkins-brain
-jenkins-dcp
jenkins-genomel
jenkins-niaid
EOF
15 changes: 10 additions & 5 deletions files/scripts/healdata/heal-cedar-data-ingest.py
@@ -24,11 +24,14 @@
"Questionnaire/Survey/Assessment - unvalidated instrument": "Questionnaire/Survey/Assessment",
"Cis Male": "Male",
"Cis Female": "Female",
"Trans Male": "Female-to-male transsexual",
"Trans Female": "Male-to-female transsexual",
"Agender, Non-binary, gender non-conforming": "Other",
"Gender Queer": "Other",
"Intersex": "Intersexed",
"Trans Male": "Transgender man/trans man/female-to-male (FTM)",
"Female-to-male transsexual": "Transgender man/trans man/female-to-male (FTM)",
"Trans Female": "Transgender woman/trans woman/male-to-female (MTF)",
"Male-to-female transsexual": "Transgender woman/trans woman/male-to-female (MTF)",
"Agender, Non-binary, gender non-conforming": "Genderqueer/gender nonconforming/neither exclusively male nor female",
"Gender Queer": "Genderqueer/gender nonconforming/neither exclusively male nor female",
"Intersex": "Genderqueer/gender nonconforming/neither exclusively male nor female",
"Intersexed": "Genderqueer/gender nonconforming/neither exclusively male nor female",
"Buisness Development": "Business Development"
}

@@ -85,6 +88,8 @@ def update_filter_metadata(metadata_to_update):
]
# Add any new tags from advSearchFilters
for f in metadata_to_update["advSearchFilters"]:
if f["key"] == "Gender":
continue
tag = {"name": f["value"], "category": f["key"]}
if tag not in tags:
tags.append(tag)
4 changes: 3 additions & 1 deletion files/squid_whitelist/web_whitelist
@@ -15,7 +15,6 @@ ctds-planx.atlassian.net
data.cityofchicago.org
dataguids.org
api.login.yahoo.com
-api.snapcraft.io
apt.kubernetes.io
argoproj.github.io
archive.cloudera.com
@@ -34,6 +33,7 @@ cernvm.cern.ch
charts.bitnami.com
charts.helm.sh
cloud.r-project.org
+coredns.github.io
coreos.com
covidstoplight.org
cpan.mirrors.tds.net
@@ -77,6 +77,7 @@ golang.org
gopkg.in
grafana.com
grafana.github.io
+helm.elastic.co
http.us.debian.org
ifconfig.io
ingress.coralogix.us
@@ -145,6 +146,7 @@ repos.sensuapp.org
repo.vmware.com
repository.cloudera.com
resource.metadatacenter.org
+rmq.n3c.ncats.io
rules.emergingthreats.net
rweb.quant.ku.edu
sa-update.dnswl.org
3 changes: 3 additions & 0 deletions files/squid_whitelist/web_wildcard_whitelist
@@ -97,9 +97,12 @@
.sks-keyservers.net
.slack.com
.slack-msgs.com
+.snapcraft.io
+.snapcraftcontent.com
.sourceforge.net
.southsideweekly.com
.theanvil.io
+.tigera.io
.twistlock.com
.ubuntu.com
.ucsc.edu
38 changes: 33 additions & 5 deletions flavors/squid_auto/squid_running_on_docker.sh
@@ -8,6 +8,9 @@ DISTRO=$(awk -F '[="]*' '/^NAME/ { print $2 }' < /etc/os-release)
WORK_USER="ubuntu"
if [[ $DISTRO == "Amazon Linux" ]]; then
WORK_USER="ec2-user"
+if [[ $(awk -F '[="]*' '/^VERSION_ID/ { print $2 }' < /etc/os-release) == "2023" ]]; then
+DISTRO="al2023"
+fi
fi
HOME_FOLDER="/home/${WORK_USER}"
SUB_FOLDER="${HOME_FOLDER}/cloud-automation"
@@ -60,6 +63,8 @@ fi
function install_basics(){
if [[ $DISTRO == "Ubuntu" ]]; then
apt -y install atop
elif [[ $DISTRO == "al2023" ]]; then
sudo dnf install cronie nc -y
fi
}

@@ -69,10 +74,18 @@ function install_docker(){
# Docker
###############################################################
# Install docker from sources
-curl -fsSL ${DOCKER_DOWNLOAD_URL}/gpg | sudo apt-key add -
-add-apt-repository "deb [arch=amd64] ${DOCKER_DOWNLOAD_URL} $(lsb_release -cs) stable"
-apt update
-apt install -y docker-ce
+if [[ $DISTRO == "Ubuntu" ]]; then
+curl -fsSL ${DOCKER_DOWNLOAD_URL}/gpg | sudo apt-key add -
+add-apt-repository "deb [arch=amd64] ${DOCKER_DOWNLOAD_URL} $(lsb_release -cs) stable"
+apt update
+apt install -y docker-ce
+else
+sudo yum update -y
+sudo yum install -y docker
+# Start and enable Docker service
+sudo systemctl start docker
+sudo systemctl enable docker
+fi
mkdir -p /etc/docker
cp ${SUB_FOLDER}/flavors/squid_auto/startup_configs/docker-daemon.json /etc/docker/daemon.json
chmod -R 0644 /etc/docker
@@ -201,8 +214,10 @@ function install_awslogs {
if [[ $DISTRO == "Ubuntu" ]]; then
wget ${AWSLOGS_DOWNLOAD_URL} -O amazon-cloudwatch-agent.deb
dpkg -i -E ./amazon-cloudwatch-agent.deb
-else
+elif [[ $DISTRO == "Amazon Linux" ]]; then
sudo yum install amazon-cloudwatch-agent nc -y
elif [[ $DISTRO == "al2023" ]]; then
sudo dnf install amazon-cloudwatch-agent -y
fi

# Configure the AWS logs
@@ -292,6 +307,19 @@ function main(){
--volume ${SQUID_CACHE_DIR}:${SQUID_CACHE_DIR} \
--volume ${SQUID_CONFIG_DIR}:${SQUID_CONFIG_DIR}:ro \
quay.io/cdis/squid:${SQUID_IMAGE_TAG}

+max_attempts=10
+attempt_counter=0
+while [ $attempt_counter -lt $max_attempts ]; do
+#((attempt_counter++))
+sleep 10
+if [[ -z "$(sudo lsof -i:3128)" ]]; then
+echo "Squid not healthy, restarting."
+docker restart squid
+else
+echo "Squid healthy"
+fi
+done
}

main
3 changes: 2 additions & 1 deletion gen3/bin/create-es7-cluster.sh
@@ -40,6 +40,7 @@ else
--vpc-options "SubnetIds=${subnet_ids[*]},SecurityGroupIds=${security_groups[*]}" \
--access-policies "$access_policies" \
--encryption-at-rest-options "Enabled=true,KmsKeyId=$kms_key_id"\
+--node-to-node-encryption-options "Enabled=true"
> /dev/null 2>&1

# Wait for the new cluster to be available
@@ -60,4 +61,4 @@
if [ $retry_count -eq $max_retries ]; then
echo "New cluster creation may still be in progress. Please check the AWS Management Console for the status."
fi
fi
fi
2 changes: 1 addition & 1 deletion gen3/bin/kube-roll-all.sh
@@ -274,7 +274,7 @@ if [[ "$GEN3_ROLL_FAST" != "true" ]]; then
else
gen3 kube-setup-autoscaler &
fi
-gen3 kube-setup-kube-dns-autoscaler &
+#gen3 kube-setup-kube-dns-autoscaler &
gen3 kube-setup-metrics deploy || true
gen3 kube-setup-tiller || true
#
13 changes: 12 additions & 1 deletion gen3/bin/kube-setup-argo-wrapper.sh
@@ -19,5 +19,16 @@ if [[ -z "$GEN3_SOURCE_ONLY" ]]; then
gen3 roll argo-wrapper
g3kubectl apply -f "${GEN3_HOME}/kube/services/argo-wrapper/argo-wrapper-service.yaml"

+if g3k_manifest_lookup .argo.argo_server_service_url 2> /dev/null; then
+argo_server_service_url=$(g3k_manifest_lookup .argo.argo_server_service_url)
+
+export ARGO_HOST=${argo_server_service_url}
+export ARGO_NAMESPACE=argo-$(gen3 db namespace)
+envsubst <"${GEN3_HOME}/kube/services/argo-wrapper/config.ini" > /tmp/config.ini
+
+g3kubectl delete configmap argo-wrapper-namespace-config
+g3kubectl create configmap argo-wrapper-namespace-config --from-file /tmp/config.ini
+fi

gen3_log_info "the argo-wrapper service has been deployed onto the kubernetes cluster"
fi
fi
55 changes: 55 additions & 0 deletions gen3/bin/kube-setup-cedar-wrapper.sh
@@ -1,13 +1,68 @@
source "${GEN3_HOME}/gen3/lib/utils.sh"
gen3_load "gen3/lib/kube-setup-init"

create_client_and_secret() {
local hostname=$(gen3 api hostname)
local client_name="cedar_ingest_client"
gen3_log_info "kube-setup-cedar-wrapper" "creating fence ${client_name} for $hostname"
# delete any existing fence cedar clients
g3kubectl exec -c fence $(gen3 pod fence) -- fence-create client-delete --client ${client_name} > /dev/null 2>&1
local secrets=$(g3kubectl exec -c fence $(gen3 pod fence) -- fence-create client-create --client ${client_name} --grant-types client_credentials | tail -1)
# secrets looks like ('CLIENT_ID', 'CLIENT_SECRET')
if [[ ! $secrets =~ (\'(.*)\', \'(.*)\') ]]; then
gen3_log_err "kube-setup-cedar-wrapper" "Failed generating ${client_name}"
return 1
else
local client_id="${BASH_REMATCH[2]}"
local client_secret="${BASH_REMATCH[3]}"
gen3_log_info "Create cedar-client secrets file"
cat - <<EOM
{
"client_id": "$client_id",
"client_secret": "$client_secret"
}
EOM
fi
}

setup_creds() {
# check if new cedar client and secrets are needed"
local cedar_creds_file="cedar_client_credentials.json"

if gen3 secrets decode cedar-g3auto ${cedar_creds_file} > /dev/null 2>&1; then
local have_cedar_client_secret="1"
else
gen3_log_info "No g3auto cedar-client key present in secret"
fi

local client_name="cedar_ingest_client"
local client_list=$(g3kubectl exec -c fence $(gen3 pod fence) -- fence-create client-list)
local client_count=$(echo "$client_list=" | grep -cE "'name':.*'${client_name}'")
gen3_log_info "CEDAR client count = ${client_count}"

if [[ -z $have_cedar_client_secret ]] || [[ ${client_count} -lt 1 ]]; then
gen3_log_info "Creating new cedar-ingest client and secret"
local credsPath="$(gen3_secrets_folder)/g3auto/cedar/${cedar_creds_file}"
if ! create_client_and_secret > $credsPath; then
gen3_log_err "Failed to setup cedar-ingest secret"
return 1
else
gen3 secrets sync
gen3 job run usersync
fi
fi
}

[[ -z "$GEN3_ROLL_ALL" ]] && gen3 kube-setup-secrets

if ! g3kubectl get secrets/cedar-g3auto > /dev/null 2>&1; then
gen3_log_err "No cedar-g3auto secret, not rolling CEDAR wrapper"
return 1
fi

gen3_log_info "Checking cedar-client creds"
setup_creds

if ! gen3 secrets decode cedar-g3auto cedar_api_key.txt > /dev/null 2>&1; then
gen3_log_err "No CEDAR api key present in cedar-g3auto secret, not rolling CEDAR wrapper"
return 1
24 changes: 23 additions & 1 deletion gen3/bin/kube-setup-ingress.sh
@@ -232,6 +232,28 @@ gen3_ingress_setup_role() {
}
}
},
+{
+"Effect": "Allow",
+"Action": [
+"elasticloadbalancing:AddTags"
+],
+"Resource": [
+"arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
+"arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
+"arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
+],
+"Condition": {
+"StringEquals": {
+"elasticloadbalancing:CreateAction": [
+"CreateTargetGroup",
+"CreateLoadBalancer"
+]
+},
+"Null": {
+"aws:RequestTag/elbv2.k8s.aws/cluster": "false"
+}
+}
+},
{
"Effect": "Allow",
"Action": [
@@ -329,4 +351,4 @@ g3kubectl apply -f "${GEN3_HOME}/kube/services/revproxy/revproxy-service.yaml"
envsubst <$scriptDir/ingress.yaml | g3kubectl apply -f -
if [ "$deployWaf" = true ]; then
gen3_ingress_setup_waf
fi
fi