
[IBCDPE-935] Setting up declarative definition of TF resources #11

Merged
merged 65 commits
Jul 18, 2024
95d012f
Start the process of organizing the tf resources
BryanFauble Jul 9, 2024
ef60c20
correct project root
BryanFauble Jul 9, 2024
7065d8a
Remove setting provider vals
BryanFauble Jul 9, 2024
a3ffe84
Add provider to modules
BryanFauble Jul 9, 2024
1df915c
Set versions tf on module
BryanFauble Jul 9, 2024
069b3b9
Swap to Opentofu
BryanFauble Jul 9, 2024
721d5d1
Set to opentofu 1.7.2
BryanFauble Jul 9, 2024
5fa814a
Correct reference
BryanFauble Jul 9, 2024
93fc88d
Remove policy attachment for now
BryanFauble Jul 9, 2024
15577f3
import root admin stack
BryanFauble Jul 9, 2024
fe49839
Set space on spacelift modules, auto deploy admin stack
BryanFauble Jul 9, 2024
8d674c9
Enable space inheritence
BryanFauble Jul 9, 2024
63e8093
Move location of autoscaler
BryanFauble Jul 9, 2024
a0eaf32
Update eks module to take in VPC name
BryanFauble Jul 9, 2024
c0c0411
Update k8s sandbox stack
BryanFauble Jul 9, 2024
4783cd0
Correct output
BryanFauble Jul 9, 2024
a889572
Attach cloud integration to stack
BryanFauble Jul 9, 2024
484f65c
Pass along IDs so they're dynamically passed
BryanFauble Jul 9, 2024
c26d084
Pass along vpc_id to eks cluster
BryanFauble Jul 9, 2024
972346d
Remove data elements in favor of variables
BryanFauble Jul 9, 2024
5e0f536
Testing out stack dependencies
BryanFauble Jul 9, 2024
6f9a192
Try out setting a context within module
BryanFauble Jul 9, 2024
b675176
Move where context is defined
BryanFauble Jul 9, 2024
153e670
Update var reference
BryanFauble Jul 9, 2024
79d6faa
Update space id
BryanFauble Jul 9, 2024
22edf71
Correct which module context is attached to
BryanFauble Jul 9, 2024
1cb3eda
Watch for changes to modules from root admin stack
BryanFauble Jul 9, 2024
696fa36
Change where kubeconfig context is created
BryanFauble Jul 9, 2024
00d85da
Include contexts in common
BryanFauble Jul 9, 2024
9b28986
Output cluster name and region
BryanFauble Jul 9, 2024
303e144
Try to set version with spacelift_version resource
BryanFauble Jul 9, 2024
c9bf0d1
Increment module versions
BryanFauble Jul 9, 2024
1bd775a
Output cluster and region from eks
BryanFauble Jul 9, 2024
9a99757
dev resources should depend on spacelift root admin stack
BryanFauble Jul 9, 2024
f837ae7
Update dependencies
BryanFauble Jul 9, 2024
cab497b
Manually specifying default storage class to gp3
BryanFauble Jul 9, 2024
4073651
Increment autoscaler version
BryanFauble Jul 9, 2024
6ed6e4b
Use node security group for EC2 instance
BryanFauble Jul 10, 2024
b9fbeeb
Correct output
BryanFauble Jul 10, 2024
1ee75c0
Correct spacelift stack dependency reference
BryanFauble Jul 10, 2024
70920b6
Grant AmazonEKSVPCResourceController to EKS IAM role
BryanFauble Jul 10, 2024
ec09447
Watch module sub directories too
BryanFauble Jul 10, 2024
05a05c9
Bump up eks module
BryanFauble Jul 10, 2024
ab4df14
Bump usage of eks module
BryanFauble Jul 10, 2024
375bb93
output cidr blocks from vpc
BryanFauble Jul 10, 2024
8167abe
Pass along the CIDR block
BryanFauble Jul 10, 2024
54fb11e
Also output private/public subnet cidrs
BryanFauble Jul 10, 2024
b02823e
Move location of files to sub directory
BryanFauble Jul 10, 2024
a86c805
Deploy an example set of services to the k8s cluster
BryanFauble Jul 10, 2024
4cc509f
Try yamldecode
BryanFauble Jul 10, 2024
d0a71a3
Convert over to the related tf resources
BryanFauble Jul 10, 2024
91d0ef4
Fix spaces in quotes
BryanFauble Jul 10, 2024
b9ee7dd
Update plan hooks to provide kubeconfig everywhere
BryanFauble Jul 10, 2024
8688100
Adding k8s provider
BryanFauble Jul 10, 2024
994c8b1
Pass along cluster name
BryanFauble Jul 10, 2024
d94cffa
Fix missing namespaces and dependencies on those namespaces
BryanFauble Jul 10, 2024
62bbb7d
Add more to readmes
BryanFauble Jul 11, 2024
cf30432
Flatten directory structure
BryanFauble Jul 11, 2024
392bce9
Fix project_root
BryanFauble Jul 11, 2024
f1518e5
Update comments
BryanFauble Jul 11, 2024
3e91f90
Enable event logs for vpc-cni plugin
BryanFauble Jul 11, 2024
3cf3ceb
Update eks module version
BryanFauble Jul 11, 2024
a79cd14
Update format of param
BryanFauble Jul 11, 2024
c896b61
Correct param values
BryanFauble Jul 11, 2024
3a1a43b
Add note about initial admin stack
BryanFauble Jul 18, 2024
164 changes: 101 additions & 63 deletions README.md
@@ -1,14 +1,102 @@
# EKS-stack

Leveraging spot.io, we spin up an EKS stack behind an existing private VPC with scale-to-zero capabilities. To deploy this stack:

TODO: Instructions need to be rewritten. Deployment is occurring through spacelift.io

<!-- 1. log into dpe-prod via jumpcloud and export the credentials (you must have admin)
2. run `terraform apply`
3. This will deploy the terraform stack. The terraform backend state is stored in an S3 bucket. The terraform state is stored in the S3 bucket `s3://dpe-terraform-bucket`
4. The spot.io account token is stored in AWS secrets manager: `spotinst_token`
5. Add `AmazonEBSCSIDriverPolicy` and `SecretsManagerReadWrite` to the IAM policy -->
# Purpose

This repo is used to deploy an EKS cluster to AWS. CI/CD is managed through Spacelift.

# Directory Structure
```
.: Contains references to all the "Things" that are going to be deployed
├── common-resources: Resources that are environment independent
│ ├── contexts: Contexts that we'll attach across environments
│ └── policies: Rego policies that can be attached to 0..* spacelift stacks
├── dev: Development/sandbox environment
│ ├── spacelift: Terraform scripts to manage spacelift resources
│ │ └── dpe-sandbox: Spacelift specific resources to manage the CI/CD pipeline
│ └── stacks: The deployable cloud resources
│ ├── dpe-sandbox-k8s: K8s + supporting AWS resources
│ └── dpe-sandbox-k8s-deployments: Resources deployed inside of a K8s cluster
└── modules: Templatized collections of terraform resources that are used in a stack
├── apache-airflow: K8s deployment for apache airflow
│ └── templates: Resources used during deployment of airflow
├── sage-aws-eks: Sage specific EKS cluster for AWS
├── sage-aws-k8s-node-autoscaler: K8s node autoscaler using spotinst ocean
└── sage-aws-vpc: Sage specific VPC for AWS
```

This root `main.tf` contains all the "Things" that are going to be deployed.
The terraform files in this top-level directory declaratively bring together
everything that should be deployed in spacelift. The items declared in this
top-level directory are as follows:

1) A single root administrative stack that is responsible for deploying each resource to spacelift.
2) A spacelift space called `environment` that everything is deployed under.
3) A reference to the `terraform-registry` modules directory.
4) A reference to `common-resources`: reusable resources that are not environment specific.
5) The environment-specific resources such as `dev`, `staging`, or `prod`.

This structure draws inspiration from https://github.com/antonbabenko/terraform-best-practices/tree/master/examples.
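As a rough sketch, the root administrative stack and `environment` space described above follow a pattern like the one below. The resource names and attribute values here are illustrative, not the exact contents of `main.tf`:

```
# Illustrative sketch only -- see the actual root main.tf for the
# authoritative definitions.
resource "spacelift_space" "environment" {
  name             = "environment"
  parent_space_id  = "root"
  inherit_entities = true
}

# The root admin stack watches this repository and deploys every other
# spacelift resource declared here.
resource "spacelift_stack" "root-administrative-stack" {
  administrative = true
  autodeploy     = true
  branch         = "main"
  name           = "Root Administrative Stack"
  repository     = "eks-stack"
  space_id       = spacelift_space.environment.id
}
```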

## AWS VPC + AWS EKS
This section describes the VPC (Virtual Private Cloud) that the EKS cluster is deployed
to.

### AWS VPC

The VPC used in this project is created with the [AWS VPC Terraform module](https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest).
It contains a number of defaults for our use case at Sage. Head over to the module
definition to learn more.
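For reference, a minimal invocation of that upstream module looks roughly like this; the name, CIDR ranges, and availability zones below are placeholders, not our actual values:

```
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "example-vpc"  # placeholder name
  cidr = "10.0.0.0/16"  # placeholder CIDR

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # NAT gateway gives private subnets outbound internet access
  enable_nat_gateway = true
}
```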

### AWS EKS

[AWS EKS](https://aws.amazon.com/eks/) is a managed Kubernetes service that handles
many of the operational tasks of running a k8s cluster. On top of it, we provide the
configurable parameters needed to run a number of workloads.

#### EKS API access
API access to the kubernetes cluster endpoint is set to `Public and private`.

##### Public
This allows clients outside of the VPC to connect via `kubectl` and related tools to
interact with kubernetes resources. By default, this API server endpoint is public to
the internet, and access to the API server is secured using a combination of AWS
Identity and Access Management (IAM) and native Kubernetes Role Based Access Control
(RBAC).

##### Private
You can enable private access to the Kubernetes API server so that all communication
between your worker nodes and the API server stays within your VPC. You can limit the
IP addresses that can access your API server from the internet, or completely disable
internet access to the API server.
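With the [AWS EKS Terraform module](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest), the `Public and private` access mode maps to inputs along these lines. The CIDR restriction shown is an optional hardening step, not necessarily something this stack sets:

```
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... other cluster configuration ...

  # "Public and private" endpoint access
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  # Optionally limit which IPs may reach the public endpoint
  cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"] # example range
}
```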


#### EKS VPC CNI Plugin
This section describes the VPC CNI (Container Network Interface) that is being used
within the EKS cluster. The plugin is responsible for allocating VPC IP addresses to
Kubernetes nodes and configuring the necessary networking for Pods on each node.


#### Security groups for pods
Allows us to assign EC2 security groups directly to pods running in AWS EKS clusters.
This can be used as an alternative or in conjunction with `Kubernetes network policies`.

#### Kubernetes network policies
Controls network traffic within the cluster, for example pod to pod traffic.

Further reading:
- https://docs.aws.amazon.com/eks/latest/userguide/cni-network-policy.html
- https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
- https://aws.amazon.com/blogs/containers/introducing-security-groups-for-pods/
- https://kubernetes.io/docs/concepts/services-networking/network-policies/
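As an illustration, a Kubernetes network policy can be declared from terraform with the `kubernetes_network_policy` resource of the kubernetes provider; the namespace and labels below are hypothetical:

```
resource "kubernetes_network_policy" "allow-from-frontend" {
  metadata {
    name      = "allow-from-frontend" # hypothetical policy name
    namespace = "example"             # hypothetical namespace
  }

  spec {
    # Pods this policy applies to
    pod_selector {
      match_labels = {
        app = "backend"
      }
    }

    # Only allow ingress from pods labeled app=frontend on TCP/8080
    ingress {
      from {
        pod_selector {
          match_labels = {
            app = "frontend"
          }
        }
      }
      ports {
        port     = "8080"
        protocol = "TCP"
      }
    }

    policy_types = ["Ingress"]
  }
}
```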


#### EKS Autoscaler

We use spot.io to manage the nodes attached to each EKS cluster. This tool has
scale-to-zero capabilities and will dynamically add or remove nodes from the
cluster depending on demand. The autoscaler is templatized and provided as a
terraform module to be used within an EKS stack.
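Usage of the module from a stack looks roughly like the sketch below. The input names here are assumptions for illustration; `modules/sage-aws-k8s-node-autoscaler/variables.tf` defines the real interface:

```
# Illustrative sketch -- the input names are assumptions, not the
# module's actual variables.
module "node-autoscaler" {
  source = "./modules/sage-aws-k8s-node-autoscaler"

  cluster_name = module.eks.cluster_name
  region       = var.region
}
```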


#### Connecting to an EKS cluster for kubectl commands

To connect to the EKS stack running in AWS you'll need to make sure that you have
SSO set up for the account you'll be using. Once that is done, run the commands below:
@@ -21,55 +109,5 @@
```
aws sso login --profile dpe-prod-admin
# Authenticate to AWS using the SSO session for the profile `dpe-prod-admin`. Once
# authenticated, this assumes the `role/eks_admin_role` to connect to the k8s
# cluster and updates your kubeconfig with permissions to access the cluster.
aws eks update-kubeconfig --region us-east-1 --name dpe-k8 --role-arn arn:aws:iam::766808016710:role/eks_admin_role --profile dpe-prod-admin
```

## Future work

1. Create a separate VPC dedicated to the K8 cluster
2. Create CI/CD to deploy this stack
3. Push this entire stack behind a module
4. Create a module for the node groups so we can attach node groups to EKS cluster


## Adding a node group (WIP)

1. Add an EKS node group

```
two = {
name = "seqera"
desired_size = 1
min_size = 0
max_size = 10

instance_types = ["t3.large"]
capacity_type = "SPOT"
}
```

2. Add an AWS IAM instance profile

```
data "aws_iam_instance_profiles" "profile2" {
depends_on = [module.eks]
role_name = module.eks.eks_managed_node_groups["two"].iam_role_name
}
```

3. Add an ocean virtual node group

```
module "ocean-aws-k8s-vng_gpu" {
source = "spotinst/ocean-aws-k8s-vng/spotinst"

name = "seqera" # Name of VNG in Ocean
ocean_id = module.ocean-aws-k8s.ocean_id
subnet_ids = var.subnet_ids

iam_instance_profile = tolist(data.aws_iam_instance_profiles.profile2.arns)[0]
# instance_types = ["g4dn.xlarge","g4dn.2xlarge"] # Limit VNG to specific instance types
# spot_percentage = 50 # Change the spot %
tags = var.tags
}
```
27 changes: 27 additions & 0 deletions common-resources/contexts/main.tf
@@ -0,0 +1,27 @@
# Kubeconfig setup hooks for stacks that talk to the K8s cluster
resource "spacelift_context" "k8s-kubeconfig" {
description = "Hooks used to set up the kubeconfig for connecting to the K8s cluster"
name = "Kubernetes Deployments Kubeconfig"
space_id = "root"

before_init = [
"aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME"
]

before_plan = [
"aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME"
]

before_apply = [
"aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME"
]

before_perform = [
"aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME"
]

before_destroy = [
"aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME"
]
}

7 changes: 7 additions & 0 deletions common-resources/main.tf
@@ -0,0 +1,7 @@
module "policies" {
source = "./policies"
}

module "contexts" {
source = "./contexts"
}
9 changes: 9 additions & 0 deletions common-resources/policies/outputs.tf
@@ -0,0 +1,9 @@
output "enforce_tags_on_resources_id" {
value = spacelift_policy.enforce-tags-on-resources.id
description = "The ID for this spacelift_policy. Checks that a cost center tag is added."
}

output "check_estimated_cloud_spend_id" {
value = spacelift_policy.cloud-spend-estimation.id
description = "The ID for this spacelift_policy"
}
File renamed without changes.
22 changes: 0 additions & 22 deletions data.tf

This file was deleted.

4 changes: 1 addition & 3 deletions deployments/README.md
@@ -1,3 +1 @@
## Deployments

These are the different deployments that are within the kubernetes cluster
This directory is not actively used and will be removed in the future
11 changes: 11 additions & 0 deletions dev/main.tf
@@ -0,0 +1,11 @@
resource "spacelift_space" "development" {
name = "development"
parent_space_id = var.parent_space_id
description = "Contains all the resources to deploy out to the dev environment."
inherit_entities = true
}

module "dpe-sandbox-spacelift" {
source = "./spacelift/dpe-sandbox"
parent_space_id = spacelift_space.development.id
}
125 changes: 125 additions & 0 deletions dev/spacelift/dpe-sandbox/main.tf
@@ -0,0 +1,125 @@
resource "spacelift_space" "dpe-sandbox" {
name = "dpe-sandbox"
parent_space_id = var.parent_space_id
description = "Contains resources for the DPE team for sandbox testing."
inherit_entities = true
}

resource "spacelift_stack" "k8s-stack" {
github_enterprise {
namespace = "Sage-Bionetworks-Workflows"
id = "sage-bionetworks-workflows-gh"
}

administrative = false
autodeploy = true
branch = "ibcdpe-935-vpc-updates"
description = "Infrastructure to support deploying to an EKS cluster"
name = "DPE DEV Kubernetes Infrastructure"
project_root = "dev/stacks/dpe-sandbox-k8s"
repository = "eks-stack"
terraform_version = "1.7.2"
terraform_workflow_tool = "OPEN_TOFU"
space_id = spacelift_space.dpe-sandbox.id
}

resource "spacelift_stack" "k8s-stack-deployments" {
github_enterprise {
namespace = "Sage-Bionetworks-Workflows"
id = "sage-bionetworks-workflows-gh"
}

administrative = false
autodeploy = true
branch = "ibcdpe-935-vpc-updates"
description = "Deployments internal to an EKS cluster"
name = "DPE DEV Kubernetes Deployments"
project_root = "dev/stacks/dpe-sandbox-k8s-deployments"
repository = "eks-stack"
terraform_version = "1.7.2"
terraform_workflow_tool = "OPEN_TOFU"
space_id = spacelift_space.dpe-sandbox.id
}

resource "spacelift_context_attachment" "k8s-kubeconfig-hooks" {
context_id = "kubernetes-deployments-kubeconfig"
stack_id = spacelift_stack.k8s-stack-deployments.id
}

resource "spacelift_stack_dependency" "k8s-stack-to-deployments" {
stack_id = spacelift_stack.k8s-stack-deployments.id
depends_on_stack_id = spacelift_stack.k8s-stack.id
}

resource "spacelift_stack_dependency_reference" "vpc-id-reference" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "vpc_id"
input_name = "TF_VAR_vpc_id"
}

resource "spacelift_stack_dependency_reference" "private-subnet-ids-reference" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "private_subnet_ids"
input_name = "TF_VAR_private_subnet_ids"
}

resource "spacelift_stack_dependency_reference" "security-group-id-reference" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "node_security_group_id"
input_name = "TF_VAR_node_security_group_id"
}

resource "spacelift_stack_dependency_reference" "vpc-cidr-block-reference" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "vpc_cidr_block"
input_name = "TF_VAR_vpc_cidr_block"
}

resource "spacelift_stack_dependency_reference" "region-name" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "region"
input_name = "REGION"
}

resource "spacelift_stack_dependency_reference" "cluster-name" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "cluster_name"
input_name = "CLUSTER_NAME"
}

resource "spacelift_stack_dependency_reference" "cluster-name-tfvar" {
stack_dependency_id = spacelift_stack_dependency.k8s-stack-to-deployments.id
output_name = "cluster_name"
input_name = "TF_VAR_cluster_name"
}

# resource "spacelift_policy_attachment" "policy-attachment" {
# policy_id = each.value.policy_id
# stack_id = spacelift_stack.k8s-stack.id
# }

resource "spacelift_stack_destructor" "k8s-stack-deployments-destructor" {
depends_on = [
spacelift_stack.k8s-stack,
]

stack_id = spacelift_stack.k8s-stack-deployments.id
}

resource "spacelift_stack_destructor" "k8s-stack-destructor" {
stack_id = spacelift_stack.k8s-stack.id
}

resource "spacelift_aws_integration_attachment" "k8s-aws-integration-attachment" {
integration_id = "01HXW154N60KJ8NCC93H1VYPNM"
stack_id = spacelift_stack.k8s-stack.id
read = true
write = true
}

resource "spacelift_aws_integration_attachment" "k8s-deployments-aws-integration-attachment" {
integration_id = "01HXW154N60KJ8NCC93H1VYPNM"
stack_id = spacelift_stack.k8s-stack-deployments.id
read = true
write = true
}
7 changes: 7 additions & 0 deletions dev/spacelift/dpe-sandbox/outputs.tf
@@ -0,0 +1,7 @@
output "k8s_stack_id" {
value = spacelift_stack.k8s-stack.id
}

output "k8s_stack_deployments_id" {
value = spacelift_stack.k8s-stack-deployments.id
}