Send an email or restart a node if a Kubernetes node is down for more than 5 minutes.

interwebology/kubernetes_restart_nodes_automation

Restart or Alert on Nodes That Stay NotReady for Over 5 Minutes

This script was originally meant to reboot nodes in lab environments, but at one point it was modified to only send an email.

This script uses kubectl to switch between contexts listed in lab_cluster.txt and records NotReady nodes.

If a node stays down for more than 5 minutes, an email is sent to the team members listed in the script.
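The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in kube_monitor.py: it assumes Python 3.7+, passes `--context` to kubectl instead of switching the active context, and the names `parse_not_ready`, `get_not_ready_nodes`, and `update_down_since` are hypothetical.

```python
import subprocess

NOT_READY_THRESHOLD_SECONDS = 5 * 60  # the 5-minute window from the README


def parse_not_ready(kubectl_output):
    """Return node names whose STATUS column includes 'NotReady'."""
    nodes = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # With --no-headers each row is: NAME STATUS ROLES AGE VERSION
        if len(fields) >= 2 and "NotReady" in fields[1].split(","):
            nodes.append(fields[0])
    return nodes


def get_not_ready_nodes(context):
    """Query one context with kubectl (assumes kubectl is on PATH and configured)."""
    result = subprocess.run(
        ["kubectl", "--context", context, "get", "nodes", "--no-headers"],
        capture_output=True, text=True, check=True,
    )
    return parse_not_ready(result.stdout)


def update_down_since(down_since, context, not_ready, now):
    """Record when each node was first seen NotReady.

    Returns the nodes of this context that have been down for more than
    the threshold. `down_since` maps (context, node) to a timestamp in seconds.
    """
    current = {(context, node) for node in not_ready}
    # Forget nodes of this context that have recovered since the last poll.
    for key in [k for k in down_since if k[0] == context and k not in current]:
        del down_since[key]
    overdue = []
    for key in sorted(current):
        first_seen = down_since.setdefault(key, now)
        if now - first_seen > NOT_READY_THRESHOLD_SECONDS:
            overdue.append(key[1])
    return overdue
```

A polling loop would call `get_not_ready_nodes` for each context, pass the result to `update_down_since` with `time.time()` as `now`, and send the alert for whatever comes back.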

An alert email has the following text:

Nodes in NotReady State
 

dev-cluster
172.27.120.44
172.27.120.4

cluster-lake-01
172.72.17.21
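A message in this layout could be assembled along these lines. This is only a sketch using the Python 3 standard library: the repo installs jinja2, presumably for templating, but plain string building is used here to keep the example self-contained. The function names, the subject line, and the example addresses are assumptions.

```python
from email.message import EmailMessage


def build_alert_body(not_ready_by_cluster):
    """Format {cluster: [node, ...]} in the layout shown above."""
    lines = ["Nodes in NotReady State", ""]
    for cluster, nodes in not_ready_by_cluster.items():
        if not nodes:
            continue  # skip clusters with nothing to report
        lines.append(cluster)
        lines.extend(nodes)
        lines.append("")  # blank line between clusters
    return "\n".join(lines).rstrip() + "\n"


def build_alert_email(not_ready_by_cluster, sender, recipients):
    """Wrap the body in an EmailMessage; sending it (e.g. via smtplib) is left out."""
    msg = EmailMessage()
    msg["Subject"] = "Nodes in NotReady State"  # assumed subject line
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(build_alert_body(not_ready_by_cluster))
    return msg


if __name__ == "__main__":
    report = {
        "dev-cluster": ["172.27.120.44", "172.27.120.4"],
        "cluster-lake-01": ["172.72.17.21"],
    }
    print(build_alert_body(report))
```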

Installation

kubectl

Install kubectl using the instructions on the kubectl site.

You can find more information about configuring kubectl HERE

Next, use the secure copy command (scp) to transfer the kubectl binary to your hosting VM.

scp kubectl [email protected]:~

Now SSH into the VM and, from your home directory, make the kubectl binary executable.

 chmod +x ./kubectl 

Move the binary into your PATH.

 sudo mv ./kubectl /usr/local/bin/kubectl

Now configure kubectl.

You can find more information about configuring kubectl HERE

Install packages

  • sudo yum install python-pip
  • sudo pip install jinja2

Copy over kube_monitor.py script

From inside the repo on your local machine, copy the app directory to /usr/local/bin/app on the remote machine, then add that directory to the remote machine's PATH.

scp -r app [email protected]:/usr/local/bin/

Add to Path

echo 'export PATH=$PATH:/usr/local/bin/app' >> ~/.profile 

then

source ~/.profile

Set Timezone

ln -sf /usr/share/zoneinfo/America/Denver /etc/localtime

Set Up the Script Cron Job

Edit the crontab with the following command:

crontab -e

In the entries below, the first cron line starts the script at 9am Monday-Friday and the second kills it at 5pm Monday-Friday.

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin/app:~/.kube

0 9 * * 1-5 nohup /usr/bin/python /usr/local/bin/app/kube_monitor.py  >> /tmp/kube_monitor.log 2>&1
0 17 * * 1-5 pkill -f kube_monitor.py

The lab_cluster.txt file

This file lists one cluster per line and must be in the same folder as the kube_monitor script. These are the clusters (kubectl contexts) you have set up for use with your kubectl.
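Reading the file could look roughly like this. It is a sketch rather than the repo's actual loader: `load_clusters` is a hypothetical name, and skipping blank lines is an assumption.

```python
import os


def load_clusters(script_path, filename="lab_cluster.txt"):
    """Read one cluster name per line from the folder containing the script."""
    folder = os.path.dirname(os.path.abspath(script_path))
    with open(os.path.join(folder, filename)) as handle:
        # Strip whitespace and ignore empty lines (assumed behaviour).
        return [line.strip() for line in handle if line.strip()]
```

Inside the monitor script this would be called as `load_clusters(__file__)`, so the file is found no matter which working directory cron starts the job in.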

Congrats, you did the thing.
