How to dynamize your Kubernetes local storage
In this article, I will explain how I used a combination of { Ansible + local storage static provisioner + Kustomize } in order to manage local volumes (on-prem VMs) in a pseudo-dynamic way.
Some issues with the traditional way
Managed Kubernetes clusters are great. Especially when they are managed by major cloud providers, like GCP or AWS.
But shit can happen and you may have to deal with the full complexity of on-premises Kubernetes clusters yourself and, even worse, with no volume driver to interact with (e.g. the vSphere volume plugin).
When you are dealing with the local StorageClass, there is no provisioner associated with it. So there are some drawbacks compared to other common dynamic provisioners.
Here is a non-exhaustive list:
- you have to create and delete your Persistent Volumes (PVs) manually
- a PV is linked to a specific node, so you have to play with node labels to ensure that your workloads will be scheduled on the correct node(s)
- you have to create one or more StorageClasses manually, depending on the granularity you want for storage requests. Indeed, if you have different kinds of workloads, you don’t want a PVC requesting only 1Gi of storage to be offered a large pre-created 10Gi PV! So, as a response to this issue, it’s common to prepare a dedicated SC for each type of workload (see the sketch just below)
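To make these drawbacks concrete, here is roughly what you would have to maintain by hand with the local volume plugin: a StorageClass with no provisioner and, for each volume, a PV pinned to a node. This is only an illustrative sketch; the names, the size and the worker-1 hostname are placeholders:

# a StorageClass with no provisioner: someone (or something) else has to supply the PVs
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# a hand-crafted local PV, pinned to one node via nodeAffinity (placeholder values)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/local-storage/vol0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1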
A brief reminder of provisioner traits
Internal VS External provisioner
The difference between those two is really well explained here in a few words.
For example, a few volume plugins, like Local and NFS, do not ship with an internal provisioner in Kubernetes, but they can be handled by external provisioners.
Dynamic VS Static provisioner
Dynamic provisioners handle the PV lifecycle for you whereas, with static provisioning, you are basically the provisioner yourself.
The local-storage-static-provisioner
In this article I’m speaking about local storage management. So, here comes the local-storage-static-provisioner. Its purpose is to detect the directories mounted under one - or more - given parent path(s) and to create the associated PVs.
These directories can be either mounted filesystems or mounted block disks. In my case, I’m playing with LVM and mounting ext4-formatted LVs on the filesystem.
Once an LV is mounted on a sub-directory under a known parent dir (which is associated with one given StorageClass), this provisioner will create the associated PV for this new sub-directory.
You can have all your PVs as "instances" of one single StorageClass if they have one common purpose. Or you can play with multiple SCs if you want to manage them by size and purpose.
To tie this back to the provisioner definitions given above, what we have here is a semi-static / semi-dynamic external provisioner. I hope this makes things clearer 😏
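Under the hood, the mapping between a parent dir and a StorageClass lives in the provisioner’s ConfigMap. Here is a rough sketch of it (the class name is a placeholder and the exact keys may vary with the provisioner version):

apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    my-local-sc:                                # placeholder StorageClass name
      hostDir: /mnt/local-storage/my-local-sc   # parent dir watched on the host
      mountDir: /mnt/local-storage/my-local-sc  # same dir, as seen from inside the provisioner pods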
The plan with an example
Well, I’ll use the example of an ElasticSearch cluster deployment. As you may know, there are two kinds of nodes in an ES topology: master & data nodes.
So here, I want to use 3 PVs of 5Gi for the 3 master nodes and 2 other PVs of 10Gi for the data nodes.
Of course, I want to spend minimal effort getting those PVs ready, so I will use a combination of an Ansible playbook and roles:
- deploy the local-storage-static-provisioner: spawns a DaemonSet, pretty easily thanks to its Helm chart
- manage the underlying storage with LVs
- create as many StorageClasses as there are usage types: here, we need to differentiate master nodes from data nodes because of the volume size requirements, so 2 SCs in total
- update the local-storage-static-provisioner configuration via Kustomize, so that it auto-magically creates the corresponding PVs
- run the ElasticSearch Helm chart with a templatized values.yaml referencing our new StorageClasses
Here is the Big Picture:
Woh! nice plan! 🎢
Implementation
First, some context about my environment: the Kubernetes cluster was installed with kubeadm and it’s composed of 1 master and 2 workers. The helm and kubectl commands are run from the master node.
Given the plan above, the implementation is pretty straightforward with Ansible.
Step 1: local-volume-provisioner
I won’t describe how the local-storage-static-provisioner was deployed because I just ran the Helm chart, wrapped in an Ansible role. The only value that was overridden was the classes: one, with no content at all (I don’t need those default values).
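For the record, emptying that value is a one-liner in the overriding values file. A populated entry is shown as a comment, only as a sketch, since the exact keys depend on the chart version:

# override passed to the provisioner's Helm chart: no pre-defined classes
classes: []
# a populated entry would look something like this (sketch):
# classes:
#   - name: local-storage          # StorageClass name
#     hostDir: /mnt/local-storage  # discovery directory on the host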
Steps 2, 3 & 4: LVs, StorageClasses and provisioner configuration
Let’s start with the definition of our volumes:
---
local_storage: (1)
  name: "local-storage"
  base_mount_point: "/mnt/local-storage"
  vg_name: "rvg"

volume_elastic_master: (2)
  class_name: "elastic-master-sc"
  lv_size: "5G"
  lv_name: "elastic_master"
  pv_reclaimPolicy: Delete
  number_of_volumes: 3

volume_elastic_data: (3)
  class_name: "elastic-data-sc"
  lv_size: "10G"
  lv_name: "elastic_data"
  pv_reclaimPolicy: Delete
  number_of_volumes: 2
(1) Define the base mount point on each k8s worker node and some global vars
(2) Define the requirements for the master nodes of the ES cluster: 3 volumes, with LVs named and sized as described
(3) Same story for the ES data nodes
Those variables describe the volume requirements for each type of ES node. All of them are managed by a single Ansible role, the so-called smart-local-volumes:
- name: "smart management of volumes"
become: yes
run_once: true
include_role:
name: smart-local-volumes
vars:
volume: "{{ item }}"
workers: "{{ groups['k8s-workers'] }}"
remote_basepath_kustomize_assets: "{{ local_storage.name }}"
with_items:
- "{{ volume_elastic_master }}"
- "{{ volume_elastic_data }}"
So, we have one role in charge of:
- the creation of all our LVs and mounts
- the creation of the 2 associated StorageClasses (see the sketch just after this list)
- updating the local-storage provisioner config
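As an illustration of the StorageClass part, a task like the following could do the job, assuming the k8s Ansible module is available on the control host. This is a sketch, not the author’s actual task; a Jinja template piped to kubectl apply would work just as well:

# hypothetical task creating one StorageClass per volume type (sketch)
- name: "create the StorageClass {{ volume.class_name }}"
  k8s:
    state: present
    definition:
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: "{{ volume.class_name }}"
      provisioner: kubernetes.io/no-provisioner
      volumeBindingMode: WaitForFirstConsumer
      reclaimPolicy: "{{ volume.pv_reclaimPolicy }}"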
Note that there are a few subtleties that could interest you, like the random distribution of volumes across the cluster. See this task, from line 47:
- name: global vars
  set_fact:
    nbWorkers: "{{ workers | length }}"
    shuffled_workers: "{{ workers | shuffle }}"

<...>

# LV
- name: "create LV(s)"
  become: yes
  run_once: true
  include_role:
    name: lv-mgmt
    apply: # https://github.com/ansible/ansible/issues/35398#issuecomment-451301837
      delegate_to: "{{ shuffled_workers[loop_index | int % nbWorkers | int] }}" (2)
  vars:
    logical_volume:
      vg: "{{ local_storage.vg_name }}"
      lv: "{{ volume.lv_name }}{{ loop_index }}"
      size: "{{ volume.lv_size }}"
      mount_point: "{{ local_storage.base_mount_point }}/{{ volume.class_name }}/{{ volume.lv_name }}{{ loop_index }}"
  delegate_facts: true
  with_sequence: start=0 end={{ volume.number_of_volumes - 1 }} (1)
  loop_control:
    loop_var: loop_index
(1) for the number of volumes…
(2) create the given volume iteratively on the nodes, starting from a random worker node
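The lv-mgmt role itself is not detailed here; its core is essentially a "create the LV, format it, mount it" sequence. Here is a minimal sketch of such a role, assuming the lvol, filesystem and mount Ansible modules (the author’s actual role may differ):

# tasks of a hypothetical lv-mgmt role (sketch)
- name: "create the logical volume"
  lvol:
    vg: "{{ logical_volume.vg }}"
    lv: "{{ logical_volume.lv }}"
    size: "{{ logical_volume.size }}"

- name: "format the LV with ext4"
  filesystem:
    fstype: ext4
    dev: "/dev/{{ logical_volume.vg }}/{{ logical_volume.lv }}"

- name: "mount the LV under the discovery directory (and persist it in /etc/fstab)"
  mount:
    path: "{{ logical_volume.mount_point }}"
    src: "/dev/{{ logical_volume.vg }}/{{ logical_volume.lv }}"
    fstype: ext4
    state: mounted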
When it comes to notifying the provisioner that we want it to manage new kinds of StorageClasses, we use a combination of Ansible templates and Kustomize to patch both the ConfigMap and the DaemonSet.
Have a look at the Ansible script, from line 67 to the end.
As you can see, the kustomization.yaml file applies 2 patches: one for the ConfigMap, the other one for the DaemonSet.
resources:
- previous_ds.yaml
- previous_cm.yaml
patches:
- config.yaml
- ds.yaml
Patching a DaemonSet is a built-in feature but, as far as I know, there are no pre-built plugins for merging a given item into a ConfigMap (which sounds legit). Let me know if you think Kustomize can do it natively.
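For illustration, the ds.yaml patch can be a plain strategic-merge patch that mounts the discovery directory of the new StorageClass into the provisioner pods. The container name and paths below are assumptions based on the chart defaults, not the author’s exact file:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-volume-provisioner
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: provisioner              # container name assumed from the chart defaults
          volumeMounts:
            - name: elastic-master-sc
              mountPath: /mnt/local-storage/elastic-master-sc
              mountPropagation: HostToContainer
      volumes:
        - name: elastic-master-sc
          hostPath:
            path: /mnt/local-storage/elastic-master-sc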
Step 5: deploy ElasticSearch
It’s nothing more than another set of small Ansible tasks, pushing the Helm chart and a templatized values.yaml, itself referencing the same volume vars.
You will find it here.
As you can see, I can use the Helm chart in a standard way, just by referencing the newly created StorageClasses:
master:
  name: master
  replicas: "{{ volume_elastic_master.number_of_volumes }}"
  heapSize: "512m"
  # additionalJavaOpts: "-XX:MaxRAM=512m"
  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    name: data
    size: "{{ volume_elastic_master.lv_size }}"
    storageClass: "{{ volume_elastic_master.class_name }}"
  resources:
    limits:
      cpu: "1"
      memory: "1024Mi"
    requests:
      cpu: "25m"
      memory: "512Mi"
That’s it! PVCs will bind to the pre-created PVs, according to the StorageClasses.
ElasticSearch should be up & running! 🍩
PV renewal
Depending on the reclaim policy that you have configured on each StorageClass or PV, your PV may be just "Released" or deleted once the associated PVC is deleted.
If you set that property to "Delete", the PV will be deleted and automatically re-created in a few seconds by our local provisioner.
Conversely, if you set that property to "Retain", you will have to delete the PVC and the PV yourself, and to clean the underlying storage volume, see below.
Cleaning tasks
At the moment, the removal of both the PVs and the underlying storage - i.e. the LVs and filesystem mounts - is not scripted yet.
I didn’t write this script because I was not sure I would really need it in a production environment.
Also, it may happen that you just want to remove a subset of volumes for a given class. So cleaning scripts should take this kind of option into consideration.
Here is my personal TODO-list when I want to clear the volumes associated with an application and clean the system:
# alias k='kubectl'
# alias ks='kubectl -n kube-system'
# purge Helm release of the app
$ helm delete --purge <release-name>
# list PVC, if bound, stop app and delete PVC
$ k get pvc -A
$ k delete pvc <...>
# edit the configMap to delete the given SC
$ ks edit cm local-provisioner-config
# edit the daemonSet to remove the corresponding volumes and mounts
$ ks edit ds local-volume-provisioner
# check that the DS pods of the provisioner are automatically restarted
$ ks get po
# delete PV
$ k get pv
$ k delete pv <...>
# delete storageClass
$ k get sc
$ k delete sc <...>
# on each worker, delete the content of the mounted dir
$ lsblk
$ rm -rf /mnt/local-storage/registry-sc/registry0/*
# umount dir
$ lsblk
$ umount /mnt/local-storage/registry-sc/registry0
# delete the mounted dir
$ rmdir ...
# lvremove
$ lvs
$ lvremove ...
# update fstab
$ vim /etc/fstab