Physical movement of disks supported by an external tool
In my last blog, I described a scenario where we have to scale our disperse volume by one node only, and two different approaches we can adopt to do so. Here, I will continue with the second approach, which involves physical migration of drives/bricks from one server/node to another. This approach is supported by a tool called pos_main.py.
A little background:
When we add new bricks to an existing volume, we use the “gluster volume add-brick” command. It is our responsibility to make sure that all (or at least the redundant number of) bricks are hosted on different nodes.
Before using this tool, let’s look at two important conditions which must hold throughout this migration of drives from one server to another.
1 – We should not migrate more than the redundancy count (let’s call it R) of bricks of any old sub volume to the new node.
2 – At the end, no old node should host more than R bricks that came from the new node (see the sketch below).
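To make these two conditions concrete, here is a minimal Python sketch of how they could be verified. The data structures (sub volumes as lists of “node:<mount-point>” bricks, and a swap map from old bricks to the new bricks replacing them) are illustrative assumptions, not the actual internals of pos_main.py:

# Minimal sketch: verify the two migration conditions.
# 'subvols' maps each old sub volume to its bricks; 'swap_map' maps each
# old brick that will move to the new node to the new brick replacing it.
# Both structures are assumptions for illustration only.

def host(brick):
    # A brick is written as "node:<mount-point of the drive>".
    return brick.split(":", 1)[0]

def check_conditions(subvols, swap_map, new_node, R):
    # Condition 1: at most R bricks of any old sub volume move to the new node.
    for name, bricks in subvols.items():
        moved = sum(1 for b in bricks if b in swap_map)
        if moved > R:
            raise ValueError(f"{name} would lose {moved} bricks to {new_node}, more than R={R}")
    # Condition 2: no old node ends up hosting more than R bricks
    # that originally came from the new node.
    received = {}
    for old_brick in swap_map:
        node = host(old_brick)
        received[node] = received.get(node, 0) + 1
    for node, count in received.items():
        if count > R:
            raise ValueError(f"{node} would receive {count} bricks from {new_node}, more than R={R}")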
Let’s move on to the usage of the tool, pos_main.py.
First, we need to add the new node to the cluster hosting our existing disperse volume. We can do so using “gluster peer probe <new node IP or hostname>”.
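For example, with node-4 as the new node used in the run below, and then confirming it joined the pool:

# gluster peer probe node-4
# gluster peer status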
pos_main.py needs to be executed from any one of the servers in the cluster pool. After we run it, we need to provide three inputs:
1 – Name of the volume
2 – Name of the new server
3 – Name of a file containing the list of new drives, one per line, in the form “node:<mount-point of the drive>”
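For reference, the new-node.txt file used in the run below just lists the six new drives on node-4, one per line:

# cat new-node.txt
node-4:/root/brick-new-1
node-4:/root/brick-new-2
node-4:/root/brick-new-3
node-4:/root/brick-new-4
node-4:/root/brick-new-5
node-4:/root/brick-new-6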
# python3 pos_main.py
Enter name of the volume: vol
Enter file name which contains new bricks: new-node.txt
Enter IP/hostname of new node: node-4
volname = vol
filename = new-node.txt
hostname = node-4
New Bricks***
node-4:/root/brick-new-1
node-4:/root/brick-new-2
node-4:/root/brick-new-3
node-4:/root/brick-new-4
node-4:/root/brick-new-5
node-4:/root/brick-new-6
As soon as we provide these inputs, the script validates them and checks whether the volume can be scaled. The following checks are done to make sure that scaling can happen without any issue –
1 – Check the health of the existing volume which we are trying to scale (this can also be verified by hand, as shown below).
2 – Check that we have enough new bricks to scale.
3 – Check that the existing volume is well spread out and has a fault-tolerant setup.
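To verify the volume's health manually before running the script, we can check that every brick reports zero pending heal entries:

# gluster volume heal vol info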
Once all the conditions are met and the checks pass, the script does all the calculations and provides a map of the drives, pairing each old brick with the respective new brick it should be swapped with. An example:
old_brick=apandey:/home/apandey/bricks/gluster/vol-6 and new brick=node-4:/root/brick-new-6
old_brick=apandey:/home/apandey/bricks/gluster/vol-18 and new brick=node-4:/root/brick-new-5
All together : 1
One brick at a time : 0
How do you want to swap device: (0 or 1)
The map contains all the old and new bricks that need to be swapped between the old and new servers. From here on, the user can swap the drives all in one shot or one by one, depending on the option given to the script.
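For illustration, here is one way such a map could be computed, greedily pairing each new brick that must leave the new node with an old brick while honoring the two conditions above. This is a hypothetical Python sketch, not the algorithm pos_main.py actually implements:

# Greedy sketch of building the swap map. Keep R new bricks on the new
# node (its share of the new sub volume) and place the rest on old
# servers, respecting conditions 1 and 2. Illustrative only; a real
# implementation might need backtracking where this greedy pass fails.

def build_swap_map(subvols, new_bricks, R):
    swaps = []                              # (old_brick, new_brick) pairs
    moved = {sv: 0 for sv in subvols}       # condition 1 bookkeeping
    received = {}                           # condition 2 bookkeeping
    pool = [(sv, b) for sv, bricks in subvols.items() for b in bricks]
    used = set()
    for new_brick in new_bricks[R:]:        # the first R stay on the new node
        for sv, old_brick in pool:
            node = old_brick.split(":", 1)[0]
            if (old_brick in used or moved[sv] >= R
                    or received.get(node, 0) >= R):
                continue
            swaps.append((old_brick, new_brick))
            used.add(old_brick)
            moved[sv] += 1
            received[node] = received.get(node, 0) + 1
            break
        else:
            raise ValueError("no valid old brick left for " + new_brick)
    return swaps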
If the user opts for “One brick at a time“, the script waits for the user to swap the drive it asked for. Once the swap is confirmed, the drive on the new node is included in the existing disperse sub volume and a heal is triggered to make sure the volume becomes healthy.
On the other hand, if the user goes for the other option, i.e. “All together”, the tool waits for the user to swap all the drives and confirm it. At the end, all the bricks are included in the existing volume and again a heal is triggered to make sure we don’t have any inconsistency in our volume.
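In either case, heal progress can be monitored with the same command as before, until every brick reports zero pending entries:

# gluster volume heal vol info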
After successful swapping, the new bricks end up spread across different servers. Now we can add them using the “gluster volume add-brick” command, and all the bricks will be properly spread out, keeping the disperse volume fault tolerant.
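As a purely hypothetical example for a 4+2 (R = 2) disperse volume: suppose the swaps left two new drives on node-4 and spread the other four across old servers node-1, node-2 and node-3 (names invented for illustration). The final step would then look something like:

# gluster volume add-brick vol \
      node-1:/root/brick-new-1 node-1:/root/brick-new-2 \
      node-2:/root/brick-new-3 node-3:/root/brick-new-4 \
      node-4:/root/brick-new-5 node-4:/root/brick-new-6

The exact paths depend on where each drive was mounted after the physical swap.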
Yeah, it does look like a complex procedure to scale our volume by one node. However, this is the fastest and most reliable way of adding a new node on which we have drives.
Please drop a comment if you have any queries or a better approach to scale our setup by one node. Don’t hesitate to modify the tool to make it error-free, faster and better.