dracut-initqueue timeout and root LV missing, some LVs missing in rescue mode

March 26, 2018 - Reading time: ~1 minute

Yesterday I had a machine that stopped booting after an update. What I didn't know was that during issues with yum update, the LVM package was removed, and after that all of the initramfs images were regenerated. The result was a machine that wasn't able to boot anymore.

The errors (a list of search terms, not the actual messages, since no log exists):

dracut-initqueue timeout
root does not exist
Starting dracut emergency shell
Entering emergency mode
dracut-initqueue[259]: Warning: dracut-initqueue timeout
dracut-initqueue[279]: Warning: Could not boot
dracut-initqueue[279]: Warning: /dev/mapper/rhel_...-root does not exist
in rescue mode
job timeout
Timed out waiting for dev-mapper-VG\LV.device
unable to mount logical volumes
vgchange missing
lvchange missing

This is how I solved it:

I started a live CD, configured the network, chrooted into the machine and ran the following (substitute your own kernel version and initramfs file):

lsinitrd -m /boot/initramfs-3.10.0-693.11.1.el7.x86_64.img | grep lvm (returned nothing)
yum install lvm2
dracut -f /boot/initramfs-3.10.0-693.11.1.el7.x86_64.img 3.10.0-693.11.1.el7.x86_64

Since this isn't standard behavior, you should check all services and make sure that your packages are consistent:

yum check all

Fully automated backup of Satellite

March 7, 2018 - Reading time: ~1 minute

Today I created a crontab entry to automate the backup of Satellite using katello-backup. We had this in the past, but it was a bit rough. Now we keep biweekly fulls and daily incrementals, and clean up after one month (as an example). Make sure the backup doesn't run at the same time as, for example, your OpenSCAP reports, since all services are down during the backup.

#katello backup, biweekly full + daily incremental
0 2 * * 0 root expr `date +\%s` / 604800 \% 2 >/dev/null || (/usr/sbin/satellite-backup --assumeyes /backup/ && ls -td -- /backup/satellite-backup-* | head -n 1 > /backup/latest_full; find /backup/ -type d -ctime +30 -exec rm -rf {} \;)
0 2 * * 2-6 root /usr/sbin/satellite-backup --assumeyes /backup/ --incremental "$(cat /backup/latest_full | head -n1)"
#this checks if the latest backup failed and cleans up anyway to free up space
0 6 * * 0 root if [[ $(find "$(cat /backup/latest_full)" -mtime +15 -print) ]]; then find /backup/ -type d -ctime +30 -exec rm -rf {} \;; fi
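The gate at the start of the full-backup line divides the current Unix time by the number of seconds in a week (604800) and takes the parity; expr exits non-zero when its result is 0, so the `||` branch fires and the full backup runs on even-numbered weeks since the epoch. A standalone sketch of the same logic:

```shell
#!/bin/bash
# Parity of the week number since the Unix epoch decides full vs incremental.
now=$(date +%s)
week=$(( now / 604800 ))          # whole weeks since 1970-01-01
if (( week % 2 == 0 )); then
    echo "even week: the full backup would run"
else
    echo "odd week: incrementals only"
fi
```

Note that in a crontab the `%` character must be escaped as `\%`, which is why the entry above reads `date +\%s` and `\% 2`.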

Shrink a disk

February 6, 2018 - Reading time: 6 minutes

This was tested on RHEL7 but should work on any recent Linux.

In case you have a disk that is too large, or you want to roll back an expansion because, for instance, the extra disk space is no longer needed, you can follow this guide.

! Always make sure you have a backup; these are dangerous commands that can lead to data corruption

There are a lot of tricks you can use, but these are the ones I tested. Shifting data around between swapped-in disks should be the safest way, but Red Hat's implementation of GRUB and the initramfs has some drawbacks which, during my tests, led to an unbootable machine. This article describes how I did a move (over the network if you like) on a Debian-based setup.

First things first, resize the actual file system(s). If you can't unmount the partition, start from a rescue disk (gparted-live for instance) to do this part and return to the installed system afterwards. Take a margin of 10% to make sure we don't have any conversion issues resulting in data loss (repeat this for all partitions).
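One way to read that 10% margin, assuming the LV will end up at 1000M (the numbers are illustrative; the later examples use 900M directly):

```shell
#!/bin/bash
# Illustrative margin calculation: shrink the FS to 90% of the target LV size,
# then grow it back to fill the LV once the lvreduce has succeeded.
target_mb=1000                     # size the LV will be reduced to
fs_mb=$(( target_mb * 90 / 100 ))  # filesystem target with a 10% safety margin
echo "shrink filesystem to ${fs_mb}M, then lvreduce to ${target_mb}M"
```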

If you are on XFS or another filesystem that doesn't support shrinking, you can use fstransform (in fact, you can always use it). fstransform is in the EPEL repository, but you can also just download the RPM and install it with yum install.

If you want to keep xfs

fstransform /dev/[volumegroup]/[logicalvolume] --new-size=[size] xfs

If you would rather have ext4

fstransform /dev/[volumegroup]/[logicalvolume] --new-size=[size] ext4

for ext4

e2fsck -f /dev/[volumegroup]/[logicalvolume]
resize2fs -p /dev/[volumegroup]/[logicalvolume] [size]
 
example:
resize2fs -p /dev/mapper/vg_system_lv_tmp 900M

Next we are going to reduce the LVM volume (if you can't manage the downtime, you can do this part with the FS mounted). Repeat this for all logical volumes.

lvreduce -L [newlvsize] /dev/[volumegroup]/[logicalvolume]
 
example:
lvreduce -L 900M /dev/mapper/vg_system_lv_tmp

Next, if you want to drop a disk, now is the moment to clean it out (if your LVM is on a single disk, you can skip this part).

pvmove /dev/[the disk you want to remove]
vgreduce [volumegroup] /dev/[the disk you want to remove]
pvremove /dev/[the disk you want to remove]

Now you can drop/reuse this disk.

To shrink the LVM part on a disk with a partition table, you should first check the used and total extents and ask LVM to move all used extents to the front of the LVM partition

vgdisplay
pvmove --alloc anywhere /dev/[partition]:[alloc PE + 1]-[total PE - 1]

Now check that all used extents are at the front of the disk and the only free portion is at the end. If not, repeat the previous pvmove command until you get the desired result.

pvs -v --segments

This is how it should look: all used segments in one contiguous block at the start of the PV, with a single free segment at the end.

And now you can shrink the physical volume to match the wanted disk size

pvresize --setphysicalvolumesize [size] /dev/[lvm partition]
 
example:
pvresize --setphysicalvolumesize 38G /dev/sda2

Next, shrink the partition so it fits within the target disk size; after that you can shrink the virtual disk itself. Check the extents of another virtual machine with the wanted disk size if you want to push it to the limit; otherwise keep a margin. I used 40G as an example.

fdisk /dev/sda
*d* -> delete partition 2 (if that is the LVM one)
*n* -> create a new primary partition 2 with the default suggested start sector, but for the end sector enter 83886079 (for a 40G total disk with a 500M boot partition)
*w* -> write the partition table and exit
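The end sector 83886079 is plain arithmetic: a 40G disk holds 40 × 2^30 / 512 = 83886080 sectors of 512 bytes, and sectors are numbered from 0. A sketch you can adapt for other disk sizes (assuming 512-byte sectors):

```shell
#!/bin/bash
# Last usable sector for a given disk size, assuming 512-byte sectors.
disk_gib=40
sector_size=512
total_sectors=$(( disk_gib * 1024 * 1024 * 1024 / sector_size ))
end_sector=$(( total_sectors - 1 ))        # sectors are numbered from 0
echo "end sector: $end_sector"             # 83886079 for a 40G disk
```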

Now resize the VM disk to the same size (40G in this example). Next you can use lvextend and resize2fs to fill up the margin again to get the desired result.

Since it seems impossible to shrink a disk in Hyper-V (Gen 1), I added a 40G disk, started the live CD again and ran:

dd if=/dev/sda of=/dev/sdb

Next I removed the first disk (if you want to be safe, press remove but select "no" when Hyper-V asks whether to remove the disk from the host too), made the new disk the primary disk, marked it "contains the operating system for the virtual machine", and it booted fine.

pvresize /dev/[lvm partition]
lvresize -L [oldlvsize] /dev/[volumegroup]/[logicalvolume]
resize2fs /dev/[volumegroup]/[logicalvolume]

And finally, delete the old disk from the host if you kept it. To do this, go to the host running the VM, open the directory shown in the VM's disk settings, and delete the old disk file with shift+delete.


Simple script to get the IP of a list of hostnames (update)

December 18, 2017 - Reading time: ~1 minute

I made a file called to_find_ip with a hostname on every line, and a simple bash script to process the file and return the matching IP.

The script, called get_ip_for_list_of_hostnames.sh (use getent ahostsv4 if you only need IPv4 addresses)

#!/bin/bash
# Resolve each hostname in the input file to its IP address
while read -r host; do
    getent hosts "$host" | cut -f1 -d' '
done < "$1"

To run the script:

bash /tmp/get_ip_for_list_of_hostnames.sh /tmp/to_find_ip
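If you also want to see which hostnames failed to resolve, a small variation keeps the hostname next to the result (a sketch; the NOT_FOUND label is just a placeholder I picked):

```shell
#!/bin/bash
# Print "hostname<TAB>IP" for each line in the input file; mark failures.
while read -r host; do
    ip=$(getent hosts "$host" | awk '{print $1; exit}')
    printf '%s\t%s\n' "$host" "${ip:-NOT_FOUND}"
done < "$1"
```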

Swap clean up script (update)

October 3, 2017 - Reading time: ~1 minute

Today I had a machine that had no swap space left, which in turn triggered a monitoring alert. The memory pressure was already gone, but the swap wasn't being cleared fast enough, so I searched and found this script as a nice solution. The script was a bit older and was made for a version of free that separated buffers and caches. In newer versions they are combined as "buff/cache" and a new field shows the combined available memory, which gave me a chance to simplify the script.

clearswap.sh

#!/bin/bash

free_mem="$(free | grep 'Mem:' | awk '{print $7}')"
used_swap="$(free | grep 'Swap:' | awk '{print $3}')"

echo -e "Free memory:\t$free_mem KiB ($((free_mem / 1024)) MiB)\nUsed swap:\t$used_swap KiB ($((used_swap / 1024)) MiB)"
if [[ $used_swap -eq 0 ]]; then
    echo "Congratulations! No swap is in use."
elif [[ $used_swap -lt $free_mem ]]; then
    echo "Freeing swap..."
    sudo swapoff -a
    sudo swapon -a
else
    echo "Not enough free memory. Exiting."
    exit 1
fi

Thanks to Scott Severance for the initial script.


Ansible check if machines can connect to a port on a server

September 12, 2017 - Reading time: ~1 minute

I had to test whether all hosts were able to connect to a certain port on a certain server. Ansible was the perfect tool, but since not all machines have nc or nmap installed, I made a workaround using Python. I will be checking whether TCP port nagios.company.com:5666 is open.

the script, nagios.py (it will be copied to and executed on the remote host by Ansible). This only works for TCP; changing SOCK_STREAM to SOCK_DGRAM will always return 0.

#!/usr/bin/python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)  # don't hang forever on filtered ports
result = sock.connect_ex(('nagios.company.com', 5666))
sock.close()
if result == 0:
    print("OK")
else:
    print("Not OK")

the playbook, nagios.yml

- name: Nagios connectivity test
  hosts: all
  tasks:
    - name: script
      script: /tmp/nagios.py
      register: nagios
    - debug: msg="{{ nagios.stdout }}"

the run command to filter the hosts that can't connect

ansible-playbook /tmp/nagios.yml | grep -B1 Not\ OK
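As an alternative, if you are on a reasonably recent Ansible, the built-in wait_for module can run the same TCP check without shipping a script. A sketch (untested here; the host, port and timeout mirror the example above):

```yaml
- name: Nagios connectivity test (wait_for variant)
  hosts: all
  tasks:
    - name: check TCP connectivity to nagios.company.com:5666
      wait_for:
        host: nagios.company.com
        port: 5666
        timeout: 5
        state: started
```

wait_for executes on the managed node, so the connection is attempted from each host, just like the Python script; hosts that can't connect will simply fail the task.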

About

Koen Diels

I'm a freelance system and network engineer from Mechelen (BE) and I'm available for ad-hoc and long term projects.

>>my resume<<
