Proxmox 4.1 kernel panic: downgrade DRBD resources from DRBD 9 to 8.4

I've upgraded my Proxmox environment to the latest 4.1, featuring the brand new (and still in preview state) DRBD 9.

The Proxmox documentation is not very clear about its preview state, and users have no choice about which DRBD version to use: Proxmox 4.1 only ships DRBD 9.

That said, once the upgrade was complete I found that DRBD 9 was the cause of host node crashes (I mean the whole host crashed with a kernel panic, together with all hosted VMs).
The issue has been reported by me and another user here:
https://forum.proxmox.com/threads/kernel-panic-with-proxmox-4-1-13-drbd-...
http://www.gossamer-threads.com/lists/drbd/users/27685

It seems that the only way out is to downgrade DRBD resources to version 8.4.
I've done it and I've had no issues since (my environment has been running flawlessly for a week).

NOTE: since Proxmox doesn't support 8.4 anymore, you have to build it from sources and replace the original one installed from Proxmox repositories.

System configuration: two nodes Proxmox 4.1, LVM on DRBD resource r0, DRBD 9.0, kernel pve-kernel-4.2.8-1-pve.

Downgrade running kernel from DRBD 9.x to DRBD 8.4

This part describes the procedure to downgrade from a running DRBD 9.x to DRBD 8.4 on the same kernel version.
If you're going to upgrade your kernel (already running a DRBD 8.4 module) to a newer version please follow the chapter below.

We'll have to downgrade one node at a time; let's start with NodeA.

Initialization

Define a variable containing the kernel version you're compiling for. If this is the first downgrade (still running the bundled DRBD 9.0), KVER must be set to the current kernel version:

export KVER=`uname -r`

Install build tools, DRBD sources and kernel headers

apt-get install build-essential flex
apt-get install pve-headers-$KVER
cd /usr/src
wget http://oss.linbit.com/drbd/8.4/drbd-8.4.7-1.tar.gz
wget http://oss.linbit.com/drbd/drbd-utils-8.9.6.tar.gz
tar zxvf drbd-8.4.7-1.tar.gz
tar zxvf drbd-utils-8.9.6.tar.gz

Build DRBD module and utils

NOTE: configure scripts will automatically use the $KVER variable defined above.

cd /usr/src/drbd-8.4.7-1
make clean
cd drbd
make
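Before installing the freshly built module it may be worth confirming that it targets the intended kernel. The following is a minimal sketch, not part of the DRBD build: module_matches_kernel is a hypothetical helper name, and it only assumes that modinfo is available and that the first word of the vermagic field is the kernel release.

```shell
# Hypothetical helper (not part of DRBD): succeed only when the module's
# vermagic names the kernel release we are building for.
module_matches_kernel() {
    module_file=$1
    kver=$2
    # vermagic looks like "4.2.8-1-pve SMP mod_unload modversions"
    vermagic=$(modinfo -F vermagic "$module_file") || return 2
    # Keep only the first word (the kernel release) and compare it.
    [ "${vermagic%% *}" = "$kver" ]
}
```

Possible usage: module_matches_kernel /usr/src/drbd-8.4.7-1/drbd/drbd.ko "$KVER" || echo "drbd.ko was built for another kernel"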

you can now (optionally) strip the binaries and make them smaller

strip --strip-unneeded drbd.ko

now build the userland utils

cd /usr/src/drbd-utils-8.9.6
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --without-83support --with-84support --without-manual --with-distro=debian
make clean
make

and (optionally) strip the binaries to make them smaller

strip --strip-unneeded drbdadm-84
strip --strip-unneeded drbdsetup-84
strip --strip-unneeded drbdmeta

Move all the VMs to NodeB and shutdown resource

Move all of the VMs using the resource we're going to downgrade to the other node, then demote and deactivate the resource.

drbdadm secondary r0
drbdadm down r0

Replace the bundled DRBD 9.x module with our own 8.4 version

rmmod drbd_transport_tcp
rmmod drbd
cd /lib/modules/$KVER/kernel/drivers/block/drbd
mv drbd.ko drbd.ko-9.0.0
mv drbd_transport_tcp.ko drbd_transport_tcp.ko-9.0.0
cp /usr/src/drbd-8.4.7-1/drbd/drbd.ko .
modprobe drbd
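After the modprobe, a quick way to confirm which module version is actually loaded is to read the "version:" line from /proc/drbd. This is only a sketch: drbd_loaded_version is a hypothetical helper name, and the line format in the comment is what an 8.4 module is assumed to print.

```shell
# Hypothetical helper: print the version reported by the loaded DRBD module.
# The file argument is optional and defaults to /proc/drbd.
drbd_loaded_version() {
    proc_file=${1:-/proc/drbd}
    # First line is assumed to look like: version: 8.4.7-1 (api:1/proto:86-101)
    awk '$1 == "version:" { print $2; exit }' "$proc_file"
}
```

Possible usage: case "$(drbd_loaded_version)" in 8.4.*) echo "8.4 module loaded" ;; *) echo "unexpected DRBD module version" ;; esac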

Replace DRBD 9.x tools with version 8.4

cd /usr/sbin
mv drbdadm drbdadm-9.0.0
mv drbdmeta drbdmeta-9.0.0
mv drbd-overview drbd-overview-9.0.0
mv drbdsetup drbdsetup-9.0.0
cp /usr/src/drbd-utils-8.9.6/user/v84/drbdadm-84 .
ln -s drbdadm-84 drbdadm
cp /usr/src/drbd-utils-8.9.6/user/v84/drbdsetup-84 .
ln -s drbdsetup-84 drbdsetup
cp /usr/src/drbd-utils-8.9.6/user/v9/drbdmeta .
cp /usr/src/drbd-utils-8.9.6/scripts/drbd-overview.pl drbd-overview
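Since the tools are swapped by hand, a small check that the links really point at the 8.4 binaries can catch a missed step. A minimal sketch follows; tools_point_to_84 is a hypothetical helper name, and the directory argument exists only so the check can be tried outside /usr/sbin.

```shell
# Hypothetical helper: verify that drbdadm and drbdsetup are symlinks to the
# -84 binaries and that drbdmeta is present and executable.
tools_point_to_84() {
    sbin_dir=${1:-/usr/sbin}
    [ "$(readlink "$sbin_dir/drbdadm")" = "drbdadm-84" ] &&
    [ "$(readlink "$sbin_dir/drbdsetup")" = "drbdsetup-84" ] &&
    [ -x "$sbin_dir/drbdmeta" ]
}
```

Possible usage: tools_point_to_84 || echo "DRBD tools are not linked to the 8.4 binaries"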

NOTE: in my setup I also had to edit the resource configuration file and comment out the lines with "node-id: xxx;" parameters.
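Commenting out the node-id lines can be scripted. The sketch below is an illustration only: comment_out_node_id is a hypothetical helper name, the resource file path in the usage line is an assumption about where r0 is defined, and sed -i is the GNU sed option already used elsewhere in this guide. It keeps a copy of the original file with a .drbd9 suffix.

```shell
# Hypothetical helper: back up a resource file, then comment out every
# line that starts (after optional indentation) with "node-id".
comment_out_node_id() {
    res_file=$1
    # Keep the DRBD 9 version of the file next to the edited one.
    cp -p "$res_file" "$res_file.drbd9" || return 1
    # Insert "#" between the indentation and the node-id keyword.
    sed -i 's/^\([[:space:]]*\)\(node-id\)/\1#\2/' "$res_file"
}
```

Possible usage (path is an assumption): comment_out_node_id /etc/drbd.d/r0.res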

DRBD utils warning

DRBD utils now emit a warning like this:

DRBD module version: 8.4.7
   userland version: 8.9.6
please don't mix different DRBD series.

That's because drbd-utils-8.9.6 targets the 9.x series, so it warns when used with earlier ones.
It is safe to ignore it but, if you don't feel comfortable, you can suppress the message by defining an environment variable:

export DRBD_DONT_WARN_ON_VERSION_MISMATCH=1

The relevant code is in drbdadm_main.c, line 3597:

if (!getenv("DRBD_DONT_WARN_ON_VERSION_MISMATCH"))
    warn_on_version_mismatch();

Then each time you run drbdadm, the version mismatch warning won't be shown anymore.
You can add it to /etc/environment to have it defined for each new shell.
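Adding the variable to /etc/environment can be done so that repeated runs don't create duplicate lines. This is a minimal sketch: add_env_once is a hypothetical helper name, and it assumes the file holds plain NAME=VALUE lines (so the line is written without the export keyword).

```shell
# Hypothetical helper: append a NAME=VALUE line to an environment file
# only when that exact line is not already present.
add_env_once() {
    env_file=$1
    line=$2
    grep -qxF -- "$line" "$env_file" 2>/dev/null ||
        printf '%s\n' "$line" >> "$env_file"
}
```

Possible usage: add_env_once /etc/environment DRBD_DONT_WARN_ON_VERSION_MISMATCH=1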

Downgrade resource metadata

drbdadm create-md r0

It will ask you for confirmation before downgrading the metadata from v90 to v84 format.

Restart DRBD service

/etc/init.d/drbd restart

Check resource status

The resource should be up and running (and resyncing), otherwise bring it up.

drbdsetup status --verbose
drbdadm up r0
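To tell when the resync has finished, the device lines in /proc/drbd can be checked from a script. The sketch below assumes the 8.4 /proc/drbd layout (one line per device with cs: and ds: fields); drbd_all_uptodate is a hypothetical helper name, not a DRBD command.

```shell
# Hypothetical helper: succeed only when at least one device is listed and
# every listed device is Connected with UpToDate data on both sides.
drbd_all_uptodate() {
    proc_file=${1:-/proc/drbd}
    awk '
        # Device lines are assumed to look like:
        #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        /^ *[0-9]+: cs:/ {
            total++
            if ($0 ~ /cs:Connected / && $0 ~ /ds:UpToDate\/UpToDate /) good++
        }
        END { exit !(total > 0 && total == good) }
    ' "$proc_file"
}
```

Possible usage: until drbd_all_uptodate; do sleep 10; done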

Downgrade the other node

You can follow the same guide to downgrade the other host or, if both nodes share the same kernel and hardware, you can simply copy the compiled binaries over to it and install them; there is no need to install the build tools there.
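If you go the copy route, the files this guide installs by hand can be collected into a single tarball first. This is only a sketch under the directory layout used above; bundle_drbd84 is a hypothetical helper name, and how the tarball reaches the other node (scp, rsync, ...) is left open.

```shell
# Hypothetical helper: gather the built module and tools into one tarball.
bundle_drbd84() {
    module_src=$1   # e.g. /usr/src/drbd-8.4.7-1
    utils_src=$2    # e.g. /usr/src/drbd-utils-8.9.6
    out=$3          # tarball to create
    stage=$(mktemp -d) || return 1
    # Copy the same files the manual install steps use.
    cp "$module_src/drbd/drbd.ko" \
       "$utils_src/user/v84/drbdadm-84" \
       "$utils_src/user/v84/drbdsetup-84" \
       "$utils_src/user/v9/drbdmeta" "$stage/" &&
    cp "$utils_src/scripts/drbd-overview.pl" "$stage/drbd-overview" &&
    tar -czf "$out" -C "$stage" .
    rc=$?
    rm -rf "$stage"
    return $rc
}
```

Possible usage: bundle_drbd84 /usr/src/drbd-8.4.7-1 /usr/src/drbd-utils-8.9.6 /root/drbd84-$KVER.tar.gz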

 

Kernel updates

Each new kernel you'll install in the future needs a downgraded DRBD 8.4 module to be built for it.
The procedure is almost identical with some small changes.

Install new kernel & headers

Install the new kernel and its headers (e.g. version 4.99.99)

apt-get install pve-kernel-4.99.99-pve pve-headers-4.99.99-pve

Define a variable containing the target kernel version.
Since we're compiling for a kernel not currently running, KVER must be set manually:

export KVER=4.99.99-pve
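Because KVER is typed by hand here, a typo would only show up later as a confusing build error. A minimal sketch of a sanity check follows; headers_ready is a hypothetical helper name, and the default prefix is the /usr/src/linux-headers-$KVER location that this guide passes as KDIR.

```shell
# Hypothetical helper: succeed when a headers directory exists for the
# given kernel version. The prefix can be overridden for testing.
headers_ready() {
    kver=$1
    prefix=${2:-/usr/src/linux-headers-}
    [ -d "$prefix$kver" ]
}
```

Possible usage: headers_ready "$KVER" || echo "no headers found for $KVER, check the version string"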

If there's a newer DRBD version available, you could update its sources too:

cd /usr/src
wget http://oss.linbit.com/drbd/8.4/drbd-X.X.X.tar.gz
wget http://oss.linbit.com/drbd/drbd-utils-Y.Y.Y.tar.gz
tar zxvf drbd-X.X.X.tar.gz
tar zxvf drbd-utils-Y.Y.Y.tar.gz

Build DRBD module and utils

NOTE: configure scripts will automatically use the $KVER variable defined above, but building the module requires specifying the KDIR parameter manually.

Build drbd module:

cd /usr/src/drbd-8.4.7-1
make clean
cd drbd
make KDIR=/usr/src/linux-headers-$KVER

you can now (optionally) strip the binaries to make them smaller

strip --strip-unneeded drbd.ko

NOTE: building drbd-8.4.7-1 on some 4.4 kernels could fail with the error:
drbd_bitmap.c:1033:60: error: ‘__GFP_WAIT’ undeclared (first use in this function)
In that case you need to apply a small patch (see here: http://www.engisoftcloud.com/2016/04/07/instalacion-drbd-en-amazon-linux...):

find /usr/src/drbd-8.4.7-1 -type f -exec sed -i 's/__GFP_WAIT/__GFP_RECLAIM/g' {} \;

UPDATE: kernel 4.4.13-2-pve compiled successfully without this patch
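Since the sed replacement rewrites files in place across the whole source tree, it can help to preview which files it would touch, and to confirm afterwards that nothing is left. A minimal sketch, where gfp_wait_files is a hypothetical helper name:

```shell
# Hypothetical helper: list the files under a directory that still mention
# __GFP_WAIT, i.e. the ones the sed replacement would modify.
# Returns non-zero when no file matches.
gfp_wait_files() {
    grep -rl '__GFP_WAIT' "$1"
}
```

Possible usage: gfp_wait_files /usr/src/drbd-8.4.7-1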

Then build drbd-tools:

NOTE: this step is optional if DRBD tools sources were not updated.

cd /usr/src/drbd-utils-8.9.6
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --without-83support --with-84support --without-manual --with-distro=debian
make clean
make

and (optionally) strip the binaries to make them smaller

strip --strip-unneeded drbdadm-84
strip --strip-unneeded drbdsetup-84
strip --strip-unneeded drbdmeta

Replace the bundled DRBD 9.x module with our own 8.4 version

cd /lib/modules/$KVER/kernel/drivers/block/drbd
mv drbd.ko drbd.ko-9.0.0
mv drbd_transport_tcp.ko drbd_transport_tcp.ko-9.0.0
cp /usr/src/drbd-8.4.7-1/drbd/drbd.ko .

Replace DRBD 9.x tools with version 8.4

cd /usr/sbin
mv drbdadm drbdadm-9.0.0
mv drbdmeta drbdmeta-9.0.0
mv drbd-overview drbd-overview-9.0.0
mv drbdsetup drbdsetup-9.0.0
cp /usr/src/drbd-utils-8.9.6/user/v84/drbdadm-84 .
ln -s drbdadm-84 drbdadm
cp /usr/src/drbd-utils-8.9.6/user/v84/drbdsetup-84 .
ln -s drbdsetup-84 drbdsetup
cp /usr/src/drbd-utils-8.9.6/user/v9/drbdmeta .
cp /usr/src/drbd-utils-8.9.6/scripts/drbd-overview.pl drbd-overview

Reboot the node and check resource status

Move all the virtual machines to the other node(s) and reboot this node.
When it comes back up, check that the DRBD resources are up and running on the newer kernel.

uname -r
--> must print new kernel version!
 
drbdsetup status --verbose
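The kernel check can also be scripted. Note that the KVER variable exported earlier does not survive the reboot, so the expected version has to be passed in again. This is a minimal sketch; running_expected_kernel is a hypothetical helper name, and its second argument exists only to make it testable.

```shell
# Hypothetical helper: compare the running kernel release with the one
# the DRBD module was built for.
running_expected_kernel() {
    expected=$1
    actual=${2:-$(uname -r)}
    if [ "$actual" = "$expected" ]; then
        echo "OK: running $actual"
    else
        echo "WARNING: running $actual, expected $expected" >&2
        return 1
    fi
}
```

Possible usage: running_expected_kernel 4.99.99-pve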

Upgrade the other node

Follow this same guide to upgrade the other host's kernel.
If both nodes have the same kernel and hardware, you can simply copy the compiled binaries over and install them; there is no need to install the build tools.
Beware: copy the *-84 binaries, then recreate the links as above.

 

I hope these instructions help other users experiencing the same issue...

03 Aug 2016
- just updated to kernel 4.4.13-2-pve without issues, using these same instructions

20 May 2016
- added instructions to disable drbd-utils warning message

12 Apr 2016
- updated procedure (thanks to Jean-Laurent Ivars' suggestions)
- added the kernel update procedure

02 May 2016
- added workarounds to build modules for kernel 4.4

05 May 2016
- added optional binary strip commands

06 May 2016
- removed sudo usage (not installed on proxmox by default)
- utils binary update is not optional but mandatory on kernel update

Comments

Hello :)

After having successfully followed your procedure for the initial downgrade, I followed your procedure for kernel upgrades:
 
- I ran apt-get dist-upgrade, which installed the latest kernel version: 4.4.6-1-pve
- I did not forget to install pve-headers-4.4.6-1-pve too
- I used your tip: export KVER=4.4.6-1-pve
 

But I can’t get make to use the right kernel version. When I launch the compilation, I can clearly see it’s using the currently running kernel version (4.2.8-1-pve). I opened the makefile and saw the line KVER = `uname -r`, so I commented it out, but that changed nothing. I even tried putting the right value directly in the makefile, but it ignores these values and uses the running kernel's ones:

[email protected] /usr/src/drbd-8.4.7-1/drbd # make

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/4.2.8-1-pve/build

make -C /lib/modules/4.2.8-1-pve/build   SUBDIRS=/usr/src/drbd-8.4.7-1/drbd  modules
  CHK     /usr/src/drbd-8.4.7-1/drbd/.compat_test.have_bdev_discard_alignment.result
  UPD     /usr/src/drbd-8.4.7-1/drbd/.compat_test.have_bdev_discard_alignment.result

I don’t know what else to do. I don’t dare reboot to compile in the right environment; I’d really prefer to compile everything correctly before rebooting…
 
Thanks in advance for your answer! (I’m going to post this as a comment on your website, as it may help other people.)
Best regards

Thanks for your comment, I've updated the post to let the build complete successfully for 4.4 kernels.

Hello,
 
I did a fresh install of Proxmox 4.2 and tried downgrading DRBD to 8.4.
I ran all the commands, except for moving the VMs.
I get the following message when I restart drbd:

  [email protected]:/usr/sbin# /etc/init.d/drbd restart
  DRBD module version: 8.4.7
     userland version: 8.9.6
  please don't mix different DRBD series.
 
Is this normal, or did I miss something?
Thank you, best regards.
 

That message is normal: drbd-utils-8.9.6 targets the 9.x series, so it warns when used with earlier ones. I can live with it, but there's a way to disable it; I've updated the blog post with instructions on how to do it.

I thought so, but I wondered whether this would interfere with the proper functioning of DRBD 8.4. Thank you for this very clear and useful tutorial :)

Hello,

I recently (just a few days ago) upgraded the kernel to 4.4.8-1-pve following the upgrade procedure, and thanks to it everything went perfectly fine (thanks). But today when I ran apt-get dist-upgrade, I saw in the list that the system wants to upgrade my kernel again, even though it's the same version number!

uname -r
4.4.8-1-pve
 
apt-get dist-upgrade 
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
  pve-docs
The following packages will be upgraded:
  libexpat1 libpve-common-perl libpve-storage-perl proxmox-ve pve-cluster pve-container
  pve-firewall pve-headers-4.4.8-1-pve pve-kernel-4.4.8-1-pve pve-manager pve-qemu-kvm qemu-server
12 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 62.9 MB of archives.
After this operation, 5,823 kB of additional disk space will be used.
Do you want to continue? [Y/n]

I don't really understand how a kernel with the same version number can appear in this list, but it seems the command "apt-cache show pve-kernel-4.4.8-1-pve" gives more information: there are more version numbers than the one shown (-49, -50, -51...). So I suppose it's not the same version and the procedure has to be followed one more time...?

It's not that it's long or complicated, but it's always a very stressful moment for me because it's a production cluster, and I don't feel comfortable having my production depend on a compilation going well :(

 

Thanks in advance for your answer

That should be the version of the package, and it's up to the package maintainer.
You could create a package containing kernel 4.4.8, and name it pve-kernel-4.4.8-pve.
Then apply a patch to the kernel and bring it to version 4.4.8-1, then build a package pve-kernel-4.4.8-1-pve.

Now you find a minor error/bug/discrepancy inside your package or in its install/post-install script; this is completely unrelated with the contained kernel, but you need to rebuild a new package and let your users install it.
That's what the last part of the version number is usually used for.

IMHO this should also be included in the package name, like pve-kernel-4.4.8-1-51-pve (as the Ubuntu team does); anyway, it's up to the package maintainer...

I understand, thank you for the clarification :)

regards,

Hello Claudio,

How are you ? I hope good :)

There is again a new kernel ! 

I’m tired of holding my breath every time there is an upgrade, and I was wondering if you have recently given version 9 another try?

A few weeks ago I was in contact with someone from Linbit, who told me that the bug we had has been clearly identified and even fixed in the latest dev versions of drbd9. I don’t know exactly which drbd9 version ships with Proxmox, but could the kernel panic bug be gone? (Which wouldn’t necessarily mean there are no other issues, though.)

I just installed pve in a vm to see the drbd used module info :

[email protected]:~# modinfo drbd
filename:       /lib/modules/4.4.10-1-pve/kernel/drivers/block/drbd/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        9.0.2-1
description:    drbd - Distributed Replicated Block Device v9.0.2-1

I don’t know if the kernel hang bug is fixed in this version… but it seems to be the latest dev version according to this page: http://www.drbd.org/en/community/download

I don’t have time to test it right now, but as soon as I do I'll rent two servers for a week (you can do this with OVH) and give it a try, because I’m tired of freaking out over every update on my production system…

Best regards.

Hi jeanlau,

no, I haven't upgraded to the new kernel and, most of all, I haven't gone back to DRBD 9.x.
I think I won't do it until forced (i.e. the 8.4 series won't compile with newer kernels anymore).

I'm actually completely satisfied with my setup and I'm not going to change it much.
My hosts are isolated from outside world, so kernel upgrades are not my priority.

Anyway I'm interested in your test results: if the bug we're experiencing is fixed in current DRBD9 versions, well, I'll schedule an upgrade as soon as possible.

Cheers
Claudio

Hi Claudio,

I followed the tutorial but I did not have the file /etc/init.d/drbd

So, to have drbd started at every reboot, I did this:
apt-get install drbd8-utils    (before compiling the 8.4 module)
update-rc.d drbd defaults

Now I have the following message:

[email protected]: ~ # /etc/init.d/drbd status
● drbd.service - DRBD - please disable. UNLESS you are NOT using a cluster manager.
    Loaded: loaded (/lib/systemd/system/drbd.service; disabled)
    Active: active (exited) since Fri 2016-07-08 2:49:30 p.m. EST; 3 weeks 4 days ago
  Main PID: 3908 (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/drbd.service

Maybe there is a better solution to have drbd start at every reboot, or something to modify in a config file?
thank you
best regards

It seems you're missing DRBD configuration and it warns about it at start.

Please note that this tutorial is for downgrading an already existing and well configured DRBD installation on a ProxmoxVE host.
It is not enough for using DRBD on a clean system.

You'd better read the DRBD9 page on the Proxmox wiki and check your config before trying to downgrade (if needed).

Hi,
for information, DRBD has been updated to version 8.4.8-1.
I compiled it with the latest kernel 4.4.15-1-pve without the patch. No problem, it works :)
It seems there are optimisations for 4.x kernels in the changelog.

Thanks for your comment, will update the post...

Hi Claudio

Thanks a lot for this tutorial. I spent several days trying to set up drbd9 before finding your tutorial, which solves all the problems with this downgrade.

Just a question about the drbd utils install: why do you copy the new binaries over the original drbd utils install instead of running make install? I'm not familiar with Debian and its packaging (I use RPM-based distros), but running:

1) # apt-get remove drbdmanage drbd-utils
2) # cd /usr/src/drbd-utils-8.9.6
3) # ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --without-83support --with-84support --without-manual --with-distro=debian
4) # make clean
5) # make
6) # make install

seems to do the job.

Any subtle difference ?

Well, your list should be the one to follow BUT it could have some drawbacks:

1) This could remove other dependent packages, or at least mark them as "removable".

6) Make install does not only install binaries; it could do anything related to the software being installed: patching config files, updating libs config, cleaning something up.

This is why I'd prefer a less invasive approach: replacing only the needed binaries and keeping a copy of the old ones.

PS: the best solution should be to have backported packages like drbd-84 and drbd-utils-84 coming from official Proxmox repos...

Hi everyone !

I'm tired of going through all these steps every time a kernel upgrade comes... (and feeling so uncomfortable, I'm on a production cluster...)

It would be such peace of mind if I could update my system without all these steps...

Has anyone given the version included in the latest Proxmox another try? Apparently there's a new release of drbd (9.0.5) that could have fixed the issue...

Thank you very much for your feedback and, until then, thank you so much Claudio for your help!

regards,

I agree, but I don't have any experience with newer versions because I'm afraid of going back to the kernel panic nightmare...

Hope someone will jump in here and post a positive feedback ;)

I was thinking about renting 2 or 3 servers at OVH for a week to give this another try, but I remember I already did that a few months ago and it was a complete waste of time and money... two things I don't have much of ;) I'm still hesitating...

 
