My witness node on WSL2 on a Windows 10 Home PC has been running for 1.5 months now and has been nicely stable.
The only outages were caused by my kids pressing buttons (now covered) and scheduled maintenance.
I have a plan to turn this machine into an API node and run the witness node on a less beefy machine.
But first it needed upgraded Storage and RAM, as Hive node performance is very dependent on these two things.
Upgrade of SSD Storage
For the first 6 weeks my witness was using my 500Gb NVMe C: drive running on a Gen 2 PCIe x2 slot.
Given the blockchain is 330 Gb by itself and snapshots are about 3Gb each, I was running out of storage space fast.
Also, the Gen 2 PCIe x2 was limiting performance.
I had an Acorn Nest PCIe card that had two M.2 slots (PCIe Gen 3 x4) and a PLX switch (to allow bifurcation) lying around so I thought I'd buy two 1Tb NVMe drives and create a 2TB RAID 0 array as super fast storage for my Hive node.
Thanks to @brianoflondon I was about to get two of these babies for $105 each.
I installed them in the Acorn Nest card and plugged them into a 16x Gen 3 PCIe slot on the X99 motherboard of my Windows Home PC server.
The X99 Motherboard only supports SATA RAID so I used Windows Disk Manager to create a stripped array (AKA RAID 0).
It is software RAID but according to reports its performance is just as good as hardware RAID.
I then ran CrystalDiskMark and was blown away - Almost 5GB/s read and 4Gb/s write speeds!!
More than 5 times faster than my 500Gb NVMe C: drive.
Storage hardware upgrade done
Upgrade of RAM to 128Gb
The X99 motherboard has 8 RAM slots and I had 96 GB of DDR4 DRAM in place. But I had another two 16Gb DRAM sticks of the same type, which I had been using to test other machines. I decided to put it back in to upgrade to 128 Gb RAM.
Unfortunately when I first installed it the system didn't recognise it and after an hour playing around I realised I had to take out RAM and reinstall it in a specific order for it to be all recognised.
RAM upgrade done
Moving WSL2 to new RAID 0 Array
I now had to move my WSL2 from the 500GB C: drive to the new 2TB RAID 0 F: drive.
I did a lot of research on the internet to find out how to do this - you can't just move the WSL2 virtual hard disk in Windows Explorer.
I eventually worked out what seemed a quite simple process using the WSL import export process and wrote out the commands so I could check them and avoid mistakes.
wsl --export docker-desktop-data docker-desktop-data.tar
wsl --unregister docker-desktop-data
wsl --import docker-desktop-data F:\Shared\WSL2\dockerdata docker-desktop-data.tar --version 2
wsl --export docker-desktop docker-desktop.tar
wsl --unregister docker-desktop
wsl --import docker-desktop F:\Shared\WSL2\dockerdesktop docker-desktop.tar --version 2
wsl --export Ubuntu-20.04 Ubuntu-20.04.tar
wsl --unregister Ubuntu-20.04
wsl --import Ubuntu-20.04 F:\Shared\WSL2\Ubuntu Ubuntu-20.04.tar --version 2
The whole process should have taken less than an hour but I put off doing this until it was 11am and I had most of a day free.
Everything went fine with the export and re-import to F: of the dockerdata and dockerdesktop.
Ubuntu exported fine.
It took about 20 min because the .tar file created was 330 Gb because it contained the Hive block_log.
Then I came to the final import.
It ran for about 15 minutes and then failed with "Unspecified Error".
How helpful of Microsoft!
I looked it up on the internet and found suggestions to import as WSL version 1 and then convert to version 2.
So I did that.
The import worked but the conversion failed with a message regarding write fail on the block_log file.
The process was using 85% of my then available 96GB RAM so I decided to do the RAM upgrade described above at this point.
I tried both direct import and WSL1 import and conversion again with more RAM. Same problem.
I tried increasing the size of the swap file. Tried both again - same problem.
I started to suspect that the problem might be related to the 256GB standard limit on WSL2 virtual hard disks.
Normally you can increase the maximum size of the virtual hard disk via a series of commands (instructions in my previous post).
But when importing a WSL2 distro that isn't possible because the virtual hard disk is only created as part of the import process.
But surely Microsoft would have code that checked the size of the import file and adjusted the maximum size of the virtual HD accordingly?
After all, the main reason people want to move their WSL2 distros to another disk is that they are running out of space.
It turns out it is impossible to directly import a WSL2 distro over 256GB in size.
This is a huge Microsoft fail as WSL2 is core Windows functionality used my millions of Devs around the world.
Big distro files are common for all sorts of reasons, not just blockchains.
I devise a solution
After a lot of research I found that there was no documentation of the reason for this problem or its solution.
I was on my own. 7 hours of working on this and a witness that was down as a result and I was really frustrated.
I realised that the problem was the block_log file and noticed that after a WSL1 import it was directly accessible in a F: folder on Windows Explorer. So I decided to copy the block_log file out of WSL and into a normal Windows directory.
The process was slow but it worked.
I deleted the block_log file in the WSL directory and ran the process to convert from WSL1 to WLS2 again.
wsl --set-version Ubuntu-20.04 2
A WSL2 virtual hard disk was created with everything except the block_log.
I then went through the process for expanding the virtual hard disk (linked above).
I then copied the block_log file back into the virtual hard disk from within Ubuntu using cp mnt/f/
But because I am not a Linux expert I made a mistake and used "/" for the destination current directory rather than "."
It all seemed to work. The virtual hard disk was 335Gb in Windows.
I loaded my latest Hive snapshot with
./run.sh loadsnap latestsnapshot2
12 minutes later I get the message that the block_log file is empty. WTF!
After checking the file size via various means I realised that maybe I copied the block_log file to the wrong place.
I go looking for it and finally find it and mv it to the right place.
I loadsnap again and it is 11:45pm and my kids will wake at 6am.
I go to bed, hoping it will finally work.
When all 3 of my little kids wake up screaming at 4am I check.
Hooray!!. My witness node is working again on the new super fast RAID Array.
12 hours of my life to do something that should have taken 1 hour if Microsoft would code properly.
Anyway I'm learning lots of skills as a Hive witness and writing this out is both cathartic and will hopefully help others.
Please vote for my Hive witness. (KeyChain or HiveSigner)