What happened?
Essentially, our database became corrupted, and with the help of poot222 we were able to rebuild it.
We believe the corruption was a result of a bug in the hypervisor that caused the virtual disk to disconnect.
Why did it take so long?
System logs showed the server waiting on disk operations. At the same time, disk usage showed 100% even though no read or write operations were taking place. This happened every few seconds, causing heavy lag on the site.
With the CPU stuck in a wait state, the server becomes totally unresponsive, which also meant it was unresponsive to any debugging I was trying to do.
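For anyone curious how the "CPU in wait state" symptom can be measured, here is a minimal sketch. It is not the tooling used during the outage. It assumes a Linux guest that exposes /proc/stat, and the function names (read_cpu_line, iowait_percent) are made up for illustration.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(value) for value in parts[1:]]

    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Only the first eight fields are summed, since guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

Taking two samples with read_cpu_line() a few seconds apart and passing them to iowait_percent() gives the share of that interval the CPU spent waiting on I/O. A value that stays very high while the disk reports no actual reads or writes would match what the logs showed here.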
Every few hours or so the entire VM would crash (probably causing the database corruption). The logs showed the cause was the system losing its connection to the hard disk.
While this was going on, the host system did not show these loads on either the hard disk or the CPU. It DID, however, show kernel panics related to the hard disk and the hypervisor driver.
I discovered this was a known bug that had been resolved in a newer version. We updated, which prevented further crashes.
Unfortunately the harm was already done and our backups also contained the corrupted tables.
With the server now stable, we had to clean out the database and rebuild it. All is up and running again.
This is where we are at now. Please report any problems you run into.
Thanks to poot22 and Indy for helping to fix this!
Thank you all for being patient!