I’m writing this at 11:20AM EST (ignore whatever the timestamp here says). As of now, the store that built the server is almost done with its tests and hasn’t been able to replicate any of the crashes that I got. At about 1:30 I’ll head down there and walk them through a series of tests that should crash it. If it doesn’t crash, maybe we got lucky and either solidified a loose component or solved the problem with the BIOS and firmware upgrades we did.
So assuming I can’t make it crash in a test environment, I’ll plug the machine back in this afternoon or early evening and see what happens, although I’m fully prepared to build a new machine from scratch if it comes to that (and that’s what I expect will happen, personally). Oh, and if you see people asking elsewhere what’s up with IAM, please let them know that I’m posting updates here on ModBlog.
And I’ll add again for anyone just seeing this that no data is lost: it’s unharmed on the server and also backed up. So when we come back up, there should be no unpleasant surprises.
UPDATE: 1:30PM – The store has been unable to make it crash. So I’m going to go get it, plug it in, and see what happens. If it works, we’re up today. If it doesn’t, we’re up most likely tomorrow (on a new box).
UPDATE: 2:30PM – The site is currently online experimentally on the old hardware. Even though every piece has been stress tested and came out OK, I’m not sold that this will solve it. However, I now have a duplicate of every piece of hardware in this box, so if it doesn’t stay stable tonight I can rebuild it.