Linode crashing

[8135] Linode crashing
Status: OPEN
Opened On: 27-Nov-04
Closed On:
Last Updated: 08-Dec-04
Description:
Hi Chris,

My linode is crashing now and then and not leaving any evidence that I can see. Logview just shows it sitting at the login prompt, and then it's dead. Can you tell anything from your end? So far it's crashed twice, most recently at about 1:10am on 11/27. I'm running 2.6.9, but downgrading isn't an option since I am using device mapper.

Thanks,
Dave
Update by caker
27-Nov-04
Hmm... well, I looked at your previous console log, and all it showed was you halting the system. I have nothing more to go on from my end.

When you say it crashes, does the LPM state your Linode is powered off? There should be *something* in the console logs if there was a kernel crash.

At any rate, there are a whole mess of patches for UML for 2.6 in the queue, but its current state is unstable. I'm waiting for a response on the UML mailing list for a fix for the most recent bug, and once I have it working I'll release a new 2.6-um kernel.

-Chris
Update by davemuench
27-Nov-04
There's nothing more to see from logview; when the linode dies, the last thing seen in the lish log is the login prompt sitting there. No errors, no shutting down, no panicking. Nothing in the linode's syslog either. I was hoping there might be a core or something. LPM states that it's powered off.

I'm going to hack together a script to check its status (from remote) in lish, and issue a boot if it's down. That should get me by until the next 2.6 kernel is available.
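Dave's actual script isn't shown in the ticket. The logic it describes (poll the Linode's state, issue a boot when it reports down) can be sketched as below. This is a minimal sketch, assuming you already have some way to query status and trigger a boot remotely; `is_running` and `boot` are hypothetical placeholders, and no real Lish command syntax is implied.

```python
import time
from typing import Callable, Optional


def watchdog(is_running: Callable[[], bool],
             boot: Callable[[], None],
             interval: float = 60.0,
             max_checks: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll is_running(); call boot() each time it reports the Linode is down.

    is_running and boot are hypothetical hooks (e.g. wrappers around a
    remote Lish session); they are assumptions, not a real Linode API.
    max_checks=None loops forever; a finite value is handy for testing.
    Returns the number of boot commands issued.
    """
    boots = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not is_running():
            # Linode reported powered off: issue a boot and count it.
            boot()
            boots += 1
        # Wait before the next status check.
        sleep(interval)
    return boots
```

Injecting `sleep` and the two hooks keeps the polling logic separate from however the status check and boot are actually performed, so it can be exercised with fakes before pointing it at a real Linode.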

Please consider this another request for some sort of watchdog. I know you're probably way past busy, but users have been asking for it for more than a year now. Even if my linode never crashed (and my 2.4 linodes never do), it'd still give me peace of mind.

Thanks,
Dave
Update by caker
29-Nov-04
Dave,

Would you mind trying the 2.6.10-rc2-mm3-noinc kernel, and letting me know how it goes?

Thanks,
-Chris
Update by davemuench
29-Nov-04
I've booted into it and it seems functional. 2.6.9 hadn't crashed on me since I filed the ticket, but I'll keep an eye on it. On 2.6.9 I also had vixie cron dying off periodically; I mentioned it in the forum a while back but didn't submit a ticket. I have daemontools monitoring cron to restart it when needed, and I'll let you know if that situation improves too.

Thanks,
Dave
Update by davemuench
30-Nov-04
It died again last night somewhere around 2:25am. Here's what logview shows:

This is arctic.wasteland.org (Linux i686 2.6.10-rc2-mm3-noinc) 20:38:00

arctic login: Checking for the skas3 patch in the host...found
Checking for /proc/mm...found
Checking PROT_EXEC mmap in /tmp...OK
Linux version 2.6.10-rc2-mm3-noinc (root@nova1.theshore.net) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #1 Mon Nov 29 19:10:13 EST 2004

One minute it's running, the next it's starting up from my watchdog restarting it.

Dave
Update by davemuench
30-Nov-04
Oh, and it may just be coincidence, but my nightly backup job kicks off at 2:10am and the linode crashed shortly thereafter. The backup job is an rsync to another host. Previous crashes didn't coincide with a backup, though, so the timing may mean nothing.

Dave
Update by caker
01-Dec-04
I spoke with Jeff Dike about this, but not much to report.

Can you try 2.6.9-linode9? It's completely up to date, with none of the -mm cruft.

http://www.linode.com/forums/viewtopic.php?t=1318

Thanks,
-Chris
Update by davemuench
01-Dec-04
Yeah, there's not much to go on. Am I the only linode user having this problem?

I've booted up linode9, and I'll let you know if I run into glitches.

Dave
Update by davemuench
04-Dec-04
No crashes with linode9 so far; it's been over 3 days. I've got my fingers crossed.

Dave
Update by davemuench
06-Dec-04
I just noticed that crond hasn't died off with 2.6.9-linode9 either, and this is the first linode 2.6 kernel that hasn't had that problem. Looks great.

Dave
Update by davemuench
08-Dec-04
I guess I spoke too soon: it crashed this morning just after 8am. Exactly the same way: logview shows it at the login prompt, then booting back up from my watchdog issuing a boot command.

Dave
Update by davemuench
08-Dec-04
I just crashed it again, at 8:22. I ran evmsn (the ncurses interface for EVMS), and during its startup scan it froze at "cleaning up /dev/evms". Turns out that's because the box rebooted. Weird.

Dave