Don’t dig deeper than you have to…
I replaced Windows XP with FC7 on my fast machine last week. Compiling and linking are much faster than on Windows, and I’ve managed to get 32-bit builds out of a 64-bit OS. But every once in a while a process randomly hangs on startup at 100% CPU; I’m guessing about 1 out of every 1000 processes. It affects all kinds of processes: /bin/sh, /usr/bin/perl, nsinstall for Mozilla. strace showed nothing going on, so I attached gdb to a hung process:
#0 0x0000003ee327d323 in init_cacheinfo () from /lib64/libc.so.6
#1 0x0000003ee321d8e6 in __libc_global_ctors () from /lib64/libc.so.6
#2 0x0000003ee2e0d11b in call_init () from /lib64/ld-linux-x86-64.so.2
#3 0x0000003ee2e0d225 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#4 0x0000003ee2e00a9a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
Stepping through the assembly, it’s repeating this loop infinitely:
0x0000003ee327d31d <init_cacheinfo+141>: mov %esi,%eax
0x0000003ee327d31f <init_cacheinfo+143>: mov %edi,%ecx
0x0000003ee327d321 <init_cacheinfo+145>: cpuid
0x0000003ee327d323 <init_cacheinfo+147>: mov %eax,%r8d
0x0000003ee327d326 <init_cacheinfo+150>: shr $0x5,%eax
0x0000003ee327d329 <init_cacheinfo+153>: add $0x1,%edi
0x0000003ee327d32c <init_cacheinfo+156>: and $0x7,%eax
0x0000003ee327d32f <init_cacheinfo+159>: cmp %r10d,%eax
0x0000003ee327d332 <init_cacheinfo+162>: jne 0x3ee327d31d <init_cacheinfo+141>
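To make the loop easier to read, here is a rough Python sketch of what it appears to be doing. It assumes %esi holds CPUID leaf 4 (cache parameters), %edi is the subleaf index, and %r10d is the cache level being searched for; none of that is visible in the disassembly itself. The CPUID results are simulated with made-up tables rather than read from real hardware, and the "cache type is 0" guard is only my guess at the kind of check that would stop the loop, not necessarily what the actual patch does.

```python
# Simulated EAX values for CPUID leaf 4. Bits 0-4 are the cache type,
# bits 5-7 are the cache level. Both tables are invented for illustration.
HEALTHY = [
    (1 << 5) | 1,  # level 1, data cache
    (1 << 5) | 2,  # level 1, instruction cache
    (2 << 5) | 3,  # level 2, unified cache
]
BROKEN = []  # every subleaf reports "no cache" (EAX == 0)


def cpuid4_eax(table, subleaf):
    """Return the simulated EAX for a subleaf; 0 once the table runs out."""
    return table[subleaf] if subleaf < len(table) else 0


def find_cache_level(table, level):
    """Scan subleaves until one reports `level`; return its index, or None.

    Without the type check below, a CPU that never reports `level`
    would keep this loop spinning forever, as in the hung processes.
    """
    subleaf = 0
    while True:
        eax = cpuid4_eax(table, subleaf)
        if (eax & 0x1F) == 0:            # assumed guard: no more caches
            return None
        if ((eax >> 5) & 0x7) == level:  # same shr/and/cmp as the disassembly
            return subleaf
        subleaf += 1


if __name__ == "__main__":
    print(find_cache_level(HEALTHY, 2))
    print(find_cache_level(BROKEN, 2))
```

With the guard removed, the second call would never return, which matches the 100% CPU symptom: the loop's only exit is finding a subleaf whose level field equals the target.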
I eventually tracked down an interesting comment/patch here. Now all I need to do is figure out how to get this fix onto my machine with the least amount of pain and suffering.
If you are a glass-half-full person, this is why open source is cool: you have the ability to see all the relevant code, find the problem, and fix it (or at least find the person who already fixed it). If you are a glass-half-empty person, you’d complain that FC7 just sucks and ask why they haven’t deployed this fix already. And you’d also complain that it’s really hard to figure out how to apply a patch to a system package and rebuild it. I’m just happy it was happening to me and not my wife. And I’m still a little confused about why it only happens occasionally.
October 22nd, 2007 at 11:14 pm
https://bugzilla.redhat.com/show_bug.cgi?id=324081
October 23rd, 2007 at 3:00 am
Pardon me for asking, Ben, but does your coming off WinXP affect future MozillaBuild editions?
October 23rd, 2007 at 12:57 pm
Well, it’s a “hardware bug”, apparently solvable with a BIOS update, just blame Intel :)