Don’t dig deeper than you have to…
Monday, October 22nd, 2007
I replaced Windows XP with FC7 on my fast machine last week. Compiling and linking are much faster than on Windows, and I’ve managed to get 32-bit builds out of a 64-bit OS. But every once in a while a process randomly hangs on startup at 100% CPU. My guess is that it affects about 1 in every 1000 processes. It hit all kinds of processes: /bin/sh, /usr/bin/perl, nsinstall for Mozilla. strace showed nothing going on, so I attached gdb to a hanging process:
#0 0x0000003ee327d323 in init_cacheinfo () from /lib64/libc.so.6
#1 0x0000003ee321d8e6 in __libc_global_ctors () from /lib64/libc.so.6
#2 0x0000003ee2e0d11b in call_init () from /lib64/ld-linux-x86-64.so.2
#3 0x0000003ee2e0d225 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#4 0x0000003ee2e00a9a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
Stepping through the assembly, it’s repeating this loop infinitely:
0x0000003ee327d31d <init_cacheinfo+141>: mov %esi,%eax
0x0000003ee327d31f <init_cacheinfo+143>: mov %edi,%ecx
0x0000003ee327d321 <init_cacheinfo+145>: cpuid
0x0000003ee327d323 <init_cacheinfo+147>: mov %eax,%r8d
0x0000003ee327d326 <init_cacheinfo+150>: shr $0x5,%eax
0x0000003ee327d329 <init_cacheinfo+153>: add $0x1,%edi
0x0000003ee327d32c <init_cacheinfo+156>: and $0x7,%eax
0x0000003ee327d32f <init_cacheinfo+159>: cmp %r10d,%eax
0x0000003ee327d332 <init_cacheinfo+162>: jne 0x3ee327d31d <init_cacheinfo+141>
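For anyone who doesn’t read x86 assembly, here is a rough C sketch of what that loop appears to do. It is reconstructed from the disassembly only, not taken from the glibc source. I am assuming %esi holds CPUID leaf 4 (the cache parameters leaf), %edi is the subleaf index, and %r10d is the cache level being searched for. The names `cpuid_eax_fn`, `find_cache_subleaf` and `fake_cpuid` are made up for illustration, and the bounded exit is my own addition to show how such a loop could be made to terminate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPUID instruction: returns EAX for a given
   leaf and subleaf. On real hardware this would be inline assembly. */
typedef uint32_t (*cpuid_eax_fn)(uint32_t leaf, uint32_t subleaf);

/* Bits 7:5 of EAX, the field the disassembly isolates with shr $5 / and $7. */
static unsigned cache_level(uint32_t eax) { return (eax >> 5) & 0x7; }

/* Bits 4:0 of EAX; for CPUID leaf 4, a value of 0 means "no more caches". */
static unsigned cache_type(uint32_t eax) { return eax & 0x1f; }

/* Bounded version of the loop: returns the subleaf index that reports
   wanted_level, or -1 if the enumeration ends first. The loop in the
   disassembly has no such exit, so it spins if the level never shows up. */
static int find_cache_subleaf(cpuid_eax_fn cpuid, uint32_t leaf,
                              unsigned wanted_level)
{
    assert(cpuid != 0);
    for (uint32_t i = 0; i < 32; i++) {   /* arbitrary cap, an assumption */
        uint32_t eax = cpuid(leaf, i);
        if (cache_type(eax) == 0)
            return -1;                    /* enumeration finished */
        if (cache_level(eax) == wanted_level)
            return (int)i;
    }
    return -1;
}

/* Fake CPU for illustration: L1 data, L1 instruction, unified L2, then end. */
static uint32_t fake_cpuid(uint32_t leaf, uint32_t subleaf)
{
    static const uint32_t table[] = {
        (1u << 5) | 1u,   /* level 1, type 1 (data) */
        (1u << 5) | 2u,   /* level 1, type 2 (instruction) */
        (2u << 5) | 3u,   /* level 2, type 3 (unified) */
    };
    (void)leaf;
    return subleaf < 3 ? table[subleaf] : 0u;
}
```

With the fake CPU above, asking for level 2 finds subleaf 2, while asking for level 3 returns -1 instead of looping forever. Whether this is actually the condition my machine hits is a guess on my part.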
I eventually tracked down an interesting comment/patch here. Now all I need to do is figure out how to get this fix onto my machine with the least amount of pain and suffering.
If you are a glass-half-full person, this is why open source is cool: you can see all the relevant code, find the problem, and fix it (or at least find the person who already fixed it). If you are a glass-half-empty person, you’d complain that FC7 just sucks and ask why they haven’t deployed this fix already. And you’d also complain that it’s really hard to figure out how to apply a patch to a system package and rebuild it. I’m just happy it was happening to me and not my wife. And I’m still a little confused about why it only happens occasionally.