Don’t dig deeper than you have to…

Monday, October 22nd, 2007

I replaced Windows XP with FC7 on my fast machine last week. Compiling and linking are much faster than on Windows, and I’ve managed to get 32-bit builds out of a 64-bit OS. But every once in a while a process hangs on startup at 100% CPU. I’m guessing it hits about 1 out of every 1000 processes, and it affects all kinds of them: /bin/sh, /usr/bin/perl, nsinstall for mozilla. strace showed nothing going on, so I attached gdb to a hung process:

#0  0x0000003ee327d323 in init_cacheinfo () from /lib64/
#1  0x0000003ee321d8e6 in __libc_global_ctors () from /lib64/
#2  0x0000003ee2e0d11b in call_init () from /lib64/
#3  0x0000003ee2e0d225 in _dl_init_internal () from /lib64/
#4  0x0000003ee2e00a9a in _dl_start_user () from /lib64/
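To turn the "about 1 out of every 1000" guess into a measurement, one option is to start a trivial process many times and count how many runs never finish. This is only a rough sketch: the helper name `count_hangs` and the five-second timeout are my own assumptions, not something from the original debugging session.

```python
import subprocess
import sys


def count_hangs(cmd, runs, timeout_s=5.0):
    """Run cmd `runs` times and return how many runs exceeded timeout_s."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
                check=False,  # a non-zero exit status is not a hang
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising, so the
            # hung process is gone by the time we count it.
            hangs += 1
    return hangs
```

Something like `count_hangs(["/bin/sh", "-c", "true"], 5000)` would give a hang count to divide by the number of runs. Because the timed-out child is killed, this measures the rate but leaves nothing behind to attach gdb to.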

Stepping through the assembly shows it repeating this loop forever:

0x0000003ee327d31d <init_cacheinfo+141>:        mov    %esi,%eax
0x0000003ee327d31f <init_cacheinfo+143>:        mov    %edi,%ecx
0x0000003ee327d321 <init_cacheinfo+145>:        cpuid  
0x0000003ee327d323 <init_cacheinfo+147>:        mov    %eax,%r8d
0x0000003ee327d326 <init_cacheinfo+150>:        shr    $0x5,%eax
0x0000003ee327d329 <init_cacheinfo+153>:        add    $0x1,%edi
0x0000003ee327d32c <init_cacheinfo+156>:        and    $0x7,%eax
0x0000003ee327d32f <init_cacheinfo+159>:        cmp    %r10d,%eax
0x0000003ee327d332 <init_cacheinfo+162>:        jne    0x3ee327d31d <init_cacheinfo+141>
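Read as pseudocode, the loop runs cpuid with an increasing index in %ecx, pulls bits 7:5 out of the returned %eax (the shr $0x5 / and $0x7 pair), and exits only when that value equals %r10d. The Python model below is a sketch under assumptions: I am guessing that %esi holds CPUID leaf 4 (deterministic cache parameters), where bits 7:5 of EAX are the cache level and bits 4:0 are the cache type, with type 0 meaning "no more caches". The `cpuid` argument and the `make_fake_cpuid` table are hypothetical stand-ins, not real hardware access and not glibc's code.

```python
def cache_level(eax):
    """Bits 7:5 of EAX, the value the shr $0x5 / and $0x7 pair extracts."""
    return (eax >> 5) & 0x7


def cache_type(eax):
    """Bits 4:0 of EAX; on CPUID leaf 4, a value of 0 means no more caches."""
    return eax & 0x1F


def find_subleaf(cpuid, leaf, wanted_level, max_subleaves=32):
    """Scan subleaves for one whose cache level matches wanted_level.

    Returns (subleaf, eax) on a match, or None if the enumeration ends
    first. The two early exits are additions of this sketch: the
    disassembled loop only stops when the level matches.
    """
    for subleaf in range(max_subleaves):
        eax = cpuid(leaf, subleaf)
        if cache_type(eax) == 0:
            return None  # ran off the end of the cache list
        if cache_level(eax) == wanted_level:
            return subleaf, eax
    return None  # hard cap, in case the type field never reads 0


def make_fake_cpuid(eax_values):
    """Hypothetical stand-in for the cpuid instruction, backed by a table."""
    def cpuid(leaf, subleaf):
        # Past the end of the table, report EAX = 0 (type 0, level 0).
        return eax_values[subleaf] if subleaf < len(eax_values) else 0
    return cpuid


# Made-up example: L1 data, L1 instruction, L2 unified.
HEALTHY = [(1 << 5) | 1, (1 << 5) | 2, (2 << 5) | 3]
```

With the `HEALTHY` table, a search for level 2 stops at subleaf 2. Drop the last entry and the guarded version returns None, whereas a loop with only the level comparison would keep spinning for as long as cpuid never reports the wanted level.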

I eventually tracked down an interesting comment/patch here. Now all I need to do is figure out how to get this fix onto my machine with the least amount of pain and suffering.

If you are a glass-half-full person, this is why open source is cool: you have the ability to see all the relevant code, find the problem, and fix it (or at least find the person who already fixed it). If you are a glass-half-empty person, you’d complain that FC7 just sucks and ask why they haven’t deployed this fix already. And you’d also complain that it’s really hard to figure out how to apply a patch to a system package and rebuild it. I’m just happy it was happening to me and not my wife. And I’m still a little confused about why it only happens occasionally.