Don’t dig deeper than you have to…

I replaced Windows XP with FC7 on my fast machine last week. Compiling and linking are much faster than on Windows, and I’ve managed to get 32-bit builds out of a 64-bit OS. But every once in a while a process hangs on startup at 100% CPU, seemingly at random. I’m guessing it hits about 1 out of every 1000 processes, and it has hit all kinds: /bin/sh, /usr/bin/perl, nsinstall for mozilla. strace showed nothing going on, so I attached gdb to a hung process:

#0  0x0000003ee327d323 in init_cacheinfo () from /lib64/libc.so.6
#1  0x0000003ee321d8e6 in __libc_global_ctors () from /lib64/libc.so.6
#2  0x0000003ee2e0d11b in call_init () from /lib64/ld-linux-x86-64.so.2
#3  0x0000003ee2e0d225 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#4  0x0000003ee2e00a9a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

Stepping through the assembly shows it repeating this loop forever:

0x0000003ee327d31d <init_cacheinfo+141>:        mov    %esi,%eax
0x0000003ee327d31f <init_cacheinfo+143>:        mov    %edi,%ecx
0x0000003ee327d321 <init_cacheinfo+145>:        cpuid  
0x0000003ee327d323 <init_cacheinfo+147>:        mov    %eax,%r8d
0x0000003ee327d326 <init_cacheinfo+150>:        shr    $0x5,%eax
0x0000003ee327d329 <init_cacheinfo+153>:        add    $0x1,%edi
0x0000003ee327d32c <init_cacheinfo+156>:        and    $0x7,%eax
0x0000003ee327d32f <init_cacheinfo+159>:        cmp    %r10d,%eax
0x0000003ee327d332 <init_cacheinfo+162>:        jne    0x3ee327d31d <init_cacheinfo+141>
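
Read as code, the loop seems to ask cpuid about one sub-leaf after another until bits 7:5 of EAX match the value in %r10d, and it has no other way out. Here is a rough Python simulation of that logic, not glibc's actual source. It assumes %esi holds leaf 4 (where Intel documents bits 7:5 of EAX as the cache level). The fake CPUs, their EAX values, and the `max_steps` cap are all made up for illustration; the real loop has no cap.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the cpuid instruction with EAX=4 (an assumption):
# takes the sub-leaf index (ECX) and returns the EAX value the CPU reports.
CpuidLeaf4 = Callable[[int], int]


def cache_level(eax: int) -> int:
    """Extract bits 7:5 of EAX, matching `shr $0x5` followed by `and $0x7`."""
    return (eax >> 5) & 0x7


def find_subleaf(cpuid4: CpuidLeaf4, wanted_level: int,
                 max_steps: int = 64) -> Optional[int]:
    """Mirror the disassembled loop, but give up after max_steps tries.

    The cap exists only so this sketch terminates; the loop in the
    disassembly keeps going until the level field matches.
    """
    subleaf = 0
    for _ in range(max_steps):
        eax = cpuid4(subleaf)                 # mov %edi,%ecx ; cpuid
        if cache_level(eax) == wanted_level:  # cmp %r10d,%eax
            return subleaf
        subleaf += 1                          # add $0x1,%edi
    return None  # the uncapped loop would still be spinning here


def healthy_cpu(subleaf: int) -> int:
    # Made-up EAX values: cache type in bits 4:0, cache level in bits 7:5.
    table = [0x21, 0x22, 0x43]  # L1 data, L1 instruction, L2 unified
    return table[subleaf] if subleaf < len(table) else 0


def broken_cpu(subleaf: int) -> int:
    # Made-up failure mode: the CPU never reports any cache level.
    return 0
```

With `healthy_cpu`, a search for level 2 stops at the third sub-leaf. With `broken_cpu`, nothing ever matches, which is the kind of situation that would leave the real loop spinning at 100% CPU.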

I eventually tracked down an interesting comment/patch here. Now all I need to do is figure out how to get this fix onto my machine with the least amount of pain and suffering.

If you are a glass-half-full person, this is why open source is cool: you have the ability to see all the relevant code, find the problem, and fix it (or at least find the person who already fixed it). If you are a glass-half-empty person, you’d complain that FC7 just sucks and ask why they haven’t deployed this fix already. You’d also complain that it’s really hard to figure out how to apply a patch to a system package and rebuild it. I’m just happy it was happening to me and not my wife. And I’m still a little confused about why it only happens occasionally.

3 Responses to “Don’t dig deeper than you have to…”

  1. ignacio Says:

    https://bugzilla.redhat.com/show_bug.cgi?id=324081

  2. Alex Vincent Says:

    Pardon me for asking, Ben, but does your coming off WinXP affect future MozillaBuild editions?

  3. Diego Says:

    Well, it’s a “hardware bug”, apparently solvable with a BIOS update, just blame intel :)
