core dumps on fedora. where/if?


ArtVandelay
04-16-2007, 06:47 AM
I'm trying to debug a program I wrote that has a weird crash that
happens only on rare occasions. (doing this on Fedora 5)

I always seem to not be able to reproduce it when I run it
from gdb.

It's my understanding that you can get your program back
to that horrible state again within gdb using the core file.

It doesn't say "core dumped" when it crashes though, so
the first question would be "is there a core file?"

If there is, where is it? I ran find from the root dir
and the only thing that looked non-application specific
was /dev/core, but gdb wasn't able to use this, suggesting
that this isn't the matching core file.

This problem would be easy if I was smart enough to
run from gdb every time, but I'm not smart enough and it only
ever happens when I don't (Murphy's Law).

Here's the error, by the way:


[gjduff@minidex y25]$ ./zexe
*** glibc detected *** ./zexe: free(): invalid pointer: 0x099cfca8 ***
======= Backtrace: =========
/lib/libc.so.6[0x1fda68]
/lib/libc.so.6(__libc_free+0x78)[0x200f6f]
/usr/lib/libSDL-1.2.so.0(SDL_FreeSurface+0xc0)[0x6584ff0]
./zexe[0x804d9bc]
./zexe[0x804cfeb]
/lib/libc.so.6(__libc_start_main+0xdc)[0x1af4e4]
./zexe(__gxx_personality_v0+0x7d)[0x8049131]
======= Memory map: ========
00110000-00131000 r-xp 00000000 fd:00 1052207 /usr/lib/libjpeg.so.62.0.0
00131000-00132000 rwxp 00020000 fd:00 1052207 /usr/lib/libjpeg.so.62.0.0
0017d000-00196000 r-xp 00000000 fd:00 5821472 /lib/ld-2.4.so
...
other various crap / libraries in memory
...
05444000-05526000 r-xp 00000000 fd:00 1060220 /usr/lib/libstdc++.so.6.0.8
0552600Aborted

bwkaz
04-16-2007, 06:45 PM
Whee, memory corruption! ;)

I'm going to guess that you're free()ing the same pointer twice, although you might just be handing free() an invalid pointer value. If this helps you narrow down the problem, then great.

If not, then core files are not system-wide; /dev/core is an image of the running kernel's memory (on my system at least, it's a symlink to /proc/kcore, which is only useful if you want to load the kernel into a debugger). What you need is the file that gets created when the program crashes; it will be in whatever the process's current directory is. (Usually this is the directory you started it from.) Generally the files are named "core", but the kernel can be configured to name them differently (through /proc/sys/kernel/core_pattern, for example to append the PID), so look for anything with "core" in the name.

However, core files will not be created if the core-file-size limit of the process that crashed is set to zero. Run ulimit -c to see what it's set to, and if it's zero, run ulimit -c unlimited and be sure to start the program from that same shell. (The ulimit shell builtin only changes the limit of the shell itself, but child processes inherit the value. Limits are per-process, so there is no way to change one for everything already running on the system. That means you have to run the program from the shell where you set the limit to unlimited. You can exit the shell afterward if you want to, but you might as well keep it open so you can see the output.)
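For a program you build yourself, the same limit can also be raised from inside the process instead of from the shell. The following is only a sketch, assuming a POSIX system; the function names enable_core_dumps and core_dumps_enabled are made up for illustration. Unlike ulimit -c unlimited, it only raises the soft limit as far as the existing hard limit.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)

// Hypothetical helper: raise this process's soft core-file-size limit to
// its hard limit. Returns true on success. An unprivileged process cannot
// go above the hard limit, so if that is 0 no core file will be written.
bool enable_core_dumps() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;  // soft limit may be raised up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Hypothetical helper: report whether the current soft limit allows a
// core file at all (the same thing `ulimit -c` tells you, as non-zero).
bool core_dumps_enabled() {
    struct rlimit lim {};
    return getrlimit(RLIMIT_CORE, &lim) == 0 && lim.rlim_cur != 0;
}
```

Calling enable_core_dumps() at the top of main() would cover runs started from a shell where the limit was never changed, which is exactly the case where the rare crash tends to show up.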

Also, note that you will need debugging information in your program (compile with -g), otherwise you won't get much useful information from the debugger. And if you have debugging information, you can get more out of the backtrace that glibc prints even without a core file: as it is, all you have for the functions inside ./zexe are addresses, but a tool such as addr2line (addr2line -f -e ./zexe 0x804d9bc) can turn each of those into a function name and line number.

ArtVandelay
04-17-2007, 02:01 AM
Thanks. ulimit -c unlimited is exactly what I was looking for.

It turned out I solved the problem without needing the core
dump as I managed to make it happen while in gdb by fluke.
(and yes, those line numbers in the stacktraces are exactly
the reason I wanted to do this in the first place. Those have
helped me solve many memory problems)

But it's good to know that in the future I can actually have
core dumps. I do compile with debugging info until I finally
put it into the wild.

I wasn't free()ing twice. There was just this special case
where no memory gets allocated so I just did a check for that
case in the destructor.
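A rough sketch of that kind of fix, for anyone who finds this thread later. This is not the actual zexe code: PixelBuffer is a made-up stand-in that uses malloc/free instead of an SDL surface, and it assumes the root problem was a pointer that never got a defined value when nothing was allocated.

```cpp
#include <cstddef>  // std::size_t
#include <cstdlib>  // std::malloc, std::free

// Hypothetical stand-in for a class that owns an optional C-allocated block.
class PixelBuffer {
public:
    // In the special case bytes == 0 nothing is allocated, and the pointer
    // is explicitly set to nullptr rather than left uninitialized.
    explicit PixelBuffer(std::size_t bytes)
        : data_(bytes > 0 ? std::malloc(bytes) : nullptr) {}

    ~PixelBuffer() {
        // Only release what was really allocated. std::free itself accepts
        // nullptr, but the explicit check mirrors the fix described above
        // and stays correct for release functions that do not.
        if (data_ != nullptr) {
            std::free(data_);
        }
    }

    // Copying would leave two objects freeing the same block, so forbid it.
    PixelBuffer(const PixelBuffer&) = delete;
    PixelBuffer& operator=(const PixelBuffer&) = delete;

    bool allocated() const { return data_ != nullptr; }

private:
    void* data_;
};
```

The important part is the initializer: the check in the destructor only works if the pointer has a known value in the no-allocation case.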

bwkaz
04-17-2007, 06:24 PM
"I wasn't free()ing twice. There was just this special case where no memory gets allocated so I just did a check for that case in the destructor."

Ah, OK, never mind then. I seem to have guessed wrong. Well, double-free()s are one of the more common causes for that kind of error that I've seen.

But handing a random pointer to free() is another, and it sounds like that might be what happened. (You weren't handing NULL to free(), at least.) In any case, glad you got it figured out. :)
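Since free() on a null pointer is defined to do nothing, one common defensive habit against the double-free case is to null the pointer right after freeing it. A minimal sketch, with free_and_clear being a hypothetical helper name rather than anything from glibc or SDL:

```cpp
#include <cstdlib>  // std::malloc, std::free

// Hypothetical helper: free a malloc'd block and reset the caller's pointer.
// A second call on the same variable then becomes free(nullptr), which is
// a no-op, instead of a double free.
template <typename T>
void free_and_clear(T*& ptr) {
    std::free(ptr);
    ptr = nullptr;
}
```

This only protects the one pointer variable you pass in; any other copy of the same address is still dangling, and it does nothing for a pointer that was garbage to begin with.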