Intel c++ Compiler

t0t3r
05-14-2004, 06:14 AM
anyone have experience with the official linux intel c++ compiler? .... on the site they say something like 30% performance gain.

JayMan8081
05-14-2004, 06:24 AM
We use it at my work for some of our larger programs. It does give a performance boost, but I wouldn't say that it's 30%. We've found that using good optimizations with g++ also increases performance to within a couple percent of the Intel compiler's performance.

t0t3r
05-14-2004, 06:44 AM
so is it worth a try? ... compiling kernel etc?

JayMan8081
05-14-2004, 08:30 AM
I would wait and see what some other people have to say. It might give a better performance boost than what I saw at work if it's something where the libraries being used can be recompiled with icc as well. The application we use it for uses OpenGL and wxWidgets, so part of the lack of performance gain might be in those libraries. I know we did recompile wxWidgets with icc. If I remember correctly, the compiler had a pretty hefty cost too, so I don't know that it would be worth it for a home user.

t0t3r
05-14-2004, 03:25 PM
for home users it's free .... there's a version for non commercial use. so if u compile most of the needed libraries with icc too there should be some performance gain.

JayMan8081
05-14-2004, 04:31 PM
I would give it a try the next time you set up a system and see if it really makes a difference. I wonder if there would be any way to do like a Gentoo stage1 install using the Intel compiler, that way everything would be extremely optimized.

Strogian
05-14-2004, 04:37 PM
> I wonder if there would be any way to do like a Gentoo stage1 install using the Intel compiler, that way everything would be extremely optimized.

This is Linux ... of course there is a way!

t0t3r
05-14-2004, 05:06 PM
thought of something like that too ... "complete" system compiled with icc ...! in fact i just downloaded it ...
and i will try to get it installed and work with it

GaryJones32
05-14-2004, 11:30 PM
this is only worth the trouble if you are running P4. If you are running P4, 30% is a very low end estimate of performance increase. I did my kernel with it and it's like a different machine. The kernel throughput is amazing -- everything is zing bling. It's like night and day -- for some programs i would say into the 100% increase range and beyond.

I saw on the web a P4 benchmark with some common benchmark program that came out 600% performance gain when they began ramping up icc optimization flags. (they had to put a *possible error* footnote on the results it was so unbelievable) -- like always in real life, high levels of optimization break things. although i think better understanding of the flags might help. -ipo seems to create ld errors, and when i try to use -O3 with the kernel it automatically scales it back to -O2.

the compiler uses vectorization and the P4 sse2 registers, and most importantly of all for the multithreaded P4 (recognized as two different processors by the kernel), if you write say a simple (for) loop icc will fork out each aspect of the loop and do the entire thing as parallel threads all at once -- YEA !!!!!!!!! that's what i'm talking about. needless to say i'm impressed. And no i don't think it's up to the programmer to do that kind of crap -- if that were the case we could just all write machine code and forget higher level languages altogether.

As for compiling the system with it: it's real picky about stuff gcc will let you get away with, like funky memory allocation and like that, so Linux core will not compile straight off without major hacking troubles i bet. I have observed the gentoo people getting pretty far but ending up with a segfaulting mess.
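For reference, the "simple (for) loop" case described above is one where every iteration is independent of the others. A minimal C sketch of such a loop follows. The function name saxpy and the explicit OpenMP pragma are illustrative assumptions, not something taken from the posts. As far as I know icc has to be asked to thread a loop like this (the -parallel and -openmp flags come up later in the thread); built without OpenMP support, the pragma is simply ignored and the loop runs serially.

```c
#include <stddef.h>

/* Scale-and-add over independent elements: iteration i only touches
 * index i, so the iterations can run in any order or at the same time.
 * The name "saxpy" is a hypothetical choice for this illustration. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* With OpenMP enabled the iterations are divided among threads;
     * without it this pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A loop whose iterations depend on each other (say, y[i] uses y[i - 1]) does not fit this pattern and can't be split up this way without changing the result.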
there is a utility on the net called gccicc that will allow both compilers to do stuff and can accept flags for both in a --- divided flag line --- (failover to gcc when icc gets stuck, either for files you choose or you can set it to do it automagically) (the binary code is compatible). google for the icc kernel patches and you will find gccicc. major hacking of Makefiles ahead to use it a lot, and possibly still object linking troubles with previous gcc produced files or just plain runtime troubles.

t0t3r
05-15-2004, 07:54 AM
uhm very interesting .... sounds incredible. As far as the kernel is concerned ... are there any problems in compiling it? any hacking or patching to do before or something? ... can u post a little howto on how to do it. i just worked with gcc so far ... but after your post i'm really keen on testing icc.

t0t3r
05-15-2004, 08:52 AM
ok i now got the icc for non commercial use with a license file via mail from intel. i installed icc. then i tried to install this flexlm license server ... all is done. but when i try to start ./iccbin it says no license file could be obtained. dunno if i actually need a licence. How do i get that thing working and how can i test it with a simple c prog?

Moroni
05-15-2004, 09:08 AM
When you refer to LFS running a lot better than Fedora, what do you mean? I just installed Mandrake on my laptop, but was thinking of going to RedHat so I can use the Ximian Desktop, but when I saw your post, it made me think about trying something else to get better performance. Although I will use my linux box at work not for developing but for my usual duties (monitor systems thru vnc, email, office documents, browse several company sites, etc). Do you think it is worth the pain to get icc in place in my distribution for these purposes only? Awaiting your comments and, if possible, a detailed How-To :)

t0t3r
05-15-2004, 09:18 AM
think compiling a whole system is really a hard struggle ...
but in fact compiling the most used libraries, the kernel and progs should give a performance boost. lol but in fact the icc installation sux hard if u do not have an rpm based system.

maccorin
05-15-2004, 09:41 AM
Originally posted by GaryJones32
> this is only worth the trouble if you are running P4. If you are running P4, 30% is a very low end estimate of performance increase. [...] I saw on the web a P4 benchmark with some common benchmark program that came out 600% performance gain when they began ramping up icc optimization flags. [...]

I would _love_ to see these supposed benchmarks, please link.

GaryJones32
05-15-2004, 10:37 PM
Questions -- that's fair -- i started it!

> As far as the kernel concerns

need kernel patch -- works for kernel 2.6.3 and some older
http://www.pyrillion.org/index.html?showframe=linuxkernelpatch.html
comes with gccicc and instructions -- works fine to compile nvidia for icc kernel:

    ./NVIDIA-Linux-x86-1_0-5336-pkg1.run -x
    cd ./NVIDIA-Linux-x86-1_0-5336-pkg1
    export IGNORE_CC_MISMATCH=1
    export CC=gccicc
    export ICC2GCCFILES="DELEGATE"
    make install

> When you refer to LFS running a lot better than Fedora, what do you mean?

LFS compiled on machine runs lots faster and everything works perfectly. takes a long time to build and you need a blank partition. lots more simple and to the point. start off with building your own kernel -- that will help a lot. Fedora kernel even has filesystem debugging turned on -- are they crazy???

> I would _love_ to see these supposed benchmarks, please link.
ok first by the people that supposedly write gcc, allegedly, it is rumored but unsubstantiated as of yet
http://gcc.gnu.org/ml/gcc/2004-05/msg00021.html
shows icc wins by 550% on mole test, 200% on alma, about 50% overall

i don't really remember the exact site i mentioned but i can google
http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html
this one shows the older intel 7. sometimes it's a small amount behind gcc -- sometimes it's as much as 400% faster

http://www.mcsr.olemiss.edu/parallelogram/01_03/icc.html
i quote from this one on the whetstone test with -wp_ipo
"This is a truly amazing performance increase, approximately 675%! It is certainly not typical, however, so your mileage may vary."
i think olemiss is a supposed university

GaryJones32
05-15-2004, 10:42 PM
Originally posted by t0t3r
> ok i now got the icc for non commercial use with a license file via mail from intel. [...] when i try to start ./iccbin it says no license file could be obtained. dunno if i actually need a licence. How do i get that thing working and how can i test it with a simple c prog?

just put a copy of that license in /opt/intel_cc_80/licenses
if that doesn't work, set the environment variable
INTEL_LICENSE_FILE=/opt/intel_cc_80/licenses

maccorin
05-16-2004, 03:35 AM
> I would _love_ to see these supposed benchmarks, please link.
> ok first by the people that supposedly write gcc, allegedly, it is rumored but unsubstantiated as of yet [...] i think olemiss is a supposed university

Ok, interesting. I would like to point out that most of those benchmarks are by people that simply don't understand optimization on gcc. while i do expect icc is better by a bit, i doubt it's that much.

example: one of the benchmarks uses -O9.... is he stupid? or did he just not RTFM?

all of them used at least -O3 and -funroll-loops (which is pointless if you're using -O3 anyways....). this makes for large amounts of cache thrashing, and it has been shown that -Os or sometimes -O2 even is faster (in some applications, it really depends on your code).

another interesting note is _none_ of them used -mfpmath=sse, which would probably give you the closest results to what icc does, as it uses the sse floating point instruction set.

now, that said, icc probably _is_ gonna put out a bit faster code for an intel cpu, but the 600% statement and the like is just unrealistic.

t0t3r
05-16-2004, 03:59 AM
ok ... as i can't get icc running on my debian system i will check out icc on another machine. rpm really sux. could someone recompile me a kernel with icc when i send the config? .... wanna just see if it's worth the trouble. Doing some performance benchmarks etc would be great ....

t0t3r
05-16-2004, 08:14 AM
with a workaround i got it runnin ....
when i wanna compile the kernel after patching it ... which went through without errors, i get the following msg

    CHK include/linux/compile.h
    UPD include/linux/compile.h
    CC init/version.o
    CC init/do_mounts.o
    LD init/mounts.o
    /bin/sh: line 1: xild: command not found
    make[1]: *** [init/mounts.o] Error 127
    make: *** [init] Error 2

do i have to set some variables before doing "make bzImage" ... ??

in fact i tried with a 2.4.20 kernel, i also get errors

    a /usr/src/linux-2.4.20/arch/i386/lib/lib.a \
    --end-group \
    -o vmlinux
    net/network.o(.text+0x981f): In function `br_write_unlock':
    : undefined reference to `__br_lock_usage_bug'
    net/network.o(.text+0x9837): In function `br_write_lock':
    : undefined reference to `__br_lock_usage_bug'
    net/network.o(.text+0x16c13): In function `br_write_unlock':
    : undefined reference to `__br_lock_usage_bug'
    net/network.o(.text+0x16c2b): In function `br_write_lock':
    : undefined reference to `__br_lock_usage_bug'
    net/network.o(.text+0x3fb4b): In function `br_write_unlock':
    : undefined reference to `__br_lock_usage_bug'
    net/network.o(.text+0x3fb63): more undefined references to `__br_lock_usage_bug' follow
    make: *** [vmlinux] Error 1

..... any ideas?

Strogian
05-16-2004, 11:05 AM
So what you are saying is, if you don't know much about gcc optimizations (which I'll bet describes most people ;) including me), then icc will give you a 400% speed boost?

t0t3r
05-16-2004, 12:33 PM
lol .... very realistic values. i spoke with a friend who works on intel machines and has some experience with different compilers. icc will give u a speed boost. it makes better/faster optimized code compared to gcc. in fact the performance gain is approx between 10% and 20% .... on some apps more than 500%. he had a prog compiled with gxx, it took 12 secs to do the job ... compiled with icc it took 0.2 secs. But such examples are very rare.

GaryJones32
05-16-2004, 04:42 PM
Originally posted by maccorin
> Ok, interesting.
> I would like to point out that most of those benchmarks are by people that simply don't understand optimization on gcc

Is this to say they DO know a lot about optimizing with icc but DON'T know a lot about optimizing with gcc ??????? That seems a little biased and presumptive, yes.

> _none_ of them used -mfpmath=sse

this is true and a valid point -- i noticed that too. P4 uses sse2 not sse so the flag should be -mfpmath=sse2. however to simply say that would make the two equal without data is invalid.

-O9 harms nothing and is exactly the same as -O3.

-funroll-loops is not a part of -O3 and is a good flag to use if the loop isn't too big for the cache.

cache thrashing of course is an issue, especially for P4 with its shared cache plus parallelism. Also has to do with smpt kernel scheduling or choice of processors during scheduling.

obviously by the benchmarks icc is using parallelism (if that's what -ipo does ???) and not causing thrashing. It's fair to assume that better flags for icc can also yield faster code than the tests.

gcc does not use parallelizing of loops, so i think i fail to see how -funroll-loops can cause cache thrashing except in multi-threaded apps, and these simple artificial benchmarks are not multi-threaded, so i don't get how that flag is harming the results.

I don't think anybody is trying to say the flags used for either compiler are perfect for all situations or even perfect for the tests.

GaryJones32
05-16-2004, 06:25 PM
Originally posted by t0t3r
> with a workaround i got it runnin .... when i wanna compile the kernel after patching it ... which went through without errors, i get the following msg [...] /bin/sh: line 1: xild: command not found [...] do i have to set some variables before doing "make bzImage" ... ??
> in fact i tried with a 2.4.20 kernel, i also get errors [...] undefined reference to `__br_lock_usage_bug' [...] make: *** [vmlinux] Error 1 ..... any ideas?

on the first one: /opt/intel_cc_80/bin/xild is a part of the intel install, so you have to add /opt/intel_cc_80/bin to your PATH variable and try that one again.

on the second one: i get these problems with the object files not being compatible between the two compilers a lot. Especially with optimizations. you can try taking the -unroll flag out of the top level Makefile in the +OPTFLAGS setting ?????? i don't even see -unroll in the list of flags for icc version 8 so i'm stumped on that one -- perhaps those patches were written for version 7 ????

anyway i use kernel 2.6.3 and gcc 3.3.1 and icc version 8 and it worked. it says to use gcc 2.95 for the earlier kernel versions, perhaps that's why.

maccorin
05-16-2004, 07:10 PM
Originally posted by GaryJones32
> Is this to say they DO know a lot about optimizing with icc but DON'T know a lot about optimizing with gcc ??????? That seems a little biased and presumptive, yes.

I don't know how much they know about icc, because I know nil about its flags.
> _none_ of them used -mfpmath=sse this is true and a valid point -- i noticed that too. P4 uses sse2 not sse so the flag should be -mfpmath=sse2

there is no -mfpmath=sse2 on gcc. this is one area where icc has it licked, but then again... sse2 is only useful if you have an intel cpu ;p sse actually works on some of the later athlons

> however to simply say that would make the two equal without data is invalid.

that is true, just as "benchmarking" w/ dumb flags doesn't make anything valid

> -O9 harms nothing and is exactly the same as -O3

I know that, but it just shows incompetence, not the type of person i would want to trust

> -funroll-loops is not a part of -O3

you got me there, i just checked the docs and it seems i was remembering incorrectly

> and is a good flag to use if the loop isn't too big for the cache

it is too big 99% of the time. remember we are talking x86 here; last I checked, my shiny new XP2800 only had a 512K cache

> cache thrashing of course is an issue, especially for P4 with its shared cache plus parallelism. Also has to do with smpt kernel scheduling or choice of processors during scheduling.

I am assuming you mean using multiple pipelines in the FPU by parallelism, if i'm wrong correct me. But anyways, that has nothing to do w/ cache thrashing AFAIK (although there may be some way that it does that i'm missing)

> obviously by the benchmarks icc is using parallelism (if that's what -ipo does ???) and not causing thrashing.

see above.

> It's fair to assume that better flags for icc can also yield faster code than the tests.
that is true, but one of the things i have read about icc is that its defaults are much better chosen (for speed purposes) than gcc's. that would lead me to believe that you could optimize it a bit more, but not as drastically as you could gcc

> gcc does not use parallelizing of loops, so i think i fail to see how -funroll-loops can cause cache thrashing except in multi-threaded apps, and these simple artificial benchmarks are not multi-threaded

let's see... 10 instructions looped, or 10000s of instructions. think about that

> so i don't get how that flag is harming the results. I don't think anybody is trying to say the flags used for either compiler are perfect for all situations or even perfect for the tests.

that's true, but i have read reports online of 20% increase, 10% increase and so on, _not_ of 600%. and then when you give me a benchmark "proving" it by someone that uses -O9, that means nothing

GaryJones32
05-17-2004, 01:59 AM
OK this isn't any fun so i'm not going to do this anymore. the numbers are the numbers... that's the useful thing about numbers. they are not a belief system or a personal character issue. they just are.

from gcc 3.3.3 changelogs:
"The following changes have been made to the IA-32/x86-64 port: SSE2 and 3dNOW! intrinsics are now supported."

you wrote
> sse2 is only useful if you have an intel cpu

hello !!!!!! we are or were trying to discuss the INTEL compiler made specifically for INTEL cpu.

This stuff is not so simple as we make out, and this ain't your grandma's cpu and compiler. I am far from competent in these matters but:

no i am not talking about pipelining, though that is the point of unrolling loops. loops are unrolled specifically so they can be piped, and yes i guess that is an older form of parallelism. Which is why it is so fast: there is better and more predictable scheduling of memory access, which allows parallelism.
That is -- the entire loop (or close to the entire loop) is loaded and used before it is evicted from the cache. Even if the loop is huge, the first part is just evicted and more (the spilled part) is loaded and dealt with. This is the opposite of cache thrashing. as a matter of fact this by definition precludes cache thrashing.

Thrashing (or data cache missing) is the swapping in and out of different data elements mapped over and over to the same cache location. nested loops that are NOT unrolled can cause data cache misses or thrashing, with each call to the other array wiping out the first array or a part of it in cache and vice versa to varying degrees. This is more true, not less, for the P4 - its L1 cache is only 8K for speed - P4 L2 has huge bandwidth and is 256K.

I still fail to see how unrolling loops -- even really huge ones -- can cause thrashing. I do see how it could cause instruction thrashing, BUT Pentium4 replaces the conventional L1 instruction cache with an execution trace cache that can hold 12,000 micro-ops. Perhaps someone else can enlighten the discussion.

what i was trying to refer to was OpenMP and Hyper-Threading (thread-level parallelism) and the SMP kernel and their relationship to potential cache thrashing. this is also something that gcc doesn't begin to try to deal with, nor are our benchmark examples for icc using -openmp or -parallel or -par_threshold[n]. so our examples have not touched on auto-parallelization of loops at all, which is where one real potential for speed lies in icc.

Also i think using -xN instead of -xW turns off vectorization, where a loop using sse2 is stripped into just one single instruction by icc, but i'm not sure.
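To make the unrolling argument concrete, here is a hand-written sketch of the kind of transformation a flag like -funroll-loops asks the compiler to perform. It is only an illustration: the function names are made up, the unroll factor of four is arbitrary, and what gcc or icc actually emits will differ. Both versions read the same array elements in the same order, so the data access pattern does not change; what grows is the amount of code, which is why any extra cache pressure would be expected on the instruction side rather than the data side.

```c
#include <stddef.h>

/* Plain ("rolled") loop: one add and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same loop unrolled by four by hand: one loop test per four
 * elements, at the cost of a larger loop body. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main body: handles elements in groups of four. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Remainder loop: picks up the last 0 to 3 elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

The two functions return the same result for any input; whether the unrolled one is faster depends on the CPU, the loop body and the compiler, which is pretty much what the thread is arguing about.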
FWIW: since it was said sse would make the results equal, i'm looking now at the results of an almabench (floating point math) on a P4 2.8

gcc w/ -O3 -march=pentium4 -mcpu=pentium4 -msse -msse2 -mmmx -mfpmath=sse -ffast-math
30.6 seconds

gcc w/ -O3 -march=pentium4 -mcpu=pentium4 -msse -msse2 -mmmx -mfpmath=sse -ffast-math -funroll-loops
30.4 seconds
note to self: unroll loops improved not hurt !

gcc w/ -O3 -march=pentium4 -mcpu=pentium4 -msse -msse2 -mmmx -funroll-loops -ffast-math
28.8 seconds
note to self: better without -mfpmath=sse

icc w/ -xW -tpp7 -O3 -ipo -i_dynamic -openmp
8.9 seconds

a 220% win for icc over gcc even with your flag suggestion, which turned out to be quite wrong

So don't argue just for the sake of arguing for some ideological reason -- that's boring!

maccorin
05-17-2004, 09:14 AM
220 is a far cry from 600, but in any case, yes, you win... ok? gonna quit taking this personally now?

side note: all those tests used -O3, which i specifically recommended against (for most cases), but it won't make anywhere near the difference that it would need to catch up anyway

t0t3r
05-17-2004, 10:46 AM
anyway ... i got 2.6.3 with nvidia working just as u said. kinda stupid, that xild issue. and the 2.4 patch was made for version 7. kernel just runs very performant. in fact let's just say icc for intel cpus is better than gcc. :)

GaryJones32
05-19-2004, 12:55 AM
Originally posted by maccorin
> yes, you win... ok? gonna quit taking this personally now?

t0t3r

Yea i wasn't really trying to say i'm right and you were wrong -- benchmarking is weird. it's more about interpreting the data correctly than the data itself, and interpreting the data is really hard cause it's all over the place. right now i'm trying to figure out an issue where a certain gcc flag is increasing math overhead speed by over 3000% -- large number yes, but what the heck does it mean in terms of overall performance?? i don't have a clue.
certainly floating point math like we were talking about with icc amounts to like .00001% of the overall performance, so it really ain't important. more like a curiosity for us who follow stupid crap like that.

i ran some tests on unrolling the loops like what you were saying, and -funroll-loops seems to work fine, but i did get some extra data cache missing with -funroll-all-loops, so what you were saying had some validity. here are the results if you are interested. used recursion test Tower of Hanoi for some(no) reason?????? might be more pronounced with another.

with -march=pentium4 -O3 -s -ftracer -momit-leaf-frame-pointer

    Usage: ./hanoi duration [disks]
    ==25427==
    ==25427== I   refs:      12,475
    ==25427== I1  misses:       368
    ==25427== L2i misses:       219
    ==25427== I1  miss rate:   2.94%
    ==25427== L2i miss rate:   1.75%
    ==25427==
    ==25427== D   refs:       6,483  (4,663 rd + 1,820 wr)
    ==25427== D1  misses:       381  (  343 rd +    38 wr)
    ==25427== L2d misses:       272  (  242 rd +    30 wr)
    ==25427== D1  miss rate:    5.8% (  7.3%   +   2.0%  )
    ==25427== L2d miss rate:    4.1% (  5.1%   +   1.6%  )
    ==25427==
    ==25427== L2 refs:          749  (  711 rd +    38 wr)
    ==25427== L2 misses:        491  (  461 rd +    30 wr)
    ==25427== L2 miss rate:     2.5% (  2.6%   +   1.6%  )

with -march=pentium4 -O3 -s -funroll-loops -ftracer -momit-leaf-frame-pointer
same thing

    Usage: ./hanoi duration [disks]
    ==1233==
    ==1233== I   refs:      12,473
    ==1233== I1  misses:       367
    ==1233== L2i misses:       219
    ==1233== I1  miss rate:   2.94%
    ==1233== L2i miss rate:   1.75%
    ==1233==
    ==1233== D   refs:       6,479  (4,659 rd + 1,820 wr)
    ==1233== D1  misses:       379  (  344 rd +    35 wr)
    ==1233== L2d misses:       272  (  242 rd +    30 wr)
    ==1233== D1  miss rate:    5.8% (  7.3%   +   1.9%  )
    ==1233== L2d miss rate:    4.1% (  5.1%   +   1.6%  )
    ==1233==
    ==1233== L2 refs:          746  (  711 rd +    35 wr)
    ==1233== L2 misses:        491  (  461 rd +    30 wr)
    ==1233== L2 miss rate:     2.5% (  2.6%   +   1.6%  )

BUT with this next one data misses go up to 6.4%, enough to degrade performance as you said

-march=pentium4 -O3 -s -funroll-all-loops -ftracer -momit-leaf-frame-pointer

    Usage: ./hanoi duration [disks]
    ==25414==
    ==25414== I   refs:      12,475
    ==25414== I1  misses:       369
    ==25414== L2i misses:       218
    ==25414== I1  miss rate:   2.95%
    ==25414== L2i miss rate:   1.74%
    ==25414==
    ==25414== D   refs:       6,483  (4,663 rd + 1,820 wr)
    ==25414== D1  misses:       418  (  375 rd +    43 wr)
    ==25414== L2d misses:       273  (  243 rd +    30 wr)
    ==25414== D1  miss rate:    6.4% (  8.0%   +   2.3%  )
    ==25414== L2d miss rate:    4.2% (  5.2%   +   1.6%  )
    ==25414==
    ==25414== L2 refs:          787  (  744 rd +    43 wr)
    ==25414== L2 misses:        491  (  461 rd +    30 wr)
    ==25414== L2 miss rate:     2.5% (  2.6%   +   1.6%  )

thanks for the insight

BTW i'm also not saying gcc ain't cool, cause it's a world class compiler that is way cross platform and open source !!!

maccorin
05-19-2004, 01:33 PM
Originally posted by GaryJones32
> BTW i'm also not saying gcc ain't cool, cause it's a world class compiler that is way cross platform and open source !!!

:) those are the 2 best things about gcc IMHO, but I'm one of those "free software zealots"... in fact i did go to d/l icc for my laptop (the only intel cpu i have in the house), but as soon as i hit a user agreement i stopped.

Well, I've got a U60 (2 x 300MHz) coming in the mail soon, so I'll get to have fun figuring out what CFLAGS are the best for it as soon as I've got it. I'm guessing it will be quite a bit different, esp considering the bigger cache and massive amount of registers available. I did some sys admin stuff on sparc for an old job a lot, but never really had time to experiment much at all :(

You're right about the benchmarking being mostly about interpretation. It's one of those cases where numbers just simply don't speak for themselves. But yea, it's safe to say icc can beat out gcc _easily_ on a P4.

Since i'm so damn religious about Free Software I probably won't test this myself, but I wonder how icc would hold up on another x86 (i know you would have to disable things like sse2). It would be interesting to find out at least. I would be surprised if it _didn't_ do well.
Because that would make icc kinda pointless for closed-source software (how are you gonna know down to the brand what proc everyone is running?). Which seems to be what they are catering to.

justlinux.com
Copyright Internet.com Inc. All Rights Reserved.