Core 2 x87 Floating Point Performance
I'm working with some number-crunching code that, by its nature, is floating-point intensive and just plain slow. It's research code, so it can be tailored to one architecture, and it is running on a Core 2 Quad box. My understanding is that, for the Pentium 4/Netburst architecture, Intel severely stripped down the x87 FPU and adopted a more SSE2-centric design, which resulted in horrible performance on x87 code. However, the Core 2 architecture is more closely related to the P6 architecture than to Netburst.
My compiler does not target SSE at all, AFAIK, and my understanding is that very few compilers do this well. Furthermore, I am using the D language, which is fairly bleeding edge, so there just aren't many compilers available for it. However, I don't want to switch languages, both because of the inertia of my existing code and because, despite its immaturity, I really like D.
Does the Core 2 architecture also have a stripped down x87 FPU? If so, what is the best way around this?
Get yourself to a profiler. There are far too many factors, like cache misses and memory access latency, to be able to attribute bad performance to specific processor features. If you want to find out what's fast, implement the same algorithm using several different methods and profile each one.
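As a rough illustration of the "implement it several ways and measure" advice, here is a minimal sketch in C. The dot-product kernel, the names `dot_naive`, `dot_unrolled`, and `time_variant`, and the choice of `clock()` as the timer are all assumptions made for the example. They are not taken from the question's code, and a real profiler will tell you far more than this does.

```c
#include <stddef.h>
#include <time.h>

/* Signature shared by every variant under test (hypothetical name). */
typedef double (*dot_fn)(const double *a, const double *b, size_t n);

/* Variant 1: straightforward loop with a single accumulator. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Variant 2: four independent accumulators, so consecutive additions
   do not all wait on the same running sum. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Calls f `reps` times, stores the last result in *out, and returns
   the elapsed CPU time in seconds as reported by clock(). */
double time_variant(dot_fn f, const double *a, const double *b,
                    size_t n, int reps, double *out)
{
    volatile double sink = 0.0; /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = f(a, b, n);
    clock_t end = clock();
    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, fill two arrays large enough to be representative of your real workload, call `time_variant` once per variant with the same inputs and repetition count, check that the results agree, and then compare the timings. The two variants add in a different order, so on real data their results may differ in the last few bits.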
I'd also recommend looking at the liboil library, which lets you optimize using SSE without writing assembly; I don't know how well it integrates with D, however.