Intel FP Optimization Differences
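Different members of the Intel x86 family require different FP
optimizations; the handling of FXCH (which swaps the top of the x87
register stack with another stack register) is a good example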
- 486 - avoid excessive FXCH instructions, since each one must be
executed by the FPU and so takes time away from useful FP work
- Pentium - FXCH instructions can be strategically paired with
other FP instructions, so that a well-placed FXCH issues alongside
the preceding FP instruction and adds no extra cycle
- P6 (Pentium Pro/II/III/M) - FXCHs are essentially free since they
are handled by the register renaming hardware
- Pentium 4 - avoid FXCHs, since each one consumes a slot in the
trace cache and is also subject to issue slot restrictions (use
SSE2 instructions instead where possible)
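
As a rough illustration of the "use SSE2 instead" advice for the Pentium 4, the sketch below computes a dot product with SSE2 intrinsics. SSE2 uses a flat register file rather than the x87 stack, so no FXCH is needed at all. The function name dot_product and the scalar fallback are illustrative assumptions, not something taken from these notes, and no performance figures are claimed.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (an assumption for illustration): dot product of
   two double arrays of length n. With SSE2 available, the main loop
   works on flat XMM registers instead of the x87 register stack. */
double dot_product(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);

    double sum = 0.0;
    size_t i = 0;

#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();            /* two partial sums */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);      /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                 /* combine the two lanes */
    sum = lanes[0] + lanes[1];
#endif

    /* Scalar tail; also the whole loop when SSE2 is not available. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Plain C can also avoid the x87 stack without intrinsics: with GCC or Clang on 32-bit x86, the options -msse2 -mfpmath=sse ask the compiler to do scalar FP arithmetic in SSE2 registers, and on x86-64 this is already the default.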