cpu: Fix cache blocked load behavior in o3 cpu
This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than
flushing the entire pipeline, this patch replays loads once the cache becomes
unblocked.
Additionally, deferred memory instructions (loads which had conflicting stores),
when replayed would not respect the number of functional units (only respected
issue width). This patch also corrects that.
Improvements over 20% have been observed on a microbenchmark designed to
exercise this behavior.