dcache: Implement the dcbz instruction
This adds logic to dcache and loadstore1 to implement dcbz. For now
it zeroes a single cache line (by default 64 bytes), not 128 bytes
like IBM Power processors do.
The dcbz operation is performed much like a load miss, except that
we are writing zeroes to memory instead of reading. As each ack
comes back, we write zeroes to the BRAM instead of data from memory.
In this way we zero the line in memory and also zero the line of
cache memory, establishing the line in the cache if it wasn't already
resident. If it was already resident then we overwrite the existing
line in the cache.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>