Christopher Torng [Sun, 2 Mar 2014 05:35:23 +0000 (23:35 -0600)]
cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU
A copyRegs() function is added to MIPS utilities
to copy architectural state from the old CPU to
the new CPU during fast-forwarding. This
addition alone enables fast-forwarding for the
o3 cpu model running MIPS.
The patch also adds takeOverFrom() and
drainResume() functions to the InOrderCPU to
enable it to take over from another CPU. This
change enables fast-forwarding for the inorder
cpu model running MIPS, but not for Alpha.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Nilay Vaish [Sun, 2 Mar 2014 05:35:21 +0000 (23:35 -0600)]
ruby: profiler: statically allocate stats variable
Couple of users observed segmentation fault when the simulator tries to
register the statistical variable m_IncompleteTimes. It seems that there
is some problem with the initialization of these variables when allocated
in the constructor.
Nilay Vaish [Tue, 25 Feb 2014 02:50:06 +0000 (20:50 -0600)]
stats: updates due to
c0db268f811b
Nilay Vaish [Tue, 25 Feb 2014 02:50:05 +0000 (20:50 -0600)]
ruby: correct errors in changeset
4eec7bdde5b0
Couple of errors were discovered in
4eec7bdde5b0 which necessitated this patch.
Firstly, we create interrupt controllers in the se mode, but no piobus was
being created. RubyPort, which earlier used to ignore range changes now
forwards those to the piobus. The lack of piobus resulted in segmentation
fault. This patch creates a piobus even in se mode. It is not created only
when some tester is running. Secondly, I had missed out on modifying port
connections for other coherence protocols.
Nilay Vaish [Mon, 24 Feb 2014 01:16:16 +0000 (19:16 -0600)]
stats: updates due to changes to ruby pio access handling
Nilay Vaish [Mon, 24 Feb 2014 01:16:16 +0000 (19:16 -0600)]
ruby: route all packets through ruby port
Currently, the interrupt controller in x86 is connected to the io bus
directly. Therefore the packets between the io devices and the interrupt
controller do not go through ruby. This patch changes ruby port so that
these packets arrive at the ruby port first, which then routes them to their
destination. Note that the patch does not make these packets go through the
ruby network. That would happen in a subsequent patch.
Andreas Hansson [Mon, 24 Feb 2014 01:16:16 +0000 (19:16 -0600)]
ruby: Simplify RubyPort flow control and routing
This patch simplfies the retry logic in the RubyPort, avoiding
redundant attributes, and enforcing more stringent checks on the
interactions with the normal ports. The patch also simplifies the
routing done by the RubyPort, using the port identifiers instead of a
heavy-weight sender state.
The patch also fixes a bug in the sending of responses from PIO
ports. Previously these responses bypassed the queue in the queued
port, and ignored the return value, potentially leading to response
packets being lost.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Nilay Vaish [Mon, 24 Feb 2014 01:16:15 +0000 (19:16 -0600)]
config: topologies: slight code refactor
Nilay Vaish [Mon, 24 Feb 2014 01:16:15 +0000 (19:16 -0600)]
ruby: message buffer: refactor code
Code in two of the functions was exactly the same. This patch moves
this code to a new function which is called from the two functions
mentioned initially.
Nilay Vaish [Mon, 24 Feb 2014 01:16:15 +0000 (19:16 -0600)]
ruby: remove few not required #includes
Nilay Vaish [Mon, 24 Feb 2014 01:16:15 +0000 (19:16 -0600)]
ruby: slicc: remove unused COPY_HEAD functionality
Nilay Vaish [Mon, 24 Feb 2014 01:16:15 +0000 (19:16 -0600)]
ruby: protocols: remove unused action z_stall
Nilay Vaish [Fri, 21 Feb 2014 14:02:06 +0000 (08:02 -0600)]
config: ruby_random_test: updates due to recent unrelated changes
Nilay Vaish [Fri, 21 Feb 2014 14:02:05 +0000 (08:02 -0600)]
ruby: network: move message buffers to base network class.
Nilay Vaish [Fri, 21 Feb 2014 14:02:04 +0000 (08:02 -0600)]
ruby: network: garnet: fixed: removes net_ptr from links
Nilay Vaish [Fri, 21 Feb 2014 14:02:02 +0000 (08:02 -0600)]
ruby: cache: remove not required variable m_cache_name
Nilay Vaish [Thu, 20 Feb 2014 23:28:01 +0000 (17:28 -0600)]
ruby: network: garnet: fixed: removes next cycle functions
At several places, there are functions that take a cycle value as input
and performs some computation. Along with each such function, another
function was being defined that simply added one more cycle to input and
computed the same function. This patch removes this second copy of the
function. Places where these functions were being called have been updated
to use the original function with argument being current cycle + 1.
Nilay Vaish [Thu, 20 Feb 2014 23:27:45 +0000 (17:27 -0600)]
ruby: controller: slight code refactoring
Nilay Vaish [Thu, 20 Feb 2014 23:27:17 +0000 (17:27 -0600)]
ruby: mesi three level: rename incorrectly named files
Two files had been incorrectly named with a .cache suffix.
--HG--
rename : src/mem/protocol/MESI_Three_Level-L0.cache => src/mem/protocol/MESI_Three_Level-L0cache.sm
rename : src/mem/protocol/MESI_Three_Level-L1.cache => src/mem/protocol/MESI_Three_Level-L1cache.sm
Nilay Vaish [Thu, 20 Feb 2014 23:27:07 +0000 (17:27 -0600)]
ruby: network: removes unused code.
Nilay Vaish [Thu, 20 Feb 2014 23:26:49 +0000 (17:26 -0600)]
ruby: slicc: slight code refactoring
Nilay Vaish [Thu, 20 Feb 2014 23:26:41 +0000 (17:26 -0600)]
ruby: message buffer: removes some unecessary functions.
Andreas Sandberg [Thu, 20 Feb 2014 14:43:53 +0000 (15:43 +0100)]
kvm: Add support for multi-system simulation
The introduction of parallel event queues added most of the support
needed to run multiple VMs (systems) within the same gem5
instance. This changeset fixes up signal delivery so that KVM's
control signals are delivered to the thread that executes the CPU's
event queue. Specifically:
* Timers and counters are now initialized from a separate method
(startupThread) that is scheduled as the first event in the
thread-specific event queue. This ensures that they are
initialized from the thread that is going to execute the CPUs
event queue and enables signal delivery to the right thread when
exiting from KVM.
* The POSIX-timer-based KVM timer (used to force exits from KVM) has
been updated to deliver signals to the thread that's executing KVM
instead of the process (thread is undefined in that case). This
assumes that the timer is instantiated from the thread that is
going to execute the KVM vCPU.
* Signal masking is now done using pthread_sigmask instead of
sigprocmask. The behavior of the latter is undefined in threaded
applications.
* Since signal masks can be inherited, make sure to actively unmask
the control signals when setting up the KVM signal mask.
There are currently no facilities to multiplex between multiple KVM
CPUs in the same event queue, we are therefore limited to
configurations where there is only one KVM CPU per event queue. In
practice, this means that multi-system configurations can be
simulated, but not multiple CPUs in a shared-memory configuration.
Andreas Hansson [Wed, 19 Feb 2014 12:59:46 +0000 (07:59 -0500)]
arm: Bump stats after FS config script update
This patch updates the stats to reflect the change in kernel options
needed for armv8 (but used for all FS regressions).
Anthony Gutierrez [Tue, 18 Feb 2014 22:20:56 +0000 (17:20 -0500)]
arm: armv8 boot options to enable v8
Modifies FSConfig.py to enable ARMv8 compatibility.
To boot gem5 with ARMv8:
Download the v8 kernel, .dtb file, and root FS from: http://gem5.org/Download
Download the ARMv8 toolchain, and add the bin dir to your path:
http://www.linaro.org/engineering/engineering-projects/armv8
Build gem5 for ARM
Build the v8 bootloader (in gem5/system/arm/aarch64_bootloader)
Make script in gem5/system/arm/aarch64_bootloader will require v8 toolchain,
drop the produced boot_emm.arm64 in $(M5_PATH)/binaries/
Run:
$ build/ARM/gem5.fast configs/example/fs.py --machine-type=VExpress_EMM64 \
--kernel=/path/to/kernel/vmlinux-linaro-tracking \
--dtb-filename=/path/to/dtb/rtsm_ve-aemv8a.dtb \
--disk-image=/path/to/img/linaro-minimal-armv8.img
Andreas Hansson [Tue, 18 Feb 2014 10:51:01 +0000 (05:51 -0500)]
mem: Fix bug in PhysicalMemory use of mmap and munmap
This patch fixes a bug in how physical memory used to be mapped and
unmapped. Previously we unmapped and re-mapped if restoring from a
checkpoint. However, we never checked that the new mapping was
actually the same, it was just magically working as the OS seems to
fairly reliably give us the same chunk back. This patch fixes this
issue by relying entirely on the mmap call in the constructor.
Andreas Hansson [Tue, 18 Feb 2014 10:50:59 +0000 (05:50 -0500)]
dev: Include basic devices in NULL ISA build
This patch enbles use of the basic PIO devices as part of the NULL
build. Although it might seem counter intuitive to have a PIO device
without being able to execute a driver, this change enables us to
break a device class hierarchy into an ISA-agnostic part, and an
ISA-specific part, without requiring multiple-inheritance. The
ISA-agnostic base class is a PIO device, but does not make use of the
port.
Andreas Hansson [Tue, 18 Feb 2014 10:50:58 +0000 (05:50 -0500)]
scons: Add PROTOC from the environment
This patch adds PROTOC to the build environment.
Andreas Hansson [Tue, 18 Feb 2014 10:50:58 +0000 (05:50 -0500)]
mem: Filter cache snoops based on address ranges
This patch adds a filter to the cache to drop snoop requests that are
not for a range covered by the cache. This fixes an issue observed
when multiple caches are placed in parallel, covering different
address ranges. Without this patch, all the caches will forward the
snoop upwards, when only one should do so.
Andreas Hansson [Tue, 18 Feb 2014 10:50:53 +0000 (05:50 -0500)]
mem: Add a wrapped DRAMSim2 memory controller
This patch adds DRAMSim2 as a memory controller by wrapping the
external library and creating a sublass of AbstractMemory that bridges
between the semantics of gem5 and the DRAMSim2 interface.
The DRAMSim2 wrapper extracts the clock period from the config
file. There is no way of extracting this information from DRAMSim2
itself, so we simply read the same config file and get it from there.
To properly model the response queue, the wrapper keeps track of how
many transactions are in the actual controller, and how many are
stacking up waiting to be sent back as responses (in the wrapper). The
latter requires us to move away from the queued port and manage the
packets ourselves. This is due to DRAMSim2 not having any flow control
on the response path.
DRAMSim2 assumes that the transactions it is given are matching the
burst size of the choosen memory. The wrapper checks to ensure the
cache line size of the system matches the burst size of DRAMSim2 as
there are currently no provisions to split the system requests. In
theory we could allow a cache line size smaller than the burst size,
but that would lead to inefficient use of the DRAM, so for not we
fatal also in this case.
Andreas Hansson [Tue, 18 Feb 2014 10:50:52 +0000 (05:50 -0500)]
util: Enhance the error messages for packet encode/decode
This patch adds a more verbose error message when the Python protobuf
module cannot be loaded.
Andreas Hansson [Tue, 18 Feb 2014 10:50:51 +0000 (05:50 -0500)]
mem: Fix input to DPRINTF in CommMonitor
Minor fix of the debug message parameters.
Nilay Vaish [Sun, 16 Feb 2014 17:40:34 +0000 (11:40 -0600)]
stats: updates due to branch predictor warming
Nilay Vaish [Sat, 15 Feb 2014 18:44:09 +0000 (12:44 -0600)]
Added tag stable_2014_02_15 to the changeset
459491344fcf
Andreas Sandberg [Sun, 9 Feb 2014 19:49:28 +0000 (20:49 +0100)]
cpu: simple: Add support for using branch predictors
This changesets adds branch predictor support to the
BaseSimpleCPU. The simple CPUs normally don't need a branch predictor,
however, there are at least two cases where it can be desirable:
1) A simple CPU can be used to warm the branch predictor of an O3
CPU before switching to the slower O3 model.
2) The simple CPU can be used as a quick way of evaluating/debugging
new branch predictors since it exposes branch predictor
statistics.
Limitations:
* Since the simple CPU doesn't speculate, only one instruction will
be active in the branch predictor at a time (i.e., the branch
predictor will never see speculative branches).
* The outcome of a branch prediction does not affect the performance
of the simple CPU.
Nilay Vaish [Thu, 6 Feb 2014 22:30:13 +0000 (16:30 -0600)]
base: calls abort() from fatal
Currently fatal() ends the simulation in a normal fashion. This results in
the call stack getting lost when using a debugger and it is not always
possible to debug the simulation just from the information provided by the
printed error message. Even though the error is likely due to a user's fault,
the information available should not be thrown away. Hence, this patch to
call abort() from fatal().
Nilay Vaish [Thu, 6 Feb 2014 22:30:12 +0000 (16:30 -0600)]
ruby: memory controller: use MemoryNode *
Andreas Sandberg [Wed, 5 Feb 2014 13:08:13 +0000 (14:08 +0100)]
x86: Fix x87 state transfer bug
Changeset
7274310be1bb (isa: clean up register constants) increased
the value of NumFloatRegs, which triggered a bug in
X86ISA::copyRegs(). This bug is caused by the x87 stack being copied
twice since register indexes past NUM_FLOATREGS are mapped into the
x87 stack relative to the top of the stack, which is undefined when
the copy takes place.
This changeset updates the copyRegs() function to use access registers
using the non-flattening interface, which guarantees that undesirable
register folding does not happen.
Nikos Nikoleris [Sun, 2 Feb 2014 15:37:35 +0000 (16:37 +0100)]
x86, kvm: Fix bug in the RFlags get and set functions
The getRFlags and setRFlags utility functions were not updated
correctly when condition registers were separated into their own
register class. This lead to incorrect state transfer in calls from
kvm into the simulator (e.g., m5 readfile ended up in an infinite
loop) and when switching CPUs. This patch makes these utility
functions use getCCReg and setCCReg instead of getIntReg and setIntReg
which read and write the integer registers.
Reviewed-by: Andreas Sandberg <andreas@sandberg.pp.se>
Nilay Vaish [Fri, 31 Jan 2014 21:35:45 +0000 (15:35 -0600)]
config: correct bug in x86 drive sys instantiation
Ola Jeppsson [Thu, 30 Jan 2014 18:21:58 +0000 (12:21 -0600)]
unittest: Fix build errors
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Mitch Hayenga [Thu, 30 Jan 2014 05:21:26 +0000 (23:21 -0600)]
mem: Add additional tolerance to stride prefetcher
Forces the prefetcher to mispredict twice in a row before resetting the
confidence of prefetching. This helps cases where a load PC strides by a
constant factor, however it may operate on different arrays at times.
Avoids the cost of retraining. Primarily helps with small iteration loops.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Mitch Hayenga [Thu, 30 Jan 2014 05:21:26 +0000 (23:21 -0600)]
mem: Allowed tagged instruction prefetching in stride prefetcher
For systems with a tightly coupled L2, a stride-based prefetcher may observe
access requests from both instruction and data L1 caches. However, the PC
address of an instruction miss gives no relevant training information to the
stride based prefetcher(there is no stride to train). In theses cases, its
better if the L2 stride prefetcher simply reverted back to a simple N-block
ahead prefetcher. This patch enables this option.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
mem: prefetcher: add options, support for unaligned addresses
This patch extends the classic prefetcher to work on non-block aligned
addresses. Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected. For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.
It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.
Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).
Xiangyu Dong [Thu, 30 Jan 2014 04:35:04 +0000 (22:35 -0600)]
cpu: fix bug when TrafficGen deschedules event
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Mitch Hayenga [Wed, 29 Jan 2014 00:00:51 +0000 (18:00 -0600)]
arm: Enable umask syscall in SE mode
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Mitch Hayenga [Wed, 29 Jan 2014 00:00:51 +0000 (18:00 -0600)]
base: Fix race condition in the socket listen function
gem5 makes the incorrect assumption that by binding a socket, it
effectively has allocated a port. Linux only allocates ports once you call
listen on the given socket, not when you call bind. So even if the port was
free when bind was called, another process (gem5 instance) could race in
between the bind & listen calls and steal the port. In the current code, if
the call to bind fails due to the port being in use (EADDRINUSE), gem5 retries
for a different port. However if listen fails, gem5 just panics. The fix is
testing the return value of listen and re-trying if it was due to EADDRINUSE.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Amin Farmahini [Wed, 29 Jan 2014 00:00:50 +0000 (18:00 -0600)]
mem: Remove redundant findVictim() input argument
The patch
(1) removes the redundant writeback argument from findVictim()
(2) fixes the description of access() function
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Amin Farmahini [Wed, 29 Jan 2014 00:00:49 +0000 (18:00 -0600)]
mem: Fixes a bug in simple_dram write merging
Fixes updating the value of size in the write merge function.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Nilay Vaish [Tue, 28 Jan 2014 13:15:53 +0000 (07:15 -0600)]
x86: add a warning about the number of memory controllers
When memory size > 3GB, print a warning that twice the number of memory
controllers would be created.
Nilay Vaish [Tue, 28 Jan 2014 00:50:54 +0000 (18:50 -0600)]
x86: use lfpimm instead of limm for fptan
Nilay Vaish [Tue, 28 Jan 2014 00:50:53 +0000 (18:50 -0600)]
x86: implements x87 add/sub instructions
Nilay Vaish [Tue, 28 Jan 2014 00:50:52 +0000 (18:50 -0600)]
x86: implements fxch instruction.
Nilay Vaish [Tue, 28 Jan 2014 00:50:51 +0000 (18:50 -0600)]
x86: correct error in emms instruction.
Nilay Vaish [Tue, 28 Jan 2014 00:50:51 +0000 (18:50 -0600)]
config: allow more than 3GB of memory for x86 simulations
This patch edits the configuration files so that x86 simulations can have
more than 3GB of memory. It also corrects a bug in the MemConfig.py script.
Nilay Vaish [Mon, 27 Jan 2014 19:30:37 +0000 (13:30 -0600)]
stats: update sparc fs stats
Steve Reinhardt [Mon, 27 Jan 2014 05:38:58 +0000 (00:38 -0500)]
stats: update eio stats for recent changes
Ali Saidi [Fri, 24 Jan 2014 21:29:34 +0000 (15:29 -0600)]
stats: update stats for ARMv8 changes
ARM gem5 Developers [Fri, 24 Jan 2014 21:29:34 +0000 (15:29 -0600)]
arm: Add support for ARMv8 (AArch64 & AArch32)
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.
Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.
Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
Ali Saidi [Fri, 24 Jan 2014 21:29:33 +0000 (15:29 -0600)]
stats: update stats for cache occupancy and clock domain changes
Andreas Hansson [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
arch: Make all register index flattening const
This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
Geoffrey Blake [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
checker: CheckerCPU handling of MiscRegs was incorrect
The CheckerCPU model in pre-v8 code was not checking the
updates to miscellaneous registers due to some methods
for setting misc regs were not instrumented. The v8 patches
exposed this by calling the instrumented misc reg update
methods and then invoking the checker before the main CPU had
updated its misc regs, leading to false positives about
register mismatches. This patch fixes the non-instrumented
misc reg update methods and places calls to the checker in
the proper places in the O3 model.
Ali Saidi [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
arch, cpu: Add support for flattening misc register indexes.
With ARMv8 support the same misc register id results in accessing different
registers depending on the current mode of the processor. This patch adds
the same orthogonality to the misc register file as the others (int, float, cc).
For all the othre ISAs this is currently a null-implementation.
Additionally, a system variable is added to all the ISA objects.
Giacomo Gabrielli [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
cpu: Add support for Memory+Barrier instruction types in O3 cpu.
Ali Saidi [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
cpu: Add support for instructions that zero cache lines.
Ali Saidi [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
cpu: Add CPU support for generatig wake up events when LLSC adresses are snooped.
This patch add support for generating wake-up events in the CPU when an address
that is currently in the exclusive state is hit by a snoop. This mechanism is required
for ARMv8 multi-processor support.
Giacomo Gabrielli [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
mem: Add flag to request if it was generated by a page table walk
Giacomo Gabrielli [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
mem: Add support for a security bit in the memory system
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
Chris Adeniyi-Jones [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
sim: Add openat/fstatat syscalls and fix mremap
This patch adds support for the openat and fstatat syscalls and
broadens the support for mremap to make it work on OS X.
Ali Saidi [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
mem: Remove explict cast from memhelper.
Previously we were casting the result type to the the memory type which
is incorrect for things like dual-memory operations which still return a
single result.
Timothy M. Jones [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
Cache: Collect very basic stats on tag and data accesses
Adds very basic statistics on the number of tag and data accesses within the
cache, which is important for power modelling. For the tags, simply count
the associativity of the cache each time. For the data, this depends on
whether tags and data are accessed sequentially, which is given by a new
parameter. In the parallel case, all data blocks are accessed each time, but
with sequential accesses, a single data block is accessed only on a hit.
Dam Sunwoo [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
mem: per-thread cache occupancy and per-block ages
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks. Cache occupancy stats are
recalculated on each stat dump.
Matt Horsnell [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
base: add support for probe points and common probes
The probe patch is motivated by the desire to move analytical and trace code
away from functional code. This is achieved by the probe interface which is
essentially a glorified observer model.
What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module, that is being used to carry out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py
What is happening under the hood:
* every SimObject maintains has a ProbeManager.
* during initialization (src/python/m5/simulate.py) first regProbePoints and
the regProbeListeners is called on each SimObject. this hooks up the probe
point notify calls with the listeners.
FAQs:
Why did you develop probe points:
* to remove trace, stats gathering, analytical code out of the functional code.
* the belief that probes could be generically useful.
What is a probe point:
* a probe point is used to notify upon a given event (e.g. cpu commits an instruction)
What is a probe listener:
* a class that handles whatever the user wishes to do when they are notified
about an event.
What can be passed on notify:
* probe points are templates, and so the user can generate probes that pass any
type of argument (by const reference) to a listener.
What relationships can be generated (1:1, 1:N, N:M etc):
* there isn't a restriction. You can hook probe points and listeners up in a
1:1, 1:N, N:M relationship. They become useful when a number of modules
listen to the same probe points. The idea being that you can add a small
number of probes into the source code and develop a larger number of useful
analysis modules that use information passed by the probes.
Can you give examples:
* adding a probe point to the cpu's commit method allows you to build a trace
module (outputting assembler), you could re-use this to gather instruction
distribution (arithmetic, load/store, conditional, control flow) stats.
Why is the probe interface currently restricted to passing a const reference:
* the desire, initially at least, is to allow an interface to observe
functionality, but not to change functionality.
* of course this can be subverted by const-casting.
What is the performance impact of adding probes:
* when nothing is actively listening to the probes they should have a
relatively minor impact. Profiling has suggested even with a large number of
probes (60) the impact of them (when not active) is very minimal (<1%).
Andreas Hansson [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
sim: Expose the current voltage for each object as a stat
Andreas Hansson [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
sim: Expose the current clock period as a stat
This patch adds observability to the clock period of the clock domains
by including it as a stat.
As a result of adding this, the regressions will be updated in a
separate patch.
Matt Horsnell [Fri, 24 Jan 2014 21:29:30 +0000 (15:29 -0600)]
mem: track per-request latencies and access depths in the cache hierarchy
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
Andreas Hansson [Fri, 24 Jan 2014 21:29:29 +0000 (15:29 -0600)]
config: Make the Clock a Tick parameter like Latency/Frequency
This patch makes the Clock a TickParamValue just like
Latency/Frequency. There is no longer any need to distinguish it
(originally needed to support multiplication).
Andreas Hansson [Fri, 24 Jan 2014 21:29:29 +0000 (15:29 -0600)]
x86: Fix memory leak in table walker
This patch fixes a memory leak in the table walker, by ensuring that
the sender state is deleted again if the request packet cannot be
successfully sent.
Andreas Hansson [Fri, 24 Jan 2014 21:29:29 +0000 (15:29 -0600)]
cpu: Relax check on squashed non-speculative instructions
This patch relaxes the check performed when squashing non-speculative
instructions, as it caused problems with loads that were marked ready,
and then stalled on a blocked cache. The assertion is now allowing
memory references to be non-faulting.
Dam Sunwoo [Fri, 24 Jan 2014 21:29:29 +0000 (15:29 -0600)]
util: updated Streamline flow to support ARM DS-5 v5.17 protocol
The previous flow supported ARM DS-5 v5.13 protocol.
Dam Sunwoo [Fri, 24 Jan 2014 21:29:29 +0000 (15:29 -0600)]
cpu: remove faulty simpoint basic block inst count assertion
This patch removes an assertion in the simpoint profiling code that
asserts that a previously-seen basic block has the exact same number
of instructions executed as before. This can be false if the basic
block generates aborts or takes interrupts at different locations
within the basic block. The basic block profiling are not affected
significantly as these events are rare in general.
Nilay Vaish [Fri, 17 Jan 2014 17:02:15 +0000 (11:02 -0600)]
ruby: remove unused label no_vector
Nilay Vaish [Fri, 10 Jan 2014 22:19:58 +0000 (16:19 -0600)]
stats: updates due to changes to ruby
Nilay Vaish [Fri, 10 Jan 2014 22:19:47 +0000 (16:19 -0600)]
ruby: move all statistics to stats.txt, eliminate ruby.stats
Nilay Vaish [Fri, 10 Jan 2014 22:19:40 +0000 (16:19 -0600)]
stats: add function for adding two histograms
This patch adds a function to the HistStor class for adding two histograms.
This functionality is required for Ruby. It also adds support for printing
histograms in a single line.
Nilay Vaish [Thu, 9 Jan 2014 16:45:50 +0000 (10:45 -0600)]
ruby: fix bug introduced to revision
8523754f8885
Nilay Vaish [Wed, 8 Jan 2014 10:26:25 +0000 (04:26 -0600)]
ruby: slicc: remove variable 'addr' used in calls to doTransition
This variable causes trouble if a variable of same name is declared in a
protocol file. Hence it is being eliminated.
Nilay Vaish [Sat, 4 Jan 2014 06:03:34 +0000 (00:03 -0600)]
ruby: add a three level MESI protocol.
The first two levels (L0, L1) are private to the core, the third level (L2)is
possibly shared. The protocol supports clustered designs. For example, one
can have two sets of two cores. Each core has an L0 and L1 cache. There are
two L2 controllers where each set accesses only one of the L2 controllers.
Nilay Vaish [Sat, 4 Jan 2014 06:03:33 +0000 (00:03 -0600)]
ruby: rename MESI_CMP_directory to MESI_Two_Level
This is because the next patch introduces a three level hierarchy.
--HG--
rename : build_opts/ALPHA_MESI_CMP_directory => build_opts/ALPHA_MESI_Two_Level
rename : build_opts/X86_MESI_CMP_directory => build_opts/X86_MESI_Two_Level
rename : configs/ruby/MESI_CMP_directory.py => configs/ruby/MESI_Two_Level.py
rename : src/mem/protocol/MESI_CMP_directory-L1cache.sm => src/mem/protocol/MESI_Two_Level-L1cache.sm
rename : src/mem/protocol/MESI_CMP_directory-L2cache.sm => src/mem/protocol/MESI_Two_Level-L2cache.sm
rename : src/mem/protocol/MESI_CMP_directory-dir.sm => src/mem/protocol/MESI_Two_Level-dir.sm
rename : src/mem/protocol/MESI_CMP_directory-dma.sm => src/mem/protocol/MESI_Two_Level-dma.sm
rename : src/mem/protocol/MESI_CMP_directory-msg.sm => src/mem/protocol/MESI_Two_Level-msg.sm
rename : src/mem/protocol/MESI_CMP_directory.slicc => src/mem/protocol/MESI_Two_Level.slicc
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/config.ini => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simerr => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/simout => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/simout
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_CMP_directory/system.pc.com_1.terminal => tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/system.pc.com_1.terminal
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/simout
rename : tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/00.hello/ref/alpha/linux/simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/config.ini => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simerr => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/simout => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/simout
rename : tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/00.hello/ref/alpha/tru64/simple-timing-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simerr => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/simout => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/simout
rename : tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/50.memtest/ref/alpha/linux/memtest-ruby-MESI_Two_Level/stats.txt
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/config.ini => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/config.ini
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/ruby.stats => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/ruby.stats
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simerr => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simerr
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/simout => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/simout
rename : tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_CMP_directory/stats.txt => tests/quick/se/60.rubytest/ref/alpha/linux/rubytest-ruby-MESI_Two_Level/stats.txt
Nilay Vaish [Sat, 4 Jan 2014 06:03:32 +0000 (00:03 -0600)]
ruby: remove cntrl_id from python config scripts.
Nilay Vaish [Sat, 4 Jan 2014 06:03:31 +0000 (00:03 -0600)]
ruby: add support for clusters
A cluster over here means a set of controllers that can be accessed only by a
certain set of cores. For example, consider a two level hierarchy. Assume
there are 4 L1 controllers (private) and 2 L2 controllers. We can have two
different hierarchies here:
a. the address space is partitioned between the two L2 controllers. Each L1
controller accesses both the L2 controllers. In this case, each L1 controller
is a cluster initself.
b. both the L2 controllers can cache any address. An L1 controller has access
to only one of the L2 controllers. In this case, each L2 controller
along with the L1 controllers that access it, form a cluster.
This patch allows for each controller to have a cluster ID, which is 0 by
default. By setting the cluster ID properly, one can instantiate hierarchies
with clusters. Note that the coherence protocol might have to be changed as
well.
Nilay Vaish [Sat, 4 Jan 2014 06:03:30 +0000 (00:03 -0600)]
ruby: some small changes
Steve Reinhardt [Sat, 4 Jan 2014 01:08:44 +0000 (17:08 -0800)]
config, x86: move kernel specification from tests to FSConfig.py
For some reason, the default x86 kernel is specified in
tests/configs/x86_generic.py and not in configs/common/FSConfig.py,
where the kernels for all the other ISAs are. This means that
running configs/example/fs.py for x86 fails because no kernel
is specified. Moving the specification over fixes this problem.
There is another problem that this uncovers, which is that going
past the init stage (i.e., past where the regression test stops)
fails because the fsck test on the disk device fails, but that's
a separate issue.
Steve Reinhardt [Sat, 4 Jan 2014 01:08:43 +0000 (17:08 -0800)]
python: provide better error message for wrapped C++ methods
If you successfully export a C++ SimObject method, but try to
invoke it from Python before the C++ object is created, you
get a confusing error that says the attribute does not exist,
making you question whether you successfully exported the
method at all. In reality, your only problem is that you're
calling the method too soon. This patch enhances the error
message to give you a better clue.
Steve Reinhardt [Sat, 4 Jan 2014 01:08:42 +0000 (17:08 -0800)]
python: don't die on assignment to cloned object
Updating the SimObject topology of a cloned hierarchy is a little
dangerous, in that cloning is a "deep copy" and the clone does not
inherit SimObject updates the same way it would inherit scalar
variable assignments.
However, because of various SimObject-valued proxy parameters,
like 'memories', 'clk_domain', and 'system', it turns out that
there are a number of implicit topology changes that happen at
instantiation, which means that these changes are impossible to
avoid. So in order to make cloning systems useful, this error
has to go. Changing it to a warning produces a lot of noise,
so it seems best just to delete it.
Christopher Torng [Mon, 30 Dec 2013 01:29:45 +0000 (19:29 -0600)]
sim: Add support for dynamic frequency scaling
This patch provides support for DFS by having ClockedObjects register
themselves with their clock domain at construction time in a member list.
Using this list, a clock domain can update each member's tick to the
curTick() before modifying the clock period.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Christopher Torng [Mon, 30 Dec 2013 01:29:45 +0000 (19:29 -0600)]
mips: Floating point convert bug fix
In mips architecture, floating point convert instructions use the
FloatConvertOp format defined in src/arch/mips/isa/formats/fp.isa. The type
of the operands in the ISA description file (_sw for signed word, or _sf for
signed float, etc.) is used to create a type for the operand in C++. Then the
operand is converted using the fpConvert() function in src/arch/mips/utility.cc.
If we are converting from a word to a float, and we want to convert 0xffffffff,
we expect -1 to be passed into fpConvert(). Instead, we see MAX_INT passed in.
Then fpConvert() converts _val_ to MAX_INT in single-precision floating point,
and we get the wrong value.
To fix it, the signs of the convert operands are being changed from unsigned to
signed in the MIPS ISA description.
Then, the FloatConvertOp format is being changed to insert a int32_t into the
C++ code instead of a uint32_t.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
Nilay Vaish [Thu, 26 Dec 2013 21:18:58 +0000 (15:18 -0600)]
stats: updates due to bug fixed in mesi coherence protocol
Nilay Vaish [Thu, 26 Dec 2013 21:18:55 +0000 (15:18 -0600)]
ruby: fix bugs in mesi cmp directory protocol
This patch fixes couple of bugs in the L2 controller of the mesi cmp
directory protocol.
1. The state MT_I was transitioning to NP on receiving a clean writeback
from the L1 controller. This patch makes it inform the directory controller
about the writeback.
2. The L2 controller was sending the dirty bit to the L1 controller and the
L2 controller used writeback from the L1 controller to update the dirty bit
unconditionally. Now, the L1 controller always assumes that the incoming
data is clean. The L2 controller updates the dirty bit only when the L1
controller writes to the block.
3. Certain unused functions and events are being removed.
Nilay Vaish [Sat, 21 Dec 2013 02:34:04 +0000 (20:34 -0600)]
ruby: slicc: replace max_in_port_rank with number of inports
This patch replaces max_in_port_rank with the number of inports. The use of
max_in_port_rank was causing spurious re-builds and incorrect initialization
of variables in ruby related regression tests. This was due to the variable
value being used across threads while compiling when it was not meant to be.
Since the number of inports is state machine specific value, this problem
should get solved.