Andreas Sandberg [Tue, 26 Aug 2014 14:14:07 +0000 (10:14 -0400)]
style: Fixup strange semantics in hg m5style
The 'hg m5style' command had some rather strange semantics. When
called without arguments, it applied the style checker to all added
files and modified regions of modified files. However, when providing
a list of files, it used that list as an ignore list instead of
specifically checking those files.
This patch makes the m5style command behave more like other Mercurial
commands where the arguments are used to specify which files to work
on instead of which files to ignore.
Andreas Sandberg [Tue, 26 Aug 2014 14:13:45 +0000 (10:13 -0400)]
base: Replace the internal varargs stuff with C++11 constructs
We currently use our own home-baked support for type-safe variadic
functions. This is confusing and somewhat limited (e.g., cprintf only
supports a limited number of arguments). This changeset converts all
uses of our internal varargs support to use C++11 variadic macros.
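One C++11 construct that provides type-safe variadic functions is the
variadic template. The sketch below is illustrative only: the names
appendAll and concat are hypothetical and this is not the cprintf
implementation.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void appendAll(std::ostringstream &) {}

// Recursive case: stream the first argument, then recurse on the rest.
template <typename T, typename... Rest>
void appendAll(std::ostringstream &os, const T &first, const Rest &... rest)
{
    os << first;
    appendAll(os, rest...);
}

// Hypothetical helper: concatenates any number of streamable arguments.
template <typename... Args>
std::string concat(const Args &... args)
{
    std::ostringstream os;
    appendAll(os, args...);
    return os.str();
}
```

Unlike a fixed set of overloads, the parameter pack places no built-in
limit on the number of arguments.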
Andreas Sandberg [Tue, 26 Aug 2014 14:13:33 +0000 (10:13 -0400)]
base: Add compiler macros for C++11 final/override
Add the macros M5_ATTR_FINAL and M5_ATTR_OVERRIDE which are defined to
final and override respectively if supported by the compiler. This is
done to allow a smooth transition to gcc >= 4.7.
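A minimal sketch of how such macros might be defined and used. The
feature test on __cplusplus is an assumption (the actual header may
detect compiler support differently), and Base/Derived are hypothetical
classes.

```cpp
// Assumed feature test; the real definition may key off compiler versions.
#if __cplusplus >= 201103L
#  define M5_ATTR_FINAL final
#  define M5_ATTR_OVERRIDE override
#else
#  define M5_ATTR_FINAL
#  define M5_ATTR_OVERRIDE
#endif

struct Base
{
    virtual ~Base() {}
    virtual int latency() const { return 1; }
};

struct Derived : public Base
{
    // Where override is supported, a signature mismatch with the base
    // class becomes a compile-time error; elsewhere the macros vanish.
    int latency() const M5_ATTR_OVERRIDE M5_ATTR_FINAL { return 2; }
};
```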
Mitch Hayenga [Tue, 26 Aug 2014 14:13:31 +0000 (10:13 -0400)]
mips: Fix RLIMIT_RSS naming
MIPS defined RLIMIT_RSS in a way that could cause a naming conflict with
RLIMIT_RSS from the host system. Broke clang+MacOS build.
Andreas Sandberg [Tue, 26 Aug 2014 14:13:28 +0000 (10:13 -0400)]
base: Add a static assert to check bit union ranges
If a bit field in a bit union is specified as Bitfield<LSB, MSB> instead
of Bitfield<MSB, LSB> the code silently fails and the field is read as
zero. This changeset introduces a static assert that tests, at compile
time, that the bit order is correct.
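The idea can be sketched as follows. BitfieldSketch is a simplified,
hypothetical stand-in and not the actual Bitfield class; only the
compile-time ordering check is the point here.

```cpp
#include <cstdint>

// Simplified stand-in for a bit union field, specified as <MSB, LSB>.
template <int first, int last>
struct BitfieldSketch
{
    // Reject <LSB, MSB> at compile time instead of silently reading zero.
    static_assert(first >= last,
                  "Bitfield ranges must be specified as <MSB, LSB>");

    // Extract bits [first:last] from a 64-bit word.
    static std::uint64_t extract(std::uint64_t word)
    {
        const int nbits = first - last + 1;
        const std::uint64_t mask = (nbits == 64)
            ? ~std::uint64_t(0)
            : ((std::uint64_t(1) << nbits) - 1);
        return (word >> last) & mask;
    }
};
```

With this check in place, writing BitfieldSketch<4, 7> fails to compile.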
Andreas Sandberg [Tue, 26 Aug 2014 14:13:23 +0000 (10:13 -0400)]
sparc: Fixup bit ordering in the PSTATE bit union
The order of the MSB and LSB bit of the mm field in the PSTATE union
is wrong. Any access to this field will currently be ignored and reads
will always return zero. This patch fixes the ordering so it is <MSB,
LSB> instead of <LSB, MSB>.
Andreas Hansson [Tue, 26 Aug 2014 14:13:03 +0000 (10:13 -0400)]
mem: Update DRAM controller comments
Update comments and add a reference for more information.
Andreas Hansson [Tue, 26 Aug 2014 14:12:45 +0000 (10:12 -0400)]
mem: Fix address interleaving bug in DRAM controller
This patch fixes a bug in the DRAM controller address decoding. In
cases where the DRAM burst size (e.g. 32 bytes in a rank with a single
LPDDR3 x32) was smaller than the channel interleaving size
(e.g. systems with a 64-byte cache line) one address bit effectively
got used as a channel bit when it should have been a low-order column
bit.
This patch adds a notion of "columns per stripe", and more clearly
deals with the low-order column bits and high-order column bits. The
patch also relaxes the granularity check such that it is possible to
use interleaving granularities other than the cache line size.
The patch also adds a missing M5_CLASS_VAR_USED to the tCK member as
it is only used in the debug build for now.
Curtis Dunham [Wed, 5 Feb 2014 22:17:41 +0000 (16:17 -0600)]
sim: bump checkpoint version for multiple event queues
This patch adds a fix for older checkpoints created before support for
multiple event queues was added in changeset 2cce74fe359e. The change
in checkpoint version should really have been part of the
aforementioned changeset.
Andreas Hansson [Tue, 26 Aug 2014 14:12:04 +0000 (10:12 -0400)]
misc: README direct to website for dependencies
This patch updates the README to direct the user to the appropriate
sections on the gem5.org website rather than duplicating information.
Dam Sunwoo [Wed, 13 Aug 2014 10:57:36 +0000 (06:57 -0400)]
arm: change MISCREG_L2ERRSR to warn not fail
Some newer binaries compiled for Versatile Express TC2 contain accesses
to implementation-specific L2MERRSR registers. This causes an infinite
loop of undefined exceptions. This patch changes the behavior to "warn
not fail" to keep the workloads going.
Dam Sunwoo [Wed, 13 Aug 2014 10:57:35 +0000 (06:57 -0400)]
sim: remove kernel mapping check for baremetal workloads
Baremetal workloads are specified using the "kernel" parameter, but
don't always have the correct address mappings. This patch adds a
boolean flag to the system and bypasses the kernel addr mapping checks
when running in baremetal mode.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:31 +0000 (06:57 -0400)]
scons: Build the branch predictor for all CPUs
The branch predictor is normally only built when a CPU that uses a
branch predictor is built. The list of CPUs is currently incomplete as
the simple CPUs support branch predictors (for warming, branch stats,
etc). In practice, all CPU models now use branch predictors, so this
changeset removes the CPU model check and replaces it with a check for
the NULL ISA.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:30 +0000 (06:57 -0400)]
mips: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members.
This changeset removes such members from the
MIPS-specific classes to silence the warning.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:29 +0000 (06:57 -0400)]
power: Remove unused private members to fix compile-time warning
Certain versions of clang complain about unused private members.
This changeset removes such members from the
POWER-specific ProcessInfo struct to silence the warning.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:28 +0000 (06:57 -0400)]
scons: Silence clang 3.4 warnings on Ubuntu 12.04
This changeset fixes three types of warnings that occur in clang 3.4
on Ubuntu 12.04:
* Certain versions of libstdc++ (primarily 4.8) use struct and class
interchangeably. This triggers a warning in clang.
* Swig has a tendency to generate code with the register class which
was deprecated in C++11. This triggers a deprecation warning in
clang.
* Swig sometimes generates Python wrapper code which returns
uninitialized values. It's unclear if this is actually a problem
(the cases might be limited to failure paths). We'll silence these
warnings for now since there is little we can do about the
generated code.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:27 +0000 (06:57 -0400)]
base: Remove unused M5_PRAGMA_NORETURN
The M5_PRAGMA_NORETURN macro was only used for
__exit_message. Since the macro only holds a stub definition and all
functions with noreturn semantics use M5_ATTR_NORETURN, this
macro is completely redundant.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:26 +0000 (06:57 -0400)]
cpu: Don't forward declare RefCountingPtr
RefCountingPtr is sometimes forward declared to avoid having to
include refcnt.hh. This does not work since we typically return
instances of RefCountingPtr rather than references to instances. The
only reason this currently works is that we include refcnt.hh in
cprintf.hh, which "leaks" the header to most other source files. This
changeset replaces such forward declarations with an include of
refcnt.hh.
Andreas Sandberg [Wed, 13 Aug 2014 10:57:25 +0000 (06:57 -0400)]
util: Fix state leakage in the SortIncludes style verifier
There are cases where the state of a SortIncludes object gets messed
up and leaks between invocations/files. This typically happens when a
file ends with an include block (dump_block() gets called at the end
of __call__). In this case, the state of the class is not reset
between files. This bug manifests itself as ghost includes that leak
between files when applying the style hooks.
This changeset adds a reset at the beginning of the __call__ method
which ensures that the class is always in a clean state when
processing a new file.
Mitch Hayenga [Wed, 13 Aug 2014 10:57:24 +0000 (06:57 -0400)]
mem: Properly set cache block status fields on writebacks
When a cacheline is written back to a lower-level cache,
tags->insertBlock() sets various status parameters. However these
status bits were cleared immediately after calling. This patch makes
it so that these status fields are not cleared by moving them outside
of the tags->insertBlock() call.
Andreas Hansson [Wed, 13 Aug 2014 10:57:21 +0000 (06:57 -0400)]
cpu: Modernise the branch predictor (STL and C++11)
This patch does some minor housekeeping of the branch predictor by
adopting STL containers, and shifting some iterator loops to range-based
for loops.
The predictor history is also changed from a list to a deque as we
never do insertion/deletion other than at the front and back.
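A minimal sketch of why a deque fits this access pattern. All names
here are hypothetical, and which end holds the youngest entry is an
assumption rather than the predictor's actual convention.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical history entry: a sequence number and the prediction made.
struct HistEntrySketch
{
    unsigned seqNum;
    bool predTaken;
};

struct PredHistorySketch
{
    std::deque<HistEntrySketch> hist;

    // New predictions are added at the back (youngest).
    void record(unsigned seqNum, bool taken)
    {
        assert(hist.empty() || hist.back().seqNum < seqNum);
        hist.push_back(HistEntrySketch{seqNum, taken});
    }

    // Committed entries leave from the front (oldest).
    void commitUpTo(unsigned seqNum)
    {
        while (!hist.empty() && hist.front().seqNum <= seqNum)
            hist.pop_front();
    }

    // Squashed entries leave from the back (youngest).
    void squashAfter(unsigned seqNum)
    {
        while (!hist.empty() && hist.back().seqNum > seqNum)
            hist.pop_back();
    }

    // Range-based for loop in place of an explicit iterator loop.
    std::size_t countTaken() const
    {
        std::size_t n = 0;
        for (const auto &e : hist)
            n += e.predTaken ? 1 : 0;
        return n;
    }
};
```

Every mutation touches only the front or the back, which a deque
handles in constant time without a per-node allocation.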
Curtis Dunham [Tue, 11 Mar 2014 14:50:02 +0000 (09:50 -0500)]
arm: remove dead code fplib mul64x64
Mitch Hayenga [Wed, 13 Aug 2014 10:57:19 +0000 (06:57 -0400)]
ext: clang fix for flexible array members
Changes how flexible array members are defined so clang does not error
out during compilation.
Radhika Jagtap [Sun, 10 Aug 2014 09:39:40 +0000 (05:39 -0400)]
config: Fix cache latency param in mem test
This patch fixes the cache latency in mem test which is split into two params,
hit and response latency as per BaseCache.
Radhika Jagtap [Sun, 10 Aug 2014 09:39:20 +0000 (05:39 -0400)]
util: Move packet trace file read to protolib
This patch moves the code for opening an input protobuf packet trace into
a function defined in the protobuf library. This is because the code is
commonly used in decode scripts and is independent of the src protobuf
message.
Geoffrey Blake [Sun, 10 Aug 2014 09:39:16 +0000 (05:39 -0400)]
config: Add SubSystem container for simobjects
This patch adds the SubSystem container for grouping
simobjects together in logical subsystems to facilitate
building a larger system from constituent parts. The container
is simply a non-abstract empty simobject to hold the components
that will be connected as its children. The object does not
participate in simulation; its only use is during configuration
of the system.
Geoffrey Blake [Sun, 10 Aug 2014 09:39:13 +0000 (05:39 -0400)]
config: Add hooks to enable new config sys
This patch adds helper functions to SimObject.py, params.py and
simulate.py to enable the new configuration system. Functions like
enumerateParams() in SimObject let the config system auto-generate
command line options so that simobjects can be modified on the command
line.
Params in params.py have __call__() added
to their definition to allow the argparse module to use them
as a type to check command input is in the proper format.
Andreas Hansson [Sun, 10 Aug 2014 09:39:04 +0000 (05:39 -0400)]
cpu: Ensure the traffic generator suppresses non-memory packets
This patch adds a check to ensure that packets which are not going to
a memory range are suppressed in the traffic generator. Thus, if a
trace is collected in full-system, the packets destined for devices
are not played back.
Andreas Hansson [Sun, 10 Aug 2014 09:38:59 +0000 (05:38 -0400)]
base: Remove unused files
A bit of pruning
Andreas Hansson [Sun, 10 Aug 2014 09:38:56 +0000 (05:38 -0400)]
scons: Warn for incompatible gcc and binutils
It seems gcc >4.8 does not get along well with binutils <= 2.22, and
to help users this patch adds a warning with an indication for how to
fix the issue. It might even be worth adding an Exit(-1) to stop the
build.
Anthony Gutierrez [Mon, 28 Jul 2014 16:23:23 +0000 (12:23 -0400)]
mem: refactor LRU cache tags and add random replacement tags
this patch implements a new tags class that uses a random replacement policy.
these tags prefer to evict invalid blocks first, if none are available a
replacement candidate is chosen at random.
this patch factors out the common code in the LRU class and creates a new
abstract class: the BaseSetAssoc class. any set associative tag class must
implement the functionality related to the actual replacement policy in the
following methods:
accessBlock()
findVictim()
insertBlock()
invalidate()
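The victim selection described above (prefer an invalid block,
otherwise choose at random) can be sketched as follows. BlkSketch and
findVictimSketch are hypothetical names operating on a single set, not
the actual tag class interface.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical, simplified cache block: only a valid bit.
struct BlkSketch
{
    bool valid;
};

// Pick a victim way within one set: the first invalid block if any
// exists, otherwise a uniformly random way.
template <typename Rng>
std::size_t findVictimSketch(const std::vector<BlkSketch> &set, Rng &rng)
{
    assert(!set.empty());
    for (std::size_t way = 0; way < set.size(); ++way) {
        if (!set[way].valid)
            return way;
    }
    std::uniform_int_distribution<std::size_t> pick(0, set.size() - 1);
    return pick(rng);
}
```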
Anthony Gutierrez [Mon, 28 Jul 2014 16:22:00 +0000 (12:22 -0400)]
arm: make the PseudoLRU tags the default for the O3_ARM_v7aL2
the Cortex-A15 has a random replacement policy for its L2 cache. see the
Cortex-A15 Technical Reference Manual 1.7 About the L2 memory system. this
patch makes the PseudoLRU tags the default for the ARM O3 CPU's L2 cache.
Andreas Hansson [Mon, 28 Jul 2014 05:48:21 +0000 (01:48 -0400)]
stats: Bump stats for the regressions using the minor CPU
Updating the stats to match the current behaviour.
Andrew Bardsley [Wed, 23 Jul 2014 21:09:05 +0000 (16:09 -0500)]
cpu: Minor CPU add regression tests for ARM and ALPHA
This patch adds regression tests results and test harnesses
for the Minor CPU on ARM and ALPHA.
Andrew Bardsley [Wed, 23 Jul 2014 21:09:04 +0000 (16:09 -0500)]
cpu: `Minor' in-order CPU model
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark      |   Stat host_seconds (s)
---------------+--------v--------v--------
 (on ARM, opt) | simple | o3     | minor
               | timing | timing | timing
---------------+--------+--------+--------
10.linux-boot  |    169 |   1883 |   1075
10.mcf         |    117 |    967 |    491
20.parser      |    668 |   6315 |   3146
30.eon         |    542 |   3413 |   2414
40.perlbmk     |   2339 |  20905 |  11532
50.vortex      |    122 |   1094 |    588
60.bzip2       |   2045 |  18061 |   9662
70.twolf       |    207 |   2736 |   1036
Steve Reinhardt [Sun, 20 Jul 2014 02:04:58 +0000 (19:04 -0700)]
stats: update for syscall DPRINTF change
Only printing one rather than two args for the ignored syscall
warning means the count of register accesses has changed on
a few runs. Oddly only Alpha Tru64 seems to have any ignored
syscalls in the regression tests.
Steve Reinhardt [Sat, 19 Jul 2014 09:06:22 +0000 (02:06 -0700)]
syscall emulation: fix fast build issue
Surprisingly gcc will complain about unused variables even
inside an 'if (false)' block.
I thought I had tested this previously, but apparently not.
Binh Pham [Sat, 19 Jul 2014 05:05:51 +0000 (22:05 -0700)]
x86: make PioBus return BadAddress errors
Stop setting the use_default_range flag in PioBus in order to
have random bad addresses result in a BadAddress response and
not a gem5 fatal error. This is necessary in Ruby as Ruby is
connected directly to PioBus, so misspeculated addresses will
be sent there directly. For the classic memory system, this
change has no effect, as bad addresses are caught by the
memory bus before being sent to the PioBus.
This work was done while Binh was an intern at AMD Research.
Steve Reinhardt [Sat, 19 Jul 2014 05:05:51 +0000 (22:05 -0700)]
sim: remove unused MemoryModeStrings array
The System object has a static MemoryModeStrings array
that's (1) unused and (2) redundant, since there's an
auto-generated version in the Enums namespace. No
point in leaving it in.
Steve Reinhardt [Sat, 19 Jul 2014 05:05:51 +0000 (22:05 -0700)]
kern: get rid of unused linux syscall files
Steve Reinhardt [Sat, 19 Jul 2014 05:05:51 +0000 (22:05 -0700)]
syscall emulation: fix DPRINTF arg ordering bug
When we switched getSyscallArg() from explicit arg indices to
the implicit method, some DPRINTF arguments were left as calls
to getSyscallArg(), even though C/C++ doesn't guarantee
anything about the order of invocation of these calls. As a
result, the args could be printed out in arbitrary orders.
Interestingly, this bug has been around since 2009:
http://repo.gem5.org/gem5/rev/4842482e1bd1
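The hazard and one way around it can be sketched as follows. ArgReader
and readTwoInOrder are hypothetical stand-ins for an implicit-index
argument reader, not the syscall emulation code itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an implicit-index argument reader.
struct ArgReader
{
    std::vector<int> args;
    std::size_t next;

    explicit ArgReader(const std::vector<int> &a) : args(a), next(0) {}

    // Each call returns the next argument and advances the index.
    int getArg() { return args.at(next++); }
};

// Fragile: the two getArg() calls may be evaluated in either order,
// because function arguments have no guaranteed evaluation order.
//     report(r.getArg(), r.getArg());
//
// Robust: separate statements are sequenced, so a is always argument 0
// and b is always argument 1.
inline std::vector<int> readTwoInOrder(ArgReader &r)
{
    const int a = r.getArg();
    const int b = r.getArg();
    return std::vector<int>{a, b};
}
```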
Anthony Gutierrez [Wed, 9 Jul 2014 13:28:15 +0000 (09:28 -0400)]
base: fix operator== for comparing EthAddr objects
this operator uses memcmp() to detect if two EthAddr objects have the same
address; however, memcmp() returns 0 if all bytes are equal. operator==
returns the return value of memcmp() to indicate whether or not two
addresses are equal. this is incorrect as it will always give the opposite of
the intended behavior. this patch fixes that problem.
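A minimal sketch of the corrected comparison. MacAddr is a simplified,
hypothetical stand-in and not the EthAddr class.

```cpp
#include <cstdint>
#include <cstring>

// Simplified stand-in for a 6-byte Ethernet address.
struct MacAddr
{
    std::uint8_t data[6];
};

// memcmp() returns 0 when the buffers match, so compare the result
// against 0 rather than returning it directly as a bool.
inline bool operator==(const MacAddr &l, const MacAddr &r)
{
    return std::memcmp(l.data, r.data, sizeof(l.data)) == 0;
}
```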
Anthony Gutierrez [Wed, 2 Jul 2014 17:19:13 +0000 (13:19 -0400)]
base: fix some bugs in EthAddr
per the IEEE 802 spec:
1) fixed broadcast() to ensure that all bytes are equal to 0xff.
2) fixed unicast() to ensure that bit 0 of the first byte is equal to 0
3) fixed multicast() to ensure that bit 0 of the first byte is equal to 1, and
that it is not a broadcast.
also the constructors in EthAddr are fixed so that all bytes of data are
initialized.
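The three predicates listed above can be sketched as follows.
MacAddrSketch and the free functions are hypothetical names, not the
EthAddr member functions.

```cpp
#include <cstdint>

// Simplified stand-in for a 6-byte Ethernet address.
struct MacAddrSketch
{
    std::uint8_t data[6];
};

// Broadcast: every byte is 0xff.
inline bool isBroadcast(const MacAddrSketch &a)
{
    for (int i = 0; i < 6; ++i) {
        if (a.data[i] != 0xff)
            return false;
    }
    return true;
}

// Unicast: bit 0 of the first byte is 0.
inline bool isUnicast(const MacAddrSketch &a)
{
    return (a.data[0] & 0x01) == 0;
}

// Multicast: bit 0 of the first byte is 1, and not a broadcast.
inline bool isMulticast(const MacAddrSketch &a)
{
    return (a.data[0] & 0x01) == 1 && !isBroadcast(a);
}
```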
Radhika Jagtap [Tue, 1 Jul 2014 15:58:22 +0000 (11:58 -0400)]
util: Add DVFS perfLevel to checkpoint upgrade script
This patch updates the checkpoint upgrader script. It adds the _perfLevel
variable in the clock domain and voltage domain simObjects used for DVFS.
Stephan Diestelhorst [Mon, 30 Jun 2014 17:56:06 +0000 (13:56 -0400)]
power: Add basic DVFS support for gem5
Adds DVFS capabilities to gem5, by allowing users to specify lists for
frequencies and voltages in SrcClockDomains and VoltageDomains respectively.
A separate component, DVFSHandler, provides a small interface to change
operating points of the associated domains.
Clock domains will be linked to voltage domains and thus allow separate clock,
but shared voltage lines.
Currently all the valid performance-level updates are performed with a fixed
transition latency as specified for the domain.
Config file example:
...
vd = VoltageDomain(voltage = ['1V','0.95V','0.90V','0.85V'])
tsys.cluster1.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster2.clk_domain.clock = ['1GHz','700MHz','400MHz','230MHz']
tsys.cluster1.clk_domain.domain_id = 0
tsys.cluster2.clk_domain.domain_id = 1
tsys.cluster1.clk_domain.voltage_domain = vd
tsys.cluster2.clk_domain.voltage_domain = vd
tsys.dvfs_handler.domains = [tsys.cluster1.clk_domain,
tsys.cluster2.clk_domain]
tsys.dvfs_handler.enable = True
Andreas Hansson [Mon, 30 Jun 2014 17:56:04 +0000 (13:56 -0400)]
mem: DRAMPower trace formatting script
This patch adds a first version of a script that processes the debug
output and generates a command trace for DRAMPower. This is work in
progress and is intended as a snapshot of ongoing work at this point.
The longer term plan is to link in DRAMPower as a library and have one
instance of the model per rank, and instantiate it based on a struct
passed from gem5. Each command will then be a call to the model and no
parsing of traces will be necessary.
Andreas Hansson [Mon, 30 Jun 2014 17:56:03 +0000 (13:56 -0400)]
mem: DRAMPower trace output
This patch adds a DRAMPower flag to enable off-line DRAM power
analysis using the DRAMPower tool. A new DRAMPower flag is added
and a follow-on patch adds a Python script to post-process the output
and order it based on time stamps.
The long-term goal is to link DRAMPower as a library and provide the
commands through function calls to the model rather than first
printing and then parsing the commands. At the moment it is also up to
the user to ensure that the same DRAM configuration is used by the
gem5 controller model and DRAMPower.
Andreas Hansson [Mon, 30 Jun 2014 17:56:02 +0000 (13:56 -0400)]
mem: Add bank and rank indices as fields to the DRAM bank
This patch adds the index of the bank and rank as a field so that we can
determine the identity of a given bank (reference or pointer) for the
power tracing. We also grab the opportunity of cleaning up the
arguments used for identifying the bank when activating.
Andreas Hansson [Mon, 30 Jun 2014 17:56:01 +0000 (13:56 -0400)]
mem: Extend DRAM row bits from 16 to 32 for larger densities
This patch extends the DRAM row bits to 32 to support larger density
memories. Additional checks are also added to ensure the row fits in
the 32 bits.
Anthony Gutierrez [Mon, 30 Jun 2014 17:50:03 +0000 (13:50 -0400)]
cpu: implement a bi-mode branch predictor
Anthony Gutierrez [Mon, 30 Jun 2014 17:50:01 +0000 (13:50 -0400)]
arm: make the bi-mode predictor the default for O3_ARM_v7a_BP
the branch predictor used in the Cortex-A15 is a bi-mode style predictor,
see:
http://arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf
and
http://nvidia.com/docs/IO/116757/NVIDIA_Quad_a15_whitepaper_FINALv2.pdf
this patch makes the bi-mode predictor the default for the ARM O3 CPU.
Steve Reinhardt [Sun, 22 Jun 2014 21:33:09 +0000 (14:33 -0700)]
stats: update for O3 changes
Mostly small differences in total ticks, but O3 stall causes
shifted significantly.
30.eon does speed up by ~6% on Alpha and ARM, and 50.vortex
by 4.5% on ARM. At the other extreme, X86 70.twolf is 0.8%
slower.
Binh Pham [Sat, 21 Jun 2014 17:39:44 +0000 (10:39 -0700)]
x86: fix table walker assertion
In a single cycle, we could see R and W requests corresponding to the same
page walk being sent to memory. During the cycle in which the assertion
fires, we have two responses corresponding to the R and W above. We
also have a 'read' variable to keep track of the inflight read
request; it gets reset to NULL right after we send out any R
request, and gets set to the next R in the page walk when a response
comes back.
The issue we are seeing here is that when we get a response for the W
request, assert(!read) fires because we got a response for the R request
right before this, and hence set 'read' to a non-NULL value pointing to
the next R request in the page walk.
This work was done while Binh was an intern at AMD Research.
Binh Pham [Sat, 21 Jun 2014 17:26:55 +0000 (10:26 -0700)]
o3: make dispatch LSQ full check more selective
Dispatch should not check LSQ size/LSQ stall for non load/store
instructions.
This work was done while Binh was an intern at AMD Research.
Binh Pham [Sat, 21 Jun 2014 17:26:43 +0000 (10:26 -0700)]
o3: split load & store queue full cases in rename
Check for free entries in Load Queue and Store Queue separately to
avoid cases when load cannot be renamed due to full Store Queue and
vice versa.
This work was done while Binh was an intern at AMD Research.
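The split check can be sketched as follows. LsqFreeSketch and
canRenameSketch are hypothetical names, and the real rename stage
tracks more state than two counters.

```cpp
// Hypothetical free-entry counts for the load queue and store queue.
struct LsqFreeSketch
{
    unsigned freeLQEntries;
    unsigned freeSQEntries;
};

// A load only needs a load-queue entry and a store only needs a
// store-queue entry; other instructions need neither.
inline bool canRenameSketch(const LsqFreeSketch &q, bool isLoad, bool isStore)
{
    if (isLoad && q.freeLQEntries == 0)
        return false;
    if (isStore && q.freeSQEntries == 0)
        return false;
    return true;
}
```

With a full store queue and a non-empty load queue, a load can still
be renamed, which a single combined check would have blocked.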
Andreas Hansson [Tue, 10 Jun 2014 21:44:39 +0000 (17:44 -0400)]
scons: Bump the compiler version to gcc 4.6 and clang 3.0
This patch bumps the supported version of gcc from 4.4 to 4.6, and
clang from 2.9 to 3.0. This enables, amongst other things, range-based
for loops, lambda expressions, etc. The STL implementation shipping
with 4.6 also has a fully functional implementation of unique_ptr and
shared_ptr.
Joel Hestness [Tue, 10 Jun 2014 03:01:18 +0000 (22:01 -0500)]
Util: Do not style check symlinks
The style checker used to traverse symlinks if they pointed to files, which can
result in style checker failure if the pointed-to file doesn't exist. This
style check is actually unnecessary, since symlinks either point to other files
that are already style checked, or files outside gem5, which shouldn't be
checked. Skip symlinks.
Joel Hestness [Tue, 10 Jun 2014 03:01:16 +0000 (22:01 -0500)]
sim: More rigorous clocking comments
The language describing the clockEdge and nextCycle functions was ambiguous,
and so was prone to misinterpretation/misuse. Clear up the comments to more
rigorously describe their functionality.
Yasuko Eckert [Wed, 4 Jun 2014 14:48:20 +0000 (07:48 -0700)]
ext: Add a McPAT regression tester
Add a regression tester to McPAT. Joel Hestness wrote these tests and Yasuko
Eckert modified them to reflect the new McPAT interface and other changes
the previous patch made.
Yasuko Eckert [Tue, 3 Jun 2014 20:32:59 +0000 (13:32 -0700)]
ext: McPAT interface changes and fixes
This patch includes software engineering changes and some generic bug fixes
Joel Hestness and Yasuko Eckert made to McPAT 0.8. There are still known
issues/concerns we did not have a chance to address in this patch.
High-level changes in this patch include:
1) Making XML parsing modular and hierarchical:
- Shift parsing responsibility into the components
- Read XML in a (mostly) context-free recursive manner so that McPAT input
files can contain arbitrary component hierarchies
2) Making power, energy, and area calculations a hierarchical and recursive
process
- Components track their subcomponents and recursively call compute
functions in stages
- Make C++ object hierarchy reflect inheritance of classes of components
with similar structures
- Simplify computeArea() and computeEnergy() functions to eliminate
successive calls to calculate separate TDP vs. runtime energy
- Remove Processor component (now unnecessary) and introduce a more abstract
System component
3) Standardizing McPAT output across all components
- Use a single, common data structure for storing and printing McPAT output
- Recursively call print functions through component hierarchy
4) For caches, allow splitting data array and tag array reads and writes for
better accuracy
5) Improving the usability of CACTI by printing more helpful warning and error
messages
6) Minor: Impose more rigorous code style for clarity (more work still to be
done)
Overall, these changes greatly reduce the amount of replicated code, and they
improve McPAT runtime and decrease memory footprint.
Yasuko Eckert [Tue, 3 Jun 2014 20:32:53 +0000 (13:32 -0700)]
ext: change McPAT to not force compile in 32-bit mode.
Yasuko Eckert [Tue, 3 Jun 2014 20:32:29 +0000 (13:32 -0700)]
ext: Redirect McPAT object files
All object files and McPAT binaries are moved to directory gem5/build/mcpat/
rather than creating them locally.
Steve Reinhardt [Sun, 1 Jun 2014 01:00:23 +0000 (18:00 -0700)]
style: eliminate equality tests with true and false
Using '== true' in a boolean expression is totally redundant,
and using '== false' is pretty verbose (and arguably less
readable in most cases) compared to '!'.
It's somewhat of a pet peeve, perhaps, but I had some time
waiting for some tests to run and decided to clean these up.
Unfortunately, SLICC appears not to have the '!' operator,
so I had to leave the '== false' tests in the SLICC code.
Nilay Vaish [Sun, 25 May 2014 02:30:46 +0000 (21:30 -0500)]
stats: changes due to recent o3 patch.
Nilay Vaish [Fri, 23 May 2014 11:07:02 +0000 (06:07 -0500)]
stats: changes due to o3 cpu and ruby message buffer patches
Nilay Vaish [Fri, 23 May 2014 11:07:02 +0000 (06:07 -0500)]
ruby: slicc: remove unused ids DNUCA*
Nilay Vaish [Fri, 23 May 2014 11:07:02 +0000 (06:07 -0500)]
ruby: remove old protocol documentation
Nilay Vaish [Fri, 23 May 2014 11:07:02 +0000 (06:07 -0500)]
ruby: message buffer: drop dequeue_getDelayCycles()
The functionality of updating and returning the delay cycles would now be
performed by the dequeue() function itself.
Nilay Vaish [Fri, 23 May 2014 11:07:02 +0000 (06:07 -0500)]
cpu: o3: remove stat totalCommittedInsts
This patch removes the stat totalCommittedInsts. This variable was used for
recording the total number of instructions committed across all the threads
of a core. The instructions committed by each thread are recorded individually.
The total would now be generated by summing these individual counts.
Anthony Gutierrez [Thu, 15 May 2014 17:26:31 +0000 (13:26 -0400)]
config: remove unnecessary assignment of etherlink interfaces
in makeDualRoot() the etherlink interfaces are set using the tsunami interface;
however, they are set again a few lines later based on whether the system
is a realview or tsunami system, so the original assignment is always overwritten
or there will be a fatal. this seems like an artifact from when tsunami was the
only type of system capable of running with the dual option.
Steve Reinhardt [Mon, 12 May 2014 21:23:31 +0000 (14:23 -0700)]
syscall emulation: clean up & comment SyscallReturn
Steve Reinhardt [Mon, 12 May 2014 21:22:17 +0000 (17:22 -0400)]
tests: update t1000 & pc-switcheroo-full stats
committed reference config.json files too
Steve Reinhardt [Sun, 11 May 2014 02:13:51 +0000 (22:13 -0400)]
tests: update eio ref outputs for new stats
Also committed reference config.json files for
the eio tests.
Andreas Hansson [Fri, 9 May 2014 22:58:50 +0000 (18:58 -0400)]
stats: Bump stats for the fixes, and mostly DRAM controller changes
Andreas Hansson [Fri, 9 May 2014 22:58:49 +0000 (18:58 -0400)]
config: Bump DRAM sweep bus speed to match DDR4 config
This patch bumps the bus clock speed such that the interconnect does
not become a bottleneck with a DDR4-2400-x64 DRAM delivering 19.2
GByte/s theoretical max.
Andreas Hansson [Fri, 9 May 2014 22:58:49 +0000 (18:58 -0400)]
tests: Reflect name change in DRAM tests
This patch reflects the recent name change in the DRAM TrafficGen
tests and also tidies up the test directory.
--HG--
rename : tests/configs/tgen-simple-dram.py => tests/configs/tgen-dram-ctrl.py
rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/config.ini => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/config.ini
rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/simerr => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/simerr
rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/simout => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/simout
rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/stats.txt => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/stats.txt
rename : tests/quick/se/70.tgen/tgen-simple-dram.cfg => tests/quick/se/70.tgen/tgen-dram-ctrl.cfg
Andreas Hansson [Fri, 9 May 2014 22:58:49 +0000 (18:58 -0400)]
mem: Update DDR3 and DDR4 based on datasheets
This patch makes a more firm connection between the DDR3-1600
configuration and the corresponding datasheet, and also adds a
DDR3-2133 and a DDR4-2400 configuration. At the moment there is also
an ongoing effort to align the choice of datasheets to what is
available in DRAMPower.
Andreas Hansson [Fri, 9 May 2014 22:58:49 +0000 (18:58 -0400)]
mem: Add DRAM cycle time
This patch extends the current timing parameters with the DRAM cycle
time. This is needed as the DRAMPower tool expects timestamps in DRAM
cycles. At the moment we could get away with doing this in a
post-processing step as the DRAMPower execution is separate from the
simulation run. However, in the long run we want the tool to be called
during the simulation, and then the cycle time is needed.
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Simplify DRAM response scheduling
This patch simplifies the DRAM response scheduling based on the
assumption that they are always returned in order.
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Add precharge all (PREA) to the DRAM controller
This patch adds the basic ingredients for a precharge all operation,
to be used in conjunction with DRAM power modelling.
Currently we do not try and apply any cleverness when precharging all
banks, thus even if only a single bank is open we use PREA as opposed
to PRE. At the moment we only have a single tRP (tRPpb), and do not
model the slightly longer all-bank precharge constraint (tRPab).
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Remove printing of DRAM params
This patch removes the redundant printing of DRAM params.
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Add tRTP to the DRAM controller
This patch adds the tRTP timing constraint, governing the minimum time
between a read command and a precharge. Default values are provided
for the existing DRAM types.
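A minimal sketch of how such a constraint might be applied, assuming
each bank tracks the earliest tick at which a precharge is allowed.
The function and parameter names are hypothetical and not taken from
the controller code.

```cpp
#include <algorithm>
#include <cstdint>

typedef std::uint64_t Tick;

// Earliest tick at which the bank may be precharged after a read
// issued at readAt, keeping any later constraint already recorded
// in preAllowedAt.
inline Tick preAllowedAfterRead(Tick preAllowedAt, Tick readAt, Tick tRTP)
{
    return std::max(preAllowedAt, readAt + tRTP);
}
```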
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Merge DRAM latency calculation and bank state update
This patch merges the two control paths used to estimate the latency
and update the bank state. As a result of this merging the computation
is now in one place only, and should be easier to follow as it is all
done in absolute (rather than relative) time.
As part of this change, the scheduling is also refined to ensure that
we look at a sensible estimate of the bank ready time in choosing the
next request. The bank latency stat is removed as it ends up being
misleading when the DRAM access code gets evaluated ahead of time (due
to the eagerness of waking the model up for scheduling the next
request).
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Add tWR to DRAM activate and precharge constraints
This patch adds the write recovery time to the DRAM timing
constraints, and changes the current tRASDoneAt to a more generic
preAllowedAt, capturing when a precharge is allowed to take place.
The part of the DRAM access code that accounts for the precharge and
activate constraints is updated accordingly.
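A sketch of how a single per-bank "precharge allowed at" time can absorb both the row-active and write-recovery constraints. The class and method names are hypothetical, and the sketch assumes tWR is counted from the end of the write data.

```python
class Bank:
    """Tracks the earliest time a precharge may be issued to one bank."""

    def __init__(self):
        self.pre_allowed_at = 0

    def activate(self, act_at, t_ras):
        # A row must stay open for at least tRAS after the activate.
        self.pre_allowed_at = max(self.pre_allowed_at, act_at + t_ras)

    def write(self, data_done_at, t_wr):
        # Written data needs tWR to settle before the row can be closed.
        self.pre_allowed_at = max(self.pre_allowed_at, data_done_at + t_wr)
```

Because both updates take the maximum, the most restrictive constraint wins regardless of the order in which the commands are recorded.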
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Merge DRAM page-management calculations
This patch treats the closed page policy as yet another case of
auto-precharging, and thus merges the code with that used for the
other policies.
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Add DRAM power states to the controller
This patch adds power states to the controller. These states and the
transitions can be used together with the Micron power model. As a
more elaborate use-case, the transitions can be used to drive the
DRAMPower tool.
At the moment, the power-down modes are not used, and this patch
simply serves to capture the idle, auto refresh and active modes. The
patch adds a third state machine that interacts with the refresh state
machine.
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Ensure DRAM refresh respects timings
This patch adds a state machine for the refresh scheduling to
ensure that no accesses are allowed while the refresh is in progress,
and that all banks are properly precharged.
As part of this change, the precharging of banks is broken out into a
method of its own, making it similar to how activations are dealt
with. The idle accounting is also updated to ensure that the refresh
duration is not added to the time that the DRAM is in the idle state
with all banks precharged.
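One way such a refresh state machine could look is sketched below. The state names, the class and its fields are assumptions for illustration and are not taken from the patch; the point is only that accesses are blocked from the moment a refresh is due until it completes, and that all banks are closed before the refresh runs.

```python
from enum import Enum, auto


class RefreshState(Enum):
    IDLE = auto()       # normal operation, accesses allowed
    DRAIN = auto()      # refresh due, waiting for in-flight accesses
    PRECHARGE = auto()  # closing any banks that are still open
    RUN = auto()        # refresh in progress


class RefreshController:
    def __init__(self):
        self.state = RefreshState.IDLE
        self.inflight = 0        # accesses issued but not yet completed
        self.open_banks = set()  # banks with an open row

    def accesses_allowed(self):
        return self.state is RefreshState.IDLE

    def refresh_due(self):
        if self.state is RefreshState.IDLE:
            self.state = RefreshState.DRAIN

    def step(self):
        """Advance by at most one state transition."""
        if self.state is RefreshState.DRAIN and self.inflight == 0:
            self.state = RefreshState.PRECHARGE
        elif self.state is RefreshState.PRECHARGE:
            self.open_banks.clear()  # precharge every open bank
            self.state = RefreshState.RUN
        elif self.state is RefreshState.RUN:
            self.state = RefreshState.IDLE  # refresh finished
```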
Andreas Hansson [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
mem: Make DRAM read/write switching less conservative
This patch changes the read/write event loop to use a single event
(nextReqEvent), along with a state variable, thus joining the two
control flows. This change makes it easier to follow the state
transitions, and control what happens when.
With the new loop we modify the overly conservative switching times
such that the write-to-read switch allows bank preparation to happen
in parallel with the bus turn around. Similarly, the read-to-write
switch uses the introduced tRTW constraint.
Ali Saidi [Thu, 17 Apr 2014 21:56:09 +0000 (16:56 -0500)]
arm: Make sure UndefinedInstructions are properly initialized
Ali Saidi [Thu, 17 Apr 2014 21:55:54 +0000 (16:55 -0500)]
arm: allow DC instructions by default so SE mode works
Ali Saidi [Thu, 17 Apr 2014 21:55:05 +0000 (16:55 -0500)]
sim, arm: implement more of the at variety syscalls
Needed for new AArch64 binaries
Andrew Bardsley [Fri, 9 May 2014 22:58:48 +0000 (18:58 -0400)]
cpu: Useful getters for ActivityRecorder
Add some useful getters to ActivityRecorder
Andrew Bardsley [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
cpu: Add flag name printing to StaticInst
This patch adds the member function StaticInst::printFlags to allow all
of an instruction's flags to be printed without using the individual
is... member functions or resorting to exposing the 'flags' vector.
It also replaces the enum definition StaticInst::Flags with a
Python-generated enumeration and adds to the enum generation mechanism
in src/python/m5/params.py to allow Enums to be placed in namespaces
other than Enums or, alternatively, in wrapper structs allowing them to
be inherited by other classes (so populating that class's name-space
with the enumeration element names).
Andrew Bardsley [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
cpu: Timebuf const accessors
Add const accessors for timebuf elements.
Andrew Bardsley [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
arm: Add branch flags onto macroops
Set branch flags on macroops to allow branch prediction before
microop decomposition.
Andrew Bardsley [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
cpu: Allow setWhen on trace objects
Allow setting of 'when' in trace records. This allows times later
than the arbitrary record creation point to be used as inst. times.
Curtis Dunham [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
arm: add preliminary ISA splits for ARM arch
Curtis Dunham [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
arch: teach ISA parser how to split code across files
This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.
The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.
Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.
Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.
Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
Geoffrey Blake [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
config: Avoid generating a reference to myself for Parent.any
The unproxy code for Parent.any can generate a circular reference
in certain situations with class hierarchies like those in ClockDomain.py.
This patch solves this by marking ourself as visited to make sure the
search does not resolve to a self-reference.
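The idea can be sketched on a toy object tree: the object doing the lookup marks itself as visited before walking up its ancestors, so a search that descends back into its own subtree cannot return it. Everything below (class, attribute and function names) is hypothetical and only mirrors the technique, not the real unproxy code.

```python
class Obj:
    def __init__(self, name, kind, parent=None):
        self.name = name
        self.kind = kind
        self.parent = parent
        self.children = []
        self.visited = False
        if parent is not None:
            parent.children.append(self)

    def find_any(self, kind):
        """Search this object and its children, skipping visited ones."""
        if self.visited:
            return None
        if self.kind == kind:
            return self
        for child in self.children:
            found = child.find_any(kind)
            if found is not None:
                return found
        return None


def resolve_parent_any(base, kind):
    """Walk up from base, never resolving to base itself."""
    base.visited = True  # guard against a self-reference
    try:
        obj = base.parent
        while obj is not None:
            found = obj.find_any(kind)
            if found is not None:
                return found
            obj = obj.parent
        return None
    finally:
        base.visited = False  # leave the tree unmarked for later lookups


# Two sibling clock domains; the first one asks for "any" clock domain.
system = Obj("system", "System")
cpu_clk = Obj("cpu_clk", "ClockDomain", system)
sys_clk = Obj("sys_clk", "ClockDomain", system)
resolved = resolve_parent_any(cpu_clk, "ClockDomain")
```

Without the visited mark, the search from `system` would reach `cpu_clk` first and hand the object a reference to itself.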
Geoffrey Blake [Fri, 9 May 2014 22:58:47 +0000 (18:58 -0400)]
arch, arm: Preserve TLB bootUncacheability when switching CPUs
The ARM TLBs have a bootUncacheability flag used to make some loads
and stores become uncacheable when booting in FS mode. Later the
flag is cleared to let those loads and stores operate as normal. When
doing a takeOverFrom(), this flag's state is not preserved and is
momentarily reset until the CPSR is touched. On single core runs this
is a non-issue. On multi-core runs this can lead to crashes on the O3
CPU model from the following series of events:
1) takeOverFrom executed to switch from Atomic -> O3
2) All bootUncacheability flags are reset to true
3) Core2 tries to execute a load covered by bootUncacheability, it
is flagged as uncacheable
4) Core2's load needs to replay due to a pipeline flush
5) Core1 performs an action on the CPSR
6) The handling code for CPSR then checks all other cores
to determine if bootUncacheability can be set to false
7) Asynchronously set bootUncacheability on all cores to false
8) Core2 replays the load previously set as uncacheable and notices
it is now flagged as cacheable, which leads to a panic.
This patch implements takeOverFrom() functionality for the ARM TLBs
to preserve flag values when switching from atomic -> detailed.
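In outline, the fix amounts to copying the flag from the outgoing TLB instead of leaving the incoming one at its reset value. The class and method names in this sketch are hypothetical stand-ins, not the actual TLB interface.

```python
class Tlb:
    def __init__(self):
        # Reset value: treat the covered loads and stores as uncacheable.
        self.boot_uncacheability = True

    def take_over_from(self, old_tlb):
        # Preserve the outgoing TLB's flag rather than keeping the reset
        # value, so the flag does not briefly revert on a CPU switch.
        self.boot_uncacheability = old_tlb.boot_uncacheability


# The atomic CPU's TLB has already cleared the flag during boot.
atomic_tlb = Tlb()
atomic_tlb.boot_uncacheability = False

detailed_tlb = Tlb()
detailed_tlb.take_over_from(atomic_tlb)
```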