From f843aabdd42051abc8b22437d5c9167fc867ac46 Mon Sep 17 00:00:00 2001
From: =?utf8?q?Kristian=20H=C3=B8gsberg=20Kristensen?=
 <kristian.h.kristensen@intel.com>
Date: Mon, 22 Feb 2016 09:14:25 -0800
Subject: [PATCH] intel/genxml: Add README

I've had people ask about the design of the pack functions, for example,
why aren't we using bitfields. I wrote up a bit of background on why and
how we ended up with the current design and we might as well keep that
with the code.
---
 src/intel/genxml/README | 60 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 src/intel/genxml/README

diff --git a/src/intel/genxml/README b/src/intel/genxml/README
new file mode 100644
index 00000000000..bc518c60bad
--- /dev/null
+++ b/src/intel/genxml/README
@@ -0,0 +1,60 @@
+This provides some background the design of the generated headers.  We
+started out trying to generate bit fields but it evolved into the pack
+functions because of a few limitations:
+
+  1) Bit fields still generate terrible code today. Even with modern
+     optimizing compilers you get multiple load+mask+store operations
+     to the same dword in memory as you set individual bits. The
+     compiler also has to generate code to mask out overflowing values
+     (for example, if you assign 200 to a 2 bit field). Our driver
+     never writes overflowing values so that's not needed. On the
+     other hand, most compiler recognize that the template struct we
+     use is a temporary variable and copy propagate the individual
+     fields and do amazing constant folding.  You should take a look
+     at the code that gets generated when you compile in release mode
+     with optimizations.
+
+  2) For some types we need to have overlapping bit fields. For
+     example, some values are 64 byte aligned 32 bit offsets. The
+     lower 5 bits of the offset are always zero, so the hw packs in a
+     few misc bits in the lower 5 bits there. Other times a field can
+     be either a u32 or a float. I tried to do this with overlapping
+     anonymous unions and it became a big mess. Also, when using
+     initializers, you can only initialize one union member so this
+     just doesn't work with out approach.
+
+     The pack functions on the other hand allows us a great deal of
+     flexibility in how we combine things. In the case of overlapping
+     fields (the u32 and float case), if we only set one of them in
+     the pack function, the compiler will recognize that the other is
+     initialized to 0 and optimize out the code to or it it.
+
+  3) Bit fields (and certainly overlapping anonymous unions of bit
+     fields) aren't generally stable across compilers in how they're
+     laid out and aligned. Our pack functions let us control exactly
+     how things get packed, using only simple and unambiguous bitwise
+     shifting and or'ing that works on any compiler.
+
+Once we have the pack function it allows us to hook in various
+transformations and validation as we go from template struct to dwords
+in memory:
+
+  1) Validation: As I said above, our driver isn't supposed to write
+     overflowing values to the fields, but we've of course had lots of
+     cases where we make mistakes and write overflowing values. With
+     the pack function, we can actually assert on that and catch it at
+     runtime.  bitfields would just silently truncate.
+
+  2) Type conversions: some times it's just a matter of writing a
+     float to a u32, but we also convert from bool to bits, from
+     floats to fixed point integers.
+
+  3) Relocations: whenever we have a pointer from one buffer to
+     another (for example a pointer from the meta data for a texture
+     to the raw texture data), we have to tell the kernel about it so
+     it can adjust the pointer to point to the final location. That
+     means extra work we have to do extra work to record and annotate
+     the dword location that holds the pointer. With bit fields, we'd
+     have to call a function to do this, but with the pack function we
+     generate code in the pack function to do this for us. That's a
+     lot less error prone and less work.
-- 
2.30.2