--- /dev/null
+This provides some background the design of the generated headers. We
+started out trying to generate bit fields but it evolved into the pack
+functions because of a few limitations:
+
+ 1) Bit fields still generate terrible code today. Even with modern
+ optimizing compilers you get multiple load+mask+store operations
+ to the same dword in memory as you set individual bits. The
+ compiler also has to generate code to mask out overflowing values
+ (for example, if you assign 200 to a 2 bit field). Our driver
+ never writes overflowing values so that's not needed. On the
+ other hand, most compiler recognize that the template struct we
+ use is a temporary variable and copy propagate the individual
+ fields and do amazing constant folding. You should take a look
+ at the code that gets generated when you compile in release mode
+ with optimizations.
+
+ 2) For some types we need to have overlapping bit fields. For
+ example, some values are 64 byte aligned 32 bit offsets. The
+ lower 5 bits of the offset are always zero, so the hw packs in a
+ few misc bits in the lower 5 bits there. Other times a field can
+ be either a u32 or a float. I tried to do this with overlapping
+ anonymous unions and it became a big mess. Also, when using
+ initializers, you can only initialize one union member so this
+ just doesn't work with out approach.
+
+ The pack functions on the other hand allows us a great deal of
+ flexibility in how we combine things. In the case of overlapping
+ fields (the u32 and float case), if we only set one of them in
+ the pack function, the compiler will recognize that the other is
+ initialized to 0 and optimize out the code to or it it.
+
+ 3) Bit fields (and certainly overlapping anonymous unions of bit
+ fields) aren't generally stable across compilers in how they're
+ laid out and aligned. Our pack functions let us control exactly
+ how things get packed, using only simple and unambiguous bitwise
+ shifting and or'ing that works on any compiler.
+
+Once we have the pack function it allows us to hook in various
+transformations and validation as we go from template struct to dwords
+in memory:
+
+ 1) Validation: As I said above, our driver isn't supposed to write
+ overflowing values to the fields, but we've of course had lots of
+ cases where we make mistakes and write overflowing values. With
+ the pack function, we can actually assert on that and catch it at
+ runtime. bitfields would just silently truncate.
+
+ 2) Type conversions: some times it's just a matter of writing a
+ float to a u32, but we also convert from bool to bits, from
+ floats to fixed point integers.
+
+ 3) Relocations: whenever we have a pointer from one buffer to
+ another (for example a pointer from the meta data for a texture
+ to the raw texture data), we have to tell the kernel about it so
+ it can adjust the pointer to point to the final location. That
+ means extra work we have to do extra work to record and annotate
+ the dword location that holds the pointer. With bit fields, we'd
+ have to call a function to do this, but with the pack function we
+ generate code in the pack function to do this for us. That's a
+ lot less error prone and less work.