intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.
For example, if a UBO contained the following, tightly packed:
vec4 a; // [0, 16)
float b; // [16, 20)
vec4 c; // [20, 36)
then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.
Similarly, dvec4s would suffer from the same problem.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>