v3d: Delay emitting ldvpm on V3D 4.x until it's actually used.

For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need
to do VPM setup when the load_inputs are out of order. For V3D 4.x, we
can reduce register pressure by delaying our loads until they're actually
needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump.
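
As a rough illustration of why delaying the loads lowers register pressure, here is a toy model in C. It is not the v3d compiler. `struct op`, `emit_eager`, `emit_lazy` and `peak_live` are hypothetical names. The model assumes each input is used exactly once, and it treats the peak number of loaded-but-unconsumed values as a stand-in for register pressure.

```c
#include <assert.h>

#define MAX_OPS 64

/* Hypothetical toy IR: an op either loads input `idx` or consumes it. */
enum op_kind { OP_LOAD, OP_USE };
struct op { enum op_kind kind; int idx; };

/* Eager (V3D 3.x-style): emit every load at the top in input order,
 * then emit the uses in program order. */
static int emit_eager(const int *use_order, int n, struct op *out)
{
        int count = 0;
        for (int i = 0; i < n; i++)
                out[count++] = (struct op){ OP_LOAD, i };
        for (int i = 0; i < n; i++)
                out[count++] = (struct op){ OP_USE, use_order[i] };
        return count;
}

/* Lazy (the V3D 4.x approach described above): emit each load
 * immediately before the use of that input. */
static int emit_lazy(const int *use_order, int n, struct op *out)
{
        int count = 0;
        for (int i = 0; i < n; i++) {
                out[count++] = (struct op){ OP_LOAD, use_order[i] };
                out[count++] = (struct op){ OP_USE, use_order[i] };
        }
        return count;
}

/* Register-pressure proxy: the peak number of loaded values that have
 * not yet been consumed. Each input is used exactly once in this model. */
static int peak_live(const struct op *ops, int count)
{
        int live = 0, peak = 0;
        for (int i = 0; i < count; i++) {
                if (ops[i].kind == OP_LOAD) {
                        if (++live > peak)
                                peak = live;
                } else {
                        live--;
                        assert(live >= 0); /* a use must follow its load */
                }
        }
        return peak;
}
```

With the inputs consumed out of order, e.g. `use_order = { 2, 0, 3, 1 }`, the eager stream peaks at 4 live inputs while the lazy stream never holds more than 1.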

total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
total threads in shared programs: 153864 -> 153906 (0.03%)