This is still not really correct: at least for SM 4.0 the nesting
limit is 64 per subroutine, and subroutine nesting itself has a limit
of 32, so with a flat stack we would need 32*64 = 2048 entries.
This would probably be better fixed with per-subroutine stacks,
since otherwise these structures get really big (on the order of
100kB for the lp_exec_mask).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
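
For illustration only, the sizing argument above can be written out as a small C sketch. The SM4_* macro names and the helper functions are hypothetical and are not part of the patch. Only the values 64, 32 and 66 come from the text and the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Limits quoted in the commit message (the macro names are made up here). */
#define SM4_NESTING_PER_SUBROUTINE 64  /* control flow nesting per subroutine */
#define SM4_SUBROUTINE_NESTING     32  /* subroutine call depth */

/* Value chosen by the patch: 64 plus 2 spare entries. */
#define LP_MAX_TGSI_NESTING 66

/* Hypothetical helper: entries a single flat stack would need if every
 * subroutine level used its full control flow nesting budget. */
static inline size_t flat_stack_worst_case(void)
{
    return (size_t)SM4_SUBROUTINE_NESTING * SM4_NESTING_PER_SUBROUTINE;
}

/* Hypothetical helper: how many worst-case entries the flat stack
 * of LP_MAX_TGSI_NESTING entries does not cover. */
static inline size_t flat_stack_shortfall(void)
{
    size_t need = flat_stack_worst_case();
    return need > LP_MAX_TGSI_NESTING ? need - LP_MAX_TGSI_NESTING : 0;
}
```

With these numbers the flat worst case is 2048 entries, so a 66-entry stack covers one fully nested subroutine but not deep call chains, which is why the message calls the change "still not really correct".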
/**
* Maximum control flow nesting
*
- * SM3.0 requires 24
+ * SM4.0 requires 64 (per subroutine actually, subroutine nesting itself is 32)
+ * SM3.0 requires 24 (most likely per subroutine too)
+ * add 2 more (some translation could add one more)
*/
-#define LP_MAX_TGSI_NESTING 32
+#define LP_MAX_TGSI_NESTING 66
/**
* Maximum iterations before loop termination