fix (per-vertex) fog when using ARB_vp by incorporating fog factor computation into the vertex program (not yet fixed for swtnl). Simplify (and correct) the VTX_TCL_OUTPUT_VTXFMT handling when using vertex programs, turns out it's solely driven by the needs of the past-vertex stage of the pipeline, this should fix lockups with ill-specified applications using vertex programs (for instance applications enabling fog but not writing to fog coord output will now get (conformant) undefined results instead of lockups).