1. For ck_pr_cas_foo_value, let the inline assembly save the observed
value in a register, and store to the output reference in C.
This lets the C optimiser eliminate the memory access once the
CAS function is inlined.
2. Specify the result of the CAS as a condition code in EFLAGS instead
of executing SETcc in inline assembly, when possible. GCC gained
this functionality in GCC 6; CAS loops can now directly branch on
the condition code, without SETcc / TEST.
TESTED=existing regression tests.