I was actually unsure of the exact memory model
I wanted for atomic RMW operations. It was
made apparent with time that I had to adopt RMO
if I didn't want to sacrifice performance. Make
sure we can assume RMO for the stack.
CK_CC_PACKED will drop structures to one-byte alignment in certain
cases. Obviously, this will mean bad performance on most architectures.
Thanks to Matt Johnson from https://rigel.crhc.illinois.edu/ for
reporting this problem.