Both LLVM-backed compilers and GCC incorrectly treat
a barrier-sandwiched load as a loop invariant in dequeue_spmc.
Forcing volatile atomic load semantics generates the right
thing.
Thanks to Devon O'Dell and Abel Mathew for help in catching
this issue.