This is an example limitation of fence_X_Y variant. I am
considering extending this to include an acquire extension.
Use a memory fence to force total order in a manner that
will be clearer to other developers who read this.
This did not manifest as a problem on any target architectures
due to their handling of atomic operations (SPARC models it as
both a load and a store, while Power atomic_load ordering was
enforced through a full barrier).
It is possible this will be moved to a self-contained file.
For a majority of architectures, RTM is an unnecessary
implementation-specific optimization.
As ck_pr semantics were still not molded, I was designing
under the assumption I would potentially go towards
acq/req interface. Since RMO will be the semantic norm for
the ck_pr model from now on, enforce stricter ordering
requirements on rwlock.
ck_rwlock_write_unlock function will now also serialize both
loads and stores.