This is an example limitation of fence_X_Y variant. I am
considering extending this to include an acquire extension.
Use a memory fence to force total order in a manner that
will be clearer to other developers who read this.
This did not manifest as a problem on any target architectures
due to their handling of atomic operations (SPARC models it as
both a load and a store, while Power atomic_load ordering was
enforced through a full barrier).
It is possible this will be moved to a self-contained file.
For a majority of architectures, RTM is an unnecessary
implementation-specific optimization.
It is possible for a defragmenting set or swap operation
to set a tombstone. If the probe sequence does not encounter
an empty slot and hits maximum write-side probe limit first
for it to fail to reprobe defragmenting store.
I accidentally removed ck_pr_fence implicit compiler
barrier semantics in re-structure of ck_pr_fence.
This does affect the correctness of any data structures
in ck_pr_fence or the correctness of consumers of ck_pr
operations where ck_pr serves as linearization points.
The reason it does not affect any CK data structures is
that explicit compiler barriers (whether they are store/load
operations or atomic ready-modify-write operations) always
serve as linearization points.
However, if consumers are doing tricky things like using
these barriers to serialize aliased locations for correctness,
then it is possible for compiler re-ordering to bite them in
the ass.
These add unnecessary complexity to the ck_pr_fence interface.
Instead, it can be safely assumed that developers will use
ck_pr_fence_X to enforce X -> X ordering.
This includes fixing acquire semantics on mcs_lock fast path.
This represents an additional fence on the fast path for
acquire semantics post-acquisition.