* Implement ck_pr_dec_is_zero family of functions
* include/ck_pr.h: add ck_pr_{dec,inc}_is_zero and implement ck_pr_{dec,inc}_zero in terms of the new functions. Convert the architecture-specific implementations of ck_pr_foo_zero for x86 and x86-64 to ck_pr_foo_is_zero.
* regressions/ck_pr/validate: add smoke tests for ck_pr_dec_{,is_}zero and ck_pr_inc_{,is_}zero
* doc: document ck_pr_inc_is_zero (see the usage sketch below)
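As a hedged sketch of how the return-value form might be used: the per-type
name ck_pr_dec_uint_is_zero and the struct below are assumptions based on the
description above, not code from the tree. The point is that folding the
out-parameter of ck_pr_dec_zero into a return value reads naturally in
reference counting:

    #include <ck_pr.h>
    #include <stdlib.h>

    struct object {
            unsigned int refcnt;
            /* ... payload ... */
    };

    static void
    object_release(struct object *o)
    {

            /* Out-parameter form: bool z; ck_pr_dec_uint_zero(&o->refcnt, &z);
             * The _is_zero form returns the zero flag directly. */
            if (ck_pr_dec_uint_is_zero(&o->refcnt) == true)
                    free(o);
    }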
--platform lets you set the platform instead of relying on uname -m.
--use-cc-builtins forces the use of gcc atomic builtins instead of the implementations provided by CK.
These primitives are meant to be used in lock implementations
where control dependency ordering is sufficient to enforce
ordering of critical sections. At the moment, this only affects
PPC. Currently, we rely on lwsync for entry into critical sections,
which is insufficient. sync is rather heavy-weight, and assuming
we aren't falling victim to compiler re-ordering, isync should
be sufficient.
There is follow-up work to be done on the ARM port, as we may have cheaper
(but target-specialized) ISB-tricks for load-load ordering.
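A minimal sketch of the intended use, assuming the primitives in question are
the ck_pr_fence_lock()/ck_pr_fence_unlock() pair; the test-and-set lock below
is illustrative, not an actual CK lock implementation:

    #include <ck_pr.h>

    struct spinlock {
            unsigned int value;
    };

    static void
    spin_lock(struct spinlock *l)
    {

            while (ck_pr_fas_uint(&l->value, 1) == 1)
                    ck_pr_stall();

            /* The loop branches on the result of the atomic, creating a
             * control dependency; on PPC this permits isync here rather
             * than a heavier lwsync/sync. */
            ck_pr_fence_lock();
    }

    static void
    spin_unlock(struct spinlock *l)
    {

            ck_pr_fence_unlock();
            ck_pr_store_uint(&l->value, 0);
    }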
We use some macro trickery to enforce that ck_pr_store_* is actually
storing the correct type into the target variable, without any actual
side effects: the assignment is turned into an unevaluated rvalue inside
a comma expression, which the compiler type-checks but should optimize away.
On the load side, we simply cast the result to the type of the target
variable for pointer loads.
There is an unsafe version of the store_ptr macro called
ck_pr_store_ptr_unsafe for those times when you are _really_ sure that
you know what you're doing.
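A self-contained sketch of one way to realize the described trick; the macro
and helper names here are illustrative, not CK's actual definitions. sizeof
never evaluates its operand, so the assignment is type-checked at compile
time and then discarded, and the comma expression yields the destination
unchanged:

    #include <stdio.h>

    static void
    store_ptr_impl(void **dst, void *val)
    {

            *dst = val;
    }

    /* The assignment inside sizeof is type-checked but never executed. */
    #define STORE_PTR_CHECKED(dst, val)                                  \
            store_ptr_impl(                                              \
                ((void)sizeof(*(dst) = (val)), (void **)(dst)),          \
                (void *)(val))

    int
    main(void)
    {
            int x = 42;
            int *p = NULL;

            STORE_PTR_CHECKED(&p, &x);      /* types agree: compiles     */
            /* STORE_PTR_CHECKED(&p, 3.5);     type mismatch: rejected   */
            printf("%d\n", *p);
            return 0;
    }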
This commit also updates some of the source files (ck_ht, ck_hs,
ck_rhs): ck_ht now uses the unsafe macro, as its conversion between
uintptr_t and void * is invalid under the new macros. ck_hs and ck_rhs
have had some casts added to preserve validity.
I accidentally removed ck_pr_fence's implicit compiler
barrier semantics in the re-structuring of ck_pr_fence.
This does not affect the correctness of any data structures
in CK or the correctness of consumers of ck_pr
operations where ck_pr operations serve as linearization points.
The reason it does not affect any CK data structures is
that explicit compiler barriers (whether they are store/load
operations or atomic read-modify-write operations) always
serve as linearization points.
However, if consumers are doing tricky things like using
these barriers to serialize aliased locations for correctness,
then it is possible for compiler re-ordering to bite them in
the ass.
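For context, the implicit semantics in question are those of a compiler
barrier: a construct that emits no machine instruction but pins compiler
ordering. A minimal GCC-style sketch, with illustrative names:

    /* Prevents the compiler from reordering or caching memory accesses
     * across this point; generates no machine instruction. */
    #define compiler_barrier() __asm__ __volatile__("" ::: "memory")

    static int shared_a;
    static int shared_b;

    static void
    publish(void)
    {

            shared_a = 1;
            compiler_barrier();     /* without this, the compiler is free
                                     * to reorder the two stores */
            shared_b = 1;
    }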
The ck_pr_fence_{load_load,store_store,load_store,store_load} operations
add unnecessary complexity to the ck_pr_fence interface and have been removed.
Instead, it can be safely assumed that developers will use
ck_pr_fence_X to enforce X -> X ordering.
More specifically, note that in memory models where atomic
operations do not have serializing effects, atomic
read-modify-write operations are modeled as store operations.
These operations serialize atomic-RMW operations with respect
to each other and to loads and stores. In addition to this, the
load_depends implementations have been removed.
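Under that model, a store fence is sufficient to order an atomic RMW against
a later plain store. A hedged sketch using current ck_pr names:

    #include <ck_pr.h>

    static unsigned int counter;
    static unsigned int published;

    static void
    example(void)
    {

            ck_pr_inc_uint(&counter);        /* atomic RMW, modeled as a store */
            ck_pr_fence_store();             /* store -> store ordering        */
            ck_pr_store_uint(&published, 1); /* ordered after the RMW above    */
    }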
ck_pr_fence_{load_load,store_store,load_store,store_load} operations
have been added. In addition to this, it is no longer the responsibility
of architecture ports to determine when to emit a specific fence. Instead,
the underlying port will always emit the necessary instructions to
enforce strict ordering. The higher-level include/ck_pr implementation will
enforce whether or not a fence is necessary to be emitted according to
the memory model specified by ck_md (CK_MD_{TSO,RMO,PSO}).
In other words, only ck_pr_fence_strict_* is implemented by the MD-ck_pr
port.
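Conceptually, the layering looks like the following sketch. This is
simplified: the real header generates these definitions with macros, and the
names follow the commit text above rather than any particular tree state.

    /* Under TSO, loads are already ordered against loads, so the fence can
     * degrade to a compiler barrier; under RMO, the strict MD fence is used. */
    #if defined(CK_MD_TSO)
    #define ck_pr_fence_load_load() ck_pr_barrier()
    #else
    #define ck_pr_fence_load_load() ck_pr_fence_strict_load_load()
    #endif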
Integer constants in C default to signed int, so 1 << 31 shifts into
the sign bit, which is undefined behavior. x86 now works; bts and btc are
both passing.
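The fix pattern, as a minimal illustration:

    #include <stdio.h>

    int
    main(void)
    {

            /* The literal 1 has type signed int, so 1 << 31 shifts into
             * the sign bit: undefined behavior where int is 32 bits. */
            unsigned int mask = 1U << 31;   /* unsigned literal: well-defined */

            printf("%u\n", mask);
            return 0;
    }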