These operations serialize atomic-RMW operations with respect
to each other, loads and stores. In addition to this, the
load_depends implementations have been removed.
ck_pr_fence_{load_load,store_store,load_store,store_load} operations
have been added. In addition to this, it is no longer the responsibility
of architecture ports to determine when to emit a specific fence. Instead,
the underlying port will always emit the necessary instructions to
enforce strict ordering. The higher-level include/ck_pr implementation will
enforce whether or not a fence is necessary to be emitted according to
the memory model specified by ck_md (CK_MD_{TSO,RMO,PSO}).
In other words, only ck_pr_fence_strict_* is implemented by the MD-ck_pr
port.
build:
- configure step will generate relevant CFLAGS.
- build profiles are for convenience (developers can use themu
for cross-compilation).
regressions:
- Renamed ck_barrier unit tests to work-around behavior
of Solaris linker.
- Adopted use of a PTHREAD_CFLAGS variable.
ck_cc:
- Added internal CK_CC_IMM macro for compilers that are
verbose against impossible inline constraints (or limited
optimizers).
ck_pr/x86*:
- Adopted CK_CC_IMM macro.
- Dropped redundant constraints.
This work was mostly completed by Theo Schlossnagle
<jesus@omniti.com>, much thanks to him. He has
also provided access to a machine with Sun Studio 12.
There is a bug first generation AMD Opteron processors'
with cpuid family 0Fh and models less than 40h when it
comes to read-modify write operations after load/store
sequence. Not worth supporting this processor.
If you are on this processor, you can find more information
at: http://bugzilla.kernel.org/show_bug.cgi?id=11305#c2
ck_pr_load_32_2 (and thus ck_pr_load_ptr_2) were previously implemented in
terms of lock cmpxchg8b, which is considerably slower than just using movq.
Relevant tests making use of load_ptr_2 still pass, so I'm confident this
change is correct.
It turns out that the "p" constraint doesn't work on clang, so we have to get
rid of that. This means that we may need to require GCC 4.3+ if it turns out
that GCC 4.1 / 4.2 still run out of registers compiling this version.