This was accidentally grouped into previous commit.
The destiny of this interface for internal use is still unclear (in context of
utilization in built-in data structures). The interface is enabled by default
on x86, as it is compatible with read-side prefetch* operations and
binary-compatible with 3DNow! extension. Older compilers will waste an
additional byte (they generate 3DNow! variant) on this, but older compilers
waste more on spillage if we encode instruction. Power support is coming soon.
gcc is smart enough to use an even register for 64bits operations, and provide
a way to access the first and the second words, so use that instead of
hardcoding registers.
This adds support for CAS_64{_VALUE}, CAS_PTR_2{_VALUE},
LOAD_64, STORE_64 and other primitives built on universal
CAS primitive.
Patch submitted by Olivier Houchard <cognet@FreeBSD>.
Besides implementing Thumb 2 supports, this fixes incorrect usage of
"cmp" where "cmpeq" was meant.
Patch submitted by Olivier Houchard <cognet@freebsd>.
These add unnecessary complexity to the ck_pr_fence interface.
Instead, it can be safely assumed that developers will use
ck_pr_fence_X to enforce X -> X ordering.
These operations serialize atomic-RMW operations with respect
to each other, loads and stores. In addition to this, the
load_depends implementations have been removed.
ck_pr_fence_{load_load,store_store,load_store,store_load} operations
have been added. In addition to this, it is no longer the responsibility
of architecture ports to determine when to emit a specific fence. Instead,
the underlying port will always emit the necessary instructions to
enforce strict ordering. The higher-level include/ck_pr implementation will
enforce whether or not a fence is necessary to be emitted according to
the memory model specified by ck_md (CK_MD_{TSO,RMO,PSO}).
In other words, only ck_pr_fence_strict_* is implemented by the MD-ck_pr
port.
build:
- configure step will generate relevant CFLAGS.
- build profiles are for convenience (developers can use themu
for cross-compilation).
regressions:
- Renamed ck_barrier unit tests to work-around behavior
of Solaris linker.
- Adopted use of a PTHREAD_CFLAGS variable.
ck_cc:
- Added internal CK_CC_IMM macro for compilers that are
verbose against impossible inline constraints (or limited
optimizers).
ck_pr/x86*:
- Adopted CK_CC_IMM macro.
- Dropped redundant constraints.
This work was mostly completed by Theo Schlossnagle
<jesus@omniti.com>, much thanks to him. He has
also provided access to a machine with Sun Studio 12.
There is a bug first generation AMD Opteron processors'
with cpuid family 0Fh and models less than 40h when it
comes to read-modify write operations after load/store
sequence. Not worth supporting this processor.
If you are on this processor, you can find more information
at: http://bugzilla.kernel.org/show_bug.cgi?id=11305#c2
ck_pr_load_32_2 (and thus ck_pr_load_ptr_2) were previously implemented in
terms of lock cmpxchg8b, which is considerably slower than just using movq.
Relevant tests making use of load_ptr_2 still pass, so I'm confident this
change is correct.
It turns out that the "p" constraint doesn't work on clang, so we have to get
rid of that. This means that we may need to require GCC 4.3+ if it turns out
that GCC 4.1 / 4.2 still run out of registers compiling this version.
Fix a typo that was causing several validation tests to hang.
(Doing cmpxchg8b (%eax) isn't going to work very well.) I am
wondering if something is wrong with the general implementation
of ck_pr_bts_64 and ck_pr_btc_64 because it's pretty clear that
with the stack tests passing, ck_pr_cas_32_2_value works fine.
Making things work properly with PIC on 32-bit x86 architectures is tricky
because of our lack of %ebx. Additionally, GCC versions < 4.3 have some
problems determining what registers may be reused, causing some of the inline
assembly constraints to be a little counterintuitive. (Thanks to Ian Lance
Taylor for the suggestion to get around the reuse issues.)
This change makes us use sane assembler in cases where we're running
non-PIC and use the heavyweight versions only for PIC. There may still be
some issues in this code; for example, it's apparent that 64-bit btc and
bts intrinsic atomics are broken in the version of GCC I'm using, so those
will have to be implemented.
Additionally, the ck_stack tests currently don't work with fPIC (not sure
if that's the fault of the tests or the port). Everything does pass now in
non-PIC, excluding btc/bts tests (in my current environment).
Make the assumption that all of our 32-bit x86 architecture targets
have SSE and SSE2. This allows us to use MOVQ, which is nicer than
using cmpxchg8b for loads / stores.
Fix up some of the CAS stuff for -fPIC. This isn't entirely done,
and at least ck_fifo_mpmc hangs with this code. Not entirely sure
why.
Add APIs for doing atomic CAS/load/store/etc on 32-bit platforms. In
some cases this also includes operations on 64-bit integers using
cmpxchg8b. It is possible we could do some additional stuff on larger
integers using SSE, but the goal of this port is to target i586/k5 and
newer processors.
Samy mentions that we may want to do something or another for making
portability easier, but I wasn't paying tons of attention when he was
talking, so I forget what that was all about. Plus it's funny to write
in a commit message.
Haven't done a full run-through of validate / benchmark tests, but
a cursory runthrough seems to indicate all passing.
Moved rdtsc and affinity logic to a single file which other
regression tests use. Single point of reference will ease
porting these to future architectures and platforms. Removed
invalid Copyright statement.
Added CK_CC_USED to force some code generation that I found
useful for debugging.
Added ck_stack latency tests and a modified version of djoseph's
modifications to benchmark.h for spinlock latency tests.