It turns out that the "p" constraint doesn't work on clang, so we have to get
rid of that. This means that we may need to require GCC 4.3+ if it turns out
that GCC 4.1 / 4.2 still run out of registers compiling this version.
Fix a typo that was causing several validation tests to hang.
(Doing cmpxchg8b (%eax) isn't going to work very well.) I am
wondering if something is wrong with the general implementation
of ck_pr_bts_64 and ck_pr_btc_64 because it's pretty clear that
with the stack tests passing, ck_pr_cas_32_2_value works fine.
This is the software combining tree barrier from the MCS paper. Currently,
it uses a binary tree; it may be changed later to use an n-ary tree.
Validation (combining_validation.c in regressions/ck_barrier/validate)
has also been added.
Making things work properly with PIC on 32-bit x86 architectures is tricky
because of our lack of %ebx. Additionally, GCC versions < 4.3 have some
problems determining what registers may be reused, causing some of the inline
assembly constraints to be a little counterintuitive. (Thanks to Ian Lance
Taylor for the suggestion to get around the reuse issues.)
This change makes us use sane assembler in cases where we're running
non-PIC and use the heavyweight versions only for PIC. There may still be
some issues in this code; for example, it's apparent that 64-bit btc and
bts intrinsic atomics are broken in the version of GCC I'm using, so those
will have to be implemented.
Additionally, the ck_stack tests currently don't work with fPIC (not sure
if that's the fault of the tests or the port). Everything does pass now in
non-PIC, excluding btc/bts tests (in my current environment).
Make the assumption that all of our 32-bit x86 architecture targets
have SSE and SSE2. This allows us to use MOVQ, which is nicer than
using cmpxchg8b for loads / stores.
Fix up some of the CAS stuff for -fPIC. This isn't entirely done,
and at least ck_fifo_mpmc hangs with this code. Not entirely sure
why.
CK_CC_PACKED will drop structures to one-byte alignment in certain
cases. Obviously, this will mean bad performance on most architectures.
Thanks to Matt Johnson from https://rigel.crhc.illinois.edu/ for
reporting this problem.
Add APIs for doing atomic CAS/load/store/etc on 32-bit platforms. In
some cases this also includes operations on 64-bit integers using
cmpxchg8b. It is possible we could do some additional stuff on larger
integers using SSE, but the goal of this port is to target i586/k5 and
newer processors.
Samy mentions that we may want to do something or another for making
portability easier, but I wasn't paying tons of attention when he was
talking, so I forget what that was all about. Plus it's funny to write
in a commit message.
Haven't done a full run-through of validate / benchmark tests, but
a cursory runthrough seems to indicate all passing.
Moved rdtsc and affinity logic to a single file which other
regression tests use. Single point of reference will ease
porting these to future architectures and platforms. Removed
invalid Copyright statement.
Added CK_CC_USED to force some code generation that I found
useful for debugging.
Added ck_stack latency tests and a modified version of djoseph's
modifications to benchmark.h for spinlock latency tests.