Make the assumption that all of our 32-bit x86 architecture targets
have SSE and SSE2. This allows us to use MOVQ, which is nicer than
using cmpxchg8b for loads / stores.
Fix up some of the CAS stuff for -fPIC. This isn't entirely done,
and at least ck_fifo_mpmc hangs with this code. Not entirely sure
why.
CK_CC_PACKED will drop structures to one-byte alignment in certain
cases. Obviously, this will mean bad performance on most architectures.
Thanks to Matt Johnson from https://rigel.crhc.illinois.edu/ for
reporting this problem.
Add APIs for doing atomic CAS/load/store/etc on 32-bit platforms. In
some cases this also includes operations on 64-bit integers using
cmpxchg8b. It is possible we could do some additional stuff on larger
integers using SSE, but the goal of this port is to target i586/k5 and
newer processors.
Samy mentions that we may want to do something or another for making
portability easier, but I wasn't paying tons of attention when he was
talking, so I forget what that was all about. Plus it's funny to write
in a commit message.
Haven't done a full run-through of validate / benchmark tests, but
a cursory runthrough seems to indicate all passing.
Moved rdtsc and affinity logic to a single file which other
regression tests use. Single point of reference will ease
porting these to future architectures and platforms. Removed
invalid Copyright statement.
Added CK_CC_USED to force some code generation that I found
useful for debugging.
Added ck_stack latency tests and a modified version of djoseph's
modifications to benchmark.h for spinlock latency tests.