Make the assumption that all of our 32-bit x86 architecture targets
have SSE and SSE2. This allows us to use MOVQ, which is nicer than
using cmpxchg8b for loads / stores.
Fix up some of the CAS stuff for -fPIC. This isn't entirely done,
and at least ck_fifo_mpmc hangs with this code. Not entirely sure
why.
Add APIs for doing atomic CAS/load/store/etc on 32-bit platforms. In
some cases this also includes operations on 64-bit integers using
cmpxchg8b. It is possible we could do some additional stuff on larger
integers using SSE, but the goal of this port is to target i586/k5 and
newer processors.
Samy mentions that we may want to do something or another for making
portability easier, but I wasn't paying tons of attention when he was
talking, so I forget what that was all about. Plus it's funny to write
in a commit message.
Haven't done a full run-through of validate / benchmark tests, but
a cursory runthrough seems to indicate all passing.