Add APIs for doing atomic CAS/load/store/etc on 32-bit platforms. In
some cases this also includes operations on 64-bit integers using
cmpxchg8b. It is possible we could do some additional stuff on larger
integers using SSE, but the goal of this port is to target i586/k5 and
newer processors.
Samy mentions that we may want to do something or another for making
portability easier, but I wasn't paying tons of attention when he was
talking, so I forget what that was all about. Plus it's funny to write
in a commit message.
Haven't done a full run-through of validate / benchmark tests, but
a cursory runthrough seems to indicate all passing.