The barriers have been restructured into individual file
per implementation. Some micro-optimizations were implemented
for some barriers (caching common computations in the barrier).
State subsription is now explicit with the TID counter allocated
on a per-barrier basis.
Tournament barriers remaining and then another round will be done
for correctness and algorithmic improvements.
These are the tournament and mcs barriers from "Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors." Validation tests have
also been added for these barriers to regressions/ck_barrier/validate.
This is the software combining tree barrier from the MCS paper. Currently,
it uses a binary tree; it may be changed later to use an n-ary tree.
Validation (combining_validation.c in regressions/ck_barrier/validate)
has also been added.
Instead of executing a join on the threads, aggregate existing state
of their counters. This can mean one missed increment per thread,
which is not a big deal.
Moved rdtsc and affinity logic to a single file which other
regression tests use. Single point of reference will ease
porting these to future architectures and platforms. Removed
invalid Copyright statement.
Added CK_CC_USED to force some code generation that I found
useful for debugging.
Added ck_stack latency tests and a modified version of djoseph's
modifications to benchmark.h for spinlock latency tests.