The barriers have been restructured into individual file
per implementation. Some micro-optimizations were implemented
for some barriers (caching common computations in the barrier).
State subsription is now explicit with the TID counter allocated
on a per-barrier basis.
Tournament barriers remaining and then another round will be done
for correctness and algorithmic improvements.
These are the tournament and mcs barriers from "Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors." Validation tests have
also been added for these barriers to regressions/ck_barrier/validate.
This is the software combining tree barrier from the MCS paper. Currently,
it uses a binary tree; it may be changed later to use an n-ary tree.
Validation (combining_validation.c in regressions/ck_barrier/validate)
has also been added.
Instead of executing a join on the threads, aggregate existing state
of their counters. This can mean one missed increment per thread,
which is not a big deal.