ck_epoch_reclaim is now the replacement for ck_epoch_flush.
ck_epoch_purge guarantees that all entries are reclaimed
for the provided record before exiting.
n_peak counter has been added, which provides the peak number
of items across all reclamation lists. n_reclamations provides
the number of reclamations across the lifetime of the record.
These are cleared on unregister.
ck_epoch_update has been renamed to ck_epoch_tick.
Hazardous sections which mutate shared structures are now
expected to begin with ck_epoch_write_begin and end with
ck_epoch_end.
Hazardous sections which read shared structures are now
expected to begin with ck_epoch_read_begin and end with
ck_epoch_end.
ck_hp_free is now more aggressive. It will attempt a
reclamation cycle any time the pending count is long.
I should probably add a ck_hp_retire to have a version
which allows for bulk updates to local reclamation lists.
Recycle will just be a bottleneck. The MPMC interface should instead
return a junk pointer and allow the user to manage its lifetime in
a way they see fit.
There is a bug first generation AMD Opteron processors'
with cpuid family 0Fh and models less than 40h when it
comes to read-modify write operations after load/store
sequence. Not worth supporting this processor.
If you are on this processor, you can find more information
at: http://bugzilla.kernel.org/show_bug.cgi?id=11305#c2
The barriers have been restructured into individual file
per implementation. Some micro-optimizations were implemented
for some barriers (caching common computations in the barrier).
State subsription is now explicit with the TID counter allocated
on a per-barrier basis.
Tournament barriers remaining and then another round will be done
for correctness and algorithmic improvements.
These are the tournament and mcs barriers from "Algorithms for Scalable
Synchronization on Shared-Memory Multiprocessors." Validation tests have
also been added for these barriers to regressions/ck_barrier/validate.
ck_pr_load_32_2 (and thus ck_pr_load_ptr_2) were previously implemented in
terms of lock cmpxchg8b, which is considerably slower than just using movq.
Relevant tests making use of load_ptr_2 still pass, so I'm confident this
change is correct.
C casts to unsigned int by default, so we were experiencing some negative
undefined behavior in the 1 << 31 case. x86 now works; bts and btc are
both passing.
It turns out that the "p" constraint doesn't work on clang, so we have to get
rid of that. This means that we may need to require GCC 4.3+ if it turns out
that GCC 4.1 / 4.2 still run out of registers compiling this version.