ck_ec implements 32-bit and (on 64-bit platforms) 64-bit event
counts. Event counts let us easily integrate OS-level blocking (e.g.,
futexes) in lock-free protocols. Waking up waiters takes locks only
inside the OS kernel, and is skipped entirely when no waiter is blocked.
Waiters only block conditionally, if the event count's value is
still equal to some prior value.
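
A minimal sketch of the protocol described above, written with C11
atomics rather than ck_ec itself. The names (ec32, ec32_inc,
ec32_prepare_wait) and the use of the low bit as a "waiters present"
flag are assumptions made for illustration, and the futex calls are
left as comments so the sketch stays portable.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event count: bit 0 flags blocked waiters, bits 1..31 count. */
struct ec32 {
	_Atomic uint32_t word;
};

static inline void
ec32_init(struct ec32 *ec)
{
	atomic_init(&ec->word, 0);
}

/* Current 31-bit counter value, with the waiter flag stripped. */
static inline uint32_t
ec32_value(struct ec32 *ec)
{
	return atomic_load(&ec->word) >> 1;
}

/*
 * Producer side. Returns true when the slow path ran, i.e. when a
 * real implementation would also issue a futex wake.
 */
static inline bool
ec32_inc(struct ec32 *ec)
{
	uint32_t old = atomic_fetch_add(&ec->word, 2);

	if ((old & 1) == 0)
		return false;	/* Fast path: nobody is blocked. */

	/* Slow path: one extra atomic to clear the flag, then wake. */
	atomic_fetch_and(&ec->word, ~UINT32_C(1));
	/* futex wake on &ec->word would go here. */
	return true;
}

/*
 * Consumer side. Returns true if the caller should block, i.e. the
 * counter still equals old_value and the waiter flag is now set.
 */
static inline bool
ec32_prepare_wait(struct ec32 *ec, uint32_t old_value)
{
	uint32_t expected = old_value << 1;
	uint32_t flagged = expected | 1;

	if (atomic_compare_exchange_strong(&ec->word, &expected, flagged))
		return true;

	/* Either another waiter already set the flag, or the value moved. */
	return expected == flagged;
	/* On true, a real waiter would futex wait on (&ec->word, flagged). */
}
```

A waiter would typically read ec32_value(), re-check its own condition,
and call ec32_prepare_wait() with the value it read only if the
condition still does not hold.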
ck_ec supports multiple producers (wakers) and consumers (waiters),
and, on x86-TSO, has a more efficient specialisation for single
producer mode. In the latter mode, the overhead compared to a version
counter is on the order of 2-3 cycles and 1-2 instructions, in the
fast path. The slow path, when there are threads blocked on the event
count, consists of one additional atomic instruction and a futex
syscall.
Similarly, the fast path for consumers, when an update comes quickly,
has no overhead compared to spinning on a read-only counter. After
a few thousand cycles, consumers (waiters) enter the slow path with
one atomic instruction and a few blocking syscalls.
The single-producer specialisation requires the x86-TSO memory model,
x86's non-atomic read-modify-write instructions, and, ideally, a
futex-like OS abstraction. On platforms other than x86 and x86_64,
single producer increments fall back to the multiple producer code path.
Fixes https://github.com/concurrencykit/ck/issues/79
build: add linux-ppc64le target.
There appears to be a regression on this target, localized to the epoch section optimization. I will need to investigate further.
* Implement ck_pr_dec_is_zero family of functions
* include/ck_pr.h: add ck_pr_{dec,inc}_is_zero and implement ck_pr_{dec,inc}_zero in terms of the new functions. Convert the architecture-specific implementations of ck_pr_foo_zero for x86 and x86-64 to ck_pr_foo_is_zero.
* regressions/ck_pr/validate: add smoke tests for ck_pr_dec_{,is_}zero and ck_pr_inc_{,is_}zero
* doc: document ck_pr_inc_is_zero
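
The relationship between the two families can be sketched with C11
atomics. This is an illustration only, not the ck_pr implementation:
the sketch_* names are hypothetical, and sequentially consistent
atomics are used here for simplicity.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* New style: the result is the return value. */
static inline bool
sketch_dec_is_zero(_Atomic unsigned int *target)
{
	/* fetch_sub returns the old value; old == 1 means we reached 0. */
	return atomic_fetch_sub(target, 1) == 1;
}

static inline bool
sketch_inc_is_zero(_Atomic unsigned int *target)
{
	/* The increment wraps to 0 only when the old value was UINT_MAX. */
	return atomic_fetch_add(target, 1) == UINT_MAX;
}

/* Old style: out-parameter, expressed in terms of the new functions. */
static inline void
sketch_dec_zero(_Atomic unsigned int *target, bool *zero)
{
	*zero = sketch_dec_is_zero(target);
}

static inline void
sketch_inc_zero(_Atomic unsigned int *target, bool *zero)
{
	*zero = sketch_inc_is_zero(target);
}
```

The boolean return lets a reference count release read as a single
condition, e.g. `if (sketch_dec_is_zero(&refs)) destroy(obj);`, where
destroy is a placeholder.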
This primarily affects the FreeBSD kernel, where the popcount builtin
can be problematic (relies on compiler-provided libraries). See the
history of __POPCNT__ for details [1].
- A new flag, CK_MD_CC_BUILTIN_DISABLE, can be set to indicate that CK
should not rely on compiler builtins when possible.
- ck_cc_clz has been removed; it was unused.
- ck_internal_bsf has been removed: it was a broken duplicate of ck_cc_ffs
and has been replaced by ck_cc_ffs. Previous consumers were using the bsf
instruction either way.
- ck_{rhs,hs,ht} have been updated to use ck_cc_ffs*.
If FreeBSD requires the builtins for performance reasons, we will lift the
appropriate detection into ck_md (unlike popcount, at least the bt* and bs*
families of functions don't have the same problems on most targets).
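
As an illustration of what avoiding builtins means for a find-first-set
helper, here is a sketch of a portable fallback next to a
builtin-backed variant. The sketch_* names are hypothetical, and the
exact preprocessor test CK uses around CK_MD_CC_BUILTIN_DISABLE is an
assumption.

```c
/*
 * Portable fallback: 1-based index of the least significant set bit,
 * or 0 when no bit is set. Needs no compiler support library.
 */
static inline int
sketch_ffs_loop(unsigned int v)
{
	int i;

	if (v == 0)
		return 0;

	for (i = 1; (v & 1) == 0; i++)
		v >>= 1;

	return i;
}

/* Dispatch: use the compiler builtin only when it is allowed. */
static inline int
sketch_ffs(unsigned int v)
{
#if defined(__GNUC__) && !defined(CK_MD_CC_BUILTIN_DISABLE)
	return __builtin_ffs((int)v);
#else
	return sketch_ffs_loop(v);
#endif
}
```

Both variants follow the same contract, so callers do not need to know
which one was selected at build time.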
[1] https://lists.freebsd.org/pipermail/svn-src-head/2015-March/069663.html
Memoize the map into ck_hs_iterator_t to make iteration safer in the face of growth or shrinkage of the map. Add tests for the same.
Work from Riley Berton.
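
The idea can be sketched with a toy set. This is not the ck_hs code:
the sketch_* types are hypothetical, and the sketch assumes the old map
stays allocated while an iterator still refers to it.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_map {
	size_t capacity;
	const void **slots;	/* NULL marks an empty slot. */
};

struct sketch_set {
	struct sketch_map *map;	/* Replaced on growth or shrinkage. */
};

struct sketch_iterator {
	struct sketch_map *map;	/* Memoized on first use. */
	size_t offset;
};

static inline void
sketch_iterator_init(struct sketch_iterator *it)
{
	it->map = NULL;
	it->offset = 0;
}

/* Stores the next entry in *out and returns true, or returns false at end. */
static inline bool
sketch_next(struct sketch_set *set, struct sketch_iterator *it,
    const void **out)
{
	struct sketch_map *map;

	/* Capture the map once so offset always indexes the same table. */
	if (it->map == NULL)
		it->map = set->map;
	map = it->map;

	while (it->offset < map->capacity) {
		const void *entry = map->slots[it->offset++];

		if (entry != NULL) {
			*out = entry;
			return true;
		}
	}

	return false;
}
```

Without the memoized pointer, an offset computed against the old table
would be applied to the resized one, so entries could be skipped or
returned twice.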