In an attempt to prevent gcc from emitting warnings, ck_pr_md_load_ptr and
ck_pr_md_store_ptr were broken in commit
5ae12a19d0.
load_ptr returned target instead of *target, and store_ptr stored the
value in target instead of in *target.
This is an attempt to fix both functions while still avoiding the warnings.
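A minimal sketch of the intended semantics follows. It is not ck's actual code: `sketch_md_load_ptr` and `sketch_md_store_ptr` are hypothetical stand-ins, and `SKETCH_BARRIER` assumes a GCC-style compiler barrier. The point is that both functions dereference `target` rather than operate on `target` itself.

```c
/* Compiler barrier (GCC/Clang); a no-op elsewhere in this sketch. */
#if defined(__GNUC__)
#define SKETCH_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define SKETCH_BARRIER() ((void)0)
#endif

/* Hypothetical stand-in for ck_pr_md_load_ptr: return *target, not target. */
static inline void *
sketch_md_load_ptr(const void *target)
{
    void *r;

    SKETCH_BARRIER();
    r = *(void *const volatile *)target;
    SKETCH_BARRIER();
    return r;
}

/* Hypothetical stand-in for ck_pr_md_store_ptr: write value into *target. */
static inline void
sketch_md_store_ptr(void *target, const void *value)
{
    SKETCH_BARRIER();
    *(void *volatile *)target = (void *)value;
    SKETCH_BARRIER();
}
```

With `void *slot`, the caller passes `&slot`: the load yields the pointer held in `slot`, and the store replaces it.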
This primarily affects the FreeBSD kernel, where the popcount builtin
can be problematic (it relies on compiler-provided libraries). See the
history of __POPCNT__ for details [1].
- A new flag, CK_MD_CC_BUILTIN_DISABLE, can be set to indicate that CK
should not rely on compiler builtins when possible.
- ck_cc_clz has been removed; it was unused.
- ck_internal_bsf has been removed; it duplicated ck_cc_ffs but was broken,
so its callers now use ck_cc_ffs. Previous consumers were using the bsf
instruction either way.
- ck_{rhs,hs,ht} have been updated to use ck_cc_ffs*.
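As an illustration of the builtin opt-out, here is a minimal sketch that selects a portable fallback when CK_MD_CC_BUILTIN_DISABLE is defined. The `sketch_*` and `portable_*` functions are hypothetical and are not ck's `ck_cc_ffs`. The sketch assumes a 32-bit `unsigned int` and treats the flag as a simple "is it defined" switch.

```c
/* Portable ffs: 1-based index of the least significant set bit, 0 if none. */
static inline int
portable_ffs(unsigned int v)
{
    int i = 1;

    if (v == 0)
        return 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Portable popcount: clear the lowest set bit until none remain. */
static inline int
portable_popcount(unsigned int v)
{
    int n = 0;

    while (v != 0) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Use the GCC/Clang builtins unless CK_MD_CC_BUILTIN_DISABLE is defined. */
static inline int
sketch_ffs(unsigned int v)
{
#if defined(__GNUC__) && !defined(CK_MD_CC_BUILTIN_DISABLE)
    return __builtin_ffs((int)v);
#else
    return portable_ffs(v);
#endif
}

static inline int
sketch_popcount(unsigned int v)
{
#if defined(__GNUC__) && !defined(CK_MD_CC_BUILTIN_DISABLE)
    return __builtin_popcount(v);
#else
    return portable_popcount(v);
#endif
}
```

The fallbacks are slower than a single instruction, but they never pull in a compiler runtime helper, which is the concern for kernel builds.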
If FreeBSD requires the builtins for performance reasons, we will lift the
appropriate detection into ck_md (unlike popcount, the bt* and bs* families
of functions at least do not have the same problems on most targets).
1: https://lists.freebsd.org/pipermail/svn-src-head/2015-March/069663.html
A note has also been added about an ambiguity with respect to WC
(write-combining) memory and relaxed memory semantics (hence the
heavier-weight mfence semantics for the strict acquire-release interface).
All fences related to atomic operations have been removed, as they were
unnecessary and therefore confusing.
Memoize the map into ck_hs_iterator_t to make iteration safer in the face of growth or shrinkage of the map. Tests are included.
Work from Riley Berton.
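The effect of memoizing the map can be sketched with a hypothetical table in which a resize swaps in a whole new map. This is not ck_hs's layout. The sketch assumes the old map stays allocated while iterators still reference it, for example under safe memory reclamation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical open-addressed table; NULL marks an empty slot. */
struct map {
    size_t capacity;
    const char **slots;
};

struct set {
    struct map *map; /* replaced wholesale on grow or shrink */
};

/* The iterator remembers which map it started walking. */
struct iterator {
    struct map *map;
    size_t cursor;
};

static void
iterator_init(struct iterator *it)
{
    it->map = NULL;
    it->cursor = 0;
}

static bool
set_next(const struct set *s, struct iterator *it, const char **key)
{
    /* Capture the map on first use; later resizes of s do not affect it. */
    if (it->map == NULL)
        it->map = s->map;

    while (it->cursor < it->map->capacity) {
        const char *k = it->map->slots[it->cursor++];

        if (k != NULL) {
            *key = k;
            return true;
        }
    }
    return false;
}
```

Without the memoized pointer, a cursor computed against the old capacity would be applied to the new map. After a shrink, that can end the iteration early or skip entries.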
With preemption, it is possible for _ck_ring_enqueue_mp to have a
snapshot of p_head so stale with respect to the later snapshot of
c_head that a comparison modulo (small) ring size will erroneously
conclude that the ring is full.
Detect that case and retry rather than failing. We only retry when
the enqueuers have made global forward progress, so the first loop
is as lock-free as it ever was.
Bonus: the new condition should be marginally faster.
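The retry logic can be sketched with C11 atomics. The sketch assumes a hypothetical `struct ring` with free-running `p_head`/`c_head` counters and a power-of-two size. It only claims a slot index, and it is not the actual `_ck_ring_enqueue_mp` code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical MPMC ring header; only the index logic is shown. */
struct ring {
    _Atomic unsigned int p_head; /* next slot producers will claim */
    _Atomic unsigned int c_head; /* next slot consumers will read */
    unsigned int mask;           /* size - 1, size a power of two */
};

/*
 * With consistent snapshots, producer - consumer is the occupancy. A stale
 * producer snapshot (older than c_head) wraps to a huge unsigned value, so
 * it fails this test instead of masquerading as "full" modulo the size.
 */
static bool
ring_has_room(unsigned int producer, unsigned int consumer, unsigned int mask)
{
    return producer - consumer < mask;
}

/* Claim a slot index; returns false only if the ring was really full. */
static bool
ring_reserve_mp(struct ring *r, unsigned int *slot)
{
    unsigned int producer = atomic_load(&r->p_head);

    for (;;) {
        unsigned int consumer = atomic_load(&r->c_head);

        if (ring_has_room(producer, consumer, r->mask)) {
            /* On failure, producer is refreshed with the current p_head. */
            if (atomic_compare_exchange_weak(&r->p_head, &producer,
                                             producer + 1)) {
                *slot = producer & r->mask;
                return true;
            }
        } else {
            /* Slow path: either full, or our p_head snapshot is stale. */
            unsigned int fresh = atomic_load(&r->p_head);

            if (fresh == producer)
                return false; /* no producer progressed: genuinely full */
            producer = fresh; /* other enqueuers made progress: retry */
        }
    }
}
```

The slow path retries only when `p_head` has moved, so a retry always follows global progress by some enqueuer.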
Add a new configure option, --enable-lse, which is only effective for
the AArch64 architecture. When it is used, most ck_pr_* atomics will use Large
System Extensions (LSE) instructions, as per the ARMv8.1 specification, rather
than LL/SC instruction pairs.
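The difference can be sketched with a hypothetical fetch-and-add, `sketch_faa_uint`. This is not ck's implementation. The sketch assumes GCC or Clang, where ACLE defines `__ARM_FEATURE_ATOMICS` when targeting LSE (for example with `-march=armv8.1-a`).

```c
/*
 * Hypothetical fetch-and-add. With LSE available, a single LDADDAL replaces
 * the load-exclusive/store-exclusive retry loop.
 */
static inline unsigned int
sketch_faa_uint(unsigned int *target, unsigned int delta)
{
    unsigned int previous;

#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
    /* LDADDAL Ws, Wt, [Xn]: atomically *Xn += Ws, old value into Wt. */
    __asm__ __volatile__("ldaddal %w2, %w0, [%1]"
                         : "=&r" (previous)
                         : "r" (target), "r" (delta)
                         : "memory");
#else
    /* Elsewhere the compiler chooses; typically LL/SC on AArch64 without LSE. */
    previous = __atomic_fetch_add(target, delta, __ATOMIC_SEQ_CST);
#endif
    return previous;
}
```

A single LSE instruction cannot fail and retry the way an LL/SC pair can under contention. That is the usual argument for preferring it on ARMv8.1 and later cores.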
We do not have to claim that we read the value from variables when we
do not. This was only done a while ago to work around a bug in some versions
of gcc for ARM; hopefully the workaround is no longer needed here.
This should fix the (harmless) warnings described in issue #83.
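In GCC-style inline assembly, a read is typically "claimed" through a read-write (`"+"`) constraint on the memory operand. Assuming that is the case here, a store needs only an output-only constraint. The sketch below shows this for AArch64 with a hypothetical `sketch_store_uint` and falls back to a volatile store on other targets.

```c
/* Hypothetical store helper; only the AArch64 branch uses inline asm. */
static inline void
sketch_store_uint(unsigned int *target, unsigned int value)
{
#if defined(__aarch64__) && defined(__GNUC__)
    /*
     * "=Q" tells the compiler *target is written, never read, by the asm.
     * A "+Q" (read-write) operand would also claim a read that never happens.
     */
    __asm__ __volatile__("str %w1, %0"
                         : "=Q" (*target)
                         : "r" (value)
                         : "memory");
#else
    /* Portable fallback: a plain volatile store. */
    *(volatile unsigned int *)target = value;
#endif
}
```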
- ck_epoch_begin: Disallow early load of epoch as it leads to measurable
performance degradation in some benchmarks.
- ck_epoch_synchronize: Enforce barrier semantics.
Break out the internal implementations into _mp and _sc variants, on which
the public interface is built. Do not rely on a macro. Adopt CK_CC_RESTRICT
instead of using restrict directly.
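The shape of this refactoring can be sketched on the dequeue side. Every name here is hypothetical (`struct sketch_ring`, the `*_impl` helpers, `SKETCH_CC_RESTRICT`), and the producer side is omitted. The sketch shows internal single- and multi-consumer variants, public functions built on them as plain functions rather than macro expansions, and a restrict macro in place of the bare keyword.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Portable spelling of restrict, in the spirit of CK_CC_RESTRICT. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_CC_RESTRICT __restrict__
#else
#define SKETCH_CC_RESTRICT restrict
#endif

#define SKETCH_RING_SIZE 8u /* power of two */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

struct sketch_ring {
    _Atomic unsigned int c_head; /* next slot to consume */
    _Atomic unsigned int p_tail; /* one past the last published slot */
    _Atomic int slots[SKETCH_RING_SIZE];
};

/* Single consumer: only this thread writes c_head. */
static inline bool
ring_dequeue_sc_impl(struct sketch_ring *SKETCH_CC_RESTRICT ring,
                     int *SKETCH_CC_RESTRICT out)
{
    unsigned int c = atomic_load_explicit(&ring->c_head, memory_order_relaxed);
    unsigned int p = atomic_load_explicit(&ring->p_tail, memory_order_acquire);

    if (c == p)
        return false; /* empty */
    *out = atomic_load_explicit(&ring->slots[c & SKETCH_RING_MASK],
                                memory_order_relaxed);
    /* Release: the slot read happens before producers may reuse it. */
    atomic_store_explicit(&ring->c_head, c + 1, memory_order_release);
    return true;
}

/* Multiple consumers: read the slot, then claim it with a CAS on c_head. */
static inline bool
ring_dequeue_mc_impl(struct sketch_ring *SKETCH_CC_RESTRICT ring,
                     int *SKETCH_CC_RESTRICT out)
{
    /* Acquire pairs with other consumers' CAS, so p_tail is seen as >= c. */
    unsigned int c = atomic_load_explicit(&ring->c_head, memory_order_acquire);

    for (;;) {
        unsigned int p = atomic_load_explicit(&ring->p_tail,
                                              memory_order_acquire);
        int v;

        if (c == p)
            return false; /* empty */
        v = atomic_load_explicit(&ring->slots[c & SKETCH_RING_MASK],
                                 memory_order_relaxed);
        /* On failure, c is refreshed with the current c_head and we retry. */
        if (atomic_compare_exchange_weak_explicit(&ring->c_head, &c, c + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_acquire)) {
            *out = v;
            return true;
        }
    }
}

/* Public interface: ordinary functions built on the internal variants. */
static inline bool
sketch_dequeue_spsc(struct sketch_ring *ring, int *out)
{
    return ring_dequeue_sc_impl(ring, out);
}

static inline bool
sketch_dequeue_spmc(struct sketch_ring *ring, int *out)
{
    return ring_dequeue_mc_impl(ring, out);
}
```

Compared with generating each public function from a macro, named internal variants are easier to read, debug, and share between entry points.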