This primarily affects the FreeBSD kernel, where the popcount builtin
can be problematic (relies on compiler-provided libraries). See the
history of __POPCNT__ for details [1].
- A new flag, CK_MD_CC_BUILTIN_DISABLE, can be set to indicate that CK
should not rely on compiler builtins when possible.
- ck_cc_clz has been removed; it was unused.
- ck_internal_bsf has been removed; it was a broken duplicate of ck_cc_ffs
and has been replaced by ck_cc_ffs. Previous consumers were using the bsf
instruction either way (a sketch of the ffs-style semantics follows below).
- ck_{rhs,hs,ht} have been updated to use ck_cc_ffs*.
If FreeBSD requires the builtins for performance reasons, we will lift the
appropriate detection into ck_md (at least the bt*/bs* families of functions
do not have the same problems as popcount on most targets).
1: https://lists.freebsd.org/pipermail/svn-src-head/2015-March/069663.html
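For reference, a minimal sketch of ffs(3)-style semantics (1-based index of
the least significant set bit, 0 for a zero argument) written without
compiler builtins, in the spirit of CK_MD_CC_BUILTIN_DISABLE. The helper
name is hypothetical and this is not the CK implementation:

    /* Hypothetical portable ffs: 1-based index of the lowest set bit. */
    static int
    portable_ffs(unsigned int x)
    {
            int i;

            if (x == 0)
                    return 0;

            for (i = 1; (x & 1) == 0; i++)
                    x >>= 1;

            return i;
    }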
Annotate fall-through cases in switch statements where that behavior is
intentional, to quiet compiler warnings produced by the
-Wimplicit-fallthrough flag. The annotation format used is supported by
both GCC and Clang.
Fixes #108.
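For illustration, a minimal sketch of an annotated fall-through case
(hypothetical function; the exact marker CK uses may differ). GCC and Clang
both recognize the fallthrough attribute under -Wimplicit-fallthrough:

    /* Hypothetical example of quieting -Wimplicit-fallthrough. */
    static int
    classify(int c)
    {
            int weight = 0;

            switch (c) {
            case 0:
                    weight++;
                    /* The fall-through below is intentional. */
                    __attribute__((fallthrough));
            case 1:
                    weight++;
                    break;
            default:
                    break;
            }

            return weight;
    }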
Memoize the map into ck_hs_iterator_t to make iteration safer in the face of growth or shrinkage of the map. Tests for same.
Work from Riley Berton.
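For context, a minimal sketch of iteration over a ck_hs set (assumes hs has
been initialized and populated elsewhere); with the map memoized in the
iterator, the walk is safer if the set grows or shrinks concurrently:

    #include <ck_hs.h>
    #include <stdio.h>

    /* Walk every entry in the set; hs is set up elsewhere. */
    static void
    dump_entries(struct ck_hs *hs)
    {
            ck_hs_iterator_t it = CK_HS_ITERATOR_INITIALIZER;
            void *entry;

            while (ck_hs_next(hs, &it, &entry))
                    printf("%p\n", entry);
    }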
This is in preparation for upcoming work to allow record sharing.
The write-side operations rely only on global state. As future work, we can
play tricks by caching the latest call epoch while still building on the
core EBR concept.
An idle grace period requires all threads to be idle. This optimization
introduced a regression with idle detection if a subset of threads is
both active and idle. Unfortunately, none of our test machines detected
the problem.
This issue was reported by Julie Zhao <julie.zhao@sparkpos....>
- ck_epoch_begin: Disallow early load of epoch as it leads to measurable
performance degradation in some benchmarks.
- ck_epoch_synchronize: Enforce barrier semantics (a usage sketch follows
below).
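To put these entry points in context, a minimal sketch of the read-side and
write-side pattern (record registration and the node helpers are assumed to
happen elsewhere; signatures follow the current record/section API and may
differ slightly across versions):

    #include <ck_epoch.h>
    #include <stdlib.h>

    struct node;
    void node_consume(struct node *);       /* hypothetical helpers */
    void node_unlink(struct node *);

    /* Read side: protect dereferences of shared nodes. */
    static void
    reader(ck_epoch_record_t *record, struct node *n)
    {
            ck_epoch_section_t section;

            ck_epoch_begin(record, &section);
            node_consume(n);
            ck_epoch_end(record, &section);
    }

    /* Write side: wait for a grace period before reclaiming memory. */
    static void
    writer(ck_epoch_record_t *record, struct node *n)
    {
            node_unlink(n);
            ck_epoch_synchronize(record);   /* prior readers have exited */
            free(n);
    }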
The default value is still 50, but that may be revisited later.
Also, pre-calculate the maximum number of entries before growing, to avoid
having to do it at each insert.
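A hypothetical illustration of that pre-calculation (field and macro names
are made up; the real ck_rhs bookkeeping differs): the grow threshold is
computed once when the map is rebuilt instead of on every insert.

    /* Hypothetical map bookkeeping for a 50% load factor. */
    #define LOAD_FACTOR_PCT 50

    struct map {
            unsigned long capacity;
            unsigned long n_entries;
            unsigned long max_entries;      /* memoized grow threshold */
    };

    static void
    map_rebuild(struct map *m, unsigned long capacity)
    {
            m->capacity = capacity;
            m->max_entries = capacity * LOAD_FACTOR_PCT / 100;
    }

    static int
    map_needs_grow(const struct map *m)
    {
            /* No per-insert multiplication or division. */
            return m->n_entries >= m->max_entries;
    }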
We use some macro trickery to enforce that ck_pr_store_* is actually
storing the correct type into the target variable, without any actual
side effects: by making the assignment into an rvalue inside a comma
expression, the compiler can type-check it and then optimize it away.
On the load side, we simply cast the result to the type of the target
variable for pointer loads.
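A minimal, self-contained sketch of the trick with hypothetical names (the
real ck_pr macros are generated per type and per architecture): the
assignment inside sizeof() is never evaluated, so it has no side effects
and only forces the compiler to check that the value is assignable to what
the target points at; the comma expression then discards that result and
passes the target through unchanged.

    #include <stdio.h>

    /* Hypothetical backing store standing in for the per-arch routine. */
    static void
    store_int_impl(int *target, int value)
    {
            *target = value;
    }

    /* Type-checked store: the sizeof() operand is never evaluated. */
    #define STORE_INT_CHECKED(target, value) \
            store_int_impl(((void)sizeof(*(target) = (value)), (target)), (value))

    int
    main(void)
    {
            int x = 0;

            STORE_INT_CHECKED(&x, 42);      /* type-checks, then stores */
            /* STORE_INT_CHECKED(&x, "oops") would draw a diagnostic. */
            printf("%d\n", x);
            return 0;
    }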
There is an unsafe version of the store_ptr macro called
ck_pr_store_ptr_unsafe for those times when you are _really_ sure that
you know what you're doing.
This commit also updates some of the source files (ck_ht, ck_hs,
ck_rhs): ck_ht now uses the unsafe macro, as its conversion between
uintptr_t and void * is invalid under the new macros. ck_hs and ck_rhs
have had some casts added to preserve validity.
This has been on the TODO for a while and helps reduce
read-side retries. It also has the advantage of providing
true wait-freedom on insertion (including termination safety).
The tombstone and version counter update invariant was not respected
in all necessary places.
- If a concurrent load operation is preempted after observing
the version counter and key field of a slot, and the slot is then moved
and re-used by another key-value pair, the load operation would
observe an inconsistent pair without the relevant version counter
update.
- On RMO architectures, a store fence was missing on the delete path
(tombstone placement must always be followed by the version counter
update; see the sketch below).
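A minimal sketch of the required ordering on the delete path, with a
hypothetical slot layout and tombstone marker (the real data structure
differs); the point is that the tombstone store is fenced before the
version counter update, so a reader that observes the new version also
observes the tombstone:

    #include <ck_pr.h>
    #include <stdint.h>

    struct slot {
            void *entry;
            unsigned int version;
    };

    #define TOMBSTONE ((void *)~(uintptr_t)0)       /* hypothetical marker */

    static void
    slot_delete(struct slot *s)
    {
            ck_pr_store_ptr(&s->entry, TOMBSTONE);

            /* On RMO: tombstone must be visible before the version bump. */
            ck_pr_fence_store();

            ck_pr_store_uint(&s->version, ck_pr_load_uint(&s->version) + 1);
    }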