This borrows from a technique described by Purcell and Harris
in the technical report "Non-blocking hashtables with open addressing".
Essentially, every slot will have an associated local probe maximum,
including tombstones. Highly aggressive workloads may still require
occasional garbage collection.
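A minimal sketch of how a caller might trigger such a collection pass,
assuming the ck_hs_gc(hs, cycles, seed) interface; the threshold and
helper name below are illustrative only:

    #include <ck_hs.h>

    /* Sketch: reclaim tombstones from a delete-heavy hash set once a
     * (hypothetical) deletion threshold is crossed. Assumes ck_hs_gc
     * with (hs, cycles, seed) arguments, where 0 cycles requests a
     * full sweep. */
    static void
    maybe_collect(struct ck_hs *set, unsigned long *deletions)
    {

        if (++(*deletions) < 4096)
            return;

        ck_hs_gc(set, 0, 0);
        *deletions = 0;
    }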
This function allows for faster insertions into tombstone-heavy
probe sequences by short-circuiting on tombstones rather than
continuing to probe. The user must guarantee that the entry
being inserted is unique; inserting a non-unique key with this
operation results in undefined behavior.
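A hedged sketch of how the operation might be used, assuming a
ck_hs_put_unique(hs, hash, key) interface; the hash callback and seed
below are hypothetical:

    #include <stdbool.h>
    #include <ck_hs.h>

    /* Hypothetical hash function and seed; ck_hs hash callbacks take
     * an object pointer and a seed. */
    extern unsigned long hash_key(const void *object, unsigned long seed);
    extern unsigned long hash_seed;

    /* Insert an entry the caller already knows is absent. Because the
     * probe may short-circuit on the first tombstone, this is only
     * safe when uniqueness is guaranteed externally. */
    static bool
    insert_fresh(struct ck_hs *set, const void *entry)
    {

        return ck_hs_put_unique(set, hash_key(entry, hash_seed), entry);
    }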
Could not find a suitable use case, and it generally doesn't
appear interesting to academics in its existing
form. Maybe it will make a comeback in the future with
fewer memory and latency compromises.
This adds support for CAS_64{_VALUE}, CAS_PTR_2{_VALUE},
LOAD_64, STORE_64 and other primitives built on the universal
CAS primitive.
Patch submitted by Olivier Houchard <cognet@FreeBSD>.
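For illustration, a sketch of a lock-free 64-bit increment built on
these primitives; this assumes the ck_pr_cas_64_value form that writes
the observed value back on failure:

    #include <stdint.h>
    #include <ck_pr.h>

    /* Increment a 64-bit counter using the 64-bit CAS primitive, which
     * remains usable where the target lacks native 64-bit atomics. */
    static uint64_t
    counter_increment(uint64_t *counter)
    {
        uint64_t snapshot = ck_pr_load_64(counter);

        /* On failure, snapshot is updated with the observed value, so
         * the loop re-reads only through the CAS. */
        while (ck_pr_cas_64_value(counter, snapshot, snapshot + 1,
            &snapshot) == false)
            ck_pr_stall();

        return snapshot + 1;
    }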
The array is optimized for SPMC and fast iteration (though MPMC
transformation is also possible). This is an extremely simple
implementation with support for atomic in-place modification
through put -> remove elimination.
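A sketch of the intended usage under these assumptions: a single
writer stages put/remove operations and publishes them with a commit,
while readers iterate concurrently. The function and macro names
follow the ck_array interface as I understand it and should be
treated as assumptions:

    #include <ck_array.h>

    /* Single-writer side: stage an insertion, then publish all staged
     * put/remove operations so concurrent readers observe them. */
    static void
    writer_add(ck_array_t *array, void *entry)
    {

        if (ck_array_put(array, entry) == false)
            return;

        ck_array_commit(array);
    }

    /* Reader side: safe to run concurrently with the writer above. */
    static void
    reader_scan(ck_array_t *array)
    {
        ck_array_iterator_t iterator;
        void *entry;

        CK_ARRAY_FOREACH(array, &iterator, &entry) {
            /* ... consume entry ... */
            (void)entry;
        }
    }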
Besides implementing Thumb 2 support, this fixes incorrect usage of
"cmp" where "cmpeq" was meant.
Patch submitted by Olivier Houchard <cognet@freebsd>.
This operation moves ownership from one hash set object
to another and re-assigns callback functions to developer-specified
values. This allows for dynamic configuration of allocation
callbacks and is necessary for use cases involving executable code
which may be unmapped underneath the hash set.
The developer is responsible for enforcing any necessary barriers
and ensuring the visibility of the new hash set.
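A hedged sketch of the intended usage, assuming a
ck_hs_move(destination, source, hash, compare, allocator) interface;
the replacement callbacks here are hypothetical and live outside the
region to be unmapped:

    #include <stdbool.h>
    #include <ck_hs.h>

    /* Hypothetical replacement callbacks and allocator. */
    extern unsigned long new_hash(const void *object, unsigned long seed);
    extern bool new_compare(const void *a, const void *b);
    extern struct ck_malloc new_allocator;

    /* Transfer ownership of `source` to `destination`, rebinding the
     * hash, comparison and allocation callbacks. The caller remains
     * responsible for any barriers needed to publish `destination`
     * before the old callbacks are unmapped. */
    static bool
    rebind_set(struct ck_hs *destination, struct ck_hs *source)
    {

        return ck_hs_move(destination, source, new_hash, new_compare,
            &new_allocator);
    }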
When spinning on global counters, it cannot be assumed that is_locked
functions will enforce atomic-to-load ordering; an explicit fence
is necessary. is_locked will only guarantee load ordering.
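An illustrative sketch of the pattern in question; the names are
hypothetical, and the point is only that an explicit atomic-to-load
fence must sit between the atomic operation and the spin:

    #include <ck_pr.h>
    #include <ck_spinlock.h>

    /* After the lock's atomic operation, order it before the loads of
     * the shared counter; relying on is_locked alone would only order
     * loads against loads. */
    static void
    wait_for_quiescence(ck_spinlock_t *lock, unsigned int *active)
    {

        ck_spinlock_lock(lock);
        ck_pr_fence_atomic_load();

        while (ck_pr_load_uint(active) != 0)
            ck_pr_stall();
    }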
These come in the form of CK_ELIDE_ADAPTIVE_PROTOTYPE,
CK_ELIDE_LOCK_ADAPTIVE, and CK_ELIDE_UNLOCK_ADAPTIVE; a usage sketch
follows after this entry.
Primarily pushing this for the few that are playing with
master.
This is inspired by Andi Kleen's work on adaptive behavior
in the Linux kernel's RTM lock implementation. There are
various differences in the state machine, however. Specifically,
the concepts of a retry and a busy-wait have been unified through
state machine simplification, such that any exhausted busy-wait
cycle reverts to a forfeit (a busy-wait is a specialized retry).
Follow-up work will involve allowing is_locked behavior
to yield what users expect if called from within a transaction
through the wrapper. Be warned that this will come at a performance
penalty.
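A sketch of how the adaptive variants might be instantiated around a
ck_spinlock; the argument lists below mirror the non-adaptive
CK_ELIDE_PROTOTYPE interface and, along with the stat/config types
and initializers, should be read as assumptions:

    #include <ck_elide.h>
    #include <ck_spinlock.h>

    /* Assumed argument order: name, type, lock predicate, lock,
     * unlock predicate, unlock. */
    CK_ELIDE_ADAPTIVE_PROTOTYPE(my_spinlock, ck_spinlock_t,
        ck_spinlock_locked, ck_spinlock_lock,
        ck_spinlock_locked, ck_spinlock_unlock)

    static ck_spinlock_t lock = CK_SPINLOCK_INITIALIZER;
    static struct ck_elide_config config =
        CK_ELIDE_CONFIG_DEFAULT_INITIALIZER;
    static __thread ck_elide_stat_t stat = CK_ELIDE_STAT_INITIALIZER;

    static void
    critical_section(void)
    {

        CK_ELIDE_LOCK_ADAPTIVE(my_spinlock, &stat, &config, &lock);
        /* ... transactional or serialized shared-state updates ... */
        CK_ELIDE_UNLOCK_ADAPTIVE(my_spinlock, &stat, &lock);
    }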
This is an example of a limitation of the fence_X_Y variants. I am
considering extending these to include an acquire extension.
Use a memory fence to force total order in a manner that
will be clearer to other developers who read this code.
This did not manifest as a problem on any target architectures
due to their handling of atomic operations (SPARC models them as
both a load and a store, while on Power, atomic_load ordering was
enforced through a full barrier).
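As a hedged illustration of why a full memory fence is used here: a
Dekker-style store-then-load requires total order, which load-only or
store-only fence variants do not provide:

    #include <stdbool.h>
    #include <ck_pr.h>

    static unsigned int flag[2];

    /* Without a full fence, the store may be ordered after the load
     * and both threads can observe the other's flag as clear. */
    static bool
    try_enter(unsigned int self)
    {

        ck_pr_store_uint(&flag[self], 1);
        ck_pr_fence_memory();
        return ck_pr_load_uint(&flag[1 - self]) == 0;
    }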
It is possible this will be moved to a self-contained file.
For a majority of architectures, RTM is an unnecessary
implementation-specific optimization.