ck_swlock_read_latchlocks() observes a writer if the unlatch
operation sets n_readers to 0.
The unlatch operation now just unsets the latch bit, so we can safely
decrement n_readers in ck_swlock_read_latchlocks().
+ Fixes to validation tests & ELIDE coverage.
* new bulk rmw operations: intersection, intersection_negate (with
  complement); a usage sketch of the new interfaces follows this list.
* new bulk reads: empty, full, count, count_intersect
* ck_bitmap_iterator fixes: avoid an out-of-bounds read on empty bitmaps,
  and ck_bitmap_next is marginally faster on sparse bitmaps.
* less space: the bitmap itself is an array of unsigned int, which
eliminates alignment padding between the n_bit field and the bitmap.
* more unit tests.
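
A minimal usage sketch of the new interfaces; the exact prototypes are
assumptions and should be checked against ck_bitmap.h.

#include <ck_bitmap.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	unsigned int bit;
	ck_bitmap_iterator_t it;
	ck_bitmap_t *a = malloc(ck_bitmap_size(128));
	ck_bitmap_t *b = malloc(ck_bitmap_size(128));

	ck_bitmap_init(a, 128, false);
	ck_bitmap_init(b, 128, false);
	ck_bitmap_set(a, 3);
	ck_bitmap_set(a, 64);
	ck_bitmap_set(b, 64);

	/* Bulk read-modify-write: a &= b. */
	ck_bitmap_intersection(a, b);

	/* New bulk reads. */
	printf("empty=%d full=%d count=%u\n",
	    ck_bitmap_empty(a, 128), ck_bitmap_full(a, 128),
	    ck_bitmap_count(a, 128));

	/* Iteration no longer reads out of bounds on empty bitmaps. */
	ck_bitmap_iterator_init(&it, a);
	while (ck_bitmap_next(a, &it, &bit) == true)
		printf("bit %u is set\n", bit);

	free(a);
	free(b);
	return 0;
}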
This lock is copy-safe when the latch operations are used.
Simplified write-side operations lead to lower latencies than ck_rwlock
for single-writer workloads.
write_latch allows writers to prevent incoming read acquisitions from
succeeding while permitting write_lock operations to complete. This is
useful if multiple writers are cooperating in linearizing a non-blocking
data structure and allows for lock state to be copied by the write lock
holder safely.
In collaboration with Paul Khuong and Jaidev Sridhar.
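
A sketch of the copy path this enables; the function names are assumed to
be ck_swlock_write_latch()/ck_swlock_write_unlatch() per ck_swlock.h.

#include <ck_swlock.h>
#include <string.h>

struct object {
	ck_swlock_t lock;	/* initialized elsewhere with ck_swlock_init() */
	int payload;
};

static struct object snapshot;

void
copy_object(struct object *o)
{

	/*
	 * The latch blocks new read acquisitions while the single writer
	 * proceeds, so the object (lock state included) may be copied.
	 */
	ck_swlock_write_latch(&o->lock);
	memcpy(&snapshot, o, sizeof *o);
	ck_swlock_write_unlatch(&o->lock);
	return;
}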
There is an off-by-one in the handling of slot ID
sizeof(bytelock->readers) + 1; this patch fixes it. Based on a patch
submitted by Albi Kavo <albi.kavo@gma....>.
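
A sketch of slot usage around the fixed boundary; the initializer and
prototypes are assumptions based on ck_bytelock.h.

#include <ck_bytelock.h>

static ck_bytelock_t lock = CK_BYTELOCK_INITIALIZER;

void
reader(unsigned int slot)
{

	/*
	 * Slots 1..sizeof(lock.readers) get a dedicated reader byte; any
	 * larger slot ID (including sizeof(lock.readers) + 1, the case
	 * fixed here) falls back to the shared reader counter.
	 */
	ck_bytelock_read_lock(&lock, slot);
	/* ...critical section... */
	ck_bytelock_read_unlock(&lock, slot);
	return;
}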
Add a read-mostly mode, in which entries won't share cache lines with
associated data (probes, probe_bound, etc.).
This makes write operations slower, but makes get operations faster.
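
A hypothetical initialization sketch, assuming this refers to ck_rhs and a
CK_RHS_MODE_READ_MOSTLY flag, with a prototype mirroring ck_rhs_init(); the
callbacks and allocator below are placeholders.

#include <ck_rhs.h>
#include <stdbool.h>

extern unsigned long my_hash(const void *, unsigned long);	/* placeholder callbacks */
extern bool my_compare(const void *, const void *);
extern struct ck_malloc my_allocator;

static ck_rhs_t set;

bool
init_read_mostly_set(void)
{

	/* Trade slower writes for faster gets on read-dominated workloads. */
	return ck_rhs_init(&set, CK_RHS_MODE_OBJECT | CK_RHS_MODE_READ_MOSTLY,
	    my_hash, my_compare, &my_allocator, 64, 6602834 /* arbitrary seed */);
}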
This will serve to drive both pointer-based and type-specialized ring
implementations. Some full fences have also been removed as they are
unnecessary with the introduction of the X_Y fence interface.
Remove the ring buffer from struct ck_ring; it is now required to
explicitly pass the ring buffer to enqueue/dequeue operations. This can
be useful for systems with multiple address spaces.
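
A sketch of the split ring/buffer interface; prototypes are assumed from
ck_ring.h and the buffer size must be a power of two.

#include <ck_ring.h>
#include <stdbool.h>

#define SLOTS 1024	/* power of two */

static ck_ring_t ring;
static ck_ring_buffer_t buffer[SLOTS];	/* may live in a separate mapping */

void
setup(void)
{

	ck_ring_init(&ring, SLOTS);
	return;
}

bool
produce(void *entry)
{

	/* The buffer is now passed explicitly on every operation. */
	return ck_ring_enqueue_spsc(&ring, buffer, entry);
}

bool
consume(void **entry)
{

	return ck_ring_dequeue_spsc(&ring, buffer, entry);
}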
This operation is short-hand notation for rebuilding
a hash table. This rebuild can occur in the presence
of concurrent readers and will require twice the amount
of memory of the existing hash table until completion.
This borrows from a technique described by Purcell and Harris in the
technical report "Non-blocking hashtables with open addressing".
Essentially, every slot will have an associated local probe maximum,
including tombstones. Highly aggressive workloads may still require
occasional garbage collection.
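
A minimal sketch, assuming the operation is exposed as ck_hs_rebuild():

#include <ck_hs.h>
#include <stdbool.h>

bool
gc_hash_set(ck_hs_t *hs)
{

	/*
	 * Rebuilds the table while readers remain active, discarding
	 * accumulated tombstones; roughly twice the memory of the
	 * existing table is needed until the rebuild completes.
	 */
	return ck_hs_rebuild(hs);
}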
This function allows for faster insertions into tombstone-heavy
probe sequences by short-circuiting on tombstones rather than
continuing to probe. The user must already guarantee that the
entry being inserted is unique. If a non-unique key is inserted
with this operation, undefined behavior will result.
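
A minimal sketch, assuming the function described is ck_hs_put_unique()
with the usual (set, hash, key) argument order:

#include <ck_hs.h>
#include <stdbool.h>

bool
insert_known_unique(ck_hs_t *hs, unsigned long hash, const void *key)
{

	/*
	 * Only valid when the caller guarantees the key is not already
	 * present; inserting a duplicate is undefined behavior.
	 */
	return ck_hs_put_unique(hs, hash, key);
}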
Could not find a suitable use case, and it generally doesn't appear
interesting to academics in its existing form. Maybe it will make a
comeback in the future with fewer memory and latency compromises.
This adds support for CAS_64{_VALUE}, CAS_PTR_2{_VALUE},
LOAD_64, STORE_64 and other primitives built on the universal
CAS primitive.
Patch submitted by Olivier Houchard <cognet@FreeBSD>.
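
A sketch of how the new 64-bit primitives compose; the ck_pr names follow
the usual convention but should be verified against ck_pr.h.

#include <ck_pr.h>
#include <stdint.h>

static uint64_t counter;

uint64_t
increment_64(void)
{
	uint64_t snapshot = ck_pr_load_64(&counter);

	/* CAS_64_VALUE updates snapshot with the witnessed value on failure. */
	while (ck_pr_cas_64_value(&counter, snapshot, snapshot + 1,
	    &snapshot) == false)
		ck_pr_stall();

	return snapshot + 1;
}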
The array is optimized for SPMC and fast iteration (though MPMC
transformation is also possible). This is an extremely simple
implementation with support for atomic in-place modification
through put -> remove elimination.
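
A usage sketch; the mode flag, prototypes and iteration macro are
assumptions based on ck_array.h, and my_allocator is a placeholder.

#include <ck_array.h>
#include <stdbool.h>

extern struct ck_malloc my_allocator;	/* placeholder user allocator */

static ck_array_t array;

bool
setup(void)
{

	return ck_array_init(&array, CK_ARRAY_MODE_SPMC, &my_allocator, 8);
}

bool
add_entry(void *entry)
{

	/* put -> commit: mutations become visible to readers on commit. */
	if (ck_array_put(&array, entry) == false)
		return false;

	return ck_array_commit(&array);
}

void
visit_all(void (*cb)(void *))
{
	ck_array_iterator_t iterator;
	void *entry;

	/* Iteration is safe against concurrent commits in SPMC mode. */
	CK_ARRAY_FOREACH(&array, &iterator, &entry)
		cb(entry);

	return;
}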
Besides implementing Thumb 2 support, this fixes incorrect usage of
"cmp" where "cmpeq" was meant.
Patch submitted by Olivier Houchard <cognet@freebsd>.
This operation moves ownership from one hash set object
to another and re-assigns callback functions to developer-specified
values. This allows for dynamic configuration of allocation
callbacks and is necessary for use-cases involving executable code
which may be unmapped underneath the hash set.
The developer is responsible for enforcing barriers and ensuring
the visibility of the new hash set.
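
A sketch, assuming the operation is ck_hs_move() with (destination, source,
hash, compare, allocator) ordering; the callbacks below are placeholders.

#include <ck_hs.h>
#include <stdbool.h>

extern unsigned long new_hash(const void *, unsigned long);	/* placeholder callbacks */
extern bool new_compare(const void *, const void *);
extern struct ck_malloc new_allocator;

bool
retarget(ck_hs_t *destination, ck_hs_t *source)
{

	/*
	 * Ownership of source's storage moves to destination with the
	 * replacement callbacks; publishing destination to readers (and
	 * any barriers that requires) remains the caller's responsibility.
	 */
	return ck_hs_move(destination, source, new_hash, new_compare,
	    &new_allocator);
}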
When spinning on global counters, it cannot be assumed that is_locked
functions will guarantee atomic-to-load ordering; an explicit fence
is necessary. is_locked will only guarantee load ordering.
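
A sketch of the required ordering, using a FAS spinlock's is_locked
predicate as an example; the fence and predicate names should be checked
against ck_pr.h and ck_spinlock.h.

#include <ck_pr.h>
#include <ck_spinlock.h>

static unsigned int waiters;
static ck_spinlock_fas_t lock = CK_SPINLOCK_FAS_INITIALIZER;

void
announce_and_wait(void)
{

	/* Atomic update to a global counter... */
	ck_pr_inc_uint(&waiters);

	/*
	 * ...must be ordered before the is_locked load with an explicit
	 * fence; the predicate itself only guarantees load ordering.
	 */
	ck_pr_fence_atomic_load();

	while (ck_spinlock_fas_locked(&lock) == true)
		ck_pr_stall();

	return;
}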
These come in the form of CK_ELIDE_ADAPTIVE_PROTOTYPE,
CK_ELIDE_LOCK_ADAPTIVE and CK_ELIDE_UNLOCK_ADAPTIVE.
Primarily pushing this for the few who are playing with
master.
This is inspired by Andi Kleen's work on adaptive behavior
in the Linux kernel's RTM lock implementation. There are
various differences in the state machine, however. Specifically,
the concept of a retry and a busy-wait has been unified due
to state machine simplification such that any exhausted busy-wait
cycle reverts to a forfeit (a busy-wait is a specialized retry).
Follow-up work will involve allowing is_locked behavior
to yield what users expect if called from within a transaction
through the wrapper. Be warned that this will come at a performance
penalty.
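
A usage sketch; the adaptive prototype macro is assumed to take the same
argument list as CK_ELIDE_PROTOTYPE, and the stat/config types are assumed
from ck_elide.h.

#include <ck_elide.h>
#include <ck_rwlock.h>

static ck_rwlock_t lock = CK_RWLOCK_INITIALIZER;
static struct ck_elide_config config = CK_ELIDE_CONFIG_DEFAULT_INITIALIZER;
static __thread ck_elide_stat_t stat;	/* per-thread adaptive state */

/* Generate adaptive elision wrappers for ck_rwlock (argument list assumed). */
CK_ELIDE_ADAPTIVE_PROTOTYPE(my_rwlock, ck_rwlock_t,
    ck_rwlock_locked_writer, ck_rwlock_write_lock,
    ck_rwlock_locked_writer, ck_rwlock_write_unlock)

void
critical_section(void)
{

	/* Per-thread statistics drive the retry/busy-wait/forfeit decision. */
	CK_ELIDE_LOCK_ADAPTIVE(my_rwlock, &stat, &config, &lock);
	/* ...protected work... */
	CK_ELIDE_UNLOCK_ADAPTIVE(my_rwlock, &stat, &lock);
	return;
}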
This is an example limitation of the fence_X_Y variants. I am
considering extending this to include an acquire extension.
Use a memory fence to force total order in a manner that
will be clearer to other developers who read this.
This did not manifest as a problem on any of the target architectures
due to their handling of atomic operations (SPARC models an atomic as
both a load and a store, while on Power atomic_load ordering was
enforced through a full barrier).