ck_ec implements 32- and (on 64 bit platforms) 64- bit event
counts. Event counts let us easily integrate OS-level blocking (e.g.,
futexes) in lock-free protocols. Waking up waiters only locks in the
OS kernel, and does not happen at all when no waiter is blocked.
Waiters only block conditionally, if the event count's value is
still equal to some prior value.
ck_ec supports multiple producers (wakers) and consumers (waiters),
and, on x86-TSO, has a more efficient specialisation for single
producer mode. In the latter mode, the overhead compared to a version
counter is on the order of 2-3 cycles and 1-2 instructions, in the
fast path. The slow path, when there are threads blocked on the event
count, consists of one additional atomic instruction and a futex
syscall.
Similarly, the fast path for consumers, when an update comes quickly,
has no overhead compared to spinning on a read-only counter. After
a few thousand cycles, consumers (waiters) enter the slow path with
one atomic instruction and a few blocking syscalls.
The single-producer specialisation requires the x86-TSO memory model,
x86's non-atomic read-modify-write instructions, and, ideally a
futex-like OS abstraction. On !x86/x86_64 platforms, single producer
increments fall back to the multiple producer code path.
Fixes https://github.com/concurrencykit/ck/issues/79
This lock is copy-safe when the latch operations are used.
Simplified write side operations lead to lower latencies than ck_rwlock
for single writer workloads.
Could not find suitable use-case and generally doesn't
appear interesting to academics in the existing
form. Maybe it will make a come-back in the future with
fewer memory and latency compromises.
The array is optimized for SPMC and fast iteration (though MPMC
transformation is also possible). This is an extremely simple
implementation with support for atomic in-place modification
through put -> remove elimination.
John Wittrock has contributed a phase-fair reader-writer
lock implementation. These locks allow phase fairness
guarantees between readers and writers. This work includes
additional changes and clean-up.
Follow-up work is expected.
Thanks to John Wittrock for patches and Professor Gabriel
Parmer (http://www.seas.gwu.edu/~gparmer/) for advising.
This is a hash table that is optimized for architectures that
implement total store ordering and workloads that are read-heavy
involving a single writer and multiple readers. Unlike traditional
non-blocking multi-producer/multi-consumer hash table
implementations this version allows for immediate re-use of deleted
buckets (no need for explicit reclamation cycles) and is more
conducive to traditional safe memory reclamation schemes used in
unmanaged languages (otherwise, we would require key duplication).
It is relatively heavy-weight for MPMC workloads on architectures
which do not implement TSO in comparison to Click's MPMC hash
table. However, it still has better performance characteristics
than a blocking hash table.
The committed version currently only provides x86_64 support. This is
being committed for review by peers and for a silent release that will
allow us to test ck_ht_spmc under high production workloads.
Next public release will include additional documentation as well as
support for other architectures.
In the mean time, please see the unit tests for example usage. Included in
this commit: Dropped -Wbad-function-cast from GCC port.
Writer-side synchronization is still necessary. My current use-cases call for
SLIST and LIST implementations, and as such, I've only implemented support
for these. TAILQ facilities will be developed when the time comes that I require
them or if there is sufficient user-demand.
build:
- configure step will generate relevant CFLAGS.
- build profiles are for convenience (developers can use themu
for cross-compilation).
regressions:
- Renamed ck_barrier unit tests to work-around behavior
of Solaris linker.
- Adopted use of a PTHREAD_CFLAGS variable.
ck_cc:
- Added internal CK_CC_IMM macro for compilers that are
verbose against impossible inline constraints (or limited
optimizers).
ck_pr/x86*:
- Adopted CK_CC_IMM macro.
- Dropped redundant constraints.
This work was mostly completed by Theo Schlossnagle
<jesus@omniti.com>, much thanks to him. He has
also provided access to a machine with Sun Studio 12.
Moved rdtsc and affinity logic to a single file which other
regression tests use. Single point of reference will ease
porting these to future architectures and platforms. Removed
invalid Copyright statement.
Added CK_CC_USED to force some code generation that I found
useful for debugging.
Added ck_stack latency tests and a modified version of djoseph's
modifications to benchmark.h for spinlock latency tests.