This operation is short-hand notation for rebuilding
a hash table. This rebuild can occur in the presence
of concurrent readers and will require twice the amount
of memory of the existing hash table until completion.
This borrows from a technique described by Purcell and Harris
in "Non-blocking hashtables with open addressing" technical report.
Essentially, every slot will have an associated local probe maxim,
including tombstones. Highly aggressive workloads may still require
occassional garbage collection.
This abstracts away pointer packing tricks to a macro,
fixes comparison of pointers to occur in absence of
embedded values and improves robustness of pointer marshaling
for write-side operations for pointers that may already have
had pointer packing tricks played on them.
This function allows for faster insertions into tombstone-heavy
probe sequences by short-circuiting on tombstones rather than
continuing to probe. The user must already guarantee that the
entry being inserted is unique. If a non-unique key is inserted
with this operation, undefined behavior will result.
This assumes adjacent sector prefetch and appears to have minimal
clustering effect. In addition to this, developers are free to define
their own linear probe length by defining CK_HS_PROBE_L1_SHIFT.
Could not find suitable use-case and generally doesn't
appear interesting to academics in the existing
form. Maybe it will make a come-back in the future with
fewer memory and latency compromises.
The array is optimized for SPMC and fast iteration (though MPMC
transformation is also possible). This is an extremely simple
implementation with support for atomic in-place modification
through put -> remove elimination.
This operation moves ownership from one hash set object
to another and re-assigns callback functions to developer-specified
values. This allows for dynamic configuration of allocation
callbacks and is necessary for use-cases involving executable code
which may be unmapped underneath the hash set.
The developer is responsible for enforcing barriers and enforcing
the visibility of the new hash set.
It is possible for a defragmenting set or swap operation
to set a tombstone. If the probe sequence does not encounter
an empty slot and hits maximum write-side probe limit first
for it to fail to reprobe defragmenting store.
This function allows for explicit execution of all
deferred callbacks in an epoch_record. The primary
motivation is currently for performance profiling
but there are other use-cases where best-effort
semantics could be applied.
This is similar to the set of changes done to ck_ht.
In addition to this, a bug was caught were minimal
probe sequence was based on last rather than first
tombstone observed.
This hashset will now face long-running bursty
write -> delete -> write workloads much more effectively
and includes significant performance improvements for
delete heavy workloads (at least 2x measured).
Previously, a probe to an empty slot was necessary in order
to avoid hash table growth. As long as a tombstone is available,
re-use it. This prevents excessive growth on workloads involving
long bursts of writes (near 0.5 load factor) followed by long
bursts of deletes.
The distinction between additive/exponential implementation
and geometric implementation does little but confuse users.
The terminology used in ck_backoff now reflects terminology
used in literature.
ck_backoff_gb has been removed.
This allows us to only wait two rather than three epoch-protected
sections, which is a considerable performance improvement. Thanks
much to Richard Bornat for clarifications and feedback.
I had the pleasure of spending a significant amount of time at the most
recent LPC with Mathieu Desnoyers and Paul McKenney. In discussing
RCU semantics in relation to epoch reclamation, it was argued that
epoch reclamation is a specialisation of RCU (rather than a generalization).
In light of this discussion, I thought it would make more sense to not expose
write-side synchronization semantics aside from ck_epoch_call (similar to
RCU call), ck_epoch_poll (identical to tick), ck_epoch_barrier and
ck_epoch_synchronization (similar to ck_epoch_synchronization). Writers will
now longer have to use write-side epoch sections but can instead rely on
epoch_barrier/synchronization for blocking semantics and ck_epoch_poll
for old tick semantics.
One advantage of this is we can avoid write-side recursion for certain workloads.
Additionally, for infrequent writes, epoch_barrier and epoch_synchronization both
allow for blocking semantics to be used so you don't have to pay the cost of
epoch_entry for non-blocking dispatch.
Example usage:
e = stack_pop(mystack);
ck_epoch_synchronize(...);
free(e);
read_begin and read_end has been replaced with ck_epoch_begin and ck_epoch_end.
If multiple writers need SMR guarantees, then they can also use ck_epoch_begin
and ck_epoch_end. Any dispatch in presence of multiple writers should be done
with-in an epoch section (for now).
There are some follow-up commits to come.
These changes primarily affect platforms with RMO semantics.
- Write out to local epoch requires atomic semantics,
adopt atomic store to epoch in ck_epoch_reclaim.
- Correct an issue where we would publish node availability
before record re-initialization.
- Execute full barrier in write_begin to serialize active flag
IFF we are not recursing into an epoch block. Serialize active
publication with respect to epoch walk.
- Use CAS-based tick operation for epoch counter. Despite higher
constant factor, we are much less likely to starve writers on
bursty workloads.
Though a new implementation is in the works, roll in
some performance improvements in the mean time.
The probe routines have been broken out into separate
reader/writer variants. These variants are much less
branch-intensive (and don't involve predict to stall
in many cases).
New implementation is attempting to deal with
interface-induced overheads.
Documentation and regressions tests have been updated to reflect this.
This functionality allows for individual hash tables use to different
allocation functions. Thanks to Wez Furlong for pointing out the necessary
documentation update for ck_ht.
Migrate available block list to CK_LIST.
New blocks are only allocated when the available list is exhausted.
Remove bag->avail_tail.
Print out number of writer iterations for unit test.
Lengthen duration of unit test.
Add necessary load fence to iterator.
Initialize iterator appropriately for empty bags.
Improve unit test.
Fix bag linkage bug for non x86_64 targets.
Fix block accounting on removal.
Specifically, any platform that has CK support for 64-bit
load/store operations.
Additional improvements have been made to the unit tests
to disambiguate put/get failures.
This is a hash table that is optimized for architectures that
implement total store ordering and workloads that are read-heavy
involving a single writer and multiple readers. Unlike traditional
non-blocking multi-producer/multi-consumer hash table
implementations this version allows for immediate re-use of deleted
buckets (no need for explicit reclamation cycles) and is more
conducive to traditional safe memory reclamation schemes used in
unmanaged languages (otherwise, we would require key duplication).
It is relatively heavy-weight for MPMC workloads on architectures
which do not implement TSO in comparison to Click's MPMC hash
table. However, it still has better performance characteristics
than a blocking hash table.
The committed version currently only provides x86_64 support. This is
being committed for review by peers and for a silent release that will
allow us to test ck_ht_spmc under high production workloads.
Next public release will include additional documentation as well as
support for other architectures.
In the mean time, please see the unit tests for example usage. Included in
this commit: Dropped -Wbad-function-cast from GCC port.
build:
- configure step will generate relevant CFLAGS.
- build profiles are for convenience (developers can use themu
for cross-compilation).
regressions:
- Renamed ck_barrier unit tests to work-around behavior
of Solaris linker.
- Adopted use of a PTHREAD_CFLAGS variable.
ck_cc:
- Added internal CK_CC_IMM macro for compilers that are
verbose against impossible inline constraints (or limited
optimizers).
ck_pr/x86*:
- Adopted CK_CC_IMM macro.
- Dropped redundant constraints.
This work was mostly completed by Theo Schlossnagle
<jesus@omniti.com>, much thanks to him. He has
also provided access to a machine with Sun Studio 12.