More specifically, note that in memory models where atomic
operations do not have serializing effects, atomic
read-modify-write operations are modeled as store operations.
These operations serialize atomic-RMW operations with respect
to each other as well as to loads and stores. In addition to
this, the load_depends implementations have been removed.
ck_pr_fence_{load_load,store_store,load_store,store_load} operations
have been added. In addition to this, it is no longer the
responsibility of architecture ports to determine when to emit a
specific fence. Instead, the underlying port always emits the
instructions necessary to enforce strict ordering, and the
higher-level include/ck_pr implementation determines whether a fence
actually needs to be emitted according to the memory model specified
by ck_md (CK_MD_{TSO,RMO,PSO}). In other words, only
ck_pr_fence_strict_* is implemented by the MD-ck_pr port.
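As an illustration, the classic store-to-load hazard (publish a flag,
then read the peer's flag) now only requires the generic store-load
fence; ck_pr emits a real barrier even on TSO targets, since TSO
permits store-to-load reordering. A minimal sketch using
ck_pr_fence_store_load() and the ck_pr_{store,load}_uint() operations
(the function and variable names are illustrative only):

    #include <ck_pr.h>

    static unsigned int x, y;

    /*
     * Executed by one thread; a second thread runs the mirror image
     * with x and y swapped. Without the store-load fence, both
     * threads may observe 0.
     */
    static unsigned int
    publish_and_check(void)
    {

            ck_pr_store_uint(&x, 1);
            ck_pr_fence_store_load();
            return ck_pr_load_uint(&y);
    }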
This function allows for explicit execution of all
deferred callbacks in an epoch_record. The primary
motivation is currently performance profiling, but
there are other use cases where best-effort
semantics can be applied.
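The entry does not name the function; a minimal sketch, assuming it
refers to ck_epoch_reclaim() (an assumption here) and that the record
has already been registered by the calling thread:

    #include <ck_epoch.h>

    /*
     * Drain all deferred callbacks on our record, e.g. before taking
     * a performance measurement, so reclamation cost is not charged
     * to the measured section. Assumes ck_epoch_reclaim() is the
     * function described above and that record is registered.
     */
    static void
    drain_deferrals(ck_epoch_record_t *record)
    {

            ck_epoch_reclaim(record);
            return;
    }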
Several counter-examples were found which break in the
presence of store-to-load re-ordering. Strict fence
semantics are necessary.
Thanks to Paul McKenney for helpful discussions.
These variants of ck_ring_enqueue_* return a snapshot of the queue
length with respect to the linearization point. This can be used to
extract the ring size without incurring additional cacheline
invalidation overhead from the writer.
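A sketch of single-producer usage of a size-returning variant; this
assumes the ck_ring.h interface in which the buffer is passed
separately and the length snapshot is written through the final
argument (exact prototypes have varied across releases, so treat the
signature below as an assumption):

    #include <stdbool.h>
    #include <ck_ring.h>

    #define RING_SIZE 1024 /* must be a power of two */

    static struct ck_ring ring; /* ck_ring_init(&ring, RING_SIZE) before use */
    static struct ck_ring_buffer ring_buffer[RING_SIZE];

    static bool
    producer_enqueue(void *entry, unsigned int *length)
    {

            /*
             * *length receives the ring occupancy observed at the
             * enqueue operation's linearization point, so ring size
             * can be extracted without additional cacheline
             * invalidation overhead from the writer.
             */
            return ck_ring_enqueue_spsc_size(&ring, ring_buffer,
                entry, length);
    }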
I still need to implement benchmark tests and write documentation.
The reader-writer cohort locks also required adding a method to the
existing ck_cohort framework to determine whether or not a cohort
lock is currently in a locked state.
John Wittrock has contributed a phase-fair reader-writer
lock implementation. These locks provide phase-fairness
guarantees between readers and writers. This work includes
additional changes and clean-up.
Follow-up work is expected.
Thanks to John Wittrock for patches and Professor Gabriel
Parmer (http://www.seas.gwu.edu/~gparmer/) for advising.
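The lock is not named in this entry; assuming it is the ck_pflock
interface, basic usage looks like the sketch below (shared_counter
and the function names are illustrative):

    #include <ck_pflock.h>

    static ck_pflock_t lock = CK_PFLOCK_INITIALIZER;
    static unsigned int shared_counter;

    static void
    writer(void)
    {

            ck_pflock_write_lock(&lock);
            shared_counter++;
            ck_pflock_write_unlock(&lock);
            return;
    }

    static unsigned int
    reader(void)
    {
            unsigned int snapshot;

            ck_pflock_read_lock(&lock);
            snapshot = shared_counter;
            ck_pflock_read_unlock(&lock);
            return snapshot;
    }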
Upon popular request, added a variant of the ticket spinlock
with trylock support. This is pending additional verification
on architectures other than x86*. It is still unclear whether
this implementation will become the default, as it has a
slower fast path.
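A sketch of the intended usage, assuming the variant is exposed as
ck_spinlock_ticket_trylock() behind a CK_F_SPINLOCK_TICKET_TRYLOCK
feature macro (the guard name follows CK's feature-macro convention
and is an assumption here):

    #include <stdbool.h>
    #include <ck_spinlock.h>

    static ck_spinlock_ticket_t lock = CK_SPINLOCK_TICKET_INITIALIZER;

    static bool
    try_critical_section(void)
    {

    #ifdef CK_F_SPINLOCK_TICKET_TRYLOCK
            /* Fail fast instead of spinning for our ticket. */
            if (ck_spinlock_ticket_trylock(&lock) == false)
                    return false;
    #else
            ck_spinlock_ticket_lock(&lock);
    #endif

            /* ... critical section ... */
            ck_spinlock_ticket_unlock(&lock);
            return true;
    }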
Add trylock support to the ck_spinlock validation tests.
The tests currently only exercise ck_spinlock_ticket_t trylock
functionality, where available.
CK_LIST_INSERT_HEAD was incorrectly managing the prev
pointer on insertion into a non-empty list. This bug
would cause erroneous behavior when CK_LIST_REMOVE was
applied to non-head elements. The unit test will be
updated for this regression.
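The pattern that exercised the bug, sketched with a hypothetical
struct node and the ck_queue.h macros: insert in front of an existing
element, then remove the element that is no longer the head.

    #include <ck_queue.h>

    struct node {
            int value;
            CK_LIST_ENTRY(node) list_entry;
    };

    static CK_LIST_HEAD(node_list, node) head;

    static void
    trigger_pattern(struct node *a, struct node *b)
    {

            CK_LIST_INIT(&head);
            CK_LIST_INSERT_HEAD(&head, a, list_entry);
            /*
             * Insertion into a non-empty list: a's prev linkage must
             * be updated here; this is what the fix corrects.
             */
            CK_LIST_INSERT_HEAD(&head, b, list_entry);
            /* Removing the non-head element exposed the bad linkage. */
            CK_LIST_REMOVE(a, list_entry);
            return;
    }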
An off-by-one was introduced in the downgrade path from writer
to reader. This can cause deadlock when a writer downgrades from
a write lock.
Pointed out by Jeffrey Birnbaum <jmb...@...>.
Both LLVM-backed compilers and GCC incorrectly treat
a barrier-sandwiched load as a loop invariant in dequeue_spmc.
Forcing volatile atomic load semantics generates the correct
code.
Thanks to Devon O'Dell and Abel Mathew for help in catching
this issue.
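The same class of problem applies to any spin loop over shared state;
the sketch below shows the pattern with ck_pr_load_uint(), whose
volatile load semantics keep the compiler from hoisting the read out
of the loop (flag and wait_for_flag are illustrative, not the actual
field from dequeue_spmc):

    #include <ck_pr.h>

    static unsigned int flag;

    static void
    wait_for_flag(void)
    {

            /*
             * A plain read of flag could be treated as loop-invariant
             * and hoisted, turning this into an infinite loop. The
             * volatile atomic load forces a fresh read per iteration.
             */
            while (ck_pr_load_uint(&flag) == 0)
                    ck_pr_stall();

            return;
    }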
The distinction between the additive/exponential implementation
and the geometric implementation does little but confuse users.
The terminology used in ck_backoff now reflects the terminology
used in the literature.
ck_backoff_gb has been removed.
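Typical usage of the remaining exponential back-off primitive,
sketched around a generic trylock loop (the spinlock is only a
stand-in for any contended operation):

    #include <stdbool.h>
    #include <ck_backoff.h>
    #include <ck_spinlock.h>

    static ck_spinlock_t lock = CK_SPINLOCK_INITIALIZER;

    static void
    lock_with_backoff(void)
    {
            ck_backoff_t backoff = CK_BACKOFF_INITIALIZER;

            /* Back off exponentially between failed attempts. */
            while (ck_spinlock_trylock(&lock) == false)
                    ck_backoff_eb(&backoff);

            return;
    }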
This operation has the format
CK_S*LIST_MOVE(a, b, linkage) and is equivalent to initializing
a with the contents of b. This is done in a manner that is atomic
with respect to readers. Read-only operations remain valid on
b, but the behavior of write-side operations on b after
a MOVE operation is undefined.
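A sketch using the CK_SLIST flavor and the ck_queue.h conventions
(struct node and transfer are illustrative); after the move, readers
may still traverse b, but all further writes must target a:

    #include <ck_queue.h>

    struct node {
            int value;
            CK_SLIST_ENTRY(node) list_entry;
    };

    static CK_SLIST_HEAD(node_slist, node) a, b;

    static void
    transfer(void)
    {

            /*
             * a takes over b's contents atomically with respect to
             * readers; b remains valid for read-only traversal, but
             * write-side operations on b are undefined afterwards.
             */
            CK_SLIST_MOVE(&a, &b, list_entry);
            return;
    }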