These changes primarily affect platforms with RMO semantics.
- Write out to local epoch requires atomic semantics,
adopt atomic store to epoch in ck_epoch_reclaim.
- Correct an issue where we would publish node availability
before record re-initialization.
- Execute full barrier in write_begin to serialize active flag
IFF we are not recursing into an epoch block. Serialize active
publication with respect to epoch walk.
- Use CAS-based tick operation for epoch counter. Despite higher
constant factor, we are much less likely to starve writers on
bursty workloads.
As ck_pr semantics were still not molded, I was designing
under the assumption I would potentially go towards
acq/req interface. Since RMO will be the semantic norm for
the ck_pr model from now on, enforce stricter ordering
requirements on rwlock.
ck_rwlock_write_unlock function will now also serialize both
loads and stores.
I was actually unsure of the exact memory model
I wanted for atomic RMW operations. It was
made apparent with time that I had to adopt RMO
if I didn't want to sacrifice performance. Make
sure we can assume RMO for the stack.
Though a new implementation is in the works, roll in
some performance improvements in the mean time.
The probe routines have been broken out into separate
reader/writer variants. These variants are much less
branch-intensive (and don't involve predict to stall
in many cases).
New implementation is attempting to deal with
interface-induced overheads.
I accidentally swapped head/tail load in ck_hp_fifo (not in
ck_fifo, however). We must acquire head snapshot before tail snapshot.
An example execution history which could cause an incorrect update to occur
is below.
- tail <- fifo.tail / fifo.head != fifo.tail
- dequeue to empty (until final CAS which renders fifo.head = fifo.tail)
- head <- fifo.head / (head != tail)
- next <- fifo.head->next / next = NULL
- As head != tail, update to next pointer (where next is NULL).
However, if
- head <- fifo.head / (fifo.head != fifo.tail)
- dequeue to empty (until final CAS which renders fifo.head = fifo.tail)
- tail <- fifo.tail / fifo.head != fifo.tail
- next <- fifo.head->next / next = NULL
If we caught tail in final transition, the by the time we read next pointer,
head would have also changed forcing us to re-read. Thanks to Hendrik Donner
for reporting this.