As ck_pr semantics were still not molded, I was designing
under the assumption I would potentially go towards
acq/req interface. Since RMO will be the semantic norm for
the ck_pr model from now on, enforce stricter ordering
requirements on rwlock.
ck_rwlock_write_unlock function will now also serialize both
loads and stores.
I was actually unsure of the exact memory model
I wanted for atomic RMW operations. It was
made apparent with time that I had to adopt RMO
if I didn't want to sacrifice performance. Make
sure we can assume RMO for the stack.
Though a new implementation is in the works, roll in
some performance improvements in the mean time.
The probe routines have been broken out into separate
reader/writer variants. These variants are much less
branch-intensive (and don't involve predict to stall
in many cases).
New implementation is attempting to deal with
interface-induced overheads.
I accidentally swapped head/tail load in ck_hp_fifo (not in
ck_fifo, however). We must acquire head snapshot before tail snapshot.
An example execution history which could cause an incorrect update to occur
is below.
- tail <- fifo.tail / fifo.head != fifo.tail
- dequeue to empty (until final CAS which renders fifo.head = fifo.tail)
- head <- fifo.head / (head != tail)
- next <- fifo.head->next / next = NULL
- As head != tail, update to next pointer (where next is NULL).
However, if
- head <- fifo.head / (fifo.head != fifo.tail)
- dequeue to empty (until final CAS which renders fifo.head = fifo.tail)
- tail <- fifo.tail / fifo.head != fifo.tail
- next <- fifo.head->next / next = NULL
If we caught tail in final transition, the by the time we read next pointer,
head would have also changed forcing us to re-read. Thanks to Hendrik Donner
for reporting this.
Documentation and regressions tests have been updated to reflect this.
This functionality allows for individual hash tables use to different
allocation functions. Thanks to Wez Furlong for pointing out the necessary
documentation update for ck_ht.