This allows us to only wait two rather than three epoch-protected
sections, which is a considerable performance improvement. Thanks
much to Richard Bornat for clarifications and feedback.
I had the pleasure of spending a significant amount of time at the most
recent LPC with Mathieu Desnoyers and Paul McKenney. In discussing
RCU semantics in relation to epoch reclamation, it was argued that
epoch reclamation is a specialisation of RCU (rather than a generalization).
In light of this discussion, I thought it would make more sense to not expose
write-side synchronization semantics aside from ck_epoch_call (similar to
RCU call), ck_epoch_poll (identical to tick), ck_epoch_barrier and
ck_epoch_synchronization (similar to ck_epoch_synchronization). Writers will
now longer have to use write-side epoch sections but can instead rely on
epoch_barrier/synchronization for blocking semantics and ck_epoch_poll
for old tick semantics.
One advantage of this is we can avoid write-side recursion for certain workloads.
Additionally, for infrequent writes, epoch_barrier and epoch_synchronization both
allow for blocking semantics to be used so you don't have to pay the cost of
epoch_entry for non-blocking dispatch.
Example usage:
e = stack_pop(mystack);
ck_epoch_synchronize(...);
free(e);
read_begin and read_end has been replaced with ck_epoch_begin and ck_epoch_end.
If multiple writers need SMR guarantees, then they can also use ck_epoch_begin
and ck_epoch_end. Any dispatch in presence of multiple writers should be done
with-in an epoch section (for now).
There are some follow-up commits to come.
Some people might be confused as far as lack of
fencing in the lock. Add a comment to clarify that
old values should not be equal to new values
of current position (where acquiring the current position
already has a global ordering).
These changes primarily affect platforms with RMO semantics.
- Write out to local epoch requires atomic semantics,
adopt atomic store to epoch in ck_epoch_reclaim.
- Correct an issue where we would publish node availability
before record re-initialization.
- Execute full barrier in write_begin to serialize active flag
IFF we are not recursing into an epoch block. Serialize active
publication with respect to epoch walk.
- Use CAS-based tick operation for epoch counter. Despite higher
constant factor, we are much less likely to starve writers on
bursty workloads.
As ck_pr semantics were still not molded, I was designing
under the assumption I would potentially go towards
acq/req interface. Since RMO will be the semantic norm for
the ck_pr model from now on, enforce stricter ordering
requirements on rwlock.
ck_rwlock_write_unlock function will now also serialize both
loads and stores.