The atomicity of the sequence number's increment is unnecessary, since
there should be only one writer at any given time. Fix it by replacing
the atomic increment with a regular increment followed by a store.
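As a minimal sketch of the resulting write-side bump (hypothetical
function name, not necessarily ck_sequence's exact code):

    #include <ck_pr.h>

    /*
     * Sketch only: with a single writer, a plain load, increment and
     * atomic store replace the atomic increment; the store fence keeps
     * the version bump ordered before the protected writes.
     */
    static void
    write_begin(unsigned int *version)
    {
            unsigned int v = ck_pr_load_uint(version);

            ck_pr_store_uint(version, v + 1);
            ck_pr_fence_store();
    }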
Signed-off-by: Emilio G. Cota <cota@braap.org>
This only affects RMO. It adds stricter semantics for critical-section
serialization. In addition, asymmetric synchronization primitives will
now provide load ordering with respect to readers.
This also modifies locked operations to have acquire semantics (they
exist for elision predicates, and this change does not impact them in
any way). Several performance improvements are included as well, such
as the removal of a redundant fence left over from the days of wanting
to support Alpha.
These primitives are meant to be used in lock implementations where
control-dependency ordering is sufficient to enforce ordering of the
critical section. At the moment, this only affects PPC. Currently, we
rely on lwsync for entry into critical sections, which is insufficient.
sync is rather heavy-weight, and assuming we aren't falling victim to
compiler re-ordering, isync should be sufficient.
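As a hedged sketch of the intended usage (the fence name below is an
assumption and may not be the exact primitive this change introduces):

    #include <ck_pr.h>
    #include <stdbool.h>

    /*
     * Sketch only: the branch on the CAS result provides the control
     * dependency; the acquire-style fence that follows could then be
     * implemented with isync on Power rather than sync/lwsync.
     */
    static void
    spin_lock(unsigned int *lock)
    {

            while (ck_pr_cas_uint(lock, 0, 1) == false)
                    ck_pr_stall();

            ck_pr_fence_acquire();
    }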
There is follow-up work to be done on ARM, as we may have cheaper
(but target-specialized) ISB tricks for load-load ordering.
On TSO architectures, this relies on atomic ordering guarantees
rather than a full barrier. On Pentium M, this results in an
approximately 30% improvement in stack latency.
The default value is still 50, but that may be revisited later.
Also, pre-calculate the maximum number of entries before growing, to
avoid having to do it at each insert.
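A rough sketch of the idea (the struct, field and function names are
hypothetical):

    #include <stdbool.h>

    struct table {
            unsigned long capacity;
            unsigned long n_entries;
            unsigned long max_entries;      /* cached grow threshold */
    };

    /* Recomputed only when the table is (re)sized, not on every insert. */
    static void
    table_update_threshold(struct table *t, unsigned long load_percent)
    {

            t->max_entries = (t->capacity * load_percent) / 100;
    }

    static bool
    table_needs_grow(const struct table *t)
    {

            return t->n_entries >= t->max_entries;
    }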
DECONST_PTR is a hack to deconstify void pointer values that is safe
in the presence of -Wcast-qual. CK_CC_RESTRICT is a restrict qualifier
that can be disabled for compilers that are only partially
C99-compliant.
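Illustrative definitions of the two helpers (the exact spellings in the
headers may differ):

    #include <stdint.h>

    /*
     * Round-tripping through uintptr_t drops the const qualifier without
     * tripping -Wcast-qual.
     */
    #define DECONST_PTR(X) ((void *)(uintptr_t)(X))

    /* Degrade to nothing on compilers without C99's restrict. */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define CK_CC_RESTRICT restrict
    #else
    #define CK_CC_RESTRICT
    #endif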
We use some macro trickery to enforce that ck_pr_store_* actually
stores the correct type into the target variable, without any actual
side effects: the assignment is turned into an rvalue inside a comma
expression, which the compiler should optimize away.
On the load side, we simply cast the result to the type of the target
variable for pointer loads.
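A minimal sketch of the idiom (not necessarily the exact macro; it
assumes ck_pr_store_ptr_unsafe takes the same (target, value) arguments
as ck_pr_store_ptr):

    #include <ck_pr.h>

    /*
     * The sizeof over the assignment type-checks VAL against *DST at
     * compile time without evaluating anything; the comma expression
     * then yields DST unchanged and the underlying store proceeds as
     * usual.
     */
    #define store_ptr_checked(DST, VAL)                         \
            ck_pr_store_ptr_unsafe(                             \
                ((void)sizeof(*(DST) = (VAL)), (DST)),          \
                (VAL))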
There is an unsafe version of the store_ptr macro called
ck_pr_store_ptr_unsafe for those times when you are _really_ sure that
you know what you're doing.
This commit also updates some of the source files (ck_ht, ck_hs,
ck_rhs): ck_ht now uses the unsafe macro, as its conversion between
uintptr_t and void * is invalid under the new macros. ck_hs and ck_rhs
have had some casts added to preserve validity.
Commit 554e2f08 removed underscores from _CK prefixes. It missed
the arm bits, breaking the arm build (which checks for the
non-existent _CK_PR_H). Fix it.
While at it, fix the mismatch between _CK_ISB and __CK_ISB; convert
them both to CK_ISB.
Signed-off-by: Emilio G. Cota <cota@braap.org>
This was accidentally grouped into the previous commit.
The fate of this interface for internal use is still unclear (in the
context of utilization in built-in data structures). The interface is
enabled by default on x86, as it is compatible with read-side prefetch*
operations and binary-compatible with the 3DNow! extension. Older
compilers will waste an additional byte on this (they generate the
3DNow! variant), but they would waste more on spillage if we encoded
the instruction ourselves. Power support is coming soon.
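For example, a write-intent prefetch wrapper built on the compiler
builtin (the wrapper name is hypothetical and is not necessarily the
interface added here):

    /*
     * rw = 1 requests a prefetch with intent to write; on x86 this may
     * emit prefetchw, the 3DNow!-compatible encoding mentioned above.
     */
    static inline void
    prefetch_write(const void *p)
    {

            __builtin_prefetch(p, 1, 3);
    }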
This avoids memory traffic in busy-wait loops. It has been on the TODO
list for a while; may as well bite the bullet. No regressions are
introduced with recent versions of GCC, clang and ICC.
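As a hedged illustration of the general pattern (hypothetical helper,
not a specific CK primitive):

    #include <ck_pr.h>

    /*
     * Spinning on a plain atomic load keeps the cache line shared
     * across waiters; any read-modify-write is attempted only once the
     * value changes.
     */
    static void
    wait_for_flag(const unsigned int *flag)
    {

            while (ck_pr_load_uint(flag) == 0)
                    ck_pr_stall();

            ck_pr_fence_acquire();
    }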
gcc is smart enough to use an even register for 64-bit operations, and
provides a way to access the first and second words, so use that
instead of hardcoding registers.
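For instance (an ARMv7-flavoured sketch, assuming this is the 64-bit
load in question), the operand modifiers make the register pair
explicit without naming registers:

    #include <stdint.h>

    /*
     * Sketch only: gcc places the 64-bit result in an even/odd register
     * pair; %Q0 and %R0 name its low and high words, so r0/r1 no longer
     * need to be hard-coded.
     */
    static inline uint64_t
    load_64(const uint64_t *target)
    {
            uint64_t value;

            __asm__ __volatile__("ldrexd %Q0, %R0, [%1];"
                : "=&r" (value)
                : "r"   (target)
                : "memory");

            return value;
    }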
ck_swlock_read_latchlocks() observes a writer if the unlatch operation
sets n_readers to 0. The unlatch operation now just unsets the latch
bit, so we can safely decrement n_readers in
ck_swlock_read_latchlocks().
* Fixes to validation tests and ELIDE coverage.
* New bulk RMW operations: intersection, intersection_negate (with
  complement).
* New bulk reads: empty, full, count, count_intersect.
* ck_bitmap_iterator fixes: avoid an out-of-bounds read on empty
  bitmaps, and ck_bitmap_next is marginally faster on sparse bitmaps.
* Less space: the bitmap itself is an array of unsigned int, which
  eliminates alignment padding between the n_bit field and the bitmap.
* More unit tests.
This lock is copy-safe when the latch operations are used. Simplified
write-side operations lead to lower latency than ck_rwlock for
single-writer workloads.
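A usage sketch of the copy-safe pattern (the shared data and its
replacement policy are hypothetical; the function names are assumed
from the ck_swlock interface):

    #include <ck_swlock.h>

    static ck_swlock_t lock = CK_SWLOCK_INITIALIZER;

    /* Readers use the latch-aware acquire so the writer may latch. */
    static void
    reader(void)
    {

            ck_swlock_read_latchlock(&lock);
            /* read shared state */
            ck_swlock_read_unlock(&lock);
    }

    /* The single writer latches while copying or replacing the data. */
    static void
    writer(void)
    {

            ck_swlock_write_latch(&lock);
            /* copy or mutate shared state */
            ck_swlock_write_unlatch(&lock);
    }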