1. For ck_pr_cas_foo_value, let the inline assembly save the observed
value in a register, and store to the output reference in C.
This lets the C optimiser eliminate the memory access once the
CAS function is inlined.
2. Specify the result of the CAS as a condition code in EFLAGS instead
of executing SETcc in inline assembly, when possible. GCC 6 introduced
flag output operands, so CAS loops can now branch directly on
the condition code, without SETcc / TEST.
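The two changes above can be sketched roughly as follows. This is an
illustration only, not the code in ck_pr: the function name
sketch_cas_64_value is hypothetical, and the fallback branch using the
__atomic builtins is an assumption added so that the sketch also builds
where flag output operands are unavailable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compare-and-swap that reports the observed value.
 * The observed value stays in a register (RAX) inside the asm and is
 * stored to *observed in C, so the compiler can drop the store after
 * inlining. The success flag is taken straight from ZF.
 */
static inline bool
sketch_cas_64_value(uint64_t *target, uint64_t compare, uint64_t set,
    uint64_t *observed)
{
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
	bool success;
	uint64_t prev = compare;

	__asm__ __volatile__("lock cmpxchgq %[set], %[mem]"
	    : [mem] "+m" (*target),
	      [prev] "+a" (prev),
	      [ok] "=@ccz" (success)	/* ZF is set when the swap happened */
	    : [set] "r" (set)
	    : "memory");

	*observed = prev;		/* plain C store, visible to the optimiser */
	return success;
#else
	/* Assumed fallback for other targets or older compilers. */
	uint64_t prev = compare;
	bool success = __atomic_compare_exchange_n(target, &prev, set, false,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

	*observed = prev;
	return success;
#endif
}
```

A CAS loop written as `while (!sketch_cas_64_value(...))` can then
compile to CMPXCHG followed by a conditional jump.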
TESTED=existing regression tests.
glibc-2.30 added a wrapper for gettid (https://lwn.net/Articles/795127/).
Our gettid macro now clashes with the glibc-provided symbol. Remove the
macro and instead move to a dedicated namespace.
We go this route to avoid introducing unnecessary complexity to
the build.
Fixes #147
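As a rough illustration of the namespacing approach, assuming a
hypothetical helper name sketch_gettid (not necessarily the name used
in the tree), a prefixed function cannot collide with the glibc symbol
whether or not glibc provides gettid. The non-Linux branch is an
assumption added only so the sketch builds elsewhere.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/*
 * Hypothetical namespaced replacement for a local gettid macro.
 * It issues the raw system call, so it does not depend on the
 * glibc version and needs no configure-time detection.
 */
static inline pid_t
sketch_gettid(void)
{
#if defined(__linux__)
	return (pid_t)syscall(SYS_gettid);
#else
	/* Assumed placeholder on platforms without SYS_gettid. */
	return getpid();
#endif
}
```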
Previously, we would simply fail if the architecture was not
a first-class citizen. However, we have always allowed a built-in
fallback in code.
Instead, allow people to directly use the built-in fallback, with a
loud warning, without having to provide their own profiles.
This new interface allows for slot reservation to avoid additional
copy overhead from the consumer. The primary use-case is the type-specialized
variant of ck_ring. The initial patch-set does not migrate enqueue and
dequeue to be implemented in terms of reserve and commit; that will come in
a future commit.
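A minimal sketch of the reserve/commit idea, for a single producer and
single consumer. All names here (sketch_ring, sketch_ring_reserve,
sketch_ring_commit, sketch_ring_dequeue) are hypothetical and do not
reflect the ck_ring signatures; the point is only that the producer
writes the entry in place instead of building it elsewhere and copying.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RING_SIZE 8u	/* must be a power of two */

struct sketch_entry {
	int id;
	double payload;
};

struct sketch_ring {
	_Atomic unsigned int head;	/* next slot to consume */
	_Atomic unsigned int tail;	/* next slot to produce */
	struct sketch_entry slots[SKETCH_RING_SIZE];
};

/* Return a pointer to the next free slot, or NULL if the ring is full. */
static struct sketch_entry *
sketch_ring_reserve(struct sketch_ring *ring)
{
	unsigned int tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&ring->head, memory_order_acquire);

	if (tail - head == SKETCH_RING_SIZE)
		return NULL;
	return &ring->slots[tail & (SKETCH_RING_SIZE - 1)];
}

/* Publish the slot handed out by the last successful reserve. */
static void
sketch_ring_commit(struct sketch_ring *ring)
{
	unsigned int tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);

	atomic_store_explicit(&ring->tail, tail + 1, memory_order_release);
}

/* Copy the oldest committed entry into *out; false if the ring is empty. */
static bool
sketch_ring_dequeue(struct sketch_ring *ring, struct sketch_entry *out)
{
	unsigned int head = atomic_load_explicit(&ring->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&ring->tail, memory_order_acquire);

	if (head == tail)
		return false;
	*out = ring->slots[head & (SKETCH_RING_SIZE - 1)];
	atomic_store_explicit(&ring->head, head + 1, memory_order_release);
	return true;
}
```

An entry written into a reserved slot stays invisible to the consumer
until the commit, so a producer can also abandon a reservation simply by
not committing it.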
There was a silly typo and circular dependency introduced in the migration.
Thanks to Sarah Edkins <sedkins@backtrace.io> for letting me borrow her laptop
to investigate.
These tests check for sane behavior in the presence of new
maps being created for the hash set. They require the presence
of SMR.
For the lifetime of the growth_spmc tests, disable deallocation.
ck_ec implements 32- and (on 64-bit platforms) 64-bit event
counts. Event counts let us easily integrate OS-level blocking (e.g.,
futexes) in lock-free protocols. Waking up waiters only locks in the
OS kernel, and does not happen at all when no waiter is blocked.
Waiters only block conditionally, if the event count's value is
still equal to some prior value.
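The protocol described above can be sketched as follows. This is not
the ck_ec implementation or its API: every name is hypothetical, the
waiters flag is assumed to live in the low bit of the word, and the
futex wake is replaced by a plain counter (wake_calls) so the control
flow can be exercised in a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_ec {
	_Atomic uint32_t word;		/* (value << 1) | waiters flag */
	unsigned int wake_calls;	/* stand-in for a futex wake syscall */
};

/* Read the current event count value. */
static uint32_t
sketch_ec_value(struct sketch_ec *ec)
{
	return atomic_load_explicit(&ec->word, memory_order_acquire) >> 1;
}

/* Producer side: bump the value and wake only if a waiter is flagged. */
static void
sketch_ec_inc(struct sketch_ec *ec)
{
	uint32_t old = atomic_fetch_add_explicit(&ec->word, 2,
	    memory_order_acq_rel);

	if ((old & 1u) == 0)
		return;			/* fast path: nobody is blocked */

	/* Slow path: one extra atomic, then the (simulated) wake-up. */
	atomic_fetch_and_explicit(&ec->word, ~(uint32_t)1,
	    memory_order_acq_rel);
	ec->wake_calls++;
}

/*
 * Waiter side: flag the word, but only if the value still equals
 * old_value. Returns true when the caller should go on to block,
 * false when the value has already moved on.
 */
static bool
sketch_ec_prepare_wait(struct sketch_ec *ec, uint32_t old_value)
{
	uint32_t expected = old_value << 1;

	if (atomic_compare_exchange_strong(&ec->word, &expected, expected | 1u))
		return true;
	/* CAS failed: block only if another waiter already set the flag. */
	return expected == ((old_value << 1) | 1u);
}
```

In a real implementation the waiter would follow a true result with a
blocking call such as a futex wait on the flagged word, which the kernel
refuses if the word has changed in the meantime.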
ck_ec supports multiple producers (wakers) and consumers (waiters),
and, on x86-TSO, has a more efficient specialisation for single
producer mode. In the latter mode, the overhead compared to a version
counter is on the order of 2-3 cycles and 1-2 instructions, in the
fast path. The slow path, when there are threads blocked on the event
count, consists of one additional atomic instruction and a futex
syscall.
Similarly, the fast path for consumers, when an update comes quickly,
has no overhead compared to spinning on a read-only counter. After
a few thousand cycles, consumers (waiters) enter the slow path with
one atomic instruction and a few blocking syscalls.
The single-producer specialisation requires the x86-TSO memory model,
x86's non-atomic read-modify-write instructions, and, ideally, a
futex-like OS abstraction. On platforms other than x86/x86_64, single
producer increments fall back to the multiple producer code path.
Fixes https://github.com/concurrencykit/ck/issues/79
On FreeBSD, atomic operations in the kernel must access the nucleus
address space. Userland may either use the atomic instructions that
take no ASI (address space identifier) or specify the
primary address space.
To avoid hardcoding the address space here, we grab the corresponding
identifier from the appropriate machine header, but only for the
kernel, so the namespace doesn't get polluted for userland.