# ThreadWeaver - Lock-Free Multi-Thread Communication Templates

A collection of lock-free inter-thread message-passing fabrics, including SPSC, MPSC and SPMC templates, targeting x64 and ARMv8+ platforms. The implementation avoids CAS retry loops and keeps the number of operations linearly bounded for fairness, which gives bounded latency for any request to be processed. Fairness here means that no producer or consumer can be perpetually starved under continuous contention. Some variations of the same actions are provided for fine-tuning performance.

## Table of Contents

- [ThreadWeaver - Lock-Free Multi-Thread Communication Templates](#threadweaver---lock-free-multi-thread-communication-templates)
  - [Requirements](#requirements)
  - [Quick Start](#quick-start)
    - [Explicitness](#explicitness)
    - [Important Messages](#important-messages)
    - [External Synchronization](#external-synchronization)
    - [Variations](#variations)
    - [Common Verbs](#common-verbs)
      - [`[sync] init`](#sync-init)
      - [`[sync] flush`](#sync-flush)
      - [`send`](#send)
      - [`recv`](#recv)
  - [Design and Inspiration](#design-and-inspiration)

## Requirements

The Weaver library requires a compiler that supports C++20 or later.

## Quick Start

Weaver is contained within a single header, [weaver.h](include/weaver.h), to accommodate C++ template instantiation requirements. To compile with the library, the user must provide the cache line size through `--param=destructive-interference-size` or configure the static `THWeaver::CLS` parameter. This ensures correct cache line isolation and avoids false sharing across platforms.

**We will use the term "fabric" to refer to instantiated communication objects from the Weaver library.**

The included classes are:

1. SPSC: `THWeaver::EndpointQueue`
2. MPSC: `THWeaver::FanInFabric`
3. SPMC: `THWeaver::FanOutFabric`

See [docs](docs/) for more detailed documentation on the classes and result enums.

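The cache line requirement above exists to prevent false sharing. The following is a minimal, generic sketch of that idea and is not taken from Weaver's source: `kCacheLine`, `PaddedCounter` and `Indices` are hypothetical names, and the value 64 is an assumption that stands in for whatever `THWeaver::CLS` or `--param=destructive-interference-size` resolves to on the target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the configured cache line size.
// 64 bytes is an assumption; check the actual target before relying on it.
inline constexpr std::size_t kCacheLine = 64;

// A counter that occupies a whole cache line by itself, so writes to it
// never invalidate a line that holds a neighbouring counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Two indices written by different threads, each isolated on its own line.
struct Indices {
    PaddedCounter head;  // written by the consumer side
    PaddedCounter tail;  // written by the producer side
};

// Compile-time checks that the layout really is isolated.
static_assert(alignof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PaddedCounter) % kCacheLine == 0);
static_assert(sizeof(Indices) >= 2 * kCacheLine);
```

If the configured size is smaller than the real cache line, two such counters can still share a line and the isolation is silently lost, which is presumably why the value has to be stated explicitly rather than guessed.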
### Explicitness

The Weaver library expects the user to state the behavior of the fabric explicitly. It does not rely on automatic constructors and destructors for state initialization and resource management.

### Important Messages

Throughout the documentation (including this README), all important messages have a leading "IMPORTANT". Violating an important message may result in **undefined behavior**.

### External Synchronization

Some non-critical Weaver methods expect external synchronization. They are oriented towards control-plane usage rather than the actual data-plane pipelines. Using these methods without synchronization may result in **undefined behavior**. These operations are marked by `[sync]`.

### Variations

All class methods are named `verb[_variation]`, and different variations of the same action may have different costs. This document only contains an overview of each verb (action); see [docs](docs/) for detailed descriptions of the variations.

### Common Verbs

The verbs listed below are common to all Weaver classes.

#### `[sync] init`

Explicitly initializes the fabric.

**IMPORTANT: This method is expected to be called only once during the fabric's lifetime.**

#### `[sync] flush`

Resets the fabric or part of the fabric.

#### `send`

Moves a message into the fabric. The return type explicitly reports whether the operation failed and how.

#### `recv`

Moves a message out of the fabric or initializes a token to the message. The return type explicitly reports whether the operation failed and how.

## Design and Inspiration

Weaver is designed for the current generation and architecture of hardware and does not provide forward compatibility with future hardware. It assumes multi-level caching and a high cost of memory operations, and aims to reduce memory and cache coherence traffic by respecting the hardware cache architecture.
To ensure fairness across multiple producer or consumer threads, Weaver includes simple scheduling that degrades to a round-robin scheme in the worst case. The design is inspired by networking concepts and hardware clock domain crossing implementations. For each operation, it tries to keep modifications monotonic and data flow unidirectional. Shared state that must be lossless is structured so that operations are causally ordered and monotonic, which allows simpler synchronization and reduced coherence traffic.
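To illustrate what monotonic, unidirectional shared state can look like, here is a generic textbook-style SPSC ring buffer. It is a sketch under stated assumptions and not Weaver's implementation: `SketchSpscRing` and its `send`/`recv` signatures are hypothetical, the 64-byte alignment is assumed, and the real `THWeaver::EndpointQueue` API and result enums are described in [docs](docs/).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical single-producer/single-consumer ring.
// T must be default-constructible and movable in this sketch.
template <typename T, std::size_t N>
class SketchSpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "capacity must be a power of two");

public:
    // Producer side only. Returns false when the ring is full.
    bool send(T v) {
        const std::uint64_t t = tail_.load(std::memory_order_relaxed);
        const std::uint64_t h = head_.load(std::memory_order_acquire);
        if (t - h == N) return false;
        slots_[t & (N - 1)] = std::move(v);
        // tail_ only ever increases and only the producer writes it.
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns std::nullopt when the ring is empty.
    std::optional<T> recv() {
        const std::uint64_t h = head_.load(std::memory_order_relaxed);
        const std::uint64_t t = tail_.load(std::memory_order_acquire);
        if (h == t) return std::nullopt;
        T v = std::move(slots_[h & (N - 1)]);
        // head_ only ever increases and only the consumer writes it.
        head_.store(h + 1, std::memory_order_release);
        return v;
    }

private:
    // 64 is an assumed cache line size for this sketch.
    alignas(64) std::atomic<std::uint64_t> head_{0};
    alignas(64) std::atomic<std::uint64_t> tail_{0};
    alignas(64) std::array<T, N> slots_{};
};
```

Each counter has exactly one writer and never decreases, so neither side needs a CAS loop: the producer publishes by advancing `tail_`, the consumer acknowledges by advancing `head_`, and each thread only reads the other's counter.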