
Preface


Origin

Traditional thread-level CPU affinity solutions are utilized in scenarios where, on the one hand, compute resources are constrained, while on the other, low-latency, guaranteed execution times are required. HFT (High-Frequency Trading) platforms are often subject to such execution conditions, which is why we usually see software solutions addressing CPU affinity concerns in the context of such systems.

While financial trading is one of the most veteran fields to have been revolutionized by advancements in IT, the fundamental approach to optimizing the performance of critical applications in such environments has remained mostly the same:

  • Keep your application small, simple & lean.
  • Choose the most performant tools for the job (e.g. write your code in C/C++, at the very least).
  • Deal as little as possible with threading/concurrency concerns.
  • Ensure your critical path remains as short as possible.

Getting With the Times

Of course, especially in the field of algorithmic trading, there are far more drastic approaches (programming entire applications on top of FPGA boards is one such example). But we also need to acknowledge the fact that the IT industry, in terms of both hardware and software, has undergone drastic changes in the past decade or so:

  • Dozens of new programming languages have sprung up, often trading optimal performance for simplicity and ease of use, to varying degrees.
  • Engineering manpower places a heavier emphasis on proficiency in higher-level languages.
  • Compute resources, even in constrained execution environments (as opposed to various "clouds"), are much more abundant.
  • Having 16, 24 or even 64 real cores in a single socket is a much more common sight these days, as opposed to an era when the industry was dominated by single-core processors.
  • We don't really deal with single applications anymore. Instead, we build software platforms, most often composed of a multitude of services running on top of different execution environments and operating systems, spanning multiple data centers.
  • Following proper software design paradigms (as well as inventing new ones from time to time) has become a genuine concern in the current day and age.
  • Some of the newer software design paradigms don't really go along with single-threaded, simple software design concepts.
  • Data is king. It's not enough to have our applications, compute-constrained as they are, just operate (or write logs at best). We need clear visibility of what's going on in real time.

The Critical Path

When trying to address the core-affinity problem for modern applications, a few concerns come to mind:

  • A performance-critical application has at least one critical path - an execution path that the app aims to finish as fast as possible, and to which it would try to assign as many dedicated computation resources as available in order for that to happen. A common critical path example is what's called in HFT terminology 'tick-to-trade': the processing path starting at the moment an exchange 'tick' is received by the application and ending when the same application issues a buy/sell order.
  • Any application will most likely have one or more non-critical paths - execution paths for non-critical processing that don't affect the application's internal state directly, and which we can allow to be processed at a slower rate/later time. A common example of a non-critical path would be tracing/state data persistence to a remote, external location. This would most likely involve some sort of serialization/encoding and might be compute-intensive.

As various business concerns grow in complexity within an application, we're likely to see several critical and non-critical paths within a single application. So there's a key principle it's important to adhere to: disallow non-critical-path processing from interrupting that of a critical one.
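To make that principle concrete, here's a minimal sketch of the separation. It deliberately doesn't use Needle's own API (covered in later pages of this wiki), but rather plain JDK threads plus the OpenHFT Affinity library; the processTicks/persistTraces methods are hypothetical placeholders:

```java
import net.openhft.affinity.AffinityLock;

public class TickToTrade {

    public static void main(String[] args) {
        // Critical path: pin the tick-to-trade loop to a dedicated core,
        // so no other runnable thread gets scheduled onto it
        Thread critical = new Thread(() -> {
            AffinityLock lock = AffinityLock.acquireCore();
            try {
                processTicks();
            } finally {
                lock.release();
            }
        }, "tick-to-trade");

        // Non-critical path: tracing/persistence stays unpinned and is
        // scheduled by the OS on whatever cores remain available
        Thread tracer = new Thread(TickToTrade::persistTraces, "trace-writer");

        critical.start();
        tracer.start();
    }

    static void processTicks() { /* hot loop elided */ }

    static void persistTraces() { /* serialization + remote I/O elided */ }
}
```

The point isn't the particular library; it's that the critical path owns its core outright, while slower, non-critical work competes only for the leftover CPUs.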

Beyond Your Reach

Another key factor in meticulously designing compute resource allocation is acknowledging the fact that various 3rd party libraries in your application, and even some of the JVM's internal facilities, may spawn and manage their own threads (as well as thread pools), sometimes exposing no means of affecting their inception, let alone their affinity and resource allocation.

In other words, you can assign CPU cores to threads spawned and managed explicitly by your code, but if you still have 3rd party libraries spawning threads that run wild, your efforts are in vain.
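When a library does expose a hook, the JDK's standard ThreadFactory is usually the seam to intervene at. The following sketch (again using OpenHFT Affinity rather than Needle's API, with the placement policy being an assumption for illustration) makes every pool thread claim a reservable CPU at inception:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

import net.openhft.affinity.AffinityLock;

// A ThreadFactory that makes every thread it creates acquire an affinity
// lock before running its task, so pool threads land on known CPUs
public class PinnedThreadFactory implements ThreadFactory {

    @Override
    public Thread newThread(Runnable task) {
        return new Thread(() -> {
            // Which CPUs to allow is a policy decision; acquireLock()
            // simply grabs the next available reservable CPU
            AffinityLock lock = AffinityLock.acquireLock();
            try {
                task.run();
            } finally {
                lock.release();
            }
        });
    }

    public static void main(String[] args) {
        // This only works when the library accepts an externally
        // supplied factory, as the JDK's executors do
        ExecutorService pool = Executors.newFixedThreadPool(4, new PinnedThreadFactory());
        pool.submit(() -> System.out.println("running on a pinned thread"));
        pool.shutdown();
    }
}
```

The catch is that many facilities offer no such seam at all - ForkJoinPool.commonPool(), for instance, creates its threads internally - which is exactly the "beyond your reach" problem described above.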

One Needle to Rule Them All

Needle, created in mid-2020, tries to take a fresh(er) approach to solving thread-level, core affinity resource allocation problems. The rest of this wiki will cover the library's internals in greater detail, demonstrating how its features can help address the concerns raised earlier, such as controlling 3rd party library threads or dividing resources on a critical/non-critical path level.