The data structure that makes real-time charts fast, and the allocation trap hiding inside it

By Jaya Sai Kishan Chapparam

devto-webdev · Jun 30, 2026 · 12 min read

If you stream telemetry into a React app the obvious way, you wire each incoming sample to a piece of state and let the component re-render. That works until the data gets fast, and then it falls over, and I wrote a whole separate post about why the rendering falls over. This post is about the layer underneath that: where the samples actually live between arriving and being drawn. It is a smaller question than rendering and it has a cleaner answer, and it also has a trap that I walked straight into and measured my way back out of, which is the part worth your time.

The short version is that the right container is a fixed-size circular buffer, the reason is allocation rather than speed-of-access, and the obvious way to read from it gives back exactly the allocation win it bought you. So the structure is half the story and how you read from it is the other half.

Why not just an array

A telemetry channel is an unbounded stream. Samples arrive at 30, 50, sometimes hundreds of hertz, forever, and a chart shows the last few seconds of them. The naive container is a plain array you push onto and slice the tail off of:

samples.push(value);
if (samples.length > maxPoints) samples.shift();

This is wrong in two ways, and both come down to what happens to memory rather than to how fast you can index into the array. shift() removes from the front, which in the general case is O(n) because every remaining element slides down one slot, and you pay that on every sample. Worse, an array that you grow and trim like this churns the allocator constantly. The engine resizes the backing store, the old one becomes garbage, the collector runs, and at telemetry rates that garbage collection happens often enough to land inside your frame budget. You do not feel it on a fast machine with one chart. You feel it on a slower machine, or with twenty channels, as periodic stutter that has no obvious cause because nothing in your code is obviously slow.

The fix is to stop allocating entirely. Decide up front how many samples you keep, allocate that once, and overwrite the oldest sample in place when you are full. That is a ring buffer, and for numeric telemetry there is a second win sitting right next to it.

A typed array, not an array of numbers

A normal JavaScript array of numbers comes with no layout guarantee. Depending on the engine and on what the array has held, it can be a packed run of doubles or a list of boxed values the engine has to chase through pointers. A Float64Array is always a flat block of contiguous doubles, no boxing, no pointer chasing, and it is exactly the shape numeric samples want. So the buffer is two parallel Float64Arrays, one for values and one for timestamps, each allocated once at the capacity you choose:

export class RingBuffer {
  private buf: Float64Array;
  private times: Float64Array;
  private head = 0;
  private count = 0;
  readonly capacity: number;
  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError(`RingBuffer capacity must be a positive integer, got ${capacity}`);
    }
    this.capacity = capacity;
    this.buf = new Float64Array(capacity);
    this.times = new Float64Array(capacity);
  }

head is the write index, the slot the next sample lands in. count is how many slots are live, and it climbs until it hits capacity and then stops. There is deliberately no tail pointer, because you can derive where the oldest sample sits: if the buffer is not full yet the oldest is at index 0, and if it is full the oldest is wherever head currently points, since that is the slot about to be overwritten.
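The derivation in that paragraph fits in one line of code. The sketch below is only an illustration: oldestIndex and newestIndex are hypothetical helper names, not methods of the RingBuffer in this post, and they take head, count, and capacity as plain arguments so the example stands alone.

```typescript
// Hypothetical helpers, not part of the RingBuffer class shown in the post.

/** Slot holding the oldest live sample. */
function oldestIndex(head: number, count: number, capacity: number): number {
  // Not full: writing started at slot 0 and nothing has been overwritten.
  // Full: head is the slot about to be overwritten, which is the oldest.
  return count < capacity ? 0 : head;
}

/** Slot holding the newest live sample (only meaningful when count > 0). */
function newestIndex(head: number, capacity: number): number {
  // head points one past the last write, so step back one slot with wraparound.
  return (head - 1 + capacity) % capacity;
}

// Capacity 4, three samples written: slots 0..2 are live and head is 3.
const oldestBeforeWrap = oldestIndex(3, 3, 4);
// Capacity 4, six samples written: head has wrapped around to 2.
const oldestAfterWrap = oldestIndex(2, 4, 4);
const newestAfterWrap = newestIndex(2, 4);
```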

The write path does nothing interesting, on purpose

push(value: number, timestamp: number): void {
  this.buf[this.head] = value;
  this.times[this.head] = timestamp;
  this.head = (this.head + 1) % this.capacity;
  if (this.count < this.capacity) this.count++;
}

That is the entire write path, and the most important thing about it is what it does not do. It does not allocate. It does not grow. It does not box anything. Two indexed stores into pre-allocated Float64Arrays, one modulo to advance the write index, one conditional increment that stops mattering once the buffer fills. The numbers go straight into slots as unboxed doubles. Once the buffer is full, head keeps wrapping and each write overwrites the oldest sample in place, which is why the stores themselves need no full-versus-not-full branch and the only conditional left is the count increment. The buffer reaches a steady state and stays there, allocating nothing, for the entire life of the stream. This is the property the whole structure exists to have.

This is also the right place to keep React out of it. None of this touches component state. The buffer is a plain object that a requestAnimationFrame loop reads on its own schedule, so a sample arriving at 200Hz is just a slot write, not a re-render. That separation, fast data into a mutable buffer, rendering on its own clock, is the thing that lets the high-frequency path stay cheap, and the buffer is the mutable thing in the middle.
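To make that separation concrete, here is a minimal sketch of a render loop that reads a buffer on its own clock. Everything in it is an assumption for illustration: TinyRing is a cut-down stand-in for the real buffer, startRenderLoop is a made-up name, and the scheduler is passed in as a parameter so the example can be stepped by hand. In a browser you would pass (frame) => requestAnimationFrame(frame).

```typescript
// Cut-down stand-in for the buffer: fixed storage plus a write index.
class TinyRing {
  private readonly buf: Float64Array;
  private readonly capacity: number;
  private head = 0;
  private written = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.buf = new Float64Array(capacity);
  }

  push(value: number): void {
    this.buf[this.head] = value; // a slot write, nothing else
    this.head = (this.head + 1) % this.capacity;
    this.written++;
  }

  /** Most recent sample, or NaN if nothing has arrived yet. */
  latest(): number {
    if (this.written === 0) return NaN;
    return this.buf[(this.head - 1 + this.capacity) % this.capacity];
  }
}

type Schedule = (frame: () => void) => void;

/** Draws once per scheduled frame, however many samples arrived in between. */
function startRenderLoop(
  ring: TinyRing,
  draw: (latest: number) => void,
  schedule: Schedule,
): () => void {
  let running = true;
  const frame = (): void => {
    if (!running) return;
    draw(ring.latest());
    schedule(frame);
  };
  schedule(frame);
  return () => {
    running = false;
  };
}

// Demo with a manual scheduler so the timing is explicit.
const pendingFrames: Array<() => void> = [];
const runNextFrame = (): void => {
  const next = pendingFrames.shift();
  if (next) next();
};
const demoRing = new TinyRing(8);
const drawnValues: number[] = [];
const stopLoop = startRenderLoop(
  demoRing,
  (latest) => {
    drawnValues.push(latest);
  },
  (frame) => {
    pendingFrames.push(frame);
  },
);

demoRing.push(1);
demoRing.push(2);
demoRing.push(3); // three samples arrive, nothing is drawn yet
runNextFrame(); // one frame runs and draws the latest value, 3
demoRing.push(4);
demoRing.push(5);
runNextFrame(); // the next frame draws 5
stopLoop();
runNextFrame(); // stopped: nothing is drawn and nothing is rescheduled
```

Five samples arrive and two draws happen, which is the point: the sample rate and the draw rate are decoupled, and nothing on the push side knows a renderer exists.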

Reading it back in order

The wrinkle in any ring buffer is that the data is stored out of order once it wraps. The newest sample might be in the middle of the array with the oldest just after it, so reading "oldest to newest" means walking from head to the end and then from the start back to head. Here is the read that handles both cases:

private orderedInto(source: Float64Array, out: Float64Array): number {
  if (out.length < this.count) {
    throw new RangeError(
      `readInto target too small: need ${this.count}, got ${out.length}`,
    );
  }
  if (this.count === 0) return 0;
  if (this.count < this.capacity) {
    // Not full yet: oldest is at 0, newest at head-1, no wrap.
    out.set(source.subarray(0, this.count));
    return this.count;
  }
  // Full: oldest sits at head, wrapping forward to head-1.
  const tail = source.subarray(this.head, this.capacity);
  const wrap = source.subarray(0, this.head);
  out.set(tail, 0);
  out.set(wrap, tail.length);
  return this.count;
}

Before it wraps, the data is contiguous and it is one copy. Once full, it is two copies, the run from head to the end followed by the run from the start up to head, stitched into the output in order. subarray is a view, not a copy, so the only actual writes are the two out.set calls into the caller's buffer, and the function returns how many elements it wrote so the caller knows the valid prefix.
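A small worked case may help check the ordering. The sketch below is not the buffer's own code: orderedCopyByIndex is a hypothetical standalone function that does the same read one element at a time, mapping logical position i to physical slot (oldest + i) % capacity. The two bulk set calls above copy the same elements in the same order, just in two runs instead of a loop.

```typescript
// Hypothetical element-wise equivalent of the ordered read, for illustration.
function orderedCopyByIndex(
  source: Float64Array,
  head: number,
  count: number,
  out: Float64Array,
): number {
  if (out.length < count) {
    throw new RangeError(`target too small: need ${count}, got ${out.length}`);
  }
  const capacity = source.length;
  // Oldest sample: slot 0 until the buffer fills, then wherever head points.
  const oldest = count < capacity ? 0 : head;
  for (let i = 0; i < count; i++) {
    out[i] = source[(oldest + i) % capacity];
  }
  return count;
}

// Capacity 4 after pushing 1, 2, 3, 4, 5, 6: the writes of 5 and 6 wrapped
// around and overwrote 1 and 2, leaving head at slot 2.
const wrappedSlots = Float64Array.of(5, 6, 3, 4);
const wrappedOut = new Float64Array(4);
const wrappedCount = orderedCopyByIndex(wrappedSlots, 2, 4, wrappedOut);
// wrappedOut now holds 3, 4, 5, 6 (oldest to newest)

// Capacity 4 after pushing only 1, 2, 3: no wrap yet, head is 3.
const partialSlots = Float64Array.of(1, 2, 3, 0);
const partialOut = new Float64Array(4);
const partialCount = orderedCopyByIndex(partialSlots, 3, 3, partialOut);
// The first 3 entries of partialOut are 1, 2, 3; the fourth is untouched.
```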

Notice that this read fills a buffer you hand it. That is the whole point, and it is the part I got wrong the first time.

The trap

The convenient way to expose that read is to return a fresh array, and that is what the buffer did first:

getValues(): Float64Array {
  const out = new Float64Array(this.count);   // allocates, every call
  this.orderedInto(this.buf, out);
  return out;
}

This is lovely to call and it quietly reintroduces the exact problem the ring buffer was built to remove. Every call allocates a fresh Float64Array. And the draw loop calls it a lot. My chart components read the buffer twice per frame, once to compute the y-axis extent and once to actually draw, across both values and timestamps, which is four allocations per channel per frame. At a default buffer of 10,000 samples, sixty times a second, per channel.
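Putting rough numbers on that: the arithmetic below is a back-of-envelope sketch, assuming the buffer is full so every read allocates all 10,000 doubles, and counting only the sample payload (not the per-object overhead of each typed array). The 24-channel figure borrows the channel count from the benchmark later in the post.

```typescript
// Back-of-envelope garbage rate for the allocating read (assumes a full buffer).
const bytesPerSample = Float64Array.BYTES_PER_ELEMENT; // 8 bytes per double
const samplesPerRead = 10_000; // default buffer size from the post
const readsPerFrame = 4; // values and timestamps, each read twice
const framesPerSecond = 60;

const bytesPerRead = samplesPerRead * bytesPerSample; // 80,000
const bytesPerFrame = bytesPerRead * readsPerFrame; // 320,000
const bytesPerSecondPerChannel = bytesPerFrame * framesPerSecond; // 19,200,000

// About 19.2 MB of short-lived garbage per second for one channel,
// and about 460.8 MB per second across 24 channels.
const bytesPerSecondAt24Channels = bytesPerSecondPerChannel * 24; // 460,800,000
```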

So I had built a zero-allocation buffer and then drained it through an allocating read on the hottest path in the library. The write side was clean and the read side handed all the garbage right back.

The fix is to let the caller own the output buffer and reuse it across frames:

/** Zero-allocation ordered read into a caller-owned buffer. Returns the count. */
readInto(out: Float64Array): number {
  return this.orderedInto(this.buf, out);
}
readTimesInto(out: Float64Array): number {
  return this.orderedInto(this.times, out);
}

Same ordered read, no allocation. The component allocates one scratch Float64Array per channel once, at setup, and refills it every frame instead of minting a new one. getValues and getTimes still exist for occasional callers who genuinely want a fresh copy, but the render path uses readInto.
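On the calling side, the pattern might look like the sketch below. It is an assumption about usage, not the library's component code: OrderedSource is a hypothetical interface describing just the part of the buffer this needs, and FrameReader is a made-up name for whatever object owns the scratch array.

```typescript
// Hypothetical minimal view of the buffer: a capacity and a readInto method.
interface OrderedSource {
  readonly capacity: number;
  readInto(out: Float64Array): number;
}

/** Owns one scratch array per channel and refills it every frame. */
class FrameReader {
  readonly scratch: Float64Array;
  count = 0; // valid prefix length after the last refresh
  min = NaN;
  max = NaN;
  private readonly source: OrderedSource;

  constructor(source: OrderedSource) {
    this.source = source;
    // The only allocation: sized to capacity so readInto can never overflow it.
    this.scratch = new Float64Array(source.capacity);
  }

  /** Refill the scratch buffer and recompute the extent. Allocates nothing. */
  refresh(): void {
    const n = this.source.readInto(this.scratch);
    let min = Infinity;
    let max = -Infinity;
    // Only the first n entries are valid; anything past n is stale data.
    for (let i = 0; i < n; i++) {
      const v = this.scratch[i];
      if (v < min) min = v;
      if (v > max) max = v;
    }
    this.count = n;
    this.min = n > 0 ? min : NaN;
    this.max = n > 0 ? max : NaN;
  }
}
```

The results live in fields rather than in a returned object, so a frame that calls refresh() and then walks scratch up to count creates no garbage at all. The loop uses an index bound instead of subarray(0, n) for the same reason: subarray does not copy the data, but it still creates a new view object on each call.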

What it actually cost, measured

This is the part I care about getting right, because "allocation is bad" is the kind of claim everyone nods at and nobody checks. So I benchmarked the allocating read against the reused-buffer read, same scene, same math, only the read differs. Every number below is from one machine, and the harness is committed so you can run it on yours.

The first thing the measurement taught me is that on a fast machine, at low channel counts, the allocation costs you nothing visible. Here is a sustained 60-second run at normal CPU speed, looking at per-frame main-thread work in milliseconds:

channels	read	work p50 (ms)	work p99 (ms)	work max (ms)	frames over 5 ms
6	allocate	1.2	1.9	3.8	0
6	reuse	0.6	1.3	1.5	0
24	allocate	3.7	10.1	15.0	263
24	reuse	2.2	4.0	4.6	0

At 6 channels the allocating version never even spikes. The frame rate stayed pinned at 60 the whole time, because each garbage collection pause was short enough to fit inside the frame's idle budget. If you had stopped measuring here you would conclude the allocation does not matter, and you would ship it.

Look at the 24-channel rows. The allocating read has a p99 of 10ms and 263 frames where main-thread work crossed 5ms. Those are GC pauses, and at normal speed they hide under the vsync interval, eating your headroom without dropping a frame. The reused-buffer read has a p99 of 4ms and zero such spikes. The garbage was never free. It was being absorbed by the spare time in each frame, which means it was quietly spending the budget you would want for everything else the app does.
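The harness itself is not reproduced in this post, so the following is only a sketch of how columns like p50, p99, and "frames over 5ms" could be derived from a list of per-frame work times. It assumes a nearest-rank percentile, which may not be the definition the committed harness uses, and percentile and countOver are names made up for the example.

```typescript
/** Nearest-rank percentile of per-frame work times (p in 0..100). */
function percentile(workMs: Float64Array, p: number): number {
  if (workMs.length === 0) return NaN;
  // Copy before sorting so the caller's data is untouched. Typed arrays sort
  // numerically by default, unlike plain arrays. This runs after the
  // measurement, so the allocation here is off the hot path.
  const sorted = workMs.slice().sort();
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

/** Number of frames whose main-thread work exceeded the threshold. */
function countOver(workMs: Float64Array, thresholdMs: number): number {
  let over = 0;
  for (let i = 0; i < workMs.length; i++) {
    if (workMs[i] > thresholdMs) over++;
  }
  return over;
}
```

A p99 like this is what exposes GC pauses: the median barely moves when one frame in a hundred stalls, but the tail does.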

Now take the headroom away. Here is the same thing under 6x CPU throttle, which is a reasonable stand-in for a mid-tier laptop, counting frames that ran long enough to miss a vsync:

channels	read	dropped frames	bad stutters
6	allocate	16	1
6	reuse	0	0
24	allocate	363	46
24	reuse	0	0

This is where the hidden cost becomes visible. The 24-channel allocating view dropped 363 frames in 30 seconds, roughly half of them, on hardware that is not exotic. The reused-buffer version dropped none. The exact same GC pauses that were invisible on the fast machine become dropped frames the moment the budget shrinks, and the user on the slower machine is precisely the user you cannot test on and most need to not fail.

That is the whole argument for the ring buffer, restated honestly. It is not that the buffer makes reads fast. It is that the buffer makes the steady state allocate nothing, and that property is only worth anything if you do not undo it at the read. The speed was always in the allocation, and the trap is that the convenient read silently reintroduces the allocation you removed.

What this does not fix

One thing I want to name so the post does not oversell itself. Killing the per-frame allocation does not reduce how much the chart draws. The draw loop still walks every sample in the buffer, so a 10,000-sample buffer drawing into an 800-pixel chart is still rasterizing far more geometry than the display can resolve. That is a separate problem with a separate fix, decimating to pixel columns, and it is tracked as its own piece of work rather than smuggled into this one. The allocation fix and the overdraw fix are independent, and this post is only the first one.
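For readers who have not met the technique, here is roughly what decimating to pixel columns can look like. This is a generic min/max sketch and not the library's planned implementation: decimateMinMax is a hypothetical name, and it buckets by sample index, which assumes the samples are evenly spaced in time. The output arrays are caller-owned so the sketch stays consistent with the zero-allocation theme.

```typescript
/**
 * Reduce `count` ordered samples to at most `columns` (min, max) pairs,
 * one per pixel column. Returns the number of columns written.
 */
function decimateMinMax(
  values: Float64Array,
  count: number,
  columns: number,
  outMin: Float64Array,
  outMax: Float64Array,
): number {
  if (columns <= 0 || outMin.length < columns || outMax.length < columns) {
    throw new RangeError("output arrays must hold at least `columns` entries");
  }
  if (count <= 0) return 0;
  // With fewer samples than columns there is nothing to merge.
  const used = Math.min(columns, count);
  for (let c = 0; c < used; c++) {
    // Each column covers a contiguous run of sample indices [start, end).
    const start = Math.floor((c * count) / used);
    const end = Math.floor(((c + 1) * count) / used);
    let lo = Infinity;
    let hi = -Infinity;
    for (let i = start; i < end; i++) {
      const v = values[i];
      if (v < lo) lo = v;
      if (v > hi) hi = v;
    }
    outMin[c] = lo;
    outMax[c] = hi;
  }
  return used;
}
```

Drawing one vertical segment from min to max per column keeps spikes visible while cutting a 10,000-sample walk down to one segment per pixel column.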

The buffer ships in @altara/core as RingBuffer, with useRingBuffer(capacity) as the React hook that gives a component a stable instance for its lifetime. The benchmark, including the allocating-versus-reused read comparison, is in the repo and runs from one command, so the numbers above are yours to reproduce or disagree with on your own hardware.