For example, I think all the algorithms discussed in this article require knowing the graph topology up front. Even dynamic dependencies need to be known ahead of time, unless I'm misreading.
However, take Vue's reactivity for example. Here's a simple input and output:
const input = ref(1);
const output = computed(() => ref.value + 1);
Without actually attempting to evaluate the closure that defines `output`, there's no way of knowing what it depends on.*My working understanding of such systems, which I think are more or less similar to all the new "signals" libraries popping up in JS, is that they are... Pull-push systems? First, the UI code (i.e. the HTML or JSX template functions) request the value of an output. The reactivity system evaluates the graph (pull), recording dependencies as it goes.
Then, later, when an input changes, it can use the graph it built up to update (push) dirty states and work out which of the currently-live output nodes need to be re-evaluated.
*The only other approach would be to analyse the syntax of the source code itself to create that static dependency graph. Which I understand is what e.g. Svelte does.
With the variations on the push algorithm, you do kind of need to know the graph topology ahead of time, at least to be able to traverse it efficiently and correctly (this is the topological sorting thing). But for pull (and therefore for push-pull), the dependencies can be completely dynamic - in a programming language, you can call `eval` or something, or in a spreadsheet you could use `indirect` to generate cell references dynamically. For push-pull specifically, when you evaluate a node you would generally delete all of its upstream connections (i.e. for each cell it depends on currently, remove that dependency) and then rebuild that node's connections to the graph while evaluating it.
Signals libraries are exactly where I found this concept, and they work basically like you describe. I think this is a big part of what makes signals work so well in comparison to, say, RxJS - they're basically the same concept (here's a new reactive primitive, let's model our data in terms of this primitive and then plug that into the UI, so we can separate business logic from rendering logic more cleanly), but the behaviour of a signal is often easier to understand because it's not built from different combinators but just described in "ordinary" code. In effect, if observables are monads that need to be wired together with the correct combinators, then signals are do-notation.
I've been working in the context of reactivity on the backend where you're often loading up the calculations "from scratch" in a request.
I agree with your monad analogy! We looked into using something like Rx in the name of not reinventing the wheel. If you build out your calculation graph in a DSL like that, then you can do more analysis of the graph structure. But as you said in the article, it can get complicated. And Rx and co are oriented towards "push" flows where you always need to process every input event. In our context, you don't necessarily care about every input if the user makes a bunch of changes at once; it's very acceptable to e.g. debounce the end result.
But after working with Svelte 5 daily for a few months, I don't think I like the implicitness. For one, reactivity doesn't show up in type signatures. So I have to remember whats reactive and what's not. Another is that making plain values (string, numbers, etc) reactive is really a trap. The reactivity doesn't cross file boundaries so you have to do weird things like exporting a getter / setter instead of the variable.
I agree that not seeing reactivity in the type system can be irritating. In theory, you can wrap reactive elements in `Computed` objects (Angular's signals have this, I believe) so you can follow them a bit better, but the problem is that you can still accidentally end up with implicitly reactive values, so it only works as a kind of opt-in "here be reactivity" signal, and you can't guarantee that just because you can't see a `Computed`, that reactivity has been ruled out.
That said, I find I eventually built up a good intuition for where reactivity would be, usually with the logic that functions were reactive and single values weren't, kind of like thunks in other contexts. For me, at least, it feels much simpler to have this implicit tracking, because then I don't need to define dependencies explicitly, but I can generally see them in my code.
There's some great discussion over on lobste.rs (https://lobste.rs/s/2zk3oe/pushing_pulling_three_reactivity), but I particularly recommend this link that someone posted there to a post that covers much of the same topic but in a lot more detail with reference to various existing libraries and tools and how they do things: https://lord.io/spreadsheets/
The format made for good lunchtime reading -- the care you put into making it easily readable shows. Are the illustrations actually hand-drawn? Looking forward to the next part(s) that you hinted at!
Still kind of WIP, but it isn't secret. People are welcome to check it out at https://gitlab.com/samsartor/hornpipe
It's worth considering that the dataflow graph (as an abstract mathematical graph), the computation graph (the partial order of function execution required to compute the data), the traversal strategy, the runtime representation of the graph, the runtime data structure for the graph, and the runtime data structures for efficient reactive update are all separate but related aspects.
For instance, push and pull are both directed graphs. They have the same connectivity, but the direction of the arrows is reversed. You can only efficiently traverse edges in the direction that you represent. A dataflow graph has edges pointing from sources to sinks, a data dependency graph has edges pointing from sinks to sources. [Side note: if a computation can produce multiple results the data dependency graph and the computation dependency graph are not exactly the same thing and you need to be clear on the distinction, but I am assuming here single-output nodes]. In a dataflow graph you want to evaluate the changed nodes prior to evaluating the downstream nodes that depend on them. As TFA states, this necessitates a postorder (children first) traversal of the data dependency graph, starting at all terminal sinks, and terminating at sources or already visited nodes. You can use a sense-reversing "visited" flag on each node to avoid a reset pass. As noted in the article this traversal need only be performed when the graph topology changes. But for stable traversal order the topological sort can be cached in an array. Needless to say that arrays are much faster to iterate over than any kind of pointer chasing. [Witness the rise of Entity-Component systems over OO models]. I suspect that there is a cut-over point where it is more efficient to iterate the entire array (perhaps with memoized results, or JIT compilation) than to perform a more surgical "update only what is downstream of the changes" approach. Another approach is to assign all nodes a contiguous integer id, and maintain a dirty node bitmask where bit-indices correspond to node ids. In addition, each source has a bitmask that is 1 for all downstream dependent nodes. When a source changes, bitwise-or source.downstream_dependents bitmask with the global dirty_nodes bitmask. To evaluate (not necessarily immediately), iterate in topological order processing only the dirty nodes. In any case, the point I'm trying to make is that the data structure that is best for building or manipulating the graph could very well be different from the data structure that is best for computing the desired results. There will be trade-offs to be made. For this reason alone it's best to keep the graph-theoretic properties and the implementation data structures separate in your head.
In my view the interesting requirements raised by the article are (1) lazy evaluation (e.g. of expensive or conditionally required data). This might be where control flow graphs of basic blocks enter the story. and, (2) dynamic reconfiguration during node evaluation. Some questions I'd be asking about dynamic reconfiguration are: what happens if you delete a node that has yet to be evaluated? will new subgraphs "patched in" to the existing graph (how exactly?), or are they always disconnected components that can be evaluated after the current graph traversal completes?