Logo

dev-resources.site

for different kinds of informations.

Weak memoization in Javascript

Published at
6/11/2024
Categories
javascript
web
state
Author
Anton Korzunov
Categories
3 categories in total
javascript
open
web
open
state
open
Weak memoization in Javascript

I decided to write this article to explain how a few things work as I felt a lack of understanding. Things somehow connected to state management and memoization.
We need to talk about something very foundational, as well as about abstractions we can build on top.

This time nothing will be left uncovered. This time you will master worry-free memoization in javascript 😎

Memoization is such a common thing in JS world nowadays, common and very important. With the recent release of Reselect 5.0 it also became a little easier than before. Easier not to break fragile selectors without the need to understand all nuances.

Reselect 5 was released in Dec 2023, but the idea behind the breakthrough solution was boiling for quite a while, for a few years at least.
The first "modern" implementation I was able to find - weak-memoize - was created around 8 years ago.

I have reason to believe that the extended adoption time was influenced by my less-than-stellar communication skills and the challenge of conveying the principles behind "Weak Memoization," which I originally shared five years ago in Memoization Forget-Me.

Let's give it another shot.
Let’s walk the full journey from something 💁‍♂️ simple to something quite 👨‍🔬 extreme.
This time I'll make you believe.

Level 1 - make it work

Let's write a few lines related to performing an operation “once”.
What would the simplest memoization be?



// some place to store results
let storedValue;
let storedArgument;

function memoizedCalculation(arg) {
  // is the arg equal to the known one?
  if(arg === storedArgument) {
    return storedValue;
  }

  // if not - let's calculate new value
  storedValue = functionToMemoize(arg);
  storedArgument = arg;
  return storedValue;
}

// usage
const value1 = memoizedCalculation(1);
const value2 = memoizedCalculation(2);
const valueN = memoizedCalculation(2); // will be memoized


Looks familiar - I reckon a lot of projects have code like this in a few places.

The only real problem - this code is not reusable and… 🤮 global variables that are so hard to control, especially in tests. I reckon the obvious next step is to make it a little more robust and introduce a level 2 solution

Level 2 - put it in a box

Handling global variables is a complex job, so would making them local improve the situation?
How we can make this code more "local" and more reusable?

The answer is "factories"



// reusable function you can "apply" to any other function
+function memoizeCalculation(functionToMemoize) {
   let storedValue;
   let storedArgument;
   return function memoizedCalculation(arg) {
       // is the arg equal to the known one?
       if(arg === storedArgument) {
         return storedValue;
       }

       // if not - let's calculate new value
       storedValue = functionToMemoize();
       storedArgument = arg;
       return storedValue;
   }
+}

// fist create a memoized function
+const memoizedCalculation = memoizeCalculation(functionToMemoize);

// usage
const value1 = memoizedCalculation(1);
const value2 = memoizedCalculation(2);
const valueN = memoizedCalculation(2); // will be memoized


Interesting parts - nothing really changed from the usage point of view and nothing really changed from the function implementation - 99% of the code has been preserved, we just put another function around.

While this implementation is screaming "I am simple" - this is exactly how memoize-one is written, so it is “simple enough” to handle any production use case 🚀

This is a solution you actually can use. May be add .clear method to make ease testing, but that's all.

More importantly - "the original reselect" was also working this way, and that is "not the best way". Nobody was happy.

Something drastically changed in reselect 5, so let’s move forward and try to find that precious change.

Boss fight - there could be only one

The problem with reselect as well as memoize-one is that “one” - a single local variable used to store the results of the last invocation. In no circumstances it could be more than one, it just made this way.

Why its "one"? Everything is connected to the "pattern" being used and one's ability to cleanup the results of previous memoization.
That is the difference between "caching"(with cache-key, storage limits and time-to-life) and memoization free of any "complications".
Memoization Forget-Me explains these moments in detail.

🤔 How we can improve this? Maybe it's not "how" or "what". May be it's "where"?
🤨 Let me ask you - where this local variable is stored?

The correct answer for JavaScript is:

💁‍♂️ "it’s stored in a function closure”, without defining what is a “closure”

A more helpful answer is

🧐 "the variable is stored in a function stack”. The very one that can overflow.

In technical terms stack is an execution context - it stores information about current function arguments, all its local variables and where the function should return execution once it is done. It's a section in a memory holding the real data.

Call stack
In the simplest situation calling a function "extends" stack and exiting a function "shrinks" it. Here is why calling to many functions can cause the stack overflow as it cannot grow endlessly.

However, in the Javascript world it is not so easy, because we have closures which can "see" variables from dead functions or in other words ➡️ the lifetime of a closure is unbound from the lifetime of its parent context. As a result "call stack" in JS is more alike a "graph of boxes" with personal stacks of different functions pointing to each other.

v8 context

💁‍♂️ So every function has a little box📦 with the data. Very neat.
Here is an article with a little deep dive to v8 execution context details.

Every time you call a function it is executed using a different stack, a different context, a different "box". Exactly like React hooks being statically defined in a Function are magically getting values from React Fiber, which is an execution context for a particular component.

just ask yourself - where useState holds value. There should be a special place for it. There is always a special place for something, even in Hell...

👏👏 execution context gives an ability to run the same function with different variables 👏👏

You know - that sounds like this, which works almost the same way in JavaScript - providing context. And keep in mind - the keyword we no longer use is not only about classes - this works for every operation around.



 myFunction.call(this, arg1);


What if we use js context (this) instead of CPU context (stack)

Loading level 2 - somewhere else

First of all, I need to point at one implementation detail of memoize-one which considers a potential side effect of a hidden function argument(this) potentially leading to different outcomes

Consider the following code



class MyClass {
  onClick(event) {
     return this.process(event, this.state)
  }

  render() {
    return <div onClick={this.onClick}/>
  }
}


You might forgotten how it works, but in short - it does not 🤣 as we are losing this by reading a shared value from it. To correct implementation one need to inline callback



class MyClass {
  onClick = (event) => {
     return this.process(event, this.state)
  }


In this case, every this.onClick would be a very unique and very own onClick. For me, it sounds like a memoization.

Wondering how this operation of assigning "class members" works 🤨? Typescript playground to the rescue. But to save a click...



class MyClass {
    constructor() {
        // just stores stuff in "this".
        this.onClick = (event) => {
            return this.process(event, this.state);
        };
    }
}


Looking back at the original render method reading this.onClick you might notice that a value of this matters.

So, let's imagine that this is always present. What about updating our memoization function a little bit?
What if we store our memoization results variable inside context, so we could have as many memoized results as we need?
Let’s do something very straightforward (and quite similar to the legacy React Context)



function memoizeCalculation(functionToMemoize) {
-   let storedValue;
-   let storedArgument;
   return function memoizedCalculation(arg) {
       // is the arg equal to the known one?
+      // let's store data not in the "function context"
+      // but the "execution context". 🤷‍♂️ why not? 
+       if(arg === this.storedArgument) {
+         return this.storedValue;
       }

       // if not - let's calculate new value
+       this.storedValue = functionToMemoize();
+       this.storedArgument = arg;
+       return this.storedValue;
   }
}


Now we have the ability to store more than one result because we store it in different contexts.



myFunction.call(object1, arg1); // will memoize
myFunction.call(object2, arg1); // will memoize SEPARATELY
myFunction.call(object1, arg1); // will reuse memoization
myFunction.call({storedValue:'return this', storedArgument:args1}, arg1); // will BYPASS 🙀 memoization


That would work, but would also create the same problem Legacy context had - naming conflicts. We used it at the last line above, it's cool, but please dont do that in the real code.
Can we do better? Well absolutely! One line will do the magic!



function memoizeCalculation(functionToMemoize) {
+   // a magic place for our data
+   const storeSymbol = Symbol();
   return function memoizedCalculation(arg) {
       // is the arg equal to the known one?
+       if(arg === this[storeSymbol]?.storedArgument) {
+         return this[storeSymbol]?.storedValue;
       }

       // if not - let's calculate new value
+       this[storeSymbol] = {   
+         storedValue: functionToMemoize(),
+         storedArgument arg,
+        }
+       return this[storeSymbol]?.storedValue;
   }
}


That looks great - we created a "magic key" to safely and securely store our data. Many solutions out there use a similar approach.
However, I am still not satisfied because we penetrate object boundaries and store information where we should not.

Could we do better?

Level 3 - The Better Place

🎉 this is the chapter where weak memoization starts!

In the example above we were using this to store information. We were storing this information directly inside this, but the task was always about tracking relations, nothing else.

information is related to this

  • this[key]=information
  • this[information]="related

Every time you encounter such a situation ask yourself - is there a better primitive to handle the job? In database terms, it could be a separate table storing relations, not another field on the thisTable

And indeed there is one primitive we could use here - Map, or to match our use case a WeakMap

WeakMap will hold information about A->B and automatically remove the record when A ceases to exist.
Semantically map.set(A,B) is equal to A[B]=true, but without hurting A

This is how our updates memoization will look like



function memoizeCalculation(functionToMemoize) {
+   // a magic store
+   const store = new WeakMap();
   return function memoizedCalculation(arg) {
       // is the arg equal to the known one?
+      const {storedArgument, storedValue} = store.get(this);
+       if(arg === storedArgument) {
+         return storedValue;
       }

       // if not - let's calculate new value
+      const storedValue = functionToMemoize();
+      // we do not "write into this", we "associate" data
+      store.set(this, {   
+         storedValue: functionToMemoize(),
+         storedArgument arg,
+        });
+       return storedValue;
   }
}


Now we store information related to this and functionToMemoize in some other place.

But the presence of context was an assumption, in reality is usually absent.
How we can change this approach to get at least some benefits from the given approach without relaying on this presence?

🤷‍♂️ Easy, just introduce another variable.



// "master default"
+ let GLOBAL_CONTEXT = {};
function memoizeCalculation(functionToMemoize) {
   // a magic store
   const store = new WeakMap();
   return function memoizedCalculation(arg) {
+      const context = this || GLOBAL_CONTEXT;
       // is the arg equal to the known one?
+      const {storedArgument, storedValue} = store.get(context);


And we can do another another step forward
What about automatic cleanup for all functions to ease testing (and help catching memory leaks).



let GLOBAL_CONTEXT = {};
// instantly invalidates all caches
export const RESET_ALL_CACHES = () =>GLOBAL_CONTEXT={};


or



+ export const GET_CONTEXT = () => userDefinedWay();
// ...
   return function memoizedCalculation(arg) {
+      const context = GET_CONTEXT();


So now we actually got this omnipresent context and more importantly - we can control the version of store exposed to the memoization by controlling this GLOBAL_CONTEXT.
Here we can even start using Asynchronous context tracking to have per-tread or per-test memoization separation out of the box.

And you know... I would only keep it and remove this we do not really use anymore 😎

🤑 BONUS: Just update GLOBAL_CONTEXT - all cache would be invalidated with all previous values garbage collected. Magic 🪄!

Cool, the code above is actually useful, but we are still quite far from anything useful and anything as good as reselect v5. Let’s follow to the next level!

Level 4 - cascading

Let’s make a little detour here. Reselect5 was not developed from the scratch, it was inspired by React.cache function, which is not deeply documented and not many developers really understand how it works.

So let’s check the code! Here is a link. And here is the first section of the code



export function cache(fn) {
  return function () {
    // try to read some "shared" variable, like `GET_CONTEXT` in the above
    const dispatcher = ReactSharedInternals.A;
    if (!dispatcher) {
      // If there is no dispatcher, then we treat this as not being cached.
      return fn.apply(null, arguments);
    }
    // and get some "fnMap", so alike to our "store"
    const fnMap = dispatcher.getCacheForType(createCacheRoot);


And it actually looks very close to the code we just written

  • some "omni present" variable - ReactSharedInternals.A gives us a pointer to the cache store
  • and we will "read" and "write" to it
  • there could be more than one such context at the same point of time, which is quite valuable for SSR being executed in a parallel
  • React.cache just stores result in different places for different renders.

But there is another important moment in React's implementation related to a "single result" being stored. React.cache uses cascade. And Reselect v5 does exactly the same, and memoize-weak does the same and kashe does the same



for (let i = 0, l = arguments.length; i < l; i++) {
      const arg = arguments[i];
      if (
        typeof arg === 'function' ||
        (typeof arg === 'object' && arg !== null)
      ) {
        // Objects go into a WeakMap
          cacheNode.o = objectCache = new WeakMap();
          objectCache.set(arg, cacheNode);
      } else {
        // Primitives go into a regular Map
          cacheNode.p = primitiveCache = new Map();
          primitiveCache.set(arg, cacheNode);
      }
    }


In some implementation null is rerouted to a const NULL={} object in order to allow WeakMap usage.

It creates a 🌳Tree-Like structure with nodes being WeakMap or Map, depending on the argument type.

Red-Black tree

Think about the RedBlack tree from above, with black nodes being Map and red nodes being WeakMap

One important downside of this approach could be found at the original issue

I use a WeakMap so that objects that can't be reached again get their cache entries garbage collected. Expandos aren't appropriate here because we also can GC the cache itself.

There are unfortunate cases like cachedFn('a', 'b', 'c', 'd', {}) will get its value GC:ed but not the path of maps to it. A smarter implementation could put the weakmaps on the outer side but it has to be more clever to ensure that argument index is preserved. I added some tests to preserve this in case we try this.

In the given example first arguments are not weak-mappable, so the cascade will be not garbage collected, but the values returned by a cached function - will.

  • Unless there is no weakmappable objects at all
  • But everything will be cleaned up once the current rendering ends and the "main symbol" get's removed.
  • so that's not an issue

🏆 The Cascade is what made Reselect 5 superior to all previous versions. Now it can store more than one value, because it's not a single local variable, its a tree 🏆

Level up - an extra ability

As long as we are using React as an example - there is one more feature connecting the dots between something we talked about - Taint API

taintObjectReference lets you prevent a specific object instance from being passed to a Client Component like a user object.

How does it work? Exactly like the "relations" from above (link to sources)



export const TaintRegistryObjects = new WeakMap();

export function taintObjectReference(
  message,
  object,
) { 
  // here it goes
  TaintRegistryObjects.set(object, message);
}

/// ...
const tainted = TaintRegistryObjects.get(value);
if (tainted !== undefined) {
    throwTaintViolation(tainted);
}


WeakMaps establishing relations and is helping with so many things.

So here it is

I hope you enjoyed this journey, and this evolution of concepts helped you better understand how things are connected and how different primitives can help you.

Go use this new knowledge!

Quick recap

  • every program need to store variables somewhere. Usually you don't control this moment, but sometimes you do
  • by controlling where values are stored you can create many different things - from weak memoization to react hooks
  • there are libraries our there, like the latest generation of reselect, that gives you superpowers to worry less about memoization
  • there are still some edge cases where you might need to have a special logic, for example with React separating memoization between different render threads

PS: And as I've mentioned React, hooks, and stack -
The future of React.use and React.useMemo points at the interesting discussion that could be totally understood differently once hooks using fiber as a "single stack" model will start using something more like closures, ie independent context.

Featured ones: