Interesting Things from Webpack Sources

Jun 17, 2020

Many frontend projects use webpack as their bundler. It can transform, bundle, or package resources such as JavaScript, styles, images, and fonts. I recently explored the webpack sources and found several interesting solutions to programming problems that can be reused in other JavaScript projects.

Function context

Webpack is a highly extensible tool. Loaders let it process different types of content, such as CoffeeScript or CSS. Most loaders transform the input into JavaScript that webpack can handle. The simplest loader receives the input as an argument and returns the result as a string:

module.exports = function(content, map, meta) {
  return "module.exports = \"Hello world\"";
};

More complicated loaders often need to call asynchronous functions, control caching, or add dependencies. To do that, a loader has to communicate with webpack, but there is no explicit argument for this purpose. Loaders use a different approach: they call methods on the this context:

module.exports = function(content, map, meta) {
  const callback = this.async();
  someAsyncOperation(content, function(err, result) {
    if (err) return callback(err);
    callback(null, result, map, meta);
  });
};

The same approach was popular several years ago with jQuery:

$("li").each(function(index) {
  console.log(index + ": " + $(this).text());
});

The idea of providing this for a function call is one of the core ideas of JavaScript. E.g. it’s possible to bind the same callback to different links and determine which link was clicked using this:

document.querySelectorAll('a').forEach(a => a.onclick = onLinkClick);

function onLinkClick() {
  console.log(`link clicked: ${this.innerText}`);
}

A webpack-like approach for loaders can be implemented with code like this:

// Assume loader is a function described earlier

function processLoaderResult(err, result) {
  // do something with loader result
}

const context = {
  async() {
    this.callbackRequired = true;
    return (err, result) => {
      processLoaderResult(err, result);
    }
  }
}

const result = loader.call(context, "some module content");

if (!context.callbackRequired) {
  processLoaderResult(null, result);
}

It’s important to remember that arrow functions do not have their own this, and it can’t be changed with call, apply, or bind. That is why an arrow function can’t be used as a loader that relies on the loader context:

module.exports = content => {
  // this comes from the enclosing module scope, not from webpack
  // this.async === undefined
  const callback = this.async(); // TypeError: this.async is not a function
  someAsyncOperation(content, (err, result) => {
    if (err) return callback(err);
    callback(null, result);
  });
};
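
The difference can be checked without webpack. The sketch below is only an illustration: the context object is a hypothetical stand-in for the loader context, and both functions are made up for this example.

```javascript
// Hypothetical stand-in for the loader context that webpack provides.
const context = { name: "loader context" };

// A regular function receives `this` from the caller.
function regularLoader() {
  return this === context;
}

// An arrow function keeps the `this` of the scope where it was defined,
// so call/apply/bind cannot replace it.
const arrowLoader = () => this === context;

const regularSeesContext = regularLoader.call(context);
const arrowSeesContext = arrowLoader.call(context);

console.log(regularSeesContext); // true
console.log(arrowSeesContext); // false
```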

Changing the this context can be useful in frameworks and libraries, but in my own projects I prefer to minimize it. I think this is best kept for classes and objects. When I write a function, I pass its data as explicit arguments. If a function requires some sort of context, I pass it as the first argument:

function processRequest(context, request) {
  const session = context.db.findSessionForUser(request.user);
  // ...
}

In my opinion, such code is easier to understand (e.g. arguments can be described with JSDoc) and it’s possible to use arrow functions.

Tapable

Some webpack algorithms consist of two parts: backbone behavior and customizations. For example, adding an entry point performs some operations and then calls the addEntry hook:

this.hooks.addEntry.call(entry, options);

A hook is a way of defining an extension point. Plugins can bind callbacks to a hook and webpack will execute these callbacks at some point in its algorithm.
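
The mechanism can be sketched in a few lines. This is a simplified illustration and not Tapable's implementation: the SimpleSyncHook class, the plugin names, and the entry path are all hypothetical.

```javascript
// Simplified, hypothetical hook; Tapable's real hooks do much more.
class SimpleSyncHook {
  constructor() {
    this.taps = [];
  }

  // Register a named callback.
  tap(name, fn) {
    this.taps.push({ name, fn });
  }

  // Run every registered callback in registration order.
  call(...args) {
    for (const { fn } of this.taps) {
      fn(...args);
    }
  }
}

const addEntry = new SimpleSyncHook();
const log = [];

addEntry.tap("LoggerPlugin", entry => log.push(`logged ${entry}`));
addEntry.tap("StatsPlugin", entry => log.push(`counted ${entry}`));

addEntry.call("./src/index.js");
console.log(log); // [ 'logged ./src/index.js', 'counted ./src/index.js' ]
```

The code that calls the hook does not know which plugins are registered, which is what makes the extension point flexible.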

This approach makes it possible to customize different stages of the build process. Moreover, some internal code, such as RuleSetCompiler, consists almost entirely of hooks. Webpack uses the Tapable library to define and execute hooks.

Take a look at an example. Assume the task is to process user requests. There is some common logic and several independent actions that should be performed for each request. Code for this case can look like this:

const { SyncHook } = require("tapable");

class RequestProcessor {
  constructor() {
    this.hooks = {
      // Tapable expects the hook argument names as an array
      processRequest: new SyncHook(["request"])
    };
  }

  async processRequests() {
    while (true) {
      const request = await this.getRequest();
      logRequest(request);

      // run every callback tapped into the processRequest hook
      this.hooks.processRequest.call(request);
    }
  }
}

const requestProcessor = new RequestProcessor();

requestProcessor.hooks.processRequest.tap("email", (request) => {
  // send email
});

requestProcessor.hooks.processRequest.tap("db", (request) => {
  // save info to db
});

requestProcessor.hooks.processRequest.tap("stats", (request) => {
  // calculate some stats
});

requestProcessor.processRequests();

This is similar to event handling in the browser, where the developer adds several callbacks for an event on a DOM node and the browser calls them when the event occurs:

const btn = document.getElementById('main-btn');
btn.addEventListener('click', showBanner);
btn.addEventListener('click', calculateStats);

Tapable provides different types of hooks: synchronous, asynchronous, sequential, parallel, and so on. That is why it’s possible to embed hooks in almost every algorithm when required.

The idea of hooks looks reasonable for libraries and big projects but can be overkill for small ones: such code can be very difficult to debug because the connection between the event emitter and the event listener is hidden.

neo-async

Many projects nowadays use collection methods like forEach or map:

users.forEach(user => print(user));
const emails = users.map(user => user.email);

These methods work well for synchronous code, but they don’t wait for asynchronous callbacks to finish. One solution to this problem is the neo-async library:

const async = require('neo-async');

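// array of user ids
const userIds = [1, 2, 3];

// the iterator reports its result (or error) through the done callback
const iterator = (userId, done) => {
  fetchUserInfo(userId).then(
    info => done(null, info),
    err => done(err)
  );
};

async.map(userIds, iterator, (err, res) => {
  if (err) return console.error(err);
  console.log(res);
});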

neo-async is useful for callback-based code, but my code often uses Promises and async/await, so I prefer a different approach. When the order of execution is not important, I write code like this:

const userIds = [1, 2, 3];
const res = await Promise.all(userIds.map(fetchUserInfo));
console.log(res);

When the order is important and the functions should run one after another, I write code like this:

const userIds = [1, 2, 3];
const res = await userIds.reduce(async (promise, userId) => {
  const infos = await promise;
  infos.push(await fetchUserInfo(userId));
  return infos;
}, Promise.resolve([]));
console.log(res);
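
For comparison, the same sequential behavior can also be written with a for...of loop instead of reduce. This is an alternative sketch, not the approach described above; fetchUserInfo is replaced here with a hypothetical stand-in that resolves after a short delay, and fetchSequentially is a made-up name.

```javascript
// Hypothetical stand-in for fetchUserInfo: resolves after a short delay.
const fetchUserInfo = userId =>
  new Promise(resolve => setTimeout(() => resolve({ id: userId }), 10));

async function fetchSequentially(userIds) {
  const infos = [];
  for (const userId of userIds) {
    // The next request starts only after the previous one has finished.
    infos.push(await fetchUserInfo(userId));
  }
  return infos;
}

const sequentialResult = fetchSequentially([1, 2, 3]);
sequentialResult.then(infos => console.log(infos));
```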

needCalls

In NormalModuleFactory.js I found this helper:

const needCalls = (times, callback) => {
  return err => {
    if (--times === 0) {
      return callback(err);
    }
    if (err && times > 0) {
      times = NaN;
      return callback(err);
    }
  };
};

The idea is to call a callback once after several asynchronous operations have completed, or as soon as an error occurs:

const callback = needCalls(3, (err) => {
  if (err) {
    console.log(`Error found: ${err}`);
  } else {
    console.log("Operations completed");
  }
});

asyncOperation1(callback);
asyncOperation2(callback);
asyncOperation3(callback);

I think this helper can be very useful in some cases.
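
The assignment of NaN is what keeps the callback from firing twice after an error: once the counter is NaN, neither comparison in the helper can be true again. The sketch below calls the returned function synchronously to make this visible; the received array and the error messages are made up for the illustration.

```javascript
// Copy of the helper shown above.
const needCalls = (times, callback) => {
  return err => {
    if (--times === 0) {
      return callback(err);
    }
    if (err && times > 0) {
      times = NaN;
      return callback(err);
    }
  };
};

const received = [];
const done = needCalls(3, err => received.push(err ? err.message : "ok"));

done();                  // two calls remaining, nothing reported yet
done(new Error("boom")); // error: the callback fires, the counter becomes NaN
done();                  // ignored: NaN never equals 0
done(new Error("late")); // ignored: NaN > 0 is false

console.log(received); // [ 'boom' ]
```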

Recursion and stack

When a program calls a function, the JavaScript virtual machine saves the current position in the code, executes the function, and then returns to the saved position.

const result = add(1, 2);
console.log(result);

function add(a, b) {
  return a + b;
}

In the example above, the add function is called on the first line. At this moment JavaScript saves the position at line 1 and starts executing add. After the function returns a value, JavaScript returns to line 1 and continues execution.

The mechanism that stores these positions is called the call stack. The size of the call stack is limited, so a program with very deeply nested function calls can fail with a stack overflow error.

When a program processes graph structures, recursive algorithms are a natural choice. Each step of the recursion adds a new frame to the call stack, so if the graph is deep enough, a stack overflow error is possible.

It’s easy to check the maximum stack size using an example from Stack Overflow:

let i = 0;

setTimeout(function() {
  console.log(i);
}, 0);

function inc() {
  i++;
  inc();
}
inc();
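
The snippet above relies on the timer still firing after the uncaught error, which is what happens in a browser; in Node.js an uncaught error normally ends the process before the timer runs. A variant that catches the error works in both environments. This is a minimal sketch with a made-up function name, and the number it reports depends on the engine and on the size of each stack frame.

```javascript
function measureStackDepth() {
  let depth = 0;

  function inc() {
    depth++;
    inc();
  }

  try {
    inc();
  } catch (err) {
    // The engine throws once the call stack limit is reached;
    // depth now holds the number of nested calls that were made.
  }

  return depth;
}

const maxDepth = measureStackDepth();
console.log(maxDepth);
```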

Webpack can process a huge number of modules and dependencies while building the chunk graph, so an interesting solution to the stack overflow problem was introduced: recursion was replaced by a loop over a queue of actions. Webpack starts building the chunk graph from the entry points. When a new module is discovered, webpack adds it to the queue, and the loop later takes it from the queue and processes it. The queue plays the role of the call stack, but its size is limited only by the available memory rather than by the call stack limit.

Let’s solve a simple task with this approach. Assume there is a graph structure like this:

const graph = {
  value: 10,
  children: [
    {
      value: 5,
      children: [
        {
          value: 15,
          children: []
        }
      ]
    },
    {
      value: 20,
      children: [
        {
          value: 1,
          children: []
        }
      ]
    }
  ]
}

Each node has a positive integer value and an array of children. The task is to find the maximum value in such a graph.

A solution using recursion can look like this:

console.log(findMax(graph));

function findMax(node) {
  let maxValue = node.value;

  node.children.forEach(child => {
    maxValue = Math.max(maxValue, findMax(child));
  });

  return maxValue;
}

A solution using a manual stack can look like this:

console.log(findMax(graph));

function findMax(node) {
  let maxValue = node.value;
  const queue = [node];

  while (queue.length) {
    const queueItem = queue.pop();

    maxValue = Math.max(maxValue, queueItem.value);

    queueItem.children.forEach(child => {
      queue.push(child);
    });
  }

  return maxValue;
}
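
The difference between the two solutions shows up on a very deep graph. The sketch below builds a chain of one million nested nodes (an arbitrary depth chosen for the illustration) and runs both versions; the function names are hypothetical, and whether the recursive version fails depends on the engine's stack limit.

```javascript
// Build a chain of nested nodes without recursion; values run from 1 to depth.
function buildDeepGraph(depth) {
  const root = { value: 1, children: [] };
  let current = root;
  for (let i = 2; i <= depth; i++) {
    const child = { value: i, children: [] };
    current.children.push(child);
    current = child;
  }
  return root;
}

function findMaxRecursive(node) {
  let maxValue = node.value;
  node.children.forEach(child => {
    maxValue = Math.max(maxValue, findMaxRecursive(child));
  });
  return maxValue;
}

function findMaxIterative(node) {
  let maxValue = node.value;
  const stack = [node];
  while (stack.length) {
    const item = stack.pop();
    maxValue = Math.max(maxValue, item.value);
    item.children.forEach(child => stack.push(child));
  }
  return maxValue;
}

const deepGraph = buildDeepGraph(1000000);

// With default stack limits the recursive version is expected to throw here.
let recursiveFailed = false;
try {
  findMaxRecursive(deepGraph);
} catch (err) {
  recursiveFailed = true;
}

const iterativeMax = findMaxIterative(deepGraph);
console.log(recursiveFailed);
console.log(iterativeMax); // 1000000
```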

I think code with a manual stack is more difficult to read and understand, so use it only if recursion actually causes problems.

To conclude, I want to say that exploring the code of open source projects is very useful. Even if you never use these algorithms in your own software, it lets you look at everyday problems from another perspective and become a better developer.
