Node.js Asynchronous I/O Execution

July 30, 2023

As far as I understand, although there is apparently a 'helper' thread, Node.js runs in a single thread. Therefore, each operation in the Event Loop stack runs one after another an

Solution 1:

Node is, essentially, non-multithreaded. The asynchronicity goes deeper than Node, deeper than libuv, and even deeper than the facilities (epoll, kqueue, IOCP, etc.) that libuv uses.

When the kernel gets an async request, it does not fire up another thread of execution. Instead, it adds the request to a simple list of "things to watch out for." If a process makes a network read request, for instance, the kernel will make an entry on that list. It's something like "hey, next time data arrives that matches this request, let the process know about it." After making this entry, the kernel returns control to the process and both go on their merry way. The only thing that survives is the data on the list.

The kernel is informed of a network read event via hardware interrupts. Using an interrupt, the processor yanks the kernel into a special loop -- stopping anything it's doing at the moment -- and tells it about the event. The kernel then checks against its list of outstanding requests, and (in the kevent AIO case) sends a similar interrupt (in the form of a signal) to the process to let it know about the network read. So, no threads. Just interruptions.

Well, this is a bit of a simplification: in the non-AIO kevent and epoll cases, after the kernel gets a network read it'll just put it on an event list. The process periodically checks that event list to see if something came in for it.
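The bookkeeping described above can be sketched as a toy model in plain JavaScript. This is not how any real kernel is implemented, and `watchList`, `eventList`, `requestRead`, `onInterrupt`, and `poll` are hypothetical names made up for illustration. The only point is that no thread is created anywhere: a request becomes an entry in a list, an interrupt moves it to an event list, and the process picks it up the next time it polls.

```javascript
// Toy model only: every name here is hypothetical, not a kernel or libuv API.
const watchList = new Map(); // "things to watch out for", keyed by socket id
const eventList = [];        // events waiting for the process to poll

// The process asks for a read. Nothing blocks and no thread is started:
// the "kernel" records the interest and returns immediately.
function requestRead(socketId, callback) {
  watchList.set(socketId, callback);
}

// The hardware tells the "kernel" that data arrived for a socket.
// If someone asked about that socket, queue an event for them.
function onInterrupt(socketId, data) {
  const callback = watchList.get(socketId);
  if (callback === undefined) return; // nobody is waiting on this socket
  watchList.delete(socketId);
  eventList.push({ callback, data });
}

// The process checks the event list (the epoll / non-AIO kevent style)
// and runs the callbacks for whatever came in.
// Returns the number of events it handled.
function poll() {
  let handled = 0;
  while (eventList.length > 0) {
    const { callback, data } = eventList.shift();
    callback(data);
    handled += 1;
  }
  return handled;
}

const received = [];
requestRead(7, (data) => received.push(data));
poll();                  // nothing has arrived yet, so nothing runs
onInterrupt(7, 'hello'); // data shows up; only the lists change
poll();                  // now the callback runs, on the same thread
console.log(received);   // [ 'hello' ]
```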

Also, from the kernel view, this is how all I/O works. The big difference is that the kernel isn't requiring the process to wait for the kernel to get back to it.

There is actually a little more complexity in libuv as non-network requests (and DNS requests, which are special, painful forms of network requests) are handled by threads. This is because the kernel facilities for making those asynchronous are generally not so great, if they exist at all.
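As a small illustration, `dns.lookup()` is one of the calls that the Node.js documentation describes as running on libuv's thread pool. From JavaScript it still looks like any other asynchronous call: it returns at once and the callback fires later on the main thread. The hostname `localhost` and the `events` array are just choices made for this sketch.

```javascript
const dns = require('node:dns');

const events = [];

// dns.lookup() hands the blocking resolver call to a libuv worker thread.
dns.lookup('localhost', (err, address) => {
  // Back on the main JavaScript thread once the worker has finished.
  events.push(err ? `lookup failed: ${err.code}` : `resolved to ${address}`);
  console.log(events);
});

// Runs first: the lookup call returned without waiting for the result.
events.push('lookup requested');
```

The pool defaults to four threads, and the `UV_THREADPOOL_SIZE` environment variable changes that.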

Solution 2:

It's not multithreaded, with one minor exception. Let's talk about that exception later. First, let's see why things can happen in parallel without being multithreaded.

Waiting for I/O

When you send something over the network (for example, making an HTTP request, waiting for an HTTP response to complete, or waiting for MySQL to respond), your software is not doing anything.

So, how can the network work if your software is not doing anything? (Sorry if this is obvious, but pointing out the obvious is usually what makes it obvious.)

Some things happen outside the CPU

First, most of the network is outside your CPU. You have your network card buffering data going out and coming in. You have the electrons in the network cable or photons in space/air vibrating according to the signal sent by networking equipment. You have your router, your ISP, the server on the other side of the planet etc.

All of the above needs to happen in order for your HTTP request to return data. In most languages, with ordinary blocking calls, your code will not be doing anything while all of this is happening.

Not so in JavaScript. When an I/O request is made, instead of waiting for the data to return, the interpreter simply registers a callback you supply, to be executed when the data finally gets here. Once that's done, other code can execute. Maybe some other data requested earlier is now here and that callback can be executed. Maybe a setTimeout has expired and it's time to call that callback.
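A minimal sketch of that ordering, using setTimeout as a stand-in for an I/O request (the `log` array is only there to record what ran when):

```javascript
const log = [];

// Register a callback for "later"; setTimeout itself returns immediately.
setTimeout(() => {
  log.push('timer callback');
  console.log(log); // [ 'main script continues', 'timer callback' ]
}, 10);

// This line runs before the callback, because nothing above waited.
log.push('main script continues');
```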

So multiple things can happen in parallel: most of them outside your process, a lot of them outside your CPU, some of them on another machine which may even be on the other side of the planet. While that's going on, JavaScript allows you to run some code.
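To see the waiting overlap, here is a rough sketch with three pretend requests. `fakeRequest` is a hypothetical helper that uses a timer in place of real network I/O, and the names and the 100 ms delay are arbitrary. If the waits happened one after another, the total would be around 300 ms. Because they overlap, it should come out close to 100 ms.

```javascript
// Hypothetical stand-in for an I/O request: resolves after `ms` milliseconds.
function fakeRequest(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

const started = Date.now();

// Start all three "requests" before waiting for any of them.
const allDone = Promise.all([
  fakeRequest('http', 100),
  fakeRequest('mysql', 100),
  fakeRequest('cache', 100),
]).then((names) => {
  const elapsed = Date.now() - started;
  console.log(names, `finished in about ${elapsed} ms`);
  return { names, elapsed };
});
```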

The exception: disk I/O

The exception to this is disk I/O. At the lowest level (actually, next-to-lowest), C exposes only synchronous functions like fread() and fwrite() for I/O. Reading network packets is also technically synchronous. The difference is that the network doesn't respond immediately, so the networking code has lots of time to spare while waiting for packets. It is in between these reads and writes that JavaScript runs your code. But the filesystem will happily tell you that data is immediately available. So, unlike the networking code, the code reading from disk will spend most of its time being busy.

There are several workarounds for this. Some operating systems even have asynchronous APIs for reading from disk. The Node.js developers decided to handle this situation by handing disk I/O to separate threads (a thread pool inside libuv). So disk I/O is parallel because it is multithreaded.
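Here is a small sketch of the difference as seen from JavaScript. The temp file name and the `steps` array are assumptions made for the example. `fs.readFileSync` keeps the single JavaScript thread busy until the data is back, while `fs.readFile` hands the read off and lets the script carry on.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical scratch file so the example has something to read.
const file = path.join(os.tmpdir(), `node-io-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from disk');

const steps = [];

// Synchronous read: the JavaScript thread is stuck here until the data
// is back, so nothing else can run in the meantime.
const text = fs.readFileSync(file, 'utf8');
steps.push(`sync read returned ${text.length} characters`);

// Asynchronous read: the work is handed off and this call returns at once.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  steps.push('async callback ran');
  fs.unlinkSync(file); // clean up the scratch file
  console.log(steps);
});

// Runs before the async callback above.
steps.push('main script continues');
```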

JavaScript Guy