A recurring complaint about Nextcloud is how slow it can be for certain tasks. Why is Nextcloud slow? That is a very general question, and Nextcloud is not always that slow. In fact, it has improved vastly over the past couple of years: newer PHP versions keep getting faster, Nextcloud developers are now more careful to streamline assets so that loading the web interface doesn't take dozens of requests, and so on.
Still, things like uploading many small files or browsing the Gallery are very slow; too slow compared with the alternatives.
These two use cases have something in common: many requests need to be processed in parallel, and this is precisely where Nextcloud struggles.
There are two separate reasons for this, both rooted in the architecture of Nextcloud and of traditional PHP projects in general.
The process of a traditional PHP request
In the classic PHP architecture, the service is not really running while idle. It is the HTTP server that listens on the network, detects when a request needs PHP processing and passes it over to the PHP interpreter.
Over time, the way the HTTP daemon serves PHP requests has evolved. Nowadays PHP-FPM is the most common choice: a separate service that implements the FastCGI interface. The HTTP server forwards the request to PHP-FPM (Apache does this with mod_proxy_fcgi), and PHP-FPM hands it to one of its worker processes, spawning a new one if its pool configuration calls for it. The worker interprets the PHP code until the request has been served, and then either exits or goes idle waiting for the next request.
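This means that with every HTTP request the application (Nextcloud in this case) starts over: even when the worker process is reused, PHP throws away all application state at the end of each request. So for every request Nextcloud needs to parse its configuration file, initialize its variables, do its work and clean up.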
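The cost difference is easy to see in a toy model. The sketch below is not Nextcloud code: a small JSON file stands in for config.php, and the thumbnail path layout is made up. It simply counts how often the configuration gets parsed when every request starts from scratch, compared with a service that initializes once and then keeps running.

```python
import json
import os
import tempfile


class ConfigLoader:
    """Counts how often the stand-in configuration file is parsed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.loads = 0

    def load(self) -> dict:
        self.loads += 1
        with open(self.path) as f:
            return json.load(f)


def share_nothing_request(loader: ConfigLoader, thumb_id: int) -> str:
    # Classic PHP model: every request re-parses the config and rebuilds state.
    config = loader.load()
    return f"{config['datadirectory']}/thumbnails/{thumb_id}.png"


class PersistentApp:
    """Long-running model: state is built once and reused by every request."""

    def __init__(self, loader: ConfigLoader) -> None:
        self.config = loader.load()

    def handle(self, thumb_id: int) -> str:
        return f"{self.config['datadirectory']}/thumbnails/{thumb_id}.png"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w") as f:
            json.dump({"datadirectory": "/srv/data"}, f)

        per_request = ConfigLoader(path)
        for i in range(50):  # e.g. 50 thumbnails in one Gallery view
            share_nothing_request(per_request, i)

        persistent = ConfigLoader(path)
        app = PersistentApp(persistent)
        for i in range(50):
            app.handle(i)

        print(per_request.loads, persistent.loads)  # 50 1
```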
Compare that to a traditional C or C++ service such as Apache itself, which loads its configuration, allocates memory for its main structures and then stays listening for incoming connections. Such a service only needs to serve each request; it doesn't have to start over from zero every time.
Staying with the Apache example for a bit longer, the classic way for it and many other services to operate is to fork a new child process every time a request arrives, so that the child handles it while the parent goes back to listening on the socket.
The issue with this is that spawning a new process (or thread) is computationally expensive, and on top of that the OS has to context-switch between all of them so that each gets its share of processing time. It also reserves a lot of memory, since each forked child starts with a copy of the parent process memory in copy-on-write (COW) mode. The memory isn't actually copied until it diverges between the processes, but unless overcommit is enabled it is accounted for twice.
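A minimal sketch of the fork-per-request model, assuming a POSIX system (it relies on os.fork). A socketpair stands in for each accepted client connection, and every request is answered by its own child process, which reports its PID so you can see that each one ran separately. Apache's real prefork MPM keeps a pool of pre-forked children instead of forking on every request; this only illustrates the basic pattern described above.

```python
import os
import socket


def handle(conn: socket.socket) -> None:
    """Read the whole request, answer it, and report which process did the work."""
    request = b""
    while chunk := conn.recv(1024):
        request += chunk
    conn.sendall(b"handled " + request + b" in pid " + str(os.getpid()).encode())


def fork_per_request(requests: list[bytes]) -> list[bytes]:
    """Fork-per-request sketch (POSIX only): one child process per request."""
    children = []
    for payload in requests:
        # A socketpair stands in for an accepted client connection.
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            # Child: runs on a copy-on-write view of the parent's memory.
            try:
                parent_end.close()
                handle(child_end)
                child_end.close()
            finally:
                os._exit(0)  # never fall back into the parent's loop
        # Parent: pass the request along and go straight back to "listening".
        child_end.close()
        parent_end.sendall(payload)
        parent_end.shutdown(socket.SHUT_WR)  # signal the end of the request
        children.append((pid, parent_end))

    replies = []
    for pid, conn in children:
        reply = b""
        while chunk := conn.recv(1024):
            reply += chunk
        conn.close()
        os.waitpid(pid, 0)  # reap the child so it doesn't linger as a zombie
        replies.append(reply)
    return replies
```

Each call to `fork_per_request` pays for one `fork()` per request, plus the scheduling of all those processes, which is exactly the overhead the next model avoids.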
It turns out to be more efficient to keep working on a single thread, or perhaps a small thread pool, to avoid those costs: no spawning and limited context switching. This model relies on OS readiness-notification primitives such as epoll, which let a single thread wait on many connections at once and handle each event as it becomes ready, one after another. Libraries such as libevent, libev and libuv wrap these primitives and handle events such as reads, writes and timers in a loop where the library user only needs to register callbacks. This is called an event loop architecture.
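Python's standard selectors module exposes this same pattern (its DefaultSelector uses epoll on Linux), which makes for a compact sketch of an event loop. The EventLoop class and serve_all function are illustrative names, and socketpairs again stand in for client connections: one thread registers a callback per connection and dispatches each one as it becomes readable, with no forking at all.

```python
import selectors
import socket
from typing import Callable


class EventLoop:
    """Single-threaded event loop: wait for readiness, then run the callback."""

    def __init__(self) -> None:
        # DefaultSelector picks the most efficient primitive available:
        # epoll on Linux, kqueue on BSD/macOS, falling back to poll/select.
        self.selector = selectors.DefaultSelector()

    def on_readable(self, sock: socket.socket, callback: Callable[[socket.socket], None]) -> None:
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def remove(self, sock: socket.socket) -> None:
        self.selector.unregister(sock)

    def run(self) -> None:
        # Dispatch events until no sockets remain registered.
        while self.selector.get_map():
            for key, _mask in self.selector.select():
                key.data(key.fileobj)  # key.data holds the registered callback


def serve_all(requests: list[bytes]) -> list[bytes]:
    """Answer many concurrent requests from one thread, with no forking."""
    loop = EventLoop()

    def handle(conn: socket.socket) -> None:
        request = conn.recv(1024)
        conn.sendall(b"thumbnail for " + request)
        loop.remove(conn)
        conn.close()

    clients = []
    for payload in requests:
        client, server = socket.socketpair()  # stands in for an accepted connection
        loop.on_readable(server, handle)
        client.sendall(payload)
        clients.append(client)

    loop.run()
    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    return replies
```

For instance, `serve_all([b"1", b"2", b"3"])` answers all three requests from the same thread, in whatever order they become readable.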
This is where Nginx burst onto the scene: it uses precisely this architecture and achieved better performance as a result. Apache eventually adopted the same strategy with the event MPM, and many others followed.
Nextcloud is a traditional PHP application; it is like Apache before the event MPM. Think of opening the Gallery: lots of requests are sent to the server at once to retrieve the dozens of thumbnails that need to be painted on the screen.
For every one of those thumbnails, Nextcloud loads the configuration file, parses the environment, loads all Nextcloud apps with all their hooks, and goes through security protections such as CSRF checks, token validation and so on.
So the first thing to do to improve the situation is to load as little as possible for every request; in other words, make each request as light as possible. This can be done without touching the architecture.
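One way to make requests lighter is lazy initialization: instead of bootstrapping every installed app up front, load an app only when the current route actually needs it. The sketch below is hypothetical; the registry, the route names and their dependencies are invented for illustration and do not reflect how Nextcloud registers its apps.

```python
from typing import Callable


class LazyAppRegistry:
    """Hypothetical app registry: an app is initialized only when first needed."""

    def __init__(self, factories: dict[str, Callable[[], object]]) -> None:
        self._factories = factories
        self._loaded: dict[str, object] = {}

    def get(self, name: str) -> object:
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()  # pay the setup cost once
        return self._loaded[name]

    def loaded(self) -> list[str]:
        return sorted(self._loaded)


# Hypothetical mapping from route to the apps it actually depends on.
ROUTE_DEPENDENCIES = {
    "thumbnail": ["files", "previews"],
    "calendar": ["files", "calendar", "contacts"],
}


def handle_request(route: str, registry: LazyAppRegistry) -> list[str]:
    """Touch only the apps this route needs; everything else stays unloaded."""
    for app in ROUTE_DEPENDENCIES[route]:
        registry.get(app)
    return registry.loaded()


if __name__ == "__main__":
    installed = ["files", "previews", "calendar", "contacts", "mail", "talk"]
    registry = LazyAppRegistry({name: (lambda n=name: f"<{n} app>") for name in installed})
    print(handle_request("thumbnail", registry))  # ['files', 'previews']
```

In a per-request PHP model the registry is rebuilt on every request anyway, so the gain comes purely from skipping the apps a thumbnail request never uses.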
The real improvement, though, would come from taking the step the rest of the industry has already taken: Nextcloud should adopt the event loop model.
This way, the service is already running when the request arrives. The environment is already loaded and we are waiting in the event loop to dispatch requests without needing to initialize and tear down for each one of them. In this architecture, retrieving each thumbnail would mostly boil down to a database access and a file system access.
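To picture the event loop version, here is a minimal asyncio sketch of a long-running thumbnail service. Everything in it is hypothetical: an in-memory SQLite table plays the role of Nextcloud's database, small files play the thumbnails, and a line-based protocol replaces HTTP. The setup happens once at startup; after that, each request is just one query and one file read.

```python
import asyncio
import os
import sqlite3
import tempfile


class ThumbnailService:
    """Hypothetical long-running service: state is set up once at startup."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = data_dir
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE previews (file_id INTEGER PRIMARY KEY, path TEXT)")

    def add(self, file_id: int, content: bytes) -> None:
        path = os.path.join(self.data_dir, f"{file_id}.png")
        with open(path, "wb") as f:
            f.write(content)
        self.db.execute("INSERT INTO previews VALUES (?, ?)", (file_id, path))

    def lookup(self, file_id: int) -> bytes:
        # Per request: one database query and one file read, nothing else.
        row = self.db.execute(
            "SELECT path FROM previews WHERE file_id = ?", (file_id,)
        ).fetchone()
        if row is None:
            return b""
        with open(row[0], "rb") as f:
            return f.read()

    async def handle(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        line = await reader.readline()
        writer.write(self.lookup(int(line.decode().strip())))
        await writer.drain()
        writer.close()
        await writer.wait_closed()


async def fetch(port: int, file_id: int) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"{file_id}\n".encode())
    await writer.drain()
    data = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return data


async def demo() -> list[bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        service = ThumbnailService(tmp)
        for i in range(5):
            service.add(i, f"thumb-{i}".encode())
        server = await asyncio.start_server(service.handle, "127.0.0.1", 0)
        port = server.sockets[0].getsockname()[1]
        async with server:
            # Many concurrent requests, all served by the same running process.
            return await asyncio.gather(*(fetch(port, i) for i in range(5)))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The query and the file read here are blocking calls running on the loop thread, which is acceptable for a sketch; a production service would move them to asynchronous drivers or a worker pool so that one slow disk read doesn't stall every other request.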
This is obviously a huge paradigm shift, but it is one that should be taken seriously if Nextcloud wants to remain relevant. To anybody who thinks this is too costly, I would just like to cite the sunk cost fallacy.
Check out this post to read more about other Gallery performance issues.