The real power of Linux executables

What happens when a file gets executed in Linux? What does it mean that a file is executable? Can we only execute compiled binaries? What about shell scripts then? If I can execute shell scripts, what else can I execute? In this article we will try to answer those questions.

 

What executing a file involves

 

Starting from the basics, let’s try to understand what happens when we launch a program from our terminal.

The way our programs in the userspace interact with the kernel is through system calls. In practice we need to interact with the kernel to do almost anything that is interesting, such as printing output, reading input, reading files and so on.

The syscall that is in charge of executing files is the execve() system call. When we are coding, we normally access it through the exec family of functions present in the standard library, or even more commonly through higher level abstractions such as popen() or system().

It is important to note that when we execute a file through execve(), no new process is created. Instead, the calling process mutates into a running instance of the new executable. The PID won’t change, but the kernel replaces the machine code, data, heap and stack of the process.

This is different from the way we are used to launching executables from the terminal. When we type sleep 30 in the terminal, we get a child process of bash, and bash itself does not disappear.

Here another system call is coming into play, the fork() syscall. bash will first create a copy of itself in another child process, and then this child process will call execve() in order to transform itself into sleep. This way bash doesn’t disappear, and will be there to take over control when sleep dies after 30 seconds.
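The fork-then-exec sequence can be reproduced outside of bash. What follows is only a minimal sketch in Python, using the thin wrappers os.fork() and os.execv() around the underlying syscalls; the helper name fork_and_exec is made up for this illustration. It forks, lets the child replace itself with another program, and collects what that program printed.

```python
import os
import sys

def fork_and_exec(argv):
    """Fork, exec argv in the child, and return (child_pid, output, exit_code).

    Hypothetical helper: it mimics what a shell does for a foreground command.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stdout into the pipe, then replace the process image.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.execv(argv[0], argv)  # does not return on success
        except OSError:
            pass
        os._exit(127)  # only reached if the exec failed
    # Parent: still alive, like bash waiting for sleep to finish.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return pid, output, os.WEXITSTATUS(status)
```

Calling fork_and_exec([sys.executable, "-c", "import os; print(os.getpid())"]) makes the new program report its own PID, which should equal the PID that fork() handed back to the parent: the process was replaced, not recreated.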

We can skip the forking step through the exec bash builtin, for instance with exec sleep 30.

In another terminal we can see that sleep takes over the PID

And sure enough, after 30 seconds when sleep exits there is no bash session, so the terminal window will close.

The kernel side of things

 

So far we have seen userspace interacting with the kernel through syscalls. Let’s look at what happens on the other side.

The implementation of those syscalls lives in the kernel. Broadly speaking, the execve() syscall asks the kernel to execute a certain file on disk, and the kernel needs to load that file into memory where it can be accessed by the CPU.

The entry point of the system call is in fs/exec.c.

From here, the kernel first performs all required preparations in order to start executing a binary, such as setting up the virtual memory for the process, and setting the filename, command line arguments, and inherited environment. Yes, environment variables are also first class citizens of the Linux kernel.

Finally, the file “will be executed”.

ELF binaries

 

Binaries are not just a big blob of machine code. Modern binaries use the ELF format, which stands for Executable and Linkable Format. In simple terms, this is a way of packaging the different code and data sections with some attributes, such as write protection for the .text code section, so that they get mapped into virtual memory for execution. In addition, there are machine-independent headers that provide basic information about the executable, such as whether it is statically or dynamically linked, the architecture and so on.
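To make those headers a bit more tangible, here is a small sketch that decodes a few of the machine-independent fields from the first bytes of an ELF file. It is written in Python rather than kernel C, the function name describe_elf is invented for this example, and the lookup tables cover only a handful of common values.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object or PIE", 4: "core dump"}
ELF_MACHINE = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}  # small subset

def describe_elf(header):
    """Decode a few fields from the first 20 bytes (or more) of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, "other"),
        "machine": ELF_MACHINE.get(e_machine, "other (%d)" % e_machine),
    }
```

It can be fed the start of any binary, for example describe_elf(open(path, "rb").read(64)) with path pointing at some executable on your system.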

The kernel code responsible for parsing the ELF format lives in fs/binfmt_elf.c. Here, the ELF headers are read and analyzed, and the PT_LOAD segments are loaded into virtual memory.

This is just slightly more convoluted for dynamically linked programs. The kernel recognizes a dynamically linked program by the presence of a PT_INTERP program header.

This header is hardcoded at build time, by the linker, with the path of the runtime linker ld-linux-x86-64.so that needs to be used to run the binary. The runtime linker will find in the filesystem the .so libraries that the binary needs to execute, and will load them into memory.
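The interpreter path can be dug out of a binary by hand. The sketch below walks the program headers of an ELF image held in memory and returns the string stored in the PT_INTERP segment, or None for a statically linked file. It is an illustration in Python under the assumption that the whole file fits in memory; find_interpreter is a name chosen for this example, not a kernel or library function.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path

def find_interpreter(image):
    """Return the PT_INTERP path of an ELF image (bytes), or None if absent."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = image[4] == 2
    order = "<" if image[5] == 1 else ">"
    if is_64:
        (e_phoff,) = struct.unpack_from(order + "Q", image, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(order + "HH", image, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(order + "I", image, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(order + "HH", image, 0x2A)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from(order + "I", image, base)
        if p_type != PT_INTERP:
            continue
        if is_64:
            (p_offset,) = struct.unpack_from(order + "Q", image, base + 8)
            (p_filesz,) = struct.unpack_from(order + "Q", image, base + 32)
        else:
            (p_offset,) = struct.unpack_from(order + "I", image, base + 4)
            (p_filesz,) = struct.unpack_from(order + "I", image, base + 16)
        # The segment content is a NUL-terminated path.
        return image[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Running it on the bytes of a dynamically linked program should give the same path that the kernel would load as the ELF interpreter.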

In this simple case only the standard C library needs to be dynamically linked. The vDSO does not count, because it is a special virtual library that the kernel maps into the process for efficient execution of read-only syscalls.

The kernel will recognize the PT_INTERP header, and will also load the runtime linker (a.k.a ELF interpreter) ld.so and execute it.

Then ld.so will find and load the .so dynamic libraries, or fail if any undefined symbol remains unresolved, and finally will jump execution to the beginning (AT_ENTRY) of the original binary code.

We get the idea that the kernel is the one that inspects the binary and handles it, but in reality we are not executing our binary directly: we are executing the ELF interpreter ld.so. Technically it is still our binary’s machine code that ends up running, but we are introduced to a new concept, the interpreter, which is the file that is actually executed and the one that “interprets” ours.

At the end of the day, we are doing the equivalent of invoking ld.so with the path of our binary as its argument.

What about scripts?

 

I have been trying hard to avoid the word binary because you can actually “execute” files that are not binary. Technically these files are not executed themselves, because they don’t necessarily contain machine code; they are run by an interpreter, as we just saw.

Now, let’s look at how an executable script is run. I used to imagine that the bash process (or the file manager) inspects the first bytes of the file, and if it finds the shebang, for instance #!/bin/python2, then it calls the appropriate interpreter. It turns out this is not how it works. The shebang detection actually happens in the Linux kernel itself.

This means there is more to execution than just ELF: Linux supports a bunch of binary formats, ELF being just one of them. Inside the kernel, each binary format is run by a handler that knows how to deal with that kind of file. Some handlers come with the standard kernel, and others can be added through loadable modules.

Whenever a file is to be executed through execve(), its first 128 bytes are read and passed on to every handler (the size is BINPRM_BUF_SIZE, which kernels since 5.1 raise to 256). This occurs in fs/exec.c

Each handler can then accept it or ignore it, usually depending on some magic in the first bytes of the binary. This way, the appropriate handler takes care of the execution of that binary, or passes on the chance of doing so to another handler.

In the case of the ELF format, the magic is 0x7F ‘ELF’ in the field e_ident

This is checked by the ELF handler at binfmt_elf.c in order to accept the binary.

So what happens with scripts? Well, it turns out that there is a handler for that in the kernel, which can be found at binfmt_script.c.
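The way handlers take turns can be pictured with a toy model. This is not kernel code, just a Python sketch of the idea: every handler looks at the first bytes and either claims the file or declines, and if nobody claims it the execution fails with ENOEXEC ("Exec format error"). The handler functions and the HANDLERS list are invented for the illustration.

```python
import errno

def elf_handler(head):
    """Claim files that start with the ELF magic."""
    return "elf" if head[:4] == b"\x7fELF" else None

def script_handler(head):
    """Claim files that start with a shebang."""
    return "script" if head[:2] == b"#!" else None

# The order is arbitrary here; the two magics cannot both match.
HANDLERS = [script_handler, elf_handler]

def pick_handler(head):
    """Offer the first bytes of a file to each handler in turn."""
    for handler in HANDLERS:
        verdict = handler(head)
        if verdict is not None:
            return verdict
    # Nobody recognized the format.
    raise OSError(errno.ENOEXEC, "Exec format error")
```

With this model, pick_handler(b"#!/bin/sh\n") is claimed by the script handler, while a plain text file with no shebang is rejected by everyone.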

All binary format handlers offer an interface to execve() in the form of a struct linux_binfmt. The ELF handler, for instance, registers one that points at its own loader.

The main hook here is the load_binary() function that will be different for each handler.

In the case of the script handler, its load_binary() hook lives in binfmt_script.c, and starts by checking that the file begins with #!.

So it is the kernel that actually parses the first line of the script, and passes on execution to the interpreter with the script path as an argument. As long as the file starts with the shebang #!, it will be interpreted as a script, be it python, awk, sed, perl, bash, ash, sh, zsh or anything similar.
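As a rough userspace imitation of that parsing, the following Python sketch turns the first bytes of a script into the argument vector that ends up being executed: the interpreter, an optional single argument taken from the rest of the shebang line, the script path, and then the original arguments. The function name shebang_argv is hypothetical, and corner cases such as overly long lines are ignored.

```python
import re

# Interpreter path, then optional whitespace, then the rest of the line.
_SHEBANG = re.compile(rb"([^ \t]+)[ \t]*(.*)")

def shebang_argv(head, script_path, args=()):
    """Build the argv a '#!' script would be run with, or None if no shebang."""
    if not head.startswith(b"#!"):
        return None
    first_line = head[2:].split(b"\n", 1)[0].strip(b" \t")
    match = _SHEBANG.fullmatch(first_line)
    if match is None:
        return None  # "#!" followed by nothing usable
    interpreter, optional_arg = (group.decode() for group in match.groups())
    argv = [interpreter]
    if optional_arg:
        # Everything after the interpreter is passed as ONE argument.
        argv.append(optional_arg)
    return argv + [script_path, *args]
```

For a file starting with #!/bin/python2, shebang_argv(head, "./hello.py") yields the interpreter followed by the script path, which is what effectively gets executed.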

Once more, we are not actually executing our script; we are effectively running the interpreter named in the shebang with the script path as its argument.

We are starting to see that the name binary format is a bit misleading, as we can execute things that are not binaries. There are other handlers for exotic or old binary formats, such as the flat format, or the old a.out format, and in particular there is a very powerful and versatile handler, the binfmt_misc handler.

Execute anything with binfmt_misc

 

Now we know what a binary handler is, and we can understand binfmt_misc. This is a flexible format handler that allows us to specify what userland interpreter should run for a specific file type. It doesn’t just look at a hardcoded magic at the beginning of the file, but also supports detecting the binary by extension, using masks, and offers a /proc interface to the system administrator. Remember that all this is happening in kernel space. The loader for this handler is load_misc_binary() at fs/binfmt_misc.c.
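The two matching strategies can be sketched in a few lines. The Python model below is an approximation of the behaviour described above, not a port of load_misc_binary(): an entry of kind E compares the text after the last dot of the file name (case sensitive), while an entry of kind M compares bytes at a given offset, optionally through a mask. The Entry class, the helper names and the interpreter paths are all assumptions made for the example.

```python
import os
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    kind: str           # "E" = match by extension, "M" = match by magic
    pattern: bytes      # extension without the dot, or the magic bytes
    interpreter: str    # hypothetical path in this sketch
    offset: int = 0     # only used for "M"
    mask: bytes = b""   # only used for "M"; empty means every bit matters

def matches(entry, path, head):
    """Tell whether an entry claims the file at path whose first bytes are head."""
    if entry.kind == "E":
        _, dot, ext = os.path.basename(path).rpartition(".")
        return dot == "." and ext.encode() == entry.pattern
    window = head[entry.offset:entry.offset + len(entry.pattern)]
    if len(window) < len(entry.pattern):
        return False
    mask = entry.mask or b"\xff" * len(entry.pattern)
    # A byte matches when it equals the magic in every bit selected by the mask.
    return all(((b ^ m) & k) == 0 for b, m, k in zip(window, entry.pattern, mask))

def lookup(entries, path, head):
    """Return the interpreter of the first matching entry, or None."""
    for entry in entries:
        if matches(entry, path, head):
            return entry.interpreter
    return None
```

A table with one E entry for jpg and one M entry for the bytes %PDF is enough to route a photo to an image viewer and a PDF to a document viewer, whatever those files are named.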

If the /proc interface is not already mounted for us, we can do so with mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc

Let’s have a look at it

We can see that we already have it populated with some python and other entries.

We can remove, enable or disable these entries.

  • echo 1 into the entry’s file to enable the entry
  • echo 0 into the entry’s file to disable the entry
  • echo -1 into the entry’s file to remove the entry

What is really cool is that we can easily add custom entries through binfmt_misc. In order to add an entry, you echo a format string of the form :name:type:offset:magic:mask:interpreter:flags to the register file. Details on how to configure all these flags, masks and magic values can be found in the kernel documentation for binfmt_misc.

As an example, let’s create a handler for the JPG image format to be opened by the feh image viewer. In this case we are matching by extension (therefore the E)
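For illustration, the registration line for such an entry can be assembled programmatically. The helper below is a hedged Python sketch: it only builds the string in the :name:type:offset:magic:mask:interpreter:flags layout and never touches /proc. The function name registration_line and the viewer path /usr/bin/feh are assumptions; check where feh lives on your system.

```python
def registration_line(name, kind, pattern, interpreter, offset="", mask=b"", flags=""):
    """Build a binfmt_misc registration string (it is NOT written anywhere).

    kind "E": pattern is the extension as text, without the dot.
    kind "M": pattern and mask are bytes, emitted as \\xNN escapes.
    """
    if kind == "E":
        magic_field, mask_field = pattern, ""
    elif kind == "M":
        magic_field = "".join("\\x%02x" % byte for byte in pattern)
        mask_field = "".join("\\x%02x" % byte for byte in mask)
    else:
        raise ValueError("kind must be 'E' or 'M'")
    fields = [name, kind, str(offset), magic_field, mask_field, interpreter, flags]
    if any(":" in field for field in fields):
        raise ValueError("':' is the field separator and cannot appear in a field")
    return ":" + ":".join(fields)
```

registration_line("jpg", "E", "jpg", "/usr/bin/feh") gives the line for the extension-based entry. Actually registering it means writing that string to the register file as root, which this sketch deliberately leaves out.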

The handler is registered now. We can inspect it

Now let’s take a picture, make it executable, et voilà.

Let’s now create an executable TODO list, based on magic number detection (M).

PDFs start with the text %PDF, as seen in the specification, so we just register an entry that matches on that magic.

Another example, LibreOffice files by extension

We have all the new entries in proc

With this technique we can run Java applications transparently (based on the 0xCAFEBABE magic)

This requires the use of a wrapper, that you can get from the Arch Wiki.

This also works for Mono, and even DOS! In order to run good old Civilization transparently, install dosbox and configure binfmt_misc; after that, the game can be launched directly like any other executable.

 

The same goes for emulated Windows binaries

The problem is that all DOS, Windows and Mono binaries share the same MZ magic, so in order to combine them, we would need to use a special wrapper that is able to detect the differences deeper in the file, such as start.exe.
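To give an idea of what "looking deeper in the file" can mean, here is a small Python sketch that separates plain DOS executables from PE files (the container used by Windows and Mono binaries). It relies on two facts of the format: the MZ header stores the offset of the newer header at byte 0x3C, and a PE file has the signature PE\0\0 at that offset. This is not how start.exe is implemented, which is not covered here, and telling Mono apart from native Windows programs would require inspecting the PE headers further, which the sketch does not attempt.

```python
import struct

def classify_mz(data):
    """Classify the start of a file as 'pe', 'dos', or None if it lacks the MZ magic."""
    if data[:2] != b"MZ":
        return None
    if len(data) >= 0x40:
        # e_lfanew: little-endian offset of the PE header, stored at 0x3C.
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
            return "pe"
    return "dos"
```

A wrapper registered for the MZ magic could use a check like this to decide which emulator or runtime to hand the file to.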

If we want to make these settings permanent, we can set up this configuration at boot time in /etc/binfmt.d. For instance, if we wanted to set up the PDF handler at boot time, we would just add a new file to that folder

We can see that this approach is very flexible and powerful. One problem with it is that it is a bit unconventional to have “regular files” other than scripts as executable. The good thing about it is that it is truly a system wide setting, so once you set it up in the kernel, it will work from all your shells, file managers and any other user of the execve() system call.

We will see some more useful things we can do with binfmt_misc in the following post.

References

 

This article aims at being a gentle overview of binfmt_misc and the execution process with some practical examples. If you want the gory details, consider reading the following references.

How programs get run

How programs get run: ELF binaries

Anatomy of a system call, part 2

System calls in the Linux kernel

Author: nachoparker

Humbly sharing things that I find useful
