
Transparently running binaries from any architecture in Linux with QEMU and binfmt_misc

What? You can do that in Linux? It turns out you can!

First, let’s see it in action. Here I retrieve a binary from my Raspberry Pi, which is an ARM binary, and execute it on my x86_64 machine transparently.

If you try to do this… it won’t work right away.

First we have a couple of things to set up. We will be using QEMU in a slightly unconventional way, in combination with a kernel feature called binfmt_misc.

QEMU user mode

Obviously our CPU is not able to run foreign machine code instructions. We said we would be using QEMU, but in a slightly unconventional way.

We all know QEMU as a virtual machine, where we load a virtual (fake) hard drive with an operating system and we set up fake hardware to interface with it: a fake CPU, fake keyboard, fake network adapter and so on. This looks like this

But there is also another mode of use in QEMU, called user emulation.

When we write a program, we interact with the system through system calls. We need them to reach the keyboard, terminal, screen, filesystem and so on. This means that the code we write is executed in user space, and the kernel does the actual interacting with the system for us. We just request things from the kernel, such as writing to a file.

In QEMU system emulation this looks like this

In user mode, QEMU doesn’t emulate all the hardware, only the CPU. It executes foreign code in the emulated CPU, and then it captures the syscalls and forwards them to the host kernel. This way, we are interfacing with the native kernel in the same way as any native piece of software. This looks like this

This has many benefits: we are not emulating all the hardware, which is slow, and we are not emulating the kernel either, which accounts for a decent part of the computation that takes place. Actually, we don’t even need a guest kernel. We can understand now why this runs much faster than full system emulation.

As an example, let’s cross-compile a static ARM binary.

We need to install a toolchain that cross-compiles from x86 to armhf. One is available for Debian and for Arch Linux, for instance.

Then we generate the binary.

Now we can run it with qemu-arm. For that we need to install the package qemu-user.
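The whole sequence can be sketched as a short shell session. This is only an illustration: the compiler name arm-linux-gnueabihf-gcc is an assumption (it is what Debian's gcc-arm-linux-gnueabihf package provides), and the hello.c contents and scratch directory are made up for the example. The script skips the steps whose tools are not installed.

```shell
# Work in a scratch directory so nothing in the current tree is touched.
workdir=$(mktemp -d)

cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>

int main(void)
{
    puts("Hello, world!");
    return 0;
}
EOF

# Assumed toolchain name; adjust it to whatever your distribution installs.
CROSS_CC=arm-linux-gnueabihf-gcc

if command -v "$CROSS_CC" >/dev/null 2>&1; then
    # -static avoids any dependency on an ARM runtime linker or ARM libc.
    "$CROSS_CC" -static -o "$workdir/hello" "$workdir/hello.c"
    if command -v qemu-arm >/dev/null 2>&1; then
        # Run the ARM binary explicitly through the user mode emulator.
        qemu-arm "$workdir/hello"
    else
        echo "qemu-arm not found (it comes with the qemu-user package)"
    fi
else
    echo "$CROSS_CC not found; install an armhf cross toolchain first"
fi
```

Because the binary is static, qemu-arm needs nothing else from an ARM system to run it.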

This isn’t yet very useful because most programs are dynamically linked. We still have some work to do.

Running ARM executables transparently

Recall from the last post on Linux executables what happens when we execute a file and how we can use binfmt_misc to set up our own interpreters. Now we have all the pieces and we want to put them together. We need to set up binfmt_misc in order to use QEMU user mode as an interpreter for our binary format.

We can do it ourselves manually, or install the qemu-user-binfmt package, normally installed automatically with qemu-user. We end up with the binfmt_misc entries
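For the manual route, a rule is a single line in the format :name:type:offset:magic:mask:interpreter:flags written to the binfmt_misc register file. The sketch below only builds and prints such a line. The helper name binfmt_rule is hypothetical, and the magic and mask are assumed to be the values of QEMU's stock ARM rule, so compare them with what your own system registers before relying on them.

```shell
# binfmt_rule NAME MAGIC MASK INTERPRETER (hypothetical helper):
# print a registration line of type M (match by magic), with the
# offset and flags fields left empty.
binfmt_rule() {
    printf ':%s:M::%s:%s:%s:\n' "$1" "$2" "$3" "$4"
}

# Assumed values for 32-bit little-endian ARM ELF files. Single quotes keep
# the \x escapes literal, which is what the kernel expects to receive.
arm_magic='\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00'
arm_mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'

rule=$(binfmt_rule qemu-arm "$arm_magic" "$arm_mask" /usr/bin/qemu-arm-static)
printf '%s\n' "$rule"

# To register it for real (as root, with binfmt_misc mounted):
#   printf '%s\n' "$rule" > /proc/sys/fs/binfmt_misc/register
```

Installing the distribution package is usually the safer choice, since it keeps the rules consistent with the installed QEMU binaries.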

Now we can substitute the explicit call through qemu-arm for a direct invocation of the binary, because we have an active entry in binfmt_misc.

The kernel recognizes the ARM ELF magic and uses the interpreter /usr/bin/qemu-arm-static, which is the correct QEMU binary for the architecture. 0x7F 'ELF' in hexadecimal is 7f 45 4c 46, so we can see how the magic and the mask work together, considering the structure of the ELF header
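To make the interplay of magic and mask concrete, here is a small sketch that applies the same test the kernel performs: a file matches when (byte AND mask) equals (magic AND mask) at every position. The magic and mask are assumed to be those of QEMU's stock ARM rule, the helper matches_arm_rule is hypothetical, and the two 20-byte headers are fabricated just for the demonstration.

```shell
# Assumed magic and mask for 32-bit little-endian ARM, as hex bytes.
# Compare with /proc/sys/fs/binfmt_misc/qemu-arm on your own machine.
ARM_MAGIC="7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 28 00"
ARM_MASK="ff ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff fe ff ff ff"

# matches_arm_rule FILE (hypothetical helper): succeed when
# (byte & mask) == (magic & mask) holds for each of the first 20 bytes.
matches_arm_rule() (
    i=1
    for byte in $(od -An -v -tx1 -N20 "$1"); do
        magic=$(printf '%s\n' "$ARM_MAGIC" | cut -d' ' -f"$i")
        mask=$(printf '%s\n' "$ARM_MASK" | cut -d' ' -f"$i")
        [ $(( 0x$byte & 0x$mask )) -eq $(( 0x$magic & 0x$mask )) ] || exit 1
        i=$((i + 1))
    done
    # A file shorter than 20 bytes cannot match the rule.
    [ "$i" -eq 21 ]
)

# Two fake ELF headers written with octal escapes: ARM (class 1, machine 0x28)
# and x86_64 (class 2, machine 0x3e).
arm_hdr=$(mktemp)
x86_hdr=$(mktemp)
printf '\177ELF\001\001\001\000\000\000\000\000\000\000\000\000\002\000\050\000' > "$arm_hdr"
printf '\177ELF\002\001\001\000\000\000\000\000\000\000\000\000\002\000\076\000' > "$x86_hdr"

matches_arm_rule "$arm_hdr" && echo "ARM header: handled by qemu-arm"
matches_arm_rule "$x86_hdr" || echo "x86_64 header: left to the native loader"
```

The 00 in the mask makes the kernel ignore the OS ABI byte, and the fe lets both values 02 and 03 of the object type field match, so regular and position-independent executables are both covered.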

At the end of the day, we want our code to tell the kernel to print hello world. Let’s compare the kernel interactions of the real and the emulated code.

The execve() syscall is the same, and the write() call too, so we get the same behaviour. We can also see that a read of /proc/self/exe reveals that the binary being run natively is in fact qemu-arm-static, the interpreter.

Again, most of the work is being done natively by the kernel, so this runs much faster than QEMU full system emulation, where the kernel execution would need to be emulated too, as well as the virtual hardware. It is also much easier to set up.

This is still not that useful yet, because very few programs are statically linked. Let’s create x86 and armhf versions of hello.c

ARM binaries can take more space: being a RISC architecture, ARM has simpler instructions, so it often needs more machine code to perform common operations. Code density can be improved by using the Thumb instruction set.

Let’s try this

Dynamically linked executables provide the path of the runtime linker (a.k.a. the ELF interpreter), hardcoded at compile time.

So the code fails because it cannot find the linker that it requires, /lib/ld-linux-armhf.so.3. This normally comes with the cross-toolchain.
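One way to check which runtime linker a binary asks for is to read its program headers. The sketch below assumes readelf from binutils is installed; the helper name elf_interp is hypothetical, and /bin/sh is used only because it exists on nearly every system.

```shell
# elf_interp FILE (hypothetical helper): print the runtime linker path that a
# dynamically linked ELF requests. Prints nothing for static binaries, or when
# readelf is not available.
elf_interp() {
    readelf -l "$1" 2>/dev/null |
        sed -n 's/.*Requesting program interpreter: \(.*\)\]/\1/p'
}

# A native binary names the native linker; an armhf binary would name
# /lib/ld-linux-armhf.so.3 instead.
elf_interp /bin/sh
```

Running it on the dynamically linked ARM hello should print the armhf linker path quoted above, which does not exist on an x86 host.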

We could be tempted to do something really dirty, like copying or symlinking the ARM linker into /lib.

We would need to do this not only for ld-linux-armhf.so, but also for libc.so and everything else our binary might need, and we don’t want to have a mix of libraries of different architectures in the same place, right?

We can tell QEMU where to look for the linker and libraries with its -L option, but we want transparent execution, so we can instead set the equivalent QEMU_LD_PREFIX environment variable in .bashrc or .zshrc, or configure it system wide at /etc/qemu-binfmt.conf
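Both variants can be sketched as follows. The prefix /usr/arm-linux-gnueabihf is an assumption (it is where Debian's armhf cross toolchain keeps its ARM linker and libc), and ./hello stands for the dynamically linked ARM binary built earlier.

```shell
# Assumed location of the ARM linker and libraries; adjust it for your
# distribution.
ARM_SYSROOT=/usr/arm-linux-gnueabihf

# One-off: pass the prefix on the command line with -L.
if command -v qemu-arm >/dev/null 2>&1 && [ -f ./hello ]; then
    qemu-arm -L "$ARM_SYSROOT" ./hello || echo "qemu-arm could not run ./hello"
fi

# Persistent: export the equivalent variable, e.g. from .bashrc or .zshrc.
# The interpreter launched through binfmt_misc inherits the environment,
# so a plain ./hello picks the prefix up as well.
export QEMU_LD_PREFIX="$ARM_SYSROOT"
```

The environment variable only affects shells that have loaded it, which is why a system-wide setting is the more robust option for services and other users.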

Now it works transparently!

This is still not that useful. The reason is that we now need to have a copy of all the ARM libraries required by our ARM binaries.

Our example works because everything hello.c needs is so basic that it comes with the toolchain.

The situation is not too bad in Debian, where multiarch support lets you install libraries from other architectures.

Emulating full ARM rootfs

Most often in real situations we need to work in the final system where the binary is supposed to run. It makes more sense to have the whole ARM environment with its ARM libraries and all. Enter chroot.

chroot, for change root, is a system call and corresponding command wrapper that changes the root directory location of a process and its children. Given a directory with a different root filesystem, we can execute anything in it so that its view of the filesystem has been moved to the new root directory. For this reason it is often called a chroot jail. This is the predecessor of mount namespaces, a key component that makes containers possible.

As an example, let’s execute echo inside an x86 jail. I have prepared a whole Debian filesystem in new_root_folder.

This echo, or whatever binary we run, does not see anything outside of the jail. It cannot, for instance, remove or read a file outside of the new root folder.

We can get an existing ARM rootfs to work with, or we can generate one. In Debian we can use debootstrap with the --arch switch to generate a Stretch ARM rootfs.

What we want to do now is to use chroot to make the binaries inside the jail view the filesystem just like they expect it. By using chroot we already have /etc, /bin and all the regular folders in place. Next, we need to add the virtual filesystems, such as /proc, /sys and /dev.

Finally, we will copy the qemu-arm-static binary, from the qemu-user-static package, inside the ARM filesystem.

This little intruder will be the only x86 binary in an ARM filesystem, he’s surrounded!

We have everything in place! What will happen when we try to execute some ARM executable from the jail?

  • The chroot command will call execve() on the ARM binary
  • The ARM binary will be handled by the binfmt_misc binary handler, according to its configured ARM ELF magic.
  • The entry in binfmt_misc instructs the kernel to use /usr/bin/qemu-arm-static as an interpreter, that is why we had to copy it inside the jail. Remember that by chroot magic /usr/bin is really inside new_root_folder.
  • qemu-arm-static will interpret the ARM binary in user mode. We are using the static version of qemu-arm because we need the interpreter to be standalone, as it is the only x86 binary in the jail and will not have access to any x86 libraries.
  • Any ARM library that is expected by the programs inside the jail will be there, as provided by the ARM rootfs.
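The steps above can be collected into one helper. This is a sketch under assumptions rather than a definitive recipe: the function name enter_arm_chroot is hypothetical, the set of mounted virtual filesystems is a common minimum, and it expects the binfmt_misc entry for ARM to be registered on the host already. It must run as root and refuses to do anything otherwise.

```shell
# enter_arm_chroot ROOTFS [COMMAND...] (hypothetical helper):
# prepare an ARM rootfs and run a command inside it, /bin/bash by default.
enter_arm_chroot() {
    rootfs=$1
    [ -d "$rootfs" ] || { echo "no such rootfs: $rootfs" >&2; return 1; }
    [ "$(id -u)" -eq 0 ] || { echo "must run as root" >&2; return 1; }
    shift
    [ "$#" -gt 0 ] || set -- /bin/bash

    # Virtual filesystems that programs inside the jail expect to find.
    mount -t proc proc "$rootfs/proc" &&
    mount -t sysfs sys "$rootfs/sys" &&
    mount --bind /dev "$rootfs/dev" &&
    mount --bind /dev/pts "$rootfs/dev/pts" || return 1

    # The static interpreter must be reachable at the path named in the
    # binfmt_misc entry, as seen from inside the jail.
    cp /usr/bin/qemu-arm-static "$rootfs/usr/bin/" || return 1

    chroot "$rootfs" "$@"
    status=$?

    # Undo the mounts in reverse order once the command exits.
    umount "$rootfs/dev/pts" "$rootfs/dev" "$rootfs/sys" "$rootfs/proc"
    return "$status"
}

# Example (as root): enter_arm_chroot new_root_folder
```

Unmounting on exit keeps the rootfs directory clean, so it can be archived or written to an image afterwards.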

Let’s see all this in action, opening a bash shell in a Raspbian rootfs

I had to configure the PATH variable to match the one Raspbian expects. Naturally, our original environment from zsh will be inherited by chroot and arm-bash. We have talked about full system QEMU Raspbian emulation before, and this runs so much faster.

Things will work as long as the ARM binaries see what they expect to see. Binaries can execve() other executables and everything will mostly work perfectly well. An exception would be programs that use exotic system calls that QEMU user mode has not implemented yet, for instance for using the pseudo random generator. In those cases you will see a message about an unsupported syscall. As QEMU user mode becomes more mature, this is getting rarer, and libraries normally have fallback options for these situations anyway.

Remember that we are still using our host kernel, so we can use networking, install packages with apt and all the rest. This is really useful for things like

  • Manipulating rootfs images for other architectures transparently from our x86 workstation.
  • Compiling ARM binaries more easily. Cross-compiling is hard because you need to isolate the libraries and tools that the cross-compiling environment needs from the ones on your workstation. One way of saving some headaches is to emulate native compiling instead of cross-compiling. Your host can then help natively by setting up a distcc system between jail and host.

Author: nachoparker