How to recover a BTRFS partition

Disclaimer: I take no responsibility for any data loss resulting from following this guide. Try to understand what you are doing at all times and don’t copy-paste commands without reading carefully.

BTRFS is a great filesystem for holding data. It is a modern copy-on-write filesystem featuring enhanced data integrity mechanisms to avoid silent data corruption, snapshots that can be synchronized remotely and transparent data compression.

As with any other COW filesystem, the cost to be paid comes in terms of write performance. Because every write also entails a read and a copy operation, COW filesystems perform worse for workloads with frequent writes, such as some database applications.

The most professional way to be protected against a drive failure is to use RAID, which BTRFS supports natively. This obviously has a monetary price and, for many humble architectures, a big performance impact. For this reason, in this article we will focus on the non-RAID case.

When the drive fails

We self hosters and data hoarders sooner or later have to face the situation where we have to recover our data from a hard drive that is failing. All drives fail sooner or later.

This can be a dramatic situation, definitely not fun at all. BTRFS is complex and the restoration process can be confusing. It is important to keep calm, try not to get frustrated and remember that BTRFS tries really hard to avoid losing or corrupting data. If BTRFS doesn’t let us mount the partition or if the system becomes read only, it is for a good reason.

Normally the situation is produced by hardware failure, so our first goal is to get the data out as soon as we can before the drive completely dies.

Depending on the scope of the damage we can be in different situations:

  • If the data itself is corrupted, parts of the files will have bad or inaccessible data. If this is the case, we will see checksum errors in the log. To some extent, BTRFS is capable of recovering some of the information due to its inherent redundancy. We can do this by scrubbing.
[ 1901.435050] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 11382, gen 0
[ 1901.435062] BTRFS error (device sda1): unable to fixup (regular) error at logical 13787648000 on dev /dev/sda1
  • If the journal is inconsistent, BTRFS will not let us mount the drive with write permissions. This situation occurs whenever a write operation is interrupted and the log doesn’t have enough information to guarantee data integrity. This could be OK or very bad depending on what data we are talking about. If the hardware is not failing but we just had an unfortunate power cut, we have the option of accepting the data loss. In order to throw away the incomplete transactions we use btrfs rescue zero-log. Keep in mind that zero-log only clears the log tree, so it helps when the mount fails during log replay and not with other kinds of errors.
parent transid verify failed on 31302336512 wanted 62455 found 62456
parent transid verify failed on 31302336512 wanted 62455 found 62456
parent transid verify failed on 31302336512 wanted 62455 found 62456
  • If the superblock is affected, we will not be able to mount at all. The superblock is the root of the filesystem tree and contains the information that the operating system needs to mount the unit. This is normally easy to fix because BTRFS saves extra copies at different locations, so we might be able to mount using a copy of a superblock, or even run btrfs rescue super-recover to try to fix it.
[11152.189762] BTRFS: failed to read chunk tree on sda
[11152.196224] BTRFS: open_ctree failed 
  • Also we could be in the situation where the filesystem metadata is damaged even though the actual data might be intact. This means that BTRFS doesn’t know about the existence of files or parts of files so we have no way to access them. In this scenario, if we react quickly, we can scan the whole disk and try to rebuild the filesystem metadata tree with btrfs rescue chunk-recover. Needless to say this is both risky and very slow.
[19409.487603] BTRFS error (device sda1): bad tree block start 11106478115207782198 875249664
[19412.395884] BTRFS error (device sda1): bad tree block start 11106478115207782198 875249664
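
To make the four situations above easier to tell apart while watching the logs, here is a minimal sketch of a shell helper that maps a kernel log line to one of them. The function name and the category labels are made up for this illustration, and the patterns only cover the sample messages shown in the list, so treat it as a starting point rather than a complete classifier.

```shell
# Map one kernel log line to one of the four situations described above.
# The labels printed here are illustrative, not official btrfs terminology.
classify_btrfs_log_line() {
    case "$1" in
        *"unable to fixup"*|*"corrupt "[1-9]*)
            echo "data-corruption" ;;
        *"parent transid verify failed"*)
            echo "transid-mismatch" ;;
        *"failed to read chunk tree"*|*"open_ctree failed"*)
            echo "cannot-mount" ;;
        *"bad tree block start"*)
            echo "metadata-damage" ;;
        *)
            echo "unknown" ;;
    esac
}

# Usage: classify every BTRFS line currently in the kernel log.
# dmesg | grep BTRFS | while IFS= read -r line; do
#     classify_btrfs_log_line "$line"
# done
```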

Let us see what would be a general procedure.

Be prepared

Rule zero is of course to have backups. This will allow us to sleep well at night and handle a bad drive situation with a much cooler head. I can’t stress this enough: have at least three copies in two different locations. Everything will be easier and less stressful when a drive fails, which will happen.

Then, rule number one is to monitor your hard drive’s health. This is also critical because normally you will get a warning at least 24 to 48 hours before total failure, so you have a good chance of getting your data out of there before it is too late. Hard drives don’t usually fail completely from one day to the next, but we need to pay attention to them.

SMART errors in NextCloudPi
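
If you are not using a tool that reports SMART status for you, a rough sketch of a manual check could look like the following. It assumes smartmontools is installed; the helper name is hypothetical and /dev/sdX is a placeholder. Note that a passing result is not a guarantee that the drive is fine, only the absence of an alarm.

```shell
# Read `smartctl -H` output on stdin; succeed only if the drive reports healthy.
# ATA drives print "... test result: PASSED"; SCSI/SAS drives print
# "SMART Health Status: OK".
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Usage (as root; /dev/sdX is a placeholder):
# if smartctl -H /dev/sdX | smart_health_ok; then
#     echo "SMART reports the drive as healthy"
# else
#     echo "SMART health check did not pass: get your data out now"
# fi
```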

If you do not have a backup or you want to try to rescue data that was modified after the time of the last backup, keep reading.

Procedure if the drive can be mounted

So you received the warning. The first thing to do is unmount the drive and turn it off before it keeps degrading. Then, look for this post and study your strategy before starting the following steps. Your drive might have just a few hours left.

First, mount your drive in read only mode, and try to copy your data out normally. If it is the root filesystem, boot from a live CD and proceed to repair from there. Keep a terminal open showing the kernel logs with dmesg -w, and watch out for errors like this:

[386229.214384] BTRFS warning (device sda1): sda1 checksum verify failed on 568344576 wanted B77C6306 found D884C20D level 0
[386229.223445] BTRFS warning (device sda1): sda1 checksum verify failed on 568344576 wanted B77C6306 found BB7B11CF level 0

Some files might fail copying if they are in a damaged section, or some might appear to copy fine but throw errors in the kernel logs. Copy your main folders one by one. Try to copy first the most valuable stuff and take notes of what might be corrupted from the information in the logs.
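
The "one folder at a time, most valuable first" approach can be sketched as a small shell function. This is only an illustration: the function name, the failed-copies.txt log name and the example paths are assumptions of mine, and plain cp is used to keep it self-contained. It keeps going when a folder fails and writes down which ones had problems.

```shell
# Copy each source directory into DEST in the order given, most valuable first.
# Keeps going after a failure and records the failed directory in a log file.
# FAILED_LOG and its default name are hypothetical, chosen for this sketch.
copy_in_priority_order() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    failed_log=${FAILED_LOG:-"$dest/failed-copies.txt"}
    for src in "$@"; do
        # -a preserves permissions and timestamps; cp errors go to the log.
        if ! cp -a "$src" "$dest"/ 2>>"$failed_log"; then
            echo "FAILED: $src" >>"$failed_log"
        fi
    done
}

# Usage (paths are placeholders):
# mount -o ro /dev/sdXY /mnt
# copy_in_priority_order /safe/place /mnt/documents /mnt/photos /mnt/media
```

Anything listed in the log is a candidate for a second attempt after repairing.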

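Next, let’s try to repair as much as we can. For this we will first try scrubbing with btrfs scrub. This will check data integrity using checksums and will try to recover the damaged data. Scrubbing is considered safe and is usually the first thing to try. Note that scrub can only repair data when the filesystem is mounted read-write; on a read-only mount it can only report errors, using the -r option.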

Run it with

# btrfs scrub start /mnt

and follow the progress with

# btrfs scrub status /mnt

This typically takes a couple of hours.

This will fix as much as it can, but might not be able to fix all the issues. Try again to copy to safety whatever had errors before. Probably we will have fixed many files and maybe even all of them.
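
If you prefer to wait for the scrub in a single terminal instead of polling, a possible sketch is the wrapper below. The function name is made up; the -B option keeps btrfs scrub start in the foreground until it is done. A non-zero return value means the scrub could not run or found problems.

```shell
# Run a scrub in the foreground, then print the final status report.
# -B keeps `btrfs scrub start` attached until the scrub finishes.
scrub_and_report() {
    mnt=$1
    btrfs scrub start -B "$mnt"
    scrub_rc=$?
    btrfs scrub status "$mnt"
    # Hand back the exit status of the scrub itself, not of the status call.
    return "$scrub_rc"
}

# Usage (as root, filesystem mounted read-write at /mnt):
# scrub_and_report /mnt || echo "scrub reported problems, check dmesg"
```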

If you get errors by inode in the logs like this

[ 5488.731343] BTRFS warning (device sda1): csum failed root 5 ino 40913 off 28815360 csum 0xf4702fd5 expected csum 0xfe7c816f mirror 1
[ 5488.731830] BTRFS warning (device sda1): csum failed root 5 ino 40913 off 28815360 csum 0xf4702fd5 expected csum 0xfe7c816f mirror 1
[ 5488.732189] BTRFS warning (device sda1): csum failed root 5 ino 40913 off 28815360 csum 0xf4702fd5 expected csum 0xfe7c816f mirror 1

you can see which file it corresponds to, provided the metadata is not damaged, with

# btrfs inspect-internal inode-resolve 40913 /mnt
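
When there are many such warnings, resolving inodes by hand gets tedious. The following sketch pulls the unique inode numbers out of the kernel log so they can be resolved in a loop. The helper name is hypothetical, and it assumes the log format shown above and that the files live in the top-level subvolume (root 5) mounted at /mnt.

```shell
# Read kernel log lines on stdin and print each inode number that appears in
# a "csum failed" warning, once, in ascending numeric order.
csum_failed_inodes() {
    sed -n 's/.*csum failed root [0-9][0-9]* ino \([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Usage (as root; /mnt is where the damaged filesystem is mounted):
# dmesg | csum_failed_inodes | while read -r ino; do
#     btrfs inspect-internal inode-resolve "$ino" /mnt
# done
```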

If it does not mount

If the superblocks are damaged the partition will not mount. Scrubbing will not help at this stage, because btrfs scrub only works on a mounted filesystem.
Instead, we can try to mount using a backup of the root tree in read only mode, which doesn’t alter the data and is completely safe. If this succeeds, proceed with the steps above. Recent kernels report usebackuproot as deprecated and expect rescue=usebackuproot instead.

# mount -o ro,usebackuproot /dev/sdXY /mnt
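
Since the spelling of the option depends on the kernel version (newer kernels report plain usebackuproot as deprecated in favor of rescue=usebackuproot), one way to avoid guessing is to try the read-only variants in turn. This is a sketch with a made-up function name; it only ever mounts read-only.

```shell
# Try progressively more forgiving read-only mounts and print the option
# string that worked. Both spellings of the backup-root option are tried,
# because which one is accepted depends on the kernel version.
mount_readonly_with_fallback() {
    dev=$1
    mnt=$2
    for opts in ro ro,rescue=usebackuproot ro,usebackuproot; do
        if mount -o "$opts" "$dev" "$mnt" 2>/dev/null; then
            echo "$opts"
            return 0
        fi
    done
    return 1
}

# Usage (as root; /dev/sdXY is a placeholder):
# mount_readonly_with_fallback /dev/sdXY /mnt || echo "could not mount read-only"
```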

Try to save as much as you can

Those steps normally work well enough. In case the above didn’t suffice, then there is no completely safe way of getting the data back. We have to try to get as much as we can and keep in mind that what we recover could very well be corrupted at least partially.

The best thing to do at this point is to run btrfs restore

# btrfs restore /dev/sdXY /mnt/

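This is completely safe for the damaged drive: btrfs restore works on the unmounted device, never writes to it, and copies whatever files it can read into /mnt, which must be a directory on a different, healthy filesystem. It tries its best to produce a version of the data that is as sound as possible. For instance, a file could be restored without integrity errors but be an old version from an old snapshot. It is still worth getting this information out before trying potentially destructive tools.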
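
Before committing hours to a restore, it can help to do a dry run and see what btrfs restore thinks it can recover. Below is a minimal sketch, with a hypothetical function name, that does a dry run with -D and then the real restore with -i so that it keeps going past errors. The target must be an existing directory on a healthy drive.

```shell
# Dry-run first, then restore for real into an existing target directory.
restore_with_dry_run() {
    dev=$1
    target=$2
    [ -d "$target" ] || { echo "target $target is not a directory" >&2; return 1; }
    # -D only lists what would be recovered, without writing anything.
    btrfs restore -D -v "$dev" "$target" || return 1
    # -i ignores errors and continues instead of stopping at the first one.
    btrfs restore -i "$dev" "$target"
}

# Usage (as root; device and target are placeholders):
# restore_with_dry_run /dev/sdXY /safe/place/restored
```

Other options such as -s for snapshots or -m for metadata can be added to the second call if you need them.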

If we are still not able to mount normally, we can now run btrfs rescue super-recover, which will try to restore the superblock from a good copy. This is not completely safe.

As mentioned before, if your metadata was corrupt there is a chance that files or parts of files that are not damaged are not seen by the filesystem. In this scenario, we can use btrfs rescue chunk-recover /dev/sdXY to scan the whole drive contents and try to rebuild the metadata trees. This will take very long, especially for big drives, and could result in some of the data being wrongly restored.

The absolute last resort

We are used to running fsck, the filesystem check, as soon as we see something weird on ext4. Well, don’t do this in BTRFS. btrfs check --repair should be the last resort, as it will try hard to restore the filesystem and there is a very high chance that it will make things worse. Without --repair, btrfs check only reads the filesystem and does not modify it.

While the above commands will very rarely cause any more damage, and some such as scrubbing or restore are totally safe, this will very likely mess things up. We have to understand this before following our ext4 instincts.

# btrfs check --repair /dev/sdXY
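
A cautious way to approach this step is to run the check in read-only mode first and only think about --repair if it actually reports problems. The wrapper below is just a sketch with a made-up name; it never runs --repair itself, it only prints a reminder.

```shell
# Run the read-only check and report whether --repair is worth considering.
# This function never modifies the filesystem.
check_before_repair() {
    dev=$1
    if btrfs check --readonly "$dev"; then
        echo "no errors found by the read-only check; --repair is not needed"
    else
        echo "the read-only check reported problems on $dev"
        echo "copy out everything you can before considering: btrfs check --repair $dev"
    fi
}

# Usage (as root, with the filesystem unmounted):
# check_before_repair /dev/sdXY
```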

Verify your copies

After moving all the data to a safe place, we will probably want to compare it with our backups to see what information is missing from the backup.

In general, the information in our backups will be more trustworthy than the one we tried to save from the failing drive, so ideally we only want to update the backup with the new data that was added or modified since the last copy.

In order to do this, the following commands will be handy. The first command will only compare file names

$ rsync --dry-run -ri --delete --ignore-existing /copy/ /old-backup/

and this one will compare the checksum of each file in both folders

$ rsync --dry-run -ri --delete --checksum /copy/ /old-backup/

Naturally the latter can take a while. Neither of those commands modifies any actual data. If you want to proceed, remove the --dry-run parameter.
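
If rsync is not at hand, a similar read-only comparison can be sketched with find and cmp. This is an alternative of my own rather than an equivalent of the rsync commands above: it compares file contents byte by byte instead of by checksum, ignores files that exist only in the backup, and the function name is hypothetical.

```shell
# Print the relative path of every regular file under COPY that is missing
# from BACKUP or whose content differs. Nothing is modified.
list_new_or_changed() {
    copy=$1
    backup=$2
    ( cd "$copy" && find . -type f ) | sort | while IFS= read -r rel; do
        if [ ! -f "$backup/$rel" ]; then
            echo "missing: $rel"
        elif ! cmp -s "$copy/$rel" "$backup/$rel"; then
            echo "differs: $rel"
        fi
    done
}

# Usage (paths are the same placeholders as in the rsync commands):
# list_new_or_changed /copy /old-backup
```

Files reported as "differs" deserve a manual look, since the version rescued from the failing drive may be the corrupted one.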


I hope this post helps understand what a good strategy might look like and make sense of the variety of tools and options for recovery that BTRFS has to offer.

Author: nachoparker

Humbly sharing things that I find useful


  1. Thanks for your very good article. Unfortunately I found this too late. I already restored my broken btrfs from a backup.

    I saw that you recommend to run “btrfs scrub” if you can’t mount the fs. But scrub only runs on a mounted filesystem. At least on my Debian unstable with Kernel 4.19 and btrfs-progs 4.20.2

    1. I have the same problem as you, the disk is not mountable so I cannot use btrfs scrub.
      Any thought about that?

      thanks for the help

  2. I was in a panic and found this article. Your suggestion fixed what I thought was the end of my partition:
    btrfs rescue zero-log

  3. I found this article at the last moment, just before reformatting my disk.
    btrfs restore /dev/sdXY /mnt/ Worked for me !! Great.
    Now I have doubts about how to copy /mnt to the original hard disk.
    Can you help me?
    Thanks, You saved me more than 2 months of work.

  4. Wow! This should be documented as part of the btrfs manual. I would like to ask about one condition, though. In my current case, there’s no error from scrub or check --repair, and I can even balance perfectly. However, on every mount the corrupt value is never 0, while wr, rd and flush are (gen is also sometimes not 0, but it’s a very small value under 10). Is that a sign of an actual problem or something I can just ignore? Thanks.

  5. Thank you a lot.
    In my case ‘mount -o usebackuproot’ helped resolve my problem with unmountable file system.
    I could not fix it with any of ‘btrfs check’ and ‘btrfs rescue’ options.
    Upgrade of btrfs-progs from 4.15.1 to 5.1 (Ubuntu 18.04 Bionic to 19.10 Eoan) also did not help.
    After the umount, the system had become mountable in usual way, and I reinserted this ‘crashed’ volume back into Synology DSM which had happily seen it green again.

  6. I am experimenting with btrfs on a bunch of loop devices for two hours now. After few mounts with -o degraded, scrub showed few thousands of ‘uncorrectable errors’. Guys – you should definitely try ZFS instead. Last week I finally replaced two failing disks of two-way mirror setup with over a thousand corrected errors on both disks, with no data loss at all.

    And for massive storage I use LizardFS with chunkservers backed by ZFS storage, with ec(4,2) for personal data and xor5 for other (non-critical) data.

    Please keep in mind that Btrfs was originally an Oracle filesystem, developed to compete with ZFS.

    Take care,

  7. on windows i had btrfs support installed and I used mini tool partition manager, it came up and said gpt partition was bad on the btrfs drive and I said to fix it. but this caused the btrfs drive to be no longer bootable and i get errors
    parent transitid failed on. ….. wanted… found…
    ignore transid failure
    couldnt setup extent tree
    error could not open ctree

    Whatever happened, the drive is no longer mountable. The strange thing is that it was working fine before. I’m not sure what changed; there are no physical errors on the drive.
    I have tried every method of recovery I can find. Just wanted to give a heads-up: if you connect your btrfs drive to a Windows machine and get an error about GPT from some partition program in Windows, DO NOT let it repair it.

  8. Thanks a lot! My system crashed and one of the partitions got damaged. With ‘btrfs restore /dev/sdaX /MOUNTPOINT’ I got my data back 🙂

  9. Hi, I tried to mount a Synology RAID btrfs. No luck, it was complaining about the root having wrong mask. But I used UFS Explorer from Windows and it could read the RAID. This is not advertisement, it really worked for me.

  10. Thank you very much for these tips
    btrfs restore -msi /dev /target worked perfectly for me

    It even corrected the metadata after forgetting the flag the first time…

  11. Thank you very much! My hard drive started to malfunction and I was able to restore almost all files, with the exception of a few that I chose not to restore and a few others (non-important video files) that remained corrupted.

    The method I used was btrfs restore. Shame on me; now I will acquire two disks to keep in RAID using btrfs. BTRFS is awesome!

  12. btrfs rescue zero-log is not for parent transid verify failed errors. It is for log replay errors only. See the man page for btrfs-rescue for an example of when to use zero-log.

    log replay errors cannot be caused by power failures. btrfs checks the integrity of the log tree before reading it, so any power-failure-related problems will be eliminated (either corrected from a mirror copy, or the mount will abort) before anything is read from the log tree. A log tree failure can only occur as the result of a kernel bug or host RAM failure.

    “parent transid verify failed” is usually caused by disk firmware issues, but it can also be an early disk failure indicator. It generally means there are serious data integrity problems with your storage devices and those should be addressed before trying to repair the filesystem. Failing hardware should be replaced, drive firmware upgraded, or write cache disabled at the drive with hdparm -W0. The error can also be caused by bugs in lower-level storage software or firmware (like a VM image file or a hardware RAID controller), and it indicates the lower-level storage is misconfigured (violates write ordering rules). It is not always possible to recover btrfs from this failure with current tools, but the tools to try in this case are ‘btrfs check --repair’ and if that doesn’t work, ‘btrfs check --repair --init-extent-tree’ (and if that doesn’t work, mkfs and start over).

    ‘btrfs rescue chunk-recover’ is used for only one very specific error case:

    [61356.205444] BTRFS critical (device sdd): unable to find logical 457140613120 len 4096
    [61356.207007] BTRFS critical (device sdd): found a bad mapping, wanted 457140213120-457140217216, found

    If you get other errors it is likely you do _not_ need chunk-recover. Also, the bugs that lead to chunk-recover were fixed in 2014. It is not a tool you will likely need today, unless you have a machine with no kernel updates for 7 years.

  13. One question: I cannot mount the partition and btrfs scrub returns an error saying that the device “is not a mounted btrfs device”. So it seems the scrub option isn’t working on unmounted partitions/devices. Why then do you recommend trying scrubbing if the device doesn’t mount? Am I getting something wrong? Otherwise, I tried to use “mount -o usebackuproot” but the dmesg log says it is deprecated and to use “rescue=usebackuproot” instead. I guess it is an option of “mount” but couldn’t find it. I tried it anyway and it didn’t work either…

  14. One more question: what about btrfs-zero-log? At what point would you use it? Before or after btrfs restore? Thanks
