
Easy sync BTRFS snapshots with btrfs-sync

To complement the previous BTRFS tool, btrfs-snp (which lets us schedule snapshots), I would like to share a new tool that synchronizes those snapshots locally or remotely to achieve efficient data redundancy.

With btrfs-sync we can replicate our BTRFS snapshots to a different BTRFS filesystem and keep a second copy of our versioned subvolume much more efficiently than with traditional rsync.

Features

  • Local or remote sync through SSH
  • Simple syntax
  • Progress indication
  • Support for xz or pbzip2 compression in order to save bandwidth
  • Retention policy
  • Automatic incremental synchronization
  • Cron friendly

Usage

The syntax is similar to that of scp:

Usage:
  btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>

  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
  -d|--delete       delete snapshots in <dst> that don't exist in <src>
  -z|--xz           use xz     compression. Saves bandwidth, but uses one CPU
  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
  -q|--quiet        don't display progress
  -v|--verbose      display more information
  -h|--help         show usage

<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command
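For readers curious how a destination such as user@server:/media/USBdrive/home-snapshots is interpreted, here is a minimal pure-bash sketch, assuming the same rule the script applies (any colon marks a remote destination). The function name split_dst and the variables DST_HOST and DST_DIR are hypothetical; the script itself does this with sed and stores the host part in NET.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split an scp-like "[[user@]host:]dir" argument.
# Sets DST_HOST (empty for a local destination) and DST_DIR.
split_dst() {
  local arg="$1"
  if [[ "$arg" == *:* ]]; then
    DST_HOST="${arg%%:*}"   # everything before the first colon
    DST_DIR="${arg#*:}"     # everything after the first colon
  else
    DST_HOST=""             # no colon: plain local directory
    DST_DIR="$arg"
  fi
}
```

Like the script, this treats a local path containing a colon as remote. One small difference: the script's sed takes the directory from after the last colon, while this sketch splits both parts at the first one.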

Examples

Manual

Synchronize snapshots of home to a USB drive

# btrfs-sync /home/user/.snapshots /media/USBdrive/home-snapshots

Synchronize snapshots of home to a USB drive on another machine

# btrfs-sync /home/user/.snapshots user@server:/media/USBdrive/home-snapshots

Synchronize one snapshot of home to a USB drive on another machine

# btrfs-sync /home/user/.snapshots/monthly_2018-02-08_200102 user@server:/media/USBdrive/home-snapshots

Synchronize only monthly snapshots of home to a USB drive on another machine

# btrfs-sync /home/user/.snapshots/monthly_* user@server:/media/USBdrive/home-snapshots
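In this example it is the shell, not btrfs-sync, that expands monthly_*, so the script simply receives one <src> argument per matching snapshot. The following throwaway demo (plain directories in a temporary folder, no BTRFS involved; count_args is a hypothetical helper) sketches that expansion:

```shell
#!/usr/bin/env bash
# Demo only: plain directories in a temporary folder, no BTRFS involved.
# The shell expands monthly_* before the command ever runs.
count_args() { echo "$#"; }

demo=$(mktemp -d)   # stand-in for /home/user/.snapshots
mkdir "$demo"/monthly_2018-01-09_200102 \
      "$demo"/monthly_2018-02-08_200102 \
      "$demo"/daily_2018-03-03_000101

n=$(count_args "$demo"/monthly_*)
echo "monthly_* expands to $n arguments"
rm -r "$demo"
```

Quoting the pattern ('monthly_*') would hand the literal string to btrfs-sync, which would then report it as not found.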

Use --verbose to get more details:

# btrfs-sync --verbose --delete /home/user/.snapshots user@server:/media/USBdrive/home-snapshots
* Skip existing '/home/user/.snapshots/monthly_2018-01-09_200102'
* Skip existing '/home/user/.snapshots/monthly_2018-02-08_200102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-09_140102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-16_150102'
* Skip existing '/home/user/.snapshots/weekly_2018-02-23_150102'
* Skip existing '/home/user/.snapshots/weekly_2018-03-02_180102'
* Skip existing '/home/user/.snapshots/daily_2018-03-03_000101'
* Skip existing '/home/user/.snapshots/daily_2018-03-04_080101'
* Skip existing '/home/user/.snapshots/daily_2018-03-05_100102'
* Skip existing '/home/user/.snapshots/daily_2018-03-06_100102'
* Skip existing '/home/user/.snapshots/daily_2018-03-07_110102'
* Synchronizing '/home/user/.snapshots/hourly_2018-03-08_090101' using seed '.snapshots/hourly_2018-03-07_090101'...
time elapsed [0:00:24] | rate [11.1MiB/s] | total size [ 132MiB]
* Synchronizing '/home/user/.snapshots/hourly_2018-03-08_100101' using seed '.snapshots/hourly_2018-03-09_090101'...
time elapsed [0:01:05] | rate [11.1MiB/s] | total size [ 275MiB]
* Deleting non existent snapshots...
Delete subvolume (no-commit): '/media/USBdrive/home-snapshots/hourly_2018-03-08_090101'
Delete subvolume (no-commit): '/media/USBdrive/home-snapshots/hourly_2018-03-08_100101'

Cron

Daily synchronization over the internet, keeping only the last 50 snapshots

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync --quiet --keep 50 --xz /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
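With --keep 50, once the destination holds more than 50 received snapshots, the script deletes the oldest surplus ones. Below is a minimal sketch of that selection, assuming the list is already ordered oldest first. prune_list is a hypothetical name; in the script the order is whatever ls returns on the destination (name order), so it only matches age if the snapshot names sort chronologically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the snapshots that "--keep KEEP" would delete.
# Assumes the remaining arguments are ordered oldest first.
prune_list() {
  local keep="$1"; shift
  local surplus=$(( $# - keep ))
  # KEEP=0 disables retention, as in btrfs-sync; nothing to do if under the limit
  (( keep > 0 && surplus > 0 )) || return 0
  printf '%s\n' "${@:1:surplus}"   # the first $surplus entries are the oldest
}
```

For example, prune_list 2 s1 s2 s3 s4 s5 prints s1, s2 and s3, one per line, and a KEEP of 0 prints nothing, matching the script's convention that 0 disables the retention policy.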

Daily synchronization over the LAN, mirroring the snapshot directory

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync --quiet --delete /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
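With --delete the destination becomes a mirror: any received snapshot whose name no longer exists among the sources is removed. The script matches on basenames only. Here is a minimal sketch of that comparison, using a hypothetical to_delete function that takes the source paths, a -- separator, and then the destination paths:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print destination paths whose basename matches
# no source path. Usage: to_delete SRC... -- DST...
to_delete() {
  local -a srcs=()
  while (( $# )) && [[ "$1" != -- ]]; do srcs+=( "$1" ); shift; done
  (( $# )) && shift   # drop the "--" separator
  local dst src found
  for dst in "$@"; do
    found=0
    for src in "${srcs[@]}"; do
      [[ "${src##*/}" == "${dst##*/}" ]] && { found=1; break; }
    done
    (( found )) || printf '%s\n' "$dst"   # no source with this name
  done
}
```

For instance, to_delete /src/a /src/b -- /dst/a /dst/c prints only /dst/c.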

Installation

Get the script and make it executable. You can do this in two lines, but it is better to inspect it first. Don’t trust anyone blindly.

sudo wget https://raw.githubusercontent.com/nachoparker/btrfs-sync/master/btrfs-sync -O /usr/local/sbin/btrfs-sync
sudo chmod +x /usr/local/sbin/btrfs-sync

It is recommended to set up a dedicated user for receiving the snapshots, with sudo access to the btrfs command.

  • Create a btrfs user on both ends
$ sudo adduser btrfs
  • Generate an SSH key pair for it on the sending machine
$ sudo -u btrfs ssh-keygen
  • Give the btrfs user passwordless SSH access to the remote machine
$ sudo -u btrfs ssh-copy-id btrfs@<ip>
  • Allow the btrfs user to run the btrfs command through sudo on both ends. Create a file
# visudo -f /etc/sudoers.d/90_btrfs-sync

with the contents

btrfs ALL=(root:nobody) NOPASSWD:NOEXEC: /bin/btrfs
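The rule above names /bin/btrfs explicitly, which may not be where btrfs is installed on every distribution. A small sketch for generating the same rule for the actual path; sudoers_rule is a hypothetical helper that falls back to command -v btrfs when no path is given:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sudoers rule for a given btrfs path,
# or for whatever "command -v btrfs" finds when no argument is passed.
sudoers_rule() {
  local path="${1:-$(command -v btrfs)}"
  if [[ -z "$path" ]]; then
    echo "btrfs not found in PATH" >&2
    return 1
  fi
  printf 'btrfs ALL=(root:nobody) NOPASSWD:NOEXEC: %s\n' "$path"
}
```

For example, sudoers_rule | sudo tee /etc/sudoers.d/90_btrfs-sync >/dev/null writes the file, and sudo visudo -cf /etc/sudoers.d/90_btrfs-sync checks its syntax before you rely on it.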

If you want to run it from cron, you might have to install cron first, because some distributions have completely replaced it with systemd timers.

This was the case for me on Arch Linux, where I installed cronie.

cronie logs the output to the system log by default, but you can set up a mail system if you want old-style cron emails.

Also, note that you can use chronic (from moreutils) if you only want output to be logged when something goes wrong.
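chronic runs a command and only displays its output if the command fails. With the real tool, the cron script would simply call chronic /usr/local/sbin/btrfs-sync ... instead of calling btrfs-sync directly. To illustrate the idea, here is a minimal pure-bash stand-in; quiet_unless_fail is hypothetical and is not part of moreutils:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for moreutils' chronic (not the real tool):
# run a command, hold on to its output, and print it only on failure.
quiet_unless_fail() {
  local out status
  out=$("$@" 2>&1)   # capture stdout and stderr together
  status=$?
  if (( status != 0 )); then
    printf '%s\n' "$out"
  fi
  return "$status"
}
```

Under cron, anything such a wrapper prints ends up in the log or mail, so a successful run leaves no trace.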

Comparison with rsync

The main difference between these two methods is that BTRFS works at the block level, whereas rsync works at the file level.

Because rsync works with files, it will not detect operations such as renaming or moving a file, so it has to send the file again. It also needs to analyze the existing files at the destination to find out whether they have been updated.

To do this, it can either compare modification times and sizes, which is relatively fast, or compare checksums of file chunks at both ends, which is more robust but slower. Either way, there is a significant processing overhead when you are synchronizing a whole partition with many thousands of files.

A plus of this approach is that you are able to exclude certain files or folders, whereas BTRFS works on whole subvolumes in an all-or-nothing fashion.

BTRFS, on the other hand, understands blocks, and because it is a copy-on-write (COW) filesystem, it already knows which blocks have changed between one snapshot and the next. If we rename a file, only some tiny metadata changes, and BTRFS knows that it doesn’t need to transfer the whole file again, only those few bytes.

The same happens when a big file changes internally, such as a virtual machine disk image that we have been working on.

This is also why snapshots in COW filesystems are so space efficient: we can create instant safety copies of huge volumes that only take up extra space as we change the files in them.
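This is what makes incremental transfers cheap: given a parent snapshot that already exists on both ends (btrfs-sync calls it the seed), btrfs send -p streams only the differences. Below is a minimal local sketch of that pipeline, where send_incremental and its DRY_RUN switch are hypothetical, and compression and SSH are left out:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: send SNAP relative to PARENT and receive it under DEST.
# PARENT must already exist on both sides. With DRY_RUN=1 the pipeline is
# only printed, which is handy for checking paths before running as root.
send_incremental() {
  local parent="$1" snap="$2" dest="$3"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf 'btrfs send -p %q %q | btrfs receive %q\n' "$parent" "$snap" "$dest"
    return 0
  fi
  # only the changes between PARENT and SNAP travel through the pipe
  btrfs send -p "$parent" "$snap" | btrfs receive "$dest"
}
```

Without -p, btrfs send emits the whole snapshot, which is what the script falls back to when no usable seed is found.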

The obvious drawback is that you need a BTRFS filesystem on both ends, but why stick to an older-generation filesystem when more modern and featureful ones are available?

 

Code

#!/bin/bash

#
# Simple script that synchronizes BTRFS snapshots locally or through SSH.
# Features compression, retention policy and automatic incremental sync
#
# Usage:
#  btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>
#
#  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
#  -d|--delete       delete snapshots in <dst> that don't exist in <src>
#  -z|--xz           use xz     compression. Saves bandwidth, but uses one CPU
#  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
#  -q|--quiet        don't display progress
#  -v|--verbose      display more information
#  -h|--help         show usage
#
# <src> can either be a single snapshot, or a folder containing snapshots
# <user> requires privileged permissions at <host> for the 'btrfs' command
#
# Cron example: daily synchronization over the internet, keep only last 50
#
# cat > /etc/cron.daily/btrfs-sync <<EOF
# #!/bin/bash
# /usr/local/sbin/btrfs-sync -q -k50 -z /home user@host:/path/to/snaps
# EOF
# chmod +x /etc/cron.daily/btrfs-sync
#
# Copyleft 2018 by Ignacio Nunez Hernanz <nacho _a_t_ ownyourbits _d_o_t_ com>
# GPL licensed (see end of file) * Use at your own risk!
#
# More at https://ownyourbits.com
#

# help
print_usage() {
  echo "Usage: 
  $BIN [options] <src> [<src>...] [[user@]host:]<dir>

  -k|--keep NUM     keep only last <NUM> sync'ed snapshots
  -d|--delete       delete snapshots in <dst> that don't exist in <src>
  -z|--xz           use xz     compression. Saves bandwidth, but uses one CPU
  -Z|--pbzip2       use pbzip2 compression. Saves bandwidth, but uses all CPUs
  -q|--quiet        don't display progress
  -v|--verbose      display more information
  -h|--help         show usage

<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command

Cron example: daily synchronization over the internet, keep only last 50

cat > /etc/cron.daily/btrfs-sync <<EOF
#!/bin/bash
/usr/local/sbin/btrfs-sync -q -k50 -z /home user@host:/path/to/snaps
EOF
chmod +x /etc/cron.daily/btrfs-sync
"
}

echov() { [[ "$VERBOSE" == 1 ]] && echo "$@" || true; }

#----------------------------------------------------------------------------------------------------------

# parse arguments
BIN="${0##*/}"
KEEP=0
ZIP=cat PIZ=cat
SILENT=">/dev/null"

OPTS=$( getopt -o hqzZk:dv -l quiet -l help -l xz -l pbzip2 -l keep: -l delete -l verbose -- "$@" 2>/dev/null )
[[ $? -ne 0 ]] && { echo "error parsing arguments"; exit 1; }
eval set -- "$OPTS"

while true; do
  case "$1" in
    -h|--help   ) print_usage; exit  0 ;;
    -q|--quiet  ) QUIET=1    ; shift 1 ;;
    -d|--delete ) DELETE=1   ; shift 1 ;;
    -k|--keep   ) KEEP=$2    ; shift 2 ;;
    -z|--xz     ) ZIP=xz     PIZ=( xz     -d ); shift 1 ;;
    -Z|--pbzip2 ) ZIP=pbzip2 PIZ=( pbzip2 -d ); shift 1 ;;
    -v|--verbose) SILENT=""  VERBOSE=1        ; shift 1 ;;
    --)                shift;  break   ;;
  esac
done

# need at least one <src> plus the <dst> (checked before slicing "$@")
[[ $# -lt 2 ]] && { print_usage; exit 1; }

SRC=( "${@:1:$#-1}" )
DST="${@: -1}"

# detect remote dst argument
[[ "$DST" =~ : ]] && {
  NET="$( sed 's|:.*||' <<<"$DST" )"
  DST="$( sed 's|.*:||' <<<"$DST" )"
  SSH=( ssh -o ServerAliveInterval=5 -o ConnectTimeout=1 "$NET" )
}
[[ "$SSH" != "" ]] && DST_CMD=( "${SSH[@]}" ) || DST_CMD=( eval )

#----------------------------------------------------------------------------------------------------------

# checks

## general checks
[[ ${EUID} -ne 0 ]]              && { echo "Must be run as root. Try 'sudo $BIN'"; exit 1; }
"${DST_CMD[@]}" true &>/dev/null || { echo "SSH access error to $NET"            ; exit 1; }

## src checks
for s in "${SRC[@]}"; do
  [[ -d "$s" ]] || { echo "$s not found"; exit 1; } # exit here, not inside the <( ) below
done
while read -r entry; do SRCS+=( "$entry" ); done < <(
  for s in "${SRC[@]}"; do
    src="$(cd "$s" &>/dev/null && pwd)" || continue #abspath
    btrfs subvolume show "$src" &>/dev/null && echo "0|$src" || \
    for dir in "$src"/*; do
      btrfs subvolume show "$dir" &>/dev/null || continue
      DATE="$( btrfs su sh "$dir" | grep "Creation time:" | awk '{ print $3, $4 }' )"
      SECS=$( date -d "$DATE" +"%s" )
      echo "$SECS|$dir"
    done
  done | sort -V | sed 's=.*|=='
)
[[ ${#SRCS[@]} -eq 0 ]] && { echo "no BTRFS subvolumes found"; exit 1; }

## check pbzip2
[[ "$ZIP" == "pbzip2" ]] && {
                    type pbzip2 &>/dev/null && \
    "${DST_CMD[@]}" type pbzip2 &>/dev/null || {
      echo "INFO: 'pbzip2' not installed on both ends, fallback to 'xz'"
      ZIP=xz PIZ=unxz 
  }
}

## use 'pv' command if available
PV=( pv -F"time elapsed [%t] | rate %r | total size [%b]" )
[[ "$QUIET" == "1" ]] && PV=( cat ) || type pv &>/dev/null || {
  echo "INFO: install the 'pv' package in order to get a progress indicator"
  PV=( cat )
}

#----------------------------------------------------------------------------------------------------------

# sync snapshots

## get dst snapshots ( DSTS, DST_UUIDS )
get_dst_snapshots() {
  local DST="$1"
  unset DSTS DST_UUIDS
  while read entry; do
    DST_UUIDS+=( "$( sed 's=|.*==' <<<"$entry" )" )
    DSTS+=(      "$( sed 's=.*|==' <<<"$entry" )" )
  done < <( 
    "${DST_CMD[@]}" "
      DSTS=( \$( ls -d \"$DST\"/* 2>/dev/null ) )
      for dst in \${DSTS[@]}; do
        UUID=\$( sudo btrfs su sh \"\$dst\" 2>/dev/null | grep 'Received UUID' | awk '{ print \$3 }' )
        [[ \"\$UUID\" == \"-\" ]] || [[ \"\$UUID\" == \"\" ]] && continue
        echo \"\$UUID|\$dst\"
      done" 
  )
}

## sync incrementally
sync_snapshot() {
  local SRC="$1"
  local ID LIST PATH_ DATE SECS SEED SEED_PATH SEED_ARG

  # detect existing
  SRC_UUID=$( btrfs su sh "$SRC" | grep "UUID:" | head -1 | awk '{ print $2 }' )
  for id in "${DST_UUIDS[@]}"; do
    [[ "$SRC_UUID" == "$id" ]] && { echov "* Skip existing '$SRC'"; return 0; }
  done

  # try to get most recent src snapshot that exists in dst to use as a seed
  LIST="$( btrfs subvolume list -su "$SRC" )"
  SEED=$( 
    for id in "${DST_UUIDS[@]}"; do
      ID=$(btrfs su sh -u "$id" "$SRC" 2>/dev/null|grep "UUID:"|head -1|awk '{print $2}')
      PATH_=$( awk "{ if ( \$14 == \"$ID\" ) print \$16       }" <<<"$LIST" )
      DATE=$(  awk "{ if ( \$14 == \"$ID\" ) print \$11, \$12 }" <<<"$LIST" )

      [[ "$ID" == "" ]] || [[ "$PATH_" == "$( basename "$SRC" )" ]] && continue

      SECS=$( date -d "$DATE" +"%s" )
      echo "$SECS|$PATH_"
    done | sort -V | tail -1 | cut -f2 -d'|'
  )

  # incremental sync argument
  [[ "$SEED" != "" ]] && {
    SEED_PATH="$( dirname "$SRC" )/$( basename "$SEED" )"
    [[ -d "$SEED_PATH" ]] && 
      SEED_ARG=( -p "$SEED_PATH" ) || \
      echo "INFO: couldn't find $SEED_PATH. Non-incremental mode"
  }

  # do it
  echo -n "* Synchronizing '$SRC'"
  [[ "$SEED_ARG" != "" ]] && echov -n " using seed '$SEED'"
  echo "..."

  { btrfs send -q "${SEED_ARG[@]}" "$SRC" \
    | "$ZIP" \
    | "${PV[@]}" \
    | "${DST_CMD[@]}" "${PIZ[*]} | sudo btrfs receive \"$DST\" 2>&1 | grep -v '^At subvol '" \
    || exit 1; } | grep -v "^At snapshot "
  get_dst_snapshots "$DST" # sets DSTS DST_UUIDS
}

#----------------------------------------------------------------------------------------------------------

# sync all snapshots found in src
get_dst_snapshots "$DST" # sets DSTS DST_UUIDS
for src in "${SRCS[@]}"; do
  sync_snapshot "$src"
done

#----------------------------------------------------------------------------------------------------------

# retention policy
[[ "$KEEP" != 0 ]] && \
  [[ ${#DSTS[@]} -gt $KEEP ]] && \
  echov "* Pruning old snapshots..." && \
  for (( i=0; i < $(( ${#DSTS[@]} - KEEP )); i++ )); do
    PRUNE_LIST+=( "${DSTS[$i]}" )
  done && \
  ${DST_CMD[@]} sudo btrfs subvolume delete "${PRUNE_LIST[@]}" $SILENT && \
  get_dst_snapshots "$DST" # refresh DSTS so that --delete skips pruned snapshots

# delete flag
[[ "$DELETE" == 1 ]] && \
  for dst in "${DSTS[@]}"; do
    FOUND=0
    for src in "${SRCS[@]}"; do
      [[ "$( basename "$src" )" == "$( basename "$dst" )" ]] && { FOUND=1; break; }
    done
    [[ "$FOUND" == 0 ]] && DEL_LIST+=( "$dst" )
  done
[[ "$DEL_LIST" != "" ]] && \
  echov "* Deleting non existent snapshots..." && \
  ${DST_CMD[@]} sudo btrfs subvolume delete "${DEL_LIST[@]}" $SILENT

# License
#
# This script is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This script is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this script; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330,
# Boston, MA  02111-1307  USA
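
The seed selection in sync_snapshot boils down to this: among the local snapshots whose UUID already appears as a Received UUID on the destination, take the most recent one. Here is a minimal sketch of just that decision, assuming the local snapshots are fed as UUID|EPOCH|NAME lines. pick_seed and that input format are hypothetical; the script builds the equivalent information from btrfs subvolume list -su and btrfs subvolume show.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the seed choice.
# stdin: one local snapshot per line as UUID|EPOCH|NAME
# args : the Received UUIDs found on the destination
# prints the NAME of the newest local snapshot the destination already has
pick_seed() {
  local uuid epoch name id
  while IFS='|' read -r uuid epoch name; do
    for id in "$@"; do
      if [[ "$uuid" == "$id" ]]; then
        printf '%s|%s\n' "$epoch" "$name"
        break
      fi
    done
  done | sort -t'|' -k1,1n | tail -n 1 | cut -d'|' -f2
}
```

An empty result means there is no common snapshot, in which case the snapshot is sent in full.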

Author: nachoparker

Humbly sharing things that I find useful

8 Comments on “Easy sync BTRFS snapshots with btrfs-sync”

  1. Hey nacho,
    thanks for a great tool!

    unfortunately I cannot make it work

    created btrfs users on both machines
    to gain passwordless ssh access to the remote I executed
    ssh-copy-id -i ~/.ssh/id_rsa.pub btrfs@
    as
    sudo -u btrfs ssh-copy-id btrfs@ didn’t work for me.
    Now, being able to access ssh btrfs@ without a password and having granted btrfs full sudo privileges on the remote, I try to run
    sudo btrfs-sync --verbose /media/backup_local btrfs@:/media/backup_remote/
    but get
    SSH access error to btrfs@192.168.178.55. Do you have passwordless login setup, and adequate permissions for /media/backup_remote?
    on remote the folder /media/backup_remote is owned by btrfs and permissions are set to 777

    What am I doing wrong?
    Local machine: PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
    Remote machine: PRETTY_NAME="Debian GNU/Linux 10 (buster)" (ncp installed via softy)

    Thanks

  2. Why does the script want to be run via sudo? What is the need when you already have given passwordless sudo rights to the btrfs user on both machines? I thought the precise purpose of setting up the btrfs user was that you don’t have to ssh as root. If the script is run via sudo, the ssh command is of course also executed by root, not by btrfs. Therefore one would have to add root’s pubkey to .ssh/authorized_keys on the target machine, not btrfs’ as the ssh-copy-id command above does.
    But again, if the whole script is run via sudo, there is no point in granting sudo rights to the btrfs command to user btrfs. What is the idea here?
