Librecast Multicast Sync Tool (lcsync)

Librecast file and data syncing tool.

Compare data with merkle trees, sync via multicast.

File Syncing

Data is compared by generating a merkle tree using BLAKE3 hashes.

For local file syncing, we walk both trees and compare the hashes to find which data blocks differ.
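The tree walk can be sketched as follows. This is a minimal illustration, not lcsync's implementation: `BLOCK_SIZE`, `build_tree` and `diff_blocks` are hypothetical names, BLAKE2b from the Python standard library stands in for BLAKE3, and both inputs are assumed to have the same number of blocks.

```python
import hashlib

BLOCK_SIZE = 1024  # hypothetical block size, chosen only for illustration


def _hash(data):
    # Stand-in for BLAKE3, which is not in the Python standard library.
    return hashlib.blake2b(data, digest_size=32).digest()


def build_tree(data, block_size=BLOCK_SIZE):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    level = [_hash(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        # Hash each pair of children; an unpaired last node is hashed alone.
        level = [_hash(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_blocks(a, b):
    """Walk both trees from the root and return indexes of differing blocks."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch only compares trees of equal size")
    differing = []
    stack = [(len(a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        depth, idx = stack.pop()
        if a[depth][idx] == b[depth][idx]:
            continue  # identical subtree: nothing below needs checking
        if depth == 0:
            differing.append(idx)
            continue
        for child in (2 * idx, 2 * idx + 1):
            if child < len(a[depth - 1]):
                stack.append((depth - 1, child))
    return sorted(differing)
```

Because matching subtrees are skipped entirely, the number of hash comparisons grows with the number of changed blocks rather than with the file size.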

To sync remote files, each file is split into blocks and a merkle tree is built by hashing the blocks using BLAKE3. On the sending (server) side, this tree is sent on a Librecast Channel (an IPv6 multicast group) that is formed from the hash of the filename. The receiver (client) joins this channel and receives the tree. If the client already has some data to compare, it builds a merkle tree of the destination file and uses it to quickly find which blocks differ. It records this information in a bitmap, and then joins the Channel(s) for the required block(s), which are sent by the server.
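As an illustration of forming a multicast group from a hash, the sketch below fills the 112-bit group ID of an IPv6 multicast address with the leading bytes of a hash of the name. The `ff1e` prefix (transient, global scope), the use of BLAKE2b in place of BLAKE3, and the function name `channel_address` are assumptions made for this example; the exact derivation used by Librecast may differ.

```python
import hashlib
import ipaddress

# Assumption: transient multicast address with global scope (ff1e::/16).
PREFIX = bytes([0xFF, 0x1E])


def channel_address(name):
    """Map a channel name (bytes) to an IPv6 multicast group address."""
    # Stand-in for BLAKE3, which is not in the Python standard library.
    digest = hashlib.blake2b(name, digest_size=32).digest()
    # 2 prefix bytes + 14 hash bytes = 16 bytes (128 bits).
    return ipaddress.IPv6Address(PREFIX + digest[:14])


print(channel_address(b"file.txt"))
```

Sender and receiver both compute the address from the same name, so they meet on the same group without exchanging any messages first.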

There is no unicast communication with the server. No requests are sent, and the server can sit behind a firewall which is completely closed to inbound TCP and UDP traffic. Instead, the server listens on a raw socket for Multicast Listener Discovery version 2 (MLDv2) reports. It compares any MLD multicast group joins against the index it built on startup and finds matches for files (trees) and blocks. In this way, the server only sends data when at least one client is subscribed. If more clients want to download the data, the server need take no further action. Thus, the load on the server does not change at all, regardless of whether there is one client or a billion, nor is any additional bandwidth used.
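The matching step can be sketched like this. The record layout follows the MLDv2 Listener Report format in RFC 3810; everything else is an assumption for illustration: the function names, the shape of `index` (a dict from group address to a label for the tree or block to send), and the simplification that only EXCLUDE-mode records with an empty source list count as joins. Reading from the raw socket is left out.

```python
import ipaddress
import struct

MLD2_REPORT = 143       # ICMPv6 type of an MLDv2 Listener Report (RFC 3810)
MODE_IS_EXCLUDE = 2     # record types that indicate an any-source join
CHANGE_TO_EXCLUDE = 4   # when the source list is empty


def joined_groups(icmp6):
    """Return the groups joined in one MLDv2 report (ICMPv6 message bytes)."""
    if len(icmp6) < 8 or icmp6[0] != MLD2_REPORT:
        return []
    (nrecords,) = struct.unpack_from("!H", icmp6, 6)
    groups, offset = [], 8
    for _ in range(nrecords):
        if offset + 20 > len(icmp6):
            break  # truncated report: stop parsing
        rtype, auxlen, nsrc = struct.unpack_from("!BBH", icmp6, offset)
        group = ipaddress.IPv6Address(icmp6[offset + 4:offset + 20])
        if rtype in (MODE_IS_EXCLUDE, CHANGE_TO_EXCLUDE) and nsrc == 0:
            groups.append(group)
        # Skip the fixed part, the source addresses and the auxiliary data
        # (auxiliary length is counted in 32-bit words).
        offset += 20 + 16 * nsrc + 4 * auxlen
    return groups


def match_joins(icmp6, index):
    """Return the index entries whose group was joined in this report."""
    return [index[group] for group in joined_groups(icmp6) if group in index]
```

The server never needs to know who joined or how many listeners there are; it only needs to know that a group it can serve has at least one listener.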

Bloom Filters and Timers

lcsync uses an experimental form of MLD triggering. Instead of using linked lists to track multicast groups, as the Linux kernel does, we use something more scalable. There can potentially be 2¹¹² multicast groups in IPv6, so beyond a certain point the O(n) search of a linked list does not scale.

Early versions of lcsync used SIMD (CPU vector operations) to implement counted bloom filters, as well as what we are calling a "bloom timer", which lets us track active multicast groups in O(1) constant time. This works, but has the drawback that CPU usage is constant even when there are no active groups. The size of the bloom filters can be tuned depending on the expected number of simultaneous groups.

Since lcsync v0.1.0, the bloom timer has been replaced with a simple bloom filter, which eliminates the need for heavy vector operations and still provides constant-time O(1) searches and updates.
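For readers unfamiliar with the data structure, here is a minimal bloom filter sketch. It is not lcsync's code: the class name, the default sizes and the use of BLAKE2b to derive bit positions are all assumptions for illustration. Both `add` and the membership test touch a fixed number of bits, so their cost does not depend on how many groups are stored.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=1 << 16, hashes=4):
        # bits should be a multiple of 8; hashes must be at most 16 here,
        # because BLAKE2b digests are limited to 64 bytes.
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive `hashes` bit positions from a single digest of the item.
        digest = hashlib.blake2b(item, digest_size=4 * self.hashes).digest()
        for i in range(self.hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.array[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))
```

A usage example, with a 16-byte group address as the key:

```python
import ipaddress

active = BloomFilter()
active.add(ipaddress.IPv6Address("ff1e::1").packed)
print(ipaddress.IPv6Address("ff1e::1").packed in active)
```

A plain bloom filter cannot remove entries, so this sketch says nothing about how groups expire once their listeners leave.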

Most of lcsync's functionality was merged into Librecast 0.7.0 with the new Librecast Sync API.

Usage

lcshare is the server-side component. It indexes files, listens for MLD multicast group joins, and sends the file and directory block data.

lcsync is the client-side component which joins multicast groups to receive files and directories.

Serve a single file: `lcshare [OPTION...] FILENAME`
Serve all files below a directory: `lcshare [OPTION...] DIRECTORY`
Sync remote file(s) with local: `lcsync [OPTION...] REMOTEFILENAME LOCALFILENAME`
Sync two local files (a path is required, and can be ./): `lcsync ./LOCALFILE1 ./LOCALFILE2`

lcsync assumes source and destination are network addresses unless told otherwise. To refer to a local destination, you must specify the path. For files in the current directory, prefix the name with ./ (a remote source can be forced with --remote).

Options

lcshare (server) Options:

--loopback: enable multicast loopback (sending host will receive sent data)

lcsync (client) Options:

-a, --archive: set archive options [-g -o -p -r -t]
-n, --dry-run: don’t copy any data
-g, --group: set group on destination
-o, --owner: set owner on destination
-p, --perms: set permissions on destination
--remote: source path is remote
-t, --times: set modification times on destination

General Options:

-b, --batch: Batch mode. No prompting. NB: if you do not specify a keyfile for encryption, encryption will be disabled.
--bwlimit INTEGER: set send rate limit (bps). An SI prefix of T, G, M or K may be added (e.g. --bwlimit 10M)
--hex: print file hashes in hex
-i, --interface INTERFACE: set network interface to use (default: all)
--keyfile KEYFILE: read symmetric key from keyfile, which must be the path to a file containing a 128-byte random key. This can be created with a command like: `dd if=/dev/random of=keyfile count=1 bs=128`
--loglevel INTEGER: set loglevel
-r, --recursive: recurse into directories
-q, --quiet: shhh - we’re hunting wabbits
-v, --verbose: increase verbosity
-V, --version: display version and exit
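To make the `--bwlimit` suffix concrete, the sketch below shows one way such a value could be parsed. It assumes the prefixes are decimal SI multipliers (K = 1000, M = 1000000, and so on) and accepts either case; `parse_bwlimit` is a hypothetical helper, not part of lcsync, and lcsync's own parsing may differ.

```python
# Assumption: decimal SI multipliers, as the word "SI" suggests.
SI_PREFIX = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_bwlimit(value):
    """Convert a string such as '10M' into a rate in bits per second."""
    value = value.strip()
    if value and value[-1].upper() in SI_PREFIX:
        return int(value[:-1]) * SI_PREFIX[value[-1].upper()]
    return int(value)


print(parse_bwlimit("10M"))  # 10000000 under the assumptions above
```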

Testing

Set up the test network with `sudo make net-setup` (run `sudo make net-teardown` when finished).

Then start a shell in the test namespace as the current user: `sudo ip netns exec vnet0 sudo -u $(id -un) /bin/bash`

Now we can run `make test` and `sudo make cap` in our test namespace.

Code

Code is available on Codeberg.

License

GPLv2 or (at your option) GPLv3