Librecast Multicast Sync Tool (lcsync)

Librecast file and data syncing tool.

Compare data with Merkle trees, sync via multicast.

File Syncing

Data is compared by generating a Merkle tree using BLAKE2s hashes.

For local file syncing we walk the trees and compare the hashes to find which data blocks are different.
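The tree walk can be sketched roughly as follows. This is an illustrative Python sketch, not lcsync's actual implementation: the block size, the zero padding of the leaf level, and the names `leaf_hashes`, `build_tree` and `diff_blocks` are all assumptions made for the example.

```python
from hashlib import blake2s

BLOCK_SIZE = 1024  # assumed block size, for illustration only


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block of data with BLAKE2s."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [blake2s(b).digest() for b in blocks] or [blake2s(b"").digest()]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return the tree as a list of levels: leaves first, root last.

    The leaf level is padded with zero hashes to a power of two
    (an assumption of this sketch).
    """
    width = 1
    while width < len(leaves):
        width *= 2
    level = leaves + [b"\x00" * 32] * (width - len(leaves))
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [blake2s(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_blocks(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk both trees from the root, descending only where hashes differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares trees of equal depth")
    differing = []
    stack = [(len(a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        depth, idx = stack.pop()
        if a[depth][idx] == b[depth][idx]:
            continue  # identical subtree, nothing below needs checking
        if depth == 0:
            differing.append(idx)
        else:
            stack.append((depth - 1, 2 * idx))
            stack.append((depth - 1, 2 * idx + 1))
    return sorted(differing)
```

Because matching subtrees are skipped entirely, two nearly identical files are compared with only a handful of hash comparisons rather than one per block.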

To sync remote files, each file is split into blocks and a Merkle tree is built by hashing the blocks with BLAKE2s. On the sending (server) side, this tree is sent on a Librecast Channel (an IPv6 multicast group) formed from the hash of the filename. The receiver (client) joins this channel and receives the tree. If the client already has some data to compare, it builds a Merkle tree of the destination file and uses it to quickly find which blocks differ. It records this information in a bitmap, then joins the Channel(s) for the required block(s), which the server sends.
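A rough sketch of the client-side bookkeeping is shown below. The exact way Librecast derives a Channel address is not described here, so `channel_for` is a hypothetical mapping (a 112-bit BLAKE2s digest placed under the `ff1e::/16` prefix), and naming block channels after the block hash is likewise an assumption of this example.

```python
import ipaddress
from hashlib import blake2s


def channel_for(name: bytes) -> ipaddress.IPv6Address:
    """Hypothetical mapping from a name to an IPv6 multicast group.

    Assumption: 16-bit prefix ff1e (transient, global scope) followed by
    112 bits of BLAKE2s digest. The real Librecast scheme may differ.
    """
    digest = blake2s(name, digest_size=14).digest()  # 14 bytes = 112 bits
    return ipaddress.IPv6Address(b"\xff\x1e" + digest)


def needed_bitmap(remote_leaves: list[bytes], local_leaves: list[bytes]) -> int:
    """Set bit i when block i is missing locally or its hash differs."""
    bitmap = 0
    for i, remote in enumerate(remote_leaves):
        if i >= len(local_leaves) or local_leaves[i] != remote:
            bitmap |= 1 << i
    return bitmap


def channels_to_join(remote_leaves: list[bytes], bitmap: int) -> list[ipaddress.IPv6Address]:
    """Return one (assumed) block channel per set bit in the bitmap."""
    return [channel_for(h) for i, h in enumerate(remote_leaves)
            if (bitmap >> i) & 1]
```

In this sketch the client would first join `channel_for(b"share/dir/file")` to receive the tree, then join only the groups returned by `channels_to_join`.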

There is no unicast communication with the server. No requests are sent, and the server can sit behind a firewall that is completely closed to inbound TCP and UDP traffic. Instead, the server listens on a raw socket for Multicast Listener Discovery (MLDv2) reports. It compares any MLD multicast group JOINs against the index it built on startup and finds matches for files (trees) and blocks. In this way, the server only sends data when at least one client is subscribed. If more clients want to download the data, the server need take no further action. Thus the load on the server does not change, whether there is one client or a billion, nor does the server use any additional bandwidth.

lcsync uses an experimental form of MLD triggering. Instead of using linked lists for tracking multicast groups, as the Linux kernel does, I wanted to test something more scalable. There can potentially be 2^112 multicast groups in IPv6, so beyond a certain point the O(n) search on a linked list does not scale. lcsync uses SIMD (CPU vector operations) to implement counting Bloom filters, as well as what I'm calling a "bloom timer", which lets us track active multicast groups in O(1) constant time. This works, but has the drawback that CPU usage is constant even with 0 active groups. The size of the Bloom filters can be tuned depending on the expected number of simultaneous groups. It really only makes sense to use this approach for a large number of groups. For smaller numbers of groups, a binary tree or even a linked list such as the Linux kernel uses is more appropriate. The option to use a simpler structure will be added in a future release.
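For readers unfamiliar with counting Bloom filters, here is a minimal scalar sketch in Python. It is not the SIMD implementation lcsync uses and it does not attempt to model the "bloom timer"; the class name, filter size and number of hash functions are assumptions chosen for illustration.

```python
from hashlib import blake2s


class CountingBloom:
    """Plain (non-SIMD) counting Bloom filter keyed by group address."""

    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size = size          # number of counters (tunable)
        self.hashes = hashes      # number of hash functions
        self.counters = [0] * size

    def _slots(self, group: bytes):
        # Derive one counter index per hash function by salting BLAKE2s.
        for k in range(self.hashes):
            h = blake2s(group, salt=bytes([k])).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def join(self, group: bytes) -> None:
        for s in self._slots(group):
            self.counters[s] += 1

    def leave(self, group: bytes) -> None:
        # Only call this for a group that was previously joined.
        for s in self._slots(group):
            if self.counters[s] > 0:
                self.counters[s] -= 1

    def maybe_active(self, group: bytes) -> bool:
        # False means definitely inactive; True may be a false positive.
        return all(self.counters[s] > 0 for s in self._slots(group))
```

Lookups and updates touch a fixed number of counters regardless of how many groups are active, which is where the constant-time behaviour comes from. The trade-off is a false-positive rate that depends on the filter size relative to the number of active groups.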

Usage

Syncing local files:

`lcsync source destination`

lcsync assumes source and destination are network addresses unless told otherwise. To refer to a local destination, you must specify the path. For files in the current directory, prefix them with `./`

Fetch remote file, save as localfile in current directory:

`lcsync remote ./localfile`

or

`lcsync share/dir/file ./localfile`

The following command fetches share/dir/file from the network and saves it as /tmp/oot/file:

`lcsync share/dir/file /tmp/oot/`

Serve local file:

`lcsync /path/to/localfile`

Serve local directory files. lcsync will index and serve all files under the source directory:

`lcsync /path/to/files/`

Testing

`sudo make net-setup` (`sudo make net-teardown` when finished)

``sudo ip netns exec vnet0 sudo -u `id -un` /bin/bash``

Now we can run `make test` and `sudo make cap` in our test namespace.

Code

Code is available on Codeberg.

License

GPLv2 or (at your option) GPLv3