Librecast file and data syncing tool.
Compare data with merkle trees, sync via multicast.
Data is compared by generating a merkle tree using BLAKE3 hashes.
For local file syncing we walk the trees and compare the hashes to find which data blocks are different.
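The tree walk described above can be sketched roughly as follows. This is a minimal illustration, not lcsync's implementation: the block size, the function names, and the rule for promoting an unpaired node are all assumptions, and BLAKE2b from the Python standard library stands in for BLAKE3, which is not in the standard library.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny blocks so the example is easy to follow


def h(data: bytes) -> bytes:
    # Stand-in hash: BLAKE2b from the standard library, NOT BLAKE3.
    return hashlib.blake2b(data, digest_size=32).digest()


def build_tree(data: bytes) -> list[list[bytes]]:
    """Return the tree as a list of levels: leaf hashes first, root last."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired node is promoted unchanged (assumption of this sketch).
            parents.append(h(b"".join(pair)) if len(pair) == 2 else pair[0])
        level = parents
        levels.append(level)
    return levels


def diff_blocks(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk both trees from the root, descending only where hashes differ."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch only compares trees of the same shape")
    differing = []
    stack = [(len(a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        lvl, idx = stack.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue  # identical subtree: nothing below it needs checking
        if lvl == 0:
            differing.append(idx)
            continue
        for child in (2 * idx, 2 * idx + 1):
            if child < len(a[lvl - 1]):
                stack.append((lvl - 1, child))
    return sorted(differing)
```

For example, `diff_blocks(build_tree(b"aaaabbbbccccdddd"), build_tree(b"aaaaXbbbccccdddZ"))` returns `[1, 3]`: only the second and fourth blocks need to be copied, and the subtrees whose hashes match are never descended into.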
To sync remote files, each file is split into blocks and a merkle tree is built by hashing the blocks using BLAKE3. On the sending/server side, this tree is sent on a Librecast Channel (an IPv6 multicast group) that is formed from the hash of the filename. The receiver/client joins this channel and receives the tree. If the client already has some data to compare, it builds a merkle tree of the destination file and uses this to quickly find which blocks differ. It builds a bitmap with this information, and then joins the Channel(s) for the required block(s), which are sent by the server.
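As a rough illustration of the two ideas in this paragraph, the sketch below derives a multicast group address from a name and builds a bitmap of the blocks a client still needs. Everything specific here is an assumption made for the example: the `ff1e::/16` prefix (transient flag, global scope), the use of 112 hash bits as the group ID, the function names, and BLAKE2b standing in for BLAKE3. The actual Librecast mapping may differ.

```python
import hashlib
import ipaddress


def channel_address(name: bytes) -> ipaddress.IPv6Address:
    """Map a name to an IPv6 multicast group (hypothetical scheme)."""
    # 14 bytes = 112 bits, the size of the group ID field in an IPv6
    # multicast address. BLAKE2b is a stand-in for BLAKE3.
    group_id = hashlib.blake2b(name, digest_size=14).digest()
    # Assumed prefix: ff1e = multicast, transient flag, global scope.
    return ipaddress.IPv6Address(bytes([0xFF, 0x1E]) + group_id)


def needed_bitmap(remote_leaves: list[bytes], local_leaves: list[bytes]) -> int:
    """Set bit i when block i is missing locally or its hash differs."""
    bitmap = 0
    for i, remote_hash in enumerate(remote_leaves):
        if i >= len(local_leaves) or local_leaves[i] != remote_hash:
            bitmap |= 1 << i
    return bitmap


def blocks_from_bitmap(bitmap: int) -> list[int]:
    """List the block indexes whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

A client following this sketch would join `channel_address(filename)` to receive the tree, compute `needed_bitmap(...)` from the leaf hashes, and then join one group per index returned by `blocks_from_bitmap(...)`.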
There is no unicast communication with the server. No requests are sent, and the server can sit behind a firewall which is completely closed to inbound TCP and UDP traffic. Instead, the server listens on a raw socket for Multicast Listener Discovery (MLDv2) reports. It compares any MLD multicast group JOINs against the index it built on startup and finds matches for files (trees) and blocks. In this way, the server only sends data when at least one client is subscribed. If more clients want to download the data, the server need take no further action. Thus, neither the load on the server nor the bandwidth it uses changes, whether there is one client or a billion.
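The matching step can be pictured as a lookup in an index keyed by group address. The sketch below is only an approximation under stated assumptions: it does not open a raw socket or parse MLD reports, it treats join and leave events as already decoded, and the `ShareIndex` class, the `group_for` mapping, and BLAKE2b (in place of BLAKE3) are all hypothetical.

```python
import hashlib
import ipaddress


def group_for(key: bytes) -> ipaddress.IPv6Address:
    # Hypothetical mapping: ff1e::/16 prefix plus 112 bits of a BLAKE2b hash.
    group_id = hashlib.blake2b(key, digest_size=14).digest()
    return ipaddress.IPv6Address(bytes([0xFF, 0x1E]) + group_id)


class ShareIndex:
    """Index built once at startup, consulted on every observed join."""

    def __init__(self) -> None:
        self.entries = {}    # group address -> what to send on that group
        self.active = set()  # groups with at least one known listener

    def add_file(self, name: bytes, blocks: list[bytes]) -> None:
        self.entries[group_for(name)] = ("tree", name)
        for i, block in enumerate(blocks):
            self.entries[group_for(block)] = ("block", (name, i))

    def on_join(self, group: ipaddress.IPv6Address):
        """Return the indexed entry for a joined group, or None if unknown."""
        entry = self.entries.get(group)
        if entry is not None:
            self.active.add(group)  # start (or keep) sending on this group
        return entry

    def on_leave(self, group: ipaddress.IPv6Address) -> None:
        self.active.discard(group)  # nobody listening: stop sending
```

Because the server reacts to group membership rather than to individual clients, a second or millionth join for a group that is already in `active` changes nothing on the sending side.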
lcsync uses an experimental form of MLD triggering. Instead of using linked-lists for tracking multicast groups, as the Linux kernel does, we use something more scalable. There can potentially be 2^112 multicast groups in IPv6, so beyond a certain point the O(n) search on a linked-list does not scale.
Early versions of lcsync used SIMD (CPU vector operations) to implement counted bloom filters, as well as what we are calling a "bloom timer", which lets us track active multicast groups in O(1) constant time. This works, but has the drawback that CPU usage is constant, even for 0 active groups. The size of the bloom filters can be tuned depending on the expected number of simultaneous groups.
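For readers unfamiliar with counted bloom filters, here is a minimal sketch of the general data structure. It is not the SIMD implementation from early lcsync and does not model the "bloom timer"; the class name, default sizes, and the use of salted BLAKE2b to pick slots are assumptions made for illustration.

```python
import hashlib


class CountingBloom:
    """Counting Bloom filter: counters instead of bits, so keys can be removed."""

    def __init__(self, size: int = 1024, hashes: int = 3) -> None:
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _slots(self, key: bytes):
        # Derive `hashes` slot indexes by salting the hash with the round number.
        for i in range(self.hashes):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key: bytes) -> None:
        for slot in self._slots(key):
            self.counters[slot] += 1

    def remove(self, key: bytes) -> None:
        # Only decrement when the key appears present, so counters never go negative.
        if key in self:
            for slot in self._slots(key):
                self.counters[slot] -= 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.counters[slot] > 0 for slot in self._slots(key))
```

Each operation touches a fixed number of counters, so its cost does not depend on how many groups are being tracked. The trade-off is the usual one for Bloom filters: lookups can report false positives, and a larger `size` lowers that rate.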
Since lcsync v0.1.0 the bloom timer has been replaced with a simple bloom filter which eliminates the need for heavy vector operations, and still provides constant time O(1) searches and updates.
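A plain bloom filter of the kind mentioned here can be sketched in a few lines. This is a generic textbook version, not lcsync's code: the names are hypothetical, salted BLAKE2b is an arbitrary choice of hash, and the sizing helper uses the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2 for n expected items and false-positive rate p.

```python
import hashlib
import math


def bloom_parameters(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (number of bits, number of hash functions) from the standard formulas."""
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


class BloomFilter:
    def __init__(self, bits: int, hashes: int) -> None:
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)  # one bit per slot

    def _positions(self, key: bytes):
        for i in range(self.hashes):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.array[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

Sizing for about 1000 simultaneous groups at a 1% false-positive rate, for instance, would be `BloomFilter(*bloom_parameters(1000, 0.01))`. Lookups and updates each touch a fixed number of bits, which is where the constant-time behaviour comes from.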
Most of lcsync's functionality was merged into Librecast 0.7.0 with the new Librecast Sync API.
lcshare is the server-side component which indexes files, listens for MLD multicast group joins, and sends the file and directory block data.
lcsync is the client-side component which joins multicast groups to receive files and directories.
lcshare [OPTION...] FILENAME
lcshare [OPTION...] DIRECTORY
lcsync [OPTION...] REMOTEFILENAME LOCALFILENAME
lcsync ./LOCALFILE1 ./LOCALFILE2
lcsync assumes source and destination are network addresses unless told otherwise. To refer to a local destination, you must specify the path. For files in the local directory, prefix them with ./ (a remote source can be forced with --remote).
--loopback
-a, --archive
-n, --dry-run
-g, --group
-o, --owner
-p, --perms
--remote
-t, --times
-b, --batch
--bwlimit INTEGER
--hex
-i, --interface INTERFACE
--keyfile KEYFILE
`dd if=/dev/random of=keyfile count=1 bs=128`
--loglevel INTEGER
-r, --recursive
-q, --quiet
-v, --verbose
-V, --version
`sudo make net-setup` (`sudo make net-teardown` when finished)
`sudo ip netns exec vnet0 sudo -u `id -un` /bin/bash`
Now we can run `make test` and `sudo make cap` in our test namespace.
Code is available on Codeberg.
GPLv2 or (at your option) GPLv3