Skip to content

oschrenk/p2p-sync

Repository files navigation

Folder Sync

There are only two hard things in Computer Science: cache invalidation and naming things – Phil Karlton

Folder Sync is quick solution if you want to sync up two directories in the same network without hassle.

Its really good for quickshot syncs, where one party doesn’t have any files of the other party. Once the service is set up om both parties, and is kept running, the two directories will keep in sync.

It uses hazelcast to do its thing.

Development

Folder Sync uses JDK 7 and the following frameworks:

Prerequisites

  • JDK 7 (the application uses NIO.2) – JDK 7 is still under development, you can get early releases for a few platforms here. If you run Mac OS X – you have to compile it yourself.
  • Ant – To create the distribution

Building the Application

$ git clone git@github.com:oschrenk/p2p-sync.git
$ cd p2p-sync
$ ant

Usage

Running the application

$ java -jar dist/sync.jar /path/to/directory/to/watch

You can specify the port under which the files are offered, by using the -p option

$ java -jar dist/sync.jar -p 49154 /path/to/directory/to/watch

Warning!

There is no interactive mode to create a global state. The first member defines the global state! Let’s say User A is the first, User B comes second; that means:

  • if A has a file in relative/path/file and B doesn’t, it will be created
  • if B (!) has a file in relative/path/file and A doesn’t, it WILL BE DELETED

Implementation

Main Idea

Each user monitors one directory. This directory will be indexed via a local index. Hazelcast also manages a remote index (of relative paths), which represents the global state.

If the user changes/updates/deletes a file the local index changes and a CRUD Push event will be fired. The remote indexes, serving as a “global” entity listens to these events and incorporates the changes. In turn the remote index fires CRUD Pull events. The users listens to these events and connects to the user responsible for the event.

Background

Initialization Phase

The initialization phase takes place when a new member joins the network and it has to pair up with a preexisting member, let’s call him member A, in order to sync the directory being watched. The new member, let’s call him B, hasn’t synchronized before but may already have some files in that directory.

Both members have to exchange their index, to decide what differs between. If both parties communicate the first time they both have to exchange their full index. Later they might just exchange a version number of the index file to decide if changes to the index has been made.

If both parties have access to the current index of their respective partner they can decide what changes are neccesary to get to the current state.

Both parties have to communicate the state of the files in order to decide which state is the correct. Both parties have to acknowledge changes that will be made to the file system, as files may be added, changed or deleted that weren’t supposed to.

Let’s consider the following simple scenario, where we have to sync two directories, where each party holds four different files. Files c and d are the same file, where _1 and _2 indicate different versions, _1 one being the older and _2 being the newer one. If a file is missing in the row, the user doesn’t have the file.

Case User A User B
1 a
2 b b
3 c_1 c_2
4 d_2 d_1
5 e

The first and the fifth, and respectively the third and foruth case are symetrical and we are left with three use cases:

Member A has a file that Member B has not

This case can arise in two situations:

  1. Member A has added the file a and Member B didn’t added the file yet. The solution is to push the file to Member B
  2. Member B deleted the file while being offline. The solution is to delete to file on Member A

As this can’t be decided automatically both members have to acknowledge the changes to the file system.

Member A has a file which Member B also has

In this case we don’t have to do anything.

Member A has a file that is older than the file that Member B has

Again the solution seems to push the changes to Member B. But Member A might have deleted the file while being offline and exchanged it with another with the same name or just an older copy of the same file. The date of the file doesn’t reflect the current state.

The solution is to manually find a state on which both users agree on. This means that each difference has to be acknowledged.

Current solution

Right now there is no interactive mode or ui to define the global state. The first member to connect, defines the global state.

Hot synchronization phase

Once both users have established a state of their files on which they both agree on. Every change to the respective filesytem can be pushed directly to the other particpating members.

Problems

Oscillation

As both indexes work seperately, the local index couldn’t differentiate between user generated changes and changes that were pulled because of CRUD Pull events (and changed the filesystem).

An identifier was needed. The first idea was making use of the last modified field. Unfortunately there were problems setting the datetime via the NIO.2 API.

So I only make use of a SHA-1 hash value of the contents of the file.

Future

  • Setting date/times via Java API
  • Add support of deleting/creating directories (lost because of SHA-1 use)
  • Interactive ConflictResolver
  • support to download from various sources
  • FileInspector to support merges in files (vcards)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published