There are only two hard things in Computer Science: cache invalidation and naming things – Phil Karlton
Folder Sync is a quick solution if you want to sync two directories on the same network without hassle.
It’s really good for quickshot syncs, where one party doesn’t have any of the other party’s files. Once the service is set up on both parties and kept running, the two directories will stay in sync.
It uses Hazelcast to do its thing.
Folder Sync uses JDK 7 and the following frameworks:
- Commons Logging
- Grizzly HTTP Webserver 1.9.18
- Hazelcast 1.8.2
- HttpClient 4.0.1
- JewelCLI 0.6
- Log4J 1.2.15
- JDK 7 (the application uses NIO.2) – JDK 7 is still under development; you can get early releases for a few platforms here. If you run Mac OS X, you have to compile it yourself.
- Ant – To create the distribution
$ git clone git@github.com:oschrenk/p2p-sync.git
$ cd p2p-sync
$ ant
$ java -jar dist/sync.jar /path/to/directory/to/watch
You can specify the port on which the files are offered by using the `-p` option:
$ java -jar dist/sync.jar -p 49154 /path/to/directory/to/watch
There is no interactive mode to create a global state. The first member defines the global state! Let’s say User A is the first, User B comes second; that means:
- if A has a file in `relative/path/file` and B doesn’t, it will be created
- if B (!) has a file in `relative/path/file` and A doesn’t, it WILL BE DELETED
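The rule above can be sketched as two set differences over relative paths. This is a minimal illustration, not the project's actual code: the class `JoinPlan` and its fields are hypothetical names, and the global state is assumed to be a plain set of relative paths.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: what a joining member must do to match the global state. */
class JoinPlan {
    final Set<String> toCreate = new TreeSet<>(); // paths the newcomer has to fetch
    final Set<String> toDelete = new TreeSet<>(); // local paths unknown to the global state

    static JoinPlan plan(Set<String> globalPaths, Set<String> localPaths) {
        JoinPlan plan = new JoinPlan();
        // In the global state but not local: the file will be created.
        plan.toCreate.addAll(globalPaths);
        plan.toCreate.removeAll(localPaths);
        // Local but not in the global state: the file WILL BE DELETED.
        plan.toDelete.addAll(localPaths);
        plan.toDelete.removeAll(globalPaths);
        return plan;
    }

    public static void main(String[] args) {
        // User A joined first, so A's paths are the global state.
        Set<String> userA = new TreeSet<>(Arrays.asList("docs/a.txt", "docs/b.txt"));
        Set<String> userB = new TreeSet<>(Arrays.asList("docs/b.txt", "docs/e.txt"));
        JoinPlan plan = plan(userA, userB);
        System.out.println("create: " + plan.toCreate); // create: [docs/a.txt]
        System.out.println("delete: " + plan.toDelete); // delete: [docs/e.txt]
    }
}
```

Note that the plan is computed from B's point of view only; A never loses a file in this scheme.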
Each user monitors one directory. This directory will be indexed via a local index. Hazelcast also manages a remote index (of relative paths), which represents the global state.
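Monitoring a directory with NIO.2 is typically done with a `WatchService`. The following is a minimal sketch under that assumption; it is not taken from the Folder Sync sources, and the class name `DirectoryWatchSketch` is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

class DirectoryWatchSketch {
    /** Registers one directory (not its subdirectories) for create/modify/delete events. */
    static WatchKey register(Path dir, WatchService watcher) throws IOException {
        return dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
    }

    /** Renders an event as "KIND relative/path"; the context is relative to the watched directory. */
    static String describe(WatchEvent<?> event) {
        if (event.kind() == OVERFLOW) {
            return "OVERFLOW"; // events were lost, a full re-index would be needed
        }
        return event.kind().name() + " " + event.context();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args[0]);
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            register(dir, watcher);
            while (true) {
                WatchKey key = watcher.take(); // blocks until something happens
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println(describe(event));
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

A local index would update its entry for the reported relative path on each event instead of printing it.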
If the user changes, updates or deletes a file, the local index changes and a CRUD Push event is fired. The remote index, serving as a “global” entity, listens to these events and incorporates the changes. In turn, the remote index fires CRUD Pull events. The users listen to these events and connect to the user responsible for the event.
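The push/pull flow can be illustrated with a small in-memory model. This is only a sketch: the real remote index is managed by Hazelcast, while `RemoteIndexSketch`, `MemberSketch`, `CrudEvent` and `PullListener` below are hypothetical stand-ins that run in a single JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical event: what happened to which relative path, and on which member. */
class CrudEvent {
    enum Kind { CREATE, UPDATE, DELETE }

    final Kind kind;
    final String relativePath;
    final String origin; // member responsible for the event

    CrudEvent(Kind kind, String relativePath, String origin) {
        this.kind = kind;
        this.relativePath = relativePath;
        this.origin = origin;
    }
}

/** Receives the pull events fired by the remote index. */
interface PullListener {
    void onPull(CrudEvent event);
}

/** In-memory stand-in for the shared remote index. */
class RemoteIndexSketch {
    private final Map<String, String> originByPath = new TreeMap<>();
    private final List<PullListener> listeners = new ArrayList<>();

    void addListener(PullListener listener) {
        listeners.add(listener);
    }

    /** Handles a push event: incorporate the change, then fire a pull event to every member. */
    void onPush(CrudEvent event) {
        if (event.kind == CrudEvent.Kind.DELETE) {
            originByPath.remove(event.relativePath);
        } else {
            originByPath.put(event.relativePath, event.origin);
        }
        for (PullListener listener : listeners) {
            listener.onPull(event);
        }
    }

    Map<String, String> snapshot() {
        return new TreeMap<>(originByPath);
    }
}

/** A member that records which transfers it would start in response to pull events. */
class MemberSketch implements PullListener {
    final String name;
    final List<String> log = new ArrayList<>();

    MemberSketch(String name) {
        this.name = name;
    }

    @Override
    public void onPull(CrudEvent event) {
        if (event.origin.equals(name)) {
            return; // our own change, nothing to fetch
        }
        log.add(event.kind + " " + event.relativePath + " from " + event.origin);
    }
}
```

In this model, a `CREATE` pushed by member A ends up in B's log as a pending download from A, while A ignores its own event.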
The initialization phase takes place when a new member joins the network and it has to pair up with a preexisting member, let’s call him member A, in order to sync the directory being watched. The new member, let’s call him B, hasn’t synchronized before but may already have some files in that directory.
Both members have to exchange their indexes to decide what differs between them. If both parties communicate for the first time, they both have to exchange their full index. Later they might just exchange a version number of the index file to decide whether changes to the index have been made.
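One way to picture the version-number idea is an index that bumps a counter on every real change. The sketch below is an assumption about how such a check could look, not the project's implementation; `VersionedIndex` and `needsFullExchange` are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical index of relative path -> content hash with a change counter. */
class VersionedIndex {
    private final Map<String, String> entries = new TreeMap<>();
    private long version = 0;

    /** Adds or updates an entry; the version only moves if the hash actually changed. */
    void put(String relativePath, String hash) {
        String previous = entries.put(relativePath, hash);
        if (!hash.equals(previous)) {
            version++;
        }
    }

    /** Removes an entry; the version only moves if the entry existed. */
    void remove(String relativePath) {
        if (entries.remove(relativePath) != null) {
            version++;
        }
    }

    long version() {
        return version;
    }

    Map<String, String> entries() {
        return Collections.unmodifiableMap(entries);
    }

    /**
     * Decides whether the partner's full index has to be transferred:
     * on first contact (no version seen yet) or when the version has moved.
     */
    static boolean needsFullExchange(Long lastSeenVersion, long partnerVersion) {
        return lastSeenVersion == null || lastSeenVersion.longValue() != partnerVersion;
    }
}
```

A plain counter only tells the partner that something changed, not what; the full index (or a diff) still has to be exchanged afterwards.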
If both parties have access to the current index of their respective partner, they can decide what changes are necessary to get to the current state.
Both parties have to communicate the state of the files in order to decide which state is the correct one. Both parties have to acknowledge changes that will be made to the file system, as files may otherwise be added, changed or deleted that weren’t supposed to be.
Let’s consider the following simple scenario, where we have to sync two directories and each party holds four files. Files c and d exist on both sides in different versions, where `_1` and `_2` indicate the version, `_1` being the older and `_2` being the newer one. If a file is missing in the row, the user doesn’t have the file.
Case | User A | User B |
---|---|---|
1 | a | |
2 | b | b |
3 | c_1 | c_2 |
4 | d_2 | d_1 |
5 | | e |
The first and the fifth case, and likewise the third and fourth case, are symmetrical, so we are left with three use cases:
The first case (a file exists on only one member) can arise in two situations:
- Member A has added the file `a` and Member B hasn’t added the file yet. The solution is to push the file to Member B.
- Member B deleted the file while being offline. The solution is to delete the file on Member A.
As this can’t be decided automatically, both members have to acknowledge the changes to the file system.
In the second case (both members hold the same file) we don’t have to do anything.
In the remaining case (the members hold different versions of a file, say Member A has the newer one), the solution again seems to be to push the changes to Member B. But Member A might have deleted the file while being offline and replaced it with another file of the same name, or just with an older copy of the same file. The date of the file doesn’t reflect the current state.
The solution is to manually find a state on which both users agree. This means that each difference has to be acknowledged.
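The case analysis can be sketched as a comparison of two indexes that map relative paths to content hashes. This is an illustrative sketch with hypothetical names (`IndexDiff`, `Status`); it only classifies the differences and leaves the decision to whoever acknowledges them.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexDiff {
    enum Status { ONLY_A, ONLY_B, SAME, DIFFERENT }

    /** Classifies every path known to either member; both maps are relative path -> content hash. */
    static Map<String, Status> compare(Map<String, String> indexA, Map<String, String> indexB) {
        Set<String> allPaths = new TreeSet<>(indexA.keySet());
        allPaths.addAll(indexB.keySet());

        Map<String, Status> result = new TreeMap<>();
        for (String path : allPaths) {
            String hashA = indexA.get(path);
            String hashB = indexB.get(path);
            if (hashB == null) {
                result.put(path, Status.ONLY_A);    // cases 1 and 5: pushed or deleted?
            } else if (hashA == null) {
                result.put(path, Status.ONLY_B);
            } else if (hashA.equals(hashB)) {
                result.put(path, Status.SAME);      // case 2: nothing to do
            } else {
                result.put(path, Status.DIFFERENT); // cases 3 and 4: needs acknowledgement
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The scenario from the table, with made-up hash values.
        Map<String, String> a = new TreeMap<>();
        a.put("a", "hash-a");
        a.put("b", "hash-b");
        a.put("c", "hash-c1");
        a.put("d", "hash-d2");

        Map<String, String> b = new TreeMap<>();
        b.put("b", "hash-b");
        b.put("c", "hash-c2");
        b.put("d", "hash-d1");
        b.put("e", "hash-e");

        // prints {a=ONLY_A, b=SAME, c=DIFFERENT, d=DIFFERENT, e=ONLY_B}
        System.out.println(compare(a, b));
    }
}
```

The hashes deliberately carry no ordering: as described above, neither the file date nor the content alone says which version is the right one.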
Right now there is no interactive mode or UI to define the global state. The first member to connect defines the global state.
Once both users have established a state of their files on which they both agree, every change to the respective filesystem can be pushed directly to the other participating members.
As both indexes work separately, the local index couldn’t differentiate between user-generated changes and changes to the filesystem that were pulled in because of CRUD Pull events.
An identifier was needed. The first idea was making use of the last modified field. Unfortunately there were problems setting the datetime via the NIO.2 API.
So I only make use of a SHA-1 hash value of the contents of the file.
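A content hash like this can be computed with the standard `MessageDigest` API. The sketch below shows one way to do it and how the hash might be used to tell a pulled change from a user change; the class `ContentHash` and the method `isUserChange` are hypothetical, and the comparison against the remote index entry is an assumption about the intended check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentHash {
    /** Returns the SHA-1 of the file's contents as a lowercase hex string. */
    static String sha1Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        // Stream the file so large files are not loaded into memory at once.
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Assumed check: if the file on disk already matches the hash recorded in the
     * remote index, the change came from a pull and must not be pushed again.
     */
    static boolean isUserChange(Path file, String hashInRemoteIndex) throws IOException {
        return !sha1Hex(file).equals(hashInRemoteIndex);
    }
}
```

Because only file contents are hashed, a directory has nothing to identify it by, which fits the open item about directory support below.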
- Setting date/times via Java API
- Add support for deleting/creating directories (lost because of SHA-1 use)
- Interactive ConflictResolver
- Support for downloading from various sources
- FileInspector to support merges in files (vcards)