Synchronization puzzle.
Alex Tweedly
alex at tweedly.net
Fri Jan 21 18:52:34 EST 2005
[ slightly OT - this is a "design / algorithm" question rather than a
Rev/xtalk one, really ]
[and it's a long, rambling email - so feel free to skip it]
I am in the process of writing my own Address Book application.
That may sound like an odd thing to do :
aren't there already hundreds of programs to do that already ?
isn't there an address book built into the OS and/or major apps ?
can't you get everything from freeware to professional apps to do that ?
what's wrong with the software that came with your Palm Pilot (or
PDA, or Pocket PC, or ....)
why re-invent the wheel ?
The reason that none of them quite satisfies my needs is simple - I want
a shared address book, between myself, my wife and various other family
members. There need to be up to 8 different computers running this
application - all of them on a network, but never all on the same
network. There are two separate home networks, plus maybe a server on a
web-site somewhere so I can use it from anywhere with an Internet
connection; the 3 laptops will intermittently appear on one or other
network.
Any person can make updates to the address book, and I want a simple way
to synchronize those changes on to the other instances of the app. What
I've been doing until recently was using the Palm Desktop software that
came with my Palm Pilot on each machine; periodically I would sync the
Palm Desktop to my Palm Pilot, then re-connect the Pilot to another
machine and sync there, then to another, then ..... which all worked
OK, though it's a bit of a drag connecting the Pilot to the serial port
of each machine in turn. And it just seems crazy when they're connected
together by 100M Ethernet :-)
But now the two newest machines don't have a traditional serial port,
only USBs. So I have the choice of spending £29.95 for a cable or tens
(hundreds) of hours writing software - no competition :-)
I looked at using the Palm database, and simply (??) writing an app to
sync databases - but the combination of incomplete documentation on the
format (not to mention its complexity) and concern that I might conflict
with Palm's synchronization technique has put me off that. I should
mention that I never (or very, very rarely) actually use the Palm Pilot
for anything other than this synch'ing effort. The Palm Pilot is almost
never carried around; it has, for me, been squeezed out of its niche by
my laptop one one side and my cell phone with its 100+ phone numbers and
storage for many SMS messages on the other. (I sometime text myself an
address or directions if I think I'm likely to need it and won't be
taking the laptop - or occasionally even use pen+paper !!)
What I'm looking for, then, is a method to do "peer-to-peer"
synchronization between an undefined number of instances of the address
database. I've got a design, I've thought it through as much as I can,
I've done "dry-run"s on paper to check it works, I've implemented it and
tested it - but I'm still concerned that I've missed something, and that
it will fail in some circumstances.
Here's the relevant section of my "design document" (i.e. transcription
of the scribbling on the back of an envelope); I'd be very grateful for
any comments, corrections, etc. On the other hand, if there is some
standard technique or algorithm for doing this, and I just didn't know
about it (and couldn't find it in a Google search), then a pointer to a
good description would be also very useful.
---------------------------
Synchronization.
All synchs are user-triggered - i.e. a user on one machine will trigger
a sync to another machine; the user is available to resolve anyh
conflicts immediately) or she can choose to postpone that part of the
synchronization until later. The two machines must be networked together
at the time of a sync. That sync can use any of:
- shared file systems (e.g. laptop to file server)
- http request / post (e.g. laptop to web site)
- TCP based connection (e.g. laptop to laptop while on same network).
(I don't export the filesystems on the laptops, nor do I run http server
on them, so the plan is that each machine can run a address-book-sync
server that will accept TCP connections on some odd port, and transfer
data that way).
The fundamental sync algorithm is the same in each case; the description
below is written as though the synching machine had complete access to
the 'other' address files. The optimizations possible to avoid complete
transfer of the data should be apparent - but in fact for my address
book, the whole file is probably never going to be above 100K, so it's
not clear if those optimizations are worth the implementaion effort.
Time synchronization.
It cannot be assumed that the clocks of the different computers are in
sync, even approximately. It can (and will) be assumed that each
computer's clock will return times that are continuously increasing.
(Yes, I do know about NTGP, and SNTP, and TSP, and various other time
sync protocols - but I just know that I can't rely on people running them.)
Database format.
The Address Book consists of a number of records; each record has 3
"control" fields, and a number of data fields. The sync should use ONLY
the 3 control fields, except for conflict resolution, and hence should
be readily extended to cover Calendar, Tasks, and other similar PIM
database synchronization.
The three control fields for each record are:
ID : this is a 'globally' unique ID. It is allocated when the record
is first created, and never changed thereafter. It comprises the machine
name (these must be unique between all machines which are to share a
database), and a numeric ID allocated by the machine which creates the
record. The numeric portion of the ID will be increasing - but there
should, if possible, be no dependency on the fact that a later-created
record has a higher ID number than another record created earlier; they
just need to be all different.
last-event: a machine/time stamp identifying when this record was last
modified, and the machine on which this modification was done.
status: not currently defined, but put in here for future growth. Set
to "OK" for now.
Record creation: when a new record is created, the new entry receives an
ID consisting of the machine name of the machine in question plus the
"next" numeric ID, a last-event of the (same) machine name plus current
millisec timestamp, and status of OK.
Record modification: the ID and status are unchanged, the last-event is
set to the machine in question plus current timestamp. The complete
record (control and data fields) is added to the History database PRIOR
to any changes.
Record deletion: the record is removed from the address database, and
placed in the History database.
Note that the allocation of next record ID cannot simply check for the
highest current record (because deleted records are removed): the
highest numeric ID used must be tracked separately.
Synchronization set-up
When asked to do a sync, the program will fetch the the address and
history databases from the other machine.
(Note : optimization possible - fetch only control fields !)
Synch main loop.
Sort each address database ('mine' and 'his') in ID order. Reset
'current' counters to start.
Compare current record IDs:
mine is before his:
either mine has been added, or his has been deleted.
check his history database:
if it contains 'my' current entry, then it has been deleteed from
his database:
ACTION: delete from my database
else: it must have been added to mine, so
ACTION: add this recodrd to his database
advance my current counter
his is before mine: (exact mirror of above)
same ID.
compare 'last-events' :
same : everything is good, advance both current counters.
different - but the last-event machine is the same
take whichever entry has the later edit time (note -
from same machine, so they can be compared),
and add to the other database
advance both counters
different - and different machines:
there is no way to tell which is later - user needs to
resolve differences, or postpone.
then advance both counters.
User resolution of conflicts.
User should be presented with the set of common, and different fields,
and for each field chose which of the alternates is to be chosen.
BOTH prior versions are added to BOTH history databases (this may be
overkill - but seems a good option)
newly resolved entry gets last-event of the source machine plus its
current timestamp, and is added to both address databases.
User postpones resolution.
Both database retains its own version of the entry, and neither has its
history updated. (i.e. all other entries are synched - just this one is
left unchanged).
When both current counters have reached their end, all actions have been
collected.
[optimization: if the entire database was not fetched ahead of time,
then any required entries can be collected from the remote system (this
would be those needing resolution plus those where 'his' entry needs to
be added to 'my' database.]
Send to the remote system : either the complete updated databse and
history for him, or a series of add/delete/change records arising from
the actions collected above.
Done.
Note that even if the write to remote system fails, the updates can
still be done to the local system without any ill effects (and vice
versa, though that seems unlikely).
--------------------------------------------
I can't see any failing scenarios. The one part that I haven't worked
out fully is archiving / cleaning-out the history database. Clearly, it
can be pruned back such that only the latest event needs to be kept for
each entry - so the max size of the history database is limited to the
total size of all entries that have ever existed.
Since this is a 'tool' and not a 'product', that's probably good enough
for me. I can wait until I *know* that every existing copy is in sync,
and then manually clear out all the history files. Clearly that wouldn't
be adequate for a real product - but it's good enough for me.
('tool' - something the author can use; 'product' - something that
anyone can use, is easy to use, hard to misuse, fully documented, etc.)
This is a tool for my own use (though it will naturally be available on
revonline (or maybe a web site, if I ever organize one for my Rev
stacks) for anyone who wants to play with it, look at it, modify it for
their own purposes, etc.)
All comments welcome,
-- Alex.
--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.7.1 - Release Date: 19/01/2005
More information about the use-livecode
mailing list