Sunday, September 8, 2013

How Google Docs Works

Here's an interview question for you: How would you implement an online word processing system like Google Docs?

This isn't too complicated if people are working on different pieces of a document or if they work at different times.  But pretty soon you'll start thinking about concurrent edits, lag, network failures, and the like.  What happens when two people edit the same line at the exact same time?  What if one person deletes a line above one that another person is editing, but the message takes a while to get there?

If you think about this a while, you may come up with a solution similar to the following, which uses vector clocks.  1) Each person keeps a clock and 2) each event is associated with a new tick on the clock so that 3) at any time the state of a document can be constructed from the events executed in temporal order.  Making a change to the document or receiving changes from another author causes an author's clock to increment.  See the diagram below, in which three authors edit the same document.  You can see the causes and effects of event B4.


C makes a change.  B gets the change and propagates the change to A.  A and B make changes at the same time and propagate to B and C, respectively.  Then B and C make changes at the same time and propagate to C and A, respectively.  Then C makes a change and forwards to A.  At the end of this timeline, A and C have the latest changes, but B does not.

Is it OK that B doesn't have the latest changes?  That depends on the system you're building.  You have to make a choice between multiple authorship and transactional integrity.  (More generally, there's a trade-off between consistency, availability, and partitioning).  Word-processing is typically not mission-critical.  It's not worth the delays that would be incurred by ensuring that all changes are propagated to all authors.

Although many people think that financial transactions require ACID transactions, ATMs use vector clocks in much the same way.  This is because ATMs are not always connected, and it takes time to sync them.  If you and your partner drain a joint account at the same time, you both get the money.  When the systems eventually talk to each other, they agree that the account is overdrawn, and you get an overdraft fee (or worse).  Debits and deposits are like edits to a shared document.  You just have to add them up to get the current balance of an account.

This is a bit of a simplification, but you get the idea.  Note that you must have some method for resolving conflicts.  If A and B edit the document at the same time and propagate changes to C, C must decide what to do.  If A and B both add a title to the first line of the document, for example, C could arbitrarily put B's title on the first line and decide that A's title should go on the second line.  At a bank, you have overdraft fees and fraud alerts.  You'll have to try to resolve conflicts in ways that aren't too confusing to users.

There are some other things to think about.  For example, at some point the clocks need to be reset and the systems synced.  This used to be more noticeable in Google Docs when you'd get a syncing message.  Also, when you get a lot of authors (or servers that need to be synced), things can get pretty complicated.  But, in situations in which you have a relatively small number of people who can make changes and in which response time (availability and partitioning) is more important than consistency, you may need to use vector clocks.

To read more, check out Wikipedia's article on operational transformation.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...