DVCSs and Changeset Numbering

January 29, 2010 at 05:30 PM | Version control

One of my big beefs with DVCSs is their version numbers.

For example, take two changesets I committed today: 6899 and 02a9. Which one is more recent? How many changesets separate them? Without access to a repository, there's no way to tell… But that sort of information can be useful to have.

Two of the three DVCSs I have experience with, bzr and hg, take steps towards solving this problem.

bzr tries "really hard" to give everything a sequential ID and, as far as I can tell, uses those IDs throughout the UI. It does use globally unique revision IDs under the hood, but they are rarely shown:

$ bzr log
------------------------------------------------------------
revno: 3
message:
  A merge
    ------------------------------------------------------------
    revno: 1.1.1
    message:
      A conflicting change

hg assigns a repository-local alias (a sequential revision number) to each changeset, and displays those aliases alongside the hashes:

$ hg log -r 4:6
changeset:   4:658109dca65b
description:
A merge

changeset:   5:bd8053bf02f1
description:
A conflicting change

And, of course, git doesn't stand for this sort of frivolity and shows pure, unadulterated hashes:

$ git log
commit aa55884d693c92da6dc96eb7a45c9ecd774fefc2

    A merge

commit 8e47937468071ae29d385b76ff925d231c65b97b

    A conflicting change

I'd like to see this taken a step further, though: I'd like the changeset hashes themselves to encode some basic information about where they live in the repository.

For example, one way to do this could be to use the first two hex digits of the hash to store the distance from the root*, and the next two to store a "repository id", which is generated once, when a repository is first cloned or initialized.

So, for example, one of these changeset hashes might look like this: 0afc53bf02f1 (distance 0a, repository id fc), and committing again to the same repository would produce 0bfc109dca65.

Of course, this scheme doesn't guarantee anything: it's entirely possible to generate two completely different changesets with the exact same four-digit prefix… But in the general case, this sort of scheme could make it significantly easier to figure out how arbitrary changesets relate to one another.
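To make the idea a bit more concrete, here is a rough Python sketch of how such IDs could be built and read back. None of this exists in bzr, hg or git: the function names, the use of SHA-1 for the tail, and the wrap-around of the distance counter at 256 are all assumptions of mine, and the prefix widths are simply read off the example IDs above.

```python
import hashlib

# Assumed layout, read off the example IDs in the post:
#   hex digits 0-1: distance from the root (wraps at 256 in this sketch)
#   hex digits 2-3: repository id, generated once per clone/init
#   remainder:      an ordinary content hash, truncated


def make_changeset_id(distance, repo_id, content, length=12):
    """Build a hypothetical changeset ID from its three parts."""
    prefix = f"{distance % 256:02x}{repo_id % 256:02x}"
    digest = hashlib.sha1(content).hexdigest()
    return prefix + digest[: length - len(prefix)]


def parse_changeset_id(changeset_id):
    """Return (distance_from_root, repo_id) encoded in the prefix."""
    distance = int(changeset_id[0:2], 16)
    repo_id = int(changeset_id[2:4], 16)
    return distance, repo_id


def relate(older, newer):
    """Guess how two IDs relate, using nothing but their prefixes."""
    dist_a, repo_a = parse_changeset_id(older)
    dist_b, repo_b = parse_changeset_id(newer)
    return {
        "same_repo": repo_a == repo_b,  # a hint only: ids can collide
        "gap": dist_b - dist_a,         # unreliable once distance wraps
    }
```

Feeding it the two example IDs, `relate("0afc53bf02f1", "0bfc109dca65")` reports the same repository id and a gap of 1, without touching a repository. It is still only a hint, for exactly the reasons given above.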

*: Doug Philips suggested this - thanks.