Nested Repository Handling in git and Mercurial

July 14, 2011 at 12:31 AM | Version control

git and Mercurial both have extensions for nesting particular versions of external repositories (git has submodules, Mercurial has subrepos), and I've found it interesting (and telling?) to compare the implementations.

To nest the repository nested at .subrepos/nested in Mercurial, the line .subrepos/nested = nested is added to the file .hgsub and the line [hash] .subrepos/nested (where [hash] is the current revision's hash) is added to the file .hgsubstate. As the version of nested changes, the .hgsubstate file is updated and committed like any other file.
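Because both files are plain text, reading them needs no Mercurial-specific tooling. Here is a minimal Python sketch, assuming the simple one-entry-per-line layouts described above. The function names and the all-zero placeholder hash are made up for illustration and are not part of Mercurial.

```python
def parse_hgsub(text):
    """Map each nested path to its source, from lines like 'path = source'."""
    sources = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not 'path = source'
        path, _, source = line.partition("=")
        sources[path.strip()] = source.strip()
    return sources


def parse_hgsubstate(text):
    """Map each nested path to its pinned hash, from lines like 'hash path'."""
    pinned = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        revision, _, path = line.partition(" ")
        pinned[path] = revision
    return pinned


# Placeholder hash for illustration only; not a real revision.
example_hash = "0" * 40
print(parse_hgsub(".subrepos/nested = nested\n"))
print(parse_hgsubstate(example_hash + " .subrepos/nested\n"))
```

Nothing here calls into hg, so any script that can read a file can find out which revision is pinned.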

In contrast, to nest nested at .modules/nested in git, the lines:

[submodule ".modules/nested"]
    path = .modules/nested
    url = nested

are added to .gitmodules, and an entry is added to the tree:

160000 commit [hash]        .modules/nested

I found this interesting because it reinforces my feelings towards both systems: I appreciate the simplicity and accessibility of Mercurial's implementation, and I appreciate the intellectual stimulation I got from learning about git's implementation.

But I also believe that git's implementation is inferior in every practical way.

Because Mercurial's subrepo state is stored in plain text files which are committed into the repository, any tool which is used to view/edit/exchange a repository will trivially be able to handle subrepos. For example, changes to subrepos can be trivially exchanged with standard diff/patch tools:

$ hg diff -c bump_nested
diff --git a/.hgsubstate b/.hgsubstate
--- a/.hgsubstate
+++ b/.hgsubstate
@@ -1,1 +1,1 @@
-[... old hash ...] .subrepos/nested
+[... new hash ...] .subrepos/nested

Versions pinned in subrepos can be easily changed with a text editor (for example, to resolve merge conflicts), and tools which interact with Mercurial repositories (for example, hgweb) will function correctly, even if they are ignorant of subrepos.

Contrast this with git's submodules, where submodule versions are “hidden” in the tree, and every tool must be aware of their existence. For example, it is impossible to use standard diff and patch tools:

$ git show bump_nested
...
diff --git a/.modules/nested b/.modules/nested
index f44396c..1fd6830 160000
--- a/.modules/nested
+++ b/.modules/nested
@@ -1 +1 @@
-Subproject commit [... old hash ...]
+Subproject commit [... new hash ...]

Instead, git am must be used. Any tool that interacts with a submodule-enabled repository must be aware of submodules; otherwise it will crash with “fatal: bad object” (most likely because it assumes that every hash in the tree references a blob or tree stored in the repository, which is not true of a commit entry). Submodule-specific commands must also be learned and used to resolve submodule merge conflicts [0].
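To make that awareness concrete, here is a rough Python sketch of the check a tool would need before looking up a tree entry. It assumes entries in the tab-separated layout that git ls-tree prints (mode, type, hash, then a tab and the path). The helper names and the placeholder hashes are hypothetical.

```python
GITLINK_MODE = "160000"  # mode git records for a submodule ("commit") tree entry


def parse_tree_entry(line):
    """Split one ls-tree style line into (mode, type, hash, path)."""
    meta, _, path = line.partition("\t")
    mode, obj_type, obj_hash = meta.split()
    return mode, obj_type, obj_hash, path


def is_submodule(entry):
    """True when the entry points at a commit in a nested repository."""
    mode, obj_type, _, _ = entry
    return mode == GITLINK_MODE and obj_type == "commit"


# Placeholder hashes for illustration only.
entries = [
    parse_tree_entry("100644 blob " + "a" * 40 + "\tREADME"),
    parse_tree_entry("160000 commit " + "b" * 40 + "\t.modules/nested"),
]
for entry in entries:
    if is_submodule(entry):
        print("skip, commit lives in the nested repository:", entry[3])
    else:
        print("read from this repository:", entry[3])
```

A tool that leaves out the is_submodule branch and tries to read every hash from the outer repository is the kind that ends up at “bad object”.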

All of these complications could be acceptable if they made git's submodules somehow “better” than a simpler scheme like Mercurial's .hgsubstate. But, as far as I can tell, this implementation affords no practical benefit.

[0]

hint: after a submodule merge conflict, git submodule status will show all the revisions you might care about:

$ git submodule status
-595c7a8dd110ab3f0f305bb0f3d6356ca5d62d99 nested
-cacb40625cc891b33c9c935442c0180e8ba5ab15 nested
-5e71d23bc5d24d18e026bcf12773f3fade1ac6b9 nested

And you'll need to remember that the first line is the common ancestor, the second line is “our” version, and the third line is “their” version. To resolve the conflict, the standard git checkout --{ours,theirs} appears to do nothing; instead, you need to copy the hash of the desired revision, run cd nested; git checkout $hash; cd .., then commit as normal.
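To avoid relying on memory for that ordering, the three lines can be labelled mechanically. This is only a sketch: it assumes exactly one conflicted submodule and the three-line output shown above, and the function name is made up.

```python
def label_conflict_revisions(status_output):
    """Pair the three hashes with base/ours/theirs, in the order described above."""
    labels = ["base", "ours", "theirs"]
    hashes = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Each line looks like '-<hash> <path>'; drop the status prefix character.
        hashes.append(line.split()[0].lstrip("-+U"))
    if len(hashes) != len(labels):
        raise ValueError("expected exactly three revisions")
    return dict(zip(labels, hashes))


status = """\
-595c7a8dd110ab3f0f305bb0f3d6356ca5d62d99 nested
-cacb40625cc891b33c9c935442c0180e8ba5ab15 nested
-5e71d23bc5d24d18e026bcf12773f3fade1ac6b9 nested
"""
revisions = label_conflict_revisions(status)
print(revisions["theirs"])
```

The hash it prints is the one to pass to git checkout inside nested when “their” version is wanted.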