Splitting and merging repos

Splitting repos

It is often useful to divide a large Git repository into several smaller ones. This can become necessary in a project that has grown over time, or if you want to manage a sub-project in a separate repository. Of course you could simply create a new repository and copy the files, but you would then lose the entire version history.

Here I describe how you can split a Git repository without losing the associated history.

Scenario and goals

We want to split out from the Jupyter tutorial repository the part that deals with visualising the data: docs/viz/. The challenge is that the history for the docs/viz/ directory is mixed with other changes. Therefore, we first clone the same repository twice:

$ git clone git@github.com:veit/jupyter-tutorial.git
Cloning into 'jupyter-tutorial'...
$ git clone git@github.com:veit/jupyter-tutorial.git pyviz-tutorial
Cloning into 'pyviz-tutorial'...

The next step is to filter the unwanted history out of each of the two repos. To rewrite the history and keep only those commits that actually affect the content of a particular subfolder, we use git-filter-repo:

$ curl https://raw.githubusercontent.com/newren/git-filter-repo/main/git-filter-repo -o git-filter-repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  161k  100  161k    0     0   578k      0 --:--:-- --:--:-- --:--:--  584k

$ cd pyviz-tutorial
$ python3 ../git-filter-repo --path docs/viz
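Before pushing, it can be worth checking that the rewritten history really touches nothing outside docs/viz. The following is only a minimal sketch using plain git; the function name paths_outside_prefix is hypothetical and not part of git or git-filter-repo, and the prefix is treated as a regular expression, which is harmless for a simple path such as docs/viz.

```shell
# Hypothetical helper: print every path that appears anywhere in the
# history of the current branch and is NOT below the given prefix.
# Empty output suggests that the filter kept only the wanted directory.
paths_outside_prefix() {
    repo=$1
    prefix=$2
    # --name-only with an empty format prints only file names,
    # separated by blank lines between commits.
    git -C "$repo" log --name-only --pretty=format: \
        | sort -u \
        | grep -v -e '^$' -e "^${prefix}/" || true
}

# Example call, assuming the clone created above:
# paths_outside_prefix pyviz-tutorial docs/viz
```

If the call prints any path, some commits still reference files outside the chosen directory and the filter should be re-checked before anything is pushed.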

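git-filter-repo removes the origin remote so that the rewritten history is not pushed back by accident. The only thing left to do now is therefore to add the new remote URL: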

$ git remote add origin git@github.com:veit/pyviz-tutorial.git
$ git push -u origin main

For our Jupyter tutorial repository, we now invert the selected path:

$ cd ../jupyter-tutorial
$ python3 ../git-filter-repo --invert-paths --path docs/viz
$ git remote add origin git@github.com:veit/jupyter-tutorial.git
$ git push -f -u origin main

Merging repos

Repos with different histories can also be merged. This can be desirable, for example, if a project was started locally but a repository with an initial commit had already been created on the Git server. In this case, you can simply use git pull --allow-unrelated-histories; the option is then passed on to the underlying git merge.
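The simple case can be tried out safely in throwaway repositories. The sketch below is an illustration only: the directory names server and local, the demo identity and the commit messages are assumptions made for the example, and --no-rebase and --no-edit are added merely so that the pull runs non-interactively as a merge.

```shell
set -eu
demo=$(mktemp -d)

# Helper so that commits work even without a configured identity.
g() { git -c user.name=Demo -c user.email=demo@example.org "$@"; }

# "server" stands in for the repository created on the Git server.
g init -q "$demo/server"
g -C "$demo/server" commit -q --allow-empty -m "Initial commit on the server"

# "local" stands in for the project that was started locally.
g init -q "$demo/local"
g -C "$demo/local" commit -q --allow-empty -m "First local commit"

# Without --allow-unrelated-histories git refuses to merge here,
# because the two root commits share no common ancestor.
g -C "$demo/local" pull -q --no-rebase --no-edit \
    --allow-unrelated-histories "$demo/server" HEAD

# The graph shows one merge commit joining both histories.
git -C "$demo/local" log --oneline --graph
```

Pulling HEAD rather than a named branch keeps the sketch independent of the default branch name configured on the machine.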

If you want to merge two larger projects, you can do this as follows:

$ git remote add -f pyviz git@github.com:veit/pyviz-tutorial.git
$ git merge -s ours --no-commit --allow-unrelated-histories pyviz/main
$ git read-tree --prefix=docs/pyviz/ -u pyviz/main
$ git commit -m "Merge pyviz tutorial as subdirectory"
$ git push
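To see what these commands do without touching a real project, the same sequence can be replayed on two small local repositories. This is a sketch under stated assumptions: the directories sub and main, the file names and the demo identity are invented for the example, and the branch name of the imported repository is looked up instead of assuming main.

```shell
set -eu
demo=$(mktemp -d)

# Helper so that commits work even without a configured identity.
g() { git -c user.name=Demo -c user.email=demo@example.org "$@"; }

# "sub" stands in for pyviz-tutorial, "main" for the receiving project.
g init -q "$demo/sub"
echo "plot" > "$demo/sub/index.rst"
g -C "$demo/sub" add index.rst
g -C "$demo/sub" commit -q -m "Add viz docs"

g init -q "$demo/main"
echo "intro" > "$demo/main/README.rst"
g -C "$demo/main" add README.rst
g -C "$demo/main" commit -q -m "Add README"

cd "$demo/main"
# Register the other repository as a remote and fetch it right away.
g remote add -f pyviz "$demo/sub"
sub_branch=$(git -C "$demo/sub" symbolic-ref --short HEAD)

# Record a merge that keeps our tree unchanged, without committing yet.
g merge -s ours --no-commit --allow-unrelated-histories "pyviz/$sub_branch"
# Read the other project's tree into the index below docs/pyviz/.
g read-tree --prefix=docs/pyviz/ -u "pyviz/$sub_branch"
g commit -q -m "Merge pyviz tutorial as subdirectory"

# List all files of the merged project.
git ls-tree -r --name-only HEAD
```

The ours strategy only records the second parent; the files themselves arrive through git read-tree, which is why the imported content ends up below the chosen prefix rather than in the repository root.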

See also

merge-options