Git for binary files
git diff can be configured so that it can also display meaningful diffs for
binary files.
… for Excel files
For this we need openpyxl and pandas:
$ pipenv install openpyxl pandas
Then we can use pandas.DataFrame.to_csv in exceltocsv.py to convert the Excel files:
# SPDX-FileCopyrightText: 2023 Veit Schiele
#
# SPDX-License-Identifier: BSD-3-Clause
import sys
from io import StringIO
import pandas as pd
for sheet_name in pd.ExcelFile(sys.argv[1]).sheet_names:
    output = StringIO()
    print("Sheet: %s" % sheet_name)
    pd.read_excel(sys.argv[1], sheet_name=sheet_name).to_csv(
        output, header=True, index=False
    )
    print(output.getvalue())
Now add the following section to your global Git configuration ~/.gitconfig:
[diff "excel"]
textconv=python3 /PATH/TO/exceltocsv.py
binary=true
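Finally, our excel converter is linked to *.xlsx files in the global ~/.gitattributes file. Git only treats this file as the global attributes file if core.attributesFile points to it (set with git config --global core.attributesFile ~/.gitattributes); by default it reads $XDG_CONFIG_HOME/git/attributes instead. The entry is: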
*.xlsx diff=excel
… for PDF files
For this, pdftohtml is additionally required. It can be installed with apt on Debian/Ubuntu or with Homebrew on macOS:
$ sudo apt install poppler-utils
$ brew install pdftohtml
Add the following section to the global Git configuration ~/.gitconfig:
[diff "pdf"]
textconv=pdftohtml -stdout
Finally, in the global ~/.gitattributes file, our pdf converter is linked to *.pdf files:
*.pdf diff=pdf
Now, when git diff is called, the PDF files are first converted and then a
diff is performed over the outputs of the converter.
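The convert-then-diff mechanism can be illustrated with a minimal Python sketch. It is not how Git is implemented; textconv_diff and toy_convert are hypothetical names, and toy_convert merely stands in for a real converter such as pdftohtml:

```python
import difflib


def textconv_diff(old_bytes, new_bytes, convert, name="file.bin"):
    """Diff two binary blobs by comparing their text conversions."""
    # Step 1: run the converter on both versions of the file.
    old_text = convert(old_bytes).splitlines(keepends=True)
    new_text = convert(new_bytes).splitlines(keepends=True)
    # Step 2: run an ordinary line-based diff over the converter output.
    return "".join(
        difflib.unified_diff(
            old_text, new_text, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
    )


def toy_convert(blob):
    # Hypothetical converter: decode the bytes and put every
    # semicolon-separated field on its own line.
    return blob.decode("utf-8").replace(";", "\n") + "\n"


diff_text = textconv_diff(b"title;draft", b"title;final", toy_convert, "doc.pdf")
print(diff_text)
```

The diff only ever sees the converter output, so its quality depends entirely on how stable and readable that output is.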
… for Word documents
Differences in Word documents can also be displayed. Pandoc can be used for this purpose; it is available from common package managers such as apt and Homebrew.
Then add the following section to your global Git configuration ~/.gitconfig:
[diff "word"]
textconv=pandoc --to=markdown
binary=true
prompt=false
Finally, in the global ~/.gitattributes file, our word converter is linked to *.docx files:
*.docx diff=word
The same procedure can be used to obtain useful diffs from other binaries, for
example *.zip, *.jar and other archives with unzip, or for changes in the meta
information of images with exiv2. There are also conversion tools for
converting *.odt, *.doc and other document formats into plain text.
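For the archive case, a converter can also be written in Python, in the same way as exceltocsv.py. The following is a minimal sketch using the standard library zipfile module instead of unzip; the file name ziplist.py and the function list_archive are assumptions made for illustration:

```python
import sys
import zipfile


def list_archive(path):
    """Return one line per archive member: size, CRC-32 and name."""
    with zipfile.ZipFile(path) as archive:
        # Sort by name so that a changed storage order does not show up
        # as a difference.
        members = sorted(archive.infolist(), key=lambda info: info.filename)
        return "\n".join(
            f"{info.file_size:>10} {info.CRC:08x} {info.filename}"
            for info in members
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git passes the path of the file to convert as the first argument.
    print(list_archive(sys.argv[1]))
```

Such a script could be registered like the other converters, with textconv=python3 /PATH/TO/ziplist.py in a [diff "zip"] section and *.zip diff=zip in the attributes file. Because the listing contains the CRC-32 checksum, a changed member shows up in the diff even when its name and size stay the same.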
For binary files for which there is no converter, the output of the strings utility is often sufficient.
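Where the strings utility is not installed, its basic behaviour can be approximated in a few lines of Python. This is only a rough sketch: the function name printable_strings and the default minimum length of four characters are assumptions, and the real tool offers more options:

```python
import re
import sys


def printable_strings(data, min_length=4):
    """Return runs of printable ASCII characters found in data."""
    # Printable ASCII (space to tilde) plus tab, at least min_length long.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Used as a textconv script, the file path arrives as first argument.
    with open(sys.argv[1], "rb") as handle:
        print("\n".join(printable_strings(handle.read())))
```

The diff then shows which embedded text fragments appeared or disappeared, which is often enough to tell what kind of change was made.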