Feng Gao
4 min read · Oct 11, 2020


Implement Git by yourself (1: Introduction)

Git is arguably the most popular version control system (VCS). As a developer, you probably use Git’s porcelain commands in your daily work and treat Git as a black box. But as the saying goes:

If you can’t make one, you don’t know how it works.

Although the Git source code is freely available, reading through the repository today is quite a challenge. What’s worse, Git is written in C, which is hard going for many of us. Git is also Linus Torvalds’s masterpiece, and it contains a lot of tricky code.

So we are going to implement Git ourselves. This time we will use C# on .NET Core, which is cross-platform.

This blog series has two major references:

  1. ugit: DIY Git in Python
  2. Git Internals

1 Structure of the .git folder

The first step in using Git is to run git init in the target folder, which creates a sub-folder named .git.

We’re going to dive into the HEAD, objects and refs entries, which are the core parts of Git for our first version.

In essence, the .git folder is a file-based database, which means we can restore the codebase as long as the .git folder is intact.
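
To make the layout concrete, here is a minimal sketch in Python (chosen for brevity; the series itself targets C#) that creates only the three entries discussed above. The function name init_repo is hypothetical, and the real git init writes more files (config, hooks, and so on) than this does.

```python
import os

def init_repo(path):
    """Create a minimal .git layout under `path` (sketch; real git init writes more)."""
    git_dir = os.path.join(path, ".git")
    # objects holds the content database; refs/heads and refs/tags hold branches and tags.
    for sub in ("objects", "refs/heads", "refs/tags"):
        os.makedirs(os.path.join(git_dir, *sub.split("/")), exist_ok=True)
    # HEAD starts out as a symbolic reference to the default branch.
    with open(os.path.join(git_dir, "HEAD"), "w", newline="\n") as f:
        f.write("ref: refs/heads/master\n")
    return git_dir
```

Everything else in this post is about what ends up inside these three entries.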

2 Git Object

The plumbing command git hash-object takes some data, stores it in the .git/objects directory (when given -w), and prints the unique key that maps to that data object.

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

Let’s see what happened in the .git/objects directory.

It created a file named 70460b4b4aece5915caf5c68d12f560a9fe3e4 in the d6 folder. Where does this value come from?

It’s the SHA-1 digest of a header followed by the content. The header consists of the object type, a space, the content length in bytes, and a null byte.

The type can be:

  • blob: the content of a regular file
  • tree: a folder
  • commit: a commit record
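
The hashing rule can be checked with a few lines of Python. This is a sketch, not Git’s own code: hash_object and its parameters are names made up for illustration. It builds the header, hashes header plus content with SHA-1, and optionally writes the zlib-compressed result under objects/, split into a two-character folder and a 38-character file name as seen above.

```python
import hashlib
import os
import zlib

def hash_object(data, obj_type="blob", git_dir=None):
    """Return the object id of `data`; also store it when `git_dir` is given."""
    # Header: "<type> <size in bytes>" followed by a null byte.
    header = f"{obj_type} {len(data)}".encode() + b"\x00"
    store = header + data
    oid = hashlib.sha1(store).hexdigest()
    if git_dir is not None:
        # First two hex characters name the folder, the remaining 38 the file.
        folder = os.path.join(git_dir, "objects", oid[:2])
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, oid[2:]), "wb") as f:
            f.write(zlib.compress(store))
    return oid

# echo appends a newline, so the hashed content is 13 bytes long.
print(hash_object(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The printed id matches the one git hash-object returned earlier, which confirms that the key depends only on the type, the size and the content.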

With the SHA-1 value, we can also easily restore the blob’s content.
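
Restoring is the reverse of storing. The following Python sketch assumes the object is a loose object (a single zlib-compressed file under objects/), which is the only case covered in this post; read_object is a hypothetical name.

```python
import os
import zlib

def read_object(git_dir, oid):
    """Return (type, content) for the loose object with id `oid`."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    # The header ends at the first null byte; everything after it is content.
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content
```

This is roughly the information git cat-file gives you: the type from the header, and the content after it.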

As we have seen, a blob does not include the file name or any file attributes. Those are tracked in tree objects.

From the above output, the structure would look like this.

With this nested structure, it’s possible to restore each file with the correct file name and folder.
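
As a rough illustration of how a tree records names, here is a Python sketch of the tree body format: each entry is the mode, a space, the name, a null byte, and then the 20 raw bytes of the entry’s object id. The function names are hypothetical, and the sort key is a simplified version of Git’s ordering (folders compare as if their name ended with a slash).

```python
import hashlib

def serialize_tree(entries):
    """entries: list of (mode, name, oid_hex); mode "40000" marks a sub-tree."""
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()
    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return body

def parse_tree(body):
    """Inverse of serialize_tree: return a list of (mode, name, oid_hex)."""
    entries, pos = [], 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21  # skip the null byte and the 20-byte object id
    return entries

def hash_tree(entries):
    body = serialize_tree(entries)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()

print(hash_tree([]))
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (Git's well-known empty tree id)
```

Because a tree entry can point at another tree, walking the entries recursively rebuilds the whole folder hierarchy.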

Every commit also creates an object, which includes:

  1. The tree of the current working directory
  2. The previous (parent) commit object
  3. The author and committer information
  4. The commit message
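
The items above map directly onto the text of a commit object. The Python sketch below builds that text; build_commit and its parameters are hypothetical names, and the sample values are taken from the example output shown later in this post. The first commit in a repository has no parent line.

```python
def build_commit(tree, parent, who, timestamp, tz, message):
    """Return the body of a commit object as bytes (header lines, blank line, message)."""
    lines = [f"tree {tree}"]
    if parent is not None:  # a root commit has no parent
        lines.append(f"parent {parent}")
    stamp = f"{who} {timestamp} {tz}"
    lines.append(f"author {stamp}")
    lines.append(f"committer {stamp}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

body = build_commit(
    tree="bdb8545ede07e475bdfbfa0fb2c9bd3ad2653f00",
    parent="28cd27cccf7ed33b4556e2ea66d06cdbbac038fc",
    who="gaufung <gaufung@outlook.com>",
    timestamp=1602422297,
    tz="+0800",
    message="second commit",
)
print(body.decode(), end="")
```

Like any other object, this body is then prefixed with a "commit <size>" header and a null byte before it is hashed and stored.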

3 Git Reference

We can traverse the commit history by commit id (the SHA-1 value), but such a long value is hard to remember. Git provides references, a human-readable mechanism for reaching a specific commit.

$ find .git/refs/
.git/refs
.git/refs/heads
.git/refs/heads/master
.git/refs/tags

There are two directories in .git/refs. Each file in .git/refs/heads represents an individual branch, and each file in .git/refs/tags represents a tag you have created.

Let’s look into the content of .git/refs/heads/master.

$ cat .git/refs/heads/master
584ad834b71a95161ee79d237b730c30a06a080a
$ git cat-file -p 584ad834b71a95161ee79d237b730c30a06a080a
tree bdb8545ede07e475bdfbfa0fb2c9bd3ad2653f00
parent 28cd27cccf7ed33b4556e2ea66d06cdbbac038fc
author gaufung <gaufung@outlook.com> 1602422297 +0800
committer gaufung <gaufung@outlook.com> 1602422297 +0800

second commit

The same goes for .git/refs/tags:

$ git tag v1.0 28cd27cccf7ed33b4556e2ea66d06cdbbac038fc
$ cat .git/refs/tags/v1.0
28cd27cccf7ed33b4556e2ea66d06cdbbac038fc
$ git cat-file -p 28cd27cccf7ed33b4556e2ea66d06cdbbac038fc
tree c5b400005f1175376a54c6e4ba5d45072747adf9
author gaufung <gaufung@outlook.com> 1602421464 +0800
committer gaufung <gaufung@outlook.com> 1602421464 +0800

first commit
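
Since a branch or a tag like the ones above is just a file containing a commit id, reading and writing one takes only a few lines. This Python sketch uses hypothetical names (write_ref, read_ref) and only handles refs stored as individual files; real Git can also keep refs in a packed-refs file, which is ignored here.

```python
import os

def write_ref(git_dir, ref, oid):
    """Point `ref` (for example "refs/tags/v1.0") at the commit `oid`."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="\n") as f:
        f.write(oid + "\n")

def read_ref(git_dir, ref):
    """Return the commit id stored in `ref`, or None if the ref does not exist."""
    path = os.path.join(git_dir, *ref.split("/"))
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return f.read().strip()
```

With these two helpers, creating the v1.0 tag above amounts to write_ref(git_dir, "refs/tags/v1.0", "28cd27cccf7ed33b4556e2ea66d06cdbbac038fc").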

How does Git know which branch we are on? The answer is the HEAD file.

$ cat .git/HEAD
ref: refs/heads/master

When you make a new commit, the HEAD file tells Git which branch head needs to be updated.

What if you type git checkout <commit id>?
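
That update step might look like the following Python sketch. The name advance_head is hypothetical, and the fallback for a HEAD file that holds a bare commit id instead of a "ref: " line is an assumption about how our own implementation could behave, not a description of Git’s code.

```python
import os

def advance_head(git_dir, new_oid):
    """Record `new_oid` as the latest commit; return the name of what was updated."""
    head_path = os.path.join(git_dir, "HEAD")
    with open(head_path) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        # HEAD names a branch: move that branch, leave HEAD itself untouched.
        ref = head[len("ref: "):]
        path = os.path.join(git_dir, *ref.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", newline="\n") as f:
            f.write(new_oid + "\n")
        return ref
    # HEAD holds a commit id directly: overwrite it in place.
    with open(head_path, "w", newline="\n") as f:
        f.write(new_oid + "\n")
    return "HEAD"
```

Note that in the branch case HEAD is never rewritten; only the file it names changes.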

$ git checkout 28cd27cccf7ed33b4556e2ea66d06cdbbac038fc
Note: checking out '28cd27cccf7ed33b4556e2ea66d06cdbbac038fc'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 28cd27c first commit
$ cat .git/HEAD
28cd27cccf7ed33b4556e2ea66d06cdbbac038fc

Now HEAD no longer points to a ref, just to a commit id. We call this state a detached HEAD. It’s risky: commits you make in this state do not belong to any branch, so they are hard to find again once you switch to another branch.
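
Telling the two states apart only requires looking at the first characters of the HEAD file. A small Python sketch, with resolve_head as a hypothetical name:

```python
import os

def resolve_head(git_dir):
    """Return (ref, commit id); ref is None when HEAD is detached."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return None, head  # detached HEAD: the file holds the commit id itself
    ref = head[len("ref: "):]
    path = os.path.join(git_dir, *ref.split("/"))
    if not os.path.isfile(path):
        return ref, None  # branch has no commits yet
    with open(path) as f:
        return ref, f.read().strip()
```

For the repository above this would return ("refs/heads/master", "584ad8...") before the checkout and (None, "28cd27...") after it.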

4 Conclusion

These are the basics of Git internals, and a good starting point for implementing the basic features ourselves.
