Implement Git by Yourself (2: Data provider)

Feng Gao
Nov 7, 2020


Implement Git series

In the previous post, we learned that Git uses the local file system as its database to support distribution. The key is the .ugit folder in the repo. On top of it we build all the commands that manipulate the repo for version control.

The architecture of ugit.

IDataProvider hides all the basic IO operations and provides the local file database operations, such as HashObject, GetObject, and getting, updating, and deleting refs.

We leverage System.IO.Abstractions to make ugit testable and injectable. Besides the ugit file database operations, the IFileOperator also exposes some primitive IO operations.

1 Init

There is no magic in the init operation: it just creates the .ugit and .ugit\objects folders in the working directory. If you run the init command more than once, it deletes the existing .ugit folder first.

// Start from a clean slate if the repo was already initialized.
if (this.fileSystem.Directory.Exists(this.GitDir))
{
    this.fileSystem.Directory.Delete(this.GitDir, true);
}

this.fileSystem.Directory.CreateDirectory(this.GitDir);
this.fileSystem.Directory.CreateDirectory(Path.Join(this.GitDir, "objects"));
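Because the init code goes through this.fileSystem instead of calling System.IO directly, it can be exercised against an in-memory file system. The following is a minimal sketch of that idea, not the actual ugit class: SketchDataProvider, the way GitDir is computed, and the "stale" file are assumptions made for illustration. It uses MockFileSystem from the System.IO.Abstractions.TestingHelpers package.

```csharp
using System;
using System.IO;
using System.IO.Abstractions;
using System.IO.Abstractions.TestingHelpers;

// Hypothetical stand-in for the post's data provider; only Init is sketched.
public class SketchDataProvider
{
    private readonly IFileSystem fileSystem;

    public SketchDataProvider(IFileSystem fileSystem)
    {
        this.fileSystem = fileSystem;
    }

    // Assumption: .ugit lives directly under the current directory.
    public string GitDir =>
        Path.Join(this.fileSystem.Directory.GetCurrentDirectory(), ".ugit");

    public void Init()
    {
        if (this.fileSystem.Directory.Exists(this.GitDir))
        {
            this.fileSystem.Directory.Delete(this.GitDir, true);
        }

        this.fileSystem.Directory.CreateDirectory(this.GitDir);
        this.fileSystem.Directory.CreateDirectory(Path.Join(this.GitDir, "objects"));
    }
}

public static class InitDemo
{
    public static void Main()
    {
        // Nothing here touches the real disk.
        var fs = new MockFileSystem();
        var provider = new SketchDataProvider(fs);

        provider.Init();
        string stale = Path.Join(provider.GitDir, "objects", "stale");
        fs.File.WriteAllText(stale, "left over from the first init");

        // A second init should wipe the folder and recreate it empty.
        provider.Init();
        Console.WriteLine(fs.Directory.Exists(Path.Join(provider.GitDir, "objects")));
        Console.WriteLine(fs.File.Exists(stale));
    }
}
```

In production code the same class would receive a real FileSystem instance, so the test double and the real implementation share one code path.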

2 Object

The object setter and getter are the backbone of a file database. To distinguish different kinds of objects, we tag each one with a type. The signatures of the GetObject and HashObject methods are

byte[] GetObject(string oid, string expected="blob");
string HashObject(byte[] data, string type="blob");

To simplify the scenario, we use UTF-8 encoding for all files, with a \0 byte separating the type from the data. Besides blob, tree and commit are the other object types.
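To make the type tag and the \0 separator concrete, here is a minimal sketch of the two methods. It is not the ugit implementation: InMemoryObjectStore is a hypothetical class that keeps objects in a dictionary instead of the .ugit\objects folder, and both the SHA-1 hex digest and the choice to hash the tagged payload (type, \0, data) are assumptions, since the post does not say how the object id is computed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory stand-in for the .ugit\objects folder.
public class InMemoryObjectStore
{
    // Object id -> stored bytes (type, \0, data).
    private readonly Dictionary<string, byte[]> objects =
        new Dictionary<string, byte[]>();

    public string HashObject(byte[] data, string type = "blob")
    {
        // Stored layout: <type> \0 <data>
        byte[] payload = Encoding.UTF8.GetBytes(type)
            .Concat(new byte[] { 0 })
            .Concat(data)
            .ToArray();

        // Assumption: the object id is the SHA-1 hex digest of the payload.
        string oid;
        using (SHA1 sha1 = SHA1.Create())
        {
            oid = BitConverter.ToString(sha1.ComputeHash(payload))
                .Replace("-", string.Empty)
                .ToLowerInvariant();
        }

        this.objects[oid] = payload;
        return oid;
    }

    public byte[] GetObject(string oid, string expected = "blob")
    {
        byte[] payload = this.objects[oid];

        // The first \0 byte ends the type tag; everything after it is data.
        int separator = Array.IndexOf(payload, (byte)0);
        string type = Encoding.UTF8.GetString(payload, 0, separator);

        if (!string.IsNullOrEmpty(expected) && type != expected)
        {
            throw new InvalidOperationException(
                $"Expected a {expected} object but found a {type} object.");
        }

        return payload.Skip(separator + 1).ToArray();
    }
}

public static class ObjectDemo
{
    public static void Main()
    {
        var store = new InMemoryObjectStore();
        string oid = store.HashObject(Encoding.UTF8.GetBytes("hello ugit"));

        // Reading it back as a blob returns the original bytes.
        Console.WriteLine(Encoding.UTF8.GetString(store.GetObject(oid)));
    }
}
```

Asking for the same object id with expected set to "tree" throws, which is how the type tag guards against reading a blob where a tree or commit was expected.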

3 Ref

Ref stands for reference. ugit has three kinds of references:

  • branch: .ugit\refs\heads\<branch-name>
  • tag: .ugit\refs\tags\<tag-name>
  • HEAD: .ugit\HEAD

In a branch or tag reference, the file content must be a commit object id. A tag reference cannot be updated once it has been created. A branch reference, however, moves forward as new commits are made on that branch.

Now the question is

How does ugit know which branch it is currently on?

The answer is HEAD. It usually points to the head ref of the current branch. Look at the following illustration.

When the repo moves forward, HEAD doesn’t need to be updated. It works like a pointer to a pointer.

HEAD doesn’t always point to a branch, though. When you check out a specific tag or commit id, the HEAD value is set to that object id.
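The pointer-to-pointer behaviour can be sketched in a few lines. This is only an illustration under stated assumptions: the "ref: " prefix for a symbolic HEAD is borrowed from Git (the post does not show the file content ugit writes), the dictionary stands in for the files under .ugit, and the branch name master and the values commit-1 and commit-2 are placeholders rather than real object ids.

```csharp
using System;
using System.Collections.Generic;

public static class RefDemo
{
    // Assumption: a symbolic ref is stored as "ref: <path>", as in Git.
    private const string SymbolicPrefix = "ref: ";

    // Follows symbolic refs until a direct value (an object id) is reached.
    public static string Resolve(IDictionary<string, string> refs, string name)
    {
        string value = refs[name];
        while (value.StartsWith(SymbolicPrefix, StringComparison.Ordinal))
        {
            value = refs[value.Substring(SymbolicPrefix.Length)];
        }

        return value;
    }

    public static void Main()
    {
        // Ref name -> file content; stands in for the files under .ugit.
        var refs = new Dictionary<string, string>
        {
            ["HEAD"] = "ref: refs/heads/master",
            ["refs/heads/master"] = "commit-1",
        };
        Console.WriteLine(Resolve(refs, "HEAD"));

        // A new commit only moves the branch ref; HEAD itself is untouched.
        refs["refs/heads/master"] = "commit-2";
        Console.WriteLine(Resolve(refs, "HEAD"));

        // Checking out a commit id directly stores the id in HEAD.
        refs["HEAD"] = "commit-1";
        Console.WriteLine(Resolve(refs, "HEAD"));
    }
}
```

The three calls cover the cases described above: HEAD following a branch, the branch moving forward underneath it, and HEAD holding an object id after a tag or commit checkout.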

4 Index

Compared with other VCSs, ugit has a staging area.

The staging area is the .ugit\index file, stored in JSON format.
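The post does not show the shape of the index, so the following is only a guess at what a JSON staging area could look like: a flat map from file path to object id, serialized with System.Text.Json. The IndexDemo class, the paths, and the placeholder ids are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class IndexDemo
{
    // Assumption: the index is a flat map of file path -> object id.
    public static string Serialize(Dictionary<string, string> index)
    {
        return JsonSerializer.Serialize(index);
    }

    public static Dictionary<string, string> Deserialize(string json)
    {
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json)
            ?? new Dictionary<string, string>();
    }

    public static void Main()
    {
        var index = new Dictionary<string, string>
        {
            // Placeholder ids; real entries would hold ids from HashObject.
            ["README.md"] = "oid-of-readme",
            ["src/Program.cs"] = "oid-of-program",
        };

        // This string is what would be written to .ugit\index.
        string json = Serialize(index);
        Console.WriteLine(json);

        // Reading the file back yields the same staged entries.
        Dictionary<string, string> restored = Deserialize(json);
        Console.WriteLine(restored["README.md"]);
    }
}
```

Keeping the index as plain JSON makes the staging area easy to inspect by hand, at the cost of rewriting the whole file on every change.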


