Git uses a series of BLOBs and trees to store the content of a project’s working directory. Whenever we perform a commit operation, Git internally creates a series of trees and BLOBs, which together form the binary representation of the project folder structure at the time of the commit.
BLOB stands for Binary Large Object. Each version of a file in Git is represented as a BLOB. A BLOB holds a file’s data but doesn’t contain any metadata about the file or even its name.
To understand a BLOB let us see an example.
Create three text files: “file1.txt”, “file2.txt” and “file3.txt”. The first two files will contain the same content and the third file will have different content.
$ git init                      # initialize a repo
$ echo hello>file1.txt          # create a file and enter some content
$ echo hello>file2.txt          # create a file and enter the same content
$ echo hello world>file3.txt    # create a file and enter different content
Let us add each of these files to the staging area. Staging a file creates a BLOB under the “.git/objects” folder. In this example, we will list the contents of the “.git/objects” folder each time a file is staged.
$ git add file1.txt     # stage the file
$ ls .git/objects/      # list contents
$ git add file2.txt
$ ls .git/objects/
$ git add file3.txt
$ ls .git/objects/
The output of the ls commands is shown below.
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file1.txt
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
ce/ info/ pack/
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file2.txt
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
ce/ info/ pack/
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file3.txt
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
3b/ ce/ info/ pack/
Folders with the name ‘ce’ and ‘3b’ are created when “file1.txt” and “file3.txt” are staged. However, no new folder is created when “file2.txt” is staged. This is because “file1.txt” and “file2.txt” have the same content.
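The reason identical content maps to a single object can be checked by hand. As a minimal sketch, assuming a Unix-like shell with `sha1sum` available and a repository that uses Git's default SHA-1 object format, a BLOB's ID is the SHA-1 hash of a short header (the word `blob`, the content size in bytes, and a NUL byte) followed by the content itself. The variable names below are illustrative only.

```shell
# Hash "blob <size>\0<content>" manually; the content is "hello\n" (6 bytes),
# which is what `echo hello>file1.txt` writes.
manual_id=$(printf 'blob 6\0hello\n' | sha1sum | cut -d ' ' -f 1)

# Ask Git to compute the ID for the same content without storing anything.
git_id=$(printf 'hello\n' | git hash-object --stdin)

echo "$manual_id"
echo "$git_id"
```

Both lines should print the same ID, starting with `ce01`. Since the file name never enters the hash, “file1.txt” and “file2.txt” end up as one object.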
Let us now see the contents of the folders ‘ce’ and ‘3b’.
$ ls .git/objects/ce
$ ls .git/objects/3b
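The output shows that each folder contains a BLOB object stored as a file. The folder name is the first two characters of the BLOB’s SHA-1 hash and the file name is the remaining 38 characters.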
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/ce
013625030ba8dba906f756967f9e9ca394464a
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/3b
18e512dba79e4c8300dd08aeb37f8e728b8dad
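The mapping from an object ID to its location on disk can be sketched as follows. This is an illustration only: it assumes the object is stored loose (not yet packed), it works in a throwaway repository created with `mktemp`, and the variable names are made up for the example.

```shell
# Work in a throwaway repository so nothing existing is touched.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Store "hello\n" as a BLOB (-w writes it to the object database) and keep its ID.
id=$(printf 'hello\n' | git hash-object -w --stdin)

# The first 2 hex characters name the folder, the remaining 38 name the file.
dir=$(printf '%s' "$id" | cut -c 1-2)
file=$(printf '%s' "$id" | cut -c 3-)
obj_path=".git/objects/$dir/$file"

ls "$obj_path"
```

For the content `hello`, this points at the same `ce/0136...` path seen in the listing above.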
Let us now verify the type of these objects and their contents.
$ git cat-file -t ce01
$ git cat-file -p ce01
$ git cat-file -t 3b18
$ git cat-file -p 3b18
The output shows that Git created only 2 BLOBs although we added 3 files, because the first two files have the same content. It also shows that a BLOB stores only a file’s contents and not its name.
//output of cat-file -t ce01
blob
//output of cat-file -p ce01
hello
//output of cat-file -t 3b18
blob
//output of cat-file -p 3b18
hello world
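Besides `-t` and `-p`, `git cat-file` can report type and size for several objects at once with `--batch-check`. The sketch below rebuilds the three example files in a throwaway repository (an assumption made so that it runs anywhere) and summarizes the distinct staged objects. The variable names are illustrative.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo hello > file1.txt
echo hello > file2.txt
echo hello world > file3.txt
git add .

# `git ls-files -s` prints "<mode> <id> <stage><TAB><name>" per staged file;
# keep only the IDs and drop duplicates.
ids=$(git ls-files -s | cut -d ' ' -f 2 | sort -u)

# Print "<id> <type> <size>" for each object; no file name is part of it.
summary=$(printf '%s\n' "$ids" | git cat-file --batch-check)
printf '%s\n' "$summary"
```

Three staged files yield two lines: one `blob` of 6 bytes and one of 12 bytes.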
A tree is like a directory. Each commit in Git points to a tree object, which in turn references the BLOBs. A tree object records the following.
BLOB (or subtree) identifiers
Path names
Metadata of all files in that directory, such as the file mode
A tree can recursively reference other tree objects or subtrees. Thus, a tree builds a complete hierarchy of files and subdirectories. Just like BLOBs, trees can be viewed under the “.git/objects” folder.
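A tree can also be produced straight from the staging area, before any commit exists, with the plumbing command `git write-tree`. The following is a sketch under the assumption that the three example files are recreated in a throwaway repository. The variable names are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo hello > file1.txt
echo hello > file2.txt
echo hello world > file3.txt
git add .

# Write the staged content as a tree object and print its ID.
tree_id=$(git write-tree)
echo "$tree_id"

# Each entry reads: <mode> <type> <object ID><TAB><name>.
entries=$(git ls-tree "$tree_id")
printf '%s\n' "$entries"
```

The listing has one entry per file, and the entries for “file1.txt” and “file2.txt” carry the same BLOB ID.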
Let us understand a tree through our previous example. We had created three files and added all of them to the staging area. Let us verify this by using the git status command. Let us also commit all changes to the repository.
$ git status -s                     # verify status
$ git commit -m 'initial commit'    # commit to the repo
The status indicates that 3 files have been staged. On issuing the git commit command, a commit with the hash “84a00db” is created.
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git status -s
A  file1.txt
A  file2.txt
A  file3.txt
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git commit -m 'initial commit'
[master (root-commit) 84a00db] initial commit
 3 files changed, 3 insertions(+)
 create mode 100644 file1.txt
 create mode 100644 file2.txt
 create mode 100644 file3.txt
The internal structure of our example can be represented as given below:
commit 84a0
    tree e115
        file1.txt -> BLOB ce01
        file2.txt -> BLOB ce01
        file3.txt -> BLOB 3b18
The above diagram shows that the commit “84a0” points to a tree, which is the root folder of the project. The root folder has 3 files that are stored as BLOBs. The first 2 files point to the same BLOB as their contents are the same. The tree object holds references to all BLOBs. If we create new folders within the current project, each folder will be stored as a subtree of the root project tree “e115”.
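To see how a folder becomes a subtree, here is a small sketch in a throwaway repository. The folder name `docs` and the file `notes.txt` are made up for illustration and are not part of the example project.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo hello > file1.txt
mkdir docs                          # hypothetical folder name
echo hello world > docs/notes.txt   # hypothetical file name
git add .

# Build the root tree from the staging area.
root_tree=$(git write-tree)

# The folder shows up in the root tree as an entry of type "tree".
git ls-tree "$root_tree"

# Resolve the subtree's ID and list what it references.
sub_tree=$(git rev-parse "$root_tree:docs")
git ls-tree "$sub_tree"
```

The root tree lists `docs` with mode `040000` and type `tree`, while the subtree in turn lists `notes.txt` as a `blob`.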
Let us check the objects folder to see whether a commit object and a tree object have been created. This can be viewed using the ls command.
$ ls .git/objects
$ ls .git/objects/84
$ ls .git/objects/e1
The output shows that the folder “84” holds the commit and “e1” holds the tree associated with it. Each folder contains a pointer file whose name is the remaining 38 characters of the object’s SHA-1 hash.
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/84
a00db87bb5c69926b3343a564db1b3a96a389d
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/e1
1588b4a639bb90b18268b7e26f243ba31706fd
Let us now verify the content in these pointer files using the cat-file command.
$ git cat-file -p 84a0
$ git cat-file -p e115
From the output, it is clear that the commit (84a0) points to a tree which is the root folder of the project.
tree e11588b4a639bb90b18268b7e26f243ba31706fd
author Kiran 1612777422 +0530
committer Kiran 1612777422 +0530

initial commit
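The link from a commit to its tree can also be resolved without reading the commit text by eye. The sketch below assumes a throwaway repository and passes a placeholder identity (`Example`, `example@example.com`) on the command line so that it does not depend on any global configuration. The variable names are illustrative.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo hello > file1.txt
git add file1.txt
# Placeholder identity, supplied inline for this sketch only.
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'initial commit'

# Tree ID as recorded on the first line of the commit object.
tree_from_commit=$(git cat-file -p HEAD | head -n 1 | cut -d ' ' -f 2)

# The same ID resolved through revision syntax.
tree_from_rev=$(git rev-parse 'HEAD^{tree}')

echo "$tree_from_commit"
echo "$tree_from_rev"
```

A commit’s own ID depends on the author, committer and timestamp, so it will differ from “84a00db” on another machine. The tree ID depends only on the names, modes and contents of the files.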
The root folder of the project holds three files that are stored as BLOBs.
dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git cat-file -p e115
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    file1.txt
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    file2.txt
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    file3.txt