Explain BLOB object and tree object in Git.


Git uses a series of BLOBs and trees to store the content of a project's working directory. Whenever we perform a commit operation, Git records the project's folder structure at that point in time as a set of trees and BLOBs.

What is BLOB?

BLOB stands for Binary Large Object. Each version of a file in Git is represented as a BLOB. A BLOB holds a file’s data but doesn’t contain any metadata about the file or even its name.

To understand a BLOB let us see an example.

  • Create three text files: “file1.txt”, “file2.txt” and “file3.txt”. The first two files will contain the same content and the third file will have different content.

$ git init                      # initialize a repo
$ echo hello>file1.txt          # create a file and enter some content
$ echo hello>file2.txt          # create a file and enter the same content
$ echo hello world>file3.txt    # create a file and enter different content

  • Let us add each of these files to the staging area. Staging these files will create BLOBs under the “.git/objects” folder. In this example, we will list the contents of the “.git/objects” folder each time a file is staged.

$ git add file1.txt    # stage the file
$ ls .git/objects/     # list contents
$ git add file2.txt
$ ls .git/objects/
$ git add file3.txt
$ ls .git/objects/

The output of the ls commands is shown below.

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file1.txt

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
ce/ info/ pack/

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file2.txt

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
ce/ info/ pack/

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git add file3.txt

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/
3b/ ce/ info/ pack/

Folders with the name ‘ce’ and ‘3b’ are created when “file1.txt” and “file3.txt” are staged. However, no new folder is created when “file2.txt” is staged. This is because “file1.txt” and “file2.txt” have the same content.
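
The folder names are not arbitrary: ‘ce’ and ‘3b’ are the first two characters of each BLOB's ID, and that ID depends only on the content. As a minimal sketch (assuming Git is installed and uses its default SHA-1 object format), git hash-object can print the ID Git would assign to some content without staging anything. The variable names are only illustrative.

```shell
# Ask Git for the object ID of some content without writing it to a repository.
id1=$(echo hello | git hash-object --stdin)          # same content as file1.txt
id2=$(echo hello | git hash-object --stdin)          # same content as file2.txt
id3=$(echo 'hello world' | git hash-object --stdin)  # same content as file3.txt

echo "$id1"   # ce013625030ba8dba906f756967f9e9ca394464a
echo "$id2"   # identical to id1, so Git stores only one BLOB
echo "$id3"   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```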

  • Let us now see the contents of the folders ‘ce’ and ‘3b’.

$ ls .git/objects/ce
$ ls .git/objects/3b

The output shows that each folder contains a BLOB object, stored in a file whose name is the rest of the object's SHA-1 hash (the folder name is the first two characters).

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/ce
013625030ba8dba906f756967f9e9ca394464a

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/3b
18e512dba79e4c8300dd08aeb37f8e728b8dad

Let us now verify the type of these objects and their contents.

$ git cat-file -t ce01
$ git cat-file -p ce01
$ git cat-file -t 3b18
$ git cat-file -p 3b18

From the output it is clear that Git created 2 BLOBs though we added 3 files. This is because the content in the first two files is the same. It is clear from the output that a BLOB stores only a file’s contents and doesn’t store file names.

//output of cat-file -t ce01
blob
//output of cat-file -p ce01
hello
//output of cat-file -t 3b18
blob
//output of cat-file -p 3b18
hello world
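
Why identical content yields a single BLOB can be sketched by reproducing the ID by hand. Git hashes a short header (the word blob, the content size in bytes, and a NUL byte) followed by the content itself; the file name never enters the calculation. The sketch below assumes a SHA-1 repository and that sha1sum from GNU coreutils is available. blob_id is a hypothetical helper, not a Git command.

```shell
# blob_id FILE: print the SHA-1 ID Git would assign to FILE's content.
# Hypothetical helper for illustration; assumes sha1sum is installed.
blob_id() {
  file=$1
  size=$(wc -c < "$file" | tr -d ' ')   # content length in bytes
  # Hash the header "blob <size>\0" followed by the raw content.
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}

tmp=$(mktemp -d)
echo hello > "$tmp/a.txt"
echo hello > "$tmp/b.txt"   # different name, same content

id_a=$(blob_id "$tmp/a.txt")
id_b=$(blob_id "$tmp/b.txt")
echo "$id_a"   # ce013625030ba8dba906f756967f9e9ca394464a
echo "$id_b"   # same ID, because only the content is hashed
```

Since the two IDs are equal, both files map to the same path under “.git/objects”, which is why staging the second file creates nothing new.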

What is a tree?

A tree is like a directory. Each commit in Git points to a tree object, which in turn references the BLOBs. A tree object records the following.

  • BLOB identifiers

  • Path names

  • The file mode (for example, 100644 for a regular file) and object type of each entry in that directory

A tree can recursively reference other tree objects or subtrees. Thus, a tree builds a complete hierarchy of files and subdirectories. Just like BLOBs, trees can be viewed under the “.git/objects” folder.
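
The recursive structure can be sketched in a throwaway repository. The example below assumes Git is installed; the folder name docs and the file notes.txt are made up for illustration. It uses git write-tree to turn the staging area into tree objects without committing, then lists the root tree and the subtree.

```shell
# Build a small project with one subdirectory in a temporary repository.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/docs"
echo hello > "$repo/file1.txt"
echo 'hello world' > "$repo/docs/notes.txt"
git -C "$repo" add .

# Write the staged content as tree objects and print the root tree's ID.
root_tree=$(git -C "$repo" write-tree)

# Each entry shows: mode, type, object ID, name.
# The docs entry has type "tree" (mode 040000); file1.txt has type "blob".
git -C "$repo" ls-tree "$root_tree"

# Resolve the subtree's ID and list the entries inside it.
sub_tree=$(git -C "$repo" rev-parse "$root_tree:docs")
git -C "$repo" ls-tree "$sub_tree"
```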

Let us understand a tree through our previous example. We had created three files and added all of them to the staging area. Let us verify this by using the git status command. Let us also commit all changes to the repository.

$ git status -s                     # verify status
$ git commit -m 'initial commit'    # commit to the repo

The status indicates that 3 files have been staged. On issuing the git commit command, a commit with the hash “84a00db” is created.

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git status -s
A  file1.txt
A  file2.txt
A  file3.txt

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git commit -m 'initial commit'
[master (root-commit) 84a00db] initial commit
 3 files changed, 3 insertions(+)
 create mode 100644 file1.txt
 create mode 100644 file2.txt
 create mode 100644 file3.txt

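The internal structure of our example can be represented as given below:

commit 84a0
   |
   v
tree e115 (root folder)
   |-- file1.txt --> BLOB ce01
   |-- file2.txt --> BLOB ce01
   `-- file3.txt --> BLOB 3b18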

The above diagram shows that the commit “84a0” points to a tree, which is the root folder of the project. The root folder has 3 files that are stored as BLOBs. The first 2 files point to the same BLOB as their contents are the same. The tree object holds references to all the BLOBs. If we create new folders within the current project, each folder will be stored as a subtree of the root project tree “e115”.

Let us verify the objects folder to see if any commit object or tree object is created. This can be viewed using the ls command.

$ ls .git/objects
$ ls .git/objects/84
$ ls .git/objects/e1

The output shows that the folder “84” holds the commit and the folder “e1” holds the tree associated with it. Each folder contains an object file whose name is the rest of the object's SHA-1 hash.

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/84

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ ls .git/objects/e1

Let us now verify the content of these objects using the cat-file command.

$ git cat-file -p 84a0
$ git cat-file -p e115

From the output, it is clear that the commit (84a0) points to a tree which is the root folder of the project.

tree e11588b4a639bb90b18268b7e26f243ba31706fd
author Kiran 1612777422 +0530
committer Kiran 1612777422 +0530

initial commit

The root folder of the project holds three files that are stored as BLOBs.

dell@DESKTOP-N961NR5 MINGW64 /e/tut_repo (master)
$ git cat-file -p e115
100644 blob ce013625030ba8dba906f756967f9e9ca394464a file1.txt
100644 blob ce013625030ba8dba906f756967f9e9ca394464a file2.txt
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad file3.txt
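
The commit-to-tree link can also be followed without typing hash prefixes. The sketch below assumes Git is installed and recreates the three files in a temporary repository; the name and e-mail passed with -c are placeholder values so that the commit works on an unconfigured machine. The revision syntax HEAD^{tree} asks Git for the tree that the current commit points to.

```shell
# Recreate the article's three files in a throwaway repository.
repo=$(mktemp -d)
git init -q "$repo"
echo hello > "$repo/file1.txt"
echo hello > "$repo/file2.txt"
echo 'hello world' > "$repo/file3.txt"
git -C "$repo" add .

# Placeholder identity (an assumption, not from the article).
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'initial commit'

# Peel the commit to the tree it points to, then inspect that tree.
tree_id=$(git -C "$repo" rev-parse 'HEAD^{tree}')
git -C "$repo" cat-file -t "$tree_id"   # prints: tree
git -C "$repo" ls-tree "$tree_id"       # one "mode type id<TAB>name" line per file
```

A tree ID depends only on the entries the tree lists, so it does not change with the author or commit time, whereas the commit ID does.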
Updated on 20-Feb-2021 09:04:17