Git - Basic Concepts


Version Control System (VCS)

A Version Control System (VCS) is software that helps developers work together and maintains a complete history of their work.

The goals of a VCS are:

  1. Allow developers to work simultaneously.

  2. Prevent developers from overwriting each other's changes.

  3. Maintain a history of every version of every file.

There are two types of VCS:

  1. Centralized version control system (CVCS).

  2. Distributed/Decentralized version control system (DVCS).

In this tutorial, we concentrate on distributed version control, and on Git in particular. Git is a distributed version control system.

Distributed Version Control System (DVCS)

A centralized version control system uses a central server to store all files and enable team collaboration. The major drawback of a CVCS is its single point of failure: the central server. If the central server goes down for an hour, nobody can collaborate during that hour. In the worst case, if the disk of the central server gets corrupted and no proper backup has been taken, you lose the entire history of the project. This is where a DVCS comes into the picture.

DVCS clients not only check out the latest snapshot of the directory, they also fully mirror the repository. If the server goes down, the repository from any client can be copied back to the server to restore it. Every checkout is a full backup of the repository. Git does not rely on a central server, which is why you can perform many operations while offline: you can commit changes, create branches, view logs, and more. You need a network connection only to publish your changes and to fetch the latest changes from others.
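As a minimal sketch of the offline operations described above, the following script creates a scratch repository in a temporary directory, then commits, branches, and reads the log without any remote being configured. The file name, branch name, and user identity are placeholders chosen for illustration.

```shell
#!/bin/sh
set -eu

# Scratch repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"                # commit: purely local
git branch experiment                       # create a branch: purely local
history=$(git log --oneline)                # view the log: purely local
remotes=$(git remote)                       # stays empty, no server is configured

echo "$history"
```

Nothing in the script contacts a server; the list of remotes stays empty throughout.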

Advantages of Git

  1. Free and open source

     Git is released under the GNU General Public License version 2 (GPLv2), an open source license. It is freely available over the internet. You can use Git to manage proprietary projects without paying a single penny. As it is open source, you can download its source code and modify it to suit your requirements.

  2. Fast and small

     Most operations are performed locally, which gives a huge benefit in terms of speed. Because Git does not rely on a central server, most operations do not need to interact with a remote server. The core of Git is written in C, which avoids the runtime overhead associated with higher-level languages. Although Git mirrors the entire repository, the size of the data on the client side is small, which illustrates how efficiently Git compresses and stores data.

  3. Implicit backup

     The chance of losing data is very small when there are multiple copies of it. The data on any client is a mirror of the repository, so it can be used to recover from a crash or disk corruption.

  4. Security

     Git uses a common cryptographic hash function, Secure Hash Algorithm 1 (SHA-1), to name and identify objects within its database. Every file and commit is checksummed and retrieved by its checksum at checkout time. This means that it is impossible to change a file, date, commit message, or any other data in the Git database without Git detecting it.

  5. No need for powerful hardware

     In a CVCS, the central server needs to be powerful enough to serve the requests of the entire team. For a small team this is not an issue, but as the team grows, the hardware limitations of the server can become a performance bottleneck. In a DVCS, developers do not interact with the server unless they need to push or pull changes. All the heavy lifting happens on the client side, so the server hardware can be very simple.

  6. Easier branching

     Many CVCS tools implement a branch as a copy of the code, so creating a new branch can be time-consuming and inefficient. Deleting and merging branches in a CVCS is also complicated and time-consuming. Branch management with Git is very simple: it takes only a few seconds to create, delete, and merge branches.

DVCS Terminologies

Local repository

Every VCS tool provides a private workplace as a working copy. Developers make changes in their private workplace, and after a commit these changes become part of the repository. Git takes this one step further by giving each developer a private copy of the whole repository. Users can perform many operations on this repository, such as adding, removing, renaming, and moving files, committing changes, and many more.

Working directory and staging area (index)

The working directory is the place where files are checked out. In a CVCS, developers generally modify files and commit their changes directly to the repository. Git uses a different strategy: it does not automatically include every modified file in a commit. Whenever you commit, Git looks at the files present in the staging area. Only the changes in the staging area are considered for the commit, not all modified files.

Let us see the basic workflow of Git.

Step 1: You modify a file in the working directory.

Step 2: You add the modified files to the staging area.

Step 3: You perform a commit operation, which records the staged changes in your local repository. A later push operation copies those commits to the remote Git repository, where they are stored permanently for the team.


Suppose you modified two files, "sort.c" and "search.c", and you want a separate commit for each change. Add one file to the staging area and commit it. After the first commit, repeat the same procedure for the other file.

# First commit: add the file to the staging area, then commit it
[bash]$ git add sort.c
[bash]$ git commit -m "Added sort operation"

# Second commit: repeat the procedure for the other file
[bash]$ git add search.c
[bash]$ git commit -m "Added search operation"
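To see that a commit takes only what is in the staging area, the sketch below modifies both files but stages just one of them. It assumes a scratch repository in a temporary directory and a placeholder identity; the file contents are made up for illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

# Start with both files tracked so that later edits show up as modifications.
: > sort.c
: > search.c
git add sort.c search.c
git commit -q -m "Add empty source files"

# Modify both files, but stage only sort.c.
echo "/* sort */" >> sort.c
echo "/* search */" >> search.c
git add sort.c

staged=$(git diff --cached --name-only)     # changes that the next commit will take
unstaged=$(git diff --name-only)            # changes that stay behind

git commit -q -m "Added sort operation"
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
left_over=$(git diff --name-only)           # still modified after the commit

echo "staged: $staged, unstaged: $unstaged, committed: $committed"
```

After the commit, search.c is still reported as modified and can go into a second commit of its own.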

Blobs

Blob stands for Binary Large Object. Each version of a file is represented by a blob. A blob holds the file's data but no metadata about the file, not even its name. It is a binary file, and in the Git database it is named by the SHA-1 hash computed from its content. In Git, files are not addressed by name; everything is content-addressed.
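Content addressing can be observed with the git hash-object command, which prints the object ID Git would assign to a file. In this small sketch, the file names and contents are invented for illustration: two files with identical content get the same ID regardless of their names, while different content gets a different ID.

```shell
#!/bin/sh
set -eu

dir=$(mktemp -d)                            # scratch directory (hypothetical location)
cd "$dir"

printf 'int main(void) { return 0; }\n' > a.c
cp a.c renamed.c                            # same content, different name
printf 'int main(void) { return 1; }\n' > b.c   # different content

id_a=$(git hash-object a.c)
id_renamed=$(git hash-object renamed.c)
id_b=$(git hash-object b.c)

echo "a.c       $id_a"
echo "renamed.c $id_renamed"
echo "b.c       $id_b"
```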

Trees

A tree is an object that represents a directory. It holds references to blobs and to other trees (sub-directories), together with their names. Like a blob, a tree is stored as a binary file and is named by the SHA-1 hash of the tree object.
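The structure of a tree can be inspected with git ls-tree. The following sketch assumes a scratch repository with a made-up layout (a README file and a src directory); it lists the top-level tree and then the tree for the sub-directory.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

mkdir src
echo "readme" > README
echo "code" > src/main.c
git add .
git commit -q -m "Initial layout"

kind=$(git cat-file -t 'HEAD^{tree}')       # object type of the commit's top-level tree
top=$(git ls-tree HEAD)                     # one entry per file or sub-directory
sub=$(git ls-tree HEAD src/)                # entries of the src sub-tree

echo "$top"
echo "$sub"
```

Each line of the listing shows a mode, an object type (blob or tree), an object ID, and a name: the file appears as a blob and the directory as a tree.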

Commits

A commit holds the state of the repository at the time it was made. A commit is also named by a SHA-1 hash. You can think of a commit object as a node in a linked list: every commit object has a pointer to its parent commit object. From a given commit, you can traverse back through the parent pointers to view the history leading up to it. If a commit has multiple parent commits, that commit was created by merging two branches.
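The parent pointers can be followed from the command line. The sketch below, which assumes a scratch repository and invented file and branch names, first checks that the parent of the second commit is the first commit, and then creates a merge commit and counts its parents.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo one > f.txt
git add f.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two >> f.txt
git commit -q -am "second"
parent=$(git rev-parse 'HEAD^')             # follow the parent pointer once

# Let two branches diverge, then merge them.
base=$(git symbolic-ref --short HEAD)       # name of the default branch
git checkout -q -b feature
echo feature > g.txt
git add g.txt
git commit -q -m "feature work"
git checkout -q "$base"
echo mainline > h.txt
git add h.txt
git commit -q -m "mainline work"
git merge -q --no-edit feature

# rev-list --parents prints the commit ID followed by its parent IDs.
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))

echo "parents of the merge commit: $parent_count"
```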

Branches

Branches are used to create another line of development. By default, Git has a master branch, which is the same as trunk in Subversion. Usually, a branch is created to work on a new feature. Once the feature is complete, the branch is merged back into the master branch and then deleted. Every branch points to the latest commit in that branch, and HEAD refers to the branch you currently have checked out. Whenever you make a commit, the current branch, and therefore HEAD, moves to the new commit.
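The feature-branch cycle described above might look as follows. This is only a sketch in a scratch repository: the branch name new-feature and the file names are invented, and the default branch name is read from Git rather than assumed.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
main_branch=$(git symbolic-ref --short HEAD)    # default branch, whatever it is called

git checkout -q -b new-feature              # create the branch and switch to it
echo feature > feature.txt
git add feature.txt
git commit -q -m "Implement feature"
feature_tip=$(git rev-parse HEAD)

git checkout -q "$main_branch"              # go back to the default branch
git merge -q new-feature                    # merge the feature (a fast-forward here)
git branch -d new-feature > /dev/null       # delete the merged branch

merged_tip=$(git rev-parse HEAD)
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads/)
echo "remaining branches: $remaining"
```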

Tags

A tag assigns a meaningful name to a specific version in the repository. Tags are very similar to branches, but tags are meant to be immutable: a tag is like a branch that nobody intends to modify. Once a tag is created for a particular commit, it is not updated even if you create new commits. Usually, developers create tags for product releases.
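The difference between a tag and a branch shows up as soon as a new commit is made. In the sketch below, which uses a scratch repository and the invented tag name v1.0, the tag keeps pointing at the commit it was created on while HEAD moves forward.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "release 1" > app.txt
git add app.txt
git commit -q -m "Release 1"
git tag v1.0                                # lightweight tag on the current commit
tag_before=$(git rev-parse v1.0)

echo "more work" >> app.txt
git commit -q -am "Work after the release"
tag_after=$(git rev-parse v1.0)             # the tag has not moved
head_now=$(git rev-parse HEAD)              # the branch has

echo "tag: $tag_after"
echo "HEAD: $head_now"
```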

Clone

The clone operation creates an instance of the repository. It not only checks out the working copy but also mirrors the complete repository. Users can perform many operations with this local repository. The only time networking gets involved is when the repository instances are being synchronized.

Pull

The pull operation copies changes from a remote repository instance to the local one. It is used to synchronize two repository instances. This is similar to the update operation in Subversion.

Push

The push operation copies changes from a local repository instance to a remote one. It is used to store changes permanently in the remote Git repository. This is similar to the commit operation in Subversion.
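Clone, push, and pull can be tried on a single machine by letting a bare repository in a local directory stand in for the server. The following is a sketch under that assumption; the directory names server.git, alice, and bob are invented, and a real setup would use a network URL instead of a local path.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                           # scratch area (hypothetical location)
cd "$work"
git init -q --bare server.git               # stands in for the remote server

# First developer: clone the (still empty) repository, commit, and push.
git clone -q server.git alice 2> /dev/null  # hides the "empty repository" warning
cd alice
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First version"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"                # publish to the server
cd ..

# Second developer: clone after the first push.
git clone -q server.git bob

# The first developer publishes another change.
cd alice
echo "v2" >> file.txt
git commit -q -am "Second version"
git push -q origin "$branch"
alice_head=$(git rev-parse HEAD)
cd ..

# The second developer synchronizes with the server.
cd bob
git pull -q --ff-only
bob_head=$(git rev-parse HEAD)
cd ..

echo "alice: $alice_head"
echo "bob:   $bob_head"
```

After the pull, both working copies are at the same commit.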

HEAD

HEAD is a pointer that normally refers to the latest commit in the current branch. Whenever you make a commit, HEAD is updated to the new commit. The heads of the branches are stored in the .git/refs/heads/ directory.

[CentOS]$ ls -1 .git/refs/heads/
master

[CentOS]$ cat .git/refs/heads/master
570837e7d58fa4bccd86cb575d884502188b0c49
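The relationship between HEAD and the branch head can also be checked with Git commands instead of reading the files directly. This sketch assumes a scratch repository; it shows that HEAD names the current branch and that both move together when a commit is made.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo one > f.txt
git add f.txt
git commit -q -m "first"

current=$(git symbolic-ref HEAD)            # full ref name of the current branch
head_before=$(git rev-parse HEAD)
branch_before=$(git rev-parse "$current")   # commit the branch head points to

echo two >> f.txt
git commit -q -am "second"
head_after=$(git rev-parse HEAD)
branch_after=$(git rev-parse "$current")

echo "HEAD refers to $current"
echo "before: $head_before"
echo "after:  $head_after"
```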

Revision

A revision represents a version of the source code. Revisions in Git are represented by commits, which are identified by SHA-1 hashes.

URL

The URL represents the location of a Git repository. Git stores this URL in the repository's config file.

[tom@CentOS tom_repo]$ pwd
/home/tom/tom_repo

[tom@CentOS tom_repo]$ cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = gituser@git.server.com:project.git
	fetch = +refs/heads/*:refs/remotes/origin/*
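The same settings can be written and read with Git commands rather than by editing the config file. The sketch below uses a scratch repository and the URL from the listing above; adding a remote only records the URL and does not contact the server.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)                           # scratch repository (hypothetical location)
cd "$repo"
git init -q

# Record the remote; this writes the [remote "origin"] section of .git/config.
git remote add origin gituser@git.server.com:project.git

url=$(git config --get remote.origin.url)
fetchspec=$(git config --get remote.origin.fetch)

echo "url:   $url"
echo "fetch: $fetchspec"
```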

