Git - Basic Concepts
Version Control System (VCS)
A version control system (VCS) is software that helps developers work together and maintains a complete history of their work.
The goals of a VCS are to:
Allow developers to work simultaneously.
Prevent developers from overwriting each other’s changes.
Maintain a history of every version of every file.
There are two types of VCS:
Centralized version control system (CVCS).
Distributed/Decentralized version control system (DVCS).
This tutorial concentrates on distributed version control, and on Git in particular. Git is a distributed version control system.
Distributed Version Control System (DVCS)
A centralized version control system uses a central server to store all files and to enable team collaboration. Its major drawback is that the central server is a single point of failure. If the central server goes down for an hour, no one can collaborate during that hour. In the worst case, if the central server’s disk is corrupted and no proper backup has been taken, the entire history of the project is lost. This is where a DVCS comes in.
DVCS clients not only check out the latest snapshot of the directory but also fully mirror the repository. If the server goes down, the repository from any client can be copied back to the server to restore it. Every clone is a full backup of the repository. Git does not rely on a central server, which is why you can perform many operations while offline: you can commit changes, create branches, view logs, and more. You need a network connection only to publish your changes and to fetch the latest changes from others.
Advantages of Git
Free and open source
Fast and small
No need of powerful hardware
Git is released under the GNU General Public License (GPL), an open source license. It is freely available on the internet. You can use Git to manage proprietary projects without paying a single penny. Because it is open source, you can download its source code and modify it to suit your requirements.
Because most operations are performed locally, Git is fast: there is no need to contact a remote server for every operation. The core of Git is written in C, which avoids the runtime overhead associated with higher-level languages. Although Git mirrors the entire repository, the data on the client side stays small, which shows how efficiently Git compresses and stores data.
The chance of losing data is very low when there are multiple copies of it. The data on every client is a mirror of the repository, so it can be used to recover from a crash or disk corruption.
Git uses a common cryptographic hash function, the Secure Hash Algorithm 1 (SHA-1), to name and identify objects in its database. Every file and commit is checksummed and is retrieved by its checksum at checkout. This means that you cannot change a file, date, commit message, or any other data in the Git database without Git knowing about it.
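The idea of naming data by its checksum can be sketched in a few lines of Python. This is a simplified illustration, not Git’s storage code: the `store`, `put`, and `get` names are hypothetical, and the sketch hashes the raw content only, whereas Git also hashes a small header.

```python
import hashlib

# Hypothetical in-memory object store: SHA-1 hex digest -> stored bytes.
store = {}

def put(data: bytes) -> str:
    """Store data under the SHA-1 of its content and return that name."""
    name = hashlib.sha1(data).hexdigest()
    store[name] = data
    return name

def get(name: str) -> bytes:
    """Retrieve data and verify that it still matches its name."""
    data = store[name]
    if hashlib.sha1(data).hexdigest() != name:
        raise ValueError(f"object {name} is corrupt")
    return data

name = put(b"int main(void) { return 0; }\n")
assert get(name) == b"int main(void) { return 0; }\n"

# Simulate someone altering the stored bytes behind the store's back.
store[name] = b"int main(void) { return 1; }\n"
try:
    get(name)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(tamper_detected)  # True
```

Because the name is derived from the content, any change to the content no longer matches the name, which is what makes silent modification detectable.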
In a CVCS, the central server needs to be powerful enough to serve the requests of the entire team. For a small team this is not an issue, but as the team grows, the server’s hardware limits can become a performance bottleneck. In a DVCS, developers do not interact with the server unless they need to push or pull changes. All the heavy lifting happens on the client side, so the server hardware can be very simple.
In some CVCS tools, creating a branch means copying the code into a new location on the server, which can be time consuming and inefficient. Deleting and merging branches in a CVCS is also often complicated and slow. Branch management in Git, by contrast, is very simple: it takes only a few seconds to create, delete, and merge branches.
Every VCS tool provides a private workspace as a working copy. Developers make changes in their private workspace, and after a commit these changes become part of the repository. Git takes this one step further by giving each developer a private copy of the whole repository. Users can perform many operations on this repository, such as adding, removing, renaming, and moving files, committing changes, and more.
Working directory and staging area or Index
The working directory is the place where files are checked out. In a CVCS, developers generally make modifications and commit them directly to the repository. Git uses a different strategy: it does not automatically include every modified file in a commit. Whenever you commit, Git looks at the files in the staging area. Only the files in the staging area are included in the commit, not all modified files.
Let us see the basic workflow of Git.
Step 1: You modify a file in the working directory.
Step 2: You add the modified files to the staging area.
Step 3: You perform a commit, which records the staged files in your local repository. A later push operation publishes these changes to the remote Git repository.
Suppose you modified two files, “sort.c” and “search.c”, and you want a separate commit for each change. You can add one file to the staging area and commit it. After the first commit, repeat the same procedure for the other file.
# First commit
[bash]$ git add sort.c    # adds file to the staging area
[bash]$ git commit -m "Added sort operation"
# Second commit
[bash]$ git add search.c    # adds file to the staging area
[bash]$ git commit -m "Added search operation"
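The two commits above can be modelled with a toy Python sketch. Everything here is illustrative: `working_dir`, `index`, `commits`, `add`, and `commit` are hypothetical names that mimic the behavior described, not Git’s implementation.

```python
# Toy model of the three areas; file name -> content.
working_dir = {"sort.c": "old sort", "search.c": "old search"}
index = dict(working_dir)  # staging area, initially equal to the last commit
commits = [{"message": "Initial commit", "snapshot": dict(index)}]

def add(path):
    """Stage the current content of one file (like `git add <path>`)."""
    index[path] = working_dir[path]

def commit(message):
    """Record a snapshot of the staging area (like `git commit -m <message>`)."""
    commits.append({"message": message, "snapshot": dict(index)})

# Step 1: modify both files in the working directory.
working_dir["sort.c"] = "new sort"
working_dir["search.c"] = "new search"

# Steps 2 and 3, once per file.
add("sort.c")
commit("Added sort operation")
add("search.c")
commit("Added search operation")

print(commits[1]["snapshot"]["search.c"])  # old search
print(commits[2]["snapshot"]["search.c"])  # new search
```

The first commit still holds the old version of search.c, even though the file was already modified in the working directory, because only staged content is recorded.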
Blob stands for Binary Large Object. Each version of a file is represented by a blob. A blob holds the file’s data but no metadata about the file, not even its name. It is a binary file, and in the Git database it is named by the SHA-1 hash of its content. In Git, files are not addressed by name; everything is content-addressed.
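As a minimal sketch of content addressing, the Python function below computes a blob name the way Git does for SHA-1 repositories: it hashes a short header (`blob`, the content length, and a NUL byte) followed by the content. The function name `blob_id` is an assumption made for this example.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the SHA-1 name Git gives a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The name depends only on the content, never on a file name.
first = blob_id(b"int main(void) { return 0; }\n")
second = blob_id(b"int main(void) { return 0; }\n")
print(first == second)  # True

# The empty blob has a well-known name.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Two files with identical content therefore share one blob, no matter what they are called or where they live in the project.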
A tree is an object that represents a directory. It holds references to blobs as well as to other trees, which represent sub-directories. A tree is a binary file that stores these references, and it is likewise named by the SHA-1 hash of the tree object.
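The following Python sketch shows how a tree can reference blobs and other trees by name. It assumes the SHA-1 object format, in which each tree entry is a mode, a name, a NUL byte, and the raw 20-byte object name. The helper names and file contents are hypothetical, and the plain name sort used here matches Git’s ordering only for simple cases like this one.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> str:
    """Hash a header (kind, length, NUL) followed by the object body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(entries) -> str:
    """entries: (mode, name, object id in hex) tuples for one directory."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    return object_id(b"tree", body)

# Two blobs for hypothetical file contents.
sort_blob = object_id(b"blob", b"/* sort */\n")
search_blob = object_id(b"blob", b"/* search */\n")

# A directory "src" holding both files, and a root tree holding "src".
src_tree = tree_id([("100644", "sort.c", sort_blob),
                    ("100644", "search.c", search_blob)])
root_tree = tree_id([("40000", "src", src_tree)])

# A tree with no entries has a well-known name.
empty_tree = tree_id([])
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the root tree contains the name of `src_tree`, which in turn contains the blob names, changing any file changes the name of every tree above it.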
A commit holds the state of the repository at a point in time. A commit is also named by its SHA-1 hash. You can think of a commit object as a node in a linked list: every commit object has a pointer to its parent commit. From a given commit, you can traverse back through the parent pointers to view its history. If a commit has multiple parents, it was created by merging two or more branches.
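The parent pointers can be illustrated with a small Python sketch. The `Commit` class, the single-letter names standing in for SHA-1 hashes, and the `history` function are all hypothetical; the breadth-first walk shown here is one simple way to traverse the graph and does not reproduce the exact ordering of `git log`.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Commit:
    name: str                      # stand-in for the SHA-1 name
    message: str
    parents: list = field(default_factory=list)

def history(tip: Commit) -> list:
    """Walk parent pointers from tip, visiting each commit once."""
    seen, order, queue = set(), [], deque([tip])
    while queue:
        commit = queue.popleft()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        order.append(commit.name)
        queue.extend(commit.parents)
    return order

a = Commit("a", "Initial commit")
b = Commit("b", "Work on master", [a])
c = Commit("c", "Work on feature", [a])
m = Commit("m", "Merge feature into master", [b, c])

print(history(m))          # ['m', 'b', 'c', 'a']
print(len(m.parents) > 1)  # True, so m is a merge commit
```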
Branches are used to create another line of development. By default, Git has a master branch, which is the same as trunk in Subversion. Usually, a branch is created to work on a new feature. Once the feature is complete, the branch is merged back into the master branch and then deleted. Each branch is a reference that points to the latest commit on that branch, and HEAD identifies the branch you currently have checked out. Whenever you make a commit, the current branch is updated to point to the new commit.
A tag assigns a meaningful name to a specific version in the repository. Tags are very similar to branches, but the difference is that tags do not move: a tag is like a branch that nobody intends to modify. Once a tag is created for a particular commit, it is not updated even if you create new commits. Developers usually create tags for product releases.
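The difference between a branch and a tag can be sketched as two kinds of named pointers. This Python model is purely illustrative: the dictionaries, the `commit` function, and the identifiers `c1` to `c3` are assumptions standing in for real refs and SHA-1 names.

```python
# Hypothetical refs: name -> commit identifier.
branches = {"master": "c1"}
tags = {}
head = "master"  # HEAD names the branch that is currently checked out

def commit(new_id):
    """A new commit advances only the branch that HEAD refers to."""
    branches[head] = new_id

tags["v1.0"] = branches["master"]  # tag the release while master is at c1
commit("c2")
commit("c3")

print(branches["master"])  # c3
print(tags["v1.0"])        # c1
```

The branch follows every new commit, while the tag keeps pointing at the commit it was created for.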
The clone operation creates a new instance of the repository. It not only checks out the working copy but also mirrors the complete repository. Users can perform many operations on this local repository. Networking is involved only when repository instances are being synchronized.
The pull operation copies changes from a remote repository instance to a local one. It is used to synchronize two repository instances. This is similar to the update operation in Subversion.
The push operation copies changes from a local repository instance to a remote one. It is used to publish your commits to the shared Git repository. This is similar to the commit operation in Subversion.
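Clone, pull, and push can be pictured as copying objects between repository instances. The Python sketch below is a deliberately simplified model with hypothetical names throughout; it only handles the case where one side is strictly ahead of the other and ignores the merging that a real pull may perform.

```python
import copy

def new_repo():
    return {"objects": {}, "branches": {}}

def commit(repo, branch, commit_id, content):
    """Add a commit object and move the branch to it."""
    parent = repo["branches"].get(branch)
    repo["objects"][commit_id] = {"content": content, "parent": parent}
    repo["branches"][branch] = commit_id

def clone(remote):
    """Mirror the complete repository, not just the latest snapshot."""
    return copy.deepcopy(remote)

def sync(src, dst, branch):
    """Copy objects that dst lacks and move dst's branch to src's tip."""
    for object_id, obj in src["objects"].items():
        dst["objects"].setdefault(object_id, obj)
    dst["branches"][branch] = src["branches"][branch]

def push(local, remote, branch):
    sync(local, remote, branch)

def pull(local, remote, branch):
    sync(remote, local, branch)

server = new_repo()
commit(server, "master", "c1", "initial import")

tom = clone(server)
jerry = clone(server)

commit(tom, "master", "c2", "added sort")  # works offline, no server needed
push(tom, server, "master")
pull(jerry, server, "master")

print(jerry["branches"]["master"])  # c2
print(sorted(jerry["objects"]))     # ['c1', 'c2']
```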
HEAD is a pointer that normally refers to the current branch and, through it, to the latest commit on that branch. Whenever you make a commit, the current branch, and therefore HEAD, moves to the new commit. The heads of the branches are stored in the .git/refs/heads/ directory.
[CentOS]$ ls -1 .git/refs/heads/
master
[CentOS]$ cat .git/refs/heads/master
570837e7d58fa4bccd86cb575d884502188b0c49
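The lookup shown above can be sketched in Python. The example builds a throwaway directory that imitates the layout of a .git directory, so it runs without a real repository. The `resolve_head` function is a hypothetical helper, and it assumes the branch is stored as a loose file under refs/heads/; Git can also keep refs in a packed-refs file, which this sketch does not read.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to the commit name it currently refers to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD names a branch; the branch file holds the commit name.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds a commit name directly

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "570837e7d58fa4bccd86cb575d884502188b0c49\n"
    )
    resolved = resolve_head(git_dir)

print(resolved)  # 570837e7d58fa4bccd86cb575d884502188b0c49
```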
A revision represents a version of the source code. Revisions in Git are represented by commits, which are identified by SHA-1 hashes.
A URL represents the location of a Git repository. The URL of a remote repository is stored in the local repository’s config file.
[tom@CentOS tom_repo]$ pwd
/home/tom/tom_repo
[tom@CentOS tom_repo]$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = email@example.com:project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
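For a simple file like the one above, the remote URL can be read with Python’s standard `configparser` module. This is a rough sketch rather than a general solution: Git’s config syntax is only similar to INI and has features that `configparser` does not understand, so the approach is an assumption that holds for this example.

```python
import configparser

# The config shown above, embedded as text so the example is self-contained.
config_text = """\
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = email@example.com:project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
"""

parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
# Strip indentation so no line is mistaken for a continuation line.
parser.read_string("\n".join(line.strip() for line in config_text.splitlines()))

origin_url = parser['remote "origin"']["url"]
is_bare = parser["core"].getboolean("bare")

print(origin_url)  # email@example.com:project.git
print(is_bare)     # False
```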