What Is Git

In the previous chapter, we looked at what Version Control Systems are and what advantages they offer. Over time, many different Version Control Systems have been developed and popularized. Most of them rely on similar terminology and features, but some are tailored more closely than others to the needs of particular projects and are therefore better suited to them. For example, Bazaar, a distributed version control system, supports workflows in which a main server accepts or rejects changes coming from local copies of the project, giving fine-grained control over what enters the main line of development. However, the most popular and widely used version control system is Git. Git is a fast, flexible and scalable version control system that was originally developed by Linus Torvalds, who also created Linux.

Git is a distributed version control system used by a staggering number of software projects, both open-source and commercial. Unlike centralized Version Control Systems, which keep the project versions and history on a single central server, Git gives every clone a full copy of the repository, so changes can be committed locally and then merged and synced across multiple machines, even when a central server is not available. Git has also been integrated into many popular text editors, IDEs and other tools, which makes it ubiquitous among developers.

Why use Git?

There are several reasons why you should use Git. From a software development perspective, here are a few reasons:

Git is relatively easy to pick up and learn, with extensive documentation and learning resources.
Git can be used with popular text editors, IDEs and Git hosting providers such as GitHub, GitLab and Bitbucket.
Git is efficient and fast, provides a handy command-line interface (along with GUI clients) and does not require any boilerplate to get started.
Git helps you track the history, versions and associated metadata of your project inside a "repository".
Git allows you to tag and annotate versions of your project, helps you revert to a previous version and maintains the entire history.
Git allows you to develop in parallel in "branches" without breaking any part of the working code.
Git allows multiple developers to collaborate on the same project without overwriting each other's changes.

With the rise of Version Control Systems, and especially Git, using one has become practically essential. Git is not only used for large-scale projects like the Linux kernel or Git itself; it works just as well for small projects like a personal website or a homework assignment. Services such as GitHub and GitLab have also extended Git's role into other aspects of software development: testing changes when specific Git events occur, publishing a software release when a tag is pushed, running automated tasks in response to Git operations, and so on.

Git terminology

To understand Git better, it helps to know the terminology it uses. Most of these terms will be explained in more detail in the following chapters and sections, so don't worry if a term is not clear yet; we will cover all of them with practical demonstrations and examples.

Repository: Git stores information in a data structure called a repository. A repository is where all the versions and history of the code are stored and made available to all the contributors. When you create a new repository, even before you add any files or directories, Git creates a .git directory at its root; this is where the repository data lives.
Untracked files: Git only tracks the files that have been added to the repository. Untracked files are files in the working tree that Git is not tracking yet, typically newly created files that have never been added.
Staged files: Staged files are files whose current changes have been added to the staging area with git add, so they will be included in the next commit.
Modified files: Modified files are tracked files that have been changed since they were last committed or staged, but whose changes have not been staged yet.
Staging area: The staging area (also called the index) holds the changes that have been added with git add but not committed yet. It lets you prepare exactly what will go into the next commit.
Working tree: The working tree is the directory containing the files you actually see and edit, both tracked and untracked. Changes that exist only in the working tree, and have not been staged and committed, are not recorded by Git and might be lost.
.gitignore: A file called .gitignore lists patterns for files and directories that Git should ignore, so that matching untracked files are not reported or accidentally added. It has no effect on files that Git is already tracking.
Hash: A hash is a unique identifier that Git generates for every object it stores, such as file contents, directories and commits, and it is used to identify a specific version of a file, a directory or the whole project. By default, Git hashes are 160 bits long and are generated using the SHA-1 algorithm. Git can detect which files have changed by comparing their hashes.
Branches: A branch is a named, movable pointer to a commit; following the parent links back from that commit gives a series of linked commits. Branches let you work on a new feature or a bug fix in parallel, ensuring that the changes don't interfere with other branches or with the main/master branch.
HEAD: HEAD is a reference to the commit you currently have checked out. Usually it is a symbolic reference to the current branch, so it points to the latest commit on that branch.
Commit: A commit is a snapshot of the state of the repository at a particular point in time. It records the contents of all tracked files and directories at that moment, together with metadata such as the author, the commit message and a link to its parent commit(s).
Remote: A remote is a named reference to another Git repository, usually a hosted copy of the project. When you clone a repository, Git names the remote you cloned from origin by default, and many commands use origin when no remote is specified.
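To make branches, commits and HEAD more concrete, here is a minimal sketch in Python. It is a hypothetical toy model, not how Git is implemented: the Commit and ToyRepo names are invented for illustration, and commit ids are sequential labels (c1, c2, ...) rather than real SHA-1 hashes. It shows the key idea that a branch is just a name pointing at a commit, each commit links to its parent, and HEAD names the branch that moves forward on the next commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    id: str                      # hypothetical sequential label, not a real SHA-1 hash
    message: str
    parent: Optional["Commit"]   # link to the previous commit (None for the first one)


class ToyRepo:
    """Hypothetical model: a branch is a name pointing at a commit, HEAD names the current branch."""

    def __init__(self) -> None:
        self.branches: Dict[str, Optional[Commit]] = {"main": None}
        self.head = "main"       # HEAD holds a branch name, like a symbolic reference
        self._count = 0

    def commit(self, message: str) -> Commit:
        self._count += 1
        new = Commit(f"c{self._count}", message, self.branches[self.head])
        self.branches[self.head] = new   # only the current branch moves forward
        return new

    def branch(self, name: str) -> None:
        self.branches[name] = self.branches[self.head]  # a new branch starts at HEAD's commit

    def checkout(self, name: str) -> None:
        self.head = name         # switching branches only changes what HEAD names

    def log(self) -> List[str]:
        ids, c = [], self.branches[self.head]
        while c is not None:     # walk the parent links back to the first commit
            ids.append(c.id)
            c = c.parent
        return ids


repo = ToyRepo()
repo.commit("initial commit")    # c1 on main
repo.branch("feature")
repo.checkout("feature")
repo.commit("add feature")       # c2 exists only on feature
repo.checkout("main")
repo.commit("fix typo")          # c3 exists only on main
main_log = repo.log()            # ['c3', 'c1']
repo.checkout("feature")
feature_log = repo.log()         # ['c2', 'c1']
print(main_log, feature_log)
```

Both branches share c1 as their common ancestor, yet the commits made on one branch never appear in the other's history, which is why work on a feature branch does not interfere with main.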

Git also provides commands to carry out these operations. Some of them include git add for moving changes from the working tree to the staging area, git commit for recording the staged changes as a new commit, git push for uploading your commits to a remote, and more.
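The way git add and git commit move changes between the working tree, the staging area and the last commit can be sketched with a small Python model. This is a simplified, hypothetical illustration: the status, add and commit functions below are invented names that only mimic the real commands, files are plain strings keyed by path, and deletions are not modeled. The status function groups paths roughly the way git status reports them as staged, modified (not staged) and untracked.

```python
from typing import Dict, List

Files = Dict[str, str]  # path -> file content (hypothetical simplified model)


def status(head: Files, index: Files, worktree: Files) -> Dict[str, List[str]]:
    """Group paths roughly the way `git status` does (deletions are not modeled)."""
    return {
        # staged: the staging area differs from the last commit
        "staged": sorted(p for p in index if head.get(p) != index[p]),
        # modified: a tracked file in the working tree differs from its staged version
        "modified": sorted(p for p in index if p in worktree and worktree[p] != index[p]),
        # untracked: present in the working tree but never added to the staging area
        "untracked": sorted(p for p in worktree if p not in index),
    }


def add(index: Files, worktree: Files, path: str) -> None:
    index[path] = worktree[path]   # like `git add`: copy the current content into the staging area


def commit(index: Files) -> Files:
    return dict(index)             # like `git commit`: record a snapshot of everything staged


head: Files = {}
index: Files = {}
worktree: Files = {"README.md": "hello\n", "notes.txt": "draft\n"}

s1 = status(head, index, worktree)   # both files untracked
add(index, worktree, "README.md")
s2 = status(head, index, worktree)   # README.md staged, notes.txt untracked
head = commit(index)
s3 = status(head, index, worktree)   # nothing staged, notes.txt still untracked
worktree["README.md"] = "hello world\n"
s4 = status(head, index, worktree)   # README.md modified but not staged
print(s1, s2, s3, s4, sep="\n")
```

Note that after the commit the staging area still matches the committed snapshot; editing README.md afterwards makes it "modified" until it is added again.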

Conclusion

One of the major differences between Git and other Version Control Systems is how Git handles its data. Many other Version Control Systems store information as a set of files and the changes made to each file over time, whereas Git stores the history of a project as a series of snapshots of the whole project.

With Git, every time you commit your changes, a snapshot of the project at that moment is created and stored in the .git directory. If a file has not changed since the previous commit, Git does not store it again; the new snapshot simply refers to the copy that is already stored, which keeps Git efficient in terms of storage. Each commit records the hash of its snapshot and a link to the previous commit, so the entire history of a Git repository behaves like a stream of snapshots.
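The snapshot idea can be illustrated with a short Python sketch. The blob_hash function computes the same SHA-1 object id that Git uses for a file's contents (the bytes "blob <size>" and a NUL byte, followed by the content). The SnapshotStore class around it is a hypothetical, simplified model: it skips Git's tree objects, numbers commits instead of hashing them, and stores objects in a plain dictionary. It shows why unchanged files cost nothing extra: identical content produces an identical hash, so it is stored only once.

```python
import hashlib
from typing import Dict, List, Optional


def blob_hash(data: bytes) -> str:
    # Git's SHA-1 object id for file content: hash of "blob <size>\0" followed by the bytes
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class SnapshotStore:
    """Hypothetical, simplified model: commits are numbered, not hashed, and trees are skipped."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # content hash -> content, each stored once
        self.commits: List[dict] = []         # each entry: {"files": {path: hash}, "parent": ...}

    def commit(self, files: Dict[str, bytes]) -> int:
        snapshot = {}
        for path, data in files.items():
            h = blob_hash(data)
            self.objects.setdefault(h, data)  # unchanged content is already stored, so it is reused
            snapshot[path] = h
        parent: Optional[int] = len(self.commits) - 1 if self.commits else None
        self.commits.append({"files": snapshot, "parent": parent})  # link to the previous snapshot
        return len(self.commits) - 1


store = SnapshotStore()
c0 = store.commit({"a.txt": b"hello world\n", "b.txt": b""})
c1 = store.commit({"a.txt": b"hello world\n", "b.txt": b"new content\n"})
print(blob_hash(b"hello world\n"))   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(len(store.objects))            # 3: a.txt's unchanged content is stored only once
```

The printed hash matches what git hash-object reports for a file containing "hello world" followed by a newline, so the hashing step reflects real Git behavior even though the store around it is a toy.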

In the next chapter, we will go through the essential Git commands and their usage with practical examples. We will also get hands-on practice with the Git workflow and learn how to collaborate efficiently using Git.