This repository has been archived by the owner on Nov 15, 2024. It is now read-only.

updated the SHA and its problems
schacon committed Apr 21, 2008
1 parent ed109b9 commit ad2fd3c
Showing 16 changed files with 29 additions and 29 deletions.
2 changes: 1 addition & 1 deletion text/s0-c03-short-history.textile
@@ -2,7 +2,7 @@ h2. A Short History of Git

The Git project started with Linus Torvalds scratching the very serious itch of needing a fast, efficient and massively distributed source code management system for Linux kernel development.

The kernel team had moved from a patch emailing system to the proprietary BitKeeper SCM in 2002. That ended in April 2005 when BitMover stopped providing a free version of it's tool to the open source community because they felt some developers had reverse engineered it in violation of the license.
The kernel team had moved from a patch emailing system to the proprietary BitKeeper SCM in 2002. That ended in April 2005 when BitMover stopped providing a free version of its tool to the open source community because they felt some developers had reverse engineered it in violation of the license.

Since Linus had (and still has) a passionate dislike of just about all existing source code management systems, he decided to write his own. Thus, in April of 2005, Git was born. A few months later, in July, maintenance was turned over to Junio Hamano, who has maintained the project ever since.

4 changes: 2 additions & 2 deletions text/s1-c01-what-is-git.textile
@@ -2,10 +2,10 @@ h2. What is Git?

Git is a stupid content tracker. That is probably the best description of it - don't think of it in a 'like (insert favorite SCM system), but...' context, but more like a really interesting file system.

Git tracks content - files and directories. It is at it's heart a collection of simple tools that implement a tree history storage and directory content management system. It is simply used as an SCM, not really designed as one.
Git tracks content - files and directories. It is at its heart a collection of simple tools that implement a tree history storage and directory content management system. It is simply used as an SCM, not really designed as one.

note. "In many ways you can just see git as a filesystem — it's content-addressable, and it has a notion of versioning, but I really really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system." - "Linus":http://marc.info/?l=linux-kernel&m=111314792424707

When most SCMs store a new version of a project, they store the code delta or diff. When Git stores a new version of a project, it stores a new _tree_ - a bunch of blobs of content and a collection of pointers that can be expanded back out into a full directory of files and subdirectories. If you want a diff between two versions, it doesn't add up all the deltas, it simply looks at the two trees and runs a new diff on them.

This is what fundamentally allows the system to be easily distributed - it doesn't have issues figuring out how to apply a complex series of deltas, it simply transfers all the directories and content that one user has and another does not have but is requesting. It is efficient about it - it only stores identical files and directories once and it can compress and transfer it's content using delta-compressed packfiles - but in concept, it is a very simple beast. Git is at it's heart very stupid-simple.
This is what fundamentally allows the system to be easily distributed - it doesn't have issues figuring out how to apply a complex series of deltas, it simply transfers all the directories and content that one user has and another does not have but is requesting. It is efficient about it - it only stores identical files and directories once and it can compress and transfer its content using delta-compressed packfiles - but in concept, it is a very simple beast. Git is at its heart very stupid-simple.
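
The "stores identical files once" point in this hunk can be illustrated with a short sketch. It assumes Git's SHA-1 blob naming (the hash of a `blob <size>\0` header followed by the content). The `ObjectStore` class is a hypothetical in-memory stand-in for the object database, not Git's implementation.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (hypothetical helper for illustration)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content hashes to the same key, so it is kept only once.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
first = store.put(b"same bytes\n")   # e.g. saved as README
second = store.put(b"same bytes\n")  # e.g. saved again as docs/README
print(first == second, len(store.objects))
```

Because the file name is not part of the hash, the same content under two names maps to one stored object.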
2 changes: 1 addition & 1 deletion text/s1-c02-focus-design.textile
@@ -16,7 +16,7 @@ Git is very efficient. Compared to many popular SCM systems, it seems downright

note. The Ruby on Rails Git repository download, which includes the full history of the project - every version of every file, weighs in at around 13M, which is not even twice the size of a single checkout of the project (~9M). The Subversion server repository for the same project is about 115M.

Git also is efficient in it's network operations - the common Git transfer protocols transfer only packed versions of only the objects that have changed. It also won't try to transfer content twice, so if you have the same file under two different names, it will only transfer the content once.
Git also is efficient in its network operations - the common Git transfer protocols transfer only packed versions of only the objects that have changed. It also won't try to transfer content twice, so if you have the same file under two different names, it will only transfer the content once.

h3. A Toolkit Design

8 changes: 4 additions & 4 deletions text/s1-c05-the-data-model.textile
@@ -1,6 +1,6 @@
h2. The Git Data Model

In computer science speak, the Git object data is a Directed Acyclic Graph. That is, starting at any commit you can traverse it's parents in one direction and there is no chain that begins and ends with the same object.
In computer science speak, the Git object data is a Directed Acyclic Graph. That is, starting at any commit you can traverse its parents in one direction and there is no chain that begins and ends with the same object.

All commit objects point to a tree and optionally to previous commits. All trees point to one or many blobs and/or trees. Given this simple model, we can store and retrieve vast histories of complex trees of arbitrarily changing content quickly and efficiently.
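
As a rough illustration of the graph described here, the sketch below models commits that point to a tree and to zero or more parents, then walks the parents from a starting commit. The `Commit` class, the `ancestors` function, and the short IDs are invented for the example and are not Git's internal structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    oid: str                       # shortened, made-up object ID
    tree: str                      # ID of the tree this commit points to
    parents: tuple[str, ...] = ()  # empty for the first commit


def ancestors(commits: dict[str, Commit], start: str) -> list[str]:
    """Return every commit reachable from start, each listed once."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        order.append(oid)
        # Edges only lead from a commit to its parents, never back.
        stack.extend(commits[oid].parents)
    return order


history = {
    c.oid: c
    for c in [
        Commit("c1", "t1"),
        Commit("c2", "t2", ("c1",)),
        Commit("c3", "t3", ("c1",)),
        Commit("c4", "t4", ("c2", "c3")),  # a merge has two parents
    ]
}
reachable = ancestors(history, "c4")
print(reachable)
```

The walk always terminates because a commit can only reference commits that already existed when it was created, so no chain leads back to its starting point.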

@@ -10,7 +10,7 @@ h3. References

In addition to the Git objects, which are immutable - that is, they cannot ever be changed, there are references also stored in Git. Unlike the objects, references can constantly change. They are simple pointers to a particular commit, something like a tag, but easily moveable.

Examples of references are branches and remotes. A branch in Git is nothing more than a file in the _.git/refs/heads/_ directory that contains the sha of the most recent commit of that branch. To branch that line of development, all Git does is create a new file in that directory that points to the same sha. Then, as you continue to commit, one of the branches will keep changing to point to the new commit shas, while the other one can stay where it was. _(Don't worry, we'll go over this again a bit later...)_
Examples of references are branches and remotes. A branch in Git is nothing more than a file in the _.git/refs/heads/_ directory that contains the SHA of the most recent commit of that branch. To branch that line of development, all Git does is create a new file in that directory that points to the same sha. Then, as you continue to commit, one of the branches will keep changing to point to the new commit shas, while the other one can stay where it was. _(Don't worry, we'll go over this again a bit later...)_

h3. The Model

@@ -34,15 +34,15 @@ When we first commit this tree, our Git model may look something like this:

We have 3 trees, 3 blobs, 1 commit that points to the top of the tree, the current branch pointing to our last commit and the HEAD file pointing to the branch we're currently on to let Git know which commit will be the parent for the next commit.

Now let's assume that we change the _lib/base/base_include.rb_ file and commit again. At this point, a new blob is added, which changes the tree that points to it, which changes the tree that points to that tree and so on to the top of the entire directory. Then a new commit object is added which points to it's parent and the new tree, then the branch reference is moved forward.
Now let's assume that we change the _lib/base/base_include.rb_ file and commit again. At this point, a new blob is added, which changes the tree that points to it, which changes the tree that points to that tree and so on to the top of the entire directory. Then a new commit object is added which points to its parent and the new tree, then the branch reference is moved forward.

Let's also say at this point we tag this commit as a release, which adds a new tag object. At this point, we'll have the following in Git:

!vector/Object_DAG_Tree2.eps!

Notice how the other two blobs that were not changed were not added again. The new trees that were added point to the same blobs in the data store that the previous trees pointed to.

Now let's say we modify the _init.rb_ file at the base of the project. The new blob will have to be added, which will add a new top tree, but all the subtrees will not be modified, so Git will re-use those references. Again, the branch reference will move forward and the new commit will point to it's parent.
Now let's say we modify the _init.rb_ file at the base of the project. The new blob will have to be added, which will add a new top tree, but all the subtrees will not be modified, so Git will re-use those references. Again, the branch reference will move forward and the new commit will point to its parent.
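
The reuse of unchanged subtrees can be sketched with a simplified model. The tree encoding below is a toy format, not Git's real tree object layout, and `lib/my_plugin.rb` and all file contents are assumed for the example. Only the idea that a tree's ID is derived from its entries is taken from the text.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """Hash a typed payload (simplified; not Git's exact object format)."""
    return hashlib.sha1(kind.encode("ascii") + b"\0" + payload).hexdigest()


def write_tree(node: dict, store: dict) -> str:
    """Store a nested dict of files and directories, return the tree ID."""
    entries = []
    for name in sorted(node):
        child = node[name]
        if isinstance(child, dict):
            entries.append(f"tree {write_tree(child, store)} {name}")
        else:
            blob = object_id("blob", child)
            store[blob] = child
            entries.append(f"blob {blob} {name}")
    payload = "\n".join(entries).encode("utf-8")
    tree = object_id("tree", payload)
    store[tree] = payload
    return tree


project = {
    "init.rb": b"puts 'init'\n",
    "lib": {
        "my_plugin.rb": b"module MyPlugin; end\n",  # assumed file name
        "base": {"base_include.rb": b"module BaseInclude; end\n"},
    },
}

store: dict[str, bytes] = {}
first_root = write_tree(project, store)
before = set(store)                      # 3 blobs + 3 trees

project["init.rb"] = b"puts 'init v2'\n"  # change only the top-level file
second_root = write_tree(project, store)
added = set(store) - before              # objects the second snapshot needed
print(len(before), len(added))
```

Only the new `init.rb` blob and the new top-level tree are added. The `lib` and `lib/base` trees hash to the same IDs as before, so the second snapshot points at the existing objects.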

!vector/Object_DAG_Tree3.eps!

6 changes: 3 additions & 3 deletions text/s1-c06-branching-and-merging.textile
@@ -2,11 +2,11 @@ h2. Branching and Merging

Here we come to one of the real strengths of Git, cheap inline branching. This is a feature that truly sets it apart and will likely change the way you think about developing code once you get used to it.

When you are working on code in Git, storing trees in any state and keeping pointers to them is very simple, as we've seen. In fact, in Git the act of creating a new branch is simply writing a file in the '.git/refs/heads' directory that has the sha of the last commit for that branch.
When you are working on code in Git, storing trees in any state and keeping pointers to them is very simple, as we've seen. In fact, in Git the act of creating a new branch is simply writing a file in the '.git/refs/heads' directory that has the SHA of the last commit for that branch.

note. Creating a branch is nothing more than just writing 40 characters to a file

Switching to that branch simply means having Git make your working directory look like the tree that sha points to and updating the HEAD file so each commit from that point on moves that branch pointer forward (in other words, it changes the 40 characters in '.git/refs/heads/[current_branch_name]' be the SHA of your last commit).
Switching to that branch simply means having Git make your working directory look like the tree that SHA points to and updating the HEAD file so each commit from that point on moves that branch pointer forward (in other words, it changes the 40 characters in '.git/refs/heads/[current_branch_name]' to be the SHA of your last commit).
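
A minimal sketch of the ref mechanics described in this hunk, run against a throwaway directory rather than a real repository. The helper names and the placeholder SHAs are hypothetical. Updating the working directory is left out, and a real repository may also keep refs in a packed-refs file, which this ignores.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, sha: str) -> Path:
    """Creating a branch: write 40 hex characters to refs/heads/<name>."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(sha + "\n")
    return ref


def switch_branch(git_dir: Path, name: str) -> None:
    """Switching: point HEAD at the branch (working tree update omitted)."""
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{name}\n")


def record_commit(git_dir: Path, new_sha: str) -> None:
    """Committing: overwrite the ref that HEAD names with the new SHA."""
    target = (git_dir / "HEAD").read_text().strip()[len("ref: "):]
    (git_dir / target).write_text(new_sha + "\n")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    first_commit = "a" * 40    # placeholder SHAs, not real commits
    second_commit = "b" * 40

    create_branch(git_dir, "master", first_commit)
    create_branch(git_dir, "experiment", first_commit)  # same 40 characters
    switch_branch(git_dir, "experiment")
    record_commit(git_dir, second_commit)

    master_sha = (git_dir / "refs" / "heads" / "master").read_text().strip()
    experiment_sha = (git_dir / "refs" / "heads" / "experiment").read_text().strip()

print(master_sha[:7], experiment_sha[:7])
```

After the simulated commit, only the 'experiment' file has changed; 'master' still holds the SHA it was created with.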

Merging is also easy, compared to most SCM systems - is simply merging the trees that the commits you are telling it to merge are pointing to, which is much simpler than resolving a bunch of deltas.

@@ -89,7 +89,7 @@ So why is this helpful, exactly? It means that you can keep your development cy

You have a 'master' branch that is _always_ stable - you never merge anything into it that you wouldn't put into production. Then you have a 'development' branch that you merge any experimental code into before you imagine pulling it into the 'master' branch.

You create a new branch each time you begin to work on a story or feature, branching it off your current 'development' branch each time, so if you get blocked and need to put it on hold, it doesn't effect anything else. When you do get back to them, you rebase them to the current 'development' and it's just like you started from there. Often times you merge the branch back into 'development' and delete it the same day that you created it.
You create a new branch each time you begin to work on a story or feature, branching it off your current 'development' branch each time, so if you get blocked and need to put it on hold, it doesn't affect anything else. When you do get back to them, you rebase them to the current 'development' and it is just like you started from there. Often times you merge the branch back into 'development' and delete it the same day that you created it.

If you get a huge project or idea - say refactoring the entire code base to the newest version of your framework or switching database vendors or something, you create a long-term branch, continuously rebase it to keep it in line with other development, and once everything is tested and ready, merge it in with your master.

2 changes: 1 addition & 1 deletion text/s1-c07-treeish.textile
@@ -58,7 +58,7 @@ h4. Tree Pointer

code. e65s46^{tree}

This points to the tree of that commit. Any time you add a ^{tree} to any commit-ish, it resolves to it's tree.
This points to the tree of that commit. Any time you add a ^{tree} to any commit-ish, it resolves to its tree.

h4. Blob Spec

2 changes: 1 addition & 1 deletion text/s1-c07a-git-directory.textile
@@ -42,7 +42,7 @@ h3. .git/refs/

This directory normally has three subdirectories in it - _heads_, _remotes_ and _tags_. Each of these directories will hold files that correspond to your local branches, remote branches and tags, respectively.

For example, if you create a 'development' branch, the file .git/refs/heads/development will be created and will contain the sha of the commit that is the latest commit of that branch.
For example, if you create a 'development' branch, the file .git/refs/heads/development will be created and will contain the SHA of the commit that is the latest commit of that branch.

h3. .git/HEAD

4 changes: 2 additions & 2 deletions text/s1-c10-non-scm-uses.textile
@@ -10,7 +10,7 @@ Imagine you are a retail chain or university campus and have a network of digita

You need to build a content distribution framework that will easily and efficiently transfer all the necessary content to the machines on your network. You need to constantly determine what content each machine has and what it needs to have and transfer the difference as efficiently as possible, because networking to these units may be spotty.

It turns out that Git is an excellent solution to this problem. You can simply check all the needed content into Git, create a branch for each unit and point that branch to the exact subtree of content it needs. Then at some interval, you have the unit fetch it's branch. If nothing has changed, nothing happens - if content has changed somehow, it gets only the files it does not already have in a delta compressed package and then expands them locally. Log and status files could even be transferred back by a push.
It turns out that Git is an excellent solution to this problem. You can simply check all the needed content into Git, create a branch for each unit and point that branch to the exact subtree of content it needs. Then at some interval, you have the unit fetch its branch. If nothing has changed, nothing happens - if content has changed somehow, it gets only the files it does not already have in a delta compressed package and then expands them locally. Log and status files could even be transferred back by a push.

An example of a media company actually using this approach is "Reactrix":http://reactrix.com/, which also happens to be where I work.

@@ -40,7 +40,7 @@ Examples of projects trying to do this are

h3. Backup Tool

Let's say you want to build something like a distributed Time-Machine (Apple all rights reserved) that efficiently packs up it's backups and transfers them to multiple machines. I'm hoping by now that you could see the benefits of using the Git toolkit to accomplish this, but this particular problem is interesting because of something that Git doesn't do, which is permissions. Git stores the mode of it's content in the tree, but it doesn't store any permissions data, which means it's not good for backing up directories in which permissions are important, like '/etc' for example.
Let's say you want to build something like a distributed Time-Machine (Apple all rights reserved) that efficiently packs up its backups and transfers them to multiple machines. I'm hoping by now that you could see the benefits of using the Git toolkit to accomplish this, but this particular problem is interesting because of something that Git doesn't do, which is permissions. Git stores the mode of its content in the tree, but it doesn't store any permissions data, which means it's not good for backing up directories in which permissions are important, like '/etc' for example.

One project that has tackled this is "Gibak":http://eigenclass.org/hiki/gibak-backup-system-introduction, by implementing a metastore in OCaml, and it's worth a look if this topic interests you.

2 changes: 1 addition & 1 deletion text/s2-c02-getting-a-git-repo.textile
@@ -21,7 +21,7 @@ This will add all of your current files into your new repository and index and t

h3. Cloning a Repository

Many times you will be _cloning_ a repository, however. This means that you are creating a complete copy of another repo, including all of it's history and published branches.
Many times you will be _cloning_ a repository, however. This means that you are creating a complete copy of another repo, including all of its history and published branches.

note. A clone is, for all intents and purposes, a full backup. If the server that you cloned from has a hard disk failure or something equally catastrophic, you can basically take any of the clones and stick it back up there when the server is restored without anyone really the worse for wear.

2 changes: 1 addition & 1 deletion text/s2-c04-log-commit-history.textile
@@ -10,7 +10,7 @@ If you just run _git log_, you will get output like this:

shell. git-log.txt

This will show you the sha of each commit, the committer and date of the commit, and the full message, starting from the last commit on your current branch and going backward in reverse chronological order (so if there are multiple parents, it just squishes them together, interleaving the commits ordered by date)
This will show you the SHA of each commit, the committer and date of the commit, and the full message, starting from the last commit on your current branch and going backward in reverse chronological order (so if there are multiple parents, it just squishes them together, interleaving the commits ordered by date).

h3. Formatting Log Output
