Posts Tagged ‘version control’

Git Workflow with SVN

Tuesday, April 28th, 2009

The best way to get started with Git and have a better experience at work if you have to use SVN is to use git svn as a client to Subversion. You can take advantage of Git’s awesomeness while not requiring your team or infrastructure to change immediately.


git svn clone -t tags -T trunk -b branches svn+ssh://
(This may take a while for a large or old svn repo)

Working on Trunk

The initial clone should leave you on git’s master branch, which is connected to svn’s trunk.

  1. git svn rebase # Optional: only if you want to get work from svn; you don’t have to
  2. Hack some code
  3. git add any new files you created.txt
  4. git commit -a
  5. Repeat from step 2 until done

Sharing Your Changes

You will rebase your changes against SVN’s HEAD (this means git will pretend you made all your changes from SVN’s current HEAD, not the HEAD you started with [you do this to avoid conflicts and merges, which SVN cannot handle]).

  1. git svn rebase
  2. git svn dcommit

If you got Conflicts

  1. Git will tell you about them, so go and resolve them
  2. For each file you had to resolve, git add the_filename
  3. git rebase --continue
  4. Repeat until done
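The conflict loop above can be tried out without an SVN server. This is only a minimal sketch: it assumes plain `git rebase` as a stand-in for `git svn rebase`, and the throwaway repository, the file `notes.txt`, and the branch `my-work` are all made up for illustration.

```shell
#!/bin/sh
# Sketch: reproduce and resolve a rebase conflict in a throwaway repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of git defaults
git config user.name "Demo"
git config user.email "demo@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Local work on a side branch...
git checkout -q -b my-work
echo "my change" > notes.txt
git commit -q -a -m "My change"

# ...while "upstream" (standing in for SVN's trunk) edits the same line.
git checkout -q master
echo "their change" > notes.txt
git commit -q -a -m "Their change"

# Replaying my commit on top of upstream stops with a conflict.
git checkout -q my-work
git rebase master >/dev/null 2>&1 || echo "rebase stopped: conflict in notes.txt"

# Step 1: resolve the conflict (here: keep both lines).
printf 'their change\nmy change\n' > notes.txt
# Step 2: mark the file as resolved.
git add notes.txt
# Step 3: continue; GIT_EDITOR=true accepts the existing commit message.
GIT_EDITOR=true git rebase --continue >/dev/null

git log --format=%s
```

With a real Subversion remote, the `git rebase master` line would be `git svn rebase`, and you would finish with `git svn dcommit`.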

Working with SVN’s branches

Suppose you need to do some work on a branch called 1.3.x in your SVN repo:

  1. git svn fetch # This updates your local copy of remote branches
  2. git checkout 1.3.x # This checks out a remote branch, which you shouldn’t work directly on
  3. git checkout -b 1.3.x-branch # This creates a local branch you can work on, based on the remote 1.3.x branch
  4. Hack some code
  5. git add and git commit -a as needed
  6. Follow same procedure as above for Sharing Your Changes. Git will send your changes to the 1.3.x branch in SVN and not the trunk

Merging the Changes You Made

Due to the way git interacts with SVN, you shouldn’t automatically just merge your branch work onto the trunk. This may create strange histories in SVN.

So What?

So, this isn’t buying you much more than you get with SVN. Yes, when you git checkout 1.3.x-branch it’s lightning fast, and you can work offline. Here are a few things that happen to me all the time that would be difficult or impossible to do without Git.

Gotta Fix a Bug Real Quicklike

You are in the middle of working on a new feature and you need to push out a bugfix in production code. Your in-development code can’t be checked into trunk:

  1. git stash
  2. git checkout production-branch-name
  3. git checkout -b bugfix-xyz
  4. Fix bugs
  5. git commit -a
  6. git svn dcommit
  7. git checkout master
  8. git stash apply

You are now back where you started, without a fake revision just to hold your code and you didn’t have to go checkout the branch elsewhere.
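Here is the same sequence as a script you can run in a scratch directory. It is a sketch under a few assumptions: the repository is a throwaway one, `production-branch-name` is an ordinary local branch standing in for the SVN production branch, and the `git svn dcommit` step is left as a comment because there is no Subversion server to talk to.

```shell
#!/bin/sh
# Sketch: shelve unfinished work, fix a bug on another branch, come back.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Release code"
git branch production-branch-name        # hypothetical stand-in for the production branch

# Half-finished feature work on master that must not be committed yet.
echo "half-finished feature" >> app.txt

git stash -q                              # 1. shelve the uncommitted work
git checkout -q production-branch-name    # 2. switch to production code
git checkout -q -b bugfix-xyz             # 3. local branch for the fix
echo "bugfix" > fix.txt                   # 4. fix the bug
git add fix.txt
git commit -q -m "Fix bug xyz"            # 5. commit it
# 6. `git svn dcommit` would go here when a Subversion remote is configured.
git checkout -q master                    # 7. back to the feature work
git stash pop -q                          # 8. restore it

git status --short
```

The script uses `git stash pop` rather than `git stash apply`; both restore the work, but `pop` also drops the stash entry so it doesn’t pile up.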

Can’t commit to SVN due to a release

Often, teams restrict commit access to SVN while a release is being prepared. If the team is releasing version 1.5 and I’m working on 1.6 features, there can be some period of time where I’m not supposed to commit, because the 1.5 release is being prepared and under feature freeze.

  1. git commit -a
  2. Continue working

When feature freeze is over, then I’ll git svn dcommit to send my changes to the SVN server.

Blocked on Feature X, Want to work on Feature Y

This happens to me quite frequently: I’m slated to work on a few features that aren’t interdependent. I start hacking away on Feature X and hit a roadblock and can’t continue working. I’ve got a half-implemented feature and I can’t make any forward motion until a meeting next week. Feature Y, on the other hand, is ready to go. This requires some planning ahead:

  1. git checkout master
  2. git checkout -b feature-X
  3. Work on Feature X
  4. git commit -a etc. as I work
  5. Get blocked; meeting next week. D’oh!
  6. git checkout master
  7. git checkout -b feature-Y
  8. Work on Feature Y

At this point, X and Y are on two local branches and I can switch back and forth as needed. Don’t underestimate how powerful this is, especially when you have certain features that are priorities, but can become blocked frequently. I can now easily put aside Feature Y once I have my meeting and start back up on Feature X. When I’m done, I git merge everything back to master and dcommit to SVN.

Type your log message, save it, realize you forgot to reference a bug ticket #

You have a bug tracker set up that links tickets and revisions; all you have to do is put the ticket # in your log message. It’s a nice feature, but I forget to do it frequently. As long as you haven’t done git svn dcommit, you can fix this:

  1. git commit --amend

Your editor will pop up and you can change the log message! Awesome.
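If you would rather not open the editor, the new message can be given inline. A minimal sketch in a throwaway repository; the commit message and the ticket number `#1234` are invented for the example.

```shell
#!/bin/sh
# Sketch: rewrite the message of the most recent (not yet dcommitted) commit.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix null check in report export"

# Forgot the ticket reference: -m replaces the message without opening an editor.
git commit -q --amend -m "Fix null check in report export (refs #1234)"

git log --format=%s
```

Amending creates a new commit with a new hash, which is why this is only safe before the commit has been sent to SVN.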

Advanced Stuff

Once you get used to this, you will feel more comfortable doing some more advanced things.

Topic Branches

The most obviously beneficial was touched on above, but it boils down to: make every new feature on its own branch. This means you never work on master and you never work on an SVN branch. Those are only for assembling what you will send to SVN. This gives incredible flexibility to work on code when it’s convenient and not worry about checking in bad things. Git calls these topic branches.

Save your Experiments

If you do everything on a branch, you don’t have to delete your work, ever. You can go back and revisit experiments, or work on low-priority features over a long period of time with all the advantages of version control, but without the baggage of remote branches you have to share with the world.

Cherry Pick

With Git, you typically commit frequently and you restrict the scope of each revision. A commit in git is more like a checkpoint, and a push in Git is more like a commit in SVN. So, commit in git like crazy. What this lets you do is move diffs around. On several occasions, I’ve had some code on a branch that I needed to use, but didn’t want to pull in the entire branch. git cherry-pick lets me do that.
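To make that concrete, here is a small sketch of pulling one commit off a branch without taking the rest. Everything in it is hypothetical: the throwaway repository, the `experiment` branch, and the file names.

```shell
#!/bin/sh
# Sketch: take a single commit from a branch with git cherry-pick.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# A branch with one useful commit followed by unfinished work.
git checkout -q -b experiment
echo "helper" > helper.txt
git add helper.txt
git commit -q -m "Add helper"
echo "wip" > wip.txt
git add wip.txt
git commit -q -m "Unfinished experiment"

# The commit we want is the one just before the tip of the branch.
helper_commit=$(git rev-parse experiment~1)

git checkout -q master
git cherry-pick "$helper_commit" >/dev/null

ls
```

After this, master has `helper.txt` but not `wip.txt`. The cherry-picked commit is a copy with its own hash, so the original stays on `experiment` untouched.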

Mindlessly Backup Your Repo

  1. ssh
  2. mkdir awesome_project
  3. cd awesome_project
  4. git init
  5. exit
  6. git remote add other-box
  7. git push --all other-box
  8. echo "git push --force --all other-box" > .git/hooks/post-commit && chmod +x .git/hooks/post-commit

You now will back up your repository on every commit to the other box. Or, use GitHub!
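The same idea can be tried entirely on one machine. In this sketch a local directory stands in for the other box, so there is no ssh step; the paths are made up. It also makes two changes that are my assumptions rather than part of the steps above: the backup is created with `git init --bare`, since current versions of Git refuse by default to push to the checked-out branch of a non-bare repository, and the hook gets a `#!/bin/sh` line so it is a proper script.

```shell
#!/bin/sh
# Sketch: back up every commit to a second repository via a post-commit hook.
set -eu
work=$(mktemp -d)
backup="$work/backup.git"          # stands in for the repository on the other machine
project="$work/awesome_project"

git init -q --bare "$backup"
git init -q "$project"
cd "$project"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add other-box "$backup"

# The hook pushes all branches after each commit.
mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
git push --quiet --force --all other-box
EOF
chmod +x .git/hooks/post-commit

echo "hello" > README
git add README
git commit -q -m "First commit"

# The backup already has the commit.
git --git-dir="$backup" log --format=%s --all
```

For a real remote, only the `git remote add` URL changes; the hook stays the same.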

Interviewing the Interviewer: A Rubric

Tuesday, October 7th, 2008

Sad to say, my time at Gliffy is at an end (:sniff:), so I’m heading back into the job pool. I was lucky to get some time in at Gliffy, because, living in Washington, DC, my opportunities for sexy cutting-edge jobs are about zilch. Instead, I’m facing a huge market of “Senior JBoss Portal Maintenance Architect” type jobs.

I guess I should feel lucky that there’s lots of positions out there, but I really don’t want to be the cog in a huge machine. Gliffy has shielded me from the Horror That Can Be Consulting, so I need to keep my perspective in such trying times. So, calling on my experiences before Gliffy, I’ve made this handy rubric to make sure I explore all facets of a potential position.

I don’t expect anyone to get all positives and no negatives (I’ve certainly never been anywhere that perfect), but it’s always good to know. For example, if I have to put up with PVCS, I better be sitting on an Aeron chair and be using a normalized database. Further, this is obviously in addition to standard questions regarding what the project is about (i.e. is it interesting) and what the people are like. A whole lot of this can be forgiven by being part of a great team or working on a really cool product.

Question: How do you fare on the Joel Test?
  Points for: High score; Good explanations for missing items
  Points against: Low score; Never heard of it

Question: Describe your development process
  Points for: Structured; Easily described
  Points against: Overly draconian; Lack of; Not easily described

Question: What kind of computer will I be using to develop?
  Points for: Two monitors; Mac; Linux; Administrator access
  Points against: Vague answer; Small monitor; Windows; Locked-down

Question: Do you block certain sites or applications on your network?
  Points for: Open network
  Points against: Closed network

Question: What is the physical environment like?
  Points for: Good chairs; Private office; Natural light; Reasonable temperature
  Points against: Old, crappy; Bullpen style

Question: Am I required or encouraged to use Windows?
  Points for: No, few devs use Windows
  Points against: Yes

Question: What collaboration tools do you use?
  Points for: Wikis; IM; Bug tracking; Sensible PM
  Points against: Email Word documents; MS-Project; SharePoint; Other proprietary crap (e.g. eRoom, Documentum); No tools

Question: What are some of your HR policies?
  Points for: Few, if any; Loose dress code; Flexible schedule
  Points against: Draconian; Dress code; Core hours

Question: Can I see the code I will be working with?
  Points for: Letting me see it; Meaningful javadocs/API documentation; Structured, consistent style; Sensible class names and file organization; Sane build process
  Points against: Not letting me see it; Mix of styles; No javadocs; Empty javadocs; Convoluted file organization; Broken build file

Question: How do you do testing?
  Points for: Have testers; Do unit tests; Maintain tests; Bug tracker
  Points against: Ad hoc; Lip service to test-first; No unit tests

Question: What is your approach to configuration management?
  Points for: Having an approach; Git; Database migrations; Know the versions of 3rd party software/have a baseline configuration; Organized
  Points against: No approach; Perforce, ClearCase, other closed crud; Shared drives; Can’t describe versions/configuration

Question: Can I see the database schema I’ll be working with?
  Points for: Letting me see it; Normalized; Sane names (no TBL_* bullshit); Synthetic numeric keys; Referential integrity; Documented!; Versioned!
  Points against: Not letting me see it; Unnormalized; Dumb names; Incorrect types; String-based keys; No constraints; Undocumented

What else am I missing?

Time Machine almost saved me, but git won out in the end

Friday, May 9th, 2008

So, I’m working on a project that’s using Subversion for version control. My network connection isn’t great, plus subversion is slow, plus git is (so far) pretty awesomely awesome. The way to interact with an SVN repository is via git-svn, which I talked about setting up previously. Everything’s been going great; however, I don’t frequently commit to subversion. This week, we started setting up continuous integration for my work, so I did a git-svn dcommit, committing two days’ worth of changes. I had forgotten that I had made so many changes (including adding hibernate support). I misread the commit messages and thought something bad was happening. Control-C. git log. HEAD is recent. Last commit was….yesterday. Oh. Fuck.

I figure git-svn borked something, so I git-reset --hard. No effect. I’m starting to panic now. Almost 2 days of work lost is not something I’m looking forward to. I hastily go into Time Machine and get the previous hour’s backup. But, I just hate that solution. I have no idea what happened, and my trust in Git (or my ability to use it) has to be restored. After IM’ing with a co-worker, I got to the bottom of it.

It turns out that I wasn’t paying attention to how git-svn works. What it does when you do a rebase or dcommit (which implicitly does a rebase), is to first undo all your changes since your last rebase/dcommit, and get the changes made to the SVN repository (it even says as much as the first line of the output). It then “replays” your commits to make sure there’s no conflicts.

By hitting Control-C in the middle of that, I manually caused the same situation that would happen if there were conflicts. Git stops, tells you to resolve conflicts, and asks you to git-rebase --continue. If I had just git-rebase --continue‘ed, I would be fine. Since I did a hard reset, I figured I was fucked. Enter the log.

.git/logs/HEAD contained information about all activity, including my missing commits. I grab the version numbers (which, in Git, are hashes that identify a commit and, through it, the entire state of the repository), do a git-reset --hard big.honkin.git.hash.version and voilà! Everything’s back to how it was (the command ran instantaneously, to boot).
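The recovery can be rehearsed safely in a scratch repository. This sketch simulates the mistake with a deliberate hard reset rather than an interrupted dcommit, and it reads the log through `git reflog`, which prints the same history of HEAD that is stored in .git/logs/HEAD. The file and commit names are invented.

```shell
#!/bin/sh
# Sketch: lose a commit with a hard reset, then get it back from the reflog.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > work.txt
git add work.txt
git commit -q -m "Day one"
echo "two" >> work.txt
git commit -q -a -m "Day two"
lost=$(git rev-parse HEAD)       # remembered only so the result can be checked

# Simulate the mistake: the "Day two" commit disappears from git log.
git reset -q --hard HEAD~1
git log --format=%s

# The reflog still lists every position HEAD has had.
git reflog

# HEAD@{1} is where HEAD pointed just before the reset; a full hash works too.
git reset -q --hard "HEAD@{1}"
git log --format=%s
```

Copying the hash out of the reflog, as described above, and using the `HEAD@{1}` shorthand are equivalent here; the hash is the safer choice when several resets or rebases have happened in between.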

Git and SVN: connecting git branches to svn branches

Monday, April 28th, 2008

Currently working on a project where Subversion is the CM system of choice. I’d like to use git, as it’s faster and doesn’t require so much network access. Plus, I’m hoping when it comes time to merge, I can simplify the entire process by using git’s allegedly superior merging technique. At any rate, I’ve got a branch on SVN to work on, and I want to track both that branch and the entire svn tree.

Saturday morning, I did a git-svn init from their repository. Today, after lunch, it finished. After doing a git-gc to clean up the checkout, it wasn’t clear how to connect branches. Following is what I did (assume my subversion branch is branches/FOO):

git-checkout -b local-trunk trunk
git branch local-foo FOO

The first thing creates a new branch called “local-trunk” started at “trunk” (which is the remote branch mapping to the subversion main trunk). The second command creates a new branch called “local-foo”, which is rooted at remote branch “FOO”. I have no clue why I couldn’t do the same thing twice, as both commands seem to do the same thing (the first switches to the branch “local-trunk” after creating it). But, this is what worked for me.

Now, to develop, I git checkout local-foo and commit all day long. A git-svn dcommit will send my changes to subversion on the FOO branch. I can update the trunk via git checkout local-trunk and git-svn rebase. My hope is that I can merge from the trunk to my branch periodically and then, when my code is merged to the trunk, things will be pretty much done and ready to go. We’ll see.

On a side note, the git repository, which contains every revision of every file in the subversion repository is 586,696 bytes. The subversion checkout of just the FOO branch is 1,242,636 bytes; over double the size, and there’s still not enough info in that checkout to do a log or diff between versions.

Distributed version control with Git for code quality and team organization

Tuesday, April 15th, 2008

In my previous post, I outlined a code review process I’ve been using with reasonable effectiveness. It’s supported, in my case, by the Git source code management tool (best known for its use in managing the Linux kernel). Git or, more generally, distributed development, can encourage some good quality control procedures in teams working on enterprise software. The lessons learned from the open source world (and the Linux kernel, in particular) can be applied outside the world of OSS and to the consultant-heavy world of enterprise/in-house software development.

The project I’ve been working on for the past several months has undergone what I believe to be a common change on in-house/enterprise software, which is that several new developers are being added to the project. Outside of the learning curve required with any new system, many of them are not seasoned Java developers, or are otherwise missing experience in some key technologies in use. While code reviews are a great way to ensure these developers are doing things the right way, there is still concern that their ability to commit to source control could be problematic for the entire team.

Consider a developer breaking the build, or incorrectly refactoring a key piece of shared code. A review of their commit and some continuous integration can help identify these problems, but, once identified, they must be removed from the codebase. In the meantime, the development team could be stuck with an unusable build. This can lead to two bad practices:

  • Commit very rarely
  • Get new changes from the repository only when absolutely needed

These “anti-practices” result in unreadable commit logs, difficult (or skipped) code reviews, duplication of code, and a general incoherence of the system. This is primarily due to the way most common version control systems work.

In reserved-checkout systems (e.g. PVCS, StarTeam) and concurrent systems (CVS, Subversion), there is the concept of the one true repository of code that is a bottleneck for all code on the project. The only way Aaron can use Bill’s code is for Bill to commit it to the repository and for Aaron to check it out (along with anything else committed since the last time he did so). The only way Carl can effectively review Dan’s code, or for the automated build to run his test cases, is to checkout code from the repository and examine/run it. This reality often leads to situations where each developer is operating on his own branch. The problem here is that CVS and Subversion suck at merging. This makes the branching solution effectively useless.

Enter Git. With Git, there is no central repository. Each developer is on his own branch (or his own copy of someone’s branch) and can commit to their heart’s content, whenever they feel they have reached a commit point. Their changes will never be forced upon the rest of the team. So, how does the code get integrated?

Developers submit their code to the team lead/integrator (who is the ultimate authority on what code goes to QA/production/the customer), who then reviews it and either accepts or rejects it. If code is rejected, the team lead works with the developer to get it accepted (either via a simple email of the issues, or more in-depth mentoring as needed). Git makes this painless and fast, because it handles merging so well.

Consider how effective this is, especially when managing a large (greater than, say, five) team of developers working concurrently. The only code that gets into the production build will have been vetted through the team lead; he is responsible for physically applying each developer’s patches (an action that takes a few minutes or even seconds in Git). Further, developers get instant feedback on their code quality. In most cases, bad commits are the result of ignorance and lack of experience. A code review, with instant feedback, is a great way to address both of those issues, resulting in a better developer and a better team, based on open, honest, and immediate communication.

Here’s how to set this up:

  1. Assign a team lead to integrate the code – this is a senior developer who can assess code quality, provide mentoring and guidance and can be trusted to put code into the repository destined for QA and production
  2. Each developer clones the team lead’s repository – This is done to baseline the start of their work
  3. Developers commit, branch, merge, and pull as necessary – Since Git makes merging simple, developers can have full use of all features of version control and can do so in their environment without the possibility of polluting the main line of development. They can also share code amongst themselves, as well as get updates from the team lead’s repository of “blessed” code (see footnote 1)
  4. Developers inform the lead of completion
  5. Lead pulls from their repository – The lead reviews the developer’s changes and applies the patch to his repository. He can then exercise whatever quality control mechanisms he wishes, including automated tests, manual tests, reviews, etc. (see footnote 2)
  6. Lead rejects patches he doesn’t agree with – If the patch is wrong, buggy, or just not appropriate in some way, the lead rejects the patch and provides the developer with information on the correct approach
  7. Lead accepts patches he does agree with – If the lead agrees with the patch, he applies it to his repository, where it is now cleared for QA

This may seem convoluted, but it actually carries little overhead compared to a junior developer performing a “nuclear bomb” commit that must then be rolled back. For much larger teams, the approach can be layered, with the primary team lead accepting patches only from lieutenants, who accept patches from the primary developers.
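The steps above can be sketched with two repositories on one machine. This is an illustration under assumptions, not a prescribed setup: the `lead` and `dev` directories, the `report-feature` branch, and the file names are invented, and the lead reaches the developer’s repository by local path where a real team would use a network URL.

```shell
#!/bin/sh
# Sketch: a developer clones the lead's repository; the lead reviews and merges.
set -eu
work=$(mktemp -d)
lead="$work/lead"
dev="$work/dev"

# Step 1: the lead's repository holds the "blessed" code.
git init -q "$lead"
cd "$lead"
git symbolic-ref HEAD refs/heads/master
git config user.name "Lead"
git config user.email "lead@example.com"
echo "core" > core.txt
git add core.txt
git commit -q -m "Blessed baseline"

# Step 2: the developer clones it to baseline their work.
git clone -q "$lead" "$dev"
cd "$dev"
git config user.name "Developer"
git config user.email "dev@example.com"

# Step 3: the developer commits freely on a branch of their own.
git checkout -q -b report-feature
echo "report" > report.txt
git add report.txt
git commit -q -m "Add report"

# Step 5: the lead fetches the branch for review without merging it yet.
cd "$lead"
git fetch -q "$dev" report-feature
git log --format=%s master..FETCH_HEAD     # commits under review
git diff --stat master FETCH_HEAD          # files they touch

# Step 7: accepting the work is a merge; rejecting it is simply not merging.
git merge -q FETCH_HEAD
git log --format=%s
```

Fetching first and merging second is what gives the lead a review step: until `git merge` runs, nothing in the blessed repository’s master has changed.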

Unlike a lot of hand-wavy processes and practices, this model has been demonstrated effective on virtually every open source project. Even though the Linux kernel is one of the few to use technology to support this process (Git), every other large OSS project has the concept of “committers” who are the people allowed to actually commit. Anyone else wishing to contribute must submit patches to a committer, who then reviews and approves of their patch (or not).

I believe this would be highly effective in a professional environment developing in-house or enterprise software (especially given the typical love of process in those environments; this process might actually help!). I have been on at least three such projects where it would’ve been an enormous boon to quality (not to mention that the natural mentoring and feedback built into the process would’ve been hugely helpful for the more junior developers).

1 Git even allows a developer to merge certain commits from one branch to another. Suppose Frank is working on a large feature, and happens to notice a bug in common code. He can address that bug and commit it. Gary can then merge only that commit into his codebase to get the bugfix, without having to also take all of Frank’s in-progress work on the large feature. Good luck doing that with StarTeam.
2 A CI system could be set up in a variety of ways: it could run only against the lead’s “blessed” repository, or it could run against an intermediate repository created by the lead (who then blesses patches that pass), or it could be totally on its own and allow developers to submit against it prior to submitting to the lead.

Quick and Dirty Code Reviews: Check commit logs

Thursday, April 3rd, 2008

             Large maintenance
+          aggressive schedule
+       lots of new developers
+ minimal system documentation
------------------------------
 Need for highly efficient and
       effective QA procedures

Where I’ve been working for the past few months, we’ve been under the gun to meet an aggressive deadline. As management is wont to do, they’ve added several new developers to the project. One of the many reasons why adding developers is ultimately a bad thing is that, in addition to the complexity in communication, there is a risk of innocent, well-developed code being added to the codebase that is Just Plain Wrong. Our system has been in development for many years and contains a wide variety of coding styles, implementation patterns and idioms. Some of them should Never Be Followed Again, and some are the Correct Way of Doing Things. There’s really no easy way for a new developer to know what is what.

Now, outside of going back in time, creating pedantic test cases, gathering requirements and incessantly refactoring, we need an option to make sure bad code doesn’t get into the codebase. By “bad” I don’t mean poorly written or buggy code, I mean code that does not fit into the system as a whole. For example, a developer was writing some code to generate printable reports. His implementation resulted in a very nice report popping up in a Swing window in our application (our application is a Swing front-end to a JEE back-end). It was very well-implemented and looked great. However, everywhere else in the application, reports are generated by writing a PDF to disk and asking Windows (via JDIC) to “view” the file. This is the code I’m talking about.

Finding bad code

At the start of the project, we went through the arduous process of identifying each bit of work in an MS-Project file and assigning developers to it. New developers were given tasks that didn’t touch core components, while experienced developers got tasks involving major refactorings or database changes. Our project lead suggested that each module undergo a code review. It sounds reasonable; however, all of us let out a collective groan at the thought of wrangling together 3-4 developers once a week for an hour or two to go over printouts or screenfuls of code, much of which was simply being modified.

One of the senior developers proposed the solution we ultimately went with: senior developers get emailed the diffs of all commits and make sure to spend some time reading and reviewing those commits. Coupled with our policy of “commit early, commit often”, this has worked out great.

Diff-based code review

Here’s what you need:

  • A concurrent version control system developers trust. I recommend Git, or Subversion if you must.
  • A simple script to email diffs on every commit. Usually included as an example hook for most version control systems.
  • IM clients (Google talk within GMail works in even the most oppressive environment)
  • A sane version control policy: committed code must:
    • Compile
    • Not prevent application deployment/startup
    • Not horribly break someone else’s code (optional)

    Developers should commit as frequently as they want (and preferably frequently). I typically commit code I feel is “done” but that might not add up to an actual feature. This requires accepting that head is not a real version. Most real version control systems have the ability to tag, branch, etc. These features are for “real working versions”. The head of the trunk is not.

  • A sane coding style policy: if you must re-indent or change bracing style, do it in its own commit, outside of actual code changes. Better yet, don’t do it at all. Formatting changes can obscure the history of a piece of code and should be made minimally, if at all.

The “process” (if you want to even call it that) is:

  1. Diffs get emailed to the senior developers as soon as they happen
  2. Senior Developers read the diffs, using IM to discuss any issues
  3. If code does have issues, the diff is forwarded to the developer who committed, with comments on what to change and why (senior developers decide amongst themselves who will send the feedback, or a “lead developer” can, if one is identified)

Part of this requires some level of diplomacy, however a plain, to-the-point email on what the issues are with a piece of code, why the changes should be made, and a suggestion on how to make them should be digestible by anyone.

I’ve had great success with this, having caught a wide variety of problems (even in my code, by others) without having to have one meeting or to print out one sheet of code. The fact is, on a maintenance project, you aren’t reviewing the codebase, but changes to that codebase. Diffs are the way to understand what changes are being made.
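For Git, the diff-mailing script mentioned in the list above could look roughly like this. It is a sketch, not the script we used: the function name `review_mail_body` is made up, the demo repository is a throwaway, and the output is written to a file because which mail command is available (and who the senior developers are) depends on your environment.

```shell
#!/bin/sh
# Sketch: build the body of a review email for a single commit.
set -eu

# Prints a short header, a per-file summary, and the full diff for commit $1.
review_mail_body() {
    git show --stat --patch --format='Commit: %h%nAuthor: %an%nSubject: %s%n' "$1"
}

# Throwaway repository with two commits to demonstrate on.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" >> a.txt
git commit -q -a -m "Extend a"

# In a post-commit hook this output would be piped to your mailer instead.
review_mail_body HEAD > review.txt
cat review.txt
```

Putting the subject and author at the top makes it easy to forward the same text back to the committer with comments, which is step 3 of the process.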