How-to branch and patch SVN- or TFS/Codeplex-based open source projects with GIT (Part 1)

Over the last couple of weeks I worked a lot with NPanday, which brings Maven to .NET. This post is not about NPanday, though, but about the workflow for maintaining a personal branch and uploading patches.

I’m not a committer, but I had to make some changes to NPanday. I still want to version my changes. I also want to contribute patches when I fix bugs that apply to the current trunk.

NPanday is hosted on Codeplex and is thus accessible via SvnBridge. So I chose to give git svn a try.

I also want to host my changes on GitHub.

The Workflow

  • Once
    • Clone an svn repo to a local git repo
    • Create a branch for svn updates, say codeplex
    • Create a GitHub repo and push both branches
  • Repeatedly
    • Do your work on master
    • Commit and push your work
    • Update codeplex from subversion and push it
    • Merge codeplex to master
  • Contribute and Commit
    • How to create and submit patches from our master
    • How to commit changes back to codeplex

The How-To

I don’t want to mess around with NPanday, so I created an empty test project on Codeplex: [SAMPLE PROJECT] Branch and Patch with GIT

Setup (Windows)

On Mac you can simply use MacPorts; on Linux I have no clue.

  • Install SVN binaries and add to %PATH% (download)

  • The binary msysGit distribution doesn’t come with git svn anymore, so you have to compile it yourself. It is not as hard as it sounds: just download the fullinstall package, which runs the compile for you. (download 1.7.0.2 , downloads) Don’t forget to add it to the path, too.

I’ll do everything on the command line. Red is SVN, green is GIT – they are in separate folders.

I use pictures. Type it yourself! It will help you to learn it.

Don’t be afraid of reading the command line either. It’s like code! You’ll get used to it; so did I.

Once

Check out and commit a text file:

[Screenshot: svn checkout and commit of a text file]

Init and fetch the committed revision using git svn:

[Screenshot: git svn init and fetch]

Now, since we want to make changes, let’s create a branch codeplex but remain on master and make our changes there.

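[Screenshot: creating the branch codeplex and committing a change on master]

We should make sure that the branch codeplex is untouched:

[Screenshot: checking the branch codeplex]

I also created a project on GitHub, so let’s push it over there. Read how to set up your private key and connect here.

[Screenshot: pushing to GitHub]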

Now we have the two commits on master:

[Screenshot: commits on master]

And only one on codeplex:

[Screenshot: commits on codeplex]
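Since the screenshots are not reproduced here, the one-time setup can be sketched as a plain list of commands. This is a minimal sketch, not a transcript of the original session: the two URLs are hypothetical placeholders, the remote name origin is an assumption, and the script only prints the commands instead of running them.

```python
# Dry-run sketch of the one-time setup: the commands are only printed.
# SVN_URL and GITHUB_URL are hypothetical placeholders, not real project URLs.
SVN_URL = "https://example.svn.codeplex.com/svn"
GITHUB_URL = "git@github.com:example-user/example-repo.git"


def one_time_setup(svn_url, github_url):
    """Return the git commands for the one-time setup, in order."""
    return [
        ["git", "svn", "init", svn_url],   # create a local git repo bound to the svn repo
        ["git", "svn", "fetch"],           # import the svn revisions into master
        ["git", "branch", "codeplex"],     # branch for svn updates; we stay on master
        # ... do some work on master and commit it here ...
        ["git", "remote", "add", "origin", github_url],
        ["git", "push", "origin", "master", "codeplex"],  # publish both branches
    ]


setup_commands = one_time_setup(SVN_URL, GITHUB_URL)
for command in setup_commands:
    print(" ".join(command))
```

To actually run the steps, each printed line can be typed on the command line from inside the folder that should hold the git repository.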

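Now we change our “subversion” file:

[Screenshot: changing the file and committing it to svn]

And then we update it using git svn rebase on the codeplex branch. Sorry for not having accepted the certificate before. But it’s live! 🙂

[Screenshot: git svn rebase on the codeplex branch]

Result:

[Screenshot: commits on codeplex after the update]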

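Now we want to have that change over on our work branch.

[Screenshot: merging codeplex into master]

Voilà:

[Screenshot: commits on master after the merge]

Now, this is our repo:

[Screenshot: the resulting repository]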
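The repeated part of the workflow, written out as a command sequence. Again this is only a sketch under assumptions: the remote is assumed to be called origin, and the final push of master is taken from the "Commit and push your work" step rather than from a screenshot. Nothing is executed; the commands are printed.

```python
# Dry-run sketch of the recurring update cycle: the commands are only printed.
def update_cycle(remote="origin"):
    """Return the git commands that bring svn changes into master, in order."""
    return [
        ["git", "checkout", "codeplex"],      # switch to the svn tracking branch
        ["git", "svn", "rebase"],             # fetch new svn revisions into codeplex
        ["git", "push", remote, "codeplex"],  # publish the updated codeplex branch
        ["git", "checkout", "master"],        # back to the work branch
        ["git", "merge", "codeplex"],         # bring the svn changes into our work
        ["git", "push", remote, "master"],    # publish the merged result
    ]


cycle_commands = update_cycle()
for command in cycle_commands:
    print(" ".join(command))
```

Keeping codeplex free of own commits is what lets git svn rebase simply move that branch forward; all personal work stays on master.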

Enough for today. Let’s see what we covered:

Recap

  • Once
    • We initialized the empty codeplex repo with a single-line file.
    • Clone an svn repo to a local git repo
      We used git svn init and fetch to get the codeplex contents into a local git repository.
    • Create a branch for svn updates, say codeplex
      We created the branch.
    • Create a GitHub repo and push both branches
      We pushed both master and codeplex
  • Repeatedly
    • Do your work on master
      We added a file to our master.
    • Commit and push your work
      We pushed it to master. Both the source svn and codeplex branch remain untouched.
    • Update codeplex from subversion and push it
      We added a line, checked it into svn, and then got it into our codeplex branch using git svn rebase.
    • Merge codeplex to master
      We did that too.

More soon! Probably next Friday:

  • Contribute and Commit
    • How to create and submit patches from our master
    • How to commit changes back to codeplex

Lillesand #1: Introducing the Work Breakdown Structure for Task Estimation

In a dream world a developer would code all day, right? Well, a very common task besides coding is estimating tasks of all kinds. Estimating, though, sucks. For poorly trained product managers and for many of our customers, an estimate equals a personal commitment. Therefore you’d better get it right the first time!

In many projects at itemis we use something we call WBS, which stands for work breakdown structure. The term itself is not new; it is PMI vocabulary. But the combination with three-point estimates and some statistics makes it a very powerful tool.

Work Breakdown Structure

WBS is nothing more than breaking your tasks down into small units that can easily be estimated. It’s just an estimation technique and can be applied in any traditional or agile process.

Three-point Estimates

Three-point estimates help you to incorporate the risk into your estimate. Let’s say I discovered a simple task by breaking down the big project of writing a particular blog entry about WBS.

Example Task: Graphically illustrate the WBS domain model

Now think about how you would potentially solve this. The simplest solution would be to create a diagram of the classes I already have in Visual Studio. That would be almost no work. But those look too cheap and they contain the wrong details. So I’ll do it manually, probably using OmniGraffle, a diagramming tool for Mac only.

Now, ask yourself three questions. How many hours would it take to finish this task, if…

  • … I’m highly motivated, in the middle of a diagramming rush, and no tools crash, no one calls me, nobody asks for help on some other task, …? You see where this goes. Since there is a great stencil available that I’ve used earlier, I’d just drag the classes onto the canvas and the auto-layout will probably be good enough. Let’s say, in the best case I’ll need about 1 hour to illustrate the domain model.
  • … I’ve done this before, and I know, visualizing is always tricky. Based on my experience I’d expect a little bit of trouble for the layout. It will probably also be necessary to adjust the stencil a little bit. Also I’d ask some of my colleagues for their opinion and then do some adjustments. In the average case I should be done in 4 hours.
  • … I’ve had a busy week, it’s Monday morning, 9 am, and I am absolutely not in the mood to work at all, let alone do anything creative. OmniGraffle crashes all the time, and the stencil turns out not to do the job. The telephone rings all the time, and my roommate happens to have a loud discussion with his PM because he just spent 3 days on a task he estimated at 2 hours. Also I have to start from scratch once because I forgot to save, and another time because I just messed up the layout. This is not really likely to happen all at once, but in the worst case I’d be on the task for 12 hours. I couldn’t imagine how it could take longer.

Now I’ve got three numbers: 1 hour in the best case, 4 on average, and 12 hours in the worst case.

Statistics

Now, you can tell neither your customer nor your PM that you need between one and 12 hours to finish the task. For one single task this might even be correct, but when you have a set of many tasks it’s not very likely that the worst case applies to every task.

I’m not really into the details of the statistics, but I’ll try to explain them from what I’ve understood.

There are two interesting values that can be calculated per task. Let us name them pert and variance:

  • The pert is an educated guess. It considers the average case (A) four times more reliable than the best (B) and worst (W) case.
    The pert formula: (B + 4A + W) / 6
  • The variance is a value that indicates how far apart the best and worst case are. There is a magic number involved (1.4), assuming that in 50% of the cases the actual hours will be less than or equal to the pert. This is based on a suggestion from the book Software Estimation by Steve McConnell, page 121 ff.
    The variance formula: ((W – B) / 1.4)^2
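As a minimal sketch, the two formulas can be written as small Python functions. The function and parameter names are my own choice for illustration; note that the difference between worst and best case is divided by 1.4 before the result is squared.

```python
def pert(best, average, worst):
    """Weighted estimate: the average case counts four times."""
    return (best + 4 * average + worst) / 6


def variance(best, worst, divisor=1.4):
    """Squared spread between worst and best case, scaled by the divisor."""
    return ((worst - best) / divisor) ** 2


# The example task: best 1 hour, average 4 hours, worst 12 hours
task_pert = pert(1, 4, 12)
task_variance = variance(1, 12)
print(f"pert = {task_pert:.2f} hours, variance = {task_variance:.2f}")
# prints: pert = 4.83 hours, variance = 61.73
```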

Applying these formulas to our task results in the following table. I also added another simple task.

Task                                               Best   Average   Worst   Pert   Variance
Graphically illustrate the WBS domain model        1      4         12      4.83   61.73
Embed the domain model illustration in the post*   0.08   0.25      0.50    0.26   0.09
Sums                                               1.08   4.25      12.50   5.10   61.82
Standard Deviation**                                                               7.86

* A small task with a small variance has little impact. As you would expect. But if the variance grows, even small tasks have great impact. As in reality.

** The standard deviation is calculated by taking the square root of the sum of all variances (that is, raising the sum to the power of 0.5).

The first interesting result is that, with a certainty of 50%, I’d manage to complete the two tasks within 5.1 hours.
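The sums and the standard deviation from the table can be reproduced with a few lines of Python. This is just a sketch of the arithmetic; the variable names are made up for illustration.

```python
from math import sqrt

# (best, average, worst) in hours for the two example tasks
tasks = [
    (1, 4, 12),          # Graphically illustrate the WBS domain model
    (0.08, 0.25, 0.50),  # Embed the domain model illustration in the post
]

# Sum the per-task pert values and the per-task variances
total_pert = sum((b + 4 * a + w) / 6 for b, a, w in tasks)
total_variance = sum(((w - b) / 1.4) ** 2 for b, a, w in tasks)

# The standard deviation is the square root of the summed variances
std_dev = sqrt(total_variance)

print(f"pert = {total_pert:.2f}, variance = {total_variance:.2f}, std dev = {std_dev:.2f}")
# prints: pert = 5.10, variance = 61.82, std dev = 7.86
```

Variances are summed before the square root is taken; adding up per-task standard deviations instead would overstate the combined risk.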

Based on the standard deviation combined with the normal distribution (Gaussian curve), we can create a table indicating with which certainty (in %) we are done within X hours.

Since the standard deviation in our case is higher than the expected value, some of the numbers won’t make sense.

Each percentage has a factor by which the standard deviation is multiplied to get the deviation from the pert value.

Formula: Pert + (Standard Deviation * Factor)

certainty   factor   done within (hours)
10%         -1.28    (negative number)
16%         -1       (negative number)
20%         -0.84    (negative number)
25%         -0.67    (negative number)
30%         -0.52    1.01
40%         -0.25    3.13
50%         0        5.10
60%         0.25     7.06
70%         0.52     9.19
75%         0.67     10.37
80%         0.84     11.70
84%         1        12.96
90%         1.28     15.16
98%         2        20.82
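The table can be recomputed as follows. This is a sketch under two assumptions: the factors are hard-coded as listed above, with a minus sign below 50%, and the totals are calculated from the unrounded task values, so individual rows may differ from the table in the last digit. The factors are approximately the quantiles of the standard normal distribution, which the last lines illustrate for 90%.

```python
from math import sqrt
from statistics import NormalDist

# (best, average, worst) in hours for the two example tasks
tasks = [(1, 4, 12), (0.08, 0.25, 0.50)]

total_pert = sum((b + 4 * a + w) / 6 for b, a, w in tasks)
std_dev = sqrt(sum(((w - b) / 1.4) ** 2 for b, a, w in tasks))

# Factors as listed in the table, keyed by certainty in percent
factors = {
    10: -1.28, 16: -1.0, 20: -0.84, 25: -0.67, 30: -0.52, 40: -0.25,
    50: 0.0,
    60: 0.25, 70: 0.52, 75: 0.67, 80: 0.84, 84: 1.0, 90: 1.28, 98: 2.0,
}


def done_within(certainty):
    """Hours within which the tasks are done at the given certainty (percent)."""
    return total_pert + std_dev * factors[certainty]


for certainty in sorted(factors):
    hours = done_within(certainty)
    label = f"{hours:.2f}" if hours > 0 else "(negative number)"
    print(f"{certainty}% {factors[certainty]:+.2f} {label}")

# Where a factor comes from: the 90% quantile of the standard normal distribution
z_90 = NormalDist().inv_cdf(0.90)
print(f"normal quantile for 90%: {z_90:.2f}")
```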

Result

Now I can assure that the two tasks will be done within 21 hours with a certainty of 98%.

This number is just too high, isn’t it? But did it never happen to you that a task took 4 times longer than expected? Not so uncommon, I’d say.

Maybe I wouldn’t put this number on the customer offer, but that is not what it is about either.

What it is about is knowing your risk and making educated decisions.

But that is not yet enough.

Team estimates

All those calculations still rely on one person doing expert estimates. So what about having multiple people estimate the tasks and then combining the results?

The concept is simple.

  1. Most likely two people do the breakdown together. Either one does a first draft and the other reviews it or, which I prefer, they pair on this step.
  2. The two or more do the estimations separately.

Why separately? Well, it is not about blaming people but about gathering quality data. A high variance between two people’s estimates for a certain task indicates a high level of uncertainty about what this task means in practice. The earlier you identify those spots, the better.

Just get the two to talk and eliminate the uncertainty. They can then either agree on an estimate or, at the very least, redo their estimates with a lower variance. Both results raise the quality of your data.

There are different ways of combining multiple estimates. The one that makes the most sense is taking the lowest best case, the average of the average cases and the highest worst case. This gives you a quite reliable result.
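This combination rule is easy to express in code. The following is a small sketch; the function name is made up, and the second person’s numbers are invented purely for illustration.

```python
from statistics import mean


def combine_estimates(estimates):
    """Combine (best, average, worst) tuples from several estimators.

    Takes the lowest best case, the mean of the average cases
    and the highest worst case.
    """
    bests, averages, worsts = zip(*estimates)
    return min(bests), mean(averages), max(worsts)


# First tuple: the estimate from this post; second tuple: a hypothetical colleague
team = [(1, 4, 12), (2, 5, 10)]
combined = combine_estimates(team)
print(combined)  # prints: (1, 4.5, 12)
```

The combined tuple can then go through the same pert and variance formulas as a single person’s estimate.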

Checklists

To improve the quality of the worst-case estimates in particular, the estimator is guided by a checklist of typical things to consider when estimating software projects.

The ones we currently use at itemis include around 50 entries in four categories: Preconditions, Things typically forgotten, Non-functional requirements and Other crosscutting concerns.

The Experiment codenamed “Lillesand”

“Lillesand” is a beautiful city in the south of Norway. I just spent my Christmas holidays there. But for this project it’s just a codename.

Currently we use an Excel sheet that helps us create the WBS. This works fine, but it’s limited. It lacks support for gathering multiple WBS in a project, and the calculations for team estimates are still done manually.

Since this is much about data and functions, I thought I’d try to build this using the technologies leveraged by SQL Server Modeling, formerly called Microsoft Codename “Oslo”.

I already posted on the migration to the latest CTP. I’ll publish the sources soon.

I don’t really know where this will go, but I think it will be interesting to follow.