Archive for the ‘Computers’ Category

Intro to Scala for Java Developers – slides

Monday, August 17th, 2009

Thought I’d post the slides of a talk I gave at work on Scala. We’re primarily a Java shop, and every week we do either a code review or a tech-related presentation.

Our domain at work is analyzing residential energy data, so the examples herein are tailored to that:

  • Read or Meter Read – Some amount of energy used over a period, e.g. “100 kWh in the month of June”
  • Service Point – meta-data about an electric meter (the “point” at which “service” is available).

I also omitted a code demo where I refactored part of our codebase into Scala to show the difference (trust me, it was awesome!).

Simple Metrics for Team and Process Improvement

Monday, June 29th, 2009

Recently, the development team where I work has started collecting bona-fide metrics, based on our ticketing system. So few development shops (especially small ones) collect real information on how they work that it’s exciting that we’re doing it.

Here’s what we’re doing:

  • Number of releases during QA (we do a daily release, so more than daily is an indicator)
  • Defects found, by severity and priority
  • Average time from accepting a ticket (starting work) to resolving it (sending it for testing)
  • Number of re-opens (i.e. a defect was sent to testing, but not fixed)
  • Average time from resolving to closing (i.e. testing the fix)
  • Defects due to coding errors vs. unclear requirements (this is really great to be able to collect; with our company so new and small, we can introduce this and use it without ruffling a lot of feathers)

The tricky thing about metrics is that they are not terribly meaningful by themselves; rather they indicate areas for focussed investigation. For example, if it takes an average of 1 day to resolve a ticket, but 3 days to test and close it, we don’t just conclude that testing is inefficient; we have to investigate why. Perhaps we don’t have enough testers. Perhaps our testing environment isn’t stable enough. Perhaps there are too many show-stoppers that put the testers on the bench while developers are fixing them.

Another way to interpret these values is to watch them over time. If the number of critical defects is decreasing, it stands to reason we’re doing a good job. If the number of re-opens is increasing, we are packing too much into one iteration and possibly not doing sufficient requirements analysis. We just started collecting these on the most recent iteration, so in the coming months, it will be pretty cool to see what happens.

These metrics are pretty basic, but it’s great to be collecting them. The one thing that makes hard-core analysis of these numbers difficult (esp. over time as the team grows and new projects are created) is the lack of normalization. If we introduced twice as many critical bugs this iteration as last, are we necessarily “doing worse”? What if the requirements were more complex, or the code required was just…bigger?

Normalizing factors like cyclomatic complexity, lines of code, etc, can shed some more light on these questions. These normalizing factors aren’t always popular, but interpreted the right way, could be very informative. We’re the same team, using the same language, working on the same product. If iteration 14 adds 400 lines of code, with 3 critical bugs, but iteration 15 adds 800 lines of code with 4 critical bugs, I think we can draw some real conclusions (i.e. we’re getting better).
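To make that concrete: 3 critical bugs in 400 new lines is 7.5 defects per thousand lines, while 4 in 800 is only 5 per thousand. The raw count went up, but the normalized rate went down.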

Another interesting bit of data would be to incorporate our weekly code review. We typically review fresh-but-not-too-fresh code, mostly for knowledge sharing and general “architectural consistency”. If we were to actively review code in development, before it is sent to testing, we could then have real data on the effectiveness of our code reviews. Are we finding lots of coding errors at testing time? Maybe more code reviews would help. Are we finding fewer critical bugs in iteration 25 than in iterations 24 and 23, where we weren’t doing reviews? If so, the reviews helped a lot.

These are actually really simple things to do (especially with a small, cohesive team), and can shed real light on the development process. What else can be done?

Stand While You Work!

Saturday, June 20th, 2009

After experiencing some back troubles recently, I was encouraged to work standing up. The pain relief was immediate, and for the past several months, it’s been great. I work most of the time standing, sitting for a few minutes if I get a bit tired. Not only is this great for my back, but it ensures I don’t work insane hours…I simply can’t stand for more than 8 hours a day. When I first brought up the subject of standing with my company’s office manager, she was open to whatever I wanted to do; I figured since it’s my issue to solve (and since I wasn’t yet sold on the idea), I’d make do with something and bring it in.

While Joel Spolsky outfits his offices with super fancy motorized desks that can go from standing to sitting with the flick of a switch, those desks were way out of my price range. Further, fixed-height desks were also quite expensive (much like the word “wedding”, attaching the word “ergonomic” to something seems to double its price). Enter the Ikea Utby! The perfect size and perfect height, it looks great and was under $200!

Some might think it’s a bit small, but I find the more space I have, the bigger mess I make. The Utby is, for me, the perfect amount of space. Though, it’s so cheap, you could get two of them and make an awesome corner desk. I work from home on occasion and also work on side projects after work. Until recently I enjoyed the venerable (and, sadly, discontinued) Ikea Jerker. Last week, however, I was home recovering from back surgery, and was forbidden by the doctor from sitting down. I had to rig up my own makeshift standing desk out of a keyboard stand and an ironing board. Pretty ghetto.

So, the Jerker is now in pieces and has been replaced by a second Utby at home. The solution to sitting, both at home and at work, is simple: a bar chair. I’ve got some plush comfy ones at home and bought a (reasonably) cheap Henriksdal for work. So, for less than $300, I have a nice looking desk at which I can stand or sit, and should have continued good back health. Even if you don’t have back problems, I highly recommend standing; it keeps me alert and focused and feels great. You just have to make sure you have comfortable shoes.

Lead or Bleed

Monday, May 25th, 2009

After reading all of The Passionate Programmer over a week or so, I’m going back through and looking at some of the “Act On It!” sections, where Chad Fowler recommends specific actions to kickstart/sustain/boost your career. The very first one, titled “Lead or Bleed?” suggests making a map of technologies, with “on the way out” on the left side and “bleeding edge” on the right side, then highlighting how well you know each thing. Here’s my stab at it:

Technologies: Lead or Bleed

Green are things I know really well; yellow are things I could do at a job but am by no means an expert in.

Obviously this is shaped by my own reality and what I perceive on the ‘net, and I omitted things like “C”, “UNIX” and “Windows”, because those are not really “on the way out” in the same way that C++ is (or that COBOL was, etc.).

Learning Cucumber – With Dynamic types must come documentation

Thursday, May 21st, 2009

Finally pulled the trigger on Cucumber, which allows one to write human-readable “features” that are essentially acceptance test cases. You can then execute them by adding some glue code in a mish-mash of Ruby DSLs and verify functionality.

It took me quite a while to decide to get going on this because the examples and documentation that are available are extremely hypothetical and very hand-wavy. A lot of the information glosses over the fact that you are still writing code and you still need to know what the keywords are and what data you will be given. Arcane non-human-readable symbols are almost preferable when getting started, because you don’t get distracted by English. This is why Applescript drives me insane.

At any rate, I found this page, which was pretty helpful. It shows testing a rails app using, among other things, webrat (another tool severely lacking in documentation but that is, nonetheless, pretty awesome).

I’m writing a basic wiki (for science) and so I thought a good feature would be “home page shows all pages in sorted order”, so I wrote it up:

Feature: pages are sorted
 As anyone
 The pages should be listed on the home page in sorted order

 Scenario: Visit Homepage
   As anyone
   When I visit the homepage
   Then The list of pages should be sorted

Now, the webrat/cucumber integration provided by rails means that the “plain English” has to actually conform to a subset of phrasing and terminology or you have to write the steps yourself (the steps are everything under “Scenario”). It’s not hard to do that, and it’s not hard to modify the default webrat steps, but it was a distraction initially.

Next up, you implement the steps and here is where the crazy intersection of Ruby DSLs really made things difficult. The first two steps were pretty easy (“anyone” didn’t require any setup, and webrat successfully handled “When I visit the homepage”):

Then /The list of pages should be sorted/ do
  response.should # I have no clue wtf to do here
end

A puts response.class and a puts response.methods gave me no useful information. I eventually deduced that since Cucumber is a successor/add-on/something to RSpec, perhaps should comes from RSpec. It takes a Matcher, and webrat provides many. Specifically, have_selector is available, which allows selecting HTML elements based on the DOM.

Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages")
end

Success! (sort of). My feature execution is all green, meaning the home page contains <ul class="pages">. have_selector also takes a block (totally undocumented as to what it is or does in the webrat documentation):

Then /The list of pages should be sorted/ do
  response.should have_selector("ul.pages") do |pages|
    # WTF is pages and what can I do with it?
  end
end

A puts pages.class later and I realize this is a Nokogiri NodeSet. Now, I’m in business, though it would’ve been nice to be told WTF some of this stuff was and what I can/should do with it. At this point it was pretty trivial to select the page names from my HTML and check if they are sorted:

response.should have_selector('ul.pages') do |pages|
  page_names = []
  pages.should have_selector('li') do |li|
    li.should have_selector('.pagename') do |name|
      name.each do |one_name|
        page_names << one_name.content
      end
    end
  end
  assert_equal page_names, page_names.sort
  true
end

(For some reason assert_equal doesn't evaluate to true when it succeeds, so without that trailing true the block evaluates to false, and then the Cucumber/RSpec/Webrat/Ruby gods claim my page is missing the ul tag.) My initial implementation walked the DOM using Nokogiri's API directly, because I didn't realize that should had been mixed in (on?) to the objects I was being given. I'm still not sure if using that is the intention, but it seemed a bit cleaner to me.

So, this took me a couple of hours, mostly because of a combination of dynamic typing and lack of documentation. I'm all for dynamic typing, and I totally realize that these are free tools and all that. I think if the Ruby community (and the dynamic typing community in general) wants to succeed and make a case that dynamic typing, DSLs, meta-programming and all this (admittedly awesome and powerful) stuff enhance productivity, there has to be documentation as to the types of user-facing objects.

Now, given GitHub's general awesomeness, I'm totally willing to fork a repo, beef up the rdoc and request a pull, however I'm not even sure whose RDoc I could update to make this clear. Just figuring out that the have_selector in response.should have_selector is part of webrat was nontrivial (I had to just guess that should was part of RSpec and that the Webrat::Matchers module was mixed in). This is a problem and it's not clear to me how to solve it.

That being said, I was then able to create three more features using this system in about 10 minutes, so overall, I'm really happy with how things are working. Certainly if this were Java, I'd still be feeding XML to maven or futzing with missing semicolons. So, it's a net win for me.

Why maven drives me absolutely batty

Wednesday, May 13th, 2009

Although my maven bitching has been mostly snarky, I have come to truly believe it is the wrong tool for a growing enterprise and, like centralized version control, will lead to a situation where tools dictate process (and design).

But, what is maven actually good at?

  • Maven is great for getting started — you don’t have to author an ant file (or copy one from an existing project)
  • Maven is great for enforcing a standard project structure — if you always use maven, your projects always look the same

This is about where it ends for me; everything else maven does – managing dependencies, automating processes, etc. – is done much better and much more quickly by other technology. It’s pretty amazing that someone can make a tool worse than ant, but maven is surely it.

Dependency management is not a build step

Maven is the equivalent of doing a sudo gem update every time you call rake, or doing a sudo yum update before running make. That’s just insane. While automated dependency management is a key feature of a sophisticated development process, this is a separate process from developing my application.

Maven’s configuration is incredibly verbose

It requires 36 lines of human-readable XML to have my webapp run during integration tests. Thirty-six! It requires six lines just to state a dependency. Examining a maven file and trying to figure out where you are in its insane hierarchy is quite difficult. It’s been pretty well-established outside the Java community that XML is a horrible configuration file format; formats like YAML have a higher signal-to-noise ratio, and using (gasp) actual scripting language code can be even more compact (and readable and maintainable).
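For the record, a single dependency stanza looks something like this (the library and version here are just an example):

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.4</version>
  <scope>test</scope>
</dependency>

Six lines of angle brackets to say, essentially, “junit 4.4, for tests only.”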

The jars you use are at the mercy of Maven

If you want to use a third-party library, and maven doesn’t provide it (or doesn’t provide the version you need), you have to set up your own maven repo. You then have to include that repo in your pom file, or in every single developer’s local maven settings. If you secure your repo? More XML configuration (and, until the most recent version, you had to have your password in cleartext…in a version 2 application). The fallout here is that you will tend to stick with the versions available publicly, and we see how well that worked out for Debian.

Modifying default behavior is very difficult

Since maven is essentially a very, very high-level abstraction, you are at the mercy of the plugin developers as to what you can do. For example, it is not possible to run your integration tests through Cobertura. The plugin developers didn’t provide this and there’s no way to do it without some major hacking of your test code organization and pom file. This is bending your process to fit a tool’s shortcoming. This is a limitation designed into maven. This is fundamentally different from “opinionated software” like Rails; Rails doesn’t punish you so harshly for wanting to tweak things; maven makes it very difficult (or impossible). There was no thought given in Maven’s design to using non-default behavior.

Extending Maven requires learning a plugin API

While you can throw random Ant code into maven, the only way to create re-usable functionality is to learn a complex plugin API. Granted, this isn’t complex like J2EE is complex, but for scripting a build, it’s rather ludicrous.

Maven is hard to understand

I would be willing to bet that every one of my gripes is addressed through some crazy incantation. But that’s not good enough. The combined experience of the 7 developers at my company is about 70 years, and not one of us can explain maven’s phases, identify the available targets, or successfully add new functionality to a pom without at least an hour on the net and in maven’s documentation.

A great example is the release plugin. All five developers here that have used it go through the same cycle of having no idea what it’s doing, having it fail with a baffling error message, starting over and finally figuring out the one environment tweak that makes it work. At the end of this journey each one (myself included) has realized all this is a HUGE wrapper around scp and a few svn commands. Running two commands to do a source code tag and artifact copy shouldn’t be this difficult.
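For the curious, the moral equivalent of what the plugin does boils down to roughly this (repository URL, version, and paths are illustrative):

svn copy https://svn.example.com/myproject/trunk https://svn.example.com/myproject/tags/1.2.0 -m "Tagging 1.2.0"
scp target/myproject-1.2.0.jar releases.example.com:/var/releases/

Two commands.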

Maven’s command line output is a straight-up lie

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Compilation failure

“Compilation failure”, by its own definition, is a failure and therefore an error (not an informational message). Further, most build failures do not exit with a nonzero status. This makes maven completely unscriptable.

Maven doesn’t solve the problems of make

Ant’s whole reason for being is “tabs are evil”, and that tells you something. While maven’s description of itself is a complete fabrication, it at least has its heart in the right place. However, it STILL fails to solve make’s shortcomings with respect to Java:

  • Maven doesn’t recompile the java classes that are truly out-of-date
  • Maven recompiles java classes that are not out-of-date
  • Maven doesn’t allow for sophisticated behavior through scripting
  • Maven replaces arcane magic symbols with arcane magic nested XML (e.g. pom files aren’t more readable than a Makefile)

Maven is slow

My test/debug cycle is around a minute. It should be 5 seconds (and it shouldn’t require an IDE).

Conclusion

Apache’s Ivy + Ant is probably a better environment than maven for getting things done; a bit of up-front work is required, but it’s not an ongoing cost, and maintenance is much simpler and more straightforward. Tools like Buildr and Raven seem promising, but it might be like discussing the best braking system for a horse-drawn carriage; utterly futile and irrelevant.

Git Workflow with SVN

Tuesday, April 28th, 2009

The best way to get started with Git and have a better experience at work if you have to use SVN is to use git svn as a client to Subversion. You can take advantage of Git’s awesomeness while not requiring your team or infrastructure to change immediately.

Setup

git svn clone -t tags -T trunk -b branches svn+ssh://your.svn.com/path/to/svn/root
(This may take a while for a large or old svn repo)

Working on Trunk

The initial clone should leave you on git’s master branch, which is connected to svn’s trunk.

  1. git svn rebase # Optional: only if you want to get work from svn; you don’t have to
  2. Hack some code
  3. git add any new files you created
  4. git commit -a
  5. Repeat from step 2 until done

Sharing Your Changes

You will rebase your changes against SVN’s (this means git will pretend you made all your changes from SVN’s current HEAD, not the HEAD you started with [you do this to avoid conflicts and merges, which SVN cannot handle]).

  1. git svn rebase
  2. git svn dcommit

If you got Conflicts

  1. Git will tell you about them, so go and resolve them
  2. For each file you had to resolve, git add the_filename
  3. git rebase --continue
  4. Repeat until done

Working with SVN’s branches

Suppose you need to do some work on a branch called 1.3.x in your SVN repo:

  1. git svn fetch # This updates your local copy of remote branches
  2. git checkout 1.3.x # This checks out a remote branch, which you shouldn’t work directly on
  3. git checkout -b 1.3.x-branch # This creates a local branch you can work on, based on the remote 1.3.x branch
  4. Hack some code
  5. git add and git commit -a as needed
  6. Follow same procedure as above for Sharing Your Changes. Git will send your changes to the 1.3.x branch in SVN and not the trunk

Merging the Changes You Made

Due to the way git interacts with SVN, you shouldn’t automatically just merge your branch work onto the trunk. This may create strange histories in SVN.

So What?

So, this isn’t buying you much more than you get with SVN. Yes, when you git checkout 1.3.x-branch it’s lightning fast, and you can work offline. Here are a few things that happen to me all the time that would be difficult or impossible to do without Git.

Gotta Fix a Bug Real Quicklike

You are in the middle of working on a new feature and you need to push out a bugfix in production code. Your in-development code can’t be checked into trunk:

  1. git stash
  2. git checkout production-branch-name
  3. git checkout -b bugfix-xyz
  4. Fix bugs
  5. git commit -a
  6. git svn dcommit
  7. git checkout master
  8. git stash apply

You are now back where you started, without a fake revision just to hold your code, and you didn’t have to go check out the branch elsewhere.

Can’t commit to SVN due to a release

Often, teams restrict commit access to SVN while a release is being prepared. If the team is releasing version 1.5 and I’m working on 1.6 features, there can be some period of time where I’m not supposed to commit, because the 1.5 release is being prepared and under feature freeze.

  1. git commit -a
  2. Continue working

When feature freeze is over, I’ll git svn dcommit to send my changes to the SVN server.

Blocked on Feature X, Want to work on Feature Y

This happens to me quite frequently: I’m slated to work on a few features that aren’t interdependent. I start hacking away on Feature X and hit a roadblock and can’t continue working. I’ve got a half-implemented feature and I can’t make any forward motion until a meeting next week. Feature Y, on the other hand, is ready to go. This requires some planning ahead:

  1. git checkout master
  2. git checkout -b feature-X
  3. Work on Feature X
  4. git commit -a etc. as I work
  5. Get blocked; meeting next week. D’oh!
  6. git checkout master
  7. git checkout -b feature-Y
  8. Work on Feature Y

At this point, X and Y are on two local branches and I can switch back and forth as needed. Don’t underestimate how powerful this is, especially when you have certain features that are priorities, but can become blocked frequently. I can now easily put aside Feature Y once I have my meeting and start back up on Feature X. When I’m done, I git merge everything back to master and dcommit to SVN.

Type your log message, save it, realize you forgot to reference a bug ticket #

You have a bug tracker set up that links tickets and revisions; all you have to do is put the ticket # in your log message. It’s a nice feature, but I forget to do it frequently. As long as you haven’t done git svn dcommit, you can fix this:

  1. git commit --amend

Your editor will pop up and you can change the log message! Awesome.

Advanced Stuff

Once you get used to this, you will feel more comfortable doing some more advanced things.

Topic Branches

The most obviously beneficial practice was touched on above, but it boils down to: make every new feature on its own branch. This means you never work on master and you never work on an SVN branch. Those are only for assembling what you will send to SVN. This gives incredible flexibility to work on code when it’s convenient and not worry about checking in bad things. Git calls these topic branches.

Save your Experiments

If you do everything on a branch, you don’t have to delete your work, ever. You can go back and revisit experiments, or work on low-priority features over a long period of time with all the advantages of version control, but without the baggage of remote branches you have to share with the world.

Cherry Pick

With Git, you typically commit frequently and you restrict the scope of each revision. A commit in git is more like a checkpoint, and a push in Git is more like a commit in SVN. So, commit in git like crazy. What this lets you do is move diffs around. On several occasions, I’ve had some code on a branch that I needed to use, but didn’t want to pull in the entire branch. git cherry-pick lets me do that.
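For example (the SHA is whatever git log shows for the commit you want):

  1. git checkout master
  2. git cherry-pick 1a2b3c4 # applies just that one commit to the current branch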

Mindlessly Backup Your Repo

  1. ssh your_user@some.other.box.com
  2. mkdir awesome_project
  3. cd awesome_project
  4. git init
  5. exit
  6. git remote add other-box your_user@some.other.box.com:/home/chrisb/awesome_project
  7. git push --all other-box
  8. echo "git push --force --all other-box" > .git/hooks/post-commit && chmod +x .git/hooks/post-commit

You now will back up your repository on every commit to the other box. Or, use GitHub!

REST Compliance Officer

Tuesday, March 17th, 2009

With regard to this blog post on REST compliance:

Me: The Gliffy API is RESTFul
REST Compliance Officer: Does a “PUT” update the data at the given URL?
Me: Yes.
RCO: Trick Question! It’s “URI”. Is the only way to create a new resource done with a “POST”?
Me: Yes.
RCO: Is there exactly one endpoint, from which any and all resource locators are discoverable?
Me: Um, no, that puts undue burden on the client libraries, and over-complicates what we were trying to accomp….
RCO: YOU ARE NOT RESTFUL! READ FIELDING’S DISSERTATION, THE HTTP SPEC AND IMPLEMENT AN RFC-COMPLIANT URI PARSER IN THREE DIFFERENT LANGUAGES. NEXT!

Thank GODS that REST doesn’t have a spec. If it did, it would still be in development.


P.S. If you are going to coin a term and you want to bitch about it being misused, maybe calling it a “style” isn’t the best idea.

Java Annotations – Java’s love of configuration over convention

Wednesday, March 11th, 2009

In the beginning, EJB was a bloated mess of XML configuration files that allowed some sort of ultimate flexibility that absolutely no one needed nor cared about. And it sucked. So developers started using conventions to keep track of the four classes required to make a remote method call, and XDoclet was created to automate the creation of the XML configuration files. And it sucked less. Following in EJB’s footsteps, Hibernate did the same thing. And XDoclet followed. And it still sucked.

So, annotations were created to essentially formalize what XDoclet was doing, instead of considering how horribly broken the implementation of J2EE or Hibernate was. And now that we have annotations, the “implementation pattern” of “ultimate flexibility through annotations” has made its way into numerous Java frameworks, such as JAX-RS and JPA.

Regarding JPA:

@Id
@GeneratedValue
@Column(name="person_id")
public int getPersonId() { return personId; }

This is not a significant improvement over XDoclet; the only benefit is that if you mistype “GeneratedValue”, the compiler will catch it. I shouldn’t have to type “GeneratedValue” in the first place. Unless I’m doing something non-standard. Which I almost never do.

I have a Person class with a getPersonId method. Can’t JPA just assume that it maps to the PERSON table and the PERSON_ID column, respectively? Further, couldn’t it figure out that it’s the auto-generated primary key, since the schema says primary key auto increment? All the information is there and available to the framework to figure this out.
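This is roughly all I’d like to have to write (hypothetical, of course; JPA doesn’t actually work this way):

public class Person {
    private int personId;

    // Hypothetically inferred by convention: table PERSON, column PERSON_ID,
    // auto-generated primary key (because the schema already says so)
    public int getPersonId() { return personId; }
    public void setPersonId(int personId) { this.personId = personId; }
}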

The same goes for EJB. I have a class named FooStatelessBean. How about we assume it’s a stateless session bean, and its interface is defined by its public methods? It can then provide FooRemote and FooLocal for me, and I don’t need to configure anything or keep three classes in sync.

Just because Java doesn’t have all the Ruby dynamic magic doesn’t mean we can’t make things easy. In reading Surya Suravarapu’s blog post about CRUD via JAX-RS, I can’t help wondering why it takes so much code to call a few methods on a class.

Did the designers of JAX-RS not even look at how Rails does things? I get a PUT to the URL /customers/45. We should default to calling put(45) on the class CustomersResource. Only if I want to obfuscate what’s going on (e.g. by having FooBar.frobnosticate() handle the request) should I be required to provide configuration.
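In code terms, this is what I’m after, with zero annotations (again hypothetical; this is not how JAX-RS works):

public class CustomersResource {
    // By convention, PUT /customers/45 would invoke put(45);
    // no @PUT, no @Path, no XML required
    public void put(int id) {
        // look up customer `id` and apply the update
    }

    // likewise, GET /customers/45 would invoke get(45), and POST /customers would invoke add(...)
}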

Even in Surya’s example code, he’s following some conventions: his resource class is suffixed with Resource and his POST method is prefixed with add. This should be baked into the spec. It’s like EJB all over again, with common conventions that aren’t supported by the framework because of too much useless flexibility.

Supporting convention over configuration is easy in Java. In just a few hours, I had a tiny web framework that proves it [1]. It wouldn’t take much more effort to allow the default behavior to be overridden, but, unlike JAX-RS, EJB, or even the Servlet spec itself, it doesn’t punish developers who follow conventions. It makes their lives easier and thus encourages good behavior.

So, the point of all this is that annotations encourage bad framework design; unnecessary configuration is a major part of many Java frameworks and specs. And I have no idea why.


[1] It unfortunately breaks down at the UI layer, due to a statically typed and compiled language not being a great choice for implementing web UIs, but that’s another issue.

Git, GitHub, forking: the new hotness

Thursday, February 5th, 2009

While working on my Gliffy Ruby Client, I decided I wanted a better way to describe the command line interface. Finding nothing that was any good, I whipped up GLI and refactored my Gliffy command line client to use it. While doing that, I finally got annoyed at technoweenie’s version of rest-client, and also noticed that the original author’s version had totally changed interfaces. So, I clicked the nice “Fork” button on GitHub to get my own copy and fixed the issues. But that’s not the cool part. The cool part is that I can change my Gliffy gem to depend on my rest-client implementation and, voilà! No special instructions, no hacks, no nothing. This is a really cool thing that would be difficult with Subversion, impossible without RubyGems, and downright painful without GitHub.