Integrating Heroku addon SSO in Rails using Devise

I was going through the process of integrating Heroku SSO with the addon I am developing, and although it was pretty straightforward, I was surprised that I couldn’t find an existing gem that would add this functionality to Devise.

Once I had it all working in my Rails app via a custom Warden strategy, I decided to extract it into a Rails engine and commit it to GitHub so that others can use it.

It’s still pretty rough and there are a number of things I’m not happy with in the code, but it works for my purposes. I will continue to update and maintain it to keep it working in my projects and if there is some positive response for it in the community I’d love to add more features and clean things up to make it more useful to everyone.

So here it is on GitHub: devise-heroku

A few things I don’t like that I would love some feedback on, or help in fixing:

  • Integration feels like it’s a bit too hard. I would love to find a way to avoid making the user edit their devise.rb file and specify the scope
  • I would love to be able to add this to the model being used in Devise and just specify :heroku_sso_authenticable

 

Adding to a Devise model after creation (Rails 3.1)

This blog post is going to be fairly light on content, but it was something I googled quite a bit this weekend until I realized that Rails 3.1 provides a pretty awesome option.

The problem I had is that I have an existing Devise model and I wanted to add the :confirmable trait to that model so that when a user registers they have to provide a valid email address in order to use the site. One suggestion online was to just regenerate the model and use the newer migration, but that seemed like overkill.  Another answer was to add the columns yourself.  The “up” migration was easy and just involved modifying the table to add “t.confirmable”, which would generate the proper columns. The “down” migration was ugly, though, because you had to know each column that Devise adds for confirmable.

A better solution? Rails 3.1 Reversible Migrations to the rescue! If you are using Rails 3.1, instead of having “up” and “down” methods you can just have a “change” method and, for the most part, Rails and ActiveRecord are smart enough to figure out how to migrate that up and down (there are a few caveats, such as removing columns when migrating up.) So in the end, the code to add confirmable columns to the DB was quite simple:
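A sketch of what such a migration could look like, written out from the shell so it can be dropped into an app. Several things here are assumptions rather than the original code: the model's table is called users, the helper name write_confirmable_migration and the file name are invented for illustration, and the three columns are the ones Devise's confirmable module uses. The sketch spells the columns out with add_column and add_index because those are calls Rails 3.1 knows how to reverse from a change method.

```shell
#!/bin/sh
# Write a reversible "add confirmable" migration into a Rails app.
# Usage: write_confirmable_migration MIGRATE_DIR   (e.g. db/migrate)
# Hypothetical helper; the "users" table name is an assumption.
write_confirmable_migration() {
    migrate_dir="$1"
    # Rails migration files are prefixed with a UTC timestamp.
    stamp=$(date -u +%Y%m%d%H%M%S)
    target="$migrate_dir/${stamp}_add_confirmable_to_users.rb"
    mkdir -p "$migrate_dir"
    # Quoted heredoc: the Ruby below is written out verbatim.
    cat > "$target" <<'RUBY'
class AddConfirmableToUsers < ActiveRecord::Migration
  # A single "change" method; Rails derives the rollback from it.
  def change
    add_column :users, :confirmation_token, :string
    add_column :users, :confirmed_at, :datetime
    add_column :users, :confirmation_sent_at, :datetime
    add_index  :users, :confirmation_token, :unique => true
  end
end
RUBY
    # Print the path so the caller can inspect or edit the file.
    printf '%s\n' "$target"
}
```

After writing the file, rake db:migrate adds the columns and rake db:rollback removes them again, with no hand-written “down” method.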

 

Why I Go Home: A Developer Dad’s Manifesto

TL;DR I love my job, I love my career, I love solving hard problems, and I love crafting great software. Just not as much as I love my daughter.

When I was younger, I was one of the developers who would get to work early, code all day, leave the office after everybody else, and then get back online and code at night. It didn’t matter what I was coding on, I just wanted to be coding. Then a funny thing happened. My wife and I had a baby.

When I found out that Jen was pregnant I was ecstatic. Not only because getting to that point was a difficult process, but because I have always known that I wanted children. I wanted to be a father. No, I wanted to be a great father. I made a promise to myself and my unborn child that I would, at the very least, be a father who was present, and around, and available.

At this point you may be thinking “How is that going to work? The caffeine-fueled, crunch-time, death-march-prone careers of developers don’t exactly jibe with being home for dinner.” This is true. For me this came down to priorities and a simple realization: If you screw up at your job you can always get another one, but if you screw up your family, especially your relationship with your children, it will stay with you and stay screwed up forever.

So I made the choice that I would be home to spend time with my daughter every day, even if it was going to adversely affect my career. So I get to the office around 6:30-6:45am every day, I put in about 9 solid hours of work, and I take off to head home around 4pm. The hours between 4:30 and 7:30pm are sacred. They belong to my daughter. The hardest part about this is that I work with a lot of people on the west coast (who, as a function of culture, tend to start their days later) and it’s very natural for them to schedule meetings at 2pm or 3pm PST. This directly conflicts with the time I have set aside for playing with my daughter, so I try to reschedule or decline most of these meetings. Of course, I try to be pragmatic and if something incredibly important comes up on the job I will be here, no matter what the time. The bar is pretty high though, and the reason for that goes back to my earlier realization. Once my daughter is in bed I am free to spend time with my wife, code on something else, or work if it’s necessary. Calling into an 8pm or 9pm meeting with west-coast teams is not unheard of.

At first I had a lot of guilt about leaving my comrades behind to suffer during hard times. That was reinforced during my performance appraisal when one of the pieces of feedback I got through our peer review process was that, as a team leader, it would be nice if I were around when the team had to stay late. But if you look at the breakdown of my time, I spend 9+ hours working each day, and only 3 with my daughter. If that’s unfair to anyone, it’s unfair to my daughter.

By doing this I’ve actually discovered that I can be more productive when I get away from the code for a while. I drive home, play with my daughter, eat dinner, bathe her, read her books, and put her to bed. All this time my brain is still spinning. My subconscious is still tossing problems around and searching for solutions. There is plenty of research that shows the benefits of taking breaks from hard problems if you want to solve them. Plus, few things keep you on your toes like playing with a precocious two year old.

Sure, I don’t pump out the same sheer volume of code that I used to (partly because these days I spend more time mentoring, but also because the lines of code I do write are better.)

Another ancillary benefit I discovered is that I don’t feel as burned out. Death marches and late nights take a lot out of you. Nowadays I come into the office energized, with my thoughts organized and ready to put in solid hours at work. When you work crazy hours you yo-yo between 20 hour days and 8 hour days that really only have a few hours of productivity (or none at all!) Decision making suffers when you are overtired, and you fall behind on everything else outside of your job that needs to get done. To make matters worse, this trend is self-amplifying. You go crazy trying to finish one project because the demands were unreasonable or it was poorly scheduled or estimated (or other legitimate stuff came up, but the schedule didn’t budge) and so you necessarily go easy at the beginning of the next cycle or project because you are burned out. This causes you to fall behind and dooms you to another round of late nights and misery.

Even if you don’t have a great reason like I did, kick the death marches to the curb. You may even find that prioritizing a few hours to spend on some worthy pursuit outside of work will make you even better at your job. While you’re at it, pick up a copy of Rework from the guys at 37signals. It covers a lot of this stuff.

Time well spent

Update 9/15/2011 8:20pm

Some great discussion on my post on Hacker News too: http://news.ycombinator.com/item?id=3001783

A Script to Update Riak Config Files

Yesterday, one of my teammates who does a lot of work automating our deployments came to me and asked for some help parsing and updating a JSON config file so that he could set configuration data during automated deployments.  I took one look at the config file in front of me and said “Bro, that’s not JSON.  Those are Erlang terms.”

He was working on automating the Riak deployment we are setting up for some new features.  He had never touched Erlang (or any functional language) so I offered to show him the ropes and write a script that could read and update Riak’s app.config. I figured I would post it here for a couple of reasons:

  • I’m no Erlang expert myself.  I’d love to get feedback on easier/better ways to accomplish this
  • Somebody else probably wants to solve this problem and the script may help.

So here it is, enjoy!

 

Break the Build!

The Scream Doll -- Our token object of shame, given to breakers of the build.

I was just having a conversation with a colleague and he was telling me about how the bug count on one of our projects skyrocketed overnight (during the last week of a cycle) because QA finally looked at the output of our static analysis (FindBugs for Java) and logged defects on all the medium and high priority errors.  We both lamented that QA should do this earlier in the cycle so we don’t get swamped at the end.

My suggestion to him for the future: let’s break the build.  We already perform static analysis as part of our build process, so why don’t we look at our own output, and if we create FindBugs warnings of medium severity or higher (or any FindBugs warning), let’s cause the build to fail.  This is how things work with our unit tests, so why not with static analysis?  After all, static analysis is just another kind of test that we can perform at build time.  This would have some immediate outcomes:

  1. Everybody on the team would start running FindBugs on their changes before checking them in (or risk having our token burden of shame, “The Scream Doll” (pictured), brought to their cubicle).
  2. QA would never log a FindBugs defect again.
  3. The feedback loop would get shorter; developers would learn more, faster.
  4. Our code would be better.
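
The gate itself could be as small as the sketch below, run as a build step after static analysis. It is not our actual build code: the function name findbugs_gate is made up, and it assumes the analysis writes a FindBugs XML report in which each warning is a BugInstance element on its own line carrying a priority attribute (1 for high, 2 for medium).

```shell
#!/bin/sh
# Fail (non-zero status) when a FindBugs XML report contains
# high (1) or medium (2) priority bug instances.
# Usage: findbugs_gate REPORT_XML
# Assumes one <BugInstance ...> opening tag per line.
findbugs_gate() {
    report="$1"
    if [ ! -r "$report" ]; then
        echo "FindBugs gate: cannot read report '$report'" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; it exits 1 when
    # that number is 0, so "|| true" keeps "set -e" scripts alive.
    count=$(grep -c -E '<BugInstance[^>]*priority="[12]"' "$report" || true)
    if [ "$count" -gt 0 ]; then
        echo "FindBugs gate: $count high/medium priority warnings, failing the build" >&2
        return 1
    fi
    return 0
}
```

A build script would call findbugs_gate on the report path and stop on a non-zero status, the same way it already stops on a failing unit test.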

You could extend this to failing on ANY Java warning (the equivalent of gcc -Wall -Werror).

The bottom line is that if you don’t want QA to see it and you can detect it, your build needs to break when it occurs.

Drinking from a Firehose

Whenever I interview a person for an open position I always ask how they keep up to date and keep their knowledge relevant with industry trends and new technologies.  The answer can tell you a lot about the person across the table from you, and the beauty of the question is that it gives you lots of opportunity for follow-up questions, so it’s easy to determine if the candidate is being honest.

This makes it really easy to separate the wheat from the chaff.  Some candidates don’t do anything to stay current, or say that they read a book that is 10 years old (and can’t remember anything about it.)  Others tell me that they follow tech news, read blogs, contribute to open-source, and use Twitter.  It’s easy to see when someone has passion and pride for the work they do.

That all being said, while I am trying to evaluate the candidate for the job, I’m also hoping that they will give me good ideas for how to improve my own content consumption pipeline.  I am constantly striving to become better as a person, as an engineer, and as a general technologist.  Scouring the Internet is one way to find new technology and new ways to learn, but you typically need people to reframe your vision or to open your eyes to new possibilities.  I’ve learned a lot from interviewees.

So, in the spirit of sharing, I wanted to take the time out to document how I consume content and try to stay relevant and learn new things.  Hopefully in exchange, I can get some great ideas from readers.

Step One: Content Acquisition

The first stage of the process is content acquisition.  I try to cast a pretty broad net, and Google Reader is my savior here.  I get feeds from a ton of places including web comics (SMBC, XKCD), tech news (TNW, Wired), lifestyle (Uncrate, Acquire), individual blogs (Bad Astronomy, Al3x), and individual post discussions for interesting posts, or posts that I have commented on, so that I can follow the conversation and continue to participate.

In addition to blogs I also keep TweetDeck running and try to dip into my Twitter stream a few times a day to see what is happening (lots of stuff comes from @hnfirehose; some of it is good, some of it is junk, but you can quickly separate the wheat from the chaff.)  I really love Instapaper and how it integrates into Twitter for iPhone.  I use the Instapaper bookmarklet and hook it up on my iPhone so that when I find something good I can save it to read later when I have time to focus on reading.

I also learn about a lot of new things from co-workers and friends via word of mouth, from books (I keep a list of books I want to read, tech and otherwise, on Goodreads), and magazines (I really like Communications of the ACM.)

Step Two: Content Shaping

Step one results in a lot of data to sift through, so I try to go through my Google Reader every day and clear out everything.  I will read or skim some articles right then and there, but for the most part I take the articles that I think are interesting and I save them in Instapaper.

Step Three: Content Consumption and the Feedback Loop

Once I have pruned all of that content into something that is roughly manageable it typically ends up in Instapaper.  At first I wasn’t sold on Instapaper but, as I outlined above, I’ve come to depend on it.  I like that I can use it on almost every device I have and that on my iPhone I can download and archive articles for reading on planes, etc.

I try to find 30 minutes or 1 hour each day to actually read through the stuff that has gotten into Instapaper.  This doesn’t always happen, so I end up with a sizeable backlog that I try to manage.  Some of the news-related items “go stale” and just get removed, but many of the technical articles will be good whenever I do get to them.  When I find a blog article I really like or I think would be interesting to people I know, I share it.  Sometimes this is personal emails, sometimes it is Twitter, sometimes I note it in my Google Reader public stream.  When I find a blog I really like via a Hacker News article or through a cross-link I typically add it to my Google Reader so that I can follow that writer in the future.

When I comment on blogs I always add the blog post’s comment stream RSS to my Google Reader so that I can keep up with the conversation.  I set up a specific “Conversations” tag in my Google Reader for these.  After a few weeks (or whenever the conversation dies down) I remove these.

Step Four: I’m pretty sure that I forget a lot of good things!

Once I’ve gone through and perused the stuff that seems interesting and dug into the stuff that is really interesting, some of it gets put to use immediately, some of it sticks with me, some of it embeds itself deep in my brain and comes out at random times (“hey, I saw a plugin that does that.. let me google and find what it was”) and some of it is just lost.  I think that most of what I lose are individual libraries and things that I’d like to try but have no immediate use for.  I’d love to be able to put these things in my toolbelt, but I don’t have a good place for them that I use regularly and that is searchable.

So… How do you stay current?  How do you keep up to date with the latest libraries and technologies in fast-growing communities?  Two places that I think would be great to learn from, and that I want to focus on improving, are attending local user groups and contributing to open-source.

How do you keep this stuff from going in one ear and out the other?  Do you have a repository that acts as a final resting place for things that you have no use for immediately but that you want to use in the future?  How do you search it?  How do you organize it?

Help me out!

Setting up a 3 Node Riak Cluster with EC2 Cluster Compute Instances

I’ve been doing some investigation into a few NoSQL databases lately for use in a project at Symantec.  One of the NoSQL DBs I really like and wanted to test some specific use cases for is Riak.  I already think that Riak is pretty awesome so I’m not going to get into evangelizing.  This post is purely about the test setup I created on Amazon EC2.

For my tests I decided to create a 3 node Riak cluster.  I wanted some beefy hardware that would be roughly analogous to the machines we would be putting Riak on in our real datacenters so I opted to use EC2 Cluster Compute instances to reduce latency.  The nodes are Quad-XL (because it was either that or a GPU instance, which I don’t need.)

So without any further delay… here is how to set up a 3 node Riak cluster on EC2:

Creating the First Node

Start by spinning up an EC2 Cluster Compute Instance:

Security Groups

One very important part of this process is Security Groups.  AWS uses security groups to define who can access your instances, and how.  Here is what my Security Group looks like for my Riak cluster:

  • TCP 0-65535 (within the Riak Security Group) is the only thing here that I would call a hack :)  When I was trying to get my cluster to join with just 8099 open it failed.  Perhaps somebody from Basho (or a Riak expert) can fill me in on exactly which ports are needed.  I’m not too worried about this since it’s confined to chatter within my Security Group, but I would never have a production deployment like this.
  • TCP 8099 (within the Riak Security Group) – This is the port that Riak uses for intra-cluster data handoff
  • 22 (SSH) – So I can get to the box :)
  • 80 HTTP – Not actually necessary
  • TCP 8098 (Public) – exposing the Riak HTTP interface to the world.  In a production environment this would probably be restricted to the IP-space of the servers that would be interacting with Riak.

PS — I just learned while doing this that you can define a port to be available to all instances in a security group.  How long has that been around?!  It is an awesome feature! <3

Installing Riak

Once your instance is up and running, installing Riak is a breeze (much of this is borrowed from the excellent Riak RHEL install documentation).  The only thing I had to poke around a little bit to figure out was

wget http://downloads.basho.com/riak/riak-0.14/riak-0.14.2-1.el5.x86_64.rpm
sudo yum install openssl098e
sudo rpm -Uvh riak-0.14.2-1.el5.x86_64.rpm

Note: I got a few errors from chcon, but they didn’t seem to cause any issues.

Okay!  We’ve got Riak installed.  Next I edited the Riak app.config file to bind to all IP addresses, just for external testing:

in /etc/riak/app.config:

{http, [{ "127.0.0.1", 8098 }]},

becomes

{http, [{ "0.0.0.0", 8098 }]},
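
The same edit can be scripted, which is handy once the change has to be repeated on several nodes. This is only a sketch: the function name bind_riak_http is made up for illustration, and it assumes the listener line looks like the one above and that the local sed accepts -E and -i.bak (GNU and BSD sed both do).

```shell
#!/bin/sh
# Rewrite the HTTP listener address in a Riak app.config.
# Usage: bind_riak_http CONFIG_FILE NEW_IP
# Only the address inside {http, [{ "ADDRESS", PORT }]} is changed;
# the port and the rest of the file are left alone.
bind_riak_http() {
    config_file="$1"
    new_ip="$2"
    # -i.bak edits in place and keeps CONFIG_FILE.bak as a backup.
    sed -i.bak -E \
        "s/(\{http,[[:space:]]*\[[[:space:]]*\{[[:space:]]*\")[0-9.]+(\")/\1${new_ip}\2/" \
        "$config_file"
}

# Example (run as a user allowed to write the file):
#   bind_riak_http /etc/riak/app.config 0.0.0.0
```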

Now you can run sudo riak start to start Riak.  If your Security groups are set up correctly you should be able to hit it via port 8098.

Now, shut down that instance.  In AWS create an AMI from the instance (right click the instance, choose “Create Image (EBS AMI)”.)  Once your AMI is created launch two more EC2 instances from the AMI.  This spares you the effort of having to install Riak on each node in your cluster.  It also makes scaling out easier in the future since you can create a new instance, edit some config (preferably via Puppet or Chef in a production environment) and run riak-admin join to join the cluster.

Once my two instances were up and running, I literally followed the Riak documentation at http://wiki.basho.com/Basic-Cluster-Setup.html.  It was really that easy.  There was one gotcha, though. Make sure that when you choose the IP address to bind (and name) for the nodes in your cluster, you use the EC2 Private IP Address (the private DNS name should be fine too.)

I do have some concerns that this will cause problems on reboots, but I haven’t tested it yet.  Anyone else done this and determined that for sure?
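
For reference, the per-node naming step can be sketched as below. It is a sketch under assumptions rather than a tested recipe: the helper name set_riak_node_name is hypothetical, /etc/riak/vm.args is assumed to be where the RPM keeps the node name, and 10.0.0.5 stands in for a real EC2 private IP.

```shell
#!/bin/sh
# Point a Riak node's Erlang node name at its EC2 private IP.
# Usage: set_riak_node_name VM_ARGS_FILE PRIVATE_IP
set_riak_node_name() {
    vm_args="$1"
    private_ip="$2"
    # Replace the "-name riak@..." line; a .bak copy of the file is kept.
    sed -i.bak -E "s/^-name[[:space:]]+riak@.*/-name riak@${private_ip}/" "$vm_args"
}

# On each new node (10.0.0.5 is a placeholder private IP):
#   set_riak_node_name /etc/riak/vm.args 10.0.0.5
#   sudo riak start
# Then, on the second and third nodes, join the first one:
#   sudo riak-admin join riak@<private IP of the first node>
```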

There you have it.  At this point you should have a fully functional 3 node Riak cluster running on AWS EC2 Cluster Compute Instances.  It should be pretty damn fast!  The only thing I added on top of this was an Elastic Load Balancer that would round robin the traffic between the three nodes so that I could test how it is really scaling.

For the record, I have made my AMI public (just a base Riak install.)  The AMI ID is: ami-981ee7f1

 

Nostalgia be damned! I’ll never go back to spinning hard drives.

I recently got myself a new MacBook Air in anticipation of a trip I was going on.  I had been wanting to get a new dev laptop for a while and was torn between the Air and a big MacBook Pro.  Since I’ve been traveling a lot more lately, I opted for the Air and I think that I made a great decision.  It’s the perfect size for coding on an airplane and it’s easy to pack, which is really important because I’m also lugging a tank of a ThinkPad around with me for work.

There has been one unanticipated consequence of this purchase, however.   I noticed after the first day using the MacBook Air that I was bothered by the noise when I sat down at one of my other machines and listened to the electromechanical hard disk drives spin up and whir.  That sound had long ago been relegated by my brain to the periphery of my hearing.  Now in a single day’s span it was back, and instead of reminding me of the good old days learning to program QBASIC on a 286, it was a nuisance.

I’ve been spoiled.  Solid state drives are fast, quiet, and apparently prone to error.  But just like Jeff Atwood I won’t be deterred by the downsides.  The upsides are just too good!

Of Heroes and Pit Crews: Bringing techniques from the surgical world to software

Last night I had the pleasure of going to listen to Atul Gawande speak about medicine, healthcare, his latest book (The Checklist Manifesto), and more generally about making systems, processes, and procedures work as complexity increases beyond the capabilities of the individual.

As a surgeon, researcher, and writer, Dr. Gawande has tackled some really difficult problems with regard to reducing deaths as a result of surgery, and has done an impressive amount of research across many fields (including with Boeing engineers, and people who build skyscrapers) to determine how to deal with complexity in a way that reduces error but also allows members of a team to exercise judgement in the heat of the moment.

The most important thing that I took away from his talk was his emphasis on the need to shift our thinking in terms of how we get things done.  Medicine and Software Engineering have deeply entrenched concepts of lone geniuses.  The Brilliant Doctor.  The Rockstar Developer.

Dr. Gawande emphasizes the need to shift from the “hero” culture to more of a “pit crew” mentality where there is a team of people who all have set responsibilities, where everybody knows everybody else’s responsibility, and every team member feels confident enough to speak up if something is going off the tracks.  A team where everybody is moving forward, pointed in the same direction.  I have seen this same sort of shift in software engineering.  I think that in some ways the agile movement reflects this mentality.  The best teams I have been on were the teams where everybody (regardless of level) felt like they had a say, and where we were all driving towards the same goal (not just a ship date, but a level of quality, a level of service to the teams we interacted with, and a commitment to doing what is right for the customer.)

The tool that Dr. Gawande introduced to his operating room (and others around the world) with stunning success is a simple one.  The checklist.  His argument is that while people’s intuitive reaction to the idea of a checklist is that it means shutting down your brain, a well-written checklist for a team of people can spark the memory and bring out better performance.  The idea of introducing checklists into some of the processes we do at work intrigues me.  I’m analyzing where we fail (in field bugs, broken builds, failed build acceptance tests) and why the failures occur in order to find a place where a simple checklist could be effective.

Off the bat I’ve come up with a few ideas:

  • Code reviews -- On my team we already have an informal checklist for code reviews:  Does the code look right (match standards, appear to be logically correct)?  Does it build?  Does it work?  Do the unit tests run?  Do our automated integration tests run?  I wonder if this can be extended or formalized, and how?
  • Deployments -- When we are deploying a service and the deployment has to be coordinated with several teams, does it make sense to form a “pit crew” with well-known responsibilities and a checklist of things that we are all accountable for?  I’ll admit here that I’d rather try to go the devops route and automate deployments with something like Puppet, but my group is just not there yet.

What do you think?  Has anyone read the book and thought of how it applies to software engineering?  Where else would a checklist be appropriate for preventing errors?  Are checklists something we can codify?  Are we already doing this to some extent with automation and unit-testing?

 

Still Bullish on the Cloud

Over the past few weeks we’ve witnessed a couple of big outages in the cloud computing world.  First there was the downtime on Amazon’s EC2 infrastructure, and then this week there was the Blogger outage on Google’s infrastructure.  These high profile outages, coupled with the slightly different but orders of magnitude more damaging PlayStation Network outage, have gotten some people questioning the wisdom of building or moving services into the cloud, including the New York Times.

I’m still incredibly bullish on cloud-hosted services and the power and potential of the cloud.  There are two fundamental reasons why.  First, the cloud is The Great Leveler.  Not since the advent of the personal computer has the playing field been leveled so significantly.  If you come up with the idea for the next Facebook or Google you can bootstrap it for a tiny fraction of what buying the required hardware would have cost you.  As your service grows, you can scale.  You can compete with anybody in the world for handling user traffic on day 1.

The second reason is that while I discovered that AWS Availability Zones aren’t quite as separated as I thought (the EBS control plane is shared per-region, but distributed across AZs in the region), there were many websites that survived the outage and came to little or no harm (namely Twilio and Netflix.)  The outage at EC2 makes the case for cloud computing; it doesn’t damn it.  High Availability 101 says that you can’t have a single point of failure.  The services that went down were primarily in a single Availability Zone on AWS.  If you have everything running in a single AZ then you are asking for trouble.  You have to assume failure.  Netflix takes this to the extreme with their Chaos Monkey.

Furthermore, the cloud is still evolving.  One thing that I think we will see developing as an extension of some current endeavors, and as a result of these recent outages, is the “meta-cloud” provider.  Companies like RightScale are already playing in this space.  I think that they should start building technology to abstract away the cloud provider entirely and create a market for compute resources.  They can run your service on EC2 if that’s where they are getting the best combination of cost and performance (think: spot instances) or they could bring it up on Rackspace’s cloud if the best price and performance is there.  They could utilize many providers to get better performance (routing users to datacenters closest to them.) They could also bring your service up in multiple cloud providers to add an even higher level of availability and redundancy, or switch your provider if an anomaly or outage is detected where you are currently hosted.