Rails on SmartOS Part 1

As a Rails developer, I’m not often asked about my deployment platform. Why would anyone ask?

The Rails community has really easy options for deployment, so it’s not something we think a whole lot about. We have Engine Yard, Heroku, Capistrano setups for deploying to VPSes that are pretty much magic, and things like the Rubber gem that make it nearly effortless to deploy a machine in the Amazon cloud.

All of the above options will have your deployed application running on a Linux distribution by default. Linux is great. It’s free, it’s tested, it runs on everything.

In a former career I had a Ruby app that was very IO intensive. How intensive? 250,000,000 SNMP GETs in a 24 hour period, each and every one of which had to be persisted. A different app I worked on around the same time managed massive amounts of video content, with individual file sizes measured in terabytes.

Those aren’t the problems the usual Rails folks have in mind when they think about scale. Linux is good enough for Google, so why couldn’t I make it work on that platform?

It’s not that I couldn’t, but there were other solutions that made life incredibly simple, eliminated the need for hardware expansion, added security features as an unexpected bonus, and meant I didn’t have to create PAs for more stuff.

That was about six years ago. We ended up deploying those two apps on OpenSolaris for a few reasons I’ll cover later. Since then the kernel for that OS has been forked, new projects and products have arisen and one really stands out as a great application platform: SmartOS from Joyent.

In this four-part series we’ll cover four features of SmartOS that are compelling reasons for developers to choose it as a platform. Part one is a history of the project’s roots and some great developer-centric features around ZFS.

In posts 2-4 we’ll go over resource and machine virtualization, DTrace features and how to deploy a Rails app on SmartOS.

So first a little history on how they got to where they are before we get into why it’s a killer Rails deployment platform.

Sun Microsystems

In the early 2000s, Sun Microsystems was open sourcing everything they had, as well as purchasing companies with great products that were already open source, like MySQL.

Licensing was complicated, though. Sun didn’t have the authority to release some of their existing products under licenses like the GPL, so they based a new license of their own on the Mozilla Public License and called it the CDDL.

They’d released Java and many other products under a public license, but there was still a behemoth in the room that remained proprietary: Solaris.

So they set out to create the next version of Solaris under a public license. The kernel was code-named Nevada, and the effort to build a fully fledged OS around it was Indiana.

The community behind these projects was a who’s who of the *NIX world. I could name names, but we only have so many pixels.

In 2009, the first Live CD was released with a full GNOME-based desktop. The project was on a six-month release schedule and things were going swimmingly.

After Oracle’s purchase of Sun, the release date for the next OpenSolaris slipped, and it slipped without a word. Two months later, Oracle announced they were discontinuing the product in favor of other options and OpenSolaris was no more.

Since this was an open source project, it lived on in forks. So why bother forking something like this when upstream is no more? It’s the features.

Feature 1: ZFS

One of the most compelling features to come out of later editions of Solaris 10 was the new file system: ZFS. It was designed to be future-proof, with features such as:

  • A 128-bit storage system. Mathematically, fully populating that much storage would require more energy than it takes to boil the earth’s oceans. Check it for yourself.
  • Guaranteed data integrity with a focus on preventing silent data corruption.
  • Storage pools. Storage is virtualized into pools meaning adding storage to a volume is as easy as plugging in drives.
  • Snapshots. ZFS has a copy-on-write transactional model that makes it possible to capture a snapshot of an entire working filesystem at once, then store only the differences between the snapshot and the filesystem as it continues to change.
  • Attack of the clones! The above file system snapshots can be cloned, and the clones only consume disk space that is the diff between the working copy and the clone. This operation is nearly instant.
  • It’s network centric. Shipping a filesystem across the network to another machine is as easy as running “zfs send” (sketched just after this list).
  • The ARC (FS Cache), adaptable volume block sizes, deduplication, encryption, and many, many more features.
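
To make those bullets concrete, here’s a minimal sketch of what driving ZFS from a shell looks like. The pool name “tank” and the filesystem names are made up for illustration:

    # create a filesystem in an existing pool named "tank"
    zfs create tank/projects

    # snapshot it, then clone the snapshot into a writable copy
    zfs snapshot tank/projects@friday
    zfs clone tank/projects@friday tank/projects-experiment

    # ship the snapshot to another machine over SSH
    zfs send tank/projects@friday | ssh backup-host zfs recv tank/backups/projects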

As a developer, you may think that these are mostly ops related problems. The two do eventually meet, so let’s go over some scenarios that can make your life easier with ZFS.

Brogrammer Bob and his Magical Migrations

Brogrammer Bob is getting ready to deploy to staging using Capistrano. Before starting, he creates a ZFS snapshot of his Postgres installation.

Something with the data set didn’t go right, and it wasn’t something that could be easily tested. The migration wasn’t wrapped in a transaction and it half-completed. Bob restores his snapshot and goes back to the drawing board.
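
A minimal sketch of Bob’s safety net, assuming a hypothetical filesystem tank/pgdata holding the Postgres data:

    # with Postgres briefly quiesced, snapshot the data filesystem
    zfs snapshot tank/pgdata@pre-migration

    # ...deploy, watch the migration half-complete...

    # roll the filesystem back to the snapshot and start over
    zfs rollback tank/pgdata@pre-migration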

Deployer Dan and his Super Security Patches

A critical security patch for the database is announced. Applying it involves upgrading many OS packages, as the problem is in a C library. One of the system daemons fails, causing chaos and mass hysteria. No problem though, as Dan made a system-wide snapshot before starting. He reboots the system, chooses the old boot environment from GRUB, and he’s back to where he started within 30 seconds of what was a catastrophic event.
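
Boot environments are what make this kind of rollback nearly free on OpenSolaris-descended systems. A rough sketch using beadm (the exact tooling and boot menu integration vary by distribution):

    # clone the running system into a new boot environment before patching
    beadm create pre-patch

    # ...apply the patches, watch a system daemon fall over...

    # reactivate the pre-patch environment and reboot into it
    beadm activate pre-patch
    reboot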

Patricia and her Postgres

Our app has a lot of IO-bound queries. The database is quite large at this point and performance is poor. The filesystem holding the Postgres data is migrated to one with ZFS’s compression turned on, and Patty sees a 3x-4x speed-up in these queries.

Even with the overhead of compression, the queries are limited by disk throughput and not the CPU. We’re reading fewer bytes from the disk at this point, so boom: performance increase. In addition, seek times are much lower because compression reduces the physical distance between logical blocks.
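
A sketch of Patricia’s change, again assuming a hypothetical tank/pgdata filesystem. On builds of that era, compression=on selects the default lzjb algorithm, and only blocks written after the change are compressed:

    # turn on compression for the Postgres data filesystem
    zfs set compression=on tank/pgdata

    # check how well the data is compressing
    zfs get compressratio tank/pgdata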

Donny and Deduplication

ZFS includes an option to deduplicate data on a per-block basis. Say our app holds buckets of files for groups of users. Outside of our application, our users start emailing around a crazy cat GIF and drop it in the box-in-which-they-drop-things that we host. The 100 copies of the 5MB GIF are now stored only once.

A cat GIF is one thing, but how about a few thousand developers storing their projects on your service? If our block size is set to 4K and we have block-level deduplication on, all the blocks that are inevitably identical across files will cut down on your storage needs considerably.

Deduplication has overhead, however, and it’s not something you want to turn on willy-nilly. When your dataset benefits from it, you can really benefit; when it doesn’t, the overhead can be more trouble than it’s worth. Fortunately, it’s easy to turn on and off.
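
A sketch of flipping dedup on and off and checking whether it’s paying for itself; tank/dropboxes is a made-up filesystem name:

    # enable block-level dedup (affects new writes only)
    zfs set dedup=on tank/dropboxes

    # the DEDUP column in zpool list shows the pool-wide dedup ratio
    zpool list tank

    # back out if the overhead isn't worth it
    zfs set dedup=off tank/dropboxes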

For more information on dedup’s advantages and drawbacks, the following two links are for the technically minded:

https://blogs.oracle.com/bonwick/entry/zfs_dedup
https://blogs.oracle.com/roch/entry/dedup_performance_considerations1

Summary

The features of ZFS are pretty compelling, but it’s just the start of the whole package that is an amazing platform for Rubyists.

Next time we’ll explore some of the virtualization features of SmartOS, which don’t stop at virtual machines but give you fine-grained control over every aspect of the machine, including network traffic, and how you can use these things as a developer to deliver a great product to your users.

Stop #3 on the SmartOS train will take us to DTrace and solving production issues.

The last stop will bring us to creating and deploying an app on SmartOS that leverages all the features we’ll eventually cover.

Successful Hack Nights

So what makes for a successful Hack Night? What is it I’m calling a Hack Night? Who is the target audience? How can you put together one of your own? Why would you want to have a hack night of your own? All questions I will answer in due time.

We’ve been running Hack Nights at Neo Columbus for about a year now and there have been things I’ve seen work well for our group and things that have been… eh.

Hack Night?

In its essence, it’s a pizza & BYOB night where people bring laptops and ideas.

Another take on hack nights is that they fill a void between user groups and conferences. User groups can be a lot of talking and technical lectures without a lot of doing. Conferences, in my mind, are more about cheerleading and morale boosting for developers, and hack nights let smaller groups of people actively work on things together.

It’s a great way to extend the reach of an existing community and to bring new people into the fold. Hands-on people who don’t get much out of lectures get a chance to participate rather than consume, and to become creators themselves.

The intention when starting our hack nights was to have folks come up with great ideas and have people around to help, but it’s turned into so much more. The learning experiences people have shared have been very compelling stories.

Learning Experiences

Who should come and learn things? Everyone. Even the non-technical.

My last take on hack nights is that they fill yet another gap in training: the classroom vs. the real world. I’ve often thought of our hack nights as a Montessori education for programmers.

When you run a classroom or a lab, it’s much like a traditional education. The assumption is that everyone is at the same level and we’re all starting from square one.

We’re not, though. We’re not all the same age or at the same place by any measurable metric. When we go to work, we don’t all have the same set of technical and life experiences, and that’s a good thing. Learning is a two-way street: teachers learn from students as much as students learn from teachers.

As someone who’s been in the field quite a while now, the benefits for me as a participant are massive. Technical people can have a habit of being focused on technical details. I’m reminded of when my six-year-old son asked my neighbor, a physics professor with a PhD, “Where does gravity come from?” and the professor sort of imploded. Thinking about these things can inject fresh perspective into my daily life as a programmer. Another benefit is that explaining a topic deepens my own understanding of it, if only because I’ve recited it aloud as a learning method for myself.

The feedback we’ve gotten on mailing lists from people who are new to the field has been phenomenal as far as how this sort of environment has impacted their personal growth. Fifteen-minute conversations here and there have led to more people saying “I’ve never learned more than when at a hack night” than I can currently count.

Success Factors

Audience Size

Believe it or not, some of the best feedback and most fun I’ve personally had have been at the hack nights where attendance was low. People tend to get the most out of working together in small groups and tend to form little tribes anyway.

Is there an upper limit? Probably only venue capacity. Standing-room-only hack nights can get a little chaotic, but some people thrive in that sort of atmosphere.

Keep it Unstructured

If we’re in Hacker Montessori school, the key is self-direction in learning and working. Even attempts at lightning talks have not gone well; in fact, the speakers were usually ignored.

Be Social

Programmers can be a little “heads down” sometimes when they’re into something. Even those with a lot of technical experience will often avoid bothering someone who looks busy. And someone with their head down in a project often can’t switch gears to a conversation, and may not communicate well with someone who’s asking questions.

People also may not know each other well enough to strike up conversations or to probe as to why they’re there. Other times people know each other too well and form cliques.

Getting everyone involved is as simple as asking “What are you working on?” to a random stranger. In the end, most people are excited to actually answer this question and you can spark some great conversation and learning experiences.

Diversity

The earliest of our hack nights were an extension of our local Ruby Brigade. They were fun, but drew the same audience we always saw. We used a number of methods of reaching out, be it a reddit post or encouraging word of mouth. Bringing in more people from outside the fold worked out for everyone involved.

The early groups tended to all work on similar things, since we were all from the same user group. With a more diverse audience we’ve gone into topics like product development, languages outside our normal scope, and problems outside the domain of the average Rubyist.

Scheduling

We’ve found three hours works well. People can come and go and filter in and out as they please. Any longer than three hours and folks get a little worn out. Someone’s gotta clean up, right?

Setting a limit isn’t a bad thing, especially if people want to keep working or talking about things. It just means they keep thinking and following up with each other, and it leaves everyone looking forward to next time.

Every other week has been a good frequency for us. We’ve got lots of other local groups and events, and any more often and we wouldn’t have a consistent base of folks coming.

TL;DR

So the ingredients for a good hack night?

  • Free Pizza
  • Beverages
  • Social people
  • Unstructured Chaos
  • Inviting lots of people from diverse backgrounds
  • A consistent schedule