<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Unmaintainable &#187; best practices</title>
	<atom:link href="http://unmaintainable.wordpress.com/category/best-practices/feed/" rel="self" type="application/rss+xml" />
	<link>http://unmaintainable.wordpress.com</link>
	<description>Scripting, Software Engineering and Stuff in Between</description>
	<lastBuildDate>Sun, 22 Nov 2009 09:05:47 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='unmaintainable.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/6f90ae5619dfc90140df401ac60575d2?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>Unmaintainable &#187; best practices</title>
		<link>http://unmaintainable.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://unmaintainable.wordpress.com/osd.xml" title="Unmaintainable" />
		<item>
		<title>A Case for Guard Clauses</title>
		<link>http://unmaintainable.wordpress.com/2009/06/12/a-case-for-guard-clauses/</link>
		<comments>http://unmaintainable.wordpress.com/2009/06/12/a-case-for-guard-clauses/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 13:20:32 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[quality]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=283</guid>
		<description><![CDATA[One of my pet peeves in programming is that few people use guard clauses. A guard clause is an almost trivial concept that greatly improves readability. Inside a method, handle your special cases right away and return immediately.

Have a look at the following example:

private int doSomething() {
    if (everythingIsGood()) {

   [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>One of my pet peeves in programming is that few people use guard clauses. A guard clause is an almost trivial concept that greatly improves readability. Inside a method, handle your special cases right away and return immediately.</p>
<p><span id="more-283"></span></p>
<p>Have a look at the following example:</p>
<pre>
private int doSomething() {
    if (everythingIsGood()) {

        /*
         * Lots and lots of code here, but that's a different story.
         */

        return SOME_VALUE;
    } else {
        return ANOTHER_VALUE;  // a special case
    }
}
</pre>
<p>You can easily rewrite it using the <a href="http://www.refactoring.com/catalog/replaceNestedConditionalWithGuardClauses.html">Replace Nested Conditional with Guard Clauses</a> refactoring from Martin Fowler&#8217;s <a href="http://martinfowler.com/books.html#refactoring">Refactoring</a>:</p>
<pre>
private int doSomething() {
    if (! everythingIsGood()) // &lt;-- this is your guard clause
        return ANOTHER_VALUE;

    /*
     * Lots and lots of code here, but that's a different story.
     */

    return SOME_VALUE;
}
</pre>
<p>Once you&#8217;ve read past the conditional(s) at the beginning of the method, you know that your world is in order and you don&#8217;t have to worry about special cases anymore.</p>
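<p>With several special cases, the benefit becomes even clearer. The following Java sketch is a hypothetical example that is not part of the original post: the class ShippingCalculator, its rates and the NOT_SHIPPABLE marker are made up for illustration. Each guard clause handles one special case and returns immediately, so the actual computation at the end never needs to be nested.</p>

```java
// Hypothetical example: each special case is handled by a guard clause at the
// top of the method, so the "real" computation at the end needs no nesting.
class ShippingCalculator {
    static final int NOT_SHIPPABLE = -1;

    // Hypothetical flat base rates in cents, per destination country.
    static int baseRate(String country) {
        switch (country) {
            case "DE": return 490;
            case "FR": return 990;
            default:   return NOT_SHIPPABLE;
        }
    }

    static int shippingCost(String country, int weightKg) {
        if (country == null)            // guard: missing destination
            return NOT_SHIPPABLE;
        int base = baseRate(country);
        if (base == NOT_SHIPPABLE)      // guard: unsupported destination
            return NOT_SHIPPABLE;
        if (weightKg <= 0)              // guard: nothing to ship
            return 0;

        // Past the guards, the world is in order: a known country and a positive weight.
        return base + 100 * (weightKg - 1);  // plus 100 cents per additional kilogram
    }
}
```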
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2009/06/12/a-case-for-guard-clauses/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Development Done Right</title>
		<link>http://unmaintainable.wordpress.com/2009/03/01/development-done-right/</link>
		<comments>http://unmaintainable.wordpress.com/2009/03/01/development-done-right/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 09:53:56 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[maven]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[rcs]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=174</guid>
		<description><![CDATA[In my projects, I&#8217;ve always been the one who took care of infrastructure, standardization and quality assurance from the development perspective. The funny thing is that I&#8217;m no admin and no QA guy, so most of it wasn&#8217;t even my job. In this article, I&#8217;m going to list a few things that in my opinion [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>In my projects, I&#8217;ve always been the one who took care of infrastructure, standardization and quality assurance from the development perspective. The funny thing is that I&#8217;m no admin and no QA guy, so most of it wasn&#8217;t even my job. In this article, I&#8217;m going to list a few things that in my opinion as a software developer are essential to a professional software project.</p>
<p><span id="more-174"></span></p>
<p>What you need is no secret: if you read a few books and follow some technology blogs, you know the bits and pieces. I&#8217;ll list a few things from a Java/Maven perspective, so some of this may not apply to other platforms.</p>
<p>First of all, define a common <strong>coding style</strong>. I shouldn&#8217;t even have to mention this, but without one there will be chaos and conflict among developers. Just use <a href="http://java.sun.com/docs/codeconv/">Sun&#8217;s Code Conventions</a>, adjust a few rules like &#8220;maximum line length is 120 characters&#8221; or &#8220;no tabs allowed&#8221;, and you&#8217;re halfway done. Provide a bit of example code (one class, one page of paper) and put it up on a wall. This style guide should include rules for things like logging and exception handling strategies as well. Too few developers are even aware that they need a strategy here, so write it down! Don&#8217;t forget to define package namespaces for your project.</p>
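<p>As an illustration of such a one-page example class, here is a minimal Java sketch. The class name, the suggested namespace and the concrete rules in the comments are hypothetical; the point is that logging and exception handling conventions are easier to follow when they are shown in code. It uses java.util.logging to stay free of dependencies, although many projects standardize on a different logging framework.</p>

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical one-page style example. In a real project this class would live in
// the project's package namespace, e.g. "com.example.billing" (hypothetical).
class InvoiceParser {
    // Rule: one private static final logger per class, named after the class.
    private static final Logger LOG = Logger.getLogger(InvoiceParser.class.getName());

    // Rule: translate low-level exceptions into a project exception and keep the cause.
    static class InvoiceException extends RuntimeException {
        InvoiceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Rule: methods that rethrow do not log, to avoid duplicate log entries.
    int parseAmountCents(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new InvoiceException("Invalid amount: " + raw, e);
        }
    }

    // Rule: log once, at the place where the exception is finally handled.
    int parseAmountOrZero(String raw) {
        try {
            return parseAmountCents(raw);
        } catch (InvoiceException e) {
            LOG.log(Level.WARNING, e.getMessage(), e);
            return 0;
        }
    }
}
```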
<p>For Java projects, provide organization-wide <strong>Maven archetypes</strong> that give you a solid basis for your project. Use existing open source ones to get started! The effort pays off with each new project in reduced setup costs. But wait, each of your projects is different? Oh please, then adjust your build scripts. That&#8217;s no excuse to start from scratch each time!</p>
<p>On a related note, invest in a proper internal <strong>Maven repository setup</strong>. Maybe use a stripped-down version of <a href="http://blogs.atlassian.com/developer/2008/02/maven_in_our_development_proce_2.html">Atlassian&#8217;s setup</a>. You don&#8217;t want your builds to break just because some external repository isn&#8217;t available. Define rules for who may deploy what to your repository. I&#8217;ve seen a lot of chaos here, so remind people to actually <em>think</em> before deploying stuff and breaking builds.</p>
<p>Get yourself a proper <strong>bug tracker</strong> and define how you&#8217;re going to use it. It doesn&#8217;t have to be fancy; a <a href="http://trac.edgewall.com">Trac</a> installation will do in most cases, and it gives you a wiki for your technical documentation, too. Do you just track issues, or also tasks for developers? What&#8217;s the workflow? Who may open a ticket, who may close it? What amount of testing is required?</p>
<p>For god&#8217;s sake, define rules for working with your <strong>revision control system</strong>. How often should developers check in? What should a commit message look like? What does your release process look like? When do you create a branch, and how are releases tagged? Read up on the features of your system; branching may no longer hurt. Move on if you&#8217;re still using CVS.</p>
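<p>Once the rules for commit messages are written down, they can also be checked automatically, for example from a commit hook. The following sketch assumes three hypothetical rules: a non-empty summary line of at most 72 characters, a ticket reference such as #42 in that summary, and a blank second line if there is a body. Your own rules will differ, and wiring the check into your revision control system is left out.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical commit message rules, written down as code so they can be checked
// automatically. The rules themselves are examples only.
class CommitMessageRules {
    static final int MAX_SUMMARY_LENGTH = 72;
    // Hypothetical: the summary must reference a tracker ticket such as "#123".
    static final Pattern TICKET = Pattern.compile("#\\d+");

    // Returns a list of human-readable rule violations; empty if the message is fine.
    static List<String> violations(String message) {
        List<String> problems = new ArrayList<>();
        String[] lines = message.split("\n", -1);
        String summary = lines[0].trim();

        if (summary.isEmpty())
            problems.add("summary line is empty");
        if (summary.length() > MAX_SUMMARY_LENGTH)
            problems.add("summary line is longer than " + MAX_SUMMARY_LENGTH + " characters");
        if (!TICKET.matcher(summary).find())
            problems.add("summary line does not reference a ticket");
        if (lines.length > 1 && !lines[1].trim().isEmpty())
            problems.add("second line must be blank");
        return problems;
    }
}
```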
<p>A professional software project needs <strong>continuous integration</strong>. Period. Set it up (I recommend <a href="https://hudson.dev.java.net/">Hudson</a>), define the proper reports and actually read them. It&#8217;s crucial to check test coverage and other metrics right from the beginning. If you add reports later, you will typically get warnings for every other line of source code. Nobody will read those reports anymore and it&#8217;s too much work to clean up the code.</p>
<p>Provide developers with a proper <strong>workstation setup</strong> that corresponds to the platform you&#8217;re targeting. Yes, it&#8217;s a good idea if everyone on the project uses the same JVM and application server versions. If that&#8217;s what your production environment uses, even better! It might improve the chance that things actually <em>work</em>. Just saying.</p>
<p>I&#8217;m barely scratching the surface here, but it&#8217;s enough for one article already. For many of the points above you can find articles on the web (check my blog&#8217;s best practices category, for example). And still, it&#8217;s frustrating how little many software companies invest in those bare essentials.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2009/03/01/development-done-right/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Saving Session Data in Web Applications</title>
		<link>http://unmaintainable.wordpress.com/2009/01/04/session-data-in-webapps/</link>
		<comments>http://unmaintainable.wordpress.com/2009/01/04/session-data-in-webapps/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 12:09:37 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=141</guid>
		<description><![CDATA[There are many ways to store session data in web applications. They all differ in scalability, failover capabilities, and complexity. I&#8217;ll give you a quick rundown on the major themes.

Session Data on the Client
You can often implement simple personalization features or workflows by storing state on the client. From a scalability point of view, it [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>There are many ways to store session data in web applications. They all differ in scalability, failover capabilities, and complexity. I&#8217;ll give you a quick rundown on the major themes.</p>
<p><span id="more-141"></span></p>
<h4>Session Data on the Client</h4>
<p>You can often implement simple personalization features or workflows by storing state on the client. From a scalability point of view, it doesn&#8217;t get any better than that. Not having to keep state on your servers saves you a lot of trouble.</p>
<p><em>Cookies</em> are a well-understood mechanism that can even be used purely on the client side via JavaScript. They are useful for small pieces of data that aren&#8217;t security-relevant. Informal polls or simple preferences like language selections are often implemented via cookies. Cookies also play a major supporting role in user tracking and identification, but that&#8217;s a different story.</p>
<p>Some frameworks save workflow state in <em>hidden form fields</em> (see <a href="http://myfaces.apache.org/">Apache MyFaces</a>). The server can encrypt and sign that data to prevent tampering.</p>
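<p>A minimal sketch of the tamper protection, assuming the server holds a secret key that never leaves it. Strictly speaking, it is a message authentication code (here HMAC-SHA256 from the JDK) that detects tampering; encryption additionally hides the content from the client and is not shown. The class name and the field format are hypothetical and not taken from MyFaces.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Minimal sketch: the server appends an HMAC to the state it puts into a hidden
// form field and rejects any posted value whose HMAC does not match. This protects
// integrity only; the state itself is still readable by the client.
class SignedFormState {
    private final SecretKeySpec key;

    SignedFormState(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    private byte[] mac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Value for the hidden field: base64url(state) + "." + base64url(hmac).
    String sign(String state) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String payload = enc.encodeToString(state.getBytes(StandardCharsets.UTF_8));
        return payload + "." + enc.encodeToString(mac(payload));
    }

    // Returns the original state, or null if the field was tampered with.
    String verify(String field) {
        int dot = field.indexOf('.');
        if (dot < 0)
            return null;
        String payload = field.substring(0, dot);
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(field.substring(dot + 1));
        } catch (IllegalArgumentException e) {
            return null;
        }
        if (!MessageDigest.isEqual(mac(payload), actual))  // constant-time comparison
            return null;
        return new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
    }
}
```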
<h4>Session Data on Your (Web-)Servers</h4>
<p>For many developers, saving sessions on the front server (which is often a web server) is the most natural choice. That&#8217;s what Java Servlet implementations usually do.</p>
<p>As soon as you need more than one front server, things can become quite tricky. Your environment may support some kind of <em>clustering</em> that provides <em>session replication</em> and <em>failover</em>. No matter which front server gets a request, the session data is already there or will be requested on demand. Depending on the implementation, a failing front server doesn&#8217;t necessarily mean a loss of session data. Apache Tomcat has this feature, as do commercial products. As powerful as this may sound, the approach comes with a rather large price in complexity.</p>
<p>Fortunately, there&#8217;s an easier way: <em>sticky sessions</em>. Your load balancer (an HTTP reverse proxy) assigns a cookie to each client on its first request. The cookie determines which front server handles the client&#8217;s subsequent requests. That means each client is always directed to the same front server, so session data for a client is only needed on <em>one</em> front server. As a result, you don&#8217;t need replication anymore.</p>
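<p>The routing decision itself is simple. The following sketch models only that decision, not a complete reverse proxy: it assumes that the cookie value is simply the name of the backend (real load balancers use their own cookie formats) and that new clients are distributed round robin. Class and backend names are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of sticky-session routing inside a load balancer.
class StickyRouter {
    private final List<String> backends;
    private int next = 0;  // round-robin position for new clients

    StickyRouter(List<String> backends) {
        if (backends.isEmpty())
            throw new IllegalArgumentException("need at least one backend");
        this.backends = new ArrayList<>(backends);
    }

    // Returns the backend for a request. `cookie` is the value of the routing cookie,
    // or null on the client's first request; the caller sends the result back as the
    // new cookie value so that later requests go to the same backend.
    synchronized String route(String cookie) {
        if (cookie != null && backends.contains(cookie))
            return cookie;                        // sticky: keep the earlier assignment
        String backend = backends.get(next);      // new client or unknown backend
        next = (next + 1) % backends.size();
        return backend;
    }
}
```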
<p>The sticky sessions approach is easy to use, but it has two disadvantages. First, if one server crashes, you lose the session data stored on it. Second, load balancing may no longer be optimal because it is done per session rather than per request.</p>
<p>In the Java EE world, session state could also be stored inside the application server (as opposed to the Servlet container). For example, the Seam framework stored conversational data using stateful session beans at some point. I don&#8217;t know if this is still common practice.</p>
<h4>Session Data Inside Your Database</h4>
<p>Saving session state inside the <em>database</em> is common in lightweight web frameworks like <a href="http://www.djangoproject.com">Django</a>. That way you can add as many front servers as you like without having to worry about session replication and other difficult stuff. You don&#8217;t tie yourself to a certain web server, and you get persistence and all the other features databases provide for free. As far as I can tell, this works rather nicely for small to medium-sized websites.</p>
<p>The problem is the usual one: the database server may become your bottleneck. In that case your best bet may be to take a suitcase full of money to Oracle or IBM and buy yourself a database cluster.</p>
<h4>Using a Dedicated Session Store</h4>
<p>Sometimes, especially for high traffic web sites, it makes sense to store session data on a <em>dedicated server</em> or cluster. This takes complexity out of your web servers at the cost of an increased overall system complexity. No matter what the SOA guys say, distributing your data usually won&#8217;t make things any easier.</p>
<p>If, for example, you&#8217;re currently storing sessions inside your application&#8217;s database, you could move the session store part to its own database. That&#8217;s about as simple as it gets.</p>
<p>I&#8217;m not aware of any off-the-shelf products, but since storing session data is typically not the most difficult problem, you could try to roll your own based on a caching product (<a href="http://www.danga.com/memcached/">memcached</a> comes to mind, as do at least a dozen Java caching solutions).</p>
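<p>If you roll your own, the core contract is small. The following sketch assumes a hypothetical SessionStore interface and uses an in-memory map with a fixed time-to-live as a stand-in for the remote cache; a real implementation would delegate to the client library of your caching product and use its expiry mechanism instead. The clock is passed in so that expiry can be tested.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical contract for a dedicated session store.
interface SessionStore {
    void put(String sessionId, Map<String, String> data);

    // Returns null if the session is unknown or has expired.
    Map<String, String> get(String sessionId);
}

// In-memory stand-in for a cache-backed store (e.g. one built on memcached).
class ExpiringSessionStore implements SessionStore {
    private static final class Entry {
        final Map<String, String> data;
        final long expiresAtMillis;

        Entry(Map<String, String> data, long expiresAtMillis) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clockMillis;  // e.g. System::currentTimeMillis in production

    ExpiringSessionStore(long timeToLiveMillis, LongSupplier clockMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clockMillis = clockMillis;
    }

    @Override
    public void put(String sessionId, Map<String, String> data) {
        // Store an immutable copy; expiry is measured from the last put().
        long expiresAt = clockMillis.getAsLong() + timeToLiveMillis;
        entries.put(sessionId, new Entry(Map.copyOf(data), expiresAt));
    }

    @Override
    public Map<String, String> get(String sessionId) {
        Entry entry = entries.get(sessionId);
        if (entry == null)
            return null;
        if (clockMillis.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(sessionId, entry);  // lazily drop the expired session
            return null;
        }
        return entry.data;
    }
}
```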
<p>If you&#8217;re particularly ambitious (traffic-wise), a <em>data grid</em> like <a href="http://www.oracle.com/technology/products/coherence/index.html">Oracle Coherence</a> is a solution to consider. Data grids come with everything you need to implement a high-performance distributed session store. And then some.</p>
<h4>Conclusion</h4>
<p>Use cookies for small pieces of possibly long-term data where security is not an issue. Cookies can also complement other approaches.</p>
<p>Otherwise, use whatever your web development framework offers. Sticky sessions are a powerful but simple solution when you&#8217;re starting to scale out your system. I wouldn&#8217;t turn to session replication if I could get away without it.</p>
<p>If things get rough, a dedicated session server or cluster may be your last resort. But then you shouldn&#8217;t need my advice anyway.</p>
<p>For further information see Martin Fowler&#8217;s <a href="http://martinfowler.com/books.html#eaa">Patterns of Enterprise Application Architecture</a>. It has a short section on session state patterns that covers client, server, and database session state.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2009/01/04/session-data-in-webapps/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Efficient Development Environment Setup</title>
		<link>http://unmaintainable.wordpress.com/2008/10/26/development-environments/</link>
		<comments>http://unmaintainable.wordpress.com/2008/10/26/development-environments/#comments</comments>
		<pubDate>Sun, 26 Oct 2008 12:10:27 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[build systems]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=98</guid>
		<description><![CDATA[Development environments and their configuration can become quite complex. It&#8217;s not unusual that a complete workstation setup takes half a day or more and requires extensive help from other project members. Using virtual machines for the runtime environment can help to reduce setup and maintenance costs.

For large Java web applications, you often need to set [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Development environments and their configuration can become quite complex. It&#8217;s not unusual that a complete workstation setup takes half a day or more and requires extensive help from other project members. Using virtual machines for the runtime environment can help to reduce setup and maintenance costs.</p>
<p><span id="more-98"></span></p>
<p>For large Java web applications, you often need to set up a database server (with a suitable database snapshot), an application server, a web container, maybe a frontend Apache server and other components before writing the first line of code. Every new developer on the project has to go through this exercise, and everybody has to keep the system current. More often than not, each system is a bit different, especially if developers use different operating systems or work on multiple projects, each with its own set of requirements.</p>
<p>The idea to end this pain is simple: create a virtual machine image that is as close to the deployment environment as possible. Install everything necessary to run your application. Set up bridged networking (as opposed to NAT) so that the services running inside the virtual machine are reachable from the host machine. Depending on your local network setup, you might be able to use DHCP and central user management. Things aren&#8217;t exactly trivial here, but a good sysadmin should be able to help. The good news is that you only have to do this once, and the base system can be reused for other projects as well.</p>
<p>Development takes place on the host machine, so developers can work with their favorite operating system and IDE. Each developer gets a copy of the virtual machine image (the application runtime environment) and can start working in no time. You build the application locally on the host system and deploy it to the virtual machine using a network share. That way, you standardize on the runtime side and people can still work with the tools of their choice.</p>
<p>Additionally, you can install the virtual machine image on a dedicated host to act as a central testing or demonstration system. You could even have your continuous integration system deploy to it.</p>
<p>Just try it; the little extra effort quickly pays off.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/10/26/development-environments/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Professional Deployment and Operation of Web Applications</title>
		<link>http://unmaintainable.wordpress.com/2008/09/07/webapp-deployment/</link>
		<comments>http://unmaintainable.wordpress.com/2008/09/07/webapp-deployment/#comments</comments>
		<pubDate>Sun, 07 Sep 2008 08:00:17 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[build systems]]></category>
		<category><![CDATA[deployment]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=89</guid>
		<description><![CDATA[There are millions of web applications on the Internet that are under constant development. Paying software developers to work on bug fixes and new features is quite expensive already, but what&#8217;s often neglected is the cost for deployment and operation. Well-run organizations invest in their deployment and runtime infrastructure and are rewarded with reduction of [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>There are millions of web applications on the Internet that are under constant development. Paying software developers to work on bug fixes and new features is quite expensive already, but what&#8217;s often neglected is the cost for deployment and operation. Well-run organizations invest in their deployment and runtime infrastructure and are rewarded with reduction of errors, shorter downtimes and lower costs in the long run.</p>
<p><span id="more-89"></span></p>
<p>Professional IT organizations run dozens of different applications on thousands of machines with a surprisingly small number of administrators. I&#8217;ve seen rollouts on 20+ machines during the day without any downtime; it&#8217;s just a matter of infrastructure.</p>
<p>In this article I&#8217;ll discuss some best practices I&#8217;ve learned in the field, concentrating on three important aspects: automation, monitoring, and standardization.</p>
<h4>Automation</h4>
<p>You should never need to ask yourself what it takes to roll out the latest release on production systems. Rollouts shouldn&#8217;t require arcane knowledge or lots of detailed installation instructions. In a good organization, there&#8217;s a general rollout process in place and things are automated as much as possible. A deployment process that involves admins doing an &quot;svn checkout&quot; in your web server&#8217;s htdocs directory and then manually adjusting configuration on each host is not exactly the most robust approach.</p>
<p>Your build system is the best starting point for improving things there. Provide a one-button build with different profiles for development, QA, and production systems. It should be absolutely painless to create a release artifact that relies on as little external configuration as possible. Don&#8217;t create a release from a developer&#8217;s workstation though. Use a dedicated, well-configured integration machine for that.</p>
<p>Usually it&#8217;s a good idea to store production configuration (maybe except passwords) in your source repository or configuration management database. You don&#8217;t want to lose your hand-crafted configuration if a machine crashes beyond repair. Release artifacts can be tarballs, WARs, ZIPs, even RPMs, but make sure you actually have a self-contained, versioned and installable artifact.</p>
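<p>One way to keep passwords out of the repository while versioning everything else is sketched below. It assumes a properties file under version control plus a password injected through an environment variable at deploy time; the property keys and the variable name APP_DB_PASSWORD are hypothetical.</p>

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Properties;

// Minimal sketch: non-secret settings come from a properties file that is kept
// under version control; the password is injected from the environment at deploy
// time. Property keys and the variable name APP_DB_PASSWORD are hypothetical.
class AppConfig {
    final String dbUrl;
    final String dbUser;
    final String dbPassword;

    private AppConfig(String dbUrl, String dbUser, String dbPassword) {
        this.dbUrl = dbUrl;
        this.dbUser = dbUser;
        this.dbPassword = dbPassword;
    }

    // `env` is a parameter (instead of calling System.getenv() here) so the method can be tested.
    static AppConfig load(Reader versionedProperties, Map<String, String> env) throws IOException {
        Properties props = new Properties();
        props.load(versionedProperties);
        return new AppConfig(
                require(props.getProperty("db.url"), "db.url"),
                require(props.getProperty("db.user"), "db.user"),
                require(env.get("APP_DB_PASSWORD"), "APP_DB_PASSWORD"));
    }

    // Fail fast at startup instead of running with half a configuration.
    private static String require(String value, String name) {
        if (value == null || value.isBlank())
            throw new IllegalStateException("Missing configuration: " + name);
        return value;
    }
}
```

<p>In production you would call AppConfig.load(reader, System.getenv()). Failing at startup on a missing value is deliberate: a release should not come up with half a configuration.</p>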
<p>Installing the software on multiple machines has to work with as little human interaction as possible, too. Nobody should have to log into the box and fiddle with configuration settings. Write scripts and test them thoroughly so that people trust them. Robustness and transparency (in case things go wrong) are key here.</p>
<p>If you&#8217;re working with Java web applications on Tomcat, for example, why not use a fresh Tomcat installation for each release? Copy it to a new directory on the target machine, shut down the old server and then start the new server that already contains your web application. No cruft is left between installations, no long-forgotten configuration, no questions why one host works and another one mysteriously doesn&#8217;t. You might even be able to revert to an old release in case of trouble (unless the new release changed the database schema, for example).</p>
<h4>Monitoring</h4>
<p>No application will work indefinitely like it did when it was first installed. To provide a reliable service, you need to be notified when (not if) things go wrong. Fortunately, with tools like Nagios the infrastructure isn&#8217;t difficult to set up.</p>
<p>Noticing a crashed machine is good but certainly not enough. There are many things that can go wrong in your application even if all your machines are running happily. Meaningful monitoring works on the application level, too, and of course it is highly application specific. You could test if external resources are still available (like databases or web services) or if important use cases still work (like an ordering process). Careful analysis is needed here.</p>
<p>Your application has to provide interfaces to the monitoring framework so that regular checks of the application&#8217;s health are possible. The interface can be JMX-based, or it might just be a web page with an easily parseable status format that&#8217;s only available from within your network. There are many ways; the difficult part is to figure out when your application works perfectly and when it doesn&#8217;t. Typically, you want to be notified if a web application generates 4xx or 5xx pages above a given threshold. But be careful: attackers can provoke such errors deliberately and use that to ruin your admins&#8217; Sundays.</p>
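<p>As a sketch of the threshold idea, the following class counts responses and reports whether the share of 4xx and 5xx responses exceeds a warning or critical level. The output is loosely modeled on the one-line status text of Nagios plugins; the class name and thresholds are hypothetical, and a real check would use a time window instead of counting since startup.</p>

```java
import java.util.Locale;

// Minimal sketch of an application-level health check based on the error ratio.
class ErrorRateCheck {
    private final double warningRatio;
    private final double criticalRatio;
    private long total;
    private long errors;

    ErrorRateCheck(double warningRatio, double criticalRatio) {
        this.warningRatio = warningRatio;
        this.criticalRatio = criticalRatio;
    }

    // Called once per response, e.g. from a servlet filter (not shown).
    synchronized void record(int httpStatus) {
        total++;
        if (httpStatus >= 400 && httpStatus <= 599)
            errors++;
    }

    // One-line status for the monitoring system to poll, e.g. via an internal status page.
    synchronized String status() {
        double ratio = total == 0 ? 0.0 : (double) errors / total;
        String state = ratio >= criticalRatio ? "CRITICAL"
                     : ratio >= warningRatio ? "WARNING"
                     : "OK";
        return String.format(Locale.ROOT, "%s - %.1f%% error responses (%d of %d)",
                state, ratio * 100, errors, total);
    }
}
```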
<h4>Standardization</h4>
<p>Standardization is important if you&#8217;re operating many similar applications. You typically want them running 24/7, and every sysadmin (even one who isn&#8217;t familiar with a particular application) should be able to perform basic tasks like checking its status, fixing minor problems, restarting it, or finding its documentation.</p>
<p>Good, simple things to standardize are the locations of program and data files, configuration, and log output. Provide documentation for developers to follow or, even better, a project template that already contains the framework necessary to make an application blend nicely into your environment.</p>
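<p>A project template could encode such conventions in one place. The following Python sketch derives all locations from the application name alone; the class, the function and the concrete directories (<em>/opt</em>, <em>/var/lib</em>, <em>/etc</em>, <em>/var/log</em>) are only one possible convention, not a requirement.</p>
<pre>
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppLayout:
    """Where an application keeps its files."""
    programs: str
    data: str
    config: str
    logs: str


def standard_layout(app_name, root="/"):
    """One possible convention, loosely following the Unix Filesystem
    Hierarchy Standard; adapt the directories to your own environment."""
    return AppLayout(
        programs=os.path.join(root, "opt", app_name),
        data=os.path.join(root, "var", "lib", app_name),
        config=os.path.join(root, "etc", app_name),
        logs=os.path.join(root, "var", "log", app_name),
    )
</pre>
<p>Startup scripts, logging setup and monitoring checks can then all compute the same paths, so a sysadmin who knows one application knows where to look in all of them.</p>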
<p>Work out a general deployment process to reduce the risk of rollouts. Even if you don&#8217;t have a dedicated rollout manager, you can still cut down the stress for everyone involved. Create a rollout plan for each release that contains all required information, such as the people involved, affected systems, step-by-step instructions, expected effects, and success or failure conditions. Don&#8217;t forget to provide a rollback plan in case the new release doesn&#8217;t work out as expected.</p>
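<p>If rollout plans are kept in a structured form, checking them for completeness can be automated. The following Python sketch mirrors the items listed above; the class and its field names are hypothetical.</p>
<pre>
from dataclasses import dataclass, field


@dataclass
class RolloutPlan:
    """The information a rollout plan should contain for each release."""
    release: str
    people: list = field(default_factory=list)            # who is involved
    affected_systems: list = field(default_factory=list)
    steps: list = field(default_factory=list)             # step-by-step instructions
    expected_effects: str = ""
    success_criteria: str = ""
    failure_criteria: str = ""
    rollback_steps: list = field(default_factory=list)    # the rollback plan

    def missing_sections(self):
        """Names of all sections that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]
</pre>
<p>A release would only be scheduled once <em>missing_sections()</em> returns an empty list, which in particular enforces a rollback plan.</p>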
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/09/07/webapp-deployment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Why Do We Build Web Applications?</title>
		<link>http://unmaintainable.wordpress.com/2008/07/26/why-web-applications/</link>
		<comments>http://unmaintainable.wordpress.com/2008/07/26/why-web-applications/#comments</comments>
		<pubDate>Sat, 26 Jul 2008 13:42:14 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=72</guid>
		<description><![CDATA[Creating a good user interface is no trivial task, no matter if it&#8217;s running as a desktop application or inside a browser. When it comes to accessing server-side resources (a common thing in the corporate world) web applications seem to be the first choice nowadays. You have complete control over deployment and in theory platform [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Creating a good user interface is no trivial task, whether it runs as a desktop application or inside a browser. When it comes to accessing server-side resources (a common thing in the corporate world), web applications seem to be the first choice nowadays. You have complete control over deployment and, in theory, platform independence.</p>
<p>However, there are lots of disadvantages, too, and I don&#8217;t understand why the alternatives are so often not even considered.</p>
<p><span id="more-72"></span></p>
<p>When building a sophisticated web application, you always have to work around the stateless nature of HTTP, deal with browser quirks, JavaScript&#8217;s idiosyncrasies and other limitations. Sure, frameworks exist to hide all that from you, adding layers of abstraction and complexity. In the best case, web development feels almost like traditional client-side programming. The question is, why do we put up with this?</p>
<p>Don&#8217;t get me wrong, I&#8217;m not talking about classic web sites here. Not even about community sites where users generate content. I&#8217;m talking about all those countless in-house applications that are used for data entry or management. Corporate content management systems are a good example: They usually come with a more or less sophisticated web UI, but the number of people actually allowed to edit content is small, so the deployment argument doesn&#8217;t apply.</p>
<p>In many cases, a rich client is a more suitable and cheaper solution. If your users are a relatively small on-site group, providing them with up-to-date clients shouldn&#8217;t be a logistical problem. Technologies like the <a href="http://wiki.eclipse.org/index.php/Rich_Client_Platform">Eclipse Rich Client Platform (RCP)</a>, or <a href="http://java.sun.com/javase/technologies/desktop/javawebstart/">Java Web Start</a> for even easier deployment, should be considered. I&#8217;m quite confident that they would reduce development costs considerably and, if done right, could result in a far more powerful and ergonomic user environment. In web applications you have to reach for Ajax (adding a second programming language to your project) and still won&#8217;t get anywhere near a classic desktop application.</p>
<p>Maybe the problem is that there are too many web developers out there &#8230;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/07/26/why-web-applications/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Thoughts on Collective Code Ownership</title>
		<link>http://unmaintainable.wordpress.com/2008/05/15/collective-code-ownership/</link>
		<comments>http://unmaintainable.wordpress.com/2008/05/15/collective-code-ownership/#comments</comments>
		<pubDate>Thu, 15 May 2008 12:35:07 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=54</guid>
		<description><![CDATA[Agile software development methodologies like Extreme Programming (XP) promote collective code ownership: Every developer is allowed (and encouraged) to make changes wherever necessary. But is this really a realistic, useful approach?

The theory sounds compelling: Everybody knows their way around the code base and can work on anything. Without module owners, a potential bottleneck is eliminated [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Agile software development methodologies like Extreme Programming (XP) promote collective code ownership: Every developer is allowed (and encouraged) to make changes wherever necessary. But is this really a realistic, useful approach?</p>
<p><span id="more-54"></span></p>
<p>The theory sounds compelling: Everybody knows their way around the code base and can work on anything. Without module owners, a potential bottleneck is eliminated and an absent programmer doesn&#8217;t block progress anymore. However, as I pointed out in the comments to <a href="http://dlinsin.blogspot.com/2008/05/do-you-care-about-your-code.html">this article</a>, it all depends on your team.</p>
<h4>When It Works</h4>
<p>Judging from my own experience, collective code ownership needs several preconditions:</p>
<ul>
<li>Team members are on a similar skill level</li>
<li>Programmers work carefully and trust each other</li>
<li>The code base is in a good state</li>
<li>Unit tests are in place to detect problematic changes</li>
</ul>
<p>If skill levels diverge widely, more experienced programmers usually won&#8217;t trust their less experienced colleagues. When code is modified, there&#8217;s always the risk of breaking things (and unit tests can only go so far).</p>
<p>But even if the environment is suitable, true collective code ownership is hardly realistic. In practice, there are always experts who are more proficient in some areas of the code. You can make changes there, but difficult bug fixes or major new features should be coordinated with them. Otherwise you&#8217;d just waste your time or achieve less than optimal results.</p>
<p>In XP, pair programming provides real-time code reviews that work as a safety net. As long as the pilot or co-pilot knows the code they are working on, major damage won&#8217;t happen that easily. However, if pair programming isn&#8217;t used, a different review technique is needed to catch mistakes early on. In some critical areas (e.g. performance hotspots or security-related parts), it may even make sense to prohibit changes without team consent.</p>
<h4>When It Doesn&#8217;t</h4>
<p>When skill levels or areas of expertise diverge widely, collective code ownership may not be your best bet. Experts will always achieve better results in their area and you usually don&#8217;t have the time to get everyone on the team up to speed.</p>
<p>With the collective ownership model, &quot;broken windows&quot; can be fixed quickly on the fly, which helps to save your code from decay. However, if your code base is already in bad shape and you don&#8217;t have a test suite in place to protect you from accidental damage, things are different. Strict code ownership will yield better results here since it&#8217;s less dangerous. The code may not be pretty, but at least some parts work, and there&#8217;s no point in taking any risks, especially when you&#8217;re in a hurry.</p>
<p>Strict code ownership protects high-quality code from mistakes by weaker programmers and confines bad code to a limited number of modules (quarantine areas, cynically speaking). With this containment strategy, you can direct debugging and end-game cleanup towards those problematic modules first, since bugs are more likely to show up there.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/05/15/collective-code-ownership/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>A Metadata Format For CSV Files</title>
		<link>http://unmaintainable.wordpress.com/2008/04/26/metadata-for-csv/</link>
		<comments>http://unmaintainable.wordpress.com/2008/04/26/metadata-for-csv/#comments</comments>
		<pubDate>Sat, 26 Apr 2008 15:41:28 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/2008/04/26/metadata-for-csv/</guid>
		<description><![CDATA[Using CSV files in batch processing applications has many advantages, most prominently interoperability between programming languages and tools. One of its weaker points is data integrity though. The format has no way to declare data types or additional metadata other than assigning names to data fields using a header.
The simple metadata format proposed in this [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Using CSV files in batch processing applications has many advantages, most prominently interoperability between programming languages and tools. One of its weaker points is data integrity though. The format has no way to declare data types or additional metadata other than assigning names to data fields using a header.</p>
<p>The simple metadata format proposed in this article can help to mitigate this disadvantage.</p>
<p><span id="more-52"></span></p>
<h4>A Case for CSV</h4>
<p>First of all, why would anyone use such a simple plain text format? After all, it&#8217;s just a semi-structured collection of records.</p>
<p>Sure, CSV isn&#8217;t exactly the most sophisticated data format available, but it has many advantages that make up for this flaw:</p>
<ul>
<li>CSV is simple and well-understood.</li>
<li>Libraries are available for many programming languages, and where they are missing, they are easy to write.</li>
<li>It can be analysed in a text editor.</li>
<li>Samples of a file can be loaded into a spreadsheet application.</li>
<li>Files can be processed using standard Unix utilities.</li>
<li>It is easy to split CSV files into individual, self-contained chunks.</li>
</ul>
<p>See <a href="http://www.pragmaticprogrammer.com/the-pragmatic-programmer">The Pragmatic Programmer</a>, &quot;The Power of Plain Text&quot; (chapter 3) for a more detailed discussion.</p>
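<p>The last point of the list is worth a closer look. Since a CSV file is just a header line followed by independent records, splitting it only requires repeating the header in every chunk. A minimal Python sketch using the standard <em>csv</em> module, assuming tab-separated input; the function name and the chunk naming scheme are made up for this example:</p>
<pre>
import csv
import itertools


def split_csv(src_path, prefix, rows_per_chunk, delimiter="\t"):
    """Split one CSV file into self-contained chunks named prefix_1.csv,
    prefix_2.csv, ...; every chunk repeats the header line so it can be
    processed independently of the others."""
    paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)
        while True:
            # Only one chunk's worth of records is held in memory.
            rows = list(itertools.islice(reader, rows_per_chunk))
            if not rows:  # input exhausted
                break
            path = "%s_%d.csv" % (prefix, len(paths) + 1)
            with open(path, "w", newline="") as out:
                writer = csv.writer(out, delimiter=delimiter)
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(path)
    return paths
</pre>
<p>Unlike a naive line-based split, the <em>csv</em> module keeps quoted fields with embedded line breaks intact.</p>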
<p>Unfortunately, there are disadvantages, too. Parsing CSV is usually a bit more expensive than parsing a carefully designed binary format (though it&#8217;s much cheaper than parsing XML). And since the ASCII representation of a number can easily be three times as big as its usual binary representation, CSV files tend to be rather large.</p>
<p>Fortunately, streaming compressors like gzip typically reduce files to about 20% of their original size, even at the lowest compression level. Compression saves disk space and network bandwidth, but it comes at a slightly increased processing cost. In many scenarios, however, the added overhead is negligible compared to the cost of the actual processing.</p>
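<p>In Python, for example, the standard <em>gzip</em> module can be combined with the <em>csv</em> module so that records are compressed and decompressed on the fly, without an uncompressed copy ever touching the disk. A minimal sketch; the function names are hypothetical, and the achievable compression ratio depends entirely on your data:</p>
<pre>
import csv
import gzip


def write_gzipped_csv(path, header, rows, level=1, delimiter="\t"):
    """Stream rows into a gzip-compressed CSV file. Level 1 is the
    fastest gzip level that still compresses."""
    with gzip.open(path, "wt", compresslevel=level, newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)  # rows may be any iterable, e.g. a generator


def read_gzipped_csv(path, delimiter="\t"):
    """Yield the records of a gzip-compressed CSV file one at a time."""
    with gzip.open(path, "rt", newline="") as src:
        yield from csv.reader(src, delimiter=delimiter)
</pre>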
<h4>Using a Simple Metadata Format</h4>
<p>There are basically three places where metadata can reside:</p>
<ul>
<li>Inside the data file.</li>
<li>As a file next to the actual data files.</li>
<li>In a remote metadata repository.</li>
</ul>
<p>If the data file format is extended to include metadata, we&#8217;d have to abandon the CSV format together with its advantages listed above. A remote metadata repository may be useful, but testing is a lot easier if you aren&#8217;t coupled to network resources, so a file next to the data seems to be the way to go.</p>
<p>The metadata format proposed here uses XML because it is human readable, well-understood and has excellent tool support (that should sound familiar). The format serves several purposes:</p>
<ul>
<li>It declares the overall representation of CSV files (field separator, whether compression is used etc.).</li>
<li>It lists the set of files that make up the entire collection of records.</li>
<li>It defines the data fields and their respective types.</li>
</ul>
<p>Here&#8217;s an example of how it looks (see the <a href="http://musicbrainz.org/~matt/misc/meta-0.1.rng">Relax NG Schema</a> for the semi-formal definition):</p>
<pre>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;

&lt;meta xmlns=&quot;http://mafr.de/ns/meta-0.1#&quot;&gt;
  &lt;file-location format=&quot;csv&quot; compression=&quot;gzip&quot; separator=&quot;\t&quot;&gt;
    &lt;chunk location=&quot;data_1.csv.gz&quot; size=&quot;123456&quot;/&gt;
    &lt;chunk location=&quot;data_2.csv.gz&quot; size=&quot;34354&quot;/&gt;
    &lt;!-- more chunk declarations ... --&gt;
    &lt;chunk location=&quot;data_N.csv.gz&quot; size=&quot;7890123&quot;/&gt;
  &lt;/file-location&gt;

  &lt;data-fields&gt;
    &lt;string-field name=&quot;USER_ID&quot;/&gt;
    &lt;string-field name=&quot;NAME&quot;/&gt;
    &lt;continuous-field name=&quot;AGE&quot; missing=&quot;0&quot;/&gt;
    &lt;categorical-field name=&quot;COUNTRY&quot; missing=&quot;?&quot;&gt;
      &lt;category value=&quot;DE&quot;/&gt;
      &lt;category value=&quot;UK&quot;/&gt;
      &lt;category value=&quot;OTHERS&quot;/&gt;
    &lt;/categorical-field&gt;
  &lt;/data-fields&gt;
&lt;/meta&gt;
</pre>
<p>This format is pretty much self-explanatory, except for the data field declarations, which use data mining terminology. The <em>continuous</em> type is for floating point numeric values, while the <em>categorical</em> type can be compared to enums in programming languages like C. The optional <em>missing</em> attribute defines how an unknown (aka NULL) value is represented. Of course, other or additional data types with arbitrary restrictions could be defined.</p>
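<p>Reading the metadata takes only a few lines with a standard XML API. The following Python sketch uses <em>xml.etree.ElementTree</em> from the standard library and turns a file like the example above into plain dictionaries. The dictionary keys are made up for this sketch, and it assumes that the separator attribute uses the two-character escape <em>\t</em> for a tab, as the example does.</p>
<pre>
import xml.etree.ElementTree as ET

NS = "{http://mafr.de/ns/meta-0.1#}"  # ElementTree's notation for the namespace


def load_metadata(path):
    """Read a metadata file like the example above into plain Python data."""
    root = ET.parse(path).getroot()
    loc = root.find(NS + "file-location")
    separator = loc.get("separator", ",")  # assumption: comma if not declared
    if separator == "\\t":  # the example spells a tab as the two characters \t
        separator = "\t"
    chunks = [(c.get("location"), int(c.get("size")))
              for c in loc.findall(NS + "chunk")]
    fields = []
    for f in root.find(NS + "data-fields"):  # XML comments are not part of the tree
        fields.append({
            "name": f.get("name"),
            "type": f.tag[len(NS):].replace("-field", ""),
            "missing": f.get("missing"),  # None if the attribute is absent
            "categories": [c.get("value") for c in f.findall(NS + "category")],
        })
    return {"format": loc.get("format"), "compression": loc.get("compression"),
            "separator": separator, "chunks": chunks, "fields": fields}
</pre>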
<p>The data fields have to be listed in the order they appear in the CSV files, though reading applications may choose to accept any order. Field names must be unique and, for obvious reasons, must not contain the field separator.</p>
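<p>Both rules, as well as the choice between strict and relaxed field order, translate directly into code. A minimal Python sketch; the function names are hypothetical:</p>
<pre>
def check_field_names(names, separator):
    """Check declared field names: they must be unique and must not contain
    the field separator. Returns a list of problems (empty if all is well)."""
    problems = []
    seen = set()
    for name in names:
        if name in seen:
            problems.append("duplicate field name: " + name)
        seen.add(name)
        if separator in name:
            problems.append("field name contains the separator: " + repr(name))
    return problems


def header_matches(header, names, any_order=False):
    """Compare a CSV header with the declared fields, either strictly in
    declaration order or, if the reader allows it, in any order."""
    if any_order:
        return sorted(header) == sorted(names)
    return list(header) == list(names)
</pre>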
<p>Depending on policy, applications can either ignore unknown elements (or attributes) or flag an error. There&#8217;s also the option of declaring optional parts of the format in a different XML namespace. Some applications can use those elements while others may ignore them safely.</p>
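<p>Both policies, and the namespace trick for optional parts, are easy to implement. A minimal Python sketch that works on elements parsed with the standard <em>xml.etree.ElementTree</em> module; the function name and the <em>strict</em> flag are hypothetical:</p>
<pre>
NS = "{http://mafr.de/ns/meta-0.1#}"  # the base namespace of the format


def unknown_children(elem, known_tags, strict=False):
    """Check the children of an ElementTree element against the known tag
    names. Elements from other namespaces are treated as optional extensions
    and skipped; unknown elements in the base namespace raise ValueError in
    strict mode and are returned as a list otherwise."""
    unknown = []
    for child in elem:
        if not isinstance(child.tag, str) or not child.tag.startswith(NS):
            continue  # a comment node, or an extension from another namespace
        if child.tag[len(NS):] not in known_tags:
            if strict:
                raise ValueError("unknown element: " + child.tag)
            unknown.append(child.tag)
    return unknown
</pre>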
<p>There are several ways of extending XML schemas, either by explicitly allowing unvalidated content (usually in a different namespace) or by including the basic schema from another, more specialized schema that extends definitions as necessary. Updating the base schema can be done via namespace and/or schema versioning, but since this isn&#8217;t entirely trivial I&#8217;ll leave it for a future article.</p>
<h4>A Note on Container Formats</h4>
<p>When you split your files into multiple chunks for parallel processing, you end up with lots of files. To avoid confusion and to simplify transfer between systems you might be tempted to use a container format that packages all your data into a single file (using tar, for example). This may work in some cases, but if your data files are large and created in parallel, the creation of the container is a long and I/O-intensive operation. In batch operations this causes a significant overhead that you have to subtract from your time window.</p>
<p>A compromise is to use a natural but cheap container format: A file system directory. It may only be a &quot;virtual&quot; container, but combined with a proper delivery protocol, it&#8217;s still useful.</p>
<h4>The Delivery Protocol</h4>
<p>If you&#8217;re handing collections of CSV files from one system to another, make sure you follow a simple protocol: Copy the data files first and the metadata file last. The delivery isn&#8217;t complete until the metadata file is there. The best approach is to take advantage of the atomic rename operation many file systems provide (see the rename(2) syscall on Unix): Copy the metadata to a temporary file in the target directory, then rename it to its final name. That way the receiver will never try to read an incomplete file. Note that the temporary file has to be on the same file system, because rename(2) cannot move files across file systems.</p>
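<p>A minimal Python sketch of the sending side, assuming that the destination directory is accessible as a local path (for example via a mounted file system); the function name and the temporary file name are made up. On Unix, <em>os.replace</em> is implemented with rename(2).</p>
<pre>
import os
import shutil


def deliver(data_files, meta_file, dest_dir):
    """Hand a collection of CSV chunks to another system: data files first,
    metadata last, published with an atomic rename."""
    for path in data_files:
        shutil.copy2(path, dest_dir)
    name = os.path.basename(meta_file)
    final = os.path.join(dest_dir, name)
    # The temporary file lives in the destination directory itself, so the
    # rename stays within one file system and is therefore atomic.
    tmp = os.path.join(dest_dir, "." + name + ".tmp")
    shutil.copyfile(meta_file, tmp)
    os.replace(tmp, final)
    return final
</pre>
<p>The receiver only watches for the final metadata file name; as long as it is missing, the delivery is still in progress.</p>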
<h4>The Verification Process</h4>
<p>With the information contained in the metadata file, it is easy to verify the CSV files as thoroughly as required. The most basic check only makes sure that all of the listed files are there and have the declared file sizes. A simple consistency check parses the headers to see whether all data fields are there, which also detects an incorrect field separator.</p>
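<p>The two cheap checks might look like this in Python. The sketch assumes that the <em>size</em> attribute holds the size of the (possibly compressed) file in bytes and that chunk locations are relative to the delivery directory; the function name and the list of <em>(location, size)</em> pairs as input are hypothetical.</p>
<pre>
import csv
import gzip
import os


def verify_chunks(directory, chunks, field_names, separator, compression=None):
    """Check that every declared chunk exists, has the declared size, and
    starts with a header listing the declared fields in order.
    Returns a list of problems (empty if everything looks fine)."""
    problems = []
    opener = gzip.open if compression == "gzip" else open
    for location, size in chunks:
        path = os.path.join(directory, location)
        if not os.path.isfile(path):
            problems.append("missing: " + location)
            continue
        if os.path.getsize(path) != size:
            problems.append("wrong size: " + location)
            continue
        with opener(path, "rt", newline="") as f:
            header = next(csv.reader(f, delimiter=separator), [])
        if header != list(field_names):
            # A wrong separator typically shows up here as one long field.
            problems.append("unexpected header in %s: %r" % (location, header))
    return problems
</pre>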
<p>The most thorough check would then go through all of the files and make sure the data fields match their declarations. Since this is extremely expensive, it should usually be a side effect of regular processing rather than an up-front operation.</p>
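<p>A generator is a natural way to make validation a side effect of regular processing: the consumer simply iterates over checked, converted records. A minimal Python sketch; the in-memory field representation (dictionaries with the keys <em>name</em>, <em>type</em>, <em>missing</em> and <em>categories</em>) is an assumption made for this example:</p>
<pre>
def validated_rows(reader, fields):
    """Wrap a csv.reader positioned after the header and yield each record
    converted to Python values, checking it against its field declaration
    on the way. Raises ValueError on the first violation."""
    for record_no, row in enumerate(reader, start=1):
        if len(row) != len(fields):
            raise ValueError("record %d: expected %d fields, got %d"
                             % (record_no, len(fields), len(row)))
        values = []
        for field, raw in zip(fields, row):
            if raw == field.get("missing"):
                values.append(None)  # the declared NULL representation
            elif field["type"] == "continuous":
                try:
                    values.append(float(raw))
                except ValueError:
                    raise ValueError("record %d: %s is not numeric: %r"
                                     % (record_no, field["name"], raw))
            elif field["type"] == "categorical" and raw not in field.get("categories", ()):
                raise ValueError("record %d: unknown %s category: %r"
                                 % (record_no, field["name"], raw))
            else:
                values.append(raw)  # strings and valid categories pass through
        yield values
</pre>
<p>Processing code then reads from <em>validated_rows(reader, fields)</em> instead of the raw reader and stops at the first record that violates its declaration.</p>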
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/04/26/metadata-for-csv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>Getting Started With Existing Code</title>
		<link>http://unmaintainable.wordpress.com/2008/02/23/getting-started-with-existing-code/</link>
		<comments>http://unmaintainable.wordpress.com/2008/02/23/getting-started-with-existing-code/#comments</comments>
		<pubDate>Sat, 23 Feb 2008 21:46:18 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/?p=47</guid>
		<description><![CDATA[Software developers often have to work with existing code bases, whether it&#8217;s for joining an ongoing development effort or for maintenance work on a legacy application. Getting familiar with foreign code takes time and can be a frustrating experience. In this article I&#8217;m going to describe my strategies for getting up to speed quickly.  [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Software developers often have to work with existing code bases, whether it&#8217;s for joining an ongoing development effort or for maintenance work on a legacy application. Getting familiar with foreign code takes time and can be a frustrating experience. In this article I&#8217;m going to describe my strategies for getting up to speed quickly.  </p>
<p><span id="more-47"></span></p>
<h4>Gather Information</h4>
<p>First of all, get in touch with the people who know the system best. Those who originally designed and built it can often provide valuable insights, as can people from IT or operations. Be communicative: Don&#8217;t waste your time struggling endlessly; ask for their advice on what&#8217;s important and what isn&#8217;t.</p>
<p>Make a list of all the information that&#8217;s available for the given system and find out what&#8217;s outdated and what is still relevant. The information you typically need includes original visions, business concepts, use cases, and presentations, as well as technical documents describing the architecture, subsystems, patterns used, or even key algorithms. Documentation may reside on file servers, wikis, source code repositories, internal content management systems, or just as printouts on paper. Try to get a comprehensive list.</p>
<h4>Use a Top-Down Approach</h4>
<p>Learn the system&#8217;s business domain, make yourself familiar with the overall architecture and have a look at how the domain&#8217;s concepts map to the software. Try to understand the <em>general idea</em> that&#8217;s behind the system and the kinds of problems it was designed to solve. Often that&#8217;s particularly difficult to extract from documentation. The most valuable information is usually in other people&#8217;s heads, so get them to share it with you.</p>
<p>Get a good understanding of the system&#8217;s environment: The kind of hardware it runs on, the number of users it serves, the amount of data it produces etc. Get an overview of the most important libraries and frameworks the system uses. Usually it&#8217;s extremely helpful to understand its UI (if there is one) and its interfaces to other systems to see which services it provides and consumes.</p>
<p>As soon as you have worked through the high-level architecture, it&#8217;s time to check out the source code from whatever repository there is.</p>
<h4>Get the Code Running</h4>
<p>Set up a working development environment and make yourself familiar with the build system. Especially for older systems this can be surprisingly difficult: You might have to dig out ancient development tools because the newer ones are incompatible. Ask for assistance when you need it; it&#8217;s easy to waste a lot of time here. Make sure to document everything to save future team members the time and hassle.</p>
<p>Find out how to test the system. Go all the way from running the test suite (if there is one) up to an end-to-end integration test. If there are databases or other dependencies in the development or testing environment, check whether they are still up to date. This is extremely important: If you fix a bug later, you need to be confident that the system still works and that you didn&#8217;t break anything in the process.</p>
<p>When maintaining legacy systems, it&#8217;s often necessary to create a new release quickly after bugs have been fixed. Make yourself familiar with the release process from both the technical and the organizational perspective, and find out how the system&#8217;s deployment works. That&#8217;ll save you from nasty surprises when short response times are needed.</p>
<p>If your job is maintenance, you have to be able to react quickly to service requests or bug reports. First hand information is valuable, so try to get accounts for the production environment if possible. Access to log files and databases is very helpful as the operations or IT staff doesn&#8217;t always provide all the information you need.</p>
<h4>Read the Code (Some of It)</h4>
<p>Up to now I haven&#8217;t talked much about reading the actual code. That&#8217;s no coincidence: I think there&#8217;s not much to gain from it, especially in maintenance of legacy systems. Even for small code bases it&#8217;s hardly feasible to go through the code and understand it all, line by line. That would usually just be a waste of time. Problems will likely turn up in areas you&#8217;ve never seen, so concentrate your efforts on understanding the organization of the code and a few selected key areas.</p>
<p>Often it&#8217;s helpful to pick a single interesting use case and trace it through the system at source level. Draw a few diagrams and ask your colleagues if they make sense. That way you&#8217;ll get a feeling for the architecture and implementation. At the same time you learn the coding style and conventions.</p>
<p>In an ongoing development project you&#8217;ll get down to coding soon enough. Writing a few test cases is a great help for getting familiar with the internal APIs. As soon as you feel comfortable (or maybe a bit sooner) get a few smaller assignments to dive in a bit deeper and read the related code as you go. It&#8217;s far more efficient than spending lots of time browsing through code at the beginning.</p>
<h4>Share Your Knowledge</h4>
<p>Getting into an existing project can be highly interesting: There is a lot to be learned even from the worst and most obfuscated applications out there. If documentation is bad you will probably need a fair amount of help from your colleagues. Don&#8217;t let it discourage you, instead make sure to write down your findings and use your fresh perspective to improve things. The next developer to go your way will love you for it.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2008/02/23/getting-started-with-existing-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
		<item>
		<title>(No) Comment?!</title>
		<link>http://unmaintainable.wordpress.com/2007/08/20/no-comment/</link>
		<comments>http://unmaintainable.wordpress.com/2007/08/20/no-comment/#comments</comments>
		<pubDate>Mon, 20 Aug 2007 15:27:30 +0000</pubDate>
		<dc:creator>mafr</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[quality]]></category>

		<guid isPermaLink="false">http://unmaintainable.wordpress.com/2007/08/20/no-comment/</guid>
		<description><![CDATA[Many software developers feel bad because they make little use of comments in their code. Often, using lengthy comments is considered good style. In the old days, with languages like C or assembler, things got messy pretty fast, so comments were the only way to keep track of processor registers or pointer arithmetic. In modern [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Many software developers feel bad because they make little use of comments in their code. Often, using lengthy comments is considered good style. In the old days, with languages like C or assembler, things got messy pretty fast, so comments were the only way to keep track of processor registers or pointer arithmetic. In modern programming languages, more powerful abstractions are available. The question is, does this change the strategy of commenting your source code?</p>
<p><span id="more-29"></span></p>
<p>First of all, I&#8217;d like to distinguish between <em>interface documentation</em> and actual <em>code comments</em>. Interface documentation is often generated from specially formatted code comments, but it isn&#8217;t a comment in the usual sense. I&#8217;ll discuss interface documentation separately later in this article.</p>
<p>In agile software development processes, you usually try to keep the number of supporting artifacts low. Lengthy analysis documents, fancy diagrams and the like tend to become outdated quickly, because the code base is constantly being refactored. Keeping artifacts up to date takes time and discipline, and you can&#8217;t be agile if you&#8217;re carrying around too much weight.</p>
<p>Comments are not much different. Although they live inside the code base, they can be seen as artifacts, too: they have to be kept up to date, and they cause confusion if they aren&#8217;t. If a piece of code and its comment diverge, you can never be sure whether the code has a bug or the comment is simply outdated. That situation is actually worse than having no comment at all.</p>
<p>Based on books about software design (most notably Martin Fowler&#8217;s <em>Refactoring</em> and Eric Evans&#8217;s <em>Domain-Driven Design</em>) and my own experience, I have worked out the following strategy:</p>
<p>I try to avoid writing comments whenever possible. If a piece of code gets complex and can&#8217;t be understood easily, I try to refactor it to make it simpler. Decomposing a method and using intention-revealing variable and method names usually makes the code easier to read. Only as a last resort, when a piece of code is inherently complicated, do I add comments.</p>
<p>Generally, I favour comments on <em>why</em> something is done over those explaining <em>how</em> it is done; it&#8217;s the code&#8217;s job to communicate the &#8220;how&#8221;. Notes on failed attempts, and on why they failed, are valuable too, because they help later maintainers avoid repeating old mistakes.</p>
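<p>A small Python sketch of this kind of refactoring, assuming a hypothetical <code>Customer</code> record and made-up discount rules: the first version needs a comment to explain its condition, while the second extracts the parts of the condition into functions and constants whose names carry the same information. Both versions compute the same result.</p>

```python
from dataclasses import dataclass

# Hypothetical business rules, for illustration only.
SENIOR_AGE = 65
LONG_TIME_MEMBER_YEARS = 5
LOYALTY_DISCOUNT = 0.10


@dataclass
class Customer:
    age: int
    years_as_member: int
    has_overdue_invoices: bool


# Before: the condition only makes sense thanks to the comment.
def discount_rate_before(c):
    # seniors and long-time members get 10% off, unless they owe us money
    if (c.age >= 65 or c.years_as_member >= 5) and not c.has_overdue_invoices:
        return 0.10
    return 0.0


# After: small functions with intention-revealing names replace the comment.
def is_senior(customer):
    return customer.age >= SENIOR_AGE


def is_long_time_member(customer):
    return customer.years_as_member >= LONG_TIME_MEMBER_YEARS


def is_in_good_standing(customer):
    return not customer.has_overdue_invoices


def discount_rate(customer):
    deserves_loyalty_discount = is_senior(customer) or is_long_time_member(customer)
    if deserves_loyalty_discount and is_in_good_standing(customer):
        return LOYALTY_DISCOUNT
    return 0.0
```

<p>The refactored version is longer, but each piece can be read on its own, and the rule no longer depends on a comment staying in sync with the code.</p>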
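<p>To illustrate the difference, here is a Python sketch of a hypothetical polling helper (<code>wait_until</code> is made up, not from any library). A &#8220;how&#8221; comment such as <code># add the timeout to the current time</code> would merely repeat the code. The comment below instead explains <em>why</em> the code looks the way it does and warns a later maintainer away from an obvious but wrong simplification.</p>

```python
import time


def wait_until(predicate, timeout_seconds, poll_interval=0.05):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the predicate became true in time, False otherwise.
    """
    # Why monotonic() and not time.time(): the wall clock can be adjusted
    # (NTP corrections, manual changes), which would make the measured
    # elapsed time jump or even go backwards. Don't "simplify" this.
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```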
<p>When it comes to interface documentation, I distinguish between <em>private</em>, <em>public</em>, and <em>published interfaces</em>. Inside a class, where there are many small private methods, it&#8217;s hardly feasible to document everything, and in-depth documentation would hold me back from necessary refactorings. Much the same holds for the majority of public methods, but I try to write at least a few sentences about each class, its purpose, and its responsibilities.</p>
<p>Things are completely different with published interfaces. A published interface consists of all classes, interfaces, and methods (or other public features) that are meant to be used by external clients. For these, I document classes, their invariants, method parameters, constraints, etc. extensively. Such interfaces typically don&#8217;t change much over time, so the effort is certainly worth it. When a published interface does change, it&#8217;s often a major effort that requires changing client code, and in that case updating the documentation is unavoidable anyway.</p>
<p><em>Assertions</em> are another useful way of documenting code. They can check preconditions and state the caller&#8217;s responsibilities at the same time. I run my test suites with assertions enabled to catch bugs, but disable them in production environments.</p>
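<p>Here is a rough Python sketch of that distinction, using a hypothetical <code>Account</code> class as part of a published interface. Docstrings play the role of interface documentation here, and the Google-style <code>Args</code>/<code>Raises</code> sections are just one common convention. The public members document the invariant, parameters, and failure modes; the private helper gets only a short note.</p>

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the current balance."""


class Account:
    """An account holding a balance in whole cents.

    Part of the (hypothetical) published interface: external clients
    may rely on everything documented here.

    Invariant:
        ``balance`` is never negative.
    """

    def __init__(self, balance=0):
        """Create an account with an initial balance.

        Args:
            balance: Initial balance in cents; must be a non-negative int.

        Raises:
            ValueError: If ``balance`` is negative.
        """
        self._require_non_negative(balance)
        self._balance = balance

    @property
    def balance(self):
        """The current balance in cents; never negative."""
        return self._balance

    def withdraw(self, amount):
        """Withdraw ``amount`` cents and return the new balance.

        Args:
            amount: Amount in cents; must be a non-negative int.

        Returns:
            The balance after the withdrawal.

        Raises:
            ValueError: If ``amount`` is negative.
            InsufficientFundsError: If ``amount`` exceeds the balance.
                The balance is left unchanged in that case.
        """
        self._require_non_negative(amount)
        if amount > self._balance:
            raise InsufficientFundsError(
                f"cannot withdraw {amount} from balance {self._balance}"
            )
        self._balance -= amount
        return self._balance

    # Private helper: a descriptive name instead of full documentation,
    # so it can be renamed or removed during refactoring without ceremony.
    @staticmethod
    def _require_non_negative(value):
        if value < 0:
            raise ValueError(f"expected a non-negative amount, got {value}")
```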
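<p>In Python, for instance, <code>assert</code> statements are removed when the interpreter runs with the <code>-O</code> flag, which gives you exactly this split between test and production runs. The following sketch (the function name is made up) uses an assertion to state a precondition that is too expensive to check on every call in production. Because assertions can be switched off, they should only catch programming errors, never validate user input or carry logic the program depends on.</p>

```python
import bisect


def contains_sorted(items, target):
    """Return True if target occurs in items, which must be sorted ascending.

    Uses binary search, so a lookup is O(log n), but only if the caller
    keeps its promise about the ordering.
    """
    # Precondition: sortedness is the caller's responsibility. Checking it
    # costs O(n), which is fine in test runs; `python -O` strips this line.
    assert all(a <= b for a, b in zip(items, items[1:])), "items must be sorted"
    i = bisect.bisect_left(items, target)
    return i < len(items) and items[i] == target
```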
</div>]]></content:encoded>
			<wfw:commentRss>http://unmaintainable.wordpress.com/2007/08/20/no-comment/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/59c9677a3b9569af44561adab6c2a980?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mafr</media:title>
		</media:content>
	</item>
	</channel>
</rss>