Should I refactor this piece of code?

Most of you will probably be familiar with the concept of refactoring. It is the process of modifying an exisiting piece of code in order to improve it's readibility or structure while preserving it's original meaning or behavior. Automated testing makes refactoring a lot easier, as you can see if the changes you make to the code break it's original functionality. Anyway, this post was actually going to be about a little Helper function I wrote and felt like sharing and not about refactoring. But as it's the case with most of my code, the first step before blogging about it, is to refactor it for readability. But often I also tend to refactor code on here to be more generic / reusable then it needs to be for my actual application to make it more attractive to a broader audience. This was also the case with the little Helper function that can be used to convert a given amount of bytes into a more meaningful string using the appropriate units. The end result was that I decided to use the resulting code as an example for refactoring instead.

First of all, consider the function I initially thought about posting:

class UnitsHelper extends Helper
{
function bytesToString($bytes, $round = 1)
{
if ($bytes==0)
return '0 bytes';
elseif ($bytes==1)
return '1 byte';

$units = array('bytes' => pow(2, 0),
'kb' => pow(2, 10),
'mb' => pow(2, 20),
'gb' => pow(2, 30),
'tb' => pow(2, 40),
'pb' => pow(2, 50),
'eb' => pow(2, 60),
'zb' => pow(2, 70),
'yb' => pow(2, 80));

$lastUnit = 'bytes';
foreach ($units as $unitName => $unitFactor)
{
if ($bytes >= $unitFactor)
$lastUnit = $unitName;
else
return round($bytes/$units[$lastUnit], $round).' '.$lastUnit;
}

}
}

Now even so I expect most of us to deal within units much smaller then 1 TB on the web, I added the units that come afterwards for completeness. But that's not really important. What's interesting is is that this function above serves a well defined purpose and there is no strong reason for not leaving it as is. However, when posting about it on here (so I thought), it should be more generic as it's part of an object called UnitsHelper. What happened was a ridiculous little refactoring session turning this innocent little function into being part of an automagic unitToString function that works like the Model::findBy<Field>() stuff in CakePHP. Only that in this case it's UnitsHelper::<unit>ToString(). Before explaining any further, take a look at the result:

class UnitsHelper extends Helper
{
function __call__($method, $param)
{
if (preg_match('/^(.+)toString$/i', $method, $match))
{
array_unshift($param, $match[1]);
return call_user_func_array(array(&$this, 'unitToString'), $param);
}
}

function unitToString($unit, $quanitity, $round = 1)
{
if ($quanitity==0)
return '0 '.$unit;
elseif ($quanitity==1)
return '1 '.Inflector::singularize($unit);

$unitFct = $unit.'Units';
if (!is_callable(array(&$this, $unitFct)))
return $quanitity.' '.$unit;

$units = $this->{$unitFct}();
foreach ($units as $unitName => $unitFactor)
{
if ($quanitity >= $unitFactor)
$lastUnit = $unitName;
else
return round($quanitity/$units[$lastUnit], $round).' '.$lastUnit;
}
}

function bytesUnits()
{
return array('bytes' => pow(2, 0),
'kb' => pow(2, 10),
'mb' => pow(2, 20),
'gb' => pow(2, 30),
'tb' => pow(2, 40),
'pb' => pow(2, 50),
'eb' => pow(2, 60),
'zb' => pow(2, 70),
'yb' => pow(2, 80));
}
}

First of all: Note that this code will only work in CakePHP 1.2 where Helper extends Overloadable and therefor makes it easy to implement automagic functions like the one used here.

Other then that this is good example to answer the question about when to refactor. What I did in this case was to refactor the function, so it would be easy to use it's core functionality for other (linear) unit systems as well. So if I was to expect that I will deal with lot's of units in the metric system in a project, this little refactoring would become very useful as the amount of used units grows. However, in the little experimental project I'm using this Helper in (a code review assistant) there probably won't ever be another unit then bytes.

So what does this mean? Well I think it's obvious: The principle of YAGNI (You Arent Gonna Need It) says that in my case I just wasted a lot of time. However, in the scope of somebody elses project who might reads this post, refactoring this Helper was very benifital. In general I'm in big favor of 37 Signals philosophy of not trying to be everything to everybody, and sticking with what you need. So just because there are a lot of benifits to staying DRY (Don't Repeat Yourself), this doesn't mean it should become your #1 priority. If you are unsure if you are going to need something, don't code it. If you are not sure if refactoring some piece of code will make the world a better place, don't do it. Try to get a feel of where the critical parts of your application are located and focus on making them the most beautiful code you can imagine (or afford in terms of time investment ^^). But don't try to be as insane and refactor the little bytesToString function of mine if all you need is a bytesToString function ... ; ).

Now even if so this post didn't answer all your questions about refactoring, it at least provides you with (one) solution to a common problem ; ).

--Felix Geisendörfer aka the_undefined

&nsbp;

You can skip to the end and add a comment.

scott lewis said on Nov 09, 2006:

You have incorrect unit abbreviations. Byte is abbreviated as a upper-case B. Bit is abbreviated as lower-case b. IEEE 1594 and the Metric Interchange Format agree on this (with Bit as b), while IEC 60027 gives no abbreviation for Byte (and specifies bit for Bit). In addition, the capitalization of the first character in everything over a kilobyte is incorrect.

The proper sequence would be: bytes, kB, MB, GB, TB, PB, EB, ZB, YB.

Of course, the SI units only apply to powers of 10. For powers of 2, the standards demand the use of the binary units (kebibyte, mebibyte, etc.)

The technically correct sequence would be bytes, KiB, MiB, GiB, TiB, PiB, EiB, ZiB, YiB.

But, as we all know, the use of SI units to refer to powers of two is rampant and well understood except when dealing with hard drive capacity. :)

A useful quick reference to the MIF is: http://swiss.csail.mit.edu/~jaffer/MIXF/

Felix Geisendörfer said on Nov 09, 2006:

scott lewis: Well, seems like you are an expert on this topic. My research on this topic merly consisted of a quick visit over at http://en.wikipedia.org/wiki/Byte ; ). For my application, this is how I want bytes to be displayed, but feel free to change it in yours.

Grant Cox said on Nov 10, 2006:

That's a good post, Felix. There are many times that I've made mini-libraries out of code that really doesn't have that many purposes, and usually doesn't get used in other applications.

It's an odd behaviour, and something that many programmers do. Is there something inherently beautiful about all encompassing code (which in reality will have more bugs and be more of a headache than just Get It Done code). Or is it just to show that we can do it?

Felix Geisendörfer said on Nov 10, 2006:

Hey Grant: Well I'm refactoring things that don't really *need* to be refactored a lot of times as well. I'm generally trying to avoid starting own libraries for everything as this is really overkill in terms of time. But as far as refactoring other things goes ... it's a passion of mine. I simply hate working with code I don't consider beautiful, and so refactoring bad code makes me feel better about the entire application. Besides changing the structure of the code, one of thing I do very often is to replace variable/function/object names with more descriptive ones.

Basically one thing you need to consider for refactoring are the application's requirements. If the application requires very high maintainability & code readability - refactor a lot. However, this does not mean that everything needs it's own library, automagic functions and other fancy stuff.

And about the question if we refactor to show what we can do? That sure is true. When I was coding all by myself (in VB 6) years ago on code nobody else would (and has) ever see(n), my code was horrible. My variable names had usually one letter, I used little OOP and did't care much about the code itself. Ok, I've definitely learned a lot since than and my private code looks a lot better these days. But I noticed that since I started blogging I felt the urge to write better code in general. And now, that I get most of my business contacts through this blog (I've never planned that when I started it), it's even more important to me to do a good job on the code I publish. The fact that I consider to potential publish posting most of the code I write has definetly raised the bar. However, as deadlines come closer things often become more pragmatic ... ; ).

CumpsD said on Jan 07, 2007:

"I simply hate working with code I don’t consider beautiful"

Programming is an art! :)

You'll find a lot of (good) programmers who will share your view on the beautiful part :)

Martin Labuschin said on Apr 24, 2008:

Is pow(2, 0) really neccessary? I don't think so.

This post is too old. We do not allow comments here anymore in order to fight spam. If you have real feedback or questions for the post, please contact us.

debuggable