Posts Tagged ‘codekata’
Today was the pre-compiler, the “optional” first day of CodeMash.
I did two sessions: Ruby Koans and Software Craftsmanship.
Ruby Koans is simply a series of tests that “prove out” Ruby as a language, with the primary goal of teaching Ruby as you make the tests pass, and the secondary goal of actually seeing if anything in Ruby breaks from version to version. One could write Koans for any language, actually, and the whole concept of such Koans is a fantastic idea, especially for open source languages that are updated frequently. Check out the Ruby Koans on GitHub.
This is my first real experience with Ruby, and I have to be honest: it doesn’t really appeal to me very much, yet. I can definitely see why people like it, and I can kinda see the advantages that it might bring, but I just don’t feel confident enough in my own abilities to see myself using Ruby. But, like I said, this is only the beginning of my experience with it, so who knows what the next couple of days have in store for me…
The software craftsmanship session was really about practicing the craft of development via “katas”, or repetitive development exercises. The first was Uncle Bob’s Bowling Game Kata. The presenters demonstrated Uncle Bob’s version, and then I paired up and used Mono (my first experience with Mono/Monodevelop) to implement it. Clearly we both need more practice with this Kata, as our version turned out as quite a monstrosity. The second kata was a variation of the Supermarket pricing kata that I’ve done before. I was humbled by the expertise of the people that I worked with on this kata, and learned a great deal about the strategy pattern, resharper, and test-driven development. The last “kata” was FizzBuzz, which was kinda disappointing, because it’s entirely too simple, and writing tests around it feels wholly unessential, or at least wildly unwarranted. So I took the time to work on the shopping kata some more.
Finally, I unwound after the sessions with my fellow Quick Solutions colleagues, and had a great time discussing current and past projects over drinks and large portions of Italian food.
All in all, an excellent day of coding and nerding out.
Here is the fourth of the CodeKata exercises.
Data munging–sounds kinda gross in some…indefinable way. But it’s actually some programmer’s
slang meaning “Mash Until No Good”, or
if you prefer recursion, it means “Mung Until No Good”.
This Kata is mostly in code with a few brief questions. I’ve chosen…wait for it…ASP.NET with C#. C#
because it’s what I’m currently best with, and ASP.NET because I find console apps dirty, and a full
Windows app to be overkill. I’ve made the source available on CodePlex. Okay, on with the Kata:
Part One: Weather Data
I initially thought the data was tab-delimited, but no, turns out it’s just fixed width. Fixed
width data from a 3rd party can be very tricky to write a reliable parser around, but since this is just
a learning exercise, I’m going to take a very naive, minimal approach. I created a “WeatherData” entity
class to hold the day, max temp, and min temp, and a method to calculate the difference. I don’t
need the other data, so why waste time trying to parse it? I created a WeatherReader class to read in
the text file and output a nice list of entity objects.
When parsing, first of all, I need to ignore the first 8 lines of the text file, because they are not data. Then, I need to
read in each line of data one at a time, and use a .Substring call to pull each column out. I put a wrapper
around the .Substring because I also wanted to Trim, remove invalid characters (notice the asterisks), etc.
When reading each line in a loop, I need a stopping place. Again, I took a naive approach and just stopped
reading once I reached the “mo” line.
Once I have a nice List of WeatherData, I just use some Linq to search for the day with minimum spread.
If you’ve not used Linq in C#, think of it like SQL for collections of objects, because, that’s pretty much
what it is.
Part Two: Soccer League Table
Pretty much the same story here. I created a SoccerTeam entity and a SoccerReader class to read in the text
file, parse it, and output a list. The data is a bit different in that it’s 5 lines at the top to ignore,
there’s a big dashed line right smack above Ipswich, and the data ends with a ‘</pre>’ instead of ‘mo’.
Part Three: DRY Fusion
So here’s the important part: factor out the common functionality of parts 1 and 2 in order to follow
the DRY principle.
This can be a tricky exercise because the parts are so deceptively simple. There’s obviously some of
the same things going on here, but I think an approach that’s too aggresive can lead to
some Liskov Substitution Principle issues.
A very aggresive approach might lead to one class to handle both the Weather and Soccer information, which one
might be able to pull off, but there are some downsides to that:
- If the weather.dat file format would change, any updated code could break the football.dat processing,
and vice versa. - If you want to add another file and entity, like grocery.dat, you will likely have a bunch of if/switch
statements to update, and again, you risk breaking the other processors. - Intuitively, Weather data and Soccer data are not really cohesive concepts. Even Baseball data and
Soccer data would be a really big stretch, whereas English Premier League and Major League Soccer data could
be much more cohesive.
So, instead, I took more of an incremental, systematic approach to refactoring.
I first created a base class: DatReader. I then made WeatherReader and SoccerReader inherit from DatReader.
There’s one method that was an obvious move to the base class: GetColumnValue, my SubString wrapper. The method
was 100% identical in both classes, so that’s an easy one. I also had a FileName string field that was
identical in both, so that was easy to move too.
Okay, that’s the fruit that’s close to the ground, anything else would require some actual refactoring:
- There
were two constructors in each class, one of them worked directly with FileName, so I moved that and just changed
the parameterless contructors in the subclasses to call “base” instead of “this”. - The MapText…() and TheText() methods contain logic that is specific to the dat file, so while there is some
overlap that is very tempting to refactor, I mostly left those alone, because in the real world, those naive
implementations will probably change very, very often. - However, the GetAll…() methods looked similar enough to factor out. In order to do that, the base class
is going to need to be generic. Then I needed to add two abstract methods for TheText and MapTextToEntity, the latter
of which will return my T generic type, and also be the new name of the MapText…() methods in the subclasses. - WeatherReader should now inherit from DatReader<WeatherDay>, and likewise for SoccerReader.
- Now that MapTextToEntity is abstract, the GetAll…() methods don’t need to call anything but base class
methods, so I refactored that to be just plain GetAll, returning an IList<T>.
Beautiful. It looks pretty darn elegant to me. Each subclass contains exclusively logic specific
to the dat file it’s trying to read. Any changes to the dat formats will only require changes in the subclasses,
and nowhere else.
Kata Questions
To what extent did the design decisions you made when writing the original
programs make it easier or harder to factor out common code?
Splitting the code into an Entity and EntityReader classes ala the Repository pattern I think made it easy
to identify what logic should go where. Additionally, splitting the functionality of the Reader classes into
small, readable methods made it much easier to see where refactoring would be a good idea, and more importantly,
where it would not.
Was the way you wrote the second program influenced by writing the first?
Yes. I practically copied and pasted my way through the whole thing. It took only minutes to write,
compared to the first, which took an hour or two.
Is factoring out as much common code as possible always a good thing? Did the
readability of the programs suffer because of this requirement? How about the maintainability?
1) Yes, except when it’s not. 2) A little. “GetAllSoccerGames” is much more usable than
“GetAll”, but I think the tradeoff is okay, because the class name is still “SoccerReader”.
3) Because I’ve dealt with fixed width text file parsing in the past, I was very cognizant of the
maintainability issues, so I think the way that I refactored it would be very easy to maintain, were this
a real application.
This a cool exercise. I feel like my solution is a very good one, and I think I went through this Kata in
exactly the way it was intended. If you disagree, please check out my CodeKata source code at CodePlex.
Believe it or not, I’m still going through CodeKata exercises.
The 3rd Kata is, again, not exactly about writing an actual program–it’s more of a thought exercise.
There are two parts: How big? and How fast?
How big? asks questions about how much memory various numbers and data would take up in memory. How fast? asks
how long certain processing tasks would take.
How big?
1. Roughly how many binary digits (bit) are required for the unsigned representation of…
- 1000: 2^10 = 1024, so 10 bits is enough to represent 1000
- 1000000: 2^20 = 1048576, so 20 bits would be enough. 19 bits would not cut it.
- 1000000000 – 2^30 = 1 billion, so 30 bits would cover it. 29 bits would not.
- Etc…using powers of 2 can identify how many places would be required in a binary representation.
2. My town has approximately 20,000 residences. How much space is required to store the names, addresses, and a phone number for all of these (if we store them as characters)?
Roughly speaking:
- Name => first (50 bytes), second (50 bytes)
- Address => city (50 bytes), state (2 bytes), zip (5 bytes)
- Phone => 10 digit number (10 bytes)
- Total => 167 bytes each. However, in a typical database, varchars expand just enough to contain the data, so it’s probably much less
- Let’s say around 30 characters for a name, and 12 characters for a city.
- So usually ~59 bytes each. Then 59 * 20000 = 1180000 bytes, or ~ 1 MBytes (give or take).
3. I’m storing 1,000,000 integers in a binary tree. Roughly how many nodes and levels
can I expect the tree to have? Roughly how much space will it occupy on a 32-bit architecture?
I’m going to assume a balanced tree, which is the most optimal case. So I can start
counting nodes like this…
- level 1 – 1 node (1 total)
- level 2 – 2 nodes (3 total)
- level 3 – 4 nodes (7 total)
- level 4 – 8 nodes (15 total)
- level 5 – 16 nodes (31 total)
- level 6 – 32 nodes (63 total)
- …etc…
- level 19 – 262144 nodes (524287 total)
- level 20 – 524288 nodes (1048575 total)
I actually wrote a quick Excel spreadsheet to calculate this faster. It looks like there would be 20 levels
minimum in the tree, but if it’s not balanced, it would have many more, of course.
Assuming each node can store an int, there would be a minimum of 1,000,000 nodes. Each node would contain
three 32-bit ints: one for the actual number, one ‘left’ pointer, and one ‘right’ pointer. This means there would
be 12 bytes of data in each node, and thus a bare minimum of 12 million bytes to store the information. There would
probably be a few bytes more for of overhead, depending on how it was implemented.
How fast?
4. Sending a book by modem
A 56K modem typically runs around 40-50 kbit/s. Let’s assume 1500 characters a page (1500 bytes), and Amazon tells us
that his book has 1296 pages. So the total book is around 15552000 bits (8-bits per byte). 15552000 / 45kbits/sec = 337.5 seconds ~ 5.6 minutes.
That assumes the entire book is text–no images/illustrations/etc.
5. Binary search
I know that binary search runs in O(ln n) from the 2nd kata.
The data given is:
- O(ln 10000) = 4.5mS
- O(ln 100000) = 6mS
- O(ln 10000000) = ?
I’m assuming worse case times, but it could be best/average, so long as it’s consistent.
A graph of log n would show a steep initial slope and then a gradual flattening. Here are some points on that
graph:
- ln 10000 = 9.21
- ln 100000 = 11.51
- ln 10000000 = 16.11
If you line up the points here with the points above, you can see there’s about a +5 shift in the magnitude. I’m doing some hand-waving here, but I would think the third search would take around 10 mS.
6. Unix passwords
The hash seems like a red herring here. We’re really just brute forcing the password, so that’s pretty straightforward
to calculate.
Passwords can be UP TO 16 characters, but let’s make it easy and just assume that everyone has an 8 character password.
Each character can be one of 96 different characters, so that’s 96^8 different possible passwords, or 7,213,895,789,838,336
possible passwords. At 1mS each attempt, that’s 7,213,895,789,838 seconds, or roughly…228000 years.
So, yeah…not a viable approach. And that’s when everyone has an 8 character password. Imagine how long
it would take for passwords varying up to 16 characters. A dictionary attack would be much smarter, and even asking someone directly for their
password would be an approach much more likely to succeed.
I’m still trudging through the CodeKata exercices.
Code Kata 2 is a challenge to implement the binary search algorithm. Binary search is an algorithm
that is used to search sorted arrays. The idea is very much like a “guess the number” game, where you
are pretty much guaranteed to win in a maximum of log2(n) guesses (n being the size of the array). Compare this to a linear search,
where you start at the beginning of the array and search sequentially until you find the value. In the worst case,
you’d need to search every element (though binary search doesn’t work on an unsorted array).
The challenge is not to write a single binary search method, but 5 of them. The obvious first 2 are
recursive and iterative–everyone did those in those college. But what about another 3? This turned out
to be the most challenging part to me, and one that I did not ultimately succeed at.
My first method was recursive. With the method signature that’s given by the Code Kata test cases, this
was more challenging than I thought. Sure, I could basically make it a wrapper method, but I opted against
that. Instead, I had to learn to use Array.Copy. Note that this pretty much defeats the purpose of using
binary search instead of linear search, but it does confine the search to one method. In C++ or some other
unmanaged language, Array.Copy would not be necessary.
Secondly, I did the iterative approach, and this was much easier. There are 4 total cases and 3 comparisons
per iteration in this implementation. It felt suboptimal to me, and it turns out I was right.
Next, I hit a brick wall. What else can I really do? So I thought I would pull a Kirk and rig the
exam so that I could win. I simply made a chop() method that wraps around the built-in C# Array.BinarySearch
method. I had to kajagger the return value a bit to pass the test cases, but it works. Yeah it’s not
very interesting, but I didn’t know what else to do.
Finally, I broke down for the last two and just plain cheated. I looked up a couple of iterative implementations.
The first one did a single comparison per iteration,
and the second one is just a little nicer looking with max of 2 compares per iteration.
Here are the 3 Code Kata questions:
1. Keep notes on the errors. Did you learn from experience on one technique when coding
another?
Exactly as stated in the kata description, the errors I ran into were “fencepost” errors where it would pass most tests except maybe
one edge case. A quick adjustment of a -1 or +1 here and there made the fixes. It’s difficult sometimes when
using 0-based arrays to separate the maximum index and the length when juggling numbers in my head.
2. What can you say about the relative merits of the various techniques you’ve
chosen? Which is the most likely to make it in to production code? Which was the most
fun to write? Which was the hardest to get working? And for all these questions, ask yourself “why?”
My two methods are definitely smelly, and I wouldn’t want to use them in production. However, they all
passed test cases, so I could use them in a pinch. However, the only one I would want in production would be
the Kirk method: using the C# built-in search. Binary search has been written millions of times by thousands
of people smarter than me: there’s no sense in reinventing the wheel. That one was also the most fun to write,
because it was the quickest and easiest to write. The hardest to get working was the initial recursive method,
and this is only because I was committed to using one method with the given signature, and I was using a
managed language.
3. It’s fairly hard to come up with five unique approaches to a binary chop. How did
you go about coming up with approaches four and five? What techniques did you use to fire those “off the wall” neurons?
Heh, well, I really didn’t. I only came up with 3 totally on my own, and one of those didn’t really
count. Additionally, the two iterative methods that I stole were really just (better) variations of my own
lousy iterative method, so I’d be very much interested in seeing the code of someone who has written
5 unique approaches to binary search.
That’s the end of the second kata. It took me well over the standard week, but I actually only spent about 4-5
nights really cranking at this problem.
Here’s the code for Code Kata 2. Just rename the method you want to
run tests on to “chop”. I used C# with MbUnit.
I’m going to attempt to go through all of the CodeKata exercices.
Why? I think it will help me to become a better developer, and I think it will be a lot of fun. (Note that the catalyst for this was Alan Stevens’s presentation on “Coding in Public“). I will
be recording brief overviews of my “solutions” here. Please feel free to suggest improvements or point out
weaknesses, because constructive criticism is, well, constructive.
Code Kata 1 is more of a thought exercise than a coding exercise. However, once I read through it,
I thought I would use some code to organize my thoughts better, and actually try to design some classes.
I struggled with the design for a while. The “can of beans” case is a trivial one, but I tried the “three for a dollar”
case and ran into a wall. It might sound equally simple, but I thought of Speedway, which sometimes
has Coke 2-Liters priced at 3/$3.33, but $1.59 if you buy less than 3. That really stumped me, until I realized
that the price depends on the total quantity purchased, which means I need some way to track the total
quantity purchased: Each item would contain a UPC code, and a quantity of that item.
The $1.99/pound case seemed pretty trivial: just store a price for weight, and then the weight. The Price()
method can do the math. I didn’t want to overthink this, so I didn’t.
Finally, the “buy two, get one free” case might sound tough, but it is trivial as well, since I
think the “three for a dollar” case at Speedway that I described pretty much covers that. So, I don’t even
need a new subclass for that.
Here is my complete class diagram:

I didn’t design a UI or anything, just some classes and tests. The source code is in C#, and I used MbUnit
for unit testing.
Here are the 5 other questions, which I didn’t really do any coding for:
1. Does fractional money exist?
Yes. Why not?
2. When (if ever) does rounding take place?
Well, since fractional money exists, we really don’t have to round. However, when paying with legal
tender, there is an actual smallest unit of payment that the consumer can use (a penny, for instance, or a Yen).
Prices needed to be rounded to that smallest unit. When the rounding takes place, I think, should be at
the last possible moment. However, that may depend on the grocery store, and if they have some rounding
logic they want to use to nickel & dime their customers. If it was me, I’d add up all the fractional
cents, and then round right at the very end.
3. How do you keep an audit trail of pricing decisions (and do you need to)?
In my code, the only thing to audit is the adding of items to the basket, and then the calculation of the
final price (itemized). In the real world, items would need to be able to be removed from the basket, coupons
added, etc. In that case, I would consider an itemized receipt to be the final audit: I don’t care how many
times you change your mind about buying that box of cereal, I just care about what you finally paid for, how
many you paid for, and how much you paid for it. So, my Basket could generate an entire receipt, not just
a Total scalar.
4. Are costs and prices the same class of thing?
I didn’t do any costing, but costs should be totally separate from price! If I buy 144 boxes of Raisin Bran
at $1.00 a box, that’s my cost no matter what I sell them at. Now, in a real supermarket, often times there
are “return” agreements, in which the vendor will take back unsold stock. There might be some logic to that,
and it would certainly vary on the vendor and agreement.
5. If a shelf of 100 cans is priced using “buy two, get one free”, how do you value the stock?
Again, my costs wouldn’t change based on the pricing or promotion. 3 boxes of Raisin Bran cost me $1.00 each, so
my stock is valued at $144. Real inventory systems may need to follow LIFO or FIFO rules, but I still don’t
see prices having any affect on that.
That concludes my first Kata. It took me about 4 nights of coding-pondering-starting over to come to
this solution. What could have been done better?