Believe it or not, I’m still going through CodeKata exercises.
The 3rd Kata is, again, not exactly about writing an actual program–it’s more of a thought exercise.
There are two parts: How big? and How fast?
How big? asks questions about how much memory various numbers and data would take up in memory. How fast? asks
how long certain processing tasks would take.
How big?
1. Roughly how many binary digits (bit) are required for the unsigned representation of…
- 1000: 2^10 = 1024, so 10 bits is enough to represent 1000
- 1000000: 2^20 = 1048576, so 20 bits would be enough. 19 bits would not cut it.
- 1000000000 – 2^30 = 1 billion, so 30 bits would cover it. 29 bits would not.
- Etc…using powers of 2 can identify how many places would be required in a binary representation.
2. My town has approximately 20,000 residences. How much space is required to store the names, addresses, and a phone number for all of these (if we store them as characters)?
Roughly speaking:
- Name => first (50 bytes), second (50 bytes)
- Address => city (50 bytes), state (2 bytes), zip (5 bytes)
- Phone => 10 digit number (10 bytes)
- Total => 167 bytes each. However, in a typical database, varchars expand just enough to contain the data, so it’s probably much less
- Let’s say around 30 characters for a name, and 12 characters for a city.
- So usually ~59 bytes each. Then 59 * 20000 = 1180000 bytes, or ~ 1 MBytes (give or take).
3. I’m storing 1,000,000 integers in a binary tree. Roughly how many nodes and levels
can I expect the tree to have? Roughly how much space will it occupy on a 32-bit architecture?
I’m going to assume a balanced tree, which is the most optimal case. So I can start
counting nodes like this…
- level 1 – 1 node (1 total)
- level 2 – 2 nodes (3 total)
- level 3 – 4 nodes (7 total)
- level 4 – 8 nodes (15 total)
- level 5 – 16 nodes (31 total)
- level 6 – 32 nodes (63 total)
- …etc…
- level 19 – 262144 nodes (524287 total)
- level 20 – 524288 nodes (1048575 total)
I actually wrote a quick Excel spreadsheet to calculate this faster. It looks like there would be 20 levels
minimum in the tree, but if it’s not balanced, it would have many more, of course.
Assuming each node can store an int, there would be a minimum of 1,000,000 nodes. Each node would contain
three 32-bit ints: one for the actual number, one ‘left’ pointer, and one ‘right’ pointer. This means there would
be 12 bytes of data in each node, and thus a bare minimum of 12 million bytes to store the information. There would
probably be a few bytes more for of overhead, depending on how it was implemented.
How fast?
4. Sending a book by modem
A 56K modem typically runs around 40-50 kbit/s. Let’s assume 1500 characters a page (1500 bytes), and Amazon tells us
that his book has 1296 pages. So the total book is around 15552000 bits (8-bits per byte). 15552000 / 45kbits/sec = 337.5 seconds ~ 5.6 minutes.
That assumes the entire book is text–no images/illustrations/etc.
5. Binary search
I know that binary search runs in O(ln n) from the 2nd kata.
The data given is:
- O(ln 10000) = 4.5mS
- O(ln 100000) = 6mS
- O(ln 10000000) = ?
I’m assuming worse case times, but it could be best/average, so long as it’s consistent.
A graph of log n would show a steep initial slope and then a gradual flattening. Here are some points on that
graph:
- ln 10000 = 9.21
- ln 100000 = 11.51
- ln 10000000 = 16.11
If you line up the points here with the points above, you can see there’s about a +5 shift in the magnitude. I’m doing some hand-waving here, but I would think the third search would take around 10 mS.
6. Unix passwords
The hash seems like a red herring here. We’re really just brute forcing the password, so that’s pretty straightforward
to calculate.
Passwords can be UP TO 16 characters, but let’s make it easy and just assume that everyone has an 8 character password.
Each character can be one of 96 different characters, so that’s 96^8 different possible passwords, or 7,213,895,789,838,336
possible passwords. At 1mS each attempt, that’s 7,213,895,789,838 seconds, or roughly…228000 years.
So, yeah…not a viable approach. And that’s when everyone has an 8 character password. Imagine how long
it would take for passwords varying up to 16 characters. A dictionary attack would be much smarter, and even asking someone directly for their
password would be an approach much more likely to succeed.