PDF Haberdashery
Here’s a neat trick that I just learned. Ever wanted to print something to PDF, but for whatever reason, printing to your PDF Printer/CutePDF/etc was blocked?
Well, I’ve found an incredibly circuitous way around it!
First, make sure your actual printer is turned off. Yes, turned off. Now, try printing out the thing which you want to become a PDF. Your computer will likely say something like “GREERRRRORR CANNOT PRINT YOUR PRINTER IS OFF MMMBBEEP”. This is good. Now, don’t cancel or anything like that: leave the error message there!
Okay, now here’s the tricky part: open up a file explorer window and go to c:\Windows\System32\spool\PRINTERS. You should see a file here like “FP00059.SPL”. The “FP00059″ could be anything, the important part is the .SPL. Copy this file somewhere else, like your desktop. Okay, now you can cancel the error or turn your printer back on and print, whatever.
So what is this SPL file? It’s what’s known as a “spool”, which is an antiquated term meaning “stuff that’s ready to print but hasn’t been sent to the printer yet”. If only there was some way to view this file…but wait! There is! I found a nice small program called EMF Printer Spool File Viewer, which, despite it’s incredibly ambiguous name, let’s you view EMF Printer Spool Files. Download it, run it, file->open the SPL file.
Ahhhh, now we see the thing which we want to put in a PDF. Fortunately, the EMF file viewer has no problem letting you do a File->Print to PDF Printer/CutePDF/etc. So do it already. Now you have a nice PDF file with which to do all manner of haberdashery upon. Now don’t use this to circumvent DRM or anything, because that would just be wrong.
Code Kata 3 – How Big, How Fast?
Believe it or not, I’m still going through CodeKata exercises.
The 3rd Kata is, again, not exactly about writing an actual program–it’s more of a thought exercise.
There are two parts: How big? and How fast?
How big? asks questions about how much memory various numbers and data would take up in memory. How fast? asks
how long certain processing tasks would take.
How big?
1. Roughly how many binary digits (bit) are required for the unsigned representation of…
- 1000: 2^10 = 1024, so 10 bits is enough to represent 1000
- 1000000: 2^20 = 1048576, so 20 bits would be enough. 19 bits would not cut it.
- 1000000000 – 2^30 = 1 billion, so 30 bits would cover it. 29 bits would not.
- Etc…using powers of 2 can identify how many places would be required in a binary representation.
2. My town has approximately 20,000 residences. How much space is required to store the names, addresses, and a phone number for all of these (if we store them as characters)?
Roughly speaking:
- Name => first (50 bytes), second (50 bytes)
- Address => city (50 bytes), state (2 bytes), zip (5 bytes)
- Phone => 10 digit number (10 bytes)
- Total => 167 bytes each. However, in a typical database, varchars expand just enough to contain the data, so it’s probably much less
- Let’s say around 30 characters for a name, and 12 characters for a city.
- So usually ~59 bytes each. Then 59 * 20000 = 1180000 bytes, or ~ 1 MBytes (give or take).
3. I’m storing 1,000,000 integers in a binary tree. Roughly how many nodes and levels
can I expect the tree to have? Roughly how much space will it occupy on a 32-bit architecture?
I’m going to assume a balanced tree, which is the most optimal case. So I can start
counting nodes like this…
- level 1 – 1 node (1 total)
- level 2 – 2 nodes (3 total)
- level 3 – 4 nodes (7 total)
- level 4 – 8 nodes (15 total)
- level 5 – 16 nodes (31 total)
- level 6 – 32 nodes (63 total)
- …etc…
- level 19 – 262144 nodes (524287 total)
- level 20 – 524288 nodes (1048575 total)
I actually wrote a quick Excel spreadsheet to calculate this faster. It looks like there would be 20 levels
minimum in the tree, but if it’s not balanced, it would have many more, of course.
Assuming each node can store an int, there would be a minimum of 1,000,000 nodes. Each node would contain
three 32-bit ints: one for the actual number, one ‘left’ pointer, and one ‘right’ pointer. This means there would
be 12 bytes of data in each node, and thus a bare minimum of 12 million bytes to store the information. There would
probably be a few bytes more for of overhead, depending on how it was implemented.
How fast?
4. Sending a book by modem
A 56K modem typically runs around 40-50 kbit/s. Let’s assume 1500 characters a page (1500 bytes), and Amazon tells us
that his book has 1296 pages. So the total book is around 15552000 bits (8-bits per byte). 15552000 / 45kbits/sec = 337.5 seconds ~ 5.6 minutes.
That assumes the entire book is text–no images/illustrations/etc.
5. Binary search
I know that binary search runs in O(ln n) from the 2nd kata.
The data given is:
- O(ln 10000) = 4.5mS
- O(ln 100000) = 6mS
- O(ln 10000000) = ?
I’m assuming worse case times, but it could be best/average, so long as it’s consistent.
A graph of log n would show a steep initial slope and then a gradual flattening. Here are some points on that
graph:
- ln 10000 = 9.21
- ln 100000 = 11.51
- ln 10000000 = 16.11
If you line up the points here with the points above, you can see there’s about a +5 shift in the magnitude. I’m doing some hand-waving here, but I would think the third search would take around 10 mS.
6. Unix passwords
The hash seems like a red herring here. We’re really just brute forcing the password, so that’s pretty straightforward
to calculate.
Passwords can be UP TO 16 characters, but let’s make it easy and just assume that everyone has an 8 character password.
Each character can be one of 96 different characters, so that’s 96^8 different possible passwords, or 7,213,895,789,838,336
possible passwords. At 1mS each attempt, that’s 7,213,895,789,838 seconds, or roughly…228000 years.
So, yeah…not a viable approach. And that’s when everyone has an 8 character password. Imagine how long
it would take for passwords varying up to 16 characters. A dictionary attack would be much smarter, and even asking someone directly for their
password would be an approach much more likely to succeed.