Posts Tagged ‘Hackery’

ExpanDrive Version 1.2

Monday, May 12th, 2008

Fresh off the press, out today, come and get it while it’s hot. Since 1.2 seems to be the magic number, that’s what we’re calling ours too.

Big ticket items: free space remaining now displays correctly on servers that support Python. A filter field’s been added to the Drive Manager for those of us who have oh-so-many drives. Public key support is far more robust, and encrypted private keys are now supported as well.

Also, you might want to try a little Dino Run.

Finessing international characters out of Python

Tuesday, May 6th, 2008

Whilst we whittled our filesystem problems down to a remaining few and sent our first Release Candidate out into the wild, we discovered we had another specter on the horizon to deal with: International Filename Support. Python generally handles this pretty well: it defaults to the web standard, UTF-8, so if you receive a UTF-8 string, Python will print the correct representation upon your call to “print”. No other work is necessary. This does not go so smoothly if the string you get is not encoded in UTF-8 (or ASCII, since it is a true subset of UTF-8). We learned this limitation, and how to overcome it, over the course of two frustrating days.

In our testing, we used another commercial SFTP client to put some files with international characters in their names onto our test server (to wit: the files were called Québécois and Dvořàk). Unbeknownst to us, the client we used defaulted to Latin-1, aka ISO-8859-1 encoding. However, at this point, we also did not know about encoding in Python, so we just output the strings as we received them. What we saw was Qu?b?cois and Dvo??k from the Terminal, and even worse in Finder, Qu? and Dvo? (more on why this was so later).

Python does not auto-detect encodings. You can get some third-party modules to get Python to try and do this.

We knew we had international characters, and we also knew that Mac OS X likes its characters to be encoded as UTF-8 (sort of).

So we tried this:

output_string = input_string.encode('utf-8')

Exception!

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

It looks like Python is guessing the string is ASCII. We think it’s UTF-8, so let’s try it again:

output_string = input_string.decode('utf-8').encode('utf-8')

Exception!

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-4: invalid data

Oh dear. At this point, I insisted the client we were using was definitely not encoding filenames as UTF-8 data, but Jeff insisted that it had to be (it’s the standard, after all). Then we had an argument about the semantics of decoding vs. encoding. On a whim, I tried decoding the string using ‘latin-1’ as an argument. Ta da! No more Unicode exception! We came to the following conclusion about Python encoding/decoding: Python always stores strings in an internal, canonical representation. Therefore strings are always implicitly decoded from ASCII to this form.

In short, python does this with every incoming string:

canonical_string = decode(input_string, 'ascii')

output_string = encode(canonical_string, 'ascii')

If the incoming strings are not ASCII-encoded, you must explicitly call decode() on them with the appropriate codec as an argument. Our codec in this case is Latin-1 (aka ISO-8859-1); so far so good.
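To make the decode step concrete, here is a minimal sketch. It assumes modern Python 3, where byte strings and text are separate types (the snippets in this post are Python 2), and the variable names are illustrative only.

```python
# Bytes as a Latin-1 client would send them for the name "Québécois".
raw_name = "Qu\u00e9b\u00e9cois".encode("latin-1")  # b'Qu\xe9b\xe9cois'

# Decoding with the wrong codec fails: 0xE9 opens a multi-byte sequence
# in UTF-8, but the byte that follows it is not a continuation byte.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the codec the sender actually used recovers the text.
text = raw_name.decode("latin-1")
```

The point is the same as above: the bytes carry no label saying which codec produced them, so the receiver has to supply the right one.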

Now that we have our string object, we must call encode() on it with ‘utf-8’ as an argument, since UTF-8 is almost what Mac OS X expects. I say “almost” because there are two possibilities for UTF-8 encoding: “Canonical Form” and “Decomposed Form”. The difference is in how characters with diacritics, like à or é, are transmitted. Mac OS X uses decomposed form, which simply means that à is transmitted as two characters, the base letter a followed by a combining grave accent, which are then combined. Python defaults to canonical form, so before we re-encode the strings as UTF-8, we’ve got to make this switch.

import unicodedata       
decomposed_string = unicodedata.normalize('NFD', \
   input_string.decode('latin-1'))

Now we can finish up the task.

output_string = decomposed_string.encode('utf-8')
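A small sketch of what the normalization changes, assuming Python 3 (the variable names are made up for illustration): the same visible character ends up as different code points, and therefore different UTF-8 bytes.

```python
import unicodedata

composed = "\u00e0"  # 'à' as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)

# NFD splits the character into the base letter followed by a combining mark.
code_points = [hex(ord(ch)) for ch in decomposed]  # ['0x61', '0x300']

# The UTF-8 bytes differ between the two forms even though both render as 'à'.
composed_bytes = composed.encode("utf-8")      # b'\xc3\xa0'
decomposed_bytes = decomposed.encode("utf-8")  # b'a\xcc\x80'
```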

Hooray! We’re done.

But wait… what happens if some other client uses a different encoding? Well, of course the characters will display incorrectly. We need some sort of default encoding that will work. We saw above that using UTF-8 as a default will not work, since there are encodings of characters in latin-1 (and probably other codecs) that are invalid in utf-8. We settled on defaulting to ASCII. This is acceptable in all cases because of a basic truth about text encoding: every single character is transmitted as at least one byte of data. ASCII has a printable representation of every possible byte. So while the character à does not have an encoding in ASCII, its byte sequence, \xc3\xa0, does, though it will usually just print as ?? since both those numbers are greater than 0x7F and ASCII is not standardized above 0x7F.

Putting it all together, this is basically the function we use to handle these strings.

import unicodedata

def re_encode(input_string, decoder='utf-8', encoder='utf-8'):
    # input_string is the byte string exactly as received from the server
    try:
        output_string = unicodedata.normalize('NFD',
            input_string.decode(decoder)).encode(encoder)
    except UnicodeError:
        # Fall back to ASCII, replacing any byte that cannot be decoded
        output_string = unicodedata.normalize('NFD',
            input_string.decode('ascii', 'replace')).encode(encoder)
    return output_string

And that’s really all there is to it. Python wins the game. By defaulting to ASCII encoding, you won’t get any unhandled exceptions, and you’ll also know pretty quickly that something is wrong (just look for the ???????s). For a much lengthier discussion of what Unicode is and does, see Joel Spolsky’s verbose take on the matter.
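For readers on Python 3, here is a hedged port of the same idea with a usage example. The name re_encode_py3 and the sample bytes are hypothetical, not from our code base. One difference to be aware of: the 'replace' handler substitutes U+FFFD (the Unicode replacement character) for undecodable bytes, so what you see depends on how your terminal renders that character.

```python
import unicodedata

def re_encode_py3(raw, decoder="utf-8", encoder="utf-8"):
    """Decode raw bytes, normalize to NFD, and re-encode (illustrative port)."""
    try:
        text = raw.decode(decoder)
    except UnicodeError:
        # Fallback: bytes outside ASCII become U+FFFD instead of raising.
        text = raw.decode("ascii", "replace")
    return unicodedata.normalize("NFD", text).encode(encoder)

latin1_name = b"Qu\xe9b\xe9cois"

# Correct codec supplied: each e-acute becomes 'e' + combining acute accent.
converted = re_encode_py3(latin1_name, decoder="latin-1")

# Wrong codec on purpose: the fallback kicks in and nothing is raised.
fallback = re_encode_py3(latin1_name)
```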

Sharing a Virtual Machine between VMWare Workstation and Fusion

Tuesday, October 23rd, 2007

Here is how to share a VM between Windows-based VMWare Workstation and Mac-based Fusion:

  1. Create a large FAT32 partition. You can either carve up your primary hard drive using something like Partition Magic – or do something more sane like buy an external Firewire drive [USB drive performance on OS X is abysmal]. I own this Firelite drive which is powered over Firewire and also this Firewire 800 G-Tech drive. Let me re-emphasize: get a Firewire drive, USB is painfully slow.
  2. Format the drive using Disk Utility with the ‘MS-DOS’ filesystem. Windows, for no apparent reason, refuses to format a FAT32 volume larger than 32GB – so the format must be done in Disk Utility.
  3. FAT32 is limited to 4 GB files, so you’ll need to make sure your virtual disk is split into 2GB segments. It’s easy to specify this option during VM creation, or you can convert an existing VM with the command line VMWare disk utilities. I recommend Robert Petruska’s DiskManager GUI, which makes things much easier. I also recommend copying the virtual disk to a local drive before converting; it’ll save a lot of time.
  4. Modify the VM configuration to point at the split disk you just converted, and you’re good to go!
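Before copying a VM onto the FAT32 drive, it can help to confirm that no file in the VM folder exceeds the FAT32 per-file limit from step 3. The following is a small sketch of such a check; the function name and its limit parameter are my own invention, not part of any VMWare tool.

```python
import os

# Largest file FAT32 can store: 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 2**32 - 1

def files_too_large_for_fat32(vm_dir, limit=FAT32_MAX_FILE_BYTES):
    """Return paths under vm_dir whose size exceeds the given per-file limit."""
    oversized = []
    for root, _dirs, names in os.walk(vm_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit:
                oversized.append(path)
    return sorted(oversized)
```

An empty list means every segment should fit; anything it returns still needs to be split before the copy.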

The only real drawback is that Fusion cannot do much [anything] with the tree of snapshots created in Workstation.

Google Calculator from the Command Line

Tuesday, August 21st, 2007

I found this the other day – a command line version of Google Calculator. Very cool!

$ gcalc
gcalc version 0.1 by Greg Miller
Usage: gcalc [-d]
example:  gcalc "5+2*2"
example:  gcalc 5!
example:  gcalc "sqrt(-4)"
example:  gcalc "160 pounds * 4000 feet in calories"
example:  gcalc avogadros number
example:  gcalc 0b110111010 + 0x33 in decimal
example:  gcalc 22 lira in yen
example:  gcalc 2 to the power of 5

VMWare Fusion high CPU usage hint

Thursday, August 16th, 2007

Whether iTunes or VMWare Fusion v1.0 is the culprit, I’m not sure. I’ve been getting high CPU usage while my guest OS is idle in VMWare Fusion 1.0 final. The solution, noted elsewhere, is to disable ituneshelper.exe using autoruns or msconfig.exe. With that done, vmware-vmx CPU usage with an idle guest OS went from 38% to 8%.

Coconut Wifi – Airport Menubar replacement

Wednesday, July 25th, 2007

This is what I’ve been looking for! I’ve complained before that the default Airport dropdown is hopelessly inadequate if you’re looking to discover an open access point, or select one that has the strongest signal. Thankfully, this guy went and made it happen. Awesome.

Here is a screenshot from their website that shows you what’s up:

Parallels to VMWare disk image conversion

Friday, July 13th, 2007

I’ve gotten a few questions regarding my Switch to VMWare Fusion post – namely, how do you go about converting your existing Parallels virtual disk so it’ll run inside VMWare Fusion.

Unexpected answer – VMWare Converter. This free tool is designed to convert a physical machine into a VMWare format virtual machine. Nothing says you can’t have it convert a virtual machine; usually it just doesn’t make much sense.

You’ll need to either install VMWare Converter inside your Parallels VM or do a “remote” connection to it, set a few configuration options and then let ’er rip. Hope you’ve got some extra hard drive space: you’ll need room to store the additional copy of your virtual hard drive while the conversion is being performed. Enjoy.

UPDATE: VMWare has some detailed instructions on this process in their forums.

VMWare Internet Connection Sharing appliance

Friday, June 22nd, 2007

I admit, I have never quite understood the push behind these VMWare appliances. For me, they fall into the giant soup of enterprise products that I can’t imagine ever using. That being said, I’ve finally found one that is quite useful for me, the non-enterprise user.

Supposedly simple task: Share the wireless connection of my IBM Thinkpad with my MacBook Pro.

Windows Internet Connection Sharing [ICS] has proven super flaky and slow, not to mention its complete lack of advanced options. After 45 minutes of pain, I gave this VMWare appliance a shot. I set up VMNet1 to run NAT and DHCP against the host and VMNet2 to bridge with the ethernet connection. I set the VM to boot when the Thinkpad powers up. 32 MB memory footprint. Good to go.

It just works, and the performance is fantastic.

Awesome.

I’ll also use this post to give a shout out to the best-free-product-in-the-universe, VMWare GSX Server. It really is quite amazing. It has nearly all the power of VMWare Workstation, and has some extra cool features of its own. Did I mention it is completely free?
