Suppose, for the sake of argument, that you went completely crazy one day, and decided to build a MacFUSE file system that gives you access to the files stored on an FTP server somewhere.
Leaving aside the inarguable fact that anybody who took on such a project must be stark raving mad—if not before the project, then certainly afterward—it turns out that FTP has several qualities (so-called) that make it particularly unsuited to such a task. For the sake of curiosity, and possibly posterity, I would like to catalogue a few of these.
Persons of sensitive digestion or a strong belief in moral clarity might wish to leave the room or avert their eyes. Caveat lector.
Like a mortgage broker after the bailout, MacFUSE is going to demand all kinds of invasive personal information about each of the files and folders you are ostensibly providing access to. Of course, any program needs the name of each file, but MacFUSE also demands to know:
- The type of each file (e.g., file, directory, or symbolic link),
- The size of each file (how many bytes of storage it occupies),
- The ownership of the file (to whom it belongs),
- The permissions of the file (who is permitted to read or write the file), and
- Various timestamps for the file (most importantly, when it was last modified).
Unless you can fill in all these details (and more), you don’t get to play with the other filesystems. File transfer applications like Fetch or Transmit, like the loan officers at Freddie Mac before the collapse, take a somewhat more lenient approach: They’ll use whatever personal details you provide them, but all they really need to do business is a name, and they’re good to go. As a filesystem, you have the watchful eye of the law upon you, so you have to fill in all the blanks.
To answer these questions, the FTP server provides you with exactly one tool. It is called the LIST command, and it’s supposed to tell you what files are available on the server. Unfortunately for you, the FTP Protocol does not say anything precise about what the LIST command should give you back: Most FTP servers run whatever program is used to generate directory listings on their particular operating system, and dump the results back over the network to you.
Since most FTP servers run on Unix-like operating systems, this is usually the output from the ls command. Even then, there is considerable variation in what the ls command spits out, from one Unix-like operating system to another. Windows-based FTP servers often use the dir command, but some of them emulate the output of ls instead. So, you might get this (Unix):
    drwxr-sr-x 5 1 512 Jun 29 2001 admin
    lrwxrwxrwx 1 0 1 Jun 29 2001 archive -> .
    drwx--s--x 2 1 512 Jun 28 2001 bin
    drwxr-xr-x 2 1 512 Jun 29 2001 cas
    -rw-r--r-- 1 21 90112 Jun 29 2001 compress.tar
    drwxr-sr-x 3 1 1024 Jun 28 2001 doc
    d-wx--s--x 6 1 512 Jun 28 2001 etc
    drwxr-sr-x 2 1 512 Jun 29 2001 government
    -rw-r--r-- 1 21 798720 Jun 29 2001 gzip.tar
Or, you might get this (Windows):
     Volume in drive C has no label.
     Volume Serial Number is 987F-84C0

     Directory of C:\Documents and Settings\Administrator

    09/27/2008  03:23 PM    <DIR>          .
    09/27/2008  03:23 PM    <DIR>          ..
    12/22/2008  02:44 AM    <DIR>          Desktop
    09/24/2007  02:05 PM    <DIR>          Favorites
    09/28/2008  02:08 PM    <DIR>          My Documents
    09/05/2007  07:32 PM    <DIR>          Start Menu
                   0 File(s)              0 bytes
                   6 Dir(s)  21,312,827,392 bytes free
Or, you might even get this (OpenVMS):
    TCPIP$FTP_SERVER.LOG;1 0/35 18-DEC-2008 14:02:31 [TEST] (RWED,RWED,RE,)
    ALPHA.TXT;1 582/595 20-FEB-1995 17:15:05 [TEST] (RWED,RWED,RE,)
    MYTOYS.DIR;1 1/35 18-DEC-2008 14:01:00 [TEST] (RWE,RWE,RE,E)
    X.COM;1 1/35 20-JUN-1995 10:21:09 [TEST] (RWED,RWED,RE,)
    X.PS;3 1/35 12-JUN-2007 14:29:17 [TEST] (RWED,RWED,RE,)
And lest you think I’m making things up, these are excerpts from the responses given by real FTP servers. The output of the LIST command is intended to be read by human beings, not by other programs, so no effort is made to keep things consistent from one server to another.
From this mess, you have to figure out a way to extract the name, type, size, owner, permissions, and timestamp for each file on the server. Solving this little problem occupies nearly 10% of the code that comprises the FTPFS implementation in ExpanDrive. That doesn’t even count the code that obtains the LIST output from the FTP server. And, this code has been implicated in virtually every customer-facing bug we have fixed so far with ExpanDrive’s FTP filesystem. It’s carefully written, beautiful code, and I detest every line of it personally.
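To give a flavor of what that parsing code has to do, here is a minimal Python sketch that copes only with the Unix-style lines shown earlier. It is emphatically not ExpanDrive's code: the names `ListEntry` and `parse_unix_list_line` are made up for illustration, and the pattern assumes that the group column may be absent (as it is in the sample above) and that the date is either "month day year" or "month day hh:mm". Real servers will find ways to violate both assumptions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for ls-style LIST lines. The group column is optional
# because some servers (like the one quoted above) leave it out.
UNIX_LINE = re.compile(
    r"^(?P<type>[-dlbcps])(?P<perms>[-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+"
    r"(?P<owner>\S+)\s+"
    r"(?:(?P<group>\S+)\s+)?"
    r"(?P<size>\d+)\s+"
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{4}|\d{1,2}:\d{2})\s+"
    r"(?P<name>.+)$"
)

KINDS = {"-": "file", "d": "directory", "l": "symlink"}


class ListEntry(NamedTuple):
    name: str
    kind: str               # "file", "directory", "symlink", or "other"
    size: int
    owner: str
    perms: str               # the nine permission characters, e.g. "rwxr-sr-x"
    date_text: str           # left unparsed here; dates are their own headache
    target: Optional[str]    # symlink target, if any


def parse_unix_list_line(line: str) -> Optional[ListEntry]:
    """Return a ListEntry for an ls-style line, or None if it does not match."""
    m = UNIX_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    name, target = m["name"], None
    kind = KINDS.get(m["type"], "other")
    if kind == "symlink" and " -> " in name:
        name, _, target = name.partition(" -> ")
    date_text = f'{m["month"]} {m["day"]} {m["when"]}'
    return ListEntry(name, kind, int(m["size"]), m["owner"], m["perms"],
                     date_text, target)
```

A line from a Windows or OpenVMS listing simply returns `None` here, which is the point at which a real client has to start guessing which of its other parsers to try.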
The Worst of Times
So now you’ve figured out the names of your files, and their types, sizes, permissions, and ownership. But don’t sit back just yet—now you’re in for a real treat. Almost nothing compares to the thrilling excitement of trying to get an FTP server to answer the question, “when was this file last modified, anyway?”
At first glance, this looks pretty easy. Each line of a typical ls output contains a modification date, something like this:
Dec 15 11:45
Parsing a date and time written like that is no problem; it’s basically one line of Python code. But before you get too comfortable, please consider the following simple question: December 15th of what year?
If you answered anything other than “it depends,” then I find your freshly-scrubbed naïveté to be quite charming. In particular, if you assumed any date without a year must mean “this year,” you’re living in a dream world. (I lived there too, for a while; it was quite a pleasant fantasy.) Because, you see, if today were December 19th, 2008, then “Dec 15 11:45” means a quarter till noon on December 15th, 2008. But it also means that if today is February 2nd, 2009, because clearly a file cannot have been modified in December 2009 yet, right?
So, to interpret a date without a year correctly, you can start out by assuming that it is “this year,” and then check to see if that yields a time in the future. If it does, you know it must really mean “last year.” Isn’t this fun?
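The rule is short enough to sketch in Python. The function name `resolve_listing_time` and the `slack` parameter are my own inventions for this illustration; the slack is an assumption that the server's clock or time zone may run a little ahead of yours, so "slightly in the future" is tolerated rather than punted back a whole year.

```python
from datetime import datetime, timedelta


def resolve_listing_time(stamp: str, now: datetime,
                         slack: timedelta = timedelta(days=1)) -> datetime:
    """Interpret a year-less stamp such as 'Dec 15 11:45' relative to `now`.

    Try this year first; if that lands in the future (beyond `slack`),
    fall back to last year.
    """
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")
        except ValueError:
            continue  # e.g. "Feb 29" in a year that is not a leap year
        if candidate <= now + slack:
            return candidate
    raise ValueError(f"cannot place {stamp!r} in this year or last year")
```

With the dates from the example above, both `resolve_listing_time("Dec 15 11:45", datetime(2008, 12, 19))` and `resolve_listing_time("Dec 15 11:45", datetime(2009, 2, 2))` come out as December 15th, 2008. Note that `%b` depends on the process locale; this sketch assumes the default C locale with English month names.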
Don’t relax now. There’s more!
Because, you see, it turns out that some dates do have a year, such as:
Apr 11 2004
In this case, you don’t have to guess about the year. However, because you are extremely observant, you will have immediately noticed that while this date does include a year, it does not include a time. So, you know on what day the file was modified, but not the time of day. Like most file systems, MacFUSE wants modification times that are accurate to the nearest second, so this is a bit of a conundrum. Unfortunately, you can’t do much better than to pick some arbitrary time on that date (say, midnight), and hope it’s all right. The FTP server knows the real modification time, but it’s not saying.
“That’s okay,” you say. “We’ll just say it’s midnight, and nobody will notice.” Oh, but they will. Your customers will log into the FTP server and check the timestamps manually against the values you reported, and they will file a bug against your software when the two do not match. This is perfectly fair: They don’t match, after all! But you can’t do a damned thing about it. Even the dates that do include a time generally only give it to the nearest minute.
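If you want to be honest with yourself about how little you know, one option is to keep the whole window of possible times rather than a single made-up one. The following Python sketch does that for the "date with a year but no time" form; `day_precision_bounds` is a hypothetical helper name, and reporting the start of the window (midnight) as the modification time is the arbitrary choice described above, not anything the server promises.

```python
from datetime import datetime, timedelta


def day_precision_bounds(stamp: str):
    """Return (earliest, latest) possible mtimes for a stamp like 'Apr 11 2004'.

    The listing only tells us the day, so the true modification time could be
    anywhere from midnight to one second before the next midnight.
    """
    earliest = datetime.strptime(stamp, "%b %d %Y")
    latest = earliest + timedelta(days=1) - timedelta(seconds=1)
    return earliest, latest
```

A filesystem would typically report `earliest` and quietly remember that any value up to `latest` is equally defensible, which is some comfort when the bug report arrives.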
So Let it Be Written
Having scaled the mountains of madness, you are now prepared to tackle the most essential tasks in a filesystem’s life: reading and writing actual file data. Since it is, by nature, a protocol for transferring whole files, FTP provides basically two commands for this purpose: The RETR command, to retrieve a file from the server to the client, and the STOR command, to store a file from the client onto the server. By intent, these commands transfer an entire file at a time. But, since files can be very big, computers sometimes crash, and networks are occasionally subject to disconnection, FTP also has a command called REST that lets you pick up a failed STOR or RETR command from where it left off.
This model is actually pretty good for file transfer. Most filesystems, however, take a slightly different approach. The “read” operation on a file generally takes a starting position and a number of bytes, and fetches that many bytes starting at the given position in the file. The “write” operation, in turn, takes a starting position and a list of bytes, and writes the bytes into the file starting at that location, extending the file as necessary to accommodate them. Turning a filesystem-style “read” into an FTP RETR command is a little tricky. If the filesystem asked you for 32000 bytes starting at position 25 of a file that is 2.5 gigabytes long, and you simply start fetching from position 25 and let the transfer run to the end, it will be a long time before you can return those bytes. Users get cranky when they have to wait so long for such a small amount of data. And other programs get cranky too: if the MacOS Finder has to wait “too long” for a read or write to be completed, it may force-quit your filesystem completely.
To work around this, you have one more tool at your disposal. FTP also provides an ABOR command, allowing the client to “abort” a RETR that it has started but does not want to finish. To read 32000 bytes starting at position 25, you can resume from position 25, retrieve the file, and then after you have read the 32000 bytes you want, abort the transaction. Easy!
Something you need to know about FTP is that it uses multiple separate network connections for each session. That’s part of the reason why it’s so hard to configure network firewalls for it. One connection, the “control channel,” is for sending commands like LIST, STOR and RETR; all the other connections (“data channels”) are for sending file and file-listing data back and forth. In theory, the control channel is independent from the data channels, but in practise many FTP servers do not really pay attention to the control channel while they are sending or receiving file data. As you can imagine, this makes it hard to get your ABOR command processed correctly. What’s worse, the exact response that an FTP server is supposed to give upon receiving an ABOR command is not terribly well-defined, so different servers handle it in different ways.
What follows, therefore, is an incredibly delicate dance, evolved over many years of ad hoc FTP client and server design, for the handling of abort requests. The exact effect varies widely: Some servers handle the abort promptly and without complaint. Others throw up their hands in disgust that a client would dare to abort a transfer, and boot them from the server. On servers that do the latter, you may find that each time you read data from a file, you are forced to completely re-establish your connection to the FTP server. I’m not naming any names, but Microsoft knows who they are.
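For the curious, the read-a-range trick looks roughly like this in Python. The calls `voidcmd`, `transfercmd` (with its `rest` argument), `abort`, and `voidresp` are from the standard `ftplib` module; the function `read_range` and its "is the session still usable?" return flag are hypothetical, and the sketch assumes the server accepts REST in binary mode. It deliberately ignores the leftover replies some servers send after an abort, which is where most of the delicate dance actually happens.

```python
from ftplib import all_errors


def read_range(ftp, path, offset, length, bufsize=8192):
    """Fetch `length` bytes of `path` starting at `offset`.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    Returns (data, session_ok); if session_ok is False, the caller should
    assume the control connection is dead and reconnect.
    """
    ftp.voidcmd("TYPE I")                                # binary, so offsets are bytes
    conn = ftp.transfercmd(f"RETR {path}", rest=offset)  # sends REST, then RETR
    data = bytearray()
    hit_eof = False
    try:
        while len(data) < length:
            chunk = conn.recv(min(bufsize, length - len(data)))
            if not chunk:
                hit_eof = True        # the file ended before we had enough
                break
            data += chunk
    finally:
        conn.close()                  # drop the data channel either way
    try:
        if hit_eof:
            ftp.voidresp()            # transfer finished normally; read its reply
        else:
            ftp.abort()               # ask the server to stop sending
        session_ok = True
    except all_errors:
        session_ok = False            # the server took offense; reconnect
    return bytes(data), session_ok
```

Calling `read_range(ftp, "big.iso", 25, 32000)` on a cooperative server returns the 32000 bytes and `True`. On the less cooperative kind it returns the bytes and `False`, and you get to log in again before the next read.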
Keeping Up to Date
Take heart, you are not alone! While you are reading and writing all these files, somebody else is out there doing it too! And, if you’re especially fortunate, they’re doing it to the very same files and folders you are using.
Now, I can hear you asking, “Michael, if that’s true, then how will I know if somebody has created a new file, or deleted an existing file, or modified one of the files I am using?” My answer to you is, “that’s an excellent question, now get me a drink.”
The only way you can find out if files or folders have been created, destroyed, or changed, is to ask the server for up-to-date listings of files. That’s right, the LIST command again. The FTP server does not provide any way you can get updated information about individual files; you can only get updates if you re-load the whole directory containing the file you’re interested in. This might be time-consuming, but you have to do it, or you’ll never notice changes somebody else might have made, and wouldn’t you be confused if you tried to open a file that didn’t exist anymore?
Don’t worry. Your customers will understand when it takes a long time to browse through their files and folders because you are using the LIST command so often. I promise.
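Once you have two listings of the same directory in hand, working out what changed between them is at least the easy part. Here is a small Python sketch; `diff_listings` is a hypothetical helper, and it assumes each listing has already been parsed into a dictionary mapping a file name to whatever details you managed to extract (here a tuple of size and date text).

```python
def diff_listings(old, new):
    """Compare two parsed directory listings.

    Each listing maps a file name to a tuple of details, e.g. (size, date_text).
    Returns three sorted lists of names: created, deleted, changed.
    """
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys()
                     if old[name] != new[name])
    return created, deleted, changed
```

Keep in mind that "changed" is only as trustworthy as the details being compared: a file rewritten with the same size within the same minute looks exactly like a file nobody touched.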
Wearing your Password on Your Sleeve
Probably the best feature of FTP is that it avoids all the messy complications of encryption and security. When you sign in to a typical FTP server, you just send your password right over the Internet in clear, unencrypted text. What could be easier than that? There’s no need to muck about with public keys, site certificates, trust roots, or any of that other nonsense. Of course, just about any yahoo with a copy of tcpdump could read your password, but that seems a small price to pay for the convenience of software developers, doesn’t it?
Many users were predictably unsatisfied with this excellent “zero-security” design of FTP, and became so agitated about it that many sites now implement a modified version of the FTP protocol called “FTPS”. This is supposed to stand for “FTP Secure,” but it shouldn’t be confused with SFTP, which is a completely unrelated protocol that is actually secure. FTPS comes in a couple of different incompatible varieties, the more common of which requires that you install custom FTP software that supports some extra security-related commands.
With FTPS, you no longer get to send your password over the network in clear text. That is, provided your FTP server supports the extensions (most don’t), and that it does so correctly (most don’t), and that you are using client software that understands FTPS (most do).
Fortunately, despite the loss of cleartext password transmission, you can still have some fun with FTPS, because once you have encrypted the FTP control channel, many network firewalls will no longer be able to handle the FTP data connections, causing your program to hang when it tries to transfer data. Some clever mensch tried to solve this problem by adding a command to disable encryption on the FTP control channel after you have sent your password, but fortunately for you, plenty of servers don’t implement it correctly, so you will have some wild times coping with the connection failures that result.
Sites that still don’t support SFTP have defended their users’ right to have their passwords compromised on a public network by keeping FTP instead. Modifying their servers to support FTPS was probably a lot more work than installing SFTP would have been, but for some hosting sites, no effort is too much for their loyal customers. So, while you may have some difficulties in implementing FTPS support for your MacFUSE filesystem, you can take comfort from the fact that your own customers will appreciate your efforts to safeguard their traditional folkways.
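For reference, the common "extra commands" flavor of FTPS is what Python's standard `ftplib.FTP_TLS` class speaks, and the sequence of steps can be sketched as follows. The methods `login`, `prot_p`, and `ccc` are real `FTP_TLS` methods; the wrapper `secure_login` and its `clear_control_channel` flag are hypothetical names of mine, and whether a given server survives the `ccc` step is, as noted above, very much an open question.

```python
def secure_login(ftps, user, password, clear_control_channel=False):
    """Log in over explicit FTPS using an ftplib.FTP_TLS-like object.

    `ftps` is assumed to be already connected, e.g. FTP_TLS("ftp.example.com")
    where the host name is a placeholder.
    """
    # FTP_TLS.login() negotiates TLS on the control channel (AUTH TLS)
    # before sending the user name and password.
    ftps.login(user, password)
    # Ask for encrypted data connections as well (PBSZ 0, then PROT P).
    ftps.prot_p()
    if clear_control_channel:
        # Revert the control channel to plaintext after authentication, so
        # that firewalls can once again read the data-connection details.
        ftps.ccc()
    return ftps
```

Leaving `clear_control_channel` off is the conservative default in this sketch; turning it on trades the firewall problem for the servers-that-get-it-wrong problem.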
Reflections and Conclusions
For those of you who, in open defiance of all reason, have made the stimulating choice to implement a filesystem based on FTP, I hope that my little discussions may help you along your path. If you are in need of any further assistance, I can also recommend several excellent varieties of single-malt Scotch whisky that will provide incalculable benefit to your efforts.
Despite the sage words of Steve Frank and other upstarts, who advocate switching from FTP to SFTP, many sites continue to cling doggedly to FTP as a primary means of access to their customers’ data. Indeed, one could argue they are giving excellent access—to anyone in the world who might conceivably want it. Furthermore, these hosts break the chains of traditional patriarchalist Western rationality, which would ordinarily insist upon such details as accurate file timestamps, machine-readable status information, good performance, and the ability to play nicely with network firewalls.
Everybody’s got their little rebellions.