FTP Considered Harmful
February 2nd, 2009
Suppose, for the sake of argument, that you went completely crazy one day, and decided to build a MacFUSE file system that gives you access to the files stored on an FTP server somewhere.
Leaving aside the inarguable fact that anybody who took on such a project must be stark raving mad—if not before the project, then certainly afterward—it turns out that FTP has several qualities (so-called) that make it particularly unsuited to such a task. For the sake of curiosity, and possibly posterity, I would like to catalogue a few of these.
Persons of sensitive digestion or a strong belief in moral clarity might wish to leave the room or avert their eyes. Caveat lector.
Essential Truths
Like a mortgage broker after the bailout, MacFUSE is going to demand all kinds of invasive personal information about each of the files and folders you are ostensibly providing access to. Of course, any program needs the name of each file, but MacFUSE in addition demands to know all kinds of other things as well:
- The type of each file (e.g., file, directory, or symbolic link),
- The size of each file (how many bytes of storage it occupies),
- The ownership of the file (to whom it belongs),
- The permissions of the file (who is permitted to read or write the file), and
- Various timestamps for the file (most importantly, when it was last modified).
Unless you can fill in all these details (and more), you don’t get to play with the other filesystems. File transfer applications like Fetch or Transmit, like the loan officers at Freddie Mac before the collapse, take a somewhat more lenient approach: They’ll use whatever personal details you provide them, but all they really need to do business is a name, and they’re good to go. As a filesystem, you have the watchful eye of the law upon you, so you have to fill in all the blanks.
To answer these questions, the FTP server provides you with exactly one tool. It is called the LIST command, and it’s supposed to tell you what files are available on the server. Unfortunately for you, the FTP Protocol does not say anything precise about what the LIST command should give you back: Most FTP servers run whatever program is used to generate directory listings on their particular operating system, and dump the results back over the network to you.
Since most FTP servers run on Unix-like operating systems, this is usually the output from the ls command. Even then, there is considerable variation in what the ls command spits out, from one Unix-like operating system to another. Windows-based FTP servers often use the dir command, but some of them emulate the output of ls instead. So, you might get this (Unix):
drwxr-sr-x 5 1 512 Jun 29 2001 admin
lrwxrwxrwx 1 0 1 Jun 29 2001 archive -> .
drwx--s--x 2 1 512 Jun 28 2001 bin
drwxr-xr-x 2 1 512 Jun 29 2001 cas
-rw-r--r-- 1 21 90112 Jun 29 2001 compress.tar
drwxr-sr-x 3 1 1024 Jun 28 2001 doc
d-wx--s--x 6 1 512 Jun 28 2001 etc
drwxr-sr-x 2 1 512 Jun 29 2001 government
-rw-r--r-- 1 21 798720 Jun 29 2001 gzip.tar
Or, you might get this (Windows):
Volume in drive C has no label.
Volume Serial Number is 987F-84C0
Directory of C:\Documents and Settings\Administrator
09/27/2008 03:23 PM <DIR> .
09/27/2008 03:23 PM <DIR> ..
12/22/2008 02:44 AM <DIR> Desktop
09/24/2007 02:05 PM <DIR> Favorites
09/28/2008 02:08 PM <DIR> My Documents
09/05/2007 07:32 PM <DIR> Start Menu
0 File(s) 0 bytes
6 Dir(s) 21,312,827,392 bytes free
Or, you might even get this (OpenVMS):
Directory DISK$USR_THINGY:[TEST]
TCPIP$FTP_SERVER.LOG;1
0/35 18-DEC-2008 14:02:31 [TEST] (RWED,RWED,RE,)
ALPHA.TXT;1 582/595 20-FEB-1995 17:15:05 [TEST] (RWED,RWED,RE,)
MYTOYS.DIR;1 1/35 18-DEC-2008 14:01:00 [TEST] (RWE,RWE,RE,E)
X.COM;1 1/35 20-JUN-1995 10:21:09 [TEST] (RWED,RWED,RE,)
X.PS;3 1/35 12-JUN-2007 14:29:17 [TEST] (RWED,RWED,RE,)
And lest you think I’m making things up, these are excerpts from the responses given by real FTP servers. The output of the LIST command is intended to be read by human beings, not by other programs, so no effort is made to keep things consistent from one server to another.
From this mess, you have to figure out a way to extract the name, type, size, owner, permissions, and timestamp for each file on the server. Solving this little problem occupies nearly 10% of the code that comprises the FTPFS implementation in ExpanDrive. That doesn’t even count the code that obtains the LIST output from the FTP server. And, this code has been implicated in virtually every customer-facing bug we have fixed so far with ExpanDrive’s FTP filesystem. It’s carefully written, beautiful code, and I detest every line of it personally.
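To give a flavor of what that parsing looks like, here is a minimal sketch in Python that handles only the Unix-style lines shown above. It is not the ExpanDrive code; the regular expression, the `parse_ls_line` name, and the optional group column are my own assumptions for illustration, and a real parser needs many more cases (Windows, OpenVMS, and every oddball variant of ls).

```python
import re

# Assumed layout: mode, link count, owner, optional group, size, month, day,
# then either a year or an HH:MM time, then the name.
LS_LINE = re.compile(
    r"^(?P<mode>[-dlbcps][-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+"
    r"(?P<owner>\S+)\s+"
    r"(?:(?P<group>\S+)\s+)?"
    r"(?P<size>\d+)\s+"
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{4}|\d{1,2}:\d{2})\s"
    r"(?P<name>.+)$"
)

TYPES = {"-": "file", "d": "directory", "l": "symlink"}

def parse_ls_line(line):
    """Return a dict describing one ls-style line, or None if it does not match."""
    m = LS_LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    kind = TYPES.get(m["mode"][0], "other")
    name, target = m["name"], None
    # Symbolic links are listed as "name -> target".
    if kind == "symlink" and " -> " in name:
        name, target = name.split(" -> ", 1)
    return {
        "name": name,
        "type": kind,
        "target": target,
        "size": int(m["size"]),
        "owner": m["owner"],
        "permissions": m["mode"][1:],
        "date": (m["month"], int(m["day"]), m["when"]),
    }
```

Anything that fails to match comes back as `None`, which is about the only honest answer when the server hands you output from some other program entirely.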
The Worst of Times
So now you’ve figured out the names of your files, and their types, sizes, permissions, and ownership. But don’t sit back just yet—now you’re in for a real treat. Almost nothing compares to the thrilling excitement of trying to get an FTP server to answer the question, “when was this file last modified, anyway?”
At first glance, this looks pretty easy. Each line of a typical ls output contains a modification date, something like this:
Dec 15 11:45
Parsing a date and time written like that is no problem; it’s basically one line of Python code. But before you get too comfortable, please consider the following simple question: December 15th of what year?
If you answered anything other than “it depends,” then I find your freshly-scrubbed naïveté to be quite charming. In particular, if you assumed any date without a year must mean “this year,” you’re living in a dream world. (I lived there too, for a while; it was quite a pleasant fantasy.) Because, you see, if today were December 19th, 2008, then “Dec 15 11:45” means a quarter till noon on December 15th, 2008. But it means exactly the same thing if today is February 2nd, 2009, because clearly a file cannot have been modified in December 2009 yet, right?
So, to interpret a date without a year correctly, you can start out by assuming that it is “this year,” and then check to see if that yields a time in the future. If it does, you know it must really mean “last year.” Isn’t this fun?
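The year-guessing rule can be sketched roughly like this. The function name `guess_year` is hypothetical, and the sketch assumes English month abbreviations (Python's `%b` depends on the locale) and ignores time zones entirely.

```python
from datetime import datetime

def guess_year(stamp, now):
    """Parse a stamp like 'Dec 15 11:45', choosing a year that is not in the future.

    Tries the current year first, then falls back to the previous year.
    """
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")
        except ValueError:
            continue  # e.g. 'Feb 29' in a year that is not a leap year
        if candidate <= now:
            return candidate
    raise ValueError(f"cannot place {stamp!r} in {now.year} or {now.year - 1}")

# Both of the situations described above give the same answer:
print(guess_year("Dec 15 11:45", datetime(2008, 12, 19)))
print(guess_year("Dec 15 11:45", datetime(2009, 2, 2)))
```

A stricter version might allow a little slack for clock skew between client and server, but that is a judgment call this sketch does not make.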
Don’t relax now. There’s more!
Because, you see, it turns out that some dates do have a year, such as:
Apr 11 2004
In this case, you don’t have to guess about the year. However, because you are extremely observant, you will have immediately noticed that while this date does include a year, it does not include a time. So, you know on what day the file was modified, but not the time of day. Like most file systems, MacFUSE wants modification times that are accurate to the nearest second, so this is a bit of a conundrum. Unfortunately, you can’t do much better than to pick some arbitrary time on that date (say, midnight), and hope it’s all right. The FTP server knows the real modification time, but it’s not saying.
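One way to at least be honest about the guess is to keep track of the whole day the real timestamp could fall in, rather than pretending midnight is exact. This is only a sketch; the names `parse_day_stamp` and `could_match` are made up for illustration, and English month abbreviations are assumed.

```python
from datetime import datetime, timedelta

def parse_day_stamp(stamp):
    """Parse 'Apr 11 2004' into the half-open interval [start, end) for that day."""
    start = datetime.strptime(stamp, "%b %d %Y")  # time of day defaults to 00:00:00
    return start, start + timedelta(days=1)

def could_match(stamp, actual):
    """True if `actual` is consistent with what the listing showed."""
    start, end = parse_day_stamp(stamp)
    return start <= actual < end
```

The value reported to the filesystem would still be `start`, that is, midnight; the interval is only useful for deciding whether two timestamps might refer to the same modification.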
“That’s okay,” you say. “We’ll just say it’s midnight, and nobody will notice.” Oh, but they will. Your customers will log into the FTP server and check the timestamps manually against the values you reported, and they will file a bug against your software when the two do not match. This is perfectly fair: They don’t match, after all! But you can’t do a damned thing about it. Even the dates that do include a time generally only give it to the nearest minute.
So Let it Be Written
Having scaled the mountains of madness, you are now prepared to tackle the most essential tasks in a filesystem’s life: the reading and writing of actual file data. Since it is, by nature, a protocol for transferring whole files, FTP provides basically two commands for this purpose: the RETR command, to retrieve a file from the server to the client, and the STOR command, to store a file from the client onto the server. By intent, these commands transfer an entire file at a time. But, since files can be very big, computers sometimes crash, and networks are occasionally subject to disconnection, FTP also has a command called REST that lets you pick up a failed STOR or RETR command from where it left off.
This model is actually pretty good for file transfer. Most filesystems, however, take a slightly different approach. The “read” operation on a file generally takes a starting position and a number of bytes, and fetches that many bytes starting at the given position in the file. The “write” operation, in turn, takes a starting position and a list of bytes, and writes the bytes into the file starting at that location, extending the file as necessary to accommodate them. Turning a filesystem-style “read” into an FTP RETR command is a little tricky. If the filesystem asked you for 32000 bytes starting at position 25 of a file that is 2.5 gigabytes long, and you simply start fetching from position 25 and let the RETR run to completion, it will be a long time before you can return those bytes. Users get cranky when they have to wait so long for such a small amount of data. And other programs get cranky too: if the MacOS Finder has to wait “too long” for a read or write to be completed, it may force-quit your filesystem completely.
To work around this, you have one more tool at your disposal. FTP also provides an ABOR command, allowing the client to “abort” a RETR that it has started but does not want to finish. To read 32000 bytes starting at position 25, you can resume from position 25, retrieve the file, and then after you have read the 32000 bytes you want, abort the transaction. Easy!
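In Python's standard `ftplib`, the REST, RETR, ABOR sequence might be sketched as follows. The `read_range` helper is hypothetical, and the way it swallows errors after the abort is an assumption made to keep the example short; how servers actually respond at that point varies a great deal.

```python
import ftplib

def read_range(ftp, path, offset, length, chunk=65536):
    """Read up to `length` bytes of `path` starting at `offset`.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so byte offsets mean what we expect
    # rest=offset makes ftplib send "REST <offset>" before the RETR.
    conn = ftp.transfercmd("RETR " + path, rest=offset)
    pieces, remaining, hit_eof = [], length, False
    try:
        while remaining > 0:
            data = conn.recv(min(chunk, remaining))
            if not data:
                hit_eof = True  # the file ended before we got `length` bytes
                break
            pieces.append(data)
            remaining -= len(data)
    finally:
        conn.close()
    try:
        if hit_eof:
            ftp.voidresp()  # transfer finished normally; collect the final reply
        else:
            ftp.abort()     # we stopped early; ask the server to give up
    except ftplib.all_errors:
        # Assumption for this sketch: ignore the failure. A real client would
        # probably have to reconnect before issuing the next command.
        pass
    return b"".join(pieces)
```

A call such as `read_range(ftp, "big.iso", 25, 32000)` corresponds to the read described above, where `big.iso` is just a placeholder file name.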
Well, almost.
Something you need to know about FTP is that it uses multiple separate network connections for each session. That’s part of the reason why it’s so hard to configure network firewalls for it. One connection, the “control channel,” is for sending commands like LIST, STOR and RETR; all the other connections (“data channels”) are for sending file and file-listing data back and forth. In theory, the control channel is independent from the data channels, but in practise many FTP servers do not really pay attention to the control channel while they are sending or receiving file data. As you can imagine, this makes it hard to get your ABOR command processed correctly. What’s worse, the exact response that an FTP server is supposed to give upon receiving an ABOR command is not terribly well-defined, so different servers handle it in different ways.
What follows, therefore, is an incredibly delicate dance, evolved over many years of ad hoc FTP client and server design, for the handling of abort requests. The exact effect varies widely: Some servers handle the abort promptly and without complaint. Others throw up their hands in disgust that a client would dare to abort a transfer, and boot them from the server. On servers that do the latter, you may find that each time you read data from a file, you are forced to completely re-establish your connection to the FTP server. I’m not naming any names, but Microsoft knows who they are.
Keeping Up to Date
Take heart, you are not alone! While you are reading and writing all these files, somebody else is out there doing it too! And, if you’re especially fortunate, they’re doing it to the very same files and folders you are using.
Now, I can hear you asking, “Michael, if that’s true, then how will I know if somebody has created a new file, or deleted an existing file, or modified one of the files I am using?” My answer to you is, “that’s an excellent question, now get me a drink.”
The only way you can find out if files or folders have been created, destroyed, or changed, is to ask the server for up-to-date listings of files. That’s right, the LIST command again. The FTP server does not provide any way you can get updated information about individual files; you can only get updates if you re-load the whole directory containing the file you’re interested in. This might be time-consuming, but you have to do it, or you’ll never notice changes somebody else might have made, and wouldn’t you be confused if you tried to open a file that didn’t exist anymore?
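Once you have two successive listings of the same directory, working out what changed is the easy part. Here is a minimal sketch, assuming each listing has already been parsed into a dictionary that maps file names to whatever metadata you managed to extract; the `diff_listings` name and the tuple layout are hypothetical.

```python
def diff_listings(old, new):
    """Compare two directory snapshots.

    `old` and `new` map file name -> metadata (here, a (size, timestamp) tuple).
    Returns three sorted lists: created, deleted, and modified names.
    """
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return created, deleted, modified

before = {"a.txt": (10, "Dec 15 11:45"), "b.txt": (20, "Dec 15 11:45")}
after = {"a.txt": (12, "Dec 16 09:00"), "c.txt": (5, "Dec 16 09:01")}
print(diff_listings(before, after))
```

The expensive part is not this comparison but fetching `after` in the first place, which means another full LIST round trip for every directory you care about.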
Don’t worry. Your customers will understand when it takes a long time to browse through their files and folders because you are using the LIST command so often. I promise.
Wearing your Password on Your Sleeve
Probably the best feature of FTP is that it avoids all the messy complications of encryption and security. When you sign in to a typical FTP server, you just send your password right over the Internet in clear, unencrypted text. What could be easier than that? There’s no need to muck about with public keys, site certificates, trust roots, or any of that other nonsense. Of course, just about any yahoo with a copy of tcpdump could read your password, but that seems a small price to pay for the convenience of software developers, doesn’t it?
Many users were predictably unsatisfied with this excellent “zero-security” design of FTP, and became so agitated about it that many sites now implement a modified version of the FTP protocol called “FTPS”. This is supposed to stand for “FTP Secure,” but it shouldn’t be confused with SFTP, which is a completely unrelated protocol that is actually secure. FTPS comes in a couple of different incompatible varieties, the more common of which requires that you install custom FTP software that supports some extra security-related commands.
With FTPS, you no longer get to send your password over the network in clear text. That is, provided your FTP server supports the extensions (most don’t), and that it does so correctly (most don’t), and that you are using client software that understands FTPS (most do).
Fortunately, despite the loss of cleartext password transmission, you can still have some fun with FTPS, because once you have encrypted the FTP control channel, many network firewalls will no longer be able to handle the FTP data connections, causing your program to hang when it tries to transfer data. Some clever mensch tried to solve this problem by adding a command to disable encryption on the FTP control channel after you have sent your password, but fortunately for you, plenty of servers don’t implement it correctly, so you will have some wild times coping with the connection failures that result.
Sites that still don’t support SFTP have defended their users’ right to have their passwords compromised on a public network by keeping FTP instead. Modifying their servers to support FTPS was probably a lot more work than installing SFTP would have been, but for some hosting sites, no effort is too much for their loyal customers. So, while you may have some difficulties in implementing FTPS support for your MacFUSE filesystem, you can take comfort from the fact that your own customers will appreciate your efforts to safeguard their traditional folkways.
Reflections and Conclusions
For those of you who, in open defiance of all reason, have made the stimulating choice to implement a filesystem based on FTP, I hope that my little discussions may help you along your path. If you are in need of any further assistance, I can also recommend several excellent varieties of single-malt Scotch whisky that will provide incalculable benefit to your efforts.
Despite the sage words of Steve Frank and other upstarts, who advocate switching from FTP to SFTP, many sites continue to cling doggedly to FTP as a primary means of access to their customers’ data. Indeed, one could argue they are giving excellent access—to anyone in the world who might conceivably want it. Furthermore, these hosts break the chains of traditional patriarchalist Western rationality, which would ordinarily insist upon such details as accurate file timestamps, machine-readable status information, good performance, and the ability to play nicely with network firewalls.
Everybody’s got their little rebellions.



February 3rd, 2009 at 12:21 am
This is, by far, one of the best rants I’ve read in a long time. If it were easier to do so through the internet (using a protocol other than FTP), I would buy you a drink.
February 3rd, 2009 at 12:29 pm
Nice text. I missed one gotcha in the discussion about time stamps: they do not list a time zone, so you must hope to guess that correctly. This can be a big problem when attempting to synchronize directories.
February 3rd, 2009 at 12:46 pm
Someone writes:
That’s absolutely true. In fact, I would go so far as to say that using FTP as a backup mechanism is dangerous, unless you care only about the contents of your files and not their names, permissions, and timestamps. Trying to synchronize file metadata via FTP is basically doomed to failure, since the protocol provides no standard way to modify any of these things. Some FTP servers support commands like SITE CHMOD and SITE UTIME, and others the MDTM extension from RFC 3659, but that support is by no means widespread, so you can’t rely upon it at all. Some FTP servers I’ve seen even modify the names of files—silently renaming “My File” to “My_File” for example.
If you really want to back up your data on a remote server, and FTP is your only option, you are much better off copying your data into a disk image file, and just transferring the disk image to the remote server. I’ve had some success mounting remote disk images in the Finder, though I haven’t yet tried synchronizing to an image mounted that way.
February 3rd, 2009 at 2:21 pm
… And gasoline is a terrible fuel: it costs a lot to make, and it’s highly toxic, combustible, corrosive and dangerous to store. Why would anyone ever use such a fuel? Because that’s what other people do. You’re right in your assessment of the failings and limitations of FTP; it was designed long ago for a very different computing environment, and looking at it with today’s requirements really shows its shortcomings. But the reason that I or anyone else would want an FTP client is simply that there are FTP servers, and until one changes the other is not likely to change either. Blaming the protocol for not measuring up to modern requirements is like blaming your phone for not knowing where anyone you know is at any point in time. Sure, the two phones can talk to one another, but locating each other is just not something they were designed for.
February 3rd, 2009 at 3:00 pm
A regular FTP apologist!
The flaw with your argument is that it is easy and inexpensive for users and administrators to swap out FTP in favor of SFTP. They’ve also had decades to perform the task and educate users. With your car analogy there are huge cost obstacles and infrastructure costs that do not exist in this scenario.
February 3rd, 2009 at 3:34 pm
I’m not apologizing for FTP at all (re-read my post); FTP needs to go. For that matter, so do POP and SMTP; they’re just too outdated. Yes, it’s inexpensive and relatively easy to change, but the problem is with people. For crying out loud, they’ve had 10 years to get ready for digital TV and they’re in shock because they actually have to “do something” to change over. What I was trying to say is that providers use FTP because people have a client on hand, and people have the client because there are so many FTP servers. It doesn’t matter to them that it’s a bad protocol, just that they don’t have to think about it (like putting gas in your car). What “needs” to happen is for servers to just not support FTP anymore, but the trick is in getting everyone to agree on a new alternative.
February 3rd, 2009 at 5:58 pm
Ray, I mostly agree with your points about why old and crufty protocols refuse to die, though I also think Jeff is right that automobiles don’t make a good analogy for that particular problem.
My intent, however, wasn’t merely to point out how old and horrible FTP is (although it is, and I did). Rather, I wanted to highlight some of the impedance mismatches that arise when you try to apply a technology that was designed for one task (human-controlled bulk file transfer, for example) to a related but different problem (in this case, the implementation of a filesystem).
If someone wanted to summarize my intended message in a single sentence, I’d prefer it to read “You’d have to be crazy to build a filesystem on top of FTP,” rather than, “You’d have to be crazy to use FTP at all.” Although, to be fair, the latter might also be true. ;)
February 3rd, 2009 at 6:23 pm
The funny thing is that I never talked about cars, automobiles, or infrastructure. Gasoline is just something that people use for convenience, hardly ever thinking about how dangerous it really is. (like FTP and other unsecured protocols)
“Impedance Mismatch” … That’s an incredibly succinct description; it really is two things that you would think are the same, but they really aren’t at all. I’ll go you one better: you’d almost have to be crazy to build a file transfer protocol on top of FTP.
February 7th, 2009 at 10:10 am
Great article. As someone who has actually written the LIST parsers you are talking about, I feel your pain. It seems that a standard LIST format would have been something they would have included in RFC 959, but unfortunately no. In fact, it is different for almost every platform (Windows, AS/400, UNIX, etc.), each of which has its own format.
If you are lucky, the server will support the MLSD command which is a common format listing. Not all servers support this so you have to use the FEAT command to see if the command is supported (and not all servers support the FEAT command either). Enjoy :)
February 7th, 2009 at 2:59 pm
Thanks, Van!
I debated talking about MLST and MLSD, but the post was already getting kind of long, so I left it out. In fact, ExpanDrive uses MLSD whenever it’s available, but, as you say, many servers do not support it, so it’s a mixed blessing.
More subtly, the permission flags returned by MLSD do not correspond directly to the POSIX file bits, so on servers that do include this information in the LIST output, you have to use LIST anyway. Worse, MLSD listings do not always distinguish symbolic links from normal files, nor do they give you the target of a symbolic link, while the “ls” style output usually does. For file transfer, these distinctions are not terribly important, but a filesystem has to get it right.
The reason we use MLSD at all is that the timestamps it returns are accurate to the second, rather than to the day, and are guaranteed (assuming we trust the FTP server to implement the standard faithfully) to be in UTC. So far, that seems to be mostly true, but I have been extremely unimpressed with the quality of most FTP server implementations, so nothing would surprise me at this point.
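For comparison with the LIST mess, here is roughly what reading one MLSD entry looks like. This is a sketch based on the fact format in RFC 3659, not ExpanDrive's code; `parse_mlsd_line` is a hypothetical name, and the sample file names in the usage notes are made up. Python's `ftplib.FTP.mlsd()` does similar parsing for you if you would rather not do it by hand.

```python
from datetime import datetime, timezone

def parse_mlsd_line(line):
    """Split an MLSD entry into (name, facts, mtime).

    An entry looks like 'type=file;size=582;modify=20090207145900; name':
    semicolon-terminated facts, one space, then the file name.
    """
    facts_part, _, name = line.rstrip("\r\n").partition(" ")
    facts = {}
    for fact in facts_part.split(";"):
        if fact:
            key, _, value = fact.partition("=")
            facts[key.lower()] = value  # fact names are case-insensitive
    mtime = None
    if "modify" in facts:
        # YYYYMMDDHHMMSS, optionally followed by fractional seconds, in UTC.
        stamp = facts["modify"].split(".")[0]
        mtime = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return name, facts, mtime
```

Note that the name is everything after the first space, so a file called “My File” survives intact, provided the server did not rename it first.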