Posts Tagged ‘File Systems’

FTP Considered Harmful

Monday, February 2nd, 2009

Suppose, for the sake of argument, that you went completely crazy one day, and decided to build a MacFUSE file system that gives you access to the files stored on an FTP server somewhere.  

Leaving aside the inarguable fact that anybody who took on such a project must be stark raving mad—if not before the project, then certainly afterward—it turns out that FTP has several qualities (so-called) that make it particularly unsuited to such a task. For the sake of curiosity, and possibly posterity, I would like to catalogue a few of these.

Persons of sensitive digestion or a strong belief in moral clarity might wish to leave the room or avert their eyes. Caveat lector.

Essential Truths

Like a mortgage broker after the bailout, MacFUSE is going to demand all kinds of invasive personal information about each of the files and folders you are ostensibly providing access to. Of course, any program needs the name of each file, but MacFUSE in addition demands to know all kinds of other things as well:

  • The type of each file (e.g., file, directory, or symbolic link),
  • The size of each file (how many bytes of storage it occupies),
  • The ownership of the file (to whom it belongs),
  • The permissions of the file (who is permitted to read or write the file), and
  • Various timestamps for the file (most importantly, when it was last modified).

Unless you can fill in all these details (and more), you don’t get to play with the other filesystems. File transfer applications like Fetch or Transmit, like the loan officers at Freddie Mac before the collapse, take a somewhat more lenient approach: They’ll use whatever personal details you provide them, but all they really need to do business is a name, and they’re good to go. As a filesystem, you have the watchful eye of the law upon you, so you have to fill in all the blanks.
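To make that checklist concrete, here is a minimal sketch of the record you would need to fill in for every file. The `FileAttributes` class and its field names are hypothetical, borrowed loosely from POSIX `stat` rather than from any actual MacFUSE interface; the only real API used is Python's standard `stat` module.

```python
import stat
from dataclasses import dataclass


@dataclass
class FileAttributes:
    """Hypothetical record of what a filesystem must report for each file."""
    name: str
    mode: int     # file type bits OR'ed with permission bits
    size: int     # size in bytes
    uid: int      # numeric owner
    gid: int      # numeric group
    mtime: float  # last modification time, in seconds since the epoch

    def kind(self) -> str:
        """Classify the entry using the type bits stored in mode."""
        if stat.S_ISDIR(self.mode):
            return "directory"
        if stat.S_ISLNK(self.mode):
            return "symlink"
        return "file"


# Example: a directory with rwxr-xr-x permissions.
admin = FileAttributes("admin", stat.S_IFDIR | 0o755, 512, 1, 0, 0.0)
```

Every one of those fields has to come from somewhere, and for an FTP server "somewhere" means the text of a directory listing.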

To answer these questions, the FTP server provides you with exactly one tool. It is called the LIST command, and it’s supposed to tell you what files are available on the server. Unfortunately for you, the FTP Protocol does not say anything precise about what the LIST command should give you back: Most FTP servers run whatever program is used to generate directory listings on their particular operating system, and dump the results back over the network to you.

Since most FTP servers run on Unix-like operating systems, this is usually the output from the ls command. Even then, there is considerable variation in what the ls command spits out, from one Unix-like operating system to another. Windows-based FTP servers often use the dir command, but some of them emulate the output of ls instead. So, you might get this (Unix):


drwxr-sr-x   5 1            512 Jun 29  2001 admin
lrwxrwxrwx   1 0              1 Jun 29  2001 archive -> .
drwx--s--x   2 1            512 Jun 28  2001 bin
drwxr-xr-x   2 1            512 Jun 29  2001 cas
-rw-r--r--   1 21         90112 Jun 29  2001 compress.tar
drwxr-sr-x   3 1           1024 Jun 28  2001 doc
d-wx--s--x   6 1            512 Jun 28  2001 etc
drwxr-sr-x   2 1            512 Jun 29  2001 government
-rw-r--r--   1 21        798720 Jun 29  2001 gzip.tar

Or, you might get this (Windows):


Volume in drive C has no label.
Volume Serial Number is 987F-84C0
Directory of C:\Documents and Settings\Administrator
09/27/2008 03:23 PM <DIR> .
09/27/2008 03:23 PM <DIR> ..
12/22/2008 02:44 AM <DIR> Desktop
09/24/2007 02:05 PM <DIR> Favorites
09/28/2008 02:08 PM <DIR> My Documents
09/05/2007 07:32 PM <DIR> Start Menu
0 File(s) 0 bytes
6 Dir(s) 21,312,827,392 bytes free

Or, you might even get this (OpenVMS):


Directory DISK$USR_THINGY:[TEST]
TCPIP$FTP_SERVER.LOG;1 0/35 18-DEC-2008 14:02:31 [TEST] (RWED,RWED,RE,)
ALPHA.TXT;1 582/595 20-FEB-1995 17:15:05 [TEST] (RWED,RWED,RE,)
MYTOYS.DIR;1 1/35 18-DEC-2008 14:01:00 [TEST] (RWE,RWE,RE,E)
X.COM;1 1/35 20-JUN-1995 10:21:09 [TEST] (RWED,RWED,RE,)
X.PS;3 1/35 12-JUN-2007 14:29:17 [TEST] (RWED,RWED,RE,)

And lest you think I’m making things up, these are excerpts from the responses given by real FTP servers. The output of the LIST command is intended to be read by human beings, not by other programs, so no effort is made to keep things consistent from one server to another.

From this mess, you have to figure out a way to extract the name, type, size, owner, permissions, and timestamp for each file on the server. Solving this little problem occupies nearly 10% of the code that comprises the FTPFS implementation in ExpanDrive. That doesn’t even count the code that obtains the LIST output from the FTP server. And, this code has been implicated in virtually every customer-facing bug we have fixed so far with ExpanDrive’s FTP filesystem. It’s carefully written, beautiful code, and I detest every line of it personally.
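For a taste of what that code has to do, here is a rough sketch of a parser for just the Unix-style lines shown earlier. It is not the ExpanDrive implementation; the regular expression, the `parse_unix_list_line` name, and the assumed column layout (permissions, link count, owner, an optional group, size, date, name) are all illustrative guesses that real servers will happily violate.

```python
import re

# Assumed layout: perms, link count, owner, optional group, size,
# month, day, then either a year or an HH:MM time, then the name.
LIST_LINE = re.compile(
    r"^(?P<perms>[-dlbcps][-rwxsStT]{9})\s+"
    r"(?P<nlink>\d+)\s+"
    r"(?P<owner>\S+)\s+"
    r"(?:(?P<group>\S+)\s+)?"
    r"(?P<size>\d+)\s+"
    r"(?P<month>[A-Z][a-z]{2})\s+"
    r"(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{4}|\d{1,2}:\d{2})\s+"
    r"(?P<name>.+)$"
)

TYPES = {"-": "file", "d": "directory", "l": "symlink"}


def parse_unix_list_line(line):
    """Return a dict describing one listing line, or None if it does not match."""
    match = LIST_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None  # a Windows or VMS line, a "total" header, or something stranger
    entry = match.groupdict()
    entry["type"] = TYPES.get(entry["perms"][0], "other")
    entry["size"] = int(entry["size"])
    entry["target"] = None
    # Symbolic links are shown as "name -> target".
    if entry["type"] == "symlink" and " -> " in entry["name"]:
        entry["name"], entry["target"] = entry["name"].split(" -> ", 1)
    return entry
```

Even this small sketch has known holes: it loses leading spaces in file names, and it cannot tell a file actually named "a -> b" from a symlink. Multiply that by every listing format in the wild and you get the 10%.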

The Worst of Times

So now you’ve figured out the names of your files, and their types, sizes, permissions, and ownership. But don’t sit back just yet—now you’re in for a real treat. Almost nothing compares to the thrilling excitement of trying to get an FTP server to answer the question, “when was this file last modified, anyway?”

At first glance, this looks pretty easy. Each line of a typical ls output contains a modification date, something like this:

Dec 15 11:45

Parsing a date and time written like that is no problem; it’s basically one line of Python code. But before you get too comfortable, please consider the following simple question: December 15th of what year?

If you answered anything other than “it depends,” then I find your freshly-scrubbed naïveté to be quite charming. In particular, if you assumed any date without a year must mean “this year,” you’re living in a dream world. (I lived there too, for a while; it was quite a pleasant fantasy.) Because, you see, if today were December 19th, 2008, then “Dec 15 11:45” means a quarter till noon on December 15th, 2008. But it means exactly the same thing if today is February 2nd, 2009, because clearly a file cannot have been modified in December 2009 yet, right?

So, to interpret a date without a year correctly, you can start out by assuming that it is “this year,” and then check to see if that yields a time in the future. If it does, you know it must really mean “last year.” Isn’t this fun?
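The rule above can be sketched in a few lines of Python. The `guess_year` name is made up for illustration, and the sketch assumes English month abbreviations and that the server’s clock and time zone agree with yours (neither of which is guaranteed).

```python
from datetime import datetime


def guess_year(stamp, now):
    """Interpret a year-less stamp such as 'Dec 15 11:45' relative to `now`.

    Try the current year first; if that lands in the future (or is not a
    valid date at all), fall back to the previous year.
    """
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")
        except ValueError:
            continue  # e.g. "Feb 29" combined with a non-leap year
        if candidate <= now:
            return candidate
    raise ValueError(f"cannot place {stamp!r} in the past relative to {now}")
```

Called with “Dec 15 11:45” and a `now` of either December 19th, 2008 or February 2nd, 2009, this yields December 15th, 2008 both times. A server whose clock runs a few minutes ahead of yours will make a freshly modified file look a year old, so a real implementation probably wants some slack in that comparison.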

Don’t relax now. There’s more!

Because, you see, it turns out that some dates do have a year, such as:

Apr 11  2004

In this case, you don’t have to guess about the year. However, because you are extremely observant, you will have immediately noticed that while this date does include a year, it does not include a time. So, you know on what day the file was modified, but not the time of day. Like most file systems, MacFUSE wants modification times that are accurate to the nearest second, so this is a bit of a conundrum. Unfortunately, you can’t do much better than to pick some arbitrary time on that date (say, midnight), and hope it’s all right. The FTP server knows the real modification time, but it’s not saying.

“That’s okay,” you say. “We’ll just say it’s midnight, and nobody will notice.” Oh, but they will. Your customers will log into the FTP server and check the timestamps manually against the values you reported, and they will file a bug against your software when the two do not match. This is perfectly fair: They don’t match, after all! But you can’t do a damned thing about it. Even the dates that do include a time generally only give it to the nearest minute.
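One way to at least be honest about the problem is to carry the uncertainty along with the timestamp. The following is only a sketch with made-up names (`parse_dated_stamp`, `could_match`): it parses the year-bearing form, reports midnight, and remembers that the true time could be anywhere within that day.

```python
from datetime import datetime, timedelta


def parse_dated_stamp(stamp):
    """Parse a stamp like 'Apr 11  2004' that has a year but no time of day.

    Returns (reported, uncertainty): midnight on that date, plus the width
    of the window in which the real modification time must fall.
    """
    reported = datetime.strptime(stamp, "%b %d %Y")
    return reported, timedelta(days=1)


def could_match(reported, uncertainty, actual):
    """True if `actual` is consistent with what the listing told us."""
    return reported <= actual < reported + uncertainty
```

This does not make the bug reports go away, but it does let a cache avoid treating “midnight” versus “5:30 PM the same day” as evidence that a file has changed.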

So Let it Be Written

Having scaled the mountains of madness, you are now prepared to tackle the most essential tasks in a filesystem’s life: the reading and writing of actual file data. Since it is, by nature, a protocol for transferring whole files, FTP provides basically two commands for this purpose: The RETR command, to retrieve a file from the server to the client, and the STOR command, to store a file from the client onto the server. By intent, these commands transfer an entire file at a time. But, since files can be very big, computers sometimes crash and networks are occasionally subject to disconnection, FTP also has a command called REST that lets you pick up a failed STOR or RETR command from where it left off.

This model is actually pretty good for file transfer. Most filesystems, however, take a slightly different approach. The “read” operation on a file generally takes a starting position and a number of bytes, and fetches that many bytes starting at the given position in the file. The “write” operation, in turn, takes a starting position and a list of bytes, and writes the bytes into the file starting at that location, extending the file as necessary to accommodate them. Turning a filesystem-style “read” into an FTP RETR command is a little tricky. If the filesystem asked you for 32000 bytes starting at position 25 of a file that is 2.5 gigabytes long, and you simply start a RETR from position 25 and wait for it to finish, the server will send you the entire rest of the file, and it will be a long time before you can return those bytes. Users get cranky when they have to wait so long for such a small amount of data. And other programs get cranky too: if the MacOS Finder has to wait “too long” for a read or write to be completed, it may force-quit your filesystem completely.

To work around this, you have one more tool at your disposal. FTP also provides an ABOR command, allowing the client to “abort” a RETR that it has started but does not want to finish. To read 32000 bytes starting at position 25, you can resume from position 25, retrieve the file, and then after you have read the 32000 bytes you want, abort the transaction. Easy!
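In Python’s standard `ftplib`, that recipe might look roughly like the sketch below. The `read_range` function is hypothetical; `voidcmd`, `transfercmd` (with its `rest` argument), `abort`, and `voidresp` are real `ftplib.FTP` methods. Closing the data connection before sending ABOR is just one possible ordering, chosen here as an assumption, and the second return value exists because the control connection may be left in an unknown state.

```python
import ftplib


def read_range(ftp, path, offset, length, bufsize=8192):
    """Read up to `length` bytes of `path` starting at `offset`.

    `ftp` is a logged-in ftplib.FTP (or compatible) object. Returns
    (data, clean); clean is False if the server's replies could not be
    read cleanly, in which case the caller should probably reconnect.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so REST offsets count bytes
    conn = ftp.transfercmd("RETR " + path, rest=offset)  # sends REST, then RETR
    chunks = []
    remaining = length
    try:
        while remaining > 0:
            data = conn.recv(min(bufsize, remaining))
            if not data:
                break  # the server reached end of file first
            chunks.append(data)
            remaining -= len(data)
    finally:
        conn.close()

    clean = True
    try:
        if remaining > 0:
            ftp.voidresp()  # transfer ended on its own; read the completion reply
        else:
            ftp.abort()     # we have what we need; ask the server to stop
    except ftplib.all_errors:
        clean = False       # replies differ between servers; be ready to reconnect
    return b"".join(chunks), clean
```

The `clean` flag is the polite way of saying that this function works as intended only on servers that feel like cooperating.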

Well, almost.

Something you need to know about FTP is that it uses multiple separate network connections for each session. That’s part of the reason why it’s so hard to configure network firewalls for it. One connection, the “control channel,” is for sending commands like LIST, STOR and RETR; all the other connections (“data channels”) are for sending file and file-listing data back and forth. In theory, the control channel is independent from the data channels, but in practise many FTP servers do not really pay attention to the control channel while they are sending or receiving file data. As you can imagine, this makes it hard to get your ABOR command processed correctly. What’s worse, the exact response that an FTP server is supposed to give upon receiving an ABOR command is not terribly well-defined, so different servers handle it in different ways.

What follows, therefore, is an incredibly delicate dance, evolved over many years of ad hoc FTP client and server design, for the handling of abort requests. The exact effect varies widely: Some servers handle the abort promptly and without complaint. Others throw up their hands in disgust that a client would dare to abort a transfer, and boot them from the server. On servers that do the latter, you may find that each time you read data from a file, you are forced to completely re-establish your connection to the FTP server. I’m not naming any names, but Microsoft knows who they are.

Keeping Up to Date

Take heart, you are not alone! While you are reading and writing all these files, somebody else is out there doing it too! And, if you’re especially fortunate, they’re doing it to the very same files and folders you are using.

Now, I can hear you asking, “Michael, if that’s true, then how will I know if somebody has created a new file, or deleted an existing file, or modified one of the files I am using?” My answer to you is, “that’s an excellent question, now get me a drink.”

The only way you can find out if files or folders have been created, destroyed, or changed, is to ask the server for up-to-date listings of files. That’s right, the LIST command again. The FTP server does not provide any way you can get updated information about individual files; you can only get updates if you re-load the whole directory containing the file you’re interested in. This might be time-consuming, but you have to do it, or you’ll never notice changes somebody else might have made, and wouldn’t you be confused if you tried to open a file that didn’t exist anymore?
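Once you have two listings of the same directory, spotting the differences is the easy part. Here is a small sketch, assuming each listing has already been parsed into a dictionary that maps a file name to whatever attributes you managed to extract (the `diff_listings` name and that dictionary shape are illustrative, not anything FTP gives you).

```python
def diff_listings(old, new):
    """Compare two directory snapshots.

    `old` and `new` map file names to comparable attribute values, for
    example (size, timestamp text) tuples. Returns three sorted lists:
    names created, names deleted, and names whose attributes changed.
    """
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys()
                     if old[name] != new[name])
    return created, deleted, changed


before = {"gzip.tar": (798720, "Jun 29 2001"), "compress.tar": (90112, "Jun 29 2001")}
after = {"gzip.tar": (801000, "Jul  1 2001"), "notes.txt": (12, "Jul  1 2001")}
created, deleted, changed = diff_listings(before, after)
```

The expensive part is the round trip to fetch `new` in the first place, which is exactly what you have to repeat for every directory you care about.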

Don’t worry. Your customers will understand when it takes a long time to browse through their files and folders because you are using the LIST command so often. I promise.

Wearing your Password on Your Sleeve

Probably the best feature of FTP is that it avoids all the messy complications of encryption and security. When you sign in to a typical FTP server, you just send your password right over the Internet in clear, unencrypted text. What could be easier than that? There’s no need to muck about with public keys, site certificates, trust roots, or any of that other nonsense. Of course, just about any yahoo with a copy of tcpdump could read your password, but that seems a small price to pay for the convenience of software developers, doesn’t it?

Many users were predictably unsatisfied with this excellent “zero-security” design of FTP, and became so agitated about it that many sites now implement a modified version of the FTP protocol called “FTPS”. This is supposed to stand for “FTP Secure,” but it shouldn’t be confused with SFTP, which is a completely unrelated protocol that is actually secure. FTPS comes in a couple of different incompatible varieties, the more common of which requires that you install custom FTP software that supports some extra security-related commands.

With FTPS, you no longer get to send your password over the network in clear text. That is, provided your FTP server supports the extensions (most don’t), and that it does so correctly (most don’t), and that you are using client software that understands FTPS (most do).

Fortunately, despite the loss of cleartext password transmission, you can still have some fun with FTPS, because once you have encrypted the FTP control channel, many network firewalls will no longer be able to handle the FTP data connections, causing your program to hang when it tries to transfer data. Some clever mensch tried to solve this problem by adding a command to disable encryption on the FTP control channel after you have sent your password, but fortunately for you, plenty of servers don’t implement it correctly, so you will have some wild times coping with the connection failures that result.
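For the curious, the common (explicit) flavor of FTPS looks roughly like this with Python’s standard `ftplib.FTP_TLS` class. The `connect_ftps` wrapper and its `factory` parameter are hypothetical conveniences; `login`, `prot_p`, and `ccc` are real `FTP_TLS` methods, and whether any given server handles them correctly is, as noted above, another matter.

```python
import ftplib


def connect_ftps(host, user, password, factory=ftplib.FTP_TLS):
    """Open an explicit-FTPS session and protect the data channel too.

    `factory` defaults to ftplib.FTP_TLS; it is a parameter only so that
    the function can be exercised without a real server.
    """
    ftps = factory(host)        # connects in the clear on the usual FTP port
    ftps.login(user, password)  # FTP_TLS.login() secures the control channel
                                # with TLS before sending the password
    ftps.prot_p()               # request encrypted data connections as well
    return ftps
```

The “disable encryption after the password” command mentioned above corresponds to `FTP_TLS.ccc()`, which reverts the control channel to plaintext. Calling it after `login()` is the step that is supposed to keep firewalls happy, and the step that some servers get wrong.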

Sites that still don’t support SFTP have defended their users’ right to have their passwords compromised on a public network by keeping FTP instead. Modifying their servers to support FTPS was probably a lot more work than installing SFTP would have been, but for some hosting sites, no effort is too much for their loyal customers. So, while you may have some difficulties in implementing FTPS support for your MacFUSE filesystem, you can take comfort from the fact that your own customers will appreciate your efforts to safeguard their traditional folkways.

Reflections and Conclusions

For those of you who, in open defiance of all reason, have made the stimulating choice to implement a filesystem based on FTP, I hope that my little discussions may help you along your path. If you are in need of any further assistance, I can also recommend several excellent varieties of single-malt Scotch whisky that will provide incalculable benefit to your efforts.

Despite the sage words of Steve Frank and other upstarts, who advocate switching from FTP to SFTP, many sites continue to cling doggedly to FTP as a primary means of access to their customers’ data. Indeed, one could argue they are giving excellent access—to anyone in the world who might conceivably want it. Furthermore, these hosts break the chains of traditional patriarchalist Western rationality, which would ordinarily insist upon such details as accurate file timestamps, machine-readable status information, good performance, and the ability to play nicely with network firewalls.

Everybody’s got their little rebellions.

ExpanDrive for OS X

Tuesday, March 11th, 2008

Magnetk is thrilled to announce the release of ExpanDrive for OS X, our second product. We’ve been working for a long time on ExpanDrive, and are very proud of the result. Our early adopters seem quite pleased too.

ExpanDrive builds SFTP support right into the core of OS X – just like SftpDrive builds SFTP support into Windows. Now any application on your Mac can read and write remote files as easily as if they were on a USB drive plugged into your computer. ExpanDrive even brings SFTP right into Finder, letting you manage your remote server as easily as your MacBook. We’ve already pushed two updates in response to user feedback in just the first week of open release, polishing the UI and improving user experiences all around. In the coming weeks and months ExpanDrive will expand beyond just SFTP, letting you access a wide variety of data through the filesystem. Keep an eye on this blog, as well as our Twitter feed, to keep track of the developments.

MacFUSE 1.0

Tuesday, October 30th, 2007

Amit Singh just released MacFUSE 1.0, adding a great number of new features relevant to developers, not the least of which is support for Leopard. Along with that, there are many bug fixes and general improvements. Amit is a full-time engineer at Google, recently penned a nearly 1,000 page book on OS X, and in his “spare time” authors MacFUSE. Given the scope of the changelog I’m not sure Amit sleeps. For that, we thank him.

Is MDS taking over your CPU on OS X? Try Spotless.

Sunday, June 24th, 2007

My 5 month old MacBook Pro often has the MDS process chewing about 45-50% of CPU at various points in the day. Given that MDS is supposed to sit around and casually index – so that Spotlight can quickly perform a search, this CPU usage pattern is a red flag. Chances are something is wrong with either the disk or the Spotlight metadata index [what MDS manages]. After using Disk Utility to verify the volume, I tried Spotless. It worked great.

Spotless is a nice little $15 nagware utility made by Fixamac Software that helps you delete and recreate the Spotlight metadata folder and index, so that it can let MDS rebuild from scratch. It offers a few other moderately useful features, but if MDS is getting aggressive, this makes it real easy to get a clean rebuild.

Update 8/11/09: JS points out in the comments that you can instruct MDS to clear out the metadata cache and rebuild from scratch using this command, run from Terminal:
sudo mdutil -avE

This is effectively what Spotless does, but without the user interface.

SftpDrive v1.6.0 Released

Monday, May 14th, 2007

After much delay, SftpDrive Version 1.6 is out the door. The major enhancement is Vista support. Along with Vista, there are many small tweaks for connection stability and various bug fixes that have been added.

Going forward, we are working on SftpDrive for Windows and OS X at a breakneck pace. While we’re excited to have worked on the Slingshot project, SftpDrive remains our focus. Our shortlist includes dramatic speed enhancements, greater interoperability between various SFTP servers, and most importantly releasing the OS X client.

sshfs for Darwin

Thursday, January 18th, 2007

Another open source project working on a SFTP filesystem for the Mac can be found at:

http://mac.pqrs.org/sshfs/

While it’s not production quality yet, it is certainly encouraging that other people feel that the lack of an SFTP filesystem is a rather obvious pain point.

MacFUSE — Awesome

Friday, January 12th, 2007

Amit Singh over at Google has ported FUSE to the Mac. Good Work!

FUSE is a popular user space filesystem framework for Linux, and now the Mac. What this means is that it is much easier for filesystem developers to create and test new filesystem implementations. Filesystems run, usually, as extensions to the kernel of the OS – like a device driver. When code runs in the kernel, it is much more difficult to develop and debug. When code crashes in the kernel, your machine crashes. It’s hard.

FUSE tries to abstract away some of that pain for developers, by providing a “general” filesystem driver that communicates with a service in user space. Then you can write an extension to the FUSE user space service, much like normal application development, and have it extend the power of the general purpose filesystem driver. This is a real boon for programmers.

Also cool — Google has a patch for SSHFS, a fuse module, so that you can mount an SSH/SFTP server using MacFUSE.

It’s no secret that we’re working on SftpDrive for OS X, and it’s nice to see that other people out there are trying to get some innovative work done on OS X network filesystems. SftpDrive:Mac is our number one dev priority, and along with that iMac, we’ve got a half dozen other Macs that we’re using for development and testing. I’ve gotten quite a few emails and IMs today about MacFUSE. We’re excited that it is being done. SftpDrive for the Mac will be released as a very polished product with robust reconnect support, a simple user interface, and a great overall experience – just like our Windows client. Having SSHFS is great, but beachballing Finder when your wifi connection drops or you hop to a new AP isn’t very fun.

We’re not bashing FUSE by any means – but this example illustrates certain virtues of commercial software, developed by people that really care about what they are doing. Software where developers spend an inordinate amount of time solving edge cases, so that the experience doesn’t have gaps. When you plunk down $39 for SftpDrive, you get software that effortlessly hops between wireless access points and has intelligent caching to provide a good experience while using poorly behaving applications. And just like free software, you get developers who really care about what they are doing.

That last 20% of functionality really does take 80% of the effort.
