Technological Land Mines

Michael Fromberger July 31st, 2009

Jeff posted about the SSL vulnerability described at Black Hat this year. And he’s right: It is scary.

But rather than calling this an SSL bug, even though it sort of is, I would call this another new application of the same persistent and recurring security problems that exist in most low-level libraries and applications, due to reliance on the C language and its standard libraries. The assumption that anything string-like can be treated as a zero-terminated array of characters is pervasive not just because it’s simple, but because C is more or less the only language environment that is universally supported on every platform, from 8-bit microcontrollers up to highly concurrent multiprocessor systems, and it supports only three basic data types (four, if you squint).
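
To make the zero-termination assumption concrete, here is a minimal sketch in C. It is not the code from the Black Hat talk or from any SSL library: the struct counted type, both function names, and the host names mentioned afterwards are made up for illustration. It assumes a name that arrives as a length-counted field, and shows how comparing it as a C string silently ignores everything after the first zero byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-counted byte string, as many wire formats carry them. */
struct counted {
    const char *data;   /* must also be zero-terminated for matches_cstring */
    size_t len;         /* number of bytes, embedded zero bytes included */
};

/* Risky: treats the field as a C string, so the comparison stops at the
   first zero byte and never looks at the bytes that follow it. */
int matches_cstring(struct counted name, const char *expected)
{
    return strcmp(name.data, expected) == 0;
}

/* Safer: compares the full counted length, so an embedded zero byte is
   just another byte and the lengths must agree. */
int matches_counted(struct counted name, const char *expected)
{
    size_t n = strlen(expected);
    return name.len == n && memcmp(name.data, expected, n) == 0;
}
```

Given the 30-byte name "bank.example\0.attacker.example", matches_cstring reports a match against "bank.example", while matches_counted does not, because the lengths differ.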

This bug, like most of the other important security threats of the past thirty years, boils down to that: Lacking a strong and expressive type system, C not only permits but encourages its programmers to sacrifice correctness, safety, robustness, testability, and maintainability in favor of some highly underdeveloped and ill-measured ideas about “performance”. Much of the infrastructure of the Internet is built out of this garbage. Robert T. Morris, Jr.’s Internet worm in 1988, which spread in part through an unchecked buffer in the finger daemon, was only the first in a long series of large-scale exploits, yet even today, the same practices that made that worm possible are being deployed in new software.

It is absolutely possible to write correct, safe, robust, testable, maintainable, and high-performance code in C. But to do so requires an enormous amount of discipline and attention to detail on the part of programmers, and most of us (myself included) simply do not have the discipline, the knowledge, or the attention to detail that it requires. As a result, most of the C code you encounter in the wild is unmentionable dreck. The fact that it compiles at all is more a testament to the inhuman patience of compiler writers than to its status as working or worthwhile code. And, to paraphrase an old saying, anybody who considers C for high-level application development at this point in history is in a grievous state of sin.

In a sense, C is a kind of technological land mine: Easy to deploy, very powerful, and highly effective for solving certain kinds of problems. However, once it’s buried in the ground underneath your project, it can be very dangerous to those who walk in your footsteps. There’s a good reason the United Nations has a convention banning land mines; perhaps it’s time software developers considered a similar approach.

17 Responses to “Technological Land Mines”

  1. Leroy Valdecoxib Says:

    “But to do so requires an enormous amount of discipline and attention to detail on the part of programmers, and most of us (myself included) simply do not have the discipline, the knowledge, or the attention to detail that it requires.”

    But some of us do, so you stick with whatever you’re using and stop declaring what should be considered basic competence a “grievous state of sin.”

  2. Janitha Karunaratne Says:

    C is used for low-level libraries for its lean and mean performance. It sacrifices checks and safety features for this, and allows the programmer full control. Do you see professional race cars with ABS and Automatic Stabilization? No, you give the Driver FULL and TOTAL control, same with C and other low-level languages. C has only a few data types that are as basic as you can get. I mean, what do you expect, to use something like STL strings?

    If you start having type checking and various other easy-to-code and child-safety features, you are bloating and giving up performance in the low level libraries, if this happens imagine what the performance on the higher up application level would be.

    I would like to know what your alternative suggestion is.

  3. Cartman Says:

    “But some of us do, so you stick with whatever you’re using and stop declaring what should be considered basic competence a ‘grievous state of sin.’”

    How very true…

    I call the sort of reasoning used in the above article the hammer fallacy: the general banishment of a tool simply because some people are too clueless to use it. Should hammers be abolished simply because some idiot hurt his fingers more often than he hit the intended nail?

  4. David Says:

    Michael, I’d be willing to listen to your point of view on C, but you provide no solution to the problems that C solves. C allows you to be extremely specific about every action, and at the same time presents us with what has become a familiar syntax. For the kind of work I do, that’s a perfect balance. Can you name me a language that is better suited for network and hardware programming?

  5. Jeff Mancuso Says:

    @Leroy – it’s ridiculous for you to assume that you, and others like you, always get it right. You can’t. Someday a mistake will be made and with the paradigm as-is, getting it wrong proves deadly.

    @David – Python+Ctypes, that’s what we use for ExpanDrive

  6. Janitha Karunaratne Says:

    @Jeff

    Did you just suggest Python+CTypes for network and hardware programming (from David’s comment)? Head explodes.

  7. Leroy Valdecoxib Says:

    @Jeff – I’m not assuming that anybody gets it right all the time. I certainly am not saying that I do. All programmers write bugs; it’s the nature of the beast, but it’s the nature of the software that’s being written that defines the severity of the hole. If SSL were written in Python+CTypes, there would most definitely be holes in it too, and saying “C is hard, we shouldn’t use it anymore because we’ve got a generation of lazy programmers who can’t be bothered to keep an eye on things” is stupid. Eventually programmers too lazy to write decent code in C become programmers too lazy to write decent Python, and then we’ll be back with some talking head saying Python is too hard. We don’t expect enough anymore out of programmers, and it’s just getting worse.

  8. Jeff Mancuso Says:

    @Janitha - That is indeed what I suggested. When you say network and hardware programming, I obviously don’t suggest you start writing drivers in Python [despite the obvious impossibility], but yes, usermode networking code and hardware interaction code can certainly be done well and more than fast enough. There will certainly be cases when you want to go all C, as there are cases in C when you want to go pure ASM.

  9. Michael Fromberger Says:

    @David — There is no doubt that there are some fairly narrow problem domains for which C is a reasonable choice. Building low-level device drivers to interface with memory-mapped hardware is one example. However, I would not put “network” in the same category as “hardware” — most of the software that people use on a day-to-day basis to interact with networks could more productively be handled in type-safe languages like CAML. But honestly, I think almost any language with a more expressive type system would probably be fine. I happen personally to be fond of functional languages with strong static type systems, but pick your poison.

    It is an unfortunate reality that even for those problem domains where C is appropriate, the language is not precisely enough specified to allow portable development. Even very simple language features like the register and volatile keywords, the order of side-effects between sequence points, the order of initialization of variables, and the platform-dependent mutability of string constants, can bite the unwary developer. I have written hardware interfaces in C, and they are very hard to get right. Of course, such problems are always hard to get right, but one hopes at least that errors will be visible in the source code, rather than falling out of unstated slack in the language spec.
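
    Two of the hazards named above, the mutability of string constants and the order of side-effects, can be sketched briefly. This is a minimal illustration, not code from any project mentioned in this thread; the function names capitalize_copy, next_id, and two_ids are invented for the example. The hazardous forms appear only in comments, since running them would prove nothing portable.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hazard 1: writing through a pointer to a string constant is undefined.
 *     char *s = "hello";  s[0] = 'H';    may crash, or may appear to work
 * Portable form: copy the constant into a buffer the caller owns. */
void capitalize_copy(char *out, size_t outsize, const char *text)
{
    if (outsize == 0)
        return;
    strncpy(out, text, outsize - 1);   /* copies at most outsize-1 bytes */
    out[outsize - 1] = '\0';           /* strncpy may not terminate */
    out[0] = (char)toupper((unsigned char)out[0]);
}

static int counter = 0;

/* A function with a side effect: each call returns the next number. */
static int next_id(void)
{
    return ++counter;
}

/* Hazard 2: in a call such as f(next_id(), next_id()) the order in which
 * the two arguments are evaluated is unspecified, so which argument gets
 * the smaller number can differ between compilers.
 * Portable form: one side effect per statement, so the order is explicit. */
void two_ids(int *first, int *second)
{
    *first = next_id();
    *second = next_id();
}
```

    In both cases the portable form is only slightly longer than the hazardous one; the difficulty is that nothing in the source text of the hazardous form looks wrong.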

    More importantly, however, I believe most of the programming done on a day-to-day basis by most of the world’s programmers is not the kind for which C is appropriate. And yet, new projects are built with it every day, not because it’s good, but because it’s there and because it’s familiar.

    @Janitha — The “performance” argument is nearly always a red herring; we should worry about performance after we’ve got a correct program, not before.

    “If you start having type checking and various other easy-to-code and child-safety features, you are bloating and giving up performance in the low level libraries, if this happens imagine what the performance on the higher up application level would be.”

    To me, this sounds like you are saying, “if we have to start making sure our programs don’t contain errors, we will wind up making them slower.” I have to ask: Does it really matter how fast our programs run, if they don’t work right?

    Furthermore, it doesn’t matter what we “imagine” the effect on performance might be — only what the effect actually is. Actual performance data are often surprising and counter to intuition. Anybody who wishes to risk making a program wrong in order to make it run faster should be prepared to substantiate their desire with measurements, not suppositions. As William Wulf said, “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason.”

  10. Ken Says:

    “it supports only three basic data types (four, if you squint)”

    Fixnums, arrays, structs, unions?

  11. Jose Says:

    OK, so we don’t use C, let’s use something “better”: Java, .NET C#, or Python. Theoretically “better”, but in my experience, SLOOOOOOW as a turtle, and FAT programs. (And I really like and use Python.)

    Hey, you want a dog house? You call the class City building, extract the subclass Small city building, from there extract floor plan, then extract room, and finally copy instance door, and scale it, do the same with different buildings with the roof, and the concrete. Hey, you got a house, right?

    So instead of Google Earth, let’s have a NASA World Wind, and wait, wait and wait for everything you do. Let’s do a note-taking app (a really simple one) and make it crap, let’s make a Tomboy for note-taking, or Beagle for search.

    I’m sorry, I think char strings are a really bad solution, but the XML thing is just as bad. I use my own lib to make every string a number, every word a number. It makes programs that do the same things with 100 times less code, thousands of times faster, and I have measured it; this is not a theoretical statement like yours.

    IMHO, I see people complaining about C that don’t use C. They are like: Hey, I don’t do that, this is the past, I only care about the future, EVERYONE SHOULD DO WHAT I DO!!! And the world would be better, you know what? The world would be better in your sense, BUT it would be worse in a lot of other areas you don’t care about.

    “highly underdeveloped and ill-measured ideas about…”: That defines your way of thinking, other people’s reasons are ill-measured, let me guess: Your way is the way, and the only way.

    I’m with Linus Torvalds. When asked why the Linux kernel was C only, he answered: “Not really a C thing, of course you can do a lot of things you do in C in other languages as well and do it well, with good programmers, but I don’t want to attract the EASY PROGRAMMERS that this action will trigger. Those people that ONLY program in high level because they are TOO LAZY to do otherwise.”

    Low-level stuff, like SSL, should be done in low-level languages. That it is not perfect? Of course; in a big programming language like C# it would be worse. One of the things I learnt when I studied how to disassemble and crack programs (as a hobby when I was a boy) was that big programs are so much, so much, so much easier to crack and find vulnerabilities in. In theory they should not have them (interpreted code and so on); in practice it is another story. In theory there is no difference between theory and practice; in practice there is.

    Good night

  12. Greg Herlein Says:

    From my experience, most of the code encountered in the wild is a wreck. It takes discipline to write good code in any language. The worst projects I have encountered use high-level libraries without even a clue of what they pull in or depend on, leading to enormous binaries with the performance of a sloth. I would argue that there are security issues in many higher-level libraries too, but the vast majority of programmers are too unfamiliar with those vast libraries to even approach fixing them.

    I get tired of this “it’s all C’s fault” argument. Bugs are the responsibility of the programmer. Period. If you code with the sharpest knife in the tool chest – C – then learn to use it well. Yes, some will fail, and they will pay the price. But to blame the language is silly. C is a tool that’s perfectly appropriate for many – but not all – software projects.

  13. Kevin Stewart Says:

    “It’s a poor craftsman who blames his tools.”

  14. Michael Says:

    To all you staunch defenders of C, I’d ask you: why not do everything in ASM? The answer is that it would be exceedingly tedious, and your work would be prone to error. But if that’s all you did, day in and day out (and some of you have been there), you can speed up the process by reusing chunks of code and developing habits that protect you from making those mistakes. Why then move to C?

    Because it takes care of the details. It embodies years of experience with ASM. The C compiler is now so optimized that there are few human coders skilled enough to turn out faster code. Given two programmers of equal skill, the one using C will generally be much more productive than the one using ASM.

    In essence, we’ve now done with C what we originally did with ASM. We’ve compensated for the language’s shortcomings by developing reusable libraries and personal programming habits. By doing the same thing we’d have done with ASM PLUS leveraging the higher level syntax, we’ve become more productive (and correspondingly less buggy). We let the C compiler take care of details for which the best practices became well known.

    Why then is it so hard to imagine taking the next step beyond C would be any different? No one is suggesting abolishing it entirely, but the fact is that it is prone to bugs by design and shouldn’t be the default choice simply based on performance arguments.

  15. Jon Shea Says:

    @Janitha: As someone pointed out on the HN thread, ABS and traction control are so effective that they were banned in Formula 1 and NASCAR because they make the sport too easy.

    @Jose: There’s a lot of room for an architectural compromise between C (which doesn’t even have a native string type) and XML. In particular, if C had an opaque string type with operations that were automatically buffer-length safe, then a whole class of security bugs might be wiped out.
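
    As a rough idea of what such an opaque, length-safe string type could look like in C, here is a minimal sketch. It is not an existing library; the type name str and every function name are hypothetical. It assumes byte strings whose length is stored alongside the data, so zero bytes are ordinary content and every write is bounds-checked by the type itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* In a real library only "typedef struct str str;" would appear in the
   header, keeping the fields below private to the implementation file. */
typedef struct str {
    size_t len;    /* bytes in use; embedded zero bytes are ordinary data */
    size_t cap;    /* bytes allocated */
    char *data;
} str;

/* Create a string from len bytes. Returns NULL if allocation fails. */
str *str_new(const void *bytes, size_t len)
{
    str *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->cap = len ? len : 1;
    s->data = malloc(s->cap);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    if (len)
        memcpy(s->data, bytes, len);
    s->len = len;
    return s;
}

/* Append len bytes, growing the buffer as needed. Returns 0 on success,
   or -1 on size overflow or allocation failure (string left unchanged).
   The source bytes must not point into s itself. */
int str_append(str *s, const void *bytes, size_t len)
{
    if (len > SIZE_MAX - s->len)
        return -1;
    size_t need = s->len + len;
    if (need > s->cap) {
        size_t cap = s->cap;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {   /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    if (len)
        memcpy(s->data + s->len, bytes, len);
    s->len = need;
    return 0;
}

size_t str_len(const str *s)
{
    return s->len;
}

/* Equality compares lengths first, then every byte. */
int str_equal(const str *a, const str *b)
{
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->data, b->data, a->len) == 0);
}

void str_free(str *s)
{
    if (s) {
        free(s->data);
        free(s);
    }
}
```

    Callers never index the buffer directly, so there is no caller-side length arithmetic to get wrong; the cost is an allocation per string and a function call per operation.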

    @Kevin: Part of the logic behind that cliche is that a good craftsman would use tools that are so good they can’t be blamed. Even a very talented carpenter can’t do great work with a dull saw or chisel. We use better tools than C.

  16. Kevin Stewart Says:

    @Jon Actually, a good craftsman knows which tool to use for which job. I RARELY reach for C these days (I write most of my code in Ruby). However, I know C and know when I need to use it (these days only for wringing out extra performance when changing algorithms won’t do, or for creating a RubyGem around an existing C library).

    The only time you should reach for C as your FIRST choice is if you are writing a) an operating system, b) a device driver or, as a distant third, c) a compiler (which you can write in other languages these days).

  17. oomu Says:

    everything near hardware and needing performance (as SSL does) has to be written in C

    it’s portable, it’s efficient, it’s standard. Yes it’s dangerous (because it’s portable, efficient and standard)

    Applications can be written in other, higher-level languages such as Objective-C or Python. The loss of performance is compensated by the features of the language.

    Anything else is a moot point.
