Five-Dollar Words for Programmers, Part One: Idempotence

Five-Dollar Words for Programmers, Part One: Idempotence

Rate This
  • Comments 20

Programmers, particularly those with a mathematical background, often use words from mathematics when describing their systems. Unfortunately, they also often do so without consideration of the non-mathematical background of their colleagues. I thought I might talk today a bit about the word "idempotent". This is a very easy concept but lots of people don't know that there is a word for it.

There are two closely related definitions for "idempotent. A value is "idempotent under function foo" if the result of doing foo to the value results in the value right back again.

function is "idempotent" if the result of doing it twice (feeding the output of the first call into the second call) is exactly the same as the result of doing it once. (Or, in other words, every output of the function is idempotent under it.)

The most trivial mathematical example of the second kind is the constant function.  If f(x) = c, then clearly f(x) = f(f(x)) = f(f(f(x))) ... so f is idempotent (and the constant is idempotent under it). The identity function f(x) = x is also idempotent (and every value is idempotent under it). The function that takes a set of numbers and returns a set containing its largest element is idempotent (and every one-element set is idempotent under it). I'm sure you can think of lots of examples of idempotent functions.

To get a handle on the other sense, pick an operation -- say, doubling.  The only value which is idempotent under that operation is zero. The operation "subtracting any non-zero value" has no idempotent values. Squaring has two idempotent values, zero and one.

The second characterization of this concept comes up all the time in practical programming, particularly around caching logic. Usually when used in the computer science sense we don't mean that the effect of the function is invariant under composition, but rather that it is invariant over the number of calls. For example, I don't know how many times I've written:

HRESULT GetTypeLibCreator(ICreateTypeLib2 ** ppctl) {
  if (this->m_pctl == NULL) {
    HRESULT hr;
    hr = CreateTypeLib2(SYS_WIN32, pszName, &this->m_pctl);
    if (FAILED(hr)) return hr;
  }
  *ppctl = this->m_pctl;
  this->m_pctl->AddRef();
  return S_OK;
}

A nice little idempotent function -- calling it two, three, n times has exactly the same result as calling it once.

The place you see the other characterization of idempotence all the time is in C++ header files. Include a needed header zero times and you'll get "not defined" errors. Accidentally include it twice and you'll get "redefinition" errors. It's a major pain to make sure that every header file is included exactly once. Therefore, most headers use  some trick to make them idempotent under the inclusion operation:

#ifndef STUFF_H_INCLUDED
#define STUFF_H_INCLUDED

// headers here

#endif // STUFF_H_INCLUDED

or in more modern systems, the #pragma once directive makes headers idempotent under inclusion.

  • Good stuff... I approve.
  • Funny stuff. After doing some DBA style work a few months ago, I wrote a little on idempotence myself. I like the technique a great deal.

    http://www.mschaef.com/cgi-bin/blosxom.cgi/tech/prog_well/idempotence/idempotence_1.txt

    http://www.mschaef.com/cgi-bin/blosxom.cgi/tech/prog_well/idempotence/idempotence_2.txt
  • Just to be annoyingly picky, "subtraction" is a function that takes two values, so I'm not sure what it means to say that no value is idempotent under subtraction. Unless the function is x -= 1, I suppose :)
  • Stuart Langridge wrote:
    Just to be annoyingly picky, "subtraction" is a function that takes
    two values, so I'm not sure what it means to say that no value is
    idempotent under subtraction. Unless the function is x -= 1, I
    suppose :)

    I am one of those programmers with a mathematical background Eric is talking about, so I can be really picky, if I want to: the function "x -= 1" is idempotent for the value -inf, and probably for some NaNs, too.
  • I indended to say "subtracting a non-zero value" -- I've fixed the text.

  • I'm no a C programmer, but it looks like GetTypeLibCreator is not an idempotent function, because its input and output types are not the same, i.e. you cannot do GetTypeLibCreator(GetTypeLibCreator(x))
  • More importantly, wouldn't it be considered non-idempotent because of the AddRef call which changes the reference count on the returned object. Calling this function twice would not result in the same system state as calling it once...
  • If a function is idempotent, does that mean it's really a dysfunction?

    Oops, missed a couple letters there...
  • Quote: The operation "subtracting any non-zero value" has no idempotent values

    Sure it does, positive and negative infinity :)
  • Objective-C fixed the #include problem by creating #import. It's exactly the same as #include except for the fact that it will include a file exactly once.

    You have to wonder why the C designers didn't think of that.
  • In my experience, the most common kind of idempotence in computing doesn't fit either of the mathematical definitions. It's more like: if you call the same function twice with the same parameters, the resultant system state is the same as if you had called it once. You could call it operational idempotence.

    Of course, it does fit the mathematical definition if you include the entire state of the computer as an implicit parameter to the function and as part of the result, but programmers don't usually think in these terms.
  • "Of course, it does fit the mathematical definition if you include the entire state of the computer as an implicit parameter to the function and as part of the result, but programmers don't usually think in these terms. "

    If you have side effects in your langauge, don't you _have_ to think that way?
  • "If you have side effects in your langauge, don't you _have_ to think that way?"

    Monads in Haskell make this idea explicit, but most programmers find them difficult because they don't think like this.

    In most languages you think about things (like objects, variables, files, etc.) having things done to them. This is quite different from thinking functionally. Well, it is in my head anyway :)
  • WHY?
  • Idempotent (in my experience) is rarely applied to functions from R to R. It's usually reserved for functions that operate on "more complex" domains. The place you run into it frequently is linear algebra and operator theory. So things like subspace projection operators are described as idempotent.
    Additionally, I don't think I've ever heard anyone say anything like "squaring has 2 idempotent values". I would say "squaring has 2 fixed points", or something like that. Fixed points and idempotency are related; in the subspace projection example, half of the reason that it's idempotent is that every vector in the subspace is a fixed point (the other half is that the operator maps every point into the subspace).
    The ways I've seen it used in CS/SE are an extension on this. They require that you to think of the program as a state machine. Some operations can be run more than once with the same affect as if you had run it once. But the concept is really only interesting if the operation has side effects; otherwise, running the operation once is also the same as never running it, from a program state POV. That is, it's too much like the identity operator. It is good to know that some functions don't have side effects, but I think if that's what you mean you should just say so, and save your $5 word for a place where the function *does* have side effects, but is *still* idempotent.
Page 1 of 2 (20 items) 12