Peter’s response to my TiVo post for some reason reminded me of a rambling e-mail conversation I had several months ago; for your amusement I repeat it below.  Incidentally, mail threads like this are reason #128 why I like working at Microsoft…

From: Ken
Sent: Thursday, July 17, 2003 12:12 PM
To: Barry; Michael; Wade; Richard; Mark; John; Keven; Praveen; Bruce
Subject: How many lines of code can fit on the head of a pin?
Importance: Low

A discussion arose over how many lines of code there are in the world.

We were discussing Feynman and his thoughts on miniaturization, nano-tech, etc. Many years ago he put forward a challenge to reduce a page of text to 1/2500th of its original size. At the time he made this challenge, all of the books in the world would have fit, “printed” at that scale, on a sheet 3 meters on each side (9m^2 area). Note that this was not encoding, but actually forming all letters, pictures, etc. The original challenge was met (to make this a bit more “real”, think of the 30 volume Encyclopedia Britannica…at 1/2500, all of its pages would fit on the head of a pin).

So, to us geeks, the natural question arises: how big an area would it take to print all the code? What if you didn’t print it, but encoded it? (I would guess that you’d want at least 3 atoms/bit to make retrieving it a bit easier.) Of course you could use single atoms to represent more than a bit: a carbon atom might represent ‘0’, a gold atom ‘1’, platinum ‘2’, lead ‘3’…right there we’ve reduced the size of a two-dimensional printout by a factor of two, since four distinguishable atoms carry two bits apiece.
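For a rough feel of the multi-element idea, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes every atom can be read back perfectly: with k distinguishable elements, each atom carries log2(k) bits.

```python
import math


def bits_per_atom(num_elements: int) -> float:
    """Bits one atom can carry if it may be any of num_elements elements."""
    return math.log2(num_elements)


def atoms_needed(num_bits: int, num_elements: int) -> int:
    """Atoms needed to store num_bits, assuming perfect read-back."""
    return math.ceil(num_bits / bits_per_atom(num_elements))


one_megabit = 1_000_000

# One element present/absent style coding: 1 bit per atom.
binary_atoms = atoms_needed(one_megabit, 2)

# Carbon, gold, platinum, lead: 4 symbols, so 2 bits per atom.
four_element_atoms = atoms_needed(one_megabit, 4)

# The more cautious guess of 3 atoms per bit, for comparison.
redundant_atoms = 3 * one_megabit

print(binary_atoms, four_element_atoms, redundant_atoms)
```

Going from two symbols to four halves the atom count; getting a further halving would take sixteen distinguishable elements, so the gains fall off quickly.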

My initial “quick-thought” below.


A co-worker and I were discussing this today:

- How many lines of code (LOC) are there in the world?

- What if you limit it to 'active' code?

- What's the ratio of total lines to active lines?

- Are there more LOCs than lines in books?

I'd guess somewhere in the realm (within a couple orders of magnitude) of 10 Billion LOC (which is based somewhat on how many LOC I think are at Microsoft), with the Total/Active ratio somewhere around [2-3]:1.

Based on an older, but probably still "right ballpark", figure of 30 million books in the world (taken from Feynman's The Pleasure of Finding Things Out) and supposing that an average book has 750 pages with 50 lines each, we'd get 1,125,000,000,000 lines (a bit over a trillion lines). As this is roughly two orders of magnitude above my guess, I’d bet that books have the lead.
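The back-of-the-envelope arithmetic is easy to replay. This is only a sketch of the guesses above (30 million books, 750 pages, 50 lines per page, 10 billion LOC, a total/active ratio of 2 to 3); the variable and function names are mine.

```python
def estimate_book_lines(books: int, pages_per_book: int, lines_per_page: int) -> int:
    """Total printed lines under the stated per-book assumptions."""
    return books * pages_per_book * lines_per_page


BOOKS = 30_000_000          # Feynman-era figure quoted above
PAGES_PER_BOOK = 750        # assumed average
LINES_PER_PAGE = 50         # assumed average
LOC_GUESS = 10_000_000_000  # the 10 billion LOC guess

book_lines = estimate_book_lines(BOOKS, PAGES_PER_BOOK, LINES_PER_PAGE)
ratio = book_lines / LOC_GUESS

# Active code if total:active is somewhere between 3:1 and 2:1.
active_low = LOC_GUESS // 3
active_high = LOC_GUESS // 2

print(f"book lines: {book_lines:,}")
print(f"books lead code by a factor of {ratio}")
print(f"active LOC: {active_low:,} to {active_high:,}")
```

Changing any one of the inputs by a factor of ten moves the answer by the same factor, which is why the guess is only good to within an order of magnitude or two.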

PS: Why the strange “To:” line? Figured I’d include the folks that worked on the horrid LOC project, some of the stranger thinkers I know, a tried and true researcher (who probably also fits in the previous category), and the man whose initials are found in BILLIONS of executables. No “deep-thought” answers required, but if you’d like to chime in, let me know if I can include your thoughts on my web page.

From: Mark
Sent: Thursday, July 17, 2003 2:54 PM
To: Ken; Barry; Michael; Wade; Richard; John; Keven; Praveen; Bruce
Subject: RE: How many lines of code can fit on the head of a pin?


I think there are way more than 30M books.  Harvard’s library has 11M+.  What does LOC (Library of Congress) have in it?

Feynman was merely talking about taking the existing representation (aka print) and shrinking it.  Clearly, printing code and then shrinking it will have the same set of issues.

Certainly, we can encode the book in atoms, but thermal, chemical, and quantum issues make reading the data interesting.  If reading wasn’t interesting, then I could encode it all in one bit.  0 means not the entire set of code. 1 means the entire set of code.  If you valued reading, then the issue of specifying the decoding algorithm should be brought in.  With the single bit interpretation above, you could end up with a mighty complex decoding algorithm.

Thermal issues:

Atoms move about unless really close to absolute zero, even in solids.  Trying to measure single atoms might be fairly difficult.

Chemical issues:

OK, are we talking one-time encoding or an encoding that can be used for archival?  The former is easy, but things like oxidation, photochemistry, etc. will impact just how dense we can make things.

Quantum issues:

Yes, atoms are quantum beasts.  Measuring them in a non-destructive fashion might also be difficult.

Backing out of physics, from the standpoint of encoding, you could simply encode the ASCII representation (what do you do about those EBCDIC COBOL and RPG programs?).  But code is also notoriously compressible (à la LZW). LZW is nice because no separate dictionary has to be stored with the data; the dictionary is rebuilt from the previously input data.  You can do even better with Markov modeling of the code, but you’d have to include the size of the Markov tables in the total as well. The Markov models could be for individual characters or for lexical items.  You could also do something along the lines of encoding the parse tree of the source file (suitably annotated with comments).  Handling .h files might be dicey.
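To make the LZW point concrete, here is a minimal textbook-style sketch in Python, not a tuned implementation. It emits a list of integer codes rather than packed bits, and the sample "source code" is invented for illustration. Note that the decoder rebuilds the dictionary purely from the codes it has already seen.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as LZW codes; the table starts with all single bytes."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new phrase
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes, re-deriving the table along the way."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the phrase being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# Invented, deliberately repetitive sample standing in for real source code.
sample = b"for (int i = 0; i < n; i++) { total += values[i]; }\n" * 50
codes = lzw_compress(sample)
print(len(sample), "bytes in,", len(codes), "codes out")
```

The codes are wider than a byte, so the code count overstates the savings somewhat; a fair comparison would pack them at a fixed or growing bit width.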

Enough with rambles…

From: Ken
Sent: Thursday, July 17, 2003 3:59 PM
To: Mark; Barry; Michael; Wade; Richard; John; Keven; Praveen; Bruce
Subject: RE: How many lines of code can fit on the head of a pin?
Importance: Low

11M is the number that Feynman quoted for Lib o’ Congress…I think he gave the speech where he quoted 30M in the late 60’s, but the book I’m “reading” (listening to) jumps around from childhood right up to the late 80’s. During the speech he actually describes a method for reading the “small print”…but at the time the technology hadn’t gone far enough to print it yet. With the rise of the Internet/PC/easy publishing, I’d be surprised if the number was not at least around 60M these days (double the number in the 60’s)…while I’d be surprised if it were 100M or above, I don’t see even that sort of number as inconceivable.

As we approach the means to really make good on some of Feynman’s “plenty of room at the bottom” dreams (Cornell made a 1-atom wide transistor a bit over a year ago), I find myself looking at things like this more and more. Due to my sordid past, the notion of coming up with a good (and defensible) estimate for the number of LOC in existence strikes me as a fun exercise. The real meat isn’t there, nor is it, other than as a “perspective exercise”, in figuring out how big (small) a piece of something you would need to hold it.

Granting that there will be a host of new difficulties when we start getting components that are made at the atomic level, I think that we will live to see the truly tiny become reality. As we creep into nano-tech, the processes and research will (I hope and believe) begin to build on each other. While we may not be part of the world of physics (and chemistry, and bio-chem, and…) that will be building these gadgets straight out of the most far-fetched science fiction, it will be up to us to help define at least how some of this stuff will benefit the “common man”. What changes do we make to the operating system, to Office, etc when a person can have a couple of terabytes of storage with them all the time? How soon before it’s not only feasible but practical to record everything we read (http://SIS is close on this front), hear, say, or see…index it all while we sleep, and “auto-fill” details the next day when we start writing a report?

I used to chuckle when I read old Robert A. Heinlein books and he had a character load terabytes of information onto a small cube.

Thermal, Quantum, Chemical…and the list goes on, but I do think the world has the set of minds as well as the preparation of the giants of the last century to beat these problems (actually, in some ways the thermal might work for us). Anyone want to adopt me so I can head back to college…thinking I’d enjoy a decade or two back in school.


From: Bruce
Sent: Thursday, July 17, 2003 6:55 PM
To: Ken; Mark; Barry; Michael; Wade; Richard; John; Keven; Praveen
Subject: RE: How many lines of code can fit on the head of a pin?

We are moving to a world where there is more information than a single person can process.  People will have to become more and more selective in the data they choose to read.  (or see, or hear, or taste…)  As a computer person, I view this as a problem to be solved, and my first instinct is to hypothesize a solution involving a software agent that can selectively choose and display only those facts and media that fit some criteria we give it.

However, I see this as bringing us to a more divided and insular world.  Democrats will have agents that spin things the way they want to see them; Republicans likewise.  I’m sure the sci-fi fans will continue to form their own strange sub-culture.  One’s entire world view will be shaped by what one chooses to experience, and when there is a surfeit of information that does appeal, folks will be less and less inclined to view that which does not.

In some sense, our choice of agent will decide who we are, and who we become.  (At least to the extent that one believes in nurture over nature.)  Could one then change oneself by altering the agent programming?  In any kind of serious, personality-altering way?  What happens if some hacker gets into your agent, or the men in the black helicopters do?

Beware – here there be dragons

From: Richard
Sent: Friday, July 18, 2003 2:36 PM
To: Bruce; Ken; Mark; Barry; Michael; Wade; John; Keven; Praveen
Subject: RE: How many lines of code can fit on the head of a pin?

Everyone reading this thread is the product of a lifetime of agents choosing and biasing what media we consume.

It starts with our parents and families, churches, friends, television, and the so-called “mass media”, then later teachers in school, employers, government, etc.

These agents also already have their hackers, which can be thought of as agents themselves.  They take the same form as the agents that affect us, though they may be different TV shows, different media, different teachers, and different employers.

Every time you hear of censorship, political correctness, or a boycott, you are seeing agents attempting to restrict the exposure of content.

Every time you hear of praise and awards, agents are promoting content.

Other things like reviews and biased reporting can swing either way.

But it is all affecting the perceived worth of consuming the target content and ideas.

It’s all a tight feedback system.  It would seem that the truest “individuals” would live in the wilderness with no outside influence.  But if we met someone like that we’d likely not enjoy their company, nor would they enjoy ours.  We enjoy being with people more when we can communicate and have thoughts and topics to share.  This is one of the feedbacks.

We choose our agents when we choose our books, schools, subscriptions, employer, clubs, and homepage, and when we program our TiVo.

No doubt this does affect who we become.

I wonder if agent selection will ever become as sophisticated as allowing me to choose what personal traits I’d like to make stronger.

You turn up your agent’s sensitivity dial and TiVo starts recording “Little House on the Prairie” and “Touched by an Angel”.

Ramble on…

From: John
Sent: Friday, July 18, 2003 2:36 PM
To: Bruce; Ken; Mark; Barry; Michael; Wade; Richard; Keven; Praveen
Subject: RE: How many lines of code can fit on the head of a pin?

(This thread is getting pretty far afield.)

As Richard points out, knowledge is already being filtered by "agents" (many of which are not user configurable). So at the end of the day we have to examine the trust relationship between ourselves and those agents.

Some of the things I have noticed:

- The less interested you are in a topic, the more you are willing to trust the agent. The more you are interested (and experienced) in a topic the more scrutiny you will give to the agent.

- Once an agent has seriously disappointed you, you never trust the agent again.

- The more you are exposed to agents as a class the less you trust them.

Think about really, really good internet trolls: they contain enough fact, wit, and divisiveness to cause chaos amongst the most well-ordered communities. What about trolls that build a trust relationship first?

Think about what makes a trust relationship. Generally you have seen some objective demonstration that convinces you, and you are willing to believe the demonstrator on related subjects without objective proof. Perhaps the objective demonstration is abstracted through a certification authority, like a driver's license: you didn't see me take the test, but I can show you my license. Think about all of the points of failure in this trust system. Having or not having a driver's license doesn't prove or disprove ability to drive.

While you could certainly manipulate people through media collection agents, there are still (and always will be) informal agents with a disproportionate amount of trust. If your grandmother told you something, you would probably believe her unless you had objective proof or special knowledge to the contrary. Grandma is an informal agent that gets a huge amount of trust, and it is very difficult to erode that trust (unless she drinks or something).

The concept of tweaking your agents for "self-improvement" is interesting (assuming a direct correlation between watching lots of Lifetime TV and being more sympathetic), although you would also have to tweak your informal agents. Basically, you can watch all of the Oprah in the world, but until you quit the Hell's Angels I don't think you are going to start crying at movies.

- John