# The Data Saturation Quotient


I was thinking about something this morning on the ride into work that I’d like to ask your opinion about. It has to do with the amount of data that could be absorbed and acted on by a unit or system in a given period of time.

The idea is that there is only so much information that a unit (such as a person) can adequately absorb and process at a time. At some point the system becomes unstable (or the person becomes so randomized among tasks) that the unit is saturated. I would imagine that adding more units (people, computers, sensing units, etc.) would help to some level, but even a complex system of units, such as a company or a bank of computers, could then be viewed as simply another single unit – or perhaps not.

It also seems that there would be a natural “curve” in the data where adding more units or inputs would not help process any more data.
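One candidate shape for that curve (an assumption on my part, not something from the post) is linear gain per unit minus a coordination cost that grows with the number of pairwise links between units. A minimal sketch in Python, with all rates and constants invented for illustration:

```python
# Hypothetical throughput model: each unit adds capacity, but coordination
# overhead grows with the number of pairwise links between units.
# All names and constants here are illustrative assumptions.

def effective_throughput(units, per_unit_rate=10.0, overhead_per_link=0.05):
    """Data processed per time step by `units` cooperating units."""
    raw = units * per_unit_rate
    # Pairwise links grow quadratically: n * (n - 1) / 2.
    overhead = overhead_per_link * units * (units - 1) / 2 * per_unit_rate
    return max(0.0, raw - overhead)

# With these made-up numbers, throughput peaks around 20 units and then falls.
peak = max(range(1, 60), key=effective_throughput)
```

Under this model, adding units past the peak actively hurts, which matches the intuition that more headcount eventually stops helping.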

It would be useful to have a formula for determining this “level” of information absorption. It would have implications from entertainment to screen design, and on to encryption algorithms and more. For instance, Disney might use it to determine just how many visual elements to put on a ride or attraction, and no more. Software designers could use it to determine when there is too much information on a screen for a given class of user. And organizations could use it to determine when they should add resources, such as people, to a group to solve a problem, and when adding more headcount simply won’t help.

I’m thinking that there are at least these variables or parameters in the mix, but the definitions would need to include at least these elements:

•         T – Time. This is the time taken by each

•         Ar – Acquisition Rate for the unit

•         Pr – Processing Rate for the unit. This would need to include the experience level of the unit with the information (preprocessing) and so on.

•         I – Inputs on the unit. The number of discrete or combinatorial elements available to the unit to

•         D – Amount of data. This is the total amount of information available to the input device (eyes, ears, CPU, photometer, etc.)

•         R – Response. The amount of time taken to complete a cycle.
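As a starting point for discussion, these variables could be combined into a single "saturation quotient" – demand divided by capacity – where a value above 1.0 would suggest the unit is saturated. Every formula choice below is a guess of mine, not something established here:

```python
# First-pass sketch of a "saturation quotient": demand divided by capacity.
# The way the variables are combined is an assumption, offered for discussion.

def saturation_quotient(D, Ar, Pr, I, R):
    """
    D  - total amount of data available per cycle
    Ar - acquisition rate (data acquired per unit time)
    Pr - processing rate (data processed per unit time)
    I  - number of inputs on the unit
    R  - time taken to complete a cycle
    Returns demand / capacity; values above 1.0 suggest saturation.
    """
    capacity = min(Ar, Pr) * R              # bottlenecked by the slower stage
    demand = D / I if I else float("inf")   # data arriving per input channel
    return demand / capacity

q = saturation_quotient(D=500, Ar=8, Pr=5, I=4, R=10)  # → 2.5, i.e. saturated
```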

As to the formula, I’m not sure how I would go about developing that. I welcome your input.

This work may have already been done – either in cognitive research fields or systems analysis, but I couldn’t find a formula. Do any of you have any thoughts on this topic?

• I've gotten some great comments through e-mail - here's one from my friend Nate:

"Also, there might be two things here…the systems side and the human side. When you say: “For instance, Disney might use it to determine just how many visual elements to put on a ride or attraction, and no more. Software designers could use it to determine when there is too much information on a screen for a given class of user.” I think of visual perception/cognitive load

I’ve never seen a real formula – the only rule I know of for visual elements is the short-term memory rule of 7 (plus or minus 2), and then ‘chunking’ and schema construction (described in the cognitive load article)."

• My friend George tells me that this is similar to system dynamics (http://en.wikipedia.org/wiki/System_dynamics). I think there are similarities, but these don't provide the same formulas I'm seeking here.

• And Chris tells me:

"Well on the human side, the amount of information people can absorb at once seems to increase with each generation.  Just over 100 years ago, it was possible for a learned person to grasp the whole of human knowledge.  That is clearly impossible now, but it doesn't stop us from trying.

The "MTV effect" changed how quickly images appeared on our TV screens.  The "Nintendo generation" loses patience unless information is delivered at a rapid pace.  Now we have channels like Bloomberg HD which fill every pixel with some kind of meaningful data.  Each new generation that grows up surrounded by these info-sources (ooh, is that an SAP BW term?) seems to be able to assimilate them much more quickly than their parents.  We now have teenagers that do their homework while listening to their iPod, downloading movies, chatting on multiple IM clients, updating their Myspace page, etc.  We multitask more and more.  Your Disney example is a good one; look at the Disney rides from 30 years ago compared to those now.

I guess your question is:  How much of this are we absorbing in a meaningful way?  Is it a question of quantity vs. quality?  I would argue that a high degree of synergy can be achieved by accessing all these datastreams.  But it's more or less like any piece of technology -- it's all in how you use it.  There will be those who squander technology's opportunities, and those who profit hugely from them.  Overall, we're still evolving."

• Paul sent me this:

Very interesting. Two parameters I think you're missing are:

• S - overhead of Sharing data between units. For example:

o cache line invalidation overhead in multi-core CPUs

o latency of human communication - how quickly does data found by a person in one team make its way to someone in another team where it may have a profound effect on a business decision or behavior

• C - Complexity of sharing data between units. For example:

o The sheer overhead of establishing data sharing mechanisms, like having to cultivate relationships between different divisions in a company like Microsoft

o Working out how to pipeline results in a multi-issue instruction stream processor (I remember writing assembly code for a quad-issue Alpha processor while at DEC - it was a bitch to get right)

o The same pipelining problem, but in an SMP or NUMA box

The formula you describe also sounds like it would assume that all units operate at 100% capacity, so you'd probably also want to add in:

• E - average Efficiency of the unit. For example, most people don't operate at 100% capacity all through every working day

• R2 - Receptiveness to the information being presented. For example, does the unit believe the data being presented

• D2 - Data loss. For example, a person may forget some info and so the input is lost to the overall process

Hope this helps!

Paul S. Randal

Managing Director, www.SQLskills.com

http://www.sqlskills.com/blogs/paul
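Paul's additions slot naturally into the capacity side of any such formula. One way to model them (my assumption, not Paul's) is as multiplicative discounts on the ideal processing rate:

```python
# Sketch of Paul's extra parameters as multiplicative discounts on an ideal
# processing rate. Treating them this way is an assumption for illustration,
# as are all the numbers.

def effective_rate(Pr, E=1.0, R2=1.0, D2=0.0, S=0.0):
    """
    Pr - ideal processing rate
    E  - average efficiency of the unit (0..1)
    R2 - receptiveness to the information (0..1)
    D2 - fraction of data lost or forgotten (0..1)
    S  - fraction of time spent on sharing overhead (0..1)
    """
    return Pr * E * R2 * (1.0 - D2) * (1.0 - S)

rate = effective_rate(Pr=100, E=0.7, R2=0.9, D2=0.1, S=0.15)
```

A multiplicative form captures the intuition that the penalties compound: a unit that is tired, skeptical, forgetful, and burdened with coordination overhead ends up at a small fraction of its ideal rate.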

•    Very interesting topic. Reading through it, there is a lot that comes to mind, and I would have to spend some time to explain and explore the impact of each.

So let me see if I can just hit some highlights for you with inserts into the article.  This will be difficult to do via email vs. a multi-hour discussion. Thoughts may be scattered as I work through this over the next few hours.

I'm sure my daily input will continue...

The idea is that there is only so much information that a unit (such as a person) can adequately absorb and process at a time.

I would agree that this is true, and that the level varies with each unit. But I do think that we can increase the unit's input threshold with training and practice. Also keep in mind that a high level of input over a sustained period would most likely lead to some burnout for the individual. So, much like the bell curve, your productivity would increase with training and practice, plateau for a time, then gradually (if not suddenly) decrease, while still maintaining a level higher than when you began.

At some point the system becomes unstable (or the person becomes so randomized in tasks) that that unit is saturated. I would imagine that adding more units (people, computers, sensing units, etc.) would work to some level, but even a complex system of units such as a company or a bank of computers could then be viewed as simply another single unit – or perhaps not.

I equate the "randomizing" with "multitasking". I have had the "MULTITASKING" conversation on several occasions. My thought on multitasking is that you are only time sharing: you set an event or operation to run, then move to the next operation until you have time to get back to the other. If that is not complete, you start another. Eventually you have enough tasks or processes running that you become confused about the time cycle for each and which is ready for the next step; now your efficiency decreases, due to saturation. People, like computers, time share – we don't multitask much beyond chewing gum and walking without some unwanted results. Driving and talking seem to work OK, but driving and talking on a cell phone adds a new dimension, which leads to accidents.
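Dave's "time sharing" point can be sketched as a simple round-robin model: splitting a fixed attention window across more tasks shrinks each slice, so a fixed switching cost eats a growing share of the time. The window and cost numbers below are invented for illustration:

```python
# Round-robin model of "multitasking as time sharing": more tasks mean
# shorter slices, so a fixed per-switch cost consumes a larger fraction
# of the total time. All numbers are illustrative assumptions.

def useful_fraction(tasks, attention_window=60.0, switch_cost=2.0):
    """Fraction of the window spent on real work across `tasks` tasks."""
    if tasks <= 1:
        return 1.0                         # no switching needed
    slice_len = attention_window / tasks   # each task gets a shorter slice
    return slice_len / (slice_len + switch_cost)
```

With these numbers, one task is 100% useful, five tasks about 86%, and thirty tasks only 50% – efficiency decreasing toward saturation, as Dave describes.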

It also seems that there would be a natural “curve” in the data where adding more units or inputs would not help process any more data.

Additional units could increase your output, but at what cost to efficiency? Do you have the supporting infrastructure, equipment, and space to make such a change? If those are not considerations, then yes, I would think you could apply a linear ratio to the average productivity per unit.

It would be useful to have a formula for determining this “level” of information absorption. It would have implications from entertainment to screen design, and on to encryption algorithms and more. For instance, Disney might use it to determine just how many visual elements to put on a ride or attraction, and no more. Software designers could use it to determine when there is too much information on a screen for a given class of user. And organizations could use it to determine when they should add resources, such as people, to a group to solve a problem, and when adding more headcount simply won’t help. I suspect a basic formula could be derived, but I would think it would need to be an adjustable formula based partly on the specific nature of the environment, adding or negating variables.

I’m thinking that there are at least these variables or parameters in the mix, but the definitions would need to include at least these elements:

•         T – Time. This is the time taken by each

•         Ar - Acquisition Rate for the unit (This being the time to hire, train, and put to work?)

•         Pr – Processing Rate for the unit. This would need to include the experience level of the unit with the information (preprocessing) and so on.

•         I – Inputs on the unit. The number of discrete or combinatorial elements available to the unit to

•         D = Amount of data. This is the total amount of information available to the input device (eyes, ears, CPU, photometer, etc.)

The D variable needs subsets for ease of variation (each could be a formula all to itself):

Dey = eyes

Der = ears

Dcpu = CPU

•         R – Response. The amount of time taken to complete a cycle.
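Dave's per-channel subsets suggest treating D as a weighted sum over input channels. A tiny sketch, with channel weights invented purely for illustration:

```python
# D decomposed into Dave's per-channel subsets (Dey, Der, Dcpu), combined
# as a weighted sum. The weights are hypothetical illustration values.

def total_data(channels, weights=None):
    """Combine per-channel data amounts into one D value."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * amount
               for name, amount in channels.items())

D = total_data({"Dey": 300.0, "Der": 120.0, "Dcpu": 1e6},
               weights={"Dcpu": 0.001})  # raw CPU bytes weighted far lower
```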

One thing I don't see here is any thought to simply limiting the input and scheduling the work.

What can I or my company do in a given time frame?

Accept the workload for what it is and schedule the projects requested.

As to the formula, I’m not sure how I would go about developing that. I welcome your input.

This work may have already been done – either in cognitive research fields or systems analysis, but I couldn’t find a formula. Do any of you have any thoughts on this topic?

Dave's summary thoughts (and maybe I'm getting off track or away from the intent of the article):

1. Prioritize what needs to be done, what you want to accomplish.

2. Focus on the task at hand and run to completion.

3. Minimize inputs required.

I'll read it again and think some more, but this is very interesting...

________________________________________

• From a human aspect:

T varies widely per person, especially as we age, so factor in an "old person slows down" ratio... perhaps varies with the reciprocal of age?

Ar -- Same, kids are sponges, adults ignore things they don't [want to] understand -- have you tried playing a video game lately?

Pr -- I think fast, everybody else thinks slowly, isn't this what everybody believes?

I -- best to leave this one alone... inputs are fixed, but attached to the Processor :)

D -- again, as I get older...

R -- Response time on a tennis court, still pretty good...

Have a wonderful day,

Jeff Garbus

• Possibly check out Flow.

http://en.wikipedia.org/wiki/Flow_%28psychology%29

It’s less about machine-like information processing and more about how focus and awareness change at different degrees of ‘information processing’.

An equation would have to include more than quantity and speed variables. It may generate an S-type curve.

Rob.
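Rob's "S-type curve" maps naturally onto the logistic function: absorption rises slowly, accelerates, then saturates at a ceiling. A sketch, with the capacity, midpoint, and steepness values chosen arbitrarily for illustration:

```python
import math

# Logistic ("S-type") model of information absorbed as the input rate grows:
# slow start, rapid middle, then saturation at a capacity ceiling.
# All parameter values are arbitrary assumptions.

def absorbed(data_rate, capacity=100.0, midpoint=50.0, steepness=0.1):
    """Amount of information absorbed at a given input data rate."""
    return capacity / (1.0 + math.exp(-steepness * (data_rate - midpoint)))
```

The ceiling (`capacity`) would be the saturation level the original post is after; everything above it is offered but not absorbed.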

• Hi Buck,

I read your blog this morning and had the following thought: I wonder if you could acquire measurements of the human activities and derive the equation from the shape(s) of the curve(s)? I recall Joel Spolsky writing about developer performance ( http://www.joelonsoftware.com/articles/HighNotes.html). A professor somewhere maintains records of student performance - some of the data you seek may be there.

Andy

--

Andy Leonard, MVP

• And from Steve:

It's retro, but I think in terms of signal-to-noise ratio. One source with everything on a topic may have more data than many web pages, blogs, mags, and books combined, but given its lower noise level I can absorb more data in less time.

The more complex the Disney ride, the more I can ride it and not be bored... I'm waiting on grandchildren.

...being able to drill down into interpretive information about an SSMS report from the report, without having to search Books Online, etc. – that's more data in one place, but in another way it's less.

• From Kathy:

Interesting idea to try to come up with a formula for determining ceilings for information absorption.

I've never come across something like this. I like the idea, and the additions suggested by others.

• S - overhead of Sharing data between units.

• C - Complexity of sharing data between units.

• E - average Efficiency of the unit. For example, most people don't operate at 100% capacity all through every working day

• R2 - Receptiveness to the information being presented. For example, does the unit believe the data being presented

• D2 - Data loss. For example, a person may forget some info and so the input is lost to the overall process.

Until you come up with the equation, I'll continue to use the approach described by another reader:

When faced with information overload:

1. Prioritize what needs to be done, what you want to accomplish.

2. Focus on the task at hand and run to completion.

3. Minimize inputs required.

• From another good friend:

Hello Buck

Here goes….

There are two components:

1) An Evolver – to produce solution sets based on the variables identified – the Genetic Algorithm.

2) An Evaluator – to take solution sets generated by the Genetic Algorithm and evaluate them for fitness – the Neural Network.

The Evaluator (the Neural Network) learns over time. It generates the best-fit solution set and feeds it back into the Genetic Algorithm to generate again. As variables are added or changed, the Genetic Algorithm generates new solution sets.

The Genetic Algorithm is a non-formula approach – very powerful, a different way of thinking, falling under the realm of bio-mathematics. When used in conjunction with Neural Networks, it can provide solutions and best-fit implementations to problems.

You have covered several different sets of problems; they are the same and unique at the same time.  Some are constraint-based, resource-based, time-dependent, etc.

No one-size-fits-all solution exists, however. The bundling of the “Response” variable implies some degree of intelligence in this type of system, and not just a cause-and-effect design.  The processing times to generate a “response” are a factor, but once these networks are tuned (trained), they should perform.

Just my 2 cents….

Good hearing from you,

Regards,
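The Evolver/Evaluator loop in that last note can be sketched as a toy genetic algorithm. Since no actual neural network is specified, a plain fitness function stands in for the Evaluator, and the problem (hit a target sum) is an arbitrary placeholder:

```python
import random

# Toy version of the evolve/evaluate loop: a genetic algorithm (the Evolver)
# proposes solution sets; a stand-in fitness function (in place of the
# neural-network Evaluator) scores them; the fittest feed back into the next
# generation. The problem and all constants are placeholders.

TARGET = 42

def fitness(candidate):
    """Evaluator stand-in: closer to the target sum is fitter (max is 0)."""
    return -abs(sum(candidate) - TARGET)

def evolve(pop_size=30, genes=5, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 20) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]             # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genes)
            child = a[:cut] + b[cut:]                # crossover
            if rng.random() < 0.2:                   # mutation
                child[rng.randrange(genes)] = rng.randint(0, 20)
            children.append(child)
        pop = survivors + children                   # feed back and repeat
    return max(pop, key=fitness)

best = evolve()
```

Swapping the placeholder fitness function for a trained network is where the "Evaluator learns over time" part would come in.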
