I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields.
Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist?
This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science?” (April 2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there.
Definition of Science
To ask the question “is data science ‘science’?” we need to start with a definition of terms. Various references put the definition into the same basic areas:
The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm.
Definition of Data Science
And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present).
Researching the term's general use I created an amalgam of the definitions this way:
“Studying and applying mathematical and other techniques to derive information from complex data sets.”
Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques, and the data itself, is part of science, not science itself.
I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but it does not seek practical applications.
As part of the argument against calling it “Data Science”, some point to the scientific method: creating a hypothesis, testing it with controls, comparing the results against the hypothesis, and documenting the work for repeatability. These are not steps that we often take in working with data. We normally start with a question, then fit patterns and algorithms to predict outcomes and find correlations. In this way the process of Data Science is more akin to statistics (and in fact makes heavy use of it) than to starting with an assumption and following on with it.
So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term, at least for its intent: that we use data to study problems across a wide spectrum rather than restricting it to a single domain. I also understand why those who have worked hard to achieve the very honorable title of “scientist” have issues with those who borrow the term without asking.
What do you think? Science, or not? Does it matter?
I agree that data science is "science" and that the term "data science" is a tad overloaded, but so is "Big Data". It may be possible that as they mature, we will have a cacophony of terms: data statistician, in-memory transaction stores, etc. But for now, the overload actually makes things easier to understand... for now ;-)
You won't get a quarrel from me. In this paper www.datasciencecentral.com/.../blogpost I argued that what we call data science is not really science and I created four "types" of analytics to define it.
I recently started studying Data Science. In my week of research, I've seen a whole bunch of Mathematics complete with hyperspaces, vector mechanics, kernel functions, proofs, derivations, etc., but nary a sign of a hypothesis or an experiment. I'd say "Data Science" appears to actually be a Philosophy (like Mathematics) and not a Science (like Chemistry).
This is a philosophical argument of course, but that is just part of the terrain when it turns out you are becoming a Philosopher.
We tend to borrow from other disciplines a lot in our industry. Are Data Warehouse Architects really architects? Are Software Engineers really engineers? I sometimes think this happens because it's the best way we can explain to outsiders what we do, by giving them context from something they already understand.
Thanks for posing the question, Buck, which is often the most important part.
Now something interesting I found. “Data Science” is maybe not so new. The search frequency graph shows it is bigger than it has ever been, but it had some popularity in the 2004-2005 area.
Now “Data Scientist”, according to this graph, is stunningly new.
Regarding what Denny said, I agree. I often start by defining Big Data with the 3 V’s, but then back off and say 2 V’s will do, as long as the *insight* you derive is Big.
(note: I hate to post G links on an MSDN blog, but AFAIK Bing does not expose public facing search trend graphs)
Science has to do with observation, modelling, and testing. So does data science, so yes: it is a science, complete with the intuitive component that goes into model building. I would even contend that physical science is a subset of data science: in physical science, data is collected, modelled, and the models are tested by making predictions and running more tests.
Data science formalizes this from the "data modelling" perspective. Physical science errantly thinks it is modelling "physical reality" but really it's just modelling data recorded in experiments. That is an important distinction too, as quantum mechanics tells us.