Data science is a set of processes and concepts that act as a guide for making progress and decisions within a data-centric project.
This contrasts with the view of data science as a set of statistical and software tools and the knowledge to use them, which in my experience is the far more popular perspective taken in conversations and texts on data science.
I don’t mean to say that these two perspectives contradict each other; they’re complementary. But to neglect one in favor of the other would be foolish.
To compare with carpentry, knowing how to use hammers, drills, and saws isn’t the same as knowing how to build a chair. Likewise, if you know the process of building a chair, that doesn’t mean you’re any good with the hammers, drills, and saws that might be used in the process.
To build a good chair, you have to know how to use the tools as well as what, specifically, to do with them, step by step.
Origin of Data Science
The origins of data science as a field of study or vocational pursuit lie somewhere between statistics and software development. Statistics can be thought of as the schematic drawing and software as the machine.
Data flows through both, either conceptually or actually, and perhaps it was only in recent years that practitioners began to give data top billing.
That said, data science owes much to any number of older fields that combine statistics and software, such as operations research, analytics, and decision science.
In addition to statistics and software, many folks say that data science has a third major component, which is something along the lines of subject matter expertise or domain knowledge.
Although it certainly is important to understand a problem before you try to solve it, a good data scientist can switch domains and begin contributing relatively soon. Consider the parallels:
- A good accountant can quickly learn the financial nuances of a new industry.
- A good engineer can pick up the specifics of designing various types of products.
- A good data scientist can switch to a completely new domain and begin to contribute within a short time.
That is not to say that domain knowledge has little value, but compared to software development and statistics, domain-specific knowledge usually takes the least time to learn well enough to help solve problems involving data.
If you can do data science, you can walk into a planning meeting for a brand-new data-centric project, and almost everyone else in the room will have the domain knowledge you need, whereas almost no one else will have the skills to write good analytic software that works.