What I learned about myself from a career in data science, and what I think might apply more generally:
Someone on Quora
asked me this, so I thought it was worth spending a few minutes on it. I have worked in what is now called “data
science” since my mid-20s, so for over 30 years. Until about 10 years ago, I generally used
the terms “data analysis”, “statistical analysis” or simply “statistics” when
describing my work. Those terms morphed into
“data science” as many new methods and algorithms were devised and/or
perfected (e.g. neural nets aren’t new, but they are far more effective than
they were in the past). Besides, people
seem to find the new term interesting and forward-thinking, unlike stodgy old “statistics”.
My undergrad was
actually in physics and math, but I supplemented that post-degree with the
equivalent of a Master’s in inferential statistics and related social science
methodologies. I worked for the non-profit
sector, then government, then the university sector. The research was operational – i.e. not for
publication, other than conference proceedings and the like. It involved a lot of coding in various
languages, report writing, regression analysis, logistic regressions, survey design
and analysis, clustering, factor analysis, text analysis and similar
methods. Generally speaking, the purpose
was to solve immediate problems facing the institution I worked for,
rather than to do pure curiosity-driven research.
Over time, though, I managed to do some of both in my data science jobs.
So, I think my
career path is of the sort that most people interested in data science will
actually experience during their careers (i.e. lots of interesting and
important studies, but nothing dramatically exciting or new). Here’s a list of things I learned about the
profession, and about myself, in terms of working in the profession:
1.
The thing that drives you forward is curiosity
about the problem that you have to solve, and about discovering how the
dataset you have can help you solve it.
2.
For example, I am currently doing text analysis on
survey data, and am curious about whether the sentiment scores generated by the
algorithms correlate with some of the other variables in the survey, and
whether these sentiment scores will predict important real-world outcomes, in
this case whether university students will complete their programs.
3.
Curiosity about learning new methods of analysis is
also important. It is good to be innately curious about the various techniques
that come along (e.g. does a neural net work as well as a logistic regression,
and how does either of these compare to decision trees, for prediction purposes).
4.
That being said, you need to maintain a reservoir
of skepticism, and not get swept up by every new technique that comes along.
Companies want to sell products to make money, and they don’t always care about
solving data problems for you.
5.
Open source is ok. Don’t assume you always have to
get an expensive new data mining package from a big company. On the other hand,
they have their place, so don’t always try to go cheap or reinvent the wheel.
6.
A dataset usually reveals an underlying story. A
big part of your job is to uncover that story and present it in such a way that
non-data science people (e.g. managers) can see that story too, and understand
and act on it, where appropriate.
7.
Communication is important: written, oral and
visual. Sometimes, one really effective
graph can go a long way in telling your story.
Other times, it may take a long and detailed report. It depends both on the story and the audience.
8.
Sometimes you will be expected to find evidence to
back up a notion that management wants to see supported. At other times you will be given free rein
to come to completely unbiased conclusions, with no “advocacy issues” to worry
about. The former can be frustrating,
while the latter is a lot more satisfying. But, in the real world, you have to try to
thread the needle, protecting your integrity as a professional, while managing
to keep the boss on your side. That’s usually attainable, as most bosses are
reasonable people, and will accept what the data is telling them.
9.
There is a good chance you will have to give
presentations in meetings, conferences and the like. Many people drawn to data
science are not extroverts, so that can be difficult. But, it gets easier with
experience, and people want you to do well, so the audience will accept a bit
of nervousness from you, as long as you really know your content.
10.
Use statistical, mathematical, and scientific
rigor, but only present as much of that as is needed for your audience. You can
lose people if you use excessively technical language or get bogged down in
explaining the minor points of an analysis.
11.
A good educational foundation is, of course, useful
and important. You should know the underlying math behind the principles of
statistical inference as well as being adept with the data science algorithms,
and coding in general. Learning how to write well is important, too, so pay
close attention in courses that involve reading a lot and writing essays, even
if you are not inclined to the literary arts. That said, you will always have
to learn on the job. Products and processes
that will become important later in your career haven’t even been invented or
discovered yet.
12.
Accept the fact that data is usually not clean, and
you will have to do a lot of tedious data wrangling, munging and cleaning. It’s
boring but necessary.
13.
Learn SQL, as getting data out of systems can be just
as important as analysing it. Even if
you have IT people who specialize in this (e.g. data warehouse specialists), it
is always good to be comfortable with SQL yourself (and, yes, there may be
other query languages developed in the future, but SQL is the most important one
now).
14.
Data science exists on a continuum (extracting
data, cleaning data, validation, descriptive analysis, visual presentation,
inferential and predictive analysis, developing new algorithms and methods).
All of these stages are important.
15.
You will usually work in a team environment, and
everyone has an important role to play. Everyone deserves respect.
16.
Try to remember, you want to make your boss’s job
easier, not harder. Do that and you will
prosper in your career.
17.
Money is important, but so is enjoying your job and
the people you work with. More money doesn’t always make you happier.
18.
Don’t let work prevent you from other important
goals in life. Go ahead and start a family, if you are inclined that way. It
may seem like it will get in the way of your career, but you can work it out.
19.
Take work seriously, but don’t burn yourself out
with too much work.
20.
I could think of more, but that’s enough for now. Besides, it’s nice to end on a round number. Oh yes, always end a presentation with a comic strip.