Monday, 22 October 2018

What I learned from a career in data science about myself, and what I think might apply more generally:

Someone on Quora asked me this, so I thought it was worth spending a few minutes on it.  I have worked in what is now called “data science” since my mid-20s, so for over 30 years.  Until about 10 years ago, I generally used the terms “data analysis”, “statistical analysis” or simply “statistics” when describing my work.  Those terms mutated into “data science”, as many new methods and algorithms were devised and/or perfected (e.g. neural nets aren’t new, but they are far more effective than they were in the past).  Besides, people seem to find “data science” interesting and forward-thinking, whereas plain old statistics sounds stodgy.

My undergrad was actually in physics and math, but I supplemented that post-degree with the equivalent of a Master's in inferential statistics and related social science methodologies.  I worked in the non-profit sector, then government, then the university sector.  The research was operational, i.e. not for publication, other than conference proceedings and the like.  It involved a lot of coding in various languages, report writing, regression analysis, logistic regression, survey design and analysis, clustering, factor analysis, text analysis and similar methods.  Generally speaking, the purpose was to solve immediate problems facing the institution I worked for, rather than to do purely curiosity-driven research.  But, over time, I managed some of both in my data science jobs.

So, I think my career path is of the sort that most people interested in data science will actually experience during their careers (i.e. lots of interesting and important studies, but nothing dramatically exciting or new).  Here’s a list of things I learned about the profession, and about myself, in terms of working in the profession:

1.     The thing that drives you forward is curiosity about the problem you have to solve, and about discovering whatever the dataset you have can tell you that helps with that task.
2.     For example, I am currently doing text analysis on survey data, and am curious about whether the sentiment scores generated by the algorithms correlate with some of the other variables in the survey, and whether these sentiment scores will predict important real-world outcomes, in this case whether university students will complete their programs.
3.     Curiosity about learning new methods of analysis is also important. It is good to be innately curious about the various techniques that come along (e.g. does a neural net work as well as a logistic regression, and how does either of these compare to decision trees, for prediction purposes?).
4.     That being said, you need to maintain a reservoir of skepticism, and not get swept up by every new technique that comes along. Companies want to sell products to make money, and they don’t always care about solving data problems for you.
5.     Open source is OK. Don’t assume you always have to get an expensive new data mining package from a big company. On the other hand, commercial packages have their place, so don’t always try to go cheap or reinvent the wheel.
6.     A dataset usually reveals an underlying story. A big part of your job is to uncover that story and present it in such a way that non-data science people (e.g. managers) can see that story too, and understand and act on it, where appropriate.
7.     Communication, whether written, oral or visual, is important.  Sometimes, one really effective graph can go a long way in telling your story.  Other times, it may take a long and detailed report.  It depends both on the story and the audience.
8.     Sometimes you will be expected to find evidence to back up a notion that management wants to see supported.  At other times you will be given free rein to come to completely unbiased conclusions, with no “advocacy issues” to worry about.  The former can be frustrating, while the latter is a lot more satisfying.  But, in the real world, you have to try to thread the needle, protecting your integrity as a professional while managing to keep the boss on your side. That’s usually attainable, as most bosses are reasonable people and will accept what the data is telling them.
9.     There is a good chance you will have to give presentations in meetings, conferences and the like. Many people drawn to data science are not extroverts, so that can be difficult. But it gets easier with experience, and people want you to do well, so the audience will accept a bit of nervousness from you, as long as you really know your content.
10. Use statistical, mathematical, and scientific rigor, but only present as much of that as is needed for your audience. You can lose people if you use excessively technical language or get bogged down in explaining the minor points of an analysis.
11. A good educational foundation is, of course, useful and important. You should know the math underlying the principles of statistical inference, and be adept with the data science algorithms and with coding in general. Learning how to write well is important, too, so pay close attention in courses that involve a lot of reading and essay writing, even if you are not inclined to the literary arts. That said, you will always have to learn on the job.  Products and processes that will become important later in your career haven’t even been invented or discovered yet.
12. Accept the fact that data is usually not clean, and you will have to do a lot of tedious data wrangling, munging and cleaning. It’s boring but necessary.
13. Learn SQL, as getting data out of systems can be just as important as analysing it.  Even if you have IT people who specialize in this (e.g. data warehouse specialists), it is always good to be comfortable with SQL yourself (and, yes, there may be other query languages developed in the future, but SQL is the most important one now).
14. Data science exists on a continuum (extracting data, cleaning data, validation, descriptive analysis, visual presentation, inferential and predictive analysis, developing new algorithms and methods). They are all important.
15. You will usually work in a team environment, and everyone has an important role to play. Everyone deserves respect.
16. Try to remember that you want to make your boss’s job easier, not harder.  Do that and you will prosper in your career.
17. Money is important, but so is enjoying your job and fellow employees. More money doesn’t always make you happier.
18. Don’t let work prevent you from other important goals in life. Go ahead and start a family, if you are inclined that way. It may seem like it will get in the way of your career, but you can work it out.
19. Take work seriously, but don’t burn yourself out with too much work.
20. I could think of more, but that’s enough for now.  Besides, it’s nice to end on a round number.  Oh yes, always end a presentation with a comic strip.

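Item 12 warns that data is rarely clean. The next sketch shows a few typical cleaning steps on a small, invented survey extract using only Python's standard library: trimming whitespace, normalising case, treating blanks and "N/A" as missing, refusing to guess at unparseable values, and dropping duplicate rows. The column names and the set of missing-value markers are assumptions for illustration.

```python
import csv
import io

# Hypothetical messy survey extract (invented for illustration).
RAW = """student_id,program,age,completed
 101 ,Physics ,19,yes
102,physics,N/A,Yes
102,physics,N/A,Yes
103, Statistics,twenty,no
104,STATISTICS ,22,
"""

MISSING = {"", "n/a", "na", "null"}  # assumed markers for a missing value

def clean_text(value):
    """Trim whitespace and normalise case so 'Physics ' and 'physics' match."""
    return value.strip().lower()

def to_int(value):
    """Convert to int, returning None for missing or unparseable values."""
    v = value.strip()
    if v.lower() in MISSING:
        return None
    try:
        return int(v)
    except ValueError:
        return None  # e.g. 'twenty': flag as missing rather than guess

def to_bool(value):
    """Map yes/no answers to True/False; anything else becomes None."""
    return {"yes": True, "no": False}.get(value.strip().lower())

def clean_rows(raw_text):
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = (
            to_int(row["student_id"]),
            clean_text(row["program"]),
            to_int(row["age"]),
            to_bool(row["completed"]),
        )
        if record in seen:  # drop exact duplicates after cleaning
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

rows = clean_rows(RAW)
for record in rows:
    print(record)
```

Setting a bad value such as "twenty" to missing, rather than silently dropping the whole row, keeps the rest of that student's answers available for analysis.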

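Item 13 recommends being comfortable with SQL for getting data out of systems. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for an institutional system. The `students` and `survey` tables and their columns are hypothetical. The query joins the two tables and returns one analysis-ready summary per program.

```python
import sqlite3

# In-memory SQLite database standing in for an institutional system.
# Table and column names are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        program    TEXT NOT NULL,
        completed  INTEGER NOT NULL   -- 1 = completed, 0 = did not
    );
    CREATE TABLE survey (
        student_id INTEGER REFERENCES students(student_id),
        sentiment  REAL               -- score from a text-analysis step
    );
""")
cur.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "physics", 1), (2, "physics", 0), (3, "physics", 1),
    (4, "statistics", 1), (5, "statistics", 1),
])
cur.executemany("INSERT INTO survey VALUES (?, ?)", [
    (1, 0.6), (2, -0.4), (3, 0.2), (4, 0.8),  # student 5 did not respond
])

# Completion rate and mean sentiment by program.
# LEFT JOIN keeps students who skipped the survey; AVG ignores their NULLs.
query = """
    SELECT s.program,
           COUNT(*)         AS n_students,
           AVG(s.completed) AS completion_rate,
           AVG(v.sentiment) AS mean_sentiment
    FROM students AS s
    LEFT JOIN survey AS v ON v.student_id = s.student_id
    GROUP BY s.program
    ORDER BY s.program
"""
results = cur.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

The choice of `LEFT JOIN` over an inner join matters here. An inner join would silently drop student 5 and overstate how representative the survey respondents are.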
