Deep dive into data analysis tools, theory and projects

Category: Tableau

Top 10 Most Valuable Data Science Skills in 2020

The first month of the new decade is almost at an end. It’s also “job-hunting” time when students start looking for internships and employees think about switching roles and companies, in search of better salaries and opportunities. If you fall into one of these categories, then here are the Top 10 skills your resume absolutely needs to include, to get noticed by employers and land your dream job.

Data Science Skills for 2020

Methodology:

I looked at 200 job descriptions for jobs posted on LinkedIn in 7 major US/Canada cities – San Francisco, Seattle, Chicago, New York, Philadelphia, Atlanta, Toronto. Let’s face it – LinkedIn is the go-to platform for job seekers and recruiters. So looking at any other site seems a waste of time.

The job listings included many of the top global brands in tech (Microsoft, Amazon, etc.), product (Airbnb, Uber, Visa), consulting (Deloitte, Accenture), banks (JP Morgan, Capital One) and so on. I only considered jobs with the title “Data Scientist” or “Data Analyst”, with 150+ in the former. It took a while, but doing this manually also allowed me to exclude repetitive postings, since some companies post the same role for multiple locations.

Ultimately, this allowed me to quickly identify patterns and repeated skills, which I am presenting in this blogpost.

I’ve categorized the skills into 2 parts: Core and Advanced. Core skills are the absolute minimum you should have; recruiters and automated job application systems will simply disqualify you without them. Advanced skills are those “preferred” competencies that make you look more valuable as a candidate, so make sure to highlight them with examples on your resume. If you are trying to transition to a career in Data Science, I would highly recommend learning the core skills first, and then jumping into the others. Needless to say, everyone working in (or entering) this field needs a portfolio of projects.

Disclaimer – having all 10 skills does NOT guarantee a job, but it vastly improves your chances. You’ll still need to do some legwork to get considered, and my book “Data Science Jobs” can help you shorten this process. The book is also on SALE for $0.99 this weekend, Jan 25th to Jan 28th, at a 92% discount.

Core Skills:

Minimum qualifications for Data Scientist roles

[1] Programming (R/Python): This is a no-brainer – you need to be an expert in either R or Python. Some jobs will list SAS or other niche languages, but R or Python was a constant and mandatory requirement in 100% of the jobs I parsed.

I am not going to argue the merits of one over the other in this post, but I will emphasize that R is still very much an in-demand skill. Plus, for most entry-level roles, a candidate who knows only Python is not going to be favored over (or rejected in preference to) someone who knows only R. In fact, at my current and previous 2 roles, R was the language of choice. If you’d like to know my true views on the R vs Python debate, read this post.

[2] SQL: Most colleges and bootcamps do not teach this, but it is inordinately valuable. You cannot find insights without data, and 99% of companies predominantly use SQL databases of some kind. Fancy stuff like MongoDB, NoSQL or Hadoop are excellent keywords to add to your bio, but SQL is the baseline. You don’t need stored procedures or admin-level expertise, but please learn the basics of SQL for pulling in data with filters and optimizing table joins. SQL is mandatory to thrive as a data scientist.
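To make that concrete, here is a minimal sketch of the filter/join/aggregate pattern using Python's built-in sqlite3 module. The customers/orders schema and all the values are made up purely for illustration:

```python
import sqlite3

# In-memory database with a tiny hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'Philadelphia'), (2, 'Raj', 'Seattle');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 250.0);
""")

# The bread-and-butter pattern: filter rows, join tables, aggregate.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.amount > 50
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Raj', 250.0), ('Ana', 200.0)]
```

If you can write queries like this comfortably (and know roughly why joining on an indexed key is cheap), you have the baseline most postings ask for.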

[3] Basic Math & Stats: By this I mean basic high-school material, like calculating confidence intervals and profit-loss calculations. If you cannot distinguish between mean and median, then no self-respecting manager will trust your numbers or believe your insights have excluded those pesky outliers. Profit and incremental-benefit (in $) formulas are useful to know too, so brush up on your business math.
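A quick illustration of both points, using only Python's standard statistics module. The revenue numbers are invented, and 1.96 is the usual normal-approximation multiplier for a 95% confidence interval:

```python
import statistics

# Daily revenue with one big outlier (made-up numbers).
revenue = [100, 102, 98, 101, 99, 5000]

mean = statistics.mean(revenue)      # dragged way up by the outlier
median = statistics.median(revenue)  # robust to the outlier
print(mean, median)  # ~916.7 vs 100.5 – very different stories!

# 95% confidence interval for the mean of the clean sample
# (normal approximation: mean ± 1.96 * standard error).
clean = revenue[:-1]
m = statistics.mean(clean)
se = statistics.stdev(clean) / len(clean) ** 0.5
ci = (m - 1.96 * se, m + 1.96 * se)
```

If your "average" is 916 but the median is 100.5, reporting the mean without comment is exactly the mistake that loses a manager's trust.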

[4] Machine Learning Algorithms: Knowing how to code algorithms is expected, but so is knowing the logic behind them. If you cannot explain an algorithm in plain English, you really don’t know what you are talking about!
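As an example of what a plain-English explanation looks like: logistic regression is just a weighted sum of features squashed through the sigmoid into a probability between 0 and 1. A tiny sketch (the feature values and weights here are made up):

```python
import math

def predict_proba(features, weights, bias):
    # Weighted sum of the inputs...
    z = sum(f * w for f, w in zip(features, weights)) + bias
    # ...squashed through the sigmoid into a probability.
    return 1 / (1 + math.exp(-z))

# z = 2.0*0.5 + (-1.0)*1.0 + 0.0 = 0, and sigmoid(0) = 0.5
p = predict_proba([2.0, -1.0], [0.5, 1.0], 0.0)
```

If you can walk an interviewer through those two lines of logic, you understand the model; calling `fit()` on a library alone does not get you there.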

[5] Data Visualization: Tableau is the preferred technology, although I’ve seen people find success with Excel charts (Excel will never die!) and R libraries, too. However, I definitely see Tableau dominating everything else in the coming years.

Advanced Skills:

Advanced Data Science Skills that make you indispensable!

[6] Communication skills: A picture is worth a thousand words, and being able to present data in meaningful, concise ways is crucial. Too many newbies get lost in the analysis itself, or hyper-focused on their beautiful code. Most managers want recommendations and insights that they can apply in practice! So being able to think like a “consultant” is crucial whether you are entry-level or the lead data scientist.

Good presentation skills (written and verbal) are important, especially for dashboards and visualization reports – and I don’t mean color palettes or chart types. Instead, make sure your dashboards are not “data-vomit”, a very practical (and apt!) term coined by Avinash Kaushik. If users cannot make head or tail of the dashboard without your handholding, or if the most important takeaway is not obvious within 5 seconds, then you’ve done a poor job.

[7] Cloud services: Most companies have moved databases to AWS/Azure, and many are implementing production models in the cloud. So learn the basics of Docker, containers, and deploying your models and code to the cloud. This is still a niche skill, so having it will definitely help you stand out as most companies make the move towards automation.

[8] Software engineering: You don’t need to become a software engineer, but knowing basic architecture and data-flow questions will help you troubleshoot better and write code that moves to production more easily. Some questions to start with – what is the data about, and where (all) is it coming from? Learn about scheduler jobs and report automation; these have helped me automate the most boring repetitive tasks and look like a superstar to my managers! The infrastructure teams do extremely valuable work (keeping things running smoothly), so learn their “rules” and expectations, and make sure your code conforms. I always do, and my requests are treated much better!
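Report automation usually boils down to a small script plus a scheduler. A minimal sketch, assuming a made-up list of orders and an output filename; in practice you would point it at your real query results and schedule the script with cron or Windows Task Scheduler instead of running it by hand:

```python
import csv
from datetime import date

def write_daily_summary(rows, out_path):
    # Aggregate the day's records and write a one-line summary report.
    total = sum(r["amount"] for r in rows)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["report_date", "orders", "revenue"])
        writer.writerow([date.today().isoformat(), len(rows), total])

# Hypothetical data standing in for a database pull.
rows = [{"amount": 120.0}, {"amount": 80.0}]
write_daily_summary(rows, "daily_summary.csv")
```

Once a job like this runs unattended every morning, a task that used to eat 30 minutes of copy-pasting costs you nothing.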

[9] Automated ML: This is slowly getting popular, as companies try to cut costs and improve efficiency with automation. H2O.ai and DataRobot are just 2 names off the top of my head, but there are many more vendors in the market. If possible, learn how to work with these tools, as they can reduce your analysis time and speed up production deployment. They won’t replace good data scientists, but they do magnify the disparity between someone mindlessly copy-pasting code and a truly efficient data scientist. So make sure your “core” skills are impeccable.

[10] Domain expertise: Nothing beats experience, but even if you are new to the company (or field), learn as much as you can from senior colleagues and partner teams. Find out the “who/why/how” questions – who is using the analysis results, and why do they truly want it? How will it be applied? How does it save the company money or increase profits? How can I do it faster while maintaining accuracy, and also adding to the bottom line? What metric does the end user (or my manager) really care about?

As machine learning software adds more automation and features, this blend of technology and domain expertise will ensure you are never a casualty of layoffs or cost-cutting! I’ve put this at the end, but really you should be thinking about it from DAY ONE!

For example, my current role involves models for credit card fraud prediction. However, once I learned the end-to-end card customer lifecycle (incoming application, review, collections, payments, etc.), my models became much better. Plus, I have a deeper understanding of fair banking and privacy laws, which can prevent many demographic variables from being used in models. Similarly, a friend working in the petrochemical industry realized that his boss cared more about preventing false negatives (overlooking, or NOT maintaining, end-of-life or faulty sensors that can potentially cause leaks or explosions) than false positives (unnecessary maintenance for good sensors), even though both models can give you similar accuracy.
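The sensor example can be made concrete with a cost-weighted comparison: two hypothetical models with the same total error count (hence similar accuracy) but very different business costs. Every number below is invented for illustration:

```python
# Hypothetical business costs: a missed faulty sensor (false negative)
# risks a leak; a needless inspection (false positive) is just a truck roll.
COST_FN = 10_000
COST_FP = 100

def total_cost(fp, fn):
    """Dollar cost of a model's errors, weighted by business impact."""
    return fp * COST_FP + fn * COST_FN

model_a = {"fp": 5, "fn": 20}   # 25 errors, mostly missed faults
model_b = {"fp": 20, "fn": 5}   # 25 errors, mostly extra inspections

cost_a = total_cost(**model_a)  # 5*100 + 20*10000 = 200,500
cost_b = total_cost(**model_b)  # 20*100 + 5*10000 = 52,000
```

Same error count, nearly a 4x difference in cost – which is exactly why knowing which mistake the business fears more matters as much as the model itself.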

So build these skills, and see your career and salary potential sky-rocket in 2020!

PHLAI – Philly Artificial Intelligence Conference Summary

I had the opportunity to attend the third annual PHLAI 2019 conference on Artificial Intelligence and Machine Learning. Organized by Comcast, this was an amazing full-day event (on Aug 15th) with a theme of using AI to improve customer experience.

AI for customer service

For those who are not in the Greater Philadelphia area, I have to point out that most AI/Machine learning or even tech conferences typically happen either on the West Coast or in NYC. So the cost of travel/lodging can quickly add up if you want to attend as an individual.

It has always been a point of contention for me that there aren’t any interesting conferences held in the Philly area. After all, we are home to a large number of startups (RJMetrics, Blackfynn, Stitch, Seer Interactive), Fortune 500 employers like Nasdaq, Comcast, Vanguard, and most of the big banks (JPMorgan, Barclays, etc.).

So here is a big shoutout to Comcast (esp. Ami Ehlenberger) for making the magic happen in Philly!

PHLAI 2019 Conference tracks and schedule

Coming back to the conference, here is a summary of my takeaways of the event:
1) Keynotes – morning & afternoon
2) Speaker & Workshop Sessions – DataRobot, H2O, Humane AI, Tableau and more
3) Networking
4) Key Takeaways
5) Overall

1. Keynote

There were 2 keynotes – morning and afternoon. Both were brilliant, engaging, and filled with useful information.

Morning Keynote

This was given by Nij Chawla of Cognitive Scale. Snippets from his talk that resonated most with me:

  • Practical tips on how to use AI to create individual customization, especially to improve conversions on business websites.
  • For getting the most ROI on AI investments, begin with smaller AI projects and deploy/iterate. Most projects fail because they aim at a massive overhaul that takes so long to move to production that half the assumptions and data become invalid. Divide and conquer.
  • Don’t use AI just for the sake of it.
  • Real-time data is crucial. A 360-degree view built by aggregating multiple sources will drive maximum effectiveness, especially in banking and finance.
Morning Keynote – Cognitive Scale

He also presented some interesting examples of AI projects that are already in use today in various domains:

  • Early screening for autism & Parkinson’s using the front camera of a phone. Apparently the typing/tapping patterns on the phone are different for such patients, and when combined with facial recognition technology, the false positives have been very low.
  • Drones in insurance. One company used drones to take pictures to assess damage after Hurricane Harvey and proactively filed claims on behalf of their customers. This meant that the paperwork and claim money were released quickly for affected customers. Home repair contractors are in heavy demand after such natural disasters, but the customers from this company had no trouble getting help because everything was already set up! What better example of extraordinary customer service than helping your customers rebuild their lives and homes? WOW!
  • Drone taxi in Dubai. Basically a driverless helicopter. Scary yet fascinating!
  • VR glasses for shopping. Allows customers to interact with different products and choose color/ design/ style. Ultimate customization without the hassles of carrying gigantic amounts of inventory.


Afternoon Keynote

This was presented by Piers Lingle & David Monnerat of Comcast. They spoke about the AI projects implemented at Comcast, including projects that proactively present options to customers to improve customer experience and reduce call center volumes and complaints. I loved the idea of using automation to carry on a 2-way dialog across multiple channels of communication. They also spoke about chatbots and strategies that help chatbots learn at scale.

Using AI to augment customer service & wow them!

Some other salient points:

  • Automation is a part of the experience, not an add-on or an extra bother
  • Voice remote for Comcast TV users allows customers to fast-forward through commercial breaks. Cool! 🙂
  • Think beyond NLP. AI works better in troubleshooting tasks when customers are struggling with describing problems, for example troubleshooting internet connectivity issues.
  • Measure from the beginning. Measure the right thing. Easy to say, hard to do.
  • Done correctly, AI does seem like magic. However, it takes a lot of work, science and complexity, so appreciate it! 🙂 Above all, trust the tech experts hired for this role instead of pigeonholing them with hyper-narrow constraints but high expectations.


2. Speaker & Workshop sessions

The afternoon had 3 different tracks of speaker sessions with 3 talks each – Product, Customer Service & Real-World Experience. Attendees were allowed to attend talks from all 3 tracks, which is what I did. The speaker lineup was so spectacular that I noticed a lot of folks spoilt for choice, scratching their heads after every talk about where to go next! 🙂

I am only writing about the ones I attended. However, I heard positive feedback about all the other sessions too! (No offense to the folks who presented but are not mentioned here!)

Workshop by DataRobot

This is an interesting software product that helps you explore data, automate data cleansing and, most importantly, test by running multiple models in parallel – so you could test hundreds of models and algorithms. The workshop was presented by the VP of Data Science, Gourab De, and was extremely appealing.

It also has features for tweaking parameters, plus excellent documentation, so personally I think it would greatly augment the work of an experienced data scientist. Given the shortage of talent, tools like these would allow smaller teams to become faster and more efficient.

The availability of logs is another fabulous feature for banks (since I work in one) due to the stringent regulatory requirements around compliance, ethics and model governance.

Chaos Engineering & Machine Learning

Presented by Sudhir Borra of Comcast itself. The talk was mainly about the concept of chaos engineering, which is used to test the resiliency of systems. This is quite crucial for serverless AI and automated CI/CD pipelines, where models are tested and deployed in the cloud. The talk was quite technical yet practical for teams and companies looking to implement robust AI systems within their organizations.

Chaos Engineering – initially developed at Netflix

Data Stories Tableau Session

Presented by Vidya Setlur of Tableau, who spoke about some interesting Tableau features. For example, did you know Tableau assigns semantically sensible color palettes where it makes sense, rather than default colors? In a dataviz on vegetables, broccoli is assigned green, whereas carrots would be assigned orange. She also spoke about the “Stroop” effect, which I urge every data scientist or Tableau developer to look up and understand.

There were also some interesting questions about expanding the natural language queries to be activated via voice. The answer was that this is currently not available, because the folks who would truly find it useful are senior executives who would likely use it only on a phone (not a desktop), which would also require touchscreen capabilities for the drill-down features.

This was a perspective I had not thought about, so I was quite impressed at how far the company has thought about features users might want far into the future. Like Apple, Tableau’s elegant design, ease of use and intuitive features make it a class apart in terms of functionality and customer experience.

The Humane Machine: Building AI Systems with Emotional Intelligence

Extremely engaging talk by Arpit Mathur of Comcast on some interesting work that Comcast is doing.

He also spoke of the LIWC (Linguistic Inquiry and Word Count) library and how to apply it to natural language data to gauge sentiment, similar to Plutchik’s wheel of emotions – and how to use these for creating more “sentient” chatbots and applications, since context matters as much as words when we (humans) communicate. For example, some cultures use words literally, whereas others might use them to mean the exact opposite!

He also discussed ethics in AI, which naturally was a “hot” topic and sparked multiple questions and discussions.

Hands-on Labs – H2O – Automated AI/ML

I’d heard many rave reviews about this product, so it was wonderful to attend a hands-on workshop covering how to use the tool, its available features, and the effectiveness of the models it can generate. Personally, I found it to have just the right mix of automated model generation (great for newbies) and model customization (for advanced users).


3. Networking

The conference included two rounds of networking: first over a free breakfast, and second at a formal networking reception at 5:15 pm. Both were great opportunities to meet other attendees and speakers.

I was not attending with any aim of job hunting, but I did see a list of open roles at Comcast. Plus, I saw folks talking about roles in their orgs and describing their work, so the event was certainly an excellent venue for informational interviews and reaching out for potential collaboration.

Not to mention the new contacts will be useful in the future, should I choose to look.

I met a wide variety of folks – from students to software developers to product leads, and even senior engineering leaders who wanted to learn best practices on how to get ROI on future AI investments. There were folks who were just interested in learning more about AI, some looking for their next big idea, and everyone in between. It was not just tech folks either – attendees came from domains like adtech, marketing, automobiles, healthcare and, of course, banking.

Amazing to see the turnout and community. Overall, I think job seekers should attend such events!

Oh, and I have to mention the gorgeous Comcast building – amazing working space and the central atrium was both futuristic and functional. I definitely regretted not taking the Comcast tour! (In fairness, the workshop I attended was fabulous too! )

Of course there was also time to chitchat between sessions and I am thankful to everyone who interacted with me! Thanks for your sparkling conversations.

Networking opportunities galore


4. Key Takeaways

  • Focus on the customer and the end goal. Technology is only a tool to enable better outcomes and efficiencies. AI tools have matured, so make full use of them.
  • For the best return and impact from AI projects, quick action is key. Choose a specific use case, choose a metric to evaluate success, and then proceed. If the model works, BRAVO! If it does not, you get valuable feedback on your initial assumptions, so you can make better decisions in the future.
  • Clean, reliable data is paramount. Garbage in, garbage out!
  • Collaboration is important. You do need to get the blessings from multiple stakeholders like IT, business, product, customer service, etc. But do not let collaboration equate to decision paralysis.
  • Done right, AI can be magical and live up to the hype. The critical component is getting started.
  • Data and user preferences change quickly, so think about automating model deployments and/or implementing CI/CD using technologies like Kubernetes or the myriad other tools available on the market, to get to production quicker.
  • There is a huge shortage of experienced machine learning and artificial intelligence developers, so think about “upskilling” existing employees who already know the business and customer pain points.


5. Overall

Overall, I found the event incredibly useful and I plan to attend again next year. I am also looking forward to using the takeaways in my own role and evangelizing AI products within my org.


Sberbank Machine Learning Series – Post 1 – Project Introduction

For this month’s tutorials, we are going to work on the Kaggle Sberbank housing dataset, to forecast house prices in Russia. This is a unique dataset from Sberbank, an old and eminent Russian institution, in that it provides macroeconomic information along with the training and test data. The macro data includes variables like average salary, GDP, average mortgage rates, and the strength of the Russian ruble versus the Euro/Dollar, by month and year. This allows us to incorporate relevant political and economic factors that may create volatility in housing prices.

You can view more detailed information about the dataset, and download the files from the Kaggle website link here.

House price predictions


We are going to use this dataset in a series of posts to perform the following:

  1. Mindmaps for both data exploration and the solution framework. This dataset has 291 variables in the training set and 100 variables in the macro set, so we are going to use both Tableau and R for exploring the data.
  2. Initial Hypothesis testing to check for variable interactions, and help create meaningful derived variables.
  3. Baseline prediction models using 5 different machine learning algorithms.
  4. Internal and external validation: internal validation by comparing models on sensitivity, accuracy and specificity; external validation by comparing scores on the Kaggle leaderboard.
  5. Ensemble (hybrid) models using combination of the baseline models.
  6. Final model upload to Kaggle.
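As a preview of step 1, the first bit of data preparation is typically joining the property-level training rows to the monthly macro indicators on the date column, so the economic factors become model features. A sketch in Python with tiny made-up frames standing in for train.csv and macro.csv (the real files share a date column named "timestamp" per the Kaggle data page; the values below are invented):

```python
import pandas as pd

# Miniature stand-ins for the Kaggle files; real data has far more columns.
train = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-01-10", "2014-02-15"]),
    "full_sq": [45, 60],               # property size
    "price_doc": [5_000_000, 7_200_000],  # target: sale price
})
macro = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-01-10", "2014-02-15"]),
    "gdp_growth": [0.7, 0.5],          # example macro indicator
})

# Left-join so every training row picks up the macro features for its date.
df = train.merge(macro, on="timestamp", how="left")
```

A left join keeps every training row even if a macro value is missing for some date, which is usually what you want before modeling.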

 

Until next time, happy coding!
