Journey of Analytics

Deep dive into data analysis tools, theory and projects

Author: JOURNEYOFANALYTICS

How to Become a Data Scientist in 2020

Despite the spike in interest in Data Science and Machine Learning roles and courses, it is still possible to become a fully functional data scientist with minimal resources.

Some caveats: (1) be committed to investing hours of effort in building your expertise. (2) The job market has become quite competitive, so be mentally prepared to work strategically and accept that finding a job will require sweat equity.

Note: the title of this post says “Data Scientist”, but the steps below apply even if your aim is to become a data analyst, data engineer, analytics consultant or machine learning engineer.

Data Science job types

Steps to Data Science Expertise

At its core, becoming a data scientist will require three steps (in sequence):

  1. Learn the skills
  2. Build your portfolio
  3. Apply to jobs strategically.

Step 1 – Learn the skills.

The skills below are mandatory.

  • Programming in R or Python.
  • Programming in SQL. Most courses never talk about SQL, but it is critical.
  • Machine learning algorithms. Know the code, and also which algorithm fits which use case.
  • Search Google and you will find free courses and books on all the above topics, or go for a low-cost option from Udemy. Essentially, you can learn these skills for under $100, even now in 2020.

Step 2 – Build your portfolio.

  • You can add 100 certifications, but you also need to showcase your learning through projects. Use GitHub to host your projects, or create a free WordPress website. If you have the budget, explore low-cost website hosting from Wix or Squarespace.
  • The project should be unique to you. Pick any free public dataset, apply your own perspective to slice and dice the data, and extract insights. This is what will set you apart from the 10,000 other candidates who completed the same free bootcamp or Coursera class. A sample list of project ideas, organized by beginner and advanced skill levels, is available here.
  • Free tutorials are available on many sites, including my own, Journey of Analytics.

Step 3 – Apply to jobs strategically.

  • The job market is heating up as people enter this field by the thousands. Getting job leads is hard; getting to the interview stage is even harder.
  • Make sure your profile on LinkedIn is “all-star”, with at least 500 connections.
  • You can significantly improve your odds by leveraging niche job sites and hunting through LinkedIn content tabs and Twitter. Both are highly manual, which is why they work! No one else wants to pursue those methods! 🙂 A detailed how-to guide, a full list of niche job boards and interview question sets are all available in my job search book, which I update every quarter. These strategies work, hence the blatant plug!
  • Be prepared to face a lot of rejections, especially when landing your first job. In the beginning, don’t be afraid to accept a low-paying job or an internship. It is easier to get a job when you are already employed!
  • Initially, you may be hired as a “data analyst” – accept! A lot of companies use the terms analyst and scientist interchangeably, or reserve the “data scientist” title for more experienced hires.
  • Note that there are job types in the data science domain apart from “data scientist”, so check whether you can leverage your previous experience for other role types.

Book Offer:

I realize a lot of students are graduating soon, and the global pandemic is making it hard to find jobs. Some employers are already reneging on confirmed offers, which increases the pressure on students. Hence, I’ve reduced my ebook price to $0.99 for the month of May 2020.

Note: the book will NOT be marked free, to deter folks who just download books and guides but do not intend to put in any effort!

All the best for an exciting new career!

I previously answered this question on Quora, under the question “Is it possible to become a data scientist in 2020 with only a few resources?”

50+ Free Datasets for Data Science Projects


Hello All,

This is just a short note to say that the list of FREE datasets has been updated for 2020. It includes 50+ sites, plus links to the newly released Google Dataset Search engine. So have fun exploring these data repositories to master programming, create stunning visualizations and build your own unique project portfolio.

Some starter projects using these data files, written in R, are available on the Projects page.

Happy coding!

KubeCon – Preparation Checklist for Attendees

Just 5 days left until KubeCon + CloudNativeCon North America! 🙂 I am quite excited to finally attend this awesome conference and to get the chance to visit sunny San Diego! 🙂 Whether you are also a first-time attendee or just looking to get your money’s worth from the conference, here is a list of to-dos to make the most of the experience.

If you have not heard of KubeCon, it is a conference focused on Kubernetes and related container technologies, which are a way to get software applications running on cloud services. This is an entire ecosystem, and over the next few years it will change software infrastructure concepts for all companies. Myriad companies, including Uber, Google, Shopify and JPMorgan, are already on board and deploying with these new methods.

These technologies are also a huge part of how machine learning models and AI applications are implemented successfully and at scale, which is why I (as owner of this data science blog) got interested in Kubernetes. If you’ve run machine learning models using cloud services, you might have used some of these tools without ever being aware of it.

This is (obviously) my first time attending this conference and visiting the city, so I had tons of Qs and thoughts. The amazing list of speakers and conference tracks also makes it hard to choose which sessions to attend. Thankfully, I was able to get some excellent advice from past attendees and the dedicated Slack channels for the conference.

Since the countdown clock has started, I’ve summarized the tips for others, so you can make the most of this experience.

1. Get on Slack

  • I am so thankful to Wendi West, Paris Pittman and the other moderators and organizers in the Slack channels for patiently answering questions, sharing hotel recommendations, sending event reminders and building some great vibes for the conference!
  • I found a lot of useful information in the channel for Diversity Scholarship recipients, followed by the events channel. If you still have last-minute Qs, post on these channels or DM the organizers.
  • The Slack channels are great for connecting with folks before the event, so you have some familiar faces to meet at the conference.
  • If you have not checked out the Slack channels yet, look them up via cloud-native.slack.com.

2. Which Sessions to Attend

  • By now, everyone should have created an itinerary. If not, please use the “sched” app with the following workspace URL: https://kccncna19.sched.com/
  • Note that you should have one broad goal for the conference: gathering info for a certification attempt, networking for a job, discussing case studies so you can apply the concepts at your job, or something else. This will let you make better selections without feeling overwhelmed by the sheer number of (fantastic) choices!
  • For me, the main goal is to understand how Kubernetes is deployed for machine learning & AI projects. Since I work for a bank, security issues and migration of legacy/mainframe software into cloud services are also relevant topics, as are case studies. Having this theme allowed me to quickly create a meaningful list of sessions to attend. I also hope to network with 25+ new people, a goal that should be quite easy at a conference with 8000+ attendees spanning 5 days.
  • PRO TIP: A couple of past attendees advised against over-scheduling and suggested checking room locations. Although the conference is mainly being held at the San Diego Convention Center, some sessions are at the Marriott and other hotels. So look them up and make sure you have enough time to walk between venues.
    • Plus, if a session is really interesting, you might want to hang back to chat with the speaker or ask follow-up questions. This might cause you to miss the next session, so design your schedule carefully.

3. Networking

  • Being part of a separate Diversity Scholarship Slack channel means that I’ve already connected with 5-10 other recipients. After all these online discussions, it will be great to meet these talented and ambitious folks in person!
  • Most past attendees have emphatically stated that folks attending this conference are very generous with their time, so please make the most of their expertise and knowledge.
  • Don’t be shy! Speak up.
  • Speakers are amazing, and human too! So feel free to say hello after the session and ask follow-up Qs, or just thank them for an interesting discussion.
  • For those who are extremely nervous about networking, here is a unique tip that someone told me years ago: pick a color and talk to at least 5 people wearing clothes in that color. This might seem crazy, but it is a very practical way of overcoming self-bias and prejudices, and of talking to people we would not normally approach (because we feel shy, out of place or for other reasons). I’ve used it at other events and conferences and made some fabulous connections!
  • Use the LinkedIn app and connect immediately. If you meet someone interesting, send them an invite during the conversation itself. No one ever says no, and if you wait, you will forget to follow up, either because you forgot their name, used the wrong spelling or misplaced their business card. Plus, at such large conferences it is terribly hard to keep track of all the people you meet. I used this at the Philly AI conference very successfully, and can’t wait to connect with folks at KubeCon too!

During the conference:

  • Keep a notepad handy for broad keywords and ideas that are directly applicable to your role (and conference goal).
  • Note the session date, time and speaker name. This will help you look it up later, especially as I’ve heard many speakers post their slides and videos after the conference.
  • Tweet! Use the tags #KubeCon #CloudNativeCon and #DiversityScholarship.
  • Connect with people on LinkedIn (reiterating from above).
  • Check out the sponsored coffee/breakfast sessions and after-hours meetups; I’ve heard they are amazing, as are the “lightning talks” after 5 pm.
  • Visit the sponsor expo and booths. Apart from the cool swag (tees, pens, stickers, etc.), you will get to see some interesting demos and hobnob with folks from companies large and small: basically everyone from startups to large enterprises like Microsoft and Palo Alto Networks, and everything in between. It is a great way to learn what’s happening in this space, and you might even get your own unicorn startup idea! 🙂

Post Conference

  • Reconnect with the folks you’ve met on LinkedIn.
  • Write a blog post summarizing what you learned and your takeaways from the conference. Everyone has a unique perspective, so don’t feel as if everything has already been said! Ideally, do this within a week, while the ideas are still fresh. Remember to use the hashtags.
  • Add pictures from the event on LinkedIn. Make sure to tag your new friends too!
  • If possible, present a brown bag or session to your team (or group) at the office. This is a great way to share information with others who could not attend, improve your public speaking skills and score some brownie points for your next employee appraisal! Win-win all around.
  • Use what you’ve learnt. Even if it is just a little portion!
  • Plan ahead to attend next year’s conference!

That’s it from me, see you all at the conference!

November Thanksgiving – Data Science Style!

Hello All,

November is the month of Thanksgiving, vacations and, of course, deals galore! To say thanks to my loyal readers, here are some deals for data science professionals and students that you should definitely not miss.

Book deals:

  1. If you are exploring Data Science careers or preparing for interviews for a winter graduation, take a look at my ebook “Data Science Jobs”. It is currently part of a Kindle Countdown Deal and priced at 50% off its normal price. It is only $2.99 right now, and the price will keep increasing until Friday morning, when it goes back to full price.
  2. Want a FREE book on Statistics, as related to R-programming and machine learning algorithms? I am currently looking to give away FREE advance reviewer copies (ARCs). You can look at the book contents here, and if it seems interesting, please sign up here to be a reviewer.
  3. If you are deploying machine learning models on the cloud, chances are you work with Kubernetes or have at least heard of it. If you haven’t, and you are an aspiring data scientist/engineer, then you should definitely learn about tho

Nov projects:

  1. The R-programming project for November is a sentiment analysis of song lyrics by different artists. It involves lots of data wrangling to aggregate the lyrics and compare the words favored by 2 different artists. The code repository has been added to the Projects page here. I’ve written the main code in R and used Tableau to generate some of the visuals, but this can easily be tweaked into an awesome Shiny dashboard for a data science portfolio.
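For readers who want to see the core idea before opening the repository, here is a minimal sketch of lexicon-based sentiment scoring. It is written in Python rather than R and is not the code from the Projects page. The tiny LEXICON and the sample lyrics are made up for illustration. Each artist's lyrics are aggregated, every word found in the lexicon adds +1 or -1 to the artist's total, and the totals and most frequent words are then compared.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> +1 positive, -1 negative).
# A real analysis would use a published sentiment lexicon instead.
LEXICON = {"love": 1, "happy": 1, "shine": 1, "sad": -1, "cry": -1, "lonely": -1}


def tokenize(lyrics):
    """Lower-case the lyrics and split them into words."""
    return re.findall(r"[a-z']+", lyrics.lower())


def artist_sentiment(songs):
    """Aggregate all of an artist's songs, then sum the lexicon score of every word."""
    words = Counter()
    for lyrics in songs:
        words.update(tokenize(lyrics))
    score = sum(LEXICON.get(word, 0) * count for word, count in words.items())
    return score, words


# Made-up sample lyrics for two artists.
lyrics_by_artist = {
    "Artist A": ["I love the way you shine", "Happy days, happy nights"],
    "Artist B": ["I cry when I'm lonely", "Sad songs and love"],
}

for artist, songs in lyrics_by_artist.items():
    score, words = artist_sentiment(songs)
    print(artist, "score:", score, "top words:", words.most_common(3))
# Artist A scores 4 and Artist B scores -2 with this toy lexicon.
```

Summing word scores is the simplest possible approach. It ignores negation ("not happy") and context, which is one reason real projects rely on curated lexicons and more careful wrangling.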

Until next time, adieu for now!

Social Network Visualization with R

In this month’s post, we are going to look at data analysis and visualization of social networks using R programming.

Social Networks – Data Visualization

Friendster Networks Mapping

Friendster was a yesteryear social media network, something akin to Facebook. I never used it, but it is one of those easily available datasets where you have a list of users and all their connections. That makes it easy to create a viz and see whose networks are strong and whose are weak, or even spot the users who bridge multiple networks.

The dataset and code files have been added to the Projects Page here, under “social network viz”.

For this analysis, we will be using the following library packages:

  • visNetwork
  • geomnet
  • igraph

Steps:

  1. Load the data files. The list of users is in the file named “nodes”, as each user is a node in the graph. The connections are in the file named “edges”, one record per pair of connected users (a 1-to-1 mapping). So if user Miranda has 10 friends, there would be 10 records for Miranda in the “edges” file, one for each friend. The Friendster data has been anonymized, so users are identified by numeric IDs rather than names.
  2. Convert the data frames into the specific format the graph functions expect (for example, visNetwork() looks for an “id” column in the nodes data frame and “from” and “to” columns in the edges data frame). This prep work lets us pass the data directly to the graph visualization functions.
  3. Create a graph object. This also lets us detect clusters. Since the dataset is anonymized, this might seem irrelevant, but imagine it in your own social network: you might have one cluster of friends from school, another from your office, one made up of cousins and family members, and some random folks. Creating a graph object lets us find where those clusters lie automatically.
  4. Visualize using functions specific to graph objects. The first function is visNetwork(), which generates an interactive, color-coded cluster graph. When you click on any of the nodes (colored circles), it highlights all the connections radiating from that node. (In my screenshot for this step, I highlighted the node for user 17.)
  5. You can also call the same function with a different set of parameters; the full code is on the Projects Page.

In the image below, you can see the 3 colored clusters and the central (light blue) node. The nodes in blue do not have a lot of direct connections with each other. The yellow and red clusters are tighter, indicating that their members are connected to each other (similar to a bunch of classmates who all know each other).

network clusters
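To make steps 1 to 3 concrete, here is a minimal Python sketch. It is not the R code from the Projects Page. It turns hypothetical anonymized nodes and edges lists into an adjacency list, then groups users into connected components with a breadth-first search. Note the simplification: connected components only separate users who have no path to each other at all. Tight clusters inside one connected network, like the yellow and red groups described above, are usually found with a community-detection algorithm instead (igraph provides several, for example cluster_louvain()).

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Turn a nodes list and an edges list into an undirected adjacency list."""
    graph = {node: set() for node in nodes}
    for a, b in edges:
        # setdefault also covers IDs that appear in edges but not in nodes
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph):
    """Group users who can reach each other, using breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            user = queue.popleft()
            members.append(user)
            for friend in graph[user]:
                if friend not in seen:
                    seen.add(friend)
                    queue.append(friend)
        components.append(sorted(members))
    return components


# Hypothetical anonymized data: numeric user IDs, one row per friendship.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (5, 6)]
graph = build_adjacency(nodes, edges)
print(connected_components(graph))  # [[1, 2, 3], [4, 5, 6], [7]]
```

User 7 has no friends in this sample, so it forms a component of its own. Because the adjacency list stores friends in sets, a friendship listed from both sides is only stored once.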

That’s it. Again, the code is available on the Projects Page.

Code Extensions

Feel free to play around with the code. One extension of this idea would be to download Facebook or LinkedIn data (premium account needed) and create similar visualizations.

Or, if you have a list of airports and routes, you could build a similar flight network map to find the minimum number of hops between 2 destinations, as well as alternative routes.
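As a rough illustration of the flight-network idea, the sketch below runs a breadth-first search, which finds the fewest hops in a graph where every leg counts equally. It is in Python, and the airport codes and routes are hypothetical. It also treats every route as flyable in both directions, which real schedules may not allow.

```python
from collections import deque


def min_hops(routes, origin, destination):
    """Breadth-first search: fewest legs from origin to destination.

    Returns (hops, path), or None if the destination is unreachable.
    Routes are treated as flyable in both directions (an assumption).
    """
    graph = {}
    for a, b in routes:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    previous = {origin: None}  # remembers how each airport was reached
    queue = deque([origin])
    while queue:
        airport = queue.popleft()
        if airport == destination:
            path = []
            while airport is not None:
                path.append(airport)
                airport = previous[airport]
            return len(path) - 1, path[::-1]
        for nxt in sorted(graph.get(airport, ())):  # sorted for repeatable output
            if nxt not in previous:
                previous[nxt] = airport
                queue.append(nxt)
    return None


# Hypothetical route list (airport codes chosen for illustration).
routes = [("SAN", "LAX"), ("LAX", "JFK"), ("SAN", "DEN"),
          ("DEN", "ORD"), ("ORD", "JFK"), ("SAN", "PHL")]
print(min_hops(routes, "SAN", "JFK"))  # (2, ['SAN', 'LAX', 'JFK'])

# Alternative route: drop the LAX-JFK leg and search again.
without_leg = [r for r in routes if r != ("LAX", "JFK")]
print(min_hops(without_leg, "SAN", "JFK"))  # (3, ['SAN', 'DEN', 'ORD', 'JFK'])
```

Removing a leg and searching again is a crude way to surface alternatives. If legs carried distances or fares, a weighted shortest-path method such as Dijkstra's algorithm would be the usual choice instead of plain BFS.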

You could also count the number of friends for each node and increase the size of its circle accordingly. This would make it easier to see which nodes are the most well-connected.
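Counting friends amounts to counting each node's degree. Here is a minimal Python sketch using hypothetical data. It skips self-loops and repeated pairs, in case an edges file lists a friendship from both sides. It then linearly maps the counts onto a circle-size range whose bounds (10 to 40) are arbitrary assumptions. In visNetwork, the resulting numbers could be attached to the nodes data frame (for example as a “value” column) so that better-connected users are drawn as larger circles.

```python
from collections import Counter


def friend_counts(edges):
    """Count each user's friends (node degree) from an undirected edge list."""
    counts, seen = Counter(), set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:  # skip self-loops and friendships listed twice
            continue
        seen.add(pair)
        counts[a] += 1
        counts[b] += 1
    return counts


def circle_sizes(counts, min_size=10, max_size=40):
    """Linearly rescale friend counts into a circle-size range (arbitrary bounds)."""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {user: min_size + (n - low) * (max_size - min_size) / span
            for user, n in counts.items()}


# Hypothetical edges; (2, 1) repeats the (1, 2) friendship from the other side.
edges = [(1, 2), (2, 1), (1, 3), (1, 4), (2, 3), (4, 5)]
counts = friend_counts(edges)
print(counts.most_common(2))  # [(1, 3), (2, 2)]
print(circle_sizes(counts))   # {1: 40.0, 2: 25.0, 3: 25.0, 4: 25.0, 5: 10.0}
```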

Of course, do not be over-mesmerized by the data. In real life, the strength of a relationship also matters. This is hard to quantify or collect, even though it is easy to depict once you have the data in hand. For example, I have 1000 connections whom I’ve met at conferences or random events; if I needed a job, most of them might not be useful. But my friend Sarah has only 300 friends, all super-loyal, who literally found her a job in 2 days when she had to move back to her hometown to take care of a sick parent.

With that thought, do take a look at the code and have fun coding! 🙂
