Journey of Analytics

Deep dive into data analysis tools, theory and projects

KubeCon – Preparation Checklist for Attendees

Just 5 days left to KubeCon + CloudNativeCon North America! 🙂 I am quite excited to finally attend this awesome conference and get the chance to visit sunny San Diego! 🙂 Whether you are a first-time attendee as well, or just looking to get your money’s worth from the conference, here is a list of to-dos to make the most of this experience.

If you have not heard of KubeCon, it is a conference focused on Kubernetes and related container technologies, which help get software applications running on cloud services. This is an entire ecosystem, and in the next few years it will change software infrastructure concepts for all companies. Myriad companies, including Uber, Google, Shopify and JPMorgan, are already on board and deploying using these new methods.

These technologies are also a huge part of how machine learning models and AI applications are implemented successfully and at scale, which is why I (as owner of this datascience blog) got interested in Kubernetes. If you’ve run machine learning models using cloud services, you might have also used some of these tools without ever being aware of it.

This is (obviously) my first time attending this conference and visiting the city, so I had tons of Qs and thoughts. The amazing list of speakers and conference tracks also make it hard to choose which sessions to attend. Thankfully, I was able to get some excellent advice from the dedicated Slack channels for the conference and past attendees.

Since the countdown clock has started, I’ve summarized the tips for others, so you can make the most of this experience.

1. Get on Slack

  • I am so thankful to Wendi West, Paris Pittman and the other moderators in the Slack channels for patiently answering questions, sharing hotel recommendations, sending event reminders and building some great vibes for the conference!
  • I found a lot of useful information on the channel specific to the Diversity scholarship recipients, followed by the events channel. If you still have last minute Qs, then post on this channel or DM the organizers.
  • The Slack channels are great to connect with folks before the event, so you have some familiar faces to meet at the conference.
  • If you have not checked the Slack channel – look it up via cloud-native.slack.com

2. Which Sessions to Attend

  • By now everyone should have created an itinerary for themselves. If not, please use the “sched” app with the following workspace URL: https://kccncna19.sched.com/
  • Note that you should have one broad agenda for the conference – gathering info for a certification attempt, networking for a job, discussing case studies so you can apply concepts at your job, or something else. This will allow you to make better selections without feeling overwhelmed by the sheer number of (fantastic) choices!
  • For me, the main goal is to understand how Kubernetes is deployed for machine learning & AI projects. Since I work for a bank, security issues and migration of legacy/mainframe software to cloud services are also relevant topics, as are case studies. Having this theme allowed me to quickly decide and create a meaningful list of sessions to attend. I also hope to network with 25+ new people, a goal that should be quite easy at a conference with 8000+ attendees spanning 5 days.
  • PRO TIP: A couple of past attendees advised not to over-schedule and to look at room locations. Although the conference is mainly happening at the San Diego convention center, some sessions are being held at the Marriott and other hotels. So look them up and make sure you have enough time to walk between venues.
    • Plus, if a session is quite interesting, you might want to hang back and chat with the speaker or ask for additional clarification. This might cause you to miss the next session, so design your schedule carefully.

3. Networking

  • Being part of a distinct Diversity scholarship Slack channel means that I’ve already connected with 5-10 other recipients. After all these online discussions, it will be great to meet these talented and ambitious folks in person!
  • Most past attendees have emphatically stated that folks attending this conference are very generous with their time, so please make the most of their expertise and knowledge.
  • Don’t be shy! Speak up.
  • Speakers are amazing, and human too! So feel free to say hello after the session, and ask follow up Qs or just thank them for an interesting discussion.
  • For those who are extremely nervous about networking, here is a unique tip someone told me years ago: pick a color and talk to at least 5 people wearing clothes in that color. This might seem crazy, but it is a very practical way of overcoming self-bias and prejudices and talking to people we would not normally approach (out of shyness, feeling out of place or other reasons). I’ve used it at other events and conferences and made some fabulous connections!
  • Use the LinkedIn app and connect immediately. If you met someone interesting, send them an invite during the conversation itself. No one ever says no, and if you wait you will forget to follow up, either because you forgot their name, used the wrong spelling or misplaced their business card. Plus, at such large conferences it is terribly hard to keep track of all the people you meet. I used this at the Philly AI conference very successfully, and can’t wait to connect with folks at KubeCon too!

During the conference:

Conference attendees
  • Keep a notepad handy for broad keywords and ideas that are directly applicable to your role (and conference goal).
  • List the session date, speaker name and time. This will help you look it up later, especially as I’ve heard many speakers post their slides and videos after the conference.
  • Tweet! Use the tags #KubeCon #CloudNativeCon and #DiversityScholarship.
  • Connect with people on LinkedIn (reiterating from above)
  • Check out the sponsored coffee/breakfast sessions and after hour meetups; I’ve heard they are amazing, as are the “lightning talks” post 5 pm.
  • Attend the sponsor expo and booths. Apart from the cool swag (tees, pens, stickers, etc.) you will get to see some interesting demos and hobnob with folks from companies both large and small – everyone from startups to large enterprises like Microsoft and Palo Alto Networks, and everything in between. It’s a great way to learn what’s happening in this space – you might even get your own unicorn startup idea! 🙂

Post Conference

  • Reconnect with the folks you’ve met on LinkedIn.
  • Add a blog post summarizing what you’ve learnt and your takeaways from the conference. Everyone has a unique perspective, so don’t feel as if everything has already been said! Ideally do this within a week, while the ideas are still fresh. Remember to use the hashtags.
  • Add pictures from the event on LinkedIn. Make sure to tag your new friends too!
  • If possible, present a brown bag or session to your team (or group) at the office. This is a great way of disseminating information to others who could not attend, improving your public speaking skills and scoring some brownie points for your next employee appraisal! Win-win all around.
  • Use what you’ve learnt. Even if it is just a little portion!
  • Plan ahead to attend next year’s conference!

That’s it from me, see you all at the conference!

November Thanksgiving – Data Science Style!

Hello All,

November is the month of Thanksgiving, vacations and of course deals galore! As part of saying thanks to my loyal readers, here are some deals specific to data science professionals and students that you should definitely not miss out on.

Book deals:

  1. If you are exploring Data Science careers or preparing for interviews ahead of a winter graduation, then take a look at my ebook “Data Science Jobs”. It is currently part of a Kindle countdown deal, priced 50% off at only $2.99; the price will keep increasing until Friday morning, when it goes back to full price.
  2. Want a FREE book on statistics, as related to R programming and machine learning algorithms? I am currently looking to give away FREE advance reviewer copies (ARCs). You can look at the book contents here, and if it seems interesting then please sign up here to be a reviewer.
  3. If you are deploying machine learning models on the cloud, then chances are you work with Kubernetes or have at least heard of it. If you haven’t, and you are an aspiring data scientist/engineer, then you should make it a point to learn about it.

Nov projects:

  1. The R programming project for November is a sentiment analysis of song lyrics by different artists. There is lots of data wrangling involved to aggregate the lyrics and compare the words favored by two different artists. The code repository is added to the Projects page here. I’ve written the main code in R and used Tableau to generate some of the visuals, but this can easily be tweaked into an awesome Shiny dashboard to add to a data science portfolio. A minimal sketch of the sentiment step is shown below.
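If you want to try the sentiment step before opening the repo, here is a minimal sketch using the tidytext package. The file name lyrics.csv and its artist/lyrics columns are placeholders for whatever dataset you assemble, so adjust them to match your data; the full project code is on the Projects page.

```r
# Minimal sentiment-by-artist sketch with tidytext.
# "lyrics.csv" and its columns (artist, lyrics) are hypothetical placeholders.
library(dplyr)
library(tidyr)
library(tidytext)

lyrics_df <- read.csv("lyrics.csv", stringsAsFactors = FALSE)

sentiment_by_artist <- lyrics_df %>%
  unnest_tokens(word, lyrics) %>%                      # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%  # tag words positive/negative
  count(artist, sentiment) %>%                         # tally per artist
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net_sentiment = positive - negative)

head(sentiment_by_artist)
```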

Until next time, Adieu for now!

Social Network Visualization with R

In this month’s post, we are going to look at data analysis and visualization of social networks using R programming.

Social Networks – Data Visualization

Friendster Networks Mapping

Friendster was a yesteryear social media network, somewhat akin to Facebook. I’ve never used it, but it is one of those easily available datasets where you have a list of users and all their connections. So it is easy to create a viz and look at whose networks are strong and whose are weak, or even find the bridges between multiple networks.

The dataset and code files are added on the Projects Page here, under “social network viz”.

For this analysis, we will be using the following library packages:

  • visNetwork
  • geomnet
  • igraph

Steps:

  1. Load the datafiles. The list of users is given in the file named “nodes” as each user is a node in the graph. The connection list is given in the file named “edges” as a 1-to-1 mapping. So if user Miranda has 10 friends, there would be 10 records for Miranda in the “edges” file, one for each friend. The friendster datafile has been anonymized, so there are numbers (id) rather than names.
  2. Convert the dataframes into a very specific format. We do some prepwork so that we can directly use the graph visualization functions.
  3. Create a graph object. This will also help to create clusters. Since the dataset is anonymized it might seem irrelevant, but imagine this in your own social network. You might have one cluster of friends who are from your school, another bunch from your office, one set who are cousins and family members and some random folks. Creating a graph object allows us to look at where those clusters lie automatically.
  4. Visualize using functions specific to graph objects. The first function is visNetwork(), which generates an interactive, color-coded cluster graph. When you click on any of the nodes (colored circles), it will highlight all the connections radiating from that node. (In the image below, I have highlighted the node for user 17.)
  5. You can also use the same function with a bunch of different parameters, as shown below. A condensed code sketch of these steps follows.
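Here is a condensed sketch of steps 1-4, assuming the data files are named nodes.csv (with an id column) and edges.csv (with from/to columns); adjust the names to match the files on the Projects page.

```r
library(igraph)
library(visNetwork)

# Step 1: load the data files
nodes <- read.csv("nodes.csv")   # one row per user (column: id)
edges <- read.csv("edges.csv")   # one row per friendship (columns: from, to)

# Step 2: the id/from/to column names are the format the graph functions expect

# Step 3: create a graph object and detect clusters
g <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)
communities <- cluster_louvain(g)
nodes$group <- membership(communities)[as.character(nodes$id)]  # color by cluster

# Step 4: interactive plot; clicking a node highlights its connections
visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
```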

In the image below you can see the 3 colored clusters and the central (light blue) node. The connections in blue are the ones that do not have a lot of direct connections. The yellow and red clusters are tighter, indicating they have internal connections with each other (similar to a bunch of classmates who all know each other).

network clusters

That’s it. Again the code is available on the Projects Page.

Code Extensions

Feel free to play around with the code. One extension of this idea would be to download Facebook or LinkedIn data (premium account needed) and create similar visualizations.

Or if you have a list of airports and routes, you could create something like this as a flight network map, to know the minimum number of hops between 2 destinations and alternative routes.

You could also compute a counter to see which nodes have the most friends, and increase the size of those circles. This would make it easier to see which nodes are the most well-connected; a tiny sketch of that idea is below.
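Reusing the nodes, edges and g objects from the sketch above, node degree gives you that counter; visNetwork treats a value column on the nodes as the circle size.

```r
# Size each node by its number of connections (degree)
nodes$value <- degree(g)[as.character(nodes$id)]
visNetwork(nodes, edges)
```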

Of course, do not be over-mesmerized by the data. In real life, the strength of the relationship also matters. This is hard to quantify or collect, even though it’s easy to depict once you have the data in hand. For example, I have 1,000 connections whom I’ve met at conferences or random events. If I needed a job, most may not really be useful. But my friend Sarah has only 300 super-loyal friends, who literally found her a job in 2 days when she had to move back to her hometown to take care of a sick parent.

With that thought, do take a look at the code and have fun coding! 🙂

PHLAI – Philly Artificial Intelligence Conference Summary

I had the opportunity to attend the third annual PHLAI 2019 conference on Artificial Intelligence and Machine Learning. Organized by Comcast, this was an amazing full-day event (on Aug 15th) with a theme of using AI to improve customer experience.

AI for customer service

For those who are not in the Greater Philadelphia area, I have to point out that most AI/Machine learning or even tech conferences typically happen either on the West Coast or in NYC. So the cost of travel/lodging can quickly add up if you want to attend as an individual.

It has always been a point of contention for me that there aren’t any interesting conferences held in the Philly area. After all, we are home to a large number of startups (RJMetrics, Blackfynn, Stitch, Seer Interactive), Fortune 500 employers like Nasdaq, Comcast, Vanguard and most of the big banks (JPMorgan, Barclays, etc.).

So here is a big shoutout to Comcast (esp. Ami Ehlenberger) for making the magic happen in Philly!

PHLAI 2019 Conference tracks and schedule

Coming back to the conference, here is a summary of my takeaways of the event:
1) Keynote – morning & evening
2) Speaker & Workshop Sessions – DataRobot, H2O, Humane AI, Tableau and more…
3) Networking
4) Key Takeaways
5) Overall

1. Keynote

There were 2 keynotes – morning and evening. Both were brilliant, engaging and filled with useful information.

Morning Keynote

This was given by Nij Chawla of Cognitive Scale. Snippets from his talk that resonated most with me:

  • Practical tips on how to use AI to create individual customization, especially to improve conversions on business websites.
  • To get the most ROI from AI investments, begin with smaller AI projects and deploy/iterate. Most projects fail because they aim at a massive overhaul that takes so long to move to production that half the assumptions and data become invalid. Divide and conquer.
  • Don’t use AI just for the sake of it.
  • Real-time data is crucial. A 360-degree view built by aggregating multiple sources will help drive maximum effectiveness, especially in banking and finance.
Morning Keynote – Cognitive Scale

He also presented some interesting examples of AI projects that are already in use today in various domains:

  • Early screening for autism & Parkinson’s using the front camera of a phone. Apparently the typing/tapping patterns on the phone are different for such patients, and when combined with facial recognition technology, the false positives have been very low.
  • Drones in insurance. One company used drones to take pictures to assess damage after Hurricane Harvey and proactively filed claims on behalf of their customers. This meant the paperwork was done and claim money was released quickly for affected customers. Home repair contractors are in heavy demand after such natural disasters, but the customers from this company had no trouble getting help because everything was already set up! What better example of extraordinary customer service than helping your customers rebuild their lives and homes? WOW!
  • Drone taxi in Dubai. Basically a driverless helicopter. Scary yet fascinating!
  • VR glasses for shopping. Allows customers to interact with different products and choose color/ design/ style. Ultimate customization without the hassles of carrying gigantic amounts of inventory.


Evening Keynote

This was presented by Piers Lingle & David Monnerat from Comcast. They spoke about the AI projects implemented at Comcast, including projects that proactively present options to customers to improve customer experience, reduce call center volumes and cut complaints. I loved the idea of using automation to present a 2-way dialog using multiple means of communication. They also spoke about chatbots and strategies that help chatbots learn at scale.

Using AI to augment customer service & wow them!

Some other salient points:

  • Automation is a part of the experience, not an add-on or an extra bother
  • Voice remote for Comcast TV users allows customers to fast-forward through commercial breaks. Cool! 🙂
  • Think beyond NLP. AI works better in troubleshooting tasks when customers are struggling with describing problems, for example troubleshooting internet connectivity issues.
  • Measure from the beginning. Measure the right thing. Easy to say, hard to do.
  • Done correctly, AI does seem like magic. However, it takes a lot of work, science and complexity. So appreciate it! 🙂 Above all, trust the tech experts hired for this role instead of pigeonholing them with hyper-narrow constraints but high expectations.


2. Speaker & Workshop sessions

The afternoon had 3 different tracks of speaker sessions with 3 talks each – Product, Customer Service & Real-World Experience. Attendees were allowed to attend talks from all 3 tracks, which is what I did. The speaker lineup was so spectacular that I noticed a lot of folks spoilt for choice, scratching their heads after every talk about where to go next! 🙂

I am only writing about the ones I attended. However, I heard positive feedback about all the other sessions too! (No offense to the folks who presented but are not mentioned here!)

Workshop by DataRobot

This is an interesting software product that helps you explore data, automate data cleansing and, most importantly, test by running multiple models in parallel. So you could test with hundreds of models and algorithms. The workshop was presented by the VP of Data Science, Gourab De, and was extremely engaging.

It does have features for tweaking parameters, and excellent documentation, so personally I think it would greatly augment the work of an experienced data scientist. Given the shortage of talent, tools like these would allow smaller teams to become faster and more efficient.

The availability of logs is another fabulous feature for banks (since I work in one) due to the stringent regulatory requirements around compliance, ethics and model governance.

Chaos Engineering & Machine Learning

Presented by Sudhir Borra of Comcast itself. The talk was mainly about the concept of chaos engineering, which is used to test the resiliency of systems. This is quite crucial for serverless AI and automated CI/CD pipelines where models are tested and deployed in the cloud. The talk was quite technical, yet practical for teams and companies looking to implement robust AI systems within their organizations.

Chaos Engg – initially developed at Netflix

Data Stories Tableau Session

Presented by Vidya Setlur of Tableau, who spoke about some interesting features in Tableau. For example, did you know Tableau assigns semantically sensible color palettes where it can, rather than default colors? In a dataviz on vegetables, broccoli is assigned green, whereas carrots would be assigned orange. She also spoke about the “Stroop effect”, which I urge every data scientist or Tableau developer to look up and understand.

There were also some interesting Qs about expanding the natural language questions to be activated via voice. The answer was that this is currently not available because the folks who would truly find it useful are senior executives who would likely only use it on a phone (not desktop), which would also require touchscreen capabilities for the drill-down features.

This was a perspective I had not thought about, so I was quite impressed at how far the company has thought about features users might want far into the future. Like Apple, Tableau’s elegant design, ease of use and intuitive features make it a class apart in terms of functionality and customer experience.

The Humane Machine: Building AI Systems with Emotional Intelligence

Extremely engaging talk by Arpit Mathur of Comcast on some interesting work that Comcast is doing.

He also spoke of the LIWC (Linguistic Inquiry and Word Count) library and how to apply it to natural language data to gauge sentiments, similar to Plutchik’s wheel of emotions. He also covered how to use these for creating more “sentient” chatbots and applications, since context matters as much as words when we (humans) communicate. For example, some cultures use words literally, whereas others could use them to mean the exact opposite!

He also discussed ethics in AI, which naturally was a “hot” topic and sparked multiple questions and discussions.

Hands-on Labs – H2O – Automated AI/ML

I’d heard many rave reviews about this product, so it was wonderful to attend a hands-on workshop covering how to use the tool, the available features, and the effectiveness of the models it can generate. Personally, I found it to have just the right mix of automated model generation (great for newbies) and model customization (for advanced users).


3. Networking

The conference included two rounds of networking time, the first over free breakfast and the second at a formal networking reception at 5:15 pm. Both were great opportunities to meet other attendees and speakers.

I was not attending with any job-hunting aims, but I did see a list of open roles at Comcast. Plus, I saw folks talking about roles in their org and describing their work, so the event was certainly an excellent venue for informational interviews and reaching out for potential collaboration.

Not to mention the new contacts will be useful in the future, should I choose to look.

I met a wide variety of folks – from students to software developers to product leads and even senior engineering leaders who wanted to learn best practices on how to get ROI on future AI investments. There were folks who were just interested in learning more about AI, some looking for their next big idea, and everyone in between. It was not just tech folks either; attendees came from domains like adtech, marketing, automobiles, healthcare and of course banking.

Amazing to see the turnout and community. Overall, I think job seekers should attend such events!

Oh, and I have to mention the gorgeous Comcast building – amazing working space and the central atrium was both futuristic and functional. I definitely regretted not taking the Comcast tour! (In fairness, the workshop I attended was fabulous too! )

Of course there was also time to chitchat between sessions and I am thankful to everyone who interacted with me! Thanks for your sparkling conversations.

Networking opportunities galore


4. Key Takeaways

  • Focus on the customer and the end goal. Technology is only a tool to enable better outcomes and efficiencies. AI tools have matured, so make full use of them.
  • For the best return and impact from AI projects, quick action is key. Choose a specific use case, choose a metric to evaluate success, and then proceed. If the model works, BRAVO! If it does not, you get valuable feedback on your initial assumptions, so you can make better decisions in the future.
  • Clean, reliable data is paramount. Garbage in, garbage out!
  • Collaboration is important. You do need to get the blessings from multiple stakeholders like IT, business, product, customer service, etc. But do not let collaboration equate to decision paralysis.
  • Done right, AI can be magical and live up to the hype. The critical component is getting started.
  • Data and user preferences change quickly, so think about automating model deployments and/or implementing CI/CD using technologies like Kubernetes or the myriad tools and software available on the market, to get to production quicker.
  • There is a huge talent shortage of experienced machine learning and artificial intelligence developers, so think about “upskilling” existing employees who already know the business and customer pain points.


5. Overall

Overall, I found the event incredibly useful and I plan to attend again next year. I am also looking forward to using the takeaways in my own role and evangelizing AI products within my org.


DataScience Portfolio Ideas for Students & Beginners

A lot has been written on the importance of a portfolio if you are looking for a DataScience role. Ideally, you should document your learning journey so that you can reuse code, write well-documented code and also improve your data storytelling skills.

DataScience Portfolio Ideas

However, most students and beginners get stumped on what to include in their portfolio, as their projects are all the same as those their classmates, bootcamp peers and seniors have created. So, in this post I am going to tell you what projects you should have in your portfolio kitty, along with a list of ideas you can use to construct a collection of projects that will help you stand out on LinkedIn, GitHub and in the eyes of prospective hiring managers.

Job Search Guide

You can find many interesting projects on the “Projects” page of my website JourneyofAnalytics. I’ve also listed 50+ sources for free datasets in this blogpost.

In this post though, I am classifying projects based on skill level along with sample ideas for DIY projects that you can attempt on your own.

On that note, if you are already looking for a job, or about to do so, do take a look at my book “Data Science Jobs”, available on Amazon. This book will help you reduce your job search time and quickly start a career in analytics.

Since I prefer R over Python, all the project lists in this post will be coded in R. However, feel free to implement these ideas in Python, too!

a. Entry-level / Rookie Stage

  1. If you are just starting out and are not yet comfortable with even the syntax, your main aim is to learn how to code along with DataScience concepts. At this stage, just try to write simple scripts in R that can pull data, clean it up, calculate mean/median and create basic exploratory graphs. Pick up any competition dataset on Kaggle.com and look at the highest-voted EDA script. Try to recreate it on your own, read through it and understand the hows and whys of the code. One excellent example is the Zillow EDA by Philipp Spachtholz.
  2. This will not only teach you the code syntax, but also how to approach a new dataset and slice/dice it to identify meaningful patterns before any analysis can begin.
  3. Once you are comfortable, you can move on to machine learning algorithms. Rather than Titanic, I actually prefer the Housing Prices dataset. Initially, run the sample submission to establish a baseline score on the leaderboard (a bare-bones baseline sketch follows this list). Then apply every algorithm you can look up and see how it performs on the dataset. This is the fastest way to understand why some algorithms work on numerical target variables versus categorical versus time series.
  4. Next, look at the kernels with decent leaderboard score and replicate them. If you applied those algorithms but did not get the same result, check why there was a mismatch.
  5. Now pick a new dataset and repeat. I prefer competition datasets since you can easily see how your score moves up or down. Sometimes simple decision trees work better than complex Bayesian logic or Xgboost. Experimenting will help you figure out why.
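To make the baseline idea concrete, here is a bare-bones sketch for the Kaggle House Prices competition. It assumes you have downloaded train.csv and test.csv from the competition page, and the two predictor columns are just illustrative choices.

```r
train <- read.csv("train.csv")
test  <- read.csv("test.csv")

# Simplest possible model: log-price from living area and overall quality
fit   <- lm(log(SalePrice) ~ GrLivArea + OverallQual, data = train)
preds <- exp(predict(fit, newdata = test))

# Write a leaderboard-ready submission file
submission <- data.frame(Id = test$Id, SalePrice = preds)
write.csv(submission, "submission.csv", row.names = FALSE)
```

Once this baseline is on the leaderboard, every algorithm you try afterwards has a concrete score to beat.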

Sample ideas –

  • Survey analysis: Pick up a survey dataset like the Stack Overflow developer survey and complete a thorough EDA – men vs women, age and salary correlation, cities with the highest salaries after factoring in currency differences and cost of living. Can your insights also be converted into an eye-catching infographic? Can you recreate this?
  • Simple predictions: Apply any algorithms you know on the Google analytics revenue predictor dataset. How do you compare against the baseline sample submission? Against the leaderboard?
  • Automated reporting: Go for end-to-end reporting. Can you automate a simple report, or create a formatted Excel or pdf chart using only R programming? Sample code here; a tiny formatted-Excel sketch also follows this list.
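For the Excel side of that idea, a small sketch with the openxlsx package might look like this; the monthly_sales numbers are made up purely for illustration.

```r
library(openxlsx)

# Hypothetical report data; in practice this would come from your data pull
monthly_sales <- data.frame(month   = month.name[1:6],
                            revenue = c(120, 135, 150, 142, 160, 171))

wb <- createWorkbook()
addWorksheet(wb, "Summary")
writeData(wb, "Summary", monthly_sales,
          headerStyle = createStyle(textDecoration = "bold"))
saveWorkbook(wb, "monthly_report.xlsx", overwrite = TRUE)
```

Schedule the script (via cron or Windows Task Scheduler) and the report regenerates itself.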

b. Senior Analyst/Coder

  1. At this stage, simple competitions should be easy for you. You don’t need to be in the top 1%; even being in the top 30-40% is good enough. Although if you can win a competition, even better!
  2. Now you can start looking at non-tabular data like NLP sentiment analysis, image classification, API data pulls and even dataset mashups. This is also the stage when you probably feel comfortable enough to start applying for roles, so building unique projects is key.
  3. For sentiment analysis, nothing beats Twitter data, so get the API keys and start pulling data on a topic of interest. You might be limited by the daily pull limits on the free tier, so check if you need 2 accounts and aggregate data over a couple days or even a week. A starter example is the sentiment analysis I did during the Rio Olympics supporting Team USA.
  4. You should also start dabbling in RShiny and automated reports, as these will help you in actual jobs where you need to present idea mockups and standardize weekly/daily reports.
Yelp College Search App

Sample ideas –

  • Twitter Sentiment Analysis: Look at the Twitter sentiments expressed before big IPO launches and see whether the positive or negative feelings correlated with a jump in prices. There are dozens of apps that look at the relation between stock prices and Twitter sentiments, but for this you’d need to be a little more creative since the IPO will not have any historical data to predict the first day dips and peaks.
  • API/RShiny Project: Develop a RShiny dashboard using Yelp API, showing the most popular restaurants around airports. You can combine a public airport dataset and merge it with filtered data from the Yelp API. A similar example (with code) is included in this Yelp College App dashboard.
  • Lyrics Clustering: Try doing some text analytics using song lyrics from this dataset with 50,000+ songs (a rough clustering sketch follows this list). Do artists repeat their lyrics? Are there common themes across all artists? Do male singers use different words versus female solo tracks? Do bands focus on a totally different theme? If you see your favorite band or lead singer, check how their work has evolved over the years.
  • Image classification starter tutorial is here. Can you customize the code and apply to a different image database?
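A rough starting point for the lyrics clustering idea is sketched below; songdata.csv and its artist/text columns are placeholders for however the downloaded dataset is structured, and casting to a document-term matrix needs the tm package installed.

```r
library(dplyr)
library(tidytext)

songs <- read.csv("songdata.csv", stringsAsFactors = FALSE)

# Word counts per artist, minus common stop words
tf <- songs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(artist, word)

# Cast to a document-term matrix and cluster artists by word usage
dtm <- as.matrix(cast_dtm(tf, artist, word, n))
km  <- kmeans(dtm, centers = 5)
split(rownames(dtm), km$cluster)   # artists grouped by lyrical similarity
```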

c. Expert Data Scientist

DataScience Expert portfolio
  1. By now, you should be fairly comfortable analyzing data from different datasource types (image, text, unstructured), building advanced recommender systems and implementing unsupervised machine learning algorithms. You are now moving from the analyze stage to the build stage.
  2. You may or may not already have a job by now. If you do, congratulations! Remember to keep learning and coding so you can accelerate your career further.
  3. If you have not, check out my book on how to land a high-paying ($$$) Data Science job within 90 days.
  4. Look at building deep learning models using keras, and apps using artificial intelligence (a minimal keras sketch follows this list). Even better, can you fully automate your job? No, you won’t “downsize” yourself. Instead, your employer will happily promote you since you’ve shown them a superb way to improve efficiency and cut costs, and they will love to have you look at other parts of the business where you can repeat the process.
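If keras is new to you, here is a minimal sketch of the R workflow; it assumes the keras package (and its Python backend) is installed, and x_train/y_train are placeholders for your own feature matrix and labels.

```r
library(keras)

# Tiny binary classifier: 10 input features -> 1 probability
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1,  activation = "sigmoid")

model %>% compile(optimizer = "adam",
                  loss      = "binary_crossentropy",
                  metrics   = "accuracy")

# x_train / y_train: your own data
# model %>% fit(x_train, y_train, epochs = 10, batch_size = 32)
```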

Sample project ideas –

  • Build an App: College recommender system using public datasets and web scraping in R. (Remember to check terms of service, as you do not want to violate any laws!) The goal is to recreate a report like the Top 10 cities to live in, but from a college perspective.
  • Start thinking about what data you need – college details (names, locations, majors, size, demographics, cost), outlook (Christian/HBCU/minority), student prospects (salary after graduation, time to graduate, diversity, scholarship, student debt ) , admission process (deadlines, average scores, heavy sports leaning) and so on. How will you aggregate this data? Where will you store it? How can you make it interactive and create an app that people might pay for?
  • Upwork Gigs: Look at Upwork contracts tagged as intermediate or expert, especially the ones with $500+ budgets. Even if you don’t want to bid, just attempt the project on your own. If you fail, you will know you still need to master some more concepts; if you succeed, it will be a superb confidence booster and learning opportunity.
  • Audio Processing: Use the VoxCeleb dataset to identify the speaker from an audio/speech sample. Audio files are an interesting data source with applications in customer recognition (think bank call centers preventing fraud), parsing customer complaints, etc.
  • Build your own package: Think about the functions and code you use most often. Can you build a package around them? The most trending R packages are listed here. Can you build something better? (A minimal package-skeleton sketch is shown below.)
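As a starting point for the package idea, a minimal skeleton using the usethis/devtools workflow might look like this; “myutils” and clip() are made-up names for illustration.

```r
# Scaffold a package (creates DESCRIPTION, R/, etc.); "myutils" is hypothetical
usethis::create_package("myutils")

# Drop your frequently reused functions into R/, documented with roxygen2:

#' Clip numeric values to a range
#' @param x numeric vector
#' @param lo,hi range limits
#' @export
clip <- function(x, lo, hi) pmin(pmax(x, lo), hi)

# Then generate docs and install locally:
# devtools::document()
# devtools::install()
```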

Do you have any other interesting ideas? If so, feel free to contact me with your ideas or send me a link with the Github repo.
