R is an amazing tool to perform advanced statistical analysis and create stunning visualizations. However, data scientists and analytics practitioners do not work in silos, so these analysis have to be copied and emailed to senior managers and partners teams. Cut-copy-paste sounds great, but if it is a daily or periodic task, it is more useful to automate the reports. So in this blogpost, we are going to learn how to do exactly that.
The R-code uses specific library packages to do this:
- RDCOMClient – to connect to Outlook and send emails. In most offices, Outlook is still the defacto email client, so this is fine. However, if you are using Slack or something different it may not work.
- r2excel – To create an excel output file.
The screenshot below shows the final email view:
As seen in the screenshot, the email contains the following:
- Custom subject with current date
- Embedded image
- Attachments – 1 Excel and 1 pdf report
The code and supporting input files are available here, under the Projects page under Nov2018. The code has 4 parts:
- Prepare the work space.
- Pull the data from source.
- Cleaning and calculations
- Create pdf.
- Create Excel file.
- Send email.
Prepare the work space
I always set the relative paths and working directories at the very beginning, so it is easier to change paths later. You can replace the link with a shared network drive path as well.
Load library packages and custom functions. My code uses the r2excel package which is not directly available as an R-cran package. So you need to install using devtools using the code below.
It is possible to do something similar using the “xlsx” package, but r2excel is easier.
Some other notes:
- you need the first 2 lines of code only for the first time you installation. From the second time onwards, you only need to load the library.
- r2excel seems to work only with 64-bit installations of R and Rstudio.
- you do need Java installed on your computer. If you see an error about java namespace, then check the path variables. There is a very useful thread on Stackoverflow, so take a look.
- As always, if you see errors Google it and use the Stack Overflow conversations. In 99% of cases, you will find an answer.
Pull the data from source
This is where we connect to an Excel CSV (or text) file. In practice, most people connect to a database of some kind. The R-script I am using connects to a .csv file, but I have added the code to a connect to a SQL database.
That code snippet is commented out, so feel free to substitute your own sql database links. The code will also work for Amazon EC2 cluster.
Some points to keep in mind:
- If you are using sqlquery() then please note that if your query has an error then R sadly shows only a standard error message. So test your query on SQL server to ensure that you are not missing anything.
- Some queries do take a long time, if you are pulling from a huge dataset. Also the time taken will be longer in R compared to SQL server direct connection. Using the Sys.time() command before and after the query is helpful to know how long the query took to complete.
- If you are only planning to pull the data randomly, it may make sense to pull from SQL server and store locally. Use the fread() function to read those files.
- If you are using R desktop instead of R-server, the amount of data you can pull may be limited to what your system configuration.
- ALWAYS optimize your query. Even if you have unlimited memory and computation power, only pull the data you absolutely need. Otherwise you end up unnecessarily sorting through irrelevant data.
Cleaning and calculations
For the current data, there are no NAs, so we don’t need to account for those. However, the read.csv() command creates factors, which I personally do not like, as they sometimes cause issues while merging.
Some of the column names have “.” where R converted the space in the names. So we will manually replace those with an underscore using the gsub() function.
We will also rank the apps based on categories of interest, namely:
- Most Popular Apps – by number of Reviews
- Most Popular Apps – by number Downloads and Reviews
- Most Popular Categories – Paid Apps only
- Most popular apps with 1 billion installations.
We are going to use the pdf() function to paste all graphs to a pdf document. Basically what this function does is write the graphs to a file rather than show on the console. So the only thing to remember is that if you are testing graphs or make an incorrect graph, everything will get posted to the pdf until you hit the “dev.off()” function. Sometimes if the graph throws an error you may end up with a blank page, or worse, with a corrupt file that cannot be opened.
Currently, the code I am only printing 2 simple graphs using ggplot() and barplot() functions, but you can include many other plots as well.
Create Excel file.
The Excel is created in the sequence below:
- Specify the filename and create an object of type .xlsx This will create an empty Excel placeholder. It is only complete when you save the Workbook using the saveWorkbook() at the end of the section.
- Use the sheets() to create different worksheets within the Excel.
- The xlsx.addHeader() adds a bold Header to each sheet which will help readers understand the content on the page. The r2excel package has other functions to add more informative text in smaller (non-header) font as well, if you need to give some context to readers. Obviously, this is optional if you don’t want to add them.
- xlsx.addTable() – this is the crucial function that adds the content to Excel, the main “meat” of what you need to show.
- saveWorkbook() – this function will save the Excel to the folder.
- xlsx.openFile() – this function opens the file so you can view contents. I typically have the script running on automated mode, so when the Excel opens I am notified that the script completed.
The email is sent using the following functions:
- OutApp() – creates an Outlook object. As I mentioned earlier, you do need Outlook and need to be signed in for this to work. I use Outlook for work and at home, so I have not explored options for Slack or other email clients.
- outmail[[“To”]] – specify the people in the “to” field. You could also read email addresses from a file and pass the values here.
- outmail[[“cc’]] – similar concept, for the cc field.
- outmail[[“Subject”]] – I have used the paste0() function to add the current date to the subject, so recipients know it is the latest report.
- outMail[[“HTMLBody”]] – I used the HTML body so that I can embed the image. If you don’t know HTML programming, no worries! The code is pretty intuitive, you should be able to follow what I’ve done. The image basically is an attachment which the HTML code is forcing to be viewed within the body of the email. If you are sending the email to people outside the organization, they may see a small box instead of the image with a cross on the top left (or right) of the box. Usually, when you hover your mouse near box and right click, it will ask them to download images. You may have seen similar messages in gmail, along with a link to “show images” or ‘always show images from this sender’. You obviously cannot control what the recipient selects, but testing by sending to yourself first helps smoothing out potential aesthetic issues.
- outMail[[“Attachments”]] – function to add attachments.
- outMail$Send() – until you run this command, the mail will not be send. If you are using this in office, you may get a popup asking you to do one of the following. Most of these will generally go away after the first use, but if they don’t, please look up the issue on StackOverflow or contact your IT support for firewall and other security settings.
- popup to hit “send”
- popup asking you to “classify” the attachments (internal / public/ confidential) Select as appropriate. For me, this selection is usually “internal”
- popup asking you to accept “trust” settings
- popup blocker notifying you to allow backend app to access Outlook.
That is it – and you are done! You have successfully learned how to send an automated email via R.