Getting Started with Analytics Labs
Introduction Overview: Oracle Analytics Hands-On Labs
Description
Connect to the right data, at the right time. Leverage the most complete platform in the cloud analytics market Choose your own journey to the cloud. Deliver the technology your business stakeholders need to become innovators with Oracle Analytics Cloud. Collaborative, providing a efficient methods to interact and share information.
The Oracle Analytics End-to-End Hands-on Lab includes five lab tasks designed to help the learner understand the capabilities that Oracle Analytics delivers to customers. These lab tasks are designed to give the learner basic hands-on experience with Oracle Analytics.
To support this hands-on lab learning experience, we will employ a business case scenario. KoolKart is an e-commerce company that sells 4 categories of products, and they are looking for innovative ways to increase revenues.
Over the past 6 months KoolKart has launched several social media campaigns.
We want to investigate the effectiveness of those campaigns and ensure that their resources are being effectively utilized.
Lab Objectives
- Get started with Oracle Analytics
- Understand the core features of Oracle Analytics
- Blend and wrangle Data
- Visualize data and present insights
- Use Machine Learning within Oracle Analytics
Lab Overview Video
- Watch the video for a description of the lab
Intended Audience
- Anyone interested in Oracle Analytics
Prerequisites
- N/A
Resources
Contact us
Having an issue with the lab? Have an idea on how it could be made better? Want to tell us how awesome the lab is? Click the icon below to contact the team and let us know your feedback.
Task 1 - Getting Started with Oracle Analytics
This section focuses on the user interface.
Objectives
- Starting an Analytics instance
- Learning and getting acclimatized to the Analytics interface
Oracle Analytics
Oracle Analytics Cloud provides fast and flexible analysis of any data from any source. It is built on the industry-leading Oracle Business Intelligence platform and OCI.
Oracle Analytics delivers scalability, high availability, state-of-the-art security, and operational simplicity. This combination of proven technologies, world-class infrastructure, broad data access, and deep analytic capabilities makes Oracle Analytics Cloud the best solution for every user.
The goal of todays workshop is to introduce you to the visualization capabilities. We will be analyzing data using local Excel files.
Creating visualizations is easy. Your data analysis work is an individual experience in exploration and discovery that can also be shared with other users. Visualization enables you to experiment with a wealth of different options for how to view your data. During this experimentation process, you can find correlations, discover patterns, and see trends in your content.
Data Visualization provides you with tools for faster and simpler assembly of detailed reports arranged together in an appealing and meaningful display. Data Visualization goes even further, to give you dynamic views for focused, exploratory interaction with your data.
Visualization provides the following:
Guidance: Grammar-centric approach to visualizations combined with powerful keyword search and pattern detection to aid all users making new discoveries.
Richness: Robust visualization library and streamlined dashboard construction provide all the tools needed for constructing sophisticated analysis across many different perspectives of data.
Visual Grammar: Visualizations automatically created and updated by applying visual grammar to data selections made by user. All visualization types share foundation in visual grammar.
Keyword Search: All relevant artifacts are indexed for search. Unfamiliar data models can be intuitively accessed using keywords.
Pattern Brushing: Sophisticated technique to highlight correlations between visualizations. Patterns highlighted across all components on the canvas.
Data Blending: Combining two or more data sources for analysis.
Now letʼs get started!
Provisioning an Oracle Analytics Instance
- Click the OCI Console service menu at the top left, select Analytics & AI and then click Analytics Cloud.
- In the List scope section on the left side, click Choose a compartment and use the +/- signs to navigate to your assigned compartment under Luna-Labs. Then click the Create Instance button.
- Use the following information to fill out the Create Analytics Instance form then click Create
Field | Information |
---|---|
Name | OACLAB |
Description | Optional |
Create in Compartment | Verify your assigned compartment is selected |
Capacity | OCPU |
OCPU Count | 1 (Non-production) |
License | License Included |
Edition | Enterprise Edition |
- The process to create your instance will take about 10 – 15 minutes. While the instance is being prepared, the OAC icon will be orange with the label CREATING.
- When the instance is ready the icon will turn green and the label will read ACTIVE. Click the Analytics Home Page button to access the instance.
Note: You can also use the url listed under the Access Information field to access your deployed Analytics instance from anywhere. But remember it will only exist while this lab is running.
Reviewing the Home page and the primary menus
On logging into OAC, you will see the home page.
Start by clicking the stack or action menu in the top-left of the UI.
This will open the drawer menu.
You can use this menu to navigate through the application.
The application follows standard web and application interface protocols, thus supporting both left and right click interactions. In terms of general navigation, there are 4 key menus accessible at the top of the screen or via a stack menu in the top left (not all screens will show the top menu bar but the stack navigation is always available).
Home: Application start up page from where you can view existing projects, data sets, data flows or create new ones.
Catalog: Collections of visualizations and the underlying data sources. Folders are simply means by which to organize projects.
Data: Display or create Data Sets (instances of data such as a specific Excel file). Ability to curate data from data sources including adding calculations, merging multiple sources, managing columns, and more.
Machine Learning: This page shows all the available machine learning models ready for use in projects.
Jobs: This page shows all the status of data replication, data flow, and sequence operations.
Console: Administrative menu for managing Custom Plugins (e.g., new types of analyses obtained from the Oracle Analytics Store or custom built), Maps layers (e.g., new backgrounds for map-based analyses) and other administrative tasks.
At the top-right of the Oracle Analytics Cloud home page UI, click the Create button.
This window allows you to create a Visualization project, a data set, a connection to an external source, a data flow, or a sequence.
Task 2 - Intro to Analytics Core Features
This section walks you through the analysis of a single data set.
Objectives
- Creating workbooks with data sets and visualizations
- Understanding the visualization capabilities
- Telling a story with annotated visualizations
Creating a Workbook with a Data Source and Visualizations
Scenario
KoolKart is an e-commerce company that sells 4 categories of products and is looking for innovative ways to increase revenues. As part of that initiative, you and your team have launched several social media campaigns within the past 6 months.
Given the level of investment your company has made, you are looking to ensure that resources are being effectively utilized. Hence, you are investigating the effectiveness of those campaigns.
After reaching out to the sales team, you have been provided with a spreadsheet with thousands of rows of information and are struggling to make heads or tails of what you are evaluating. Let's see how Analytics can help you!
Download the spreadsheet containing the sales figures used in this lab
- Download the KoolKart Sales Data.xlsxfile.
http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/KoolKartSalesData.xlsx
Click copy the link to the file and in Luna you will have to open a new tab on your browser window and see if you can paste the link. The files are saved under Home/Download.
Create a Workbook to evaluate the sales data and upload the Excel file.
- Select Create and then Workbook from the Home screen.
-->
Every workbook requires a data source, so you will be prompted to add one.
- From the Add Data window select Create Dataset on the top right to load the Excel file.
If you had created connections to external systems (e.g. an Oracle Database), you would see those as options here. For the purpose of this lab, we will just upload a file.
- To create the dataset, select Drop data file here or click to browse.
- Navigate to select the KoolKart Sales Data.xlsx file which is stored in the Downloads directory.
Following the upload of the file, which will be available for other workbooks as a Data Source, you have the ability to make changes to the data so it can be easily used within analyses.
- Click OK then click Save As and name the dataset KoolKartSalesData.
- To create the workbook, Click Create Workbook.
- Select Save As and name the workbook KoolKart.
Validate the uploaded data
- Toggle the Data workbook tab to begin validating the data and review recommendations.
- Open the KoolKartSalesData Excel image by clicking the ellipsis and selecting Open.
The system will determine if the columns are Attributes or Measures.
Attributes: Indicated by A, attributes are generally non-numeric fields.
Measures: Indicated by #, measures are generally numeric fields.
If a field is determined to be a Measure, the system will also determine the default aggregation method for when it is used in an analysis (e.g., average, sum, count). If desired, you can change an Attribute to a Measure and vice-versa. For Measures, you can also change the default aggregation rule.
Clock: Fields may also be indicated by a Clock icon. These will be your date and time-based fields, which can be handled differently in certain visualizations. However, they still need to be defined as Attributes or Measures. If the Clock icon appears, the date is being treated as an Attribute, otherwise it will have a # icon.
No changes are needed for this data.
Oracle Analytics uses its intelligence to recommend changes for the dataset. These system-generated Recommendations can be used to enrich your dataset for analysis.
The recommendations for KoolKartSalesData are on the right side of the screen.
- If system generated recommendations are not displayed on the right side of the screen, click the Toggle Column Action Panel on the bottom right of the screen.
Selecting a column will filter the recommendations to show only those related to that column.
- Select Customer Country to see the recommendations for the column.
Although, we won't be using the recommendations now, it is a very powerful feature which we will take advantage of in the last lab task.
Create an analysis of monthly revenues
- Click the Back Arrow on the upper left.
Select and open your KoolKart Workbook from the OAC home page.
Click Edit (top right of the screen) to enter Author Mode.
- Select the Visualize workbook tab (on top of the screen).
- From the Data Panel on the left side of the screen, select the data element Month. Press and hold the Control Key and click the data element Revenue. Then, right click and select Create Best Visualization.
As an intelligent analytics platform, the system is able to provide recommendations for how best to visualize information. The system has determined that a vertical bar chart is the best way to display the information that you are evaluating.
- If you would like to change the graph type, click the graph icon and select Horizontal Bar Chart.
Before we continue, let's take a step back and better understand the interface of the Visualize tab and its capabilities.
Understanding the Visualize Interface
- Review the Visualize interface and menus.
We have previously established that a Workbook consists of data sources, and visualizations related to those data sources. We have already walked through the process of adding a data source in the previous section. Now, we will learn more about visualizations.
Visualizations are displayed on a canvas. A Canvas can contain multiple visualizations and a Workbook can include multiple Canvases.
When you open the Visualize interface, you should see the Canvas in addition to a few other menus. The Data Panel will be on the left side. To the right of the Data Panel is the Grammar Panel. To the right of the Grammar Panel is a Canvas (in this case with a single visualization). The Workbook Tabs will appear at the center top of the screen.
Let's start by reviewing the Workbook Tabs, which allow you to switch between editing data sources, creating visualizations, and building a story about the visualizations.
You can switch between the tabs as needed and each has a specific purpose:
Data – Validate and make changes to the data you wish to use in your visualizations.
Visualize – Create your visualizations.
Present – A nice feature where you can view, compile, and manage meaningful "stories" from your visualizations by picking certain ones to highlight and add commentary.
On the left side of the screen, the Data Panel displays the available Data Sources, Data Elements, Analytics, or Visualizations within the workbook (depending upon which one has been selected).
The icons on the top filter the items listed. From top to bottom they are:
Data – Displays the actual data available for analysis (e.g., the columns from your data sources).
Visualizations – Visual elements available for use in the Canvas.
Analytics – Provides access to advanced analytics that can be added to visualizations such as trend lines, clusters, and outliers.
Based on which icon is selected, you will see different results. The example below shows the Data Elements in our current workbook.
The Search box allows you to search through the data sets. For example, you can search for Data Elements that are related to the customers by typing in customer.
- Click the X to remove any filters (please do so, now).
Now that you have been exposed to the core features of the interface, let's dive into some further analyses.
Analyzing Sales Figures by Product and Customer Segments
Thus far, you have brought data in and generated a basic analysis of revenues over the past six months. However, this is a basic analysis and, as CMO, you really need to dig further into the data. Specifically, you are interested in seeing how different product categories are performing and how different marketing segments (e.g., gender) may have an impact on those figures.
Create a breakdown of revenue by product category.
- Drag and drop Category into the Color property of the previously created bar chart.
You can now see monthly revenue broken out by product category.
Built-in intelligence can even attempt to determine the best way to add Category to your visualization by simply double-clicking it.
You can also interact with the results further by hovering over various categories. For example, if you move your mouse over the column of 2015-11 (which represents Electronics & Computers), you will see all other categories dimmed for easier viewing. You will also be able to see more detailed information on the actual data (e.g., the actual number of orders).
After looking at each product category, you can now see that Electronics & Computers and Clothing & Shoes categories are consistently the major contributors to sales over the past 6 months.
This helps in better understanding which categories drive the most revenues, but you still need to better understand your customer demographics.
Create a new type of visualization to evaluate revenues by age group.
- From the Data Panel select Customer Age Group. Press and hold the Control(Windows) or Command(Mac) key and click Revenue. Now, right click and select Pick Visualization.
- Select Donut.
As you can see, there is a great variety of visualizations typed available. We won't be covering every one, but we have made an attempt to provide a few examples during the course of this lab. Please feel free to try out the different types!
You should see the new visualization, automatically placed next to the bar chart.
It looks like our customers in the 25-31 and 32-39 age range are the largest contributors to our revenues in the past 6 months. Let's look at revenues over time to see if there is a noticeable increase over time (which we'd hope based upon our investments in the social media campaigns).
Create a new type of visualization to evaluate revenues by date (e.g., to view a trend over time as opposed to just monthly totals).
- From the Data Panel select Date. Press and hold Control(Windows) or Command(Mac). Select Revenue. Now, right click and select Create Best Visualization.
You should now see a canvas that looks like the following:
It looks like we have seen some increased revenues, though we will likely need to drill in even further to identify the impact of the social media campaigns. However, before we do that, let's "clean-up" the canvas so it's easier to understand in case we need to share the results.
Formatting Visualizations and the Canvas
Rearrange the visualizations to make them easier to view.
- Click the Revenue by Date timeline visualization. A thin border will indicate that the image has been selected. Hover your cursor near the top border until you see a 4-way arrow icon.
- Left click and drag the visualization into the top right of the canvas above the Revenue by Customer Age Group visualization until you see a thick green line.
Make sure that the thick green line does not extend above Revenue by Month, Category, otherwise the visualization you are moving will appear above both the bar chart and donut.
Click the Customer Age Group visualization. Just like the previous time, the image will have a border when selected. Hover your cursor until you see a 4-way arrow icon.
Left-click and drag the visualization into the bottom left of the canvas under the Revenue by Month, Category visualization until you see a thick green line.
Your canvas should look like the following:
Expand one of the visualizations.
Click the Revenue by Date visualization. Hover your cursor over the left border until you see the double-sided arrow icon.
Left-click and drag the edge of the visualization to the left to increase the size. Your canvas should now look like the following:
Adjust the project colors used in the visualizations.
- Select Workbook Properties from the Workbook Menu on the top right of the screen.
.
- Click Default in the Color Series option.
- Scroll down and select Tokyo. Then select OK.
All the visualizations will now reflect the new color scheme, across all canvases!
You may have noticed that in addition to a wide variety of options and configurations, there is also the ability to create custom palettes, which enables users to display results however they wish. For this example, we made changes at the project level for a quick and easy change, but colors and display options are also available at the visualization level to allow for full flexibility.
Reset the color palette.
To avoid confusion with coloring in prior exercises, let's go ahead and reset the color palette.
- Select the Workbook Properties option in the Workbook Menu in the top right of the screen and select Tokyo in the Color Series, then select Default (Redwood) and select OK.
Rename the Canvas.
Since Workbooks can have multiple Canvases, it makes sense to give each canvas a descriptive name.
- Select the down arrow icon in the Canvas 1 tab on the bottom left of the screen.
- Rename Canvas 1 with a meaningful name such as Revenue Analysis by clicking in the box and typing it in. Press Enter to confirm your change.
Analyzing Visualizations with Trend Lines and Trellis Rows
As mentioned earlier, as the KoolKart CMO, you have now started to evaluate your revenue data. However, you want to make sure that it has some relation to the social media campaigns that you launched during November, January, February, and April. At first glance, it does look like revenues are increasing following those campaigns. But you want to make sure things are also trending upwards overall.
Luckily, Oracle Analytics can help. So, let's take a closer look at our numbers!
Add a trend line to your Revenue by Date visualization.
- Right-click anywhere in the Revenue by Date visualization and select Add Statistics, then select Trend Line.
You should now see a trend line within the Revenue by Date visualization.
- Click the Trash Can icon in the Grammar Panel to remove the trend line.
Upon adding the trend line, hopefully you also notice the ability to add other types of analyses such as: clusters, outliers, reference lines and forecasts, as well as the ability to fine tune those analyses. The platform is highly intelligent and robust, and allows for the deepest levels of analysis, some of which we will cover in later tasks.
It looks like revenues are trending upwards and have been healthy for the past 6 months! But let's see if that holds true for all product categories or if our campaigns have had more success in certain areas than others.
Further analyze your Revenue by Date visualization by breaking it out, according to product category.
Trellis Rows and Trellis Columns are mechanisms by which we can breakdown an analysis by creating multiple versions of that analysis for distinct entities. In the example being used, there is an initial line chart looking at revenues over time. Adding a Trellis Row for product category creates a line chart of revenues for each product category. See below for a screenshot.
- Make sure the Revenue by Date visualization is selected by clicking it. Now, select Category from the Data Panel on the left. Drag and drop it into the Trellis Rows property of the visualization.
Your line chart should now be comprised of 4 rows, each of which is its own line chart for a specific product category.
Now, this is interesting!
We knew overall revenues were trending upward and that certainly seems true for the Books & Audible and Movies & Music categories. However, Clothing & Shoes seems to have remained flat while Electronics & Computers are trending down! Electronics & Computers makes up a good portion of our overall revenues too, so we should really take a closer look at all of this.
We should talk with our team, but what is the best way to share this information?
Sharing a Story with Visualizations and Insights
Add an Insight to your project.
- Click the Present workbook Tab in the top center of the screen.
You will see the following screen:
Create a presentation for your Insight.
- Click the canvas menu icon and select Add To Story.
The Canvas Properties Panel can be found on the lower left of the screen.
This panel exposes properties such as the Page Title and Description.
- Click the Description box in the properties panel and type Clothing & Electronics are not following the overall sales trend in the description area. Click anywhere outside the box.
Let's also give the Insight a meaningful name which by default is the canvas name.
- Click Page Title in the panel if you would like to change the name to something like Clothing & Electronics.
The Insight is now ready to be shared, which we will cover later. If you have not already saved your workbook, let's make sure to save our work at this point.
Save your workbook.
Select the Save button from the Workbook Menu in the top right of the screen.
Select Save As in the pop-up menu and type in KoolKart Sales as the Name and click Save.
Congratulations! You have now successfully created compelling visualizations with Oracle Analytics. Proceed to the next task to learn and try even more robust features!
Modifying visualizations properties
Now, let's review the property panel for accessing visualization properties. You will find yourself using them in the upcoming labs, so now is a good time to visit them.
- Click the Visualize workbook tab.
- Click the Properties icon in the Grammar Panel.
We now see properties for the selected visualization on the canvas. Changing or modifying default properties is a great way to add value to visualizations.
General: Allows you to edit the title, visualization type, and line type.
Edge Labels: Allows you to edit labels within your visualizations such as Trellis Rows or Trellis Columns.
Axis: Allows you to adjust the labels for different axes on your visualization. This includes the labels axis, values axis, and secondary values axis.
Values: Allows you to adjust the measures you have included in your data set. This includes the aggregation method, number format, and Y2 axis (if it applies).
Date/Time Format: Allows you to adjust how date/time column data is shown in your visualizations.
Filter: Allows you to filter your visualizations.
Task 3 – Analytics Data Blending
In order to complete all the steps of this task, you must have completed Task 2.
Objectives
- Learning to Wrangle Data Sets
- Joining Multiple Data Sets for analysis
Scenario
As KoolKarts CMO, you have been evaluating revenue data to see if your team's social media campaigns over the past 6 months have had a positive impact.
So far, you have found that Clothing & Shoes and Electronics & Computers product categories are the main contributors to sales in the past 6 months. However, they are not following the overall positive sales trends despite your social media campaign efforts.
This is concerning. You want to look at what is going on in the social media campaigns to understand the reason behind this.
Let's look at the past 6 months of Oracle Data as a Service (DaaS) feeds on a KoolKart topic mentioning its product categories. These feeds come from numerous public social media sources including Facebook, Twitter, and others.
Adding Additional Data Sources to a Workbook
- Download the DaaS Social Feeds-KoolKart.xlsxfile.
http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/DaaSSocialFeedsKoolKart.xlsx
This is the same process that you performed earlier when downloading the [KoolKart Sales Data.xlsx] file.
Click copy the link to the file. In Luna you will have to open a new tab on your browser window and see if you can paste the link. The files are saved under Home/Download.
Before adding another data source to your workbook, the dataset must be created.
- Click the Data workbook tab.
- Click the Add Dataset button on the upper right.
- Select Create Dataset.
- Select Drop data file here or click to browse.
When you upload a file, it is treated as a dataset. When you add a new data source to an existing workbook, the system displays all the datasets previously uploaded. This feature allows re-use of datasets previously uploaded for other workbooks. For this task, we need to upload a new spreadsheet.
In the file browser that pops up, select the DaaS Social Feeds-KoolKart.xlsx file.
This file contains Indicator words from posts to your social media pages or posts mentioning your business. It includes the total number of times that the Indicator word was mentioned and whether it was mentioned in a positive tone, neutral tone, or negative tone. It is broken down by Date and Category. We are going to look at this data in conjunction with the already loaded revenue data to see if we can identify how social media might be impacting sales.
- Click OK.
Save the dataset with an appropriate name.
Click the back button to access the workbook.
Open the workbook.
Click the Edit button on the upper right to enter Author Mode.
- Click the Data workbook tab to add the newly created dataset to the workbook.
- Click the Add Dataset button on the upper right.
Select the DaaS Social Feeds-KoolKart dataset.
Click Add to Workbook.
The Data Panel now displays data elements from both datasets. In addition, notice that the system was intelligent enough to see that the Date column is common between the first dataset (i.e. KoolKart Sales spreadsheet) you uploaded into your workbook and the new dataset (DaaS Social Feed KoolKart spreadsheet) that was just uploaded. Therefore, the system sets this column to Match. This, in turn, will enable you to create visualizations combining attributes and measures from both datasets at the same time.
No further changes are needed with this step. This is just to allow you to recognize that the system will auto match datasets where possible.
Now that we have added a second dataset to the workbook, be sure to save the workbook.
- Save the workbook.
Updating Aggregation Logic within a Data Source
Modify how Analytics will aggregate some of the fields by default.
Oracle Analytics will try to identify the best aggregation method (e.g., Average, Sum, Count) for Measures within a dataset. However, if you want to use an alternate aggregation method, the system allows you to set the logic for every column. Now, we want to look at averages as opposed to sums when evaluating what percentage of social media comments are of a certain tone.
- Open the new DaaS Social Feeds-KoolKart dataset with a right click and select open.
- Click the Percent Positive Tone column header.
- From the Percent Positive Tone properties in the Data Panel, select Sum (to the right of Aggregation ) and then select Average to change the aggregation method.
- Repeat the above steps for Percent Neutral Tone and Percent Negative Tone.
Because we will be using Total Number of Snippets in a visualization, let's check how it is aggregated.
- From the Total Number of Snippets check that the aggregation method is Sum.
- Save the workbook.
Update an Attribute to a Measure.
- Upon importing data, Oracle Analytics will automatically identify numeric columns as measures. If the Total Number of Snippets column is not a measure, continue with the steps below. Else, skip this step.
There are two methods to do this:
- Select the A icon on the Total Number of Snippets header and select Measure.
- or go to the Data Panel, select Treat As and change it to Measure.
- Save the workbook.
Add a new visualization to a new Canvas.
- You can always use the back button to return to the Oracle Analytics homepage.
- Open the workbook and click the + sign next to the current canvas name at the bottom of the screen.
- Select the Visualize tab at the top right of the screen.
- Select Mentions Category from the DaaS Social Feeds KoolKart data source from the Data Panel.
Hold down the Control key and select Total Number of Snippets. Right-click and select Create Best Visualization.
The system will have determined that a Bar Chart is the best option for displaying this data.
We seem to have enough volume of activities (i.e., snippets) for each category, so that's good. However, the format of the Mentions Category data seems to be different from the format of the Category data in our KoolKart Sales Data (e.g., Clothing, Shoes vs. Clothing & Shoes). So, it looks like we need to figure out a way to "adjust" the data.
Wrangling Sales Data to Align it with Social Media Data
Modify the data in the social media data source to allow it to be merged with the sales data.
Data wrangling enables you to manipulate the data in an existing raw dataset for easier consumption by tools such as analytics and reporting platforms. This eliminates the task of having to go back and make changes to your source tables/files (in our case, the Excel file), and then re-uploading again.
- Select the Data workbook tab.
- Click the Edit Icon, pencil icon to edit the KoolKart Sales Data dataset.
As previously stated, the way Category is displayed here (i.e. with ampersands aka &) is different from how it is displayed in the DaaS Social Feeds KoolKart dataset.
- Select the Ellipsis icon in the Category column of the KoolKart Sales Data dataset.
- Select Edit.
You will see an expression editor, which we will use to replace the ampersands in Category with commas.
We will walk you through the exact expression for this lab. Please notice the editor supports a wide range of functionality for structuring data to suit your analytical needs.
- Delete the existing content in the Expression Box (e.g., Category). Expand the values under String in the Function Menu.
- Scroll down and select (e.g., double click) Replace.
You can also search Replace in the search bar.
The system will show you the required format of the function. Click the first occurrence of expression and type in "Categ". The pop-up menu will provide suggestions.
Click the A Category KoolKartSalesData suggestion. The A indicates that the data item is an attribute, and Category is from the KoolKartSalesData* dataset.
Click the next occurrence of expression and type in '&' (include the single quotes).
Click the next occurrence of expression and type in ',' (include the single quotes).
The expression should now appear as: REPLACE(Category, '&', ','). The function should be validated, automatically.
- If not, click the Validate button.
- After validation, click Add Step to apply your changes.
The Category field should now appear with commas.
- Save the workbook.
Using Data Source Diagrams and Creating Additional Join Logic
In the last exercise, we modified some of the data in the KoolKart Sales Data dataset to ensure that it aligned with the DaaS Social Feeds-KoolKart dataset. Now that Category / Mentions Category appears in a similar way in both datasets, we can tell the system to join the two datasets.
Let's build an additional join (e.g., matching logic) between the KoolKart sales and KoolKart social feeds datasets.
- Click the Data workbook tab at the top of the screen.
The workbook Data Diagram is now displayed.
The Data Diagram displays a visualization of the datasets being utilized within the workbook. Currently, we are using two Excel files, therefore you will see those represented. You can also see the joins between the two datasets. The system automatically matched the files according to date. The number 1 indicates that there is one join between the datasets. Now that we have made some changes to the data, we can also join the datasets on Category and Mentions Category.
Click the number 1 in the diagram.
In the Blend Data pop-up, click Add Another Match.
You can now identify the logic to join the datasets.
- In the first column of the new row, use the dropdown to select Mentions Category.
- In the second column, select Category.
You may have noticed that only Attributes are displayed. This is by design as Measures are for aggregation - not dimensional analysis or joins. If need be, you can always modify a column or create a copy of it in another format (e.g., a dataset could have a column defined as an Attribute and a copy of that column defined as a Measure with count aggregation).
- Click OK.
The data source diagram will now show there are 2 joins between the datasets.
Task 4 – Visualize and Present Insights
Objectives
- Learn how to use calculations
- Using different visualizations to obtain insights that tell a meaningful story
Analyzing Sales and Social Media Data
Creating a calculated field for use in visualizations.
Now, we have data that spans both social media feeds and revenue figures. This will help us dig into their relationship.
After creating some visualizations, we will now dive into more advanced capabilities. For starters, activity on social media is subject to large spikes. To make it easier to evaluate, we can convert it to a logarithmic scale.
Large and varied datasets such as social media activity are often best handled using a log scale, where values are evaluated based upon multiplicative factors as opposed to static values.
- Navigate to the Visualize workbook tab.
- Click the + icon on the top right of the Data Panel, then click Add Calculation.
In the Expression Editor, type in Activity(log scale) as the Name.
Enter the following expression into the Expression Box: log(Total.
You must type this in for the auto-complete suggestions to display. You could also use the function list on the right and field search as demonstrated in previous exercises.
- Select Total Number of Snippets from the pop-up.
Select Validate.
Assuming that the calculation was valid, select Save.
Use the calculation in a visualization to analyze social media volume on a logarithmic scale.
Select the Canvas on the bottom of the screen.
Click the drop-down arrow and then select Clear Canvas on the pop-up menu.
- Select Date from the DaaS Social Feeds-KoolKart dataset in Data Panel.
Press the Control(Windows) or Command(Mac) key and also select Activity(log scale). Right-click and select Create Best Visualization.
The system will determine that a line chart is the best way to visualize this data.
Now, create a copy of the visualization, so that we can quickly analyze the date information in our dataset.
- Right-click in the white space of the visualization (or use the Visualization Properties Menu in the top right of the visualization). Select Edit then Copy Visualization.
- Right-click in the white space of the visualization (or use the Visualization Properties Menu in the top right of the visualization). Select Paste Visualization.
You should now have two copies of the line chart on your canvas.
- Rearrange the visualizations by dragging one above the other.
- Click to select the Activity(log scale) visualization, then select Revenue from KoolKart Sales Data in the Data Panel (make sure nothing else is selected) and drag it directly on top of Activity(log scale) in the Grammar Panel under Values (Y-axis) of the top visualization.
Doing so should have replaced Activity(log scale) with Revenue. If it was added as another Y-axis measure, just hover over the Activity(log scale) measure and click the X that appears in the right corner to remove it.
In our earlier analysis, we established that although certain product categories have negative revenue trends, the overall trends are positive and would appear to correlate positively with our social media campaigns.
In this visualization, we can see the Revenue by Date and Activity (log scale) by Date visualizations seem quite close in terms of trending, thus helping validate the hypothesis that the social media campaigns were helpful overall.
Now, we will drill into the categories that were not performing well to see if we can find out what is going on. Specifically, let's look at Clothing & Shoes and Electronics & Computers.
- Include Category in the visualization as a Trellis Column. Select Category from KoolKart Sales Data in the Data Panel (make sure nothing else is selected) and drag it into the Trellis Columns property.
Your visualization should now look like the following:
Add Category to the Activity(logscale) by Date visualization.
- Select the Activity(log scale) by Date visualization, then select Category from KoolKart Sales Data in the Data Panel (make sure nothing else is selected) and drag it to the Trellis Columns property.
Letʼs look at the visualizations.
In looking at the Activity(log scale) by Date, Category visualization, the overall trend is flat - meaning there isn't much social media activity for the Clothing, Shoes category despite our social media campaigns. Revenues for that category also seems flat.
Surprisingly, there's quite a bit of social media activity for the Electronics, Computers category and it is growing, but the revenues are trending down.
This is confusing! We should check the tonality/sentiment of the social media activities because while there is social media activity, it does not seem to be all positive.
Create a new visualization to evaluate the tone of social media posts.
- Select the Activity(log scale) by Date, Category visualization. Right-click in any blank area and select Edit then click Copy Visualization.
Right-click below the Activity(log scale) by Date, Category visualization and select Edit then click Paste Visualization.
Select the copy of the visualization that you have just created.
Then, select Percent Positive Tone (make sure nothing else is selected). Press and hold Control(Windows) or Command(Mac) and select Percent Neutral Tone and Percent Negative Tone.
Drag and drop them directly on Activity(log scale) in the Values (Y-axis).
This should have replaced Activity(log scale) with the 3 new columns. Instead, if they were added as additional Y-axis measures, just hover over the Activity(log scale) measure and click X to remove it.
Change the visualization type.
- Select the drop-down menu next to the current visualization type (which is Line) and select Area.
The visualization should look like this.
Letʼs take a closer look at this visualization.
- Select Maximize Visualization in the top right of the visualization.
There seems to be a lot of negative sentiment towards Electronics & Computers, starting in the 2nd week of January. We should analyze the indicators to try to further understand the problem.
Evaluate the tone of specific social media posts using a Tag Cloud visualization.
- Click the X on the upper right of the visualization to minimize it.
- Select Indicators. Press and hold Control(Windows) or Command(Mac). Select Total Number of Snippets. Right-click and select Pick Visualization.
- Select Tag Cloud.
A tag cloud visualization is an interesting way to breakdown KoolKart's most common Indicators on social media.
Let's narrow our focus down to just Electronics, Computers and filter the dates to after mid-January.
- Select Mention Category, drop it on Filters in the Grammar Panel and select Electronics, Computers.
Click outside the filter to exit the menu.
Select Date and drag it into the Filter area. Don't replace the Mentions Category filter.
Set the Start Date (using the calendar or by typing) to January 1, 2016. Set the End Date to April 29, 2016. Click anywhere outside the filter to exit the menu.
Color the Tag Cloud by intensity of negative sentiments.
- Select Percent Negative Tone and drag it into the Color property.
We now have a visualization of Electronics, Computers related social media posts between January and April 2016 focused on identifying issues from negative feedback.
In looking at the size of the tag, we can see key indicators like Defective, Delivery, and Recall are frequently mentioned and by looking at the darker color, we can see that Customer Service and MEDIA: Viewing seem to be contributing to the bulk of negative sentiment. This is another insight to share with the team.
Depending on the size of your monitor and your screen resolution, the tag cloud could be laid out slightly different, but relative size and color should be the same as above.
Finalizing the Project and Insights on Sales and Social Media Data
Data Action links are responsible for passing context values from visualizations as parameters to external URLs or filters to other projects. When a link navigates to a workbook, the data context is displayed in the form of canvas scope filters in the filter bar. The links data context may include attributes associated with the selections or cell from which the link was initiated.
- Accessing the Data Actions DialogBox. Click the Present button on the top of the screen.
- Hover over Canvas2 on the right side of the screen. Click the ellipsis then select Add To Story.
Give the Insight a meaningful name.
- Double-click the Title Box and type Social Media Sentiment.
- Navigate to the Canvas Settings and check the Description Box.
- Double-click the Description Text Box and type in the following 2 lines (or copy this and paste):
Seeing increased social media activity for Electronics & Computers starting mid-January.
Negative sentiment seems to be around Customer Service and MEDIA: Viewing.
- Click anywhere outside the box to save it.
Save the workbook.
- Select the Save Button. Select Save.
- Select the Preview button on top of the screen to present the findings using Presentation Mode.
This will remove all the editing sections from the interface and convert your Insights into a dashboard style presentation that you can share with others.
- Exit Presentation Mode by selecting Edit on top of the screen.
- Save the workbook.
You are now ready to share your visualizations as a file or you can print a hard copy.
- This can be done by clicking the Export button and choosing your desired option (for our purposes it is not necessary to export).
Task 5 – ML and Prediction
Objectives
- Learning how to create data flows using the Machine Learning capabilities of Oracle Analytics
What is Machine Learning?
Machine learning, at its most basic, is the usage of algorithms to parse data, learn from it, and then make a determination or prediction about something. The Machine Learning capabilities of Oracle Analytics allow you to make predictions based on your dataset. Data Visualization also provides scripts to train data models that you can add to other data sets to predict trends and patterns in data.
Scenario
When customers purchase products through a partner, KoolKart would like to include a discount coupon in the shipping confirmation email as an incentive for users to subscribe to the KoolKart mailing list. Now, Isla Stuart, who is a Digital Marketing Specialist, needs to understand the customer demographics and figure out how big the incentive should be.
To do this, Isla has already obtained 4 different data sets: Orders, Customers, Subscriptions, and Partner Sales. She wants to look at subscription patterns on the KoolKart website and develop a model that looks at past subscription data to determine how Customer Age Group, Customer Gender, Customer Country, and purchased Product Category influence the likeliness to subscribe.
Isla knows that she needs to use KoolKart Orders and KoolKart Customers data sets together. Most importantly, she decides to utilize the Data Flow functionality of OAC that allows her to create a data set tailored to her needs.
Data flows enable you to organize and integrate data to produce curated datasets that users can visualize.
Data flows can:
- Create a dataset
- Combine data from different sources
- Aggregate data
- Train machine learning models or apply a predictive machine learning model to data
Creating a Data Flow for KoolKart Orders and Customers
- Download the KoolKart Orders.xlsxfile.
http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/KoolKartOrders.xlsx
This is the same process that you performed earlier when downloading the [KoolKart Sales Data.xlsx] and [DaaS Social Feeds-KoolKart.xlsx] files.
Click copy the link to the file. In Luna you will have to open a new tab on your browser window and see if you can paste the link. The files are saved under Home/Download.
Add Datasets and create a Data Flow.
Open the Oracle Analytics homepage.
Click the Create button.
Select Create Dataset.
Select Drop data file here or click to browse and select the KoolKart Orders dataset.
- Click OK to add the KoolKart Orders.
- Save the KoolKart Orders dataset.
We will need another dataset.
- Download the KoolKart Customers.xlsxfile.
http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/KoolKartCustomers.xlsx
Again, this is the same process that you performed earlier when downloading the Excel files.
Add the KoolKart Customers file by selecting Create Dataset, followed by Drop data file here or click to browse and then selecting Add, like we did in previous steps.
Save the KoolKart Customers dataset.
Create a Data Flow.
From the Oracle Analytics homepage, click the Create button.
Select Data Flow.
- From the Add Data window select KoolKart Orders and click Add.
You are now in the Data Flow Editor and have one of the two datasets that will make up our data flow. Let’s add the second dataset.
- From the Data Flow Editor, click the Add Data icon and drag it to the editor.
From the Add Data window select KoolKart Customers and click Add.
The Data Flow Editor should look like this:
- Click Join and add it to the data flow at the + sign.
Ideally, Oracle Analytics should detect the joining logic, but in case it doesn't, you will see an Incomplete join message.
- Click the circle to complete the join.
If you've successfully made the join, the data flow will reflect the image below.
The Movies & Music category is not required for the analysis. So, we can filter it out of the Data Flow.
- Select Filter and drag it between KoolKart Orders and Join.
- Select the + and select the Category attribute.
- Select Movies & Music to move it to the Selections. Click the Exclude Selections box.
You will want to exclude duplicated data from the data flow. To do this, add Select Columns to the Data flow and exclude the duplicated CustomerID.
- Select Select Columns and drag it to the + after Join.
- Select Customer ID_1 and click Remove Selected.
Group Sales data into Regions.
- Select + next to Select Columns and then select Group.
- Click Select column and select Customer Country.
- Set the Group Name to Sales Region. Then, name the first group NA and add Canada and USA to the group.
Click + Group to add a second group and name it EMEA. Add France, United Kingdom, and Germany to the group.
Click + Group to add a third group and name it APAC. Add Japan to the group.
- Now, drag Save Data Set into the editor.
Save the data set as KoolKart Orders & Customers.
Set Order ID and Customer ID to attributes.
- Save the Data Flow by clicking the Save button in the top corner.
- Name the data flow KoolKart Dataflow.
- Click Run Data Flow to execute the data flow.
This message confirms that the data flow has run.
Using the KoolKart Dataflow for Sales Analysis
Return to the Oracle Analytics homepage.
Create a new workbook by clicking the KoolKart Orders & Customers dataset ellipsis menu or clicking the Create button and selecting Create Workbook.
- Click the Data workbook tab.
- Click the KoolKart Orders and Customer ellipsis menu to Open the dataset.
- Select the Customer Country column.
- Select the recommendation Enrich Customer Country with continent.
Notice, how the recommendation allows grouping of countries into sales regions with just one click. Since, we have already grouped our data, let's return to visualizing.
Save the dataset.
You still need to create this workbook. Click the Create Workbook button.
- Click the Visualize workbook tab.
Create Visualizations Order Amount by Date and Category.
- Select Order Amount and Order Date from the Data Panel, then select Pick Visualization.
- Select the Line chart.
- Double-click the Category attribute to add it to your visualization.
- Switch the chart from a Line chart to an Area chart.
- Click the arrow to the left of Order Date in the Data Panel. This will expose additional Order Date attributes.
- Select Month and drag it to Category (X-Axis) to replace Order Date.
Create a visualization for Order Amount by Category and SubCategory.
- Select Category, Sub Category, and Order Amount. Right-click and select Pick Visualization. Select Tree Map.
Create a Visualization for Order Amount by Order Date.
- Select Order Amount and Order Date. Right-click and select Create Best Visualization.
- Right-click the visualization and select Add Statistics then select Trend Line.
- Right click in the visualization and select Add Statistics then select Forecast.
The visualization should look like the graphic below.
- With the Order Amount by Order Date visualization still selected, go to the visualization settings in the Grammar Panel and select the arrow icon on the right that represents the Analytics option.
- Select the Forecast dropdown and enter 6 in the Periods textbox. Change the Model to ARIMA.
Your canvas should reflect the graphic below.
- Hover over the Canvas 1 tab with your cursor and select the downward arrow to Rename. You could also right-click to select Rename.
- Rename the canvas to "Sales".
- Save the workbook and name it "KoolKart Machine Learning".
Creating a visualization for Customer Analysis
- Click the + next to the Sales canvas at the bottom of the screen to add a new canvas to the workbook.
Create a visualization of Order Amount by Customer Age Group.
- Select Order Amount and Customer Age Group. Right-click and select Create Best Visualization which will display a bar chart.
- Change the chart type to a Donut chart.
Notice how Customer Age Group is now displayed as color slices of the donut.
- Select Order Amount and Customer Country. Right-click and select Pick Visualization.
- Select the Map Chart.
Your canvas should reflect the graphic below:
- Navigate to the property settings in the Grammar Panel and select Data Layers. Here you can also adjust the map layers, the layer type, and transparency of the map.
No changes need to be made to the data layers section.
- Select Order Amount and Customer Gender. Right-click and select Pick Visualization.
Select the Pie Chart.
Hover your mouse in the Color property box in the Grammar Panel until the drop-down arrow appears on the right side. Click the drop-down arrow for the Drop Target Options and then select Manage Assignments.
- Click the F and then select a different color from the default Redwood palette (or select a color hex-value in the box on the right), then click Done. Repeat the process for M.
- Drag-and-drop the pie chart under the donut chart.
- Name the canvas Customers and save the workbook.
Using the Explain feature to create a visualization
The Explain feature allows you to uncover insights and patterns about a particular column within a dataset.
- Right-click the Category attribute and select Explain Category.
Look at Explain Category window and observe the basic insights.
The Explain feature parses the dataset and presents different insights about the Category column. Listed below is a description of what Basic Facts, Key Drivers, Segments, and Anomalies entail:
Basic Facts - Shows basic distribution of the data element (attribute or measure) values across the dataset and its breakdown against each one of the measures in the dataset.
Key Drivers - Shows data elements (attributes or measures) that are more highly correlated to the outcome for the selected data. You will see charts showing the distribution of the selected attribute value across each of the correlated attribute values.
Segments that Explain - Shows segments or group in the dataset, after examining all the records that can predict the value of the selected data element. You can select a particular segment or group and then continue to analyze it.
Anomalies - Shows group of anomalies or unusual values in the dataset that you can relate to the selected data element (attribute or measure). You can review and select specific anomaly groups.
Add a visualization to the canvas using the explain feature.
- Hover or mouse over the top-right corner of the Basic Facts about Category visual to expose a check mark, click the check mark, then click Add Selected.
If you've successfully added the visualization, your canvas should reflect the graphic below.
- Delete the Explain Category canvas by right-clicking the Canvas tab drop-down arrow at the bottom of the screen and selecting Delete Canvas.
Creating a Dataflow and Predicting Subscriptions
Isla now wants to predict the likeliness of new customers to subscribe and to directly correlate the amount of the coupon to how much they are likely to subscribe anyway.
She wants to take advantage of existing data gathered by looking at subscription patterns on KoolKart website and to develop a model that looks at past subscription data to determine the influence of Customer Age Group, Customer Gender, Customer Country, and purchased Product Category into the likeliness to subscribe.
Download the
KoolKart Subscription.xlsxfile.http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/KoolKartSubscriptions.xlsx
Select the Create button on the Oracle Analytics homepage.
Select Create Dataset.
Select Drop data file here or click to browse and choose the KoolKart Subscription dataset.
Select OK to add the KoolKart Subscription dataset.
- Save the KoolKart Subscription dataset.
Use the Data Flow Editor to create a model.
Return to the Oracle Analytics homepage and click Create.
Click Data Flow.
Select the KoolKart Subscription dataset and click Add..
From the Data Flow Panel, select Train Binary Classifier and drag it to the + next to the KoolKart Subscription dataset.
- Select the Naïve Bayes for Classification script from the pop-up menu. Click OK.
Your data flow should look like this:
- Set a prediction target by clicking Select a column and select Subscribed.
- Click Save Model and name the model Subscribed Prediction Model.
- Save the Data Flow and name it Subscribed Predictions and click OK.
- Click the Run Data Flow button on the upper right to execute the data flow.
- Download the KoolKart Partner Sales.xlsxfile.
http://api-v1:3000/api/v1/labs/a1fc2175-9720-4345-8cb8-c0270a08c9d1/gitlab/tutorial/files/KoolKartPartnerSales.xlsx
Create a Scenario that will predict subscription confidence.
Return to the Oracle Analytics homepage.
Click the Create button and select Dataset.
In the Create Dataset pop-up, select the Drop data here or click to browse and navigate to the KoolKart Partner Sales file or drag and drop the file onto the file icon.
In the Create Dataset Table from KoolKart Partner Sales window click OK.
- In the OrderID column click the # and change Measure to Attribute.
Save the dataset as KoolKart Partner Sales.
Return to the Oracle Analytics homepage and open the Machine Learning workbook.
Be sure that you are in Author Mode.
- Create a new canvas and rename the canvas as Subscription Prediction.
- Click the + in the Data Panel and select the Add Data option.
- Select the KoolKart Partner Sales dataset and click Add to Workbook.
The new dataset will now be displayed in the Data Panel.
- Click the + on the Data Panel.
- Select the Create Scenario option.
- In the Create Scenario – Select Model pop-up, select the Subscribed Prediction Model. Click OK.
- Select the KoolKart Partner Sales dataset from the Create Scenario – Map Your Data menu.
Select Done.
Select Customer Name and PredictionConfidence from the Data Panel, then right-click to select Pick Visualization.
- Select Pivot Table.
- Drag Subscribed Prediction to Color in the Grammar Panel.
If you have successfully completed these steps, you will see the visualization below. From looking at the chart, we can predict which customers are more likely to get a subscription.
Congratulations! You have completed the workshop
In this workshop, you were able to quickly assess the effectiveness of your social media campaigns on sales and how sales trends correspond to social media tone/sentiment. We also used machine learning abilities to predict the likelihood of customer subscribing to KoolKart's mailing list.