Homework 3: Clustering Analysis on Online Retail Sales
Let’s revisit the online sales data. Here is the code to load the data and create some additional variables. Use the following code to start your work. Notice that there are some additional variables:
- itemID = 4-digit code that identifies the product sold
- l_date = reference date used to count the number of days since the latest purchase (set to December 31, 2011)
- Recency = the average number of days since the latest purchase, by item
- Frequency = the number of transactions by item
- Monetary = total amount of sales by item
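As a rough illustration of how the three summary variables relate to the raw transactions, here is a minimal sketch in Python (standard library only). It is not the course's SAS code. The toy rows, the field layout, and the function name `summarize_by_item` are all hypothetical, and it assumes Recency is the mean of (l_date minus invoice date) in days over an item's transactions, which is one possible reading of the definition above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy transactions: (itemID, invoice_date, sales_amount).
# These rows are made up for illustration; they are not the homework data.
transactions = [
    ("1001", date(2011, 12, 1), 20.0),
    ("1001", date(2011, 12, 21), 30.0),
    ("2002", date(2011, 11, 1), 5.0),
]

L_DATE = date(2011, 12, 31)  # reference date stated in the assignment


def summarize_by_item(rows, l_date=L_DATE):
    """Return {itemID: (recency, frequency, monetary)}.

    Assumption: recency is the mean of (l_date - invoice_date) in days
    over all transactions of the item.
    """
    days = defaultdict(list)
    total = defaultdict(float)
    for item_id, invoice_date, amount in rows:
        days[item_id].append((l_date - invoice_date).days)
        total[item_id] += amount
    return {
        item_id: (sum(d) / len(d), len(d), total[item_id])
        for item_id, d in days.items()
    }


item_summary = summarize_by_item(transactions)
```

In this toy data, item 1001 has two transactions and item 2002 has one, so Frequency is just the row count per item and Monetary is the per-item sum of sales.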
- Explore the data and answer the following questions (using the “sales1” data)
- Find the 5 best-performing items in terms of number of transactions and in terms of total sales amount.
- Create a categorical variable from the unit price, with the bands 0-1.99, 2-9.99, 10-99.99, and 100+.
Find the total number of transactions and the total sales amount by category.
- Find the bestselling item in terms of transactions, then identify which customer bought it the most and in which country (excluding the UK) it sold the most.
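The unit-price banding in the questions above can be sketched as follows. This is only an illustration in Python, not the required SAS solution. The sample rows and the names `price_category` and `totals_by_category` are hypothetical, and it assumes each row is one transaction whose sales amount is unit price times quantity.

```python
from collections import defaultdict


def price_category(unit_price):
    """Map a unit price to one of the four bands named in the assignment."""
    if unit_price < 2:
        return "0-1.99"
    if unit_price < 10:
        return "2-9.99"
    if unit_price < 100:
        return "10-99.99"
    return "100+"


# Hypothetical rows: (unit_price, quantity). Made up for illustration.
rows = [(0.85, 10), (2.00, 3), (9.99, 1), (15.00, 2), (120.00, 1)]


def totals_by_category(rows):
    """Return {category: [transaction_count, total_sales_amount]}."""
    out = defaultdict(lambda: [0, 0.0])
    for unit_price, quantity in rows:
        cat = price_category(unit_price)
        out[cat][0] += 1                      # one row = one transaction
        out[cat][1] += unit_price * quantity  # assumed sales amount
    return dict(out)


summary = totals_by_category(rows)
```

Using strict upper bounds (`< 2`, `< 10`, `< 100`) avoids gaps for prices such as 1.995 that fall between the printed band edges. Negative or missing prices are not handled here and would need a decision of their own.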
- Two Variable Clustering Analysis (Use the “item_summary” data)
Let’s find the best way to classify the items using two variables, frequency and monetary. Use the clustering method to find the most suitable clusters. (Show all the results from SAS, explain how you arrived at the number of clusters, and describe why you prefer the one you chose.)
Minimum Required work:
- Potential issues with outliers or other problems in the data
- Show the best number of clusters by trying various cluster settings
- Use graphs to illustrate the different clusters
- Name each group using the summary statistics by cluster
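One common way to compare "various settings" for the number of clusters is to run k-means for several values of k and look at how the within-cluster sum of squares drops. The sketch below illustrates that idea in plain Python. It is not the SAS output the assignment asks for: the toy (frequency, monetary) points are made up, and the deterministic farthest-point seeding is a choice made for this example so that it is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding.

    Returns (labels, centers, sse), where sse is the total within-cluster
    sum of squared distances.
    """
    centers = [points[0]]
    while len(centers) < k:
        # next seed: the point farthest from its nearest existing center
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Hypothetical (frequency, monetary) points forming three loose groups.
points = [(1, 1), (2, 1), (1, 2), (10, 10), (11, 10), (10, 11),
          (20, 1), (21, 1), (20, 2)]

sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
```

Plotting `sse_by_k` against k and looking for the point where the curve flattens (the "elbow") is one heuristic for picking k. A single extreme item can dominate the sum of squares, which is one reason the outlier check comes first.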
- Three Variable Clustering Analysis (Use the “item_summary” data)
Let’s find the best way to classify the items using three variables: recency, frequency, and monetary. Use the clustering method to find the most suitable clusters. (Show all the results from SAS, explain how you arrived at the number of clusters, and describe why you prefer the one you chose.)
Minimum Required work:
- Potential issues with outliers or other problems in the data
- Show the best number of clusters by trying various cluster settings
- Use graphs to illustrate the different clusters
- Name each group using the summary statistics by cluster
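With three variables on very different scales (days, counts, and currency), distance-based clustering tends to be driven by whichever variable has the largest spread. One common remedy is to standardize each variable before clustering. The sketch below shows z-score standardization in plain Python. The sample rows and the function name `standardize` are hypothetical, and whether to standardize at all is a judgment the write-up should justify.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std dev."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [
        tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows
    ]


# Hypothetical (recency, frequency, monetary) rows, made up for illustration.
rfm = [(10, 100, 5000.0), (200, 5, 40.0), (30, 60, 2500.0), (150, 10, 90.0)]
z = standardize(rfm)
```

After this step every column has mean 0 and standard deviation 1, so recency, frequency, and monetary contribute on comparable scales. In SAS, a procedure such as PROC STDIZE can serve the same purpose.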
- Let’s compare the two-variable and three-variable clustering analyses you did above. As the final product of the clustering analysis, answer the following business questions using the clusters:
- The most important group to maximize the sales.
- The most important group to maintain the volumes of transactions.
- Possible promotions to increase sales for each group (creative ideas not from data)
- All answers need to be typed, with appropriate graphs and tables from SAS, in a PDF file.
- Submit your SAS code as a separate file.