Data Mining Project using RapidMiner software

School of Business & Accountancy

(Diploma in Business Practice)

(Administration and Management)

Certificate in Business Applications

BUSINESS ANALYTICS

1      Final Individual Assignment

 

SingaTel is a local telecommunications company providing mobile, internet, TV and fixed-line telephony services.

This year, SingaTel is concerned about the number of customers leaving and subscribing to its competitors. It needs to understand who is leaving, because customer acquisition is a life-and-death matter for most companies, and so are customer retention and its opposite, churn.

The Sales & Marketing Director has engaged the analytics team to identify the common and key attributes of customers who have left or are likely to leave in the near future, and to explain why.

To conduct a data mining analysis, a complete dataset with attributes as shown in Table 1, comprising information on customers who have left the telco since Jan 2017 as well as existing customers, has been made available to you.

In this assignment, you are required to work by yourself on this data mining project with the help of the RapidMiner product suite.

A series of RapidMiner tutorials is in place to scaffold your learning as you embark on this data mining project.

You are required to use the CRISP-DM framework shown below as a guide to complete the project.

 

 

2      Data

The dataset (SingaTel_Customers.xlsx) can be downloaded from the PolyMall portal. Table 1 gives a full description of the dataset including the attributes of the data.

 

| Attribute | Description | Labels/ Values | Data Definitions |
|---|---|---|---|
| CustomerID | Customer number | 4 digits, a dash and 4 letters | |
| Gender | Customer’s gender | Male or Female | |
| SeniorCitizen | Whether the customer is a senior citizen | 0, 1 | 0 = Not senior citizen; 1 = Senior citizen |
| Tenure | Number of month(s) of customer’s subscription | 0, 1, 2, 3, … | 0 = Started subscription in current month |
| MobileService | If the customer subscribed to Mobile service | Yes or No | |
| InternetService | If the customer subscribed to Internet service | Yes or No | |
| TVService | If the customer subscribed to Cable TV service | Yes or No | |
| OnlineSecurity | If the customer subscribed to Online Security service | Yes or No | |
| DeviceProtection | If the customer subscribed to Device Protection service | Yes or No | |
| TechSupport | Did the customer contact Technical Support before | Yes or No | |
| Contract | Is the customer on contract | Yes or No | |
| MonthlyCharges | Monthly charge invoiced to customer | Number | |
| TotalCharges | Total charge invoiced to customer | Number | TotalCharges = Tenure x MonthlyCharges |
| Churn | If the customer has left | Yes or No | |

Table 1: Customers Data
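
The data definitions in Table 1 can be read as simple row-level checks. The following is a minimal Python sketch for illustration only and is not part of the required RapidMiner workflow. The function name check_row, the tolerance value and the sample rows are hypothetical, and the CustomerID pattern assumes that the description means exactly four digits, a dash, then four letters in either case.

```python
import re

# Assumed reading of Table 1: four digits, a dash, then four letters.
CUSTOMER_ID_PATTERN = re.compile(r"^\d{4}-[A-Za-z]{4}$")


def check_row(row, tolerance=0.01):
    """Return a list of problems found in one customer record (a dict)."""
    problems = []
    if not CUSTOMER_ID_PATTERN.match(row["CustomerID"]):
        problems.append("CustomerID format")
    if row["SeniorCitizen"] not in (0, 1):
        problems.append("SeniorCitizen not 0/1")
    if row["Tenure"] < 0:
        problems.append("negative Tenure")
    # Table 1 defines TotalCharges = Tenure x MonthlyCharges.
    expected_total = row["Tenure"] * row["MonthlyCharges"]
    total = row["TotalCharges"]
    if total is None or abs(total - expected_total) > tolerance:
        problems.append("TotalCharges != Tenure x MonthlyCharges")
    return problems


# Hypothetical rows, made up for illustration; not taken from the dataset.
rows = [
    {"CustomerID": "1234-ABCD", "SeniorCitizen": 0, "Tenure": 12,
     "MonthlyCharges": 50.0, "TotalCharges": 600.0},
    {"CustomerID": "12-ABCD", "SeniorCitizen": 0, "Tenure": 0,
     "MonthlyCharges": 30.0, "TotalCharges": None},
]
for row in rows:
    print(row["CustomerID"], check_row(row))
```

A record that satisfies every definition yields an empty list, so any non-empty result points at a row worth inspecting.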

 

 

 

 

 

3      Primary Assignment Tasks

You are required to answer the questions listed below and to provide relevant screenshots, where applicable, as part of each answer:

 

3.1. Business Understanding (10%)

  1. Define the Business Problem and Business Objectives for this project.

 

3.2. Data Understanding (15%)

  1. Start a new process and import the given dataset into RapidMiner using the ‘Read Excel’ and ‘Store’ operators. Note: Use the ‘Import Configuration Wizard’.
  2. Use the ‘Retrieve’ operator to load the dataset, run the process, and comment on the results to identify any issues.
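
As a rough illustration of what inspecting a result set for issues can involve, the sketch below counts missing and distinct values per attribute. It is a hedged Python example, not a substitute for the RapidMiner results view. The function name profile and the sample rows are hypothetical and say nothing about what the actual dataset contains.

```python
def profile(rows):
    """Count missing and distinct values for each attribute in a list of dicts."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            stats = summary.setdefault(name, {"missing": 0, "distinct": set()})
            # Treat None and empty strings as missing; 0 is a valid value.
            if value is None or value == "":
                stats["missing"] += 1
            else:
                stats["distinct"].add(value)
    return {
        name: {"missing": s["missing"], "distinct": len(s["distinct"])}
        for name, s in summary.items()
    }


# Hypothetical rows, made up for illustration only.
rows = [
    {"Gender": "Male", "Tenure": 5, "TotalCharges": 250.0},
    {"Gender": "Female", "Tenure": 0, "TotalCharges": None},
    {"Gender": "male", "Tenure": 5, "TotalCharges": ""},
]
for name, stats in profile(rows).items():
    print(name, stats)
```

An attribute with more distinct values than its documented labels allow, or with a non-zero missing count, is a candidate issue to comment on.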

 

3.3. Data Preparation (20%)

  1. Proceed to use the necessary operator(s) to fix the issues identified in Task (3).
  2. Use an operator to remove attribute(s) that you think will not be a good predictor. Support your decision with strong justifications.
  3. Set the necessary role using an appropriate operator for the attribute that you will need to predict.

 

3.4. Modelling (35%)

  1. In this step, we will need to split the dataset into two different sets, one for training the model and the other for testing the model. Use the ‘Split Data’ operator to do the split. In your own words, explain why there is a need for this step to be done and explain your selection of data for the split.
  2. Add the predictive operator, ‘Decision Tree’, into the process model. Note: Please do not use any validation operator for this task.
  3. Test the accuracy of the process model using the necessary operator(s).
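
The idea of holding back part of the data for testing can be illustrated outside RapidMiner. The sketch below is a minimal Python example under stated assumptions: the 70/30 ratio, the fixed seed and the helper name stratified_split are illustrative choices that this brief does not prescribe, and the records are made up.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key, train_ratio=0.7, seed=42):
    """Shuffle within each label value, then split so that both sets keep
    roughly the same label proportions as the full dataset."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical data: 20 customers who stayed and 10 who churned.
records = [{"id": i, "Churn": "No"} for i in range(20)]
records += [{"id": i, "Churn": "Yes"} for i in range(20, 30)]

train, test = stratified_split(records, "Churn")
print(len(train), len(test))
```

Because the model never sees the test records during training, its accuracy on them gives a fairer estimate of how it would behave on new customers.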

 

3.5. Evaluation (20%)

  1. Evaluate the performance of the model describing in detail every part of the confusion matrix.
  2. Evaluate the Decision Tree results and make any recommendations to the Sales & Marketing Director.
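
The four parts of a confusion matrix, and the measures derived from them, can be sketched as follows. This is a hedged Python illustration, not RapidMiner output. It assumes that Churn = Yes is treated as the positive class, and the labels and predictions are made up.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary label."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1  # predicted churn, customer did churn
        elif p == positive:
            counts["FP"] += 1  # predicted churn, customer stayed
        elif a == positive:
            counts["FN"] += 1  # predicted stay, customer churned
        else:
            counts["TN"] += 1  # predicted stay, customer stayed
    return counts


def summarise(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = sum(cm.values())
    predicted_pos = cm["TP"] + cm["FP"]
    actual_pos = cm["TP"] + cm["FN"]
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "precision": cm["TP"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["TP"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels and predictions, made up for illustration only.
actual = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]

cm = confusion_matrix(actual, predicted)
print(cm)
print(summarise(cm))
```

For a churn problem, the false negatives (customers predicted to stay who actually left) are often the costliest cell, which is why recall is worth reporting alongside accuracy.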

 

 

 

4      Secondary Assignment Tasks

Based on what you have completed in Paragraph 3 above, you are required to identify, on your own, ONE (1) additional attribute that contributed to customers leaving SingaTel or being likely to leave soon, and to explain why. You are not required to repeat all the steps in Paragraph 3. Also, highlight this additional attribute in your Churn Analysis report.

 

Deliverables and Milestones

| Deliverables | Weightage | Milestone |
|---|---|---|
| P2a Churn Analysis: i. Churn Analysis Report in MS Word (not more than 30 pages with images and appendices, if any) | 60% | By Sunday, 17 February 2019, 23:59 |
| P2b Attachments: ii. RapidMiner files; iii. Any other additional files that have been used in the Churn Analysis Report | 40% | By Sunday, 17 February 2019, 23:59 |

 

All submitted materials must be original and created by the individual student. Sources of previously published content, if used, must be properly attributed to the original author.

 

 

Submissions

 

  1. Submit on PolyMall the following (in softcopy):
    • RapidMiner process files with the extensions ‘.rmp’ and ‘.properties’
    • Project Document which contains the written answers and screenshots

 

Submission details will be announced on PolyMall when the deadline draws close.

Note:

You are encouraged to use the following file naming convention for submitted files.

[StudentID]_[YourName]_Project2.doc, [StudentID]_[YourName]_Project2.rmp and [StudentID]_[YourName]_Project2.properties

For example, a student’s files may be named as follows:

  • TA000999A_LimAhBee_Project2.doc
  • TA000999A_LimAhBee_Project2.rmp
  • TA000999A_LimAhBee_Project2.properties

 

 Marking Rubrics

 

Project Tasks 1-11 (in Para 3)

| CRISP-DM | Tasks | Criteria | Weightages |
|---|---|---|---|
| Business Understanding | 1 | Able to identify an appropriate Business Problem that correctly illustrates the problems that the company is facing. Able to identify correct Business Objectives that will help the company solve the business problem. | 5% |
| Data Understanding | 2 | Able to import the given Excel file into RapidMiner and store it in the repository. | 5% |
| | 3 | Able to use the ‘Retrieve’ operator to load the dataset from the repository and identify the correct issues pertaining to the results set. | 5% |
| Data Preparation | 4 | Able to identify the operators that can be added in the process model to fix the issues identified previously. | 6% |
| | 5 | Able to identify any attribute(s) that will not help in building the predictive model. Decisions to omit the attributes must come with strong justifications. | 6% |
| | 6 | Able to set the correct role for the attribute that we need to predict. Target attribute must also be the correct one. | 3% |
| Modelling | 7 | Able to use the ‘Split Data’ operator and set the correct settings. The decision on the settings must be explained. | 10% |
| | 8 | Able to use the ‘Decision Tree’ operator and set the correct settings. | 10% |
| | 9 | Able to use the correct operator(s) to test the training model. | 10% |
| Evaluation | 10 | Able to evaluate the performance from the confusion matrix. Every part of the matrix must be explained. | 20% |
| | 11 | Able to interpret the decision tree results and find any critical insights which can help the Sales & Marketing Director to make decisions. | 20% |
| | | Total: | 100% |

 

Last Updated on February 14, 2019 by Essay Pro