
Data Mining Project using RapidMiner software

School of Business & Accountancy

(Diploma in Business Practice)

(Administration and Management)

Certificate in Business Applications

BUSINESS ANALYTICS

1      Final Individual Assignment

 

SingaTel is a local telecommunications company providing mobile, internet, TV and fixed-line telephony services.

This year, SingaTel is concerned about the number of customers leaving for its competitors. It needs to understand who is leaving, because customer acquisition is a life-and-death matter for most companies, and so are customer retention and its opposite, churn.

The Sales & Marketing Director has engaged the analytics team to identify the key attributes common to customers who have left or are likely to leave in the near future, and to explain why they leave.

To conduct a data mining analysis, a complete dataset with attributes as shown in Table 1, comprising information on customers who have left the telco since Jan 2017 as well as existing customers, has been made available to you.

In this assignment, you are required to work by yourself on this data mining project with the help of the RapidMiner product suite.

A series of RapidMiner tutorials is in place to scaffold your learning as you embark on this data mining project.

You are required to use the CRISP-DM framework shown below as a guide to complete the project.

 

 

2      Data

The dataset (SingaTel_Customers.xlsx) can be downloaded from the PolyMall portal. Table 1 gives a full description of the dataset including the attributes of the data.

 

| Attribute | Description | Labels/Values | Data Definitions |
|---|---|---|---|
| CustomerID | Customer number | 4 digits, a dash and 4 letters | |
| Gender | Customer’s gender | Male or Female | |
| SeniorCitizen | Customer type who is a senior citizen | 0, 1 | 0 = Not senior citizen; 1 = Senior citizen |
| Tenure | Number of month(s) of customer’s subscription | 0, 1, 2, 3, … | 0 = Started subscription in current month |
| MobileService | If the customer subscribed to Mobile service | Yes or No | |
| InternetService | If the customer subscribed to Internet service | Yes or No | |
| TVService | If the customer subscribed to Cable TV service | Yes or No | |
| OnlineSecurity | If the customer subscribed to Online Security service | Yes or No | |
| DeviceProtection | If the customer subscribed to Device Protection service | Yes or No | |
| TechSupport | Did the customer contact Technical Support before | Yes or No | |
| Contract | Is the customer on contract | Yes or No | |
| MonthlyCharges | Monthly charge invoiced to customer | Number | |
| TotalCharges | Total charge invoiced to customer | Number | TotalCharges = Tenure x MonthlyCharges |
| Churn | If the customer has left | Yes or No | |

Table 1: Customers Data
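
The data definition TotalCharges = Tenure x MonthlyCharges gives one simple way to sanity-check rows during Data Understanding. The project itself is to be carried out in RapidMiner; the Python sketch below only illustrates the idea. The sample rows, the helper name `charge_issue`, and the 0.01 tolerance are assumptions made for this example and are not taken from SingaTel_Customers.xlsx.

```python
import math

# Hypothetical sample rows, invented for illustration only.
customers = [
    {"CustomerID": "1234-ABCD", "Tenure": 12, "MonthlyCharges": 50.0, "TotalCharges": 600.0},
    {"CustomerID": "5678-EFGH", "Tenure": 0, "MonthlyCharges": 70.5, "TotalCharges": 0.0},
    {"CustomerID": "9012-IJKL", "Tenure": 3, "MonthlyCharges": 20.0, "TotalCharges": None},
]


def charge_issue(row, tolerance=0.01):
    """Return a short description of the problem, or None if the row is consistent."""
    total = row["TotalCharges"]
    if total is None:
        return "TotalCharges is missing"
    # Data definition from Table 1: TotalCharges = Tenure x MonthlyCharges
    expected = row["Tenure"] * row["MonthlyCharges"]
    if not math.isclose(total, expected, abs_tol=tolerance):
        return f"TotalCharges {total} != Tenure x MonthlyCharges {expected}"
    return None


# Map each customer to its issue (None means the row passed the check).
issues = {row["CustomerID"]: charge_issue(row) for row in customers}
for customer_id, issue in issues.items():
    print(customer_id, "->", issue or "consistent")
```

Rows flagged by a check like this are candidates for the kind of issue that the Data Understanding and Data Preparation tasks ask you to identify and fix.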

 

 

 

 

 

3      Primary Assignment Tasks

You are required to answer the questions listed below and to provide relevant screenshots, where applicable, as part of each answer:

 

3.1. Business Understanding (10%)

  1. Define the Business Problem and Business Objectives for this project.

 

3.2. Data Understanding (15%)

  1. Start a new process and import the given dataset into RapidMiner using the ‘Read Excel’ and ‘Store’ operators. Note: Use the ‘Import Configuration Wizard’.
  2. Use the ‘Retrieve’ operator to load the dataset, run the process, and comment on the results to identify any issues.

 

3.3. Data Preparation (20%)

  1. Use the necessary operator(s) to fix the issues identified in Task (3) (the ‘Retrieve’ step in 3.2 above).
  2. Use an operator to remove attribute(s) that you think will not be a good predictor. Support your decision with strong justifications.
  3. Set the necessary role using an appropriate operator for the attribute that you will need to predict.

 

3.4. Modelling (35%)

  1. In this step, you will need to split the dataset into two different sets, one for training the model and the other for testing the model. Use the ‘Split Data’ operator to do the split. In your own words, explain why this step is needed and explain your selection of data for the split.
  2. Add the predictive operator, ‘Decision Tree’ into the process model. Note: Please do not use any validation operator for this task.
  3. Test the accuracy of the process model using the necessary operator(s).
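
The split itself is to be done in RapidMiner with the ‘Split Data’ operator. Purely to illustrate the mechanics of one possible approach, the Python sketch below performs a stratified split, which keeps the proportion of churners roughly the same in both sets. The function name `stratified_split`, the 70/30 ratio, the seed, and the sample data are all assumptions for this example, not settings prescribed by the assignment.

```python
import random


def stratified_split(rows, label_key, train_ratio=0.7, seed=42):
    """Split rows into (train, test), keeping label proportions similar in both."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]  # copy so the caller's data is left untouched
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


# Hypothetical data: 100 customers, 20 of whom churned.
rows = [{"CustomerID": i, "Churn": "Yes" if i < 20 else "No"} for i in range(100)]
train, test = stratified_split(rows, "Churn", train_ratio=0.7)

train_churn = sum(1 for r in train if r["Churn"] == "Yes")
test_churn = sum(1 for r in test if r["Churn"] == "Yes")
print(len(train), len(test), train_churn, test_churn)
```

The model is trained only on `train` and scored only on `test`, so every row used to measure accuracy is one the model has not seen.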

 

3.5. Evaluation (20%)

  1. Evaluate the performance of the model, describing in detail every part of the confusion matrix.
  2. Evaluate the Decision Tree results and make any recommendations to the Sales & Marketing Director.
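
RapidMiner reports the confusion matrix for you. As a rough illustration of how its four cells and the usual summary measures relate, the Python sketch below counts them by hand, treating Churn = "Yes" as the positive class. The labels are invented for this example and the helper name `confusion_counts` is hypothetical; none of the numbers reflect the SingaTel dataset.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, FP, FN and TN, treating `positive` as the class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels, invented for illustration only.
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "No", "No", "No", "Yes", "No"]

c = confusion_counts(actual, predicted)

# These divisions assume non-zero denominators, which holds for the sample above.
accuracy = (c["TP"] + c["TN"]) / len(actual)   # share of all predictions that are correct
precision = c["TP"] / (c["TP"] + c["FP"])      # of predicted churners, share who really churned
recall = c["TP"] / (c["TP"] + c["FN"])         # of actual churners, share the model caught

print(dict(c), accuracy, precision, recall)
```

In a churn setting, false negatives (churners the model missed) and false positives (loyal customers flagged as churners) carry different business costs, which is why each cell is worth discussing separately rather than reporting accuracy alone.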

 

 

 

4      Secondary Assignment Tasks

Based on what you have completed in Paragraph 3 above, you are required to identify, on your own, ONE (1) additional attribute that contributed to customers leaving SingaTel or being likely to leave soon, and to explain why. You are not required to repeat all the steps in Paragraph 3. Also, highlight this additional attribute in your Churn Analysis report.

 

Deliverables and Milestones

| Deliverables | Weightage | Milestone |
|---|---|---|
| P2a Churn Analysis: (i) Churn Analysis Report in MS Word (not more than 30 pages with images and appendices, if any) | 60% | By Sunday, 17 February 2019, 23:59 |
| P2b Attachments: (ii) RapidMiner files; (iii) any other additional files that have been used in the Churn Analysis Report | 40% | By Sunday, 17 February 2019, 23:59 |

 

All submitted materials must be original and created by the individual student. Sources of previously published content, if used, must be properly attributed to the original author.

 

 

Submissions

 

  1. Submit on PolyMall the following (in softcopy):
    • RapidMiner process files with the extension ‘.rmp’ and ‘.properties’
    • Project Document which contains the written answers and screenshots

 

Submission details will be announced on PolyMall when the deadline draws close.

Note:

You are encouraged to use the following file naming convention for submitted files.

[StudentID]_[YourName]_Project2.doc

[StudentID]_[YourName]_Project2.rmp and

[StudentID]_[YourName]_Project2.properties

For example, a student’s files may be named as follows:

  • TA000999A_LimAhBee_Project2.doc
  • TA000999A_LimAhBee_Project2.rmp
  • TA000999A_LimAhBee_Project2.properties

 

 Marking Rubrics

 

Project Tasks 1-11 (in Para 3)

| CRISP-DM | Task | Criteria | Weightage |
|---|---|---|---|
| Business Understanding | 1 | Able to identify an appropriate Business Problem that correctly illustrates the problems that the company is facing. Able to identify correct Business Objectives that will help the company solve the business problem. | 5% |
| Data Understanding | 2 | Able to import the given Excel file into RapidMiner and store it in the repository. | 5% |
| | 3 | Able to use the ‘Retrieve’ operator to load the dataset from the repository and identify the correct issues pertaining to the results set. | 5% |
| Data Preparation | 4 | Able to identify the operators that can be added to the process model to fix the issues identified previously. | 6% |
| | 5 | Able to identify any attribute(s) that will not help in building the predictive model. Decisions to omit the attributes must come with strong justifications. | 6% |
| | 6 | Able to set the correct role for the attribute that we need to predict. Target attribute must also be the correct one. | 3% |
| Modelling | 7 | Able to use the ‘Split Data’ operator and set the correct settings. The decision on the settings must be explained. | 10% |
| | 8 | Able to use the ‘Decision Tree’ operator and set the correct settings. | 10% |
| | 9 | Able to use the correct operator(s) to test the training model. | 10% |
| Evaluation | 10 | Able to evaluate the performance from the confusion matrix. Every part of the matrix must be explained. | 20% |
| | 11 | Able to interpret the decision tree results and find any critical insights which can help the Sales & Marketing Director to make decisions. | 20% |
| | | Total: | 100% |

 

Last Updated on February 14, 2019
