School of Business & Accountancy
(Diploma in Business Practice)
(Administration and Management)
Certificate in Business Applications
1 Final Individual Assignment
SingaTel is a local telecommunications company providing mobile, internet, TV and fixed-line telephony services.
This year, SingaTel is concerned about the number of customers leaving to subscribe to its competitors. It needs to understand who is leaving, because customer acquisition is a life-and-death matter for most companies, and so are customer retention and its opposite, churn.
The Sales & Marketing Director has engaged the analytics team to identify the common and key attributes of customers who have left or are likely to leave in the near future, and the reasons why.
To conduct a data mining analysis, a complete dataset with attributes as shown in Table 1, comprising information on customers who have left the telco since Jan 2017 as well as existing customers, has been made available to you.
In this assignment, you are required to work by yourself on this data mining project with the help of the RapidMiner product suite.
A series of tutorials in RapidMiner is in place to scaffold your learning as you embark on this data mining project.
You are required to use the CRISP-DM framework shown below as a guide to complete the project.
The dataset (SingaTel_Customers.xlsx) can be downloaded from the PolyMall portal. Table 1 gives a full description of the dataset including the attributes of the data.
|Attribute||Description||Labels/Values||Data Definitions|
|CustomerID||Customer number||4 digits followed by a dash and 4 letters|
|Gender||Customer’s gender||Male or Female|
|SeniorCitizen||Whether the customer is a senior citizen||0, 1||0 = Not senior citizen; 1 = Senior citizen|
|Tenure||Number of month(s) of customer’s subscription||0, 1, 2, 3, …||0 = Started subscription in current|
|MobileService||If the customer subscribed to Mobile service||Yes or No|
|InternetService||If the customer subscribed to Internet service||Yes or No|
|TVService||If the customer subscribed to Cable TV service||Yes or No|
|OnlineSecurity||If the customer subscribed to Online Security service||Yes or No|
|DeviceProtection||If the customer subscribed to Device Protection service||Yes or No|
|TechSupport||If the customer has contacted Technical Support before||Yes or No|
|Contract||If the customer is on contract||Yes or No|
|MonthlyCharges||Monthly charge invoiced to customer||Number|
|TotalCharges||Total charge invoiced to customer||Number||TotalCharges =|
|Churn||If the customer has left||Yes or No|
Table 1: Customers Data
3 Primary Assignment Tasks
You are required to answer the questions listed below and to provide relevant screenshots, where applicable, as part of each answer:
3.1. Business Understanding (10%)
- Define the Business Problem and Business Objectives for this project.
3.2. Data Understanding (15%)
- Start a new process and import the given dataset into RapidMiner using the ‘Read Excel’ and ‘Store’ operators. Note: Use the ‘Import Configuration Wizard’.
- Use the ‘Retrieve’ operator to load the dataset, run the process, and comment on the results to identify any issues.
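The kind of inspection asked for in 3.2 can be sketched outside RapidMiner as well. The following is a minimal Python illustration, not part of the required RapidMiner process: the sample rows are invented for the example, and whether the real dataset contains any such issues is for your own process results to show.

```python
from collections import Counter

# Hypothetical sample rows for illustration only; the real data is in
# SingaTel_Customers.xlsx and may look different.
sample_rows = [
    {"CustomerID": "1234-ABCD", "Tenure": "12", "TotalCharges": "358.20", "Churn": "No"},
    {"CustomerID": "5678-EFGH", "Tenure": "0", "TotalCharges": "", "Churn": "No"},
    {"CustomerID": "9012-IJKL", "Tenure": "2", "TotalCharges": "107.70", "Churn": "Yes"},
]


def count_missing(rows):
    """Count empty or missing values per attribute."""
    missing = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None or str(value).strip() == "":
                missing[name] += 1
    return dict(missing)


def label_distribution(rows, label="Churn"):
    """Count how many rows carry each value of the label attribute."""
    return dict(Counter(row[label] for row in rows))


print(count_missing(sample_rows))
print(label_distribution(sample_rows))
```

In RapidMiner, the Statistics view of the result set reports comparable information (missing counts and value distributions) for each attribute.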
3.3. Data Preparation (20%)
- Proceed to use the necessary operator(s) to fix the issues identified in Task
- Use an operator to remove attribute(s) that you think will not be a good predictor. Support your decision with strong justifications.
- Set the necessary role using an appropriate operator for the attribute that you will need to predict.
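Conceptually, removing an attribute and setting the label role amount to separating the column to be predicted from the columns used as predictors. The sketch below shows that idea in plain Python under stated assumptions: the function name `prepare` is hypothetical, and dropping the identifier column is used purely as an illustration, since the choice of attribute(s) to remove is a judgment your report must justify.

```python
def prepare(rows, drop=("CustomerID",), label="Churn"):
    """Split each row into predictor attributes and the label value.

    `drop` lists attributes excluded from the predictors; the default is an
    illustrative assumption, not a prescribed answer.
    """
    features, labels = [], []
    for row in rows:
        labels.append(row[label])
        features.append(
            {name: value for name, value in row.items()
             if name not in drop and name != label}
        )
    return features, labels


# Hypothetical rows for illustration only.
demo_rows = [
    {"CustomerID": "1234-ABCD", "Contract": "Yes", "Tenure": 12, "Churn": "No"},
    {"CustomerID": "9012-IJKL", "Contract": "No", "Tenure": 2, "Churn": "Yes"},
]
demo_features, demo_labels = prepare(demo_rows)
print(demo_features)
print(demo_labels)
```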
3.4. Modelling (35%)
- In this step, you will need to split the dataset into two different sets, one for training the model and the other for testing the model. Use the ‘Split Data’ operator to do the split. In your own words, explain why this step is needed and explain your selection of data for the split.
- Add the predictive operator, ‘Decision Tree’, into the process model. Note: Please do not use any validation operator for this task.
- Test the accuracy of the process model using the necessary operator(s).
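The idea behind the split can be illustrated with a small shuffled hold-out partition. This is a minimal Python sketch, not the ‘Split Data’ operator itself: the function name `split_data`, the 70/30 ratio and the fixed seed are assumptions chosen for the example, and your own ratio still needs to be explained in the report.

```python
import random


def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle the rows and partition them into training and testing sets.

    The model is fitted on the first part only; the second part is held back
    so that accuracy is measured on rows the model has not seen.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder row identifiers stand in for customer records.
all_rows = list(range(10))
train_rows, test_rows = split_data(all_rows)
print(len(train_rows), len(test_rows))
```

Every row lands in exactly one of the two sets, so the testing set gives an estimate of how the tree behaves on unseen customers.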
3.5. Evaluation (20%)
- Evaluate the performance of the model, describing in detail every part of the confusion matrix.
- Evaluate the Decision Tree results and make any recommendations to the Sales & Marketing Director.
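The parts of a confusion matrix, and the measures usually derived from it, can be worked through on a toy example. The sketch below is plain Python with invented actual and predicted labels; it assumes "Yes" (the customer churned) is treated as the positive class, which should be checked against how your own RapidMiner results are laid out.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary label."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def metrics(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = sum(cm.values())
    predicted_pos = cm["TP"] + cm["FP"]
    actual_pos = cm["TP"] + cm["FN"]
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total if total else 0.0,
        "precision": cm["TP"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["TP"] / actual_pos if actual_pos else 0.0,
    }


# Invented labels for illustration only.
actual = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes", "Yes", "No", "No"]

cm = confusion_matrix(actual, predicted)
scores = metrics(cm)
print(cm)
print(scores)
```

Here a false negative is a customer who left but was predicted to stay, and a false positive is a customer who stayed but was predicted to leave; which of the two errors matters more is a business question for the recommendations.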
4 Secondary Assignment Tasks
On your own, based on what you have completed in Paragraph 3 above, you are required to identify ONE (1) additional attribute that contributed to customers leaving SingaTel or being likely to leave soon, and to explain why. You are not required to repeat all the steps in Paragraph 3. Also, highlight this additional attribute in your Churn Analysis report.
Deliverables and Milestones
i. Churn Analysis Report in MS Word (not more than 30 pages with images and appendices)
ii. Rapidminer files
iii. Any other additional files that have been used in the Churn Analysis Report
17 February 2019
All submitted materials must be original and created by the individual student. Sources of previously published content, if used, must be properly attributed to the original author.
Submit on PolyMall the following (in softcopy):
- RapidMiner process files with the extensions ‘.rmp’ and ‘.properties’
- Project Document which contains the written answers and screenshots
Submission details will be announced on PolyMall when the deadline draws close.
You are encouraged to use the following file naming convention for submitted files: [StudentID]_[YourName]_Project2.doc, [StudentID]_[YourName]_Project2.rmp and [StudentID]_[YourName]_Project2.properties
For example, a student’s files may be named as follows:
- LimAhBee_TA000999A_Project2.doc
- LimAhBee_TA000999A_Project2.rmp
- LimAhBee_TA000999A_Project2.properties
|Project Tasks 1-11 (in Para 3)|
|Business Understanding||1||Able to identify an appropriate Business Problem that correctly illustrates the problems that the company is facing. Able to identify correct Business Objectives that will help the company solve the business problem.|
|Data Understanding||2||Able to import the given Excel file into RapidMiner and store it in the repository.|
|3||Able to use the ‘Retrieve’ operator to load the dataset from the repository and identify the correct issues pertaining to the results set.||5%|
|Data Preparation||4||Able to identify the operators that can be added in the process model to fix the issues identified previously.||6%|
|5||Able to identify any attribute(s) that will not help in building the predictive model. Decisions to omit the attributes must come with strong justifications.||6%|
|6||Able to set the correct role for the attribute that we need to predict. Target attribute must also be the correct one.||3%|
|Modelling||7||Able to use the ‘Split Data’ operator and set the correct settings. The decision on the settings must be explained.||10%|
|8||Able to use the ‘Decision Tree’ operator and set the correct settings.||10%|
|9||Able to use the correct operator(s) to test the training model.||10%|
|Evaluation||10||Able to evaluate the performance from the confusion matrix. Every part of the matrix must be explained.||20%|
|11||Able to interpret the decision tree results and find any critical insights which can help the Sales & Marketing Director to make decisions.|