Explain the use of cluster analysis in data science.
Sprockets Corporation designs high-end, specialty machine parts for a variety of industries. You have been hired by Sprockets to assist with their data analysis needs. Sprockets Corporation management is curious about leveraging unstructured data. You are convinced that, if you demonstrate the types of analysis you can perform and the value it adds to their bottom line, they will give you more work in this lucrative field. Given a sample data set, you are going to present a cluster analysis in Python using two separate techniques: hierarchical clustering and k-means clustering.
Prepare a presentation for John Sprocket, CEO, and the leadership team at Sprockets Corporation that showcases the types of analysis you can perform on their existing data and the benefits of that analysis. Include in your presentation all source code created in your analysis.
- Use the clusters.py Python module from the collective intelligence text to build a hierarchical clustering model.
- Generate a cluster representation (image). Note that you might want to explore a subset of your data in order to support a smaller cluster representation.
- Leverage the same module to build a k-means clustering model. In this model you are not required to draw the clusters as an image; instead, print the cluster groups (which rows are clustered together). Again, you may use a subset of the data in order to produce a more tractable output.
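To make the two techniques concrete, here is a minimal standalone sketch of both in plain Python. It is not the code from clusters.py, and it does not reproduce that module's interface. The sample rows, the function names (`hierarchical`, `kmeans`), and the choice of Euclidean distance with centroid linkage are all assumptions made for illustration; the module from the text may use a different distance measure and different defaults.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two numeric rows (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Column-wise mean of a list of rows."""
    return [sum(col) / len(points) for col in zip(*points)]

def hierarchical(rows, distance=euclidean):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are row indices. The last entry is the top-level split.
    """
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroid([rows[k] for k in clusters[i]]),
                             centroid([rows[k] for k in clusters[j]]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

def kmeans(rows, k, iterations=100, seed=0):
    """k-means: assign rows to the nearest centroid, then recompute centroids.

    Returns k lists of row indices (which rows are clustered together).
    """
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    groups = None
    for _ in range(iterations):
        new_groups = [[] for _ in range(k)]
        for idx, row in enumerate(rows):
            nearest = min(range(k), key=lambda c: euclidean(row, centroids[c]))
            new_groups[nearest].append(idx)
        if new_groups == groups:  # assignments stopped changing
            break
        groups = new_groups
        for c in range(k):
            if groups[c]:  # keep the old centroid if a group is empty
                centroids[c] = centroid([rows[i] for i in groups[c]])
    return groups

# Hypothetical sample data: two visibly separate groups of rows.
rows = [[1.0, 1.1], [0.9, 1.0], [1.3, 0.9],
        [8.0, 8.1], [7.9, 8.3], [8.2, 7.8]]

merges = hierarchical(rows)
groups = kmeans(rows, k=2)

for a, b, d in merges:
    print("merge", sorted(a), "+", sorted(b), "at distance", round(d, 3))
for n, members in enumerate(groups):
    print("group", n, "rows", members)
```

The merge history is what a dendrogram image would draw, while the k-means result is just the list of row indices per group, which matches the two deliverables above. Working on a small subset of rows keeps both outputs readable, since this hierarchical sketch recomputes every pairwise distance on each pass.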