Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way". A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
 
         
 
Distance-based Clustering:
 
Given a set of points, with a notion of distance between points, the task is to group the points into some number of clusters such that:

1. internal (intra-cluster) distances should be small, i.e. members of the same cluster are close/similar to each other.
2. external (inter-cluster) distances should be large, i.e. members of different clusters are dissimilar.
 
The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data.
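To make the two distance criteria concrete, here is a minimal NumPy sketch; the two toy 2-D groups are invented purely for illustration:

```python
import numpy as np

# Two toy 2-D groups, invented purely for illustration.
cluster_a = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]])
cluster_b = np.array([[8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])

def mean_pairwise_distance(x, y):
    """Mean Euclidean distance over all pairs drawn from x and y."""
    diffs = x[:, None, :] - y[None, :, :]   # shape (len(x), len(y), dim)
    return np.linalg.norm(diffs, axis=-1).mean()

# A good clustering keeps the first number small and the second large.
# (The within-cluster mean includes the zero self-distances; that is
# fine for a rough illustration.)
print("internal (within A):", mean_pairwise_distance(cluster_a, cluster_a))
print("external (A vs B):  ", mean_pairwise_distance(cluster_a, cluster_b))
```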
 
Clustering Algorithms:
Clustering algorithms may be classified as listed below:
 
        1. Exclusive Clustering 
        2. Overlapping Clustering 
        3. Hierarchical Clustering 
        4. Probabilistic Clustering 
 
The most commonly used clustering algorithms are:
        1. K-means 
        2. Fuzzy K-means 
        3. Hierarchical clustering 
        4. Mixture of Gaussians 
 
K-means is an exclusive clustering algorithm, Fuzzy K-means is an overlapping clustering algorithm, Hierarchical clustering belongs (obviously) to the hierarchical category, and Mixture of Gaussians is a probabilistic clustering algorithm.
 
K-Means Clustering:

K-means partitions the data into a chosen number k of clusters. It starts from k initial centroids, assigns each point to its nearest centroid, recomputes each centroid as the mean of its assigned points, and repeats these two steps until the assignments stop changing.
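A minimal from-scratch sketch of this loop (Lloyd's algorithm) with NumPy; the toy data, the choice k = 2, and the random initialization are illustrative assumptions, and the sketch assumes no cluster ever becomes empty:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    # Illustrative Lloyd's algorithm; assumes no cluster ever becomes empty.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Toy usage: two visually obvious groups.
pts = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
                [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
labels, centers = kmeans(pts, k=2)
print(labels)
print(centers)
```
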
Hierarchical clustering (Agglomerative and Divisive clustering):
 
Hierarchical cluster analysis is a method of cluster analysis that seeks to build a hierarchy of clusters, i.e. a tree-like structure of nested groupings.
 
A hierarchical clustering method works by grouping data into a tree of clusters. Agglomerative hierarchical clustering begins by treating every data point as a separate cluster. Then it repeatedly executes the following steps (a minimal sketch follows the list):
1. Identify the two clusters that are closest together, and
2. Merge these two most similar clusters. We continue these steps until all the clusters are merged together.
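A minimal sketch of this procedure using scikit-learn's AgglomerativeClustering; the toy points and the choice of average linkage are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

pts = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
                [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

# Starts from one cluster per point and repeatedly merges the two closest
# clusters until only n_clusters remain; "average" linkage measures
# closeness by the mean distance between members of the two clusters.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
print(model.fit_predict(pts))   # e.g. [0 0 0 1 1 1]
```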
 
In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters. A diagram called a dendrogram (a tree-like diagram that records the sequence of merges or splits) graphically represents this hierarchy: it is an inverted tree that describes the order in which points are merged (bottom-up view) or clusters are split (top-down view).
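A short sketch of building and drawing a dendrogram with SciPy; linkage() performs the agglomerative merges and records each one (which clusters combined, and at what distance), and dendrogram() draws that history as the inverted tree described above (toy data assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

pts = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
                [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

# Z holds one row per merge: the two clusters joined, the distance at
# which they were joined, and the size of the resulting cluster.
Z = linkage(pts, method="average")
dendrogram(Z)
plt.show()
```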
        
 
The two basic methods of generating a hierarchical clustering are agglomerative (bottom-up) and divisive (top-down).

Hierarchical Agglomerative vs Divisive clustering:
 
1. Divisive clustering is more complex than agglomerative clustering.

2. Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual data points.

3. The time complexity of a naive agglomerative clustering is O(n³), though it can be brought down to O(n²). For divisive clustering, given a fixed number of top levels and an efficient flat-clustering algorithm such as K-Means for the splits, the running time is linear in the number of patterns and clusters (see the sketch after this list).

4. Divisive algorithms can also be more accurate, because each split is decided with access to the complete data set, whereas agglomerative methods commit to greedy local merges that cannot later be undone.
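A hedged sketch of the divisive strategy from point 3, as a simple bisecting K-Means: start with all points in one cluster and repeatedly split one cluster in two with K-Means. The split-selection policy (always split the largest cluster) and the toy data are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(points, n_clusters):
    # Divisive (top-down) sketch: keep splitting until n_clusters remain.
    clusters = [points]
    while len(clusters) < n_clusters:
        # One simple policy: always split the largest remaining cluster
        # (assumes it still contains at least two points).
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        target = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(target)
        clusters += [target[labels == 0], target[labels == 1]]
    return clusters

pts = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
                [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
for c in bisecting_kmeans(pts, n_clusters=2):
    print(c)
```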