Friend to Groucho Marx: “Life is difficult”
Groucho Marx to Friend: “Compared to what?”
Comparison is intrinsic to evaluation. Not only between one thing and another, but more often, between one category of things and another. Typologies are involved in many of the comparisons evaluators have to make (between people, activities, outcomes, locations, et cetera). Typologies can spring forth from our minds, but there are also systematic methods for developing them, broadly described as clustering methods. It is this second grouping that I want to discuss here.
What types of clustering methods are there?
Starting with a single prompt, Claude identified 33 different clustering methods for me, organised into 9 categories. These categories were defined largely by differences in the computational mechanisms and mathematical principles underlying each method. But a couple of groups were organised using different criteria: who does the task (human or algorithm) and what the method is for.
I then did some experimentation of my own, getting Claude to cluster the methods using two different clustering methods. The first is called a “maximum spanning tree”, which connects methods according to which method is most similar to which other method. You can see this in Figure 1 below, which is best read by double-clicking on the image to magnify it. The second experiment used agglomerative hierarchical clustering to produce what is called a dendrogram, i.e. a tree structure displaying nested categories of methods. You can see this in Figure 2 below, again probably best inspected by magnifying the image first. I also experimented with some other ways of classifying clustering methods, but I won’t go into detail on those here.
Figure 1: Maximum Spanning Tree
Figure 2: Dendrogram
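For readers unfamiliar with the first technique, a maximum spanning tree can be built from any table of pairwise similarity scores: it keeps the strongest links that connect every item without forming a loop. The sketch below uses Kruskal's algorithm in Python. It assumes the similarity scores already exist (the post doesn't say how Claude scored the methods against each other); the function name and the scores are hypothetical, made up purely for illustration.

```python
from itertools import combinations


def maximum_spanning_tree(names, sim):
    """Kruskal's algorithm, taking edges in order of descending similarity.

    names: list of item labels.
    sim:   dict mapping frozenset({a, b}) -> similarity score for every pair.
    Returns the tree as a list of (a, b, score) edges; n items give n - 1 edges.
    """
    parent = {n: n for n in names}  # union-find forest, one tree per item

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        ((sim[frozenset((a, b))], a, b) for a, b in combinations(names, 2)),
        reverse=True,
    )
    tree = []
    for score, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # joins two separate components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b, score))
            if len(tree) == len(names) - 1:
                break
    return tree


# Made-up similarity scores between four methods, for illustration only.
methods = ["k-means", "hierarchical", "LCA", "pile sorting"]
scores = {
    frozenset(pair): s
    for pair, s in [
        (("k-means", "hierarchical"), 0.7),
        (("k-means", "LCA"), 0.5),
        (("k-means", "pile sorting"), 0.1),
        (("hierarchical", "LCA"), 0.4),
        (("hierarchical", "pile sorting"), 0.3),
        (("LCA", "pile sorting"), 0.2),
    ]
}
tree = maximum_spanning_tree(methods, scores)
```

Sorting edges from strongest to weakest is the only change from the usual minimum spanning tree; the union-find check rejects any link that would close a loop.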
Introducing the Text Cluster Analysis Lab
I developed this recently, in May this year, with very substantial help from Claude AI. You can take a quick preview of the app by following this link.
How does it compare? When I asked Claude to compare this app with the list of 33 clustering methods, it identified similarities with a number of them, including hierarchical clustering, latent class analysis (LCA) and pile sorting. But its best-fitting answer was that the method is a hybrid: “Its uniqueness comes precisely from joining a human-sorting logic (criteria generated from the material, framed by the researcher's domain statement) to an algorithmic back end (binary scoring + agglomerative clustering). No single method in the set occupies that position”. The Lab's nearest relatives (hierarchical clustering, LCA and pile/card sorting) sit in three different families, i.e. on separate branches of the spanning tree and the dendrogram. In this respect the Lab can claim novelty.
If this is the solution, what was the problem?
Eight of the nine categories of cluster analysis identified by Claude (29 of the 33 methods) can produce identifiable clusters of texts through replicable, transparent, deterministic processes. But they then require a subjective human judgement to name and describe those clusters. That seems slightly bizarre, even contradictory, to me. The other four methods are subjective throughout, because of their ethnographic orientation (pile/card sorting, Q Methodology and Repertory Grid).
The second problem I have been concerned with is the lack of interpretability of the dimensions within which the identified clusters are graphically represented. At least eight of these methods use abstract derived axes: the dimensions exist and could in principle be examined, but they are statistical composites that a non-statistician can't readily interpret (PCA, factor analysis, ICA, t-SNE, UMAP, MDS, spectral clustering and SOM).
Another problem relates to unaccountable input choices, such as the number of clusters, topics, classes or dimensions to be identified. How much of a problem this is depends on the nature of those choices: are they visible, are they checkable, is their failure visible, and are they documented as part of the analysis's "provenance"?
My assessment is that the Lab successfully addresses the cluster naming challenge in two complementary ways. Its use of a dendrogram, structured around a recognised and easy-to-understand view of similarity as the overlap between sets of attributes, addresses the interpretability problem. The Lab does offer users a comparable number of input choices to other methods, but the most consequential input choice is also the one most exposed to inspection: the identification of the criteria used to rate the texts' similarities, and how the texts are rated against them.
Now for a summary of some of the features of the lab...
The inputs: These are multiple bodies of text. By Claude's estimate, around 75% of cluster analysis methods work only with quantitative data, while the rest work with text only (15%) or with either text or numbers (10%). More specifically, the Lab focuses on similarities and differences between texts, not on the internal structure within a text (as is the case with much thematic coding).
The outputs: There are five types. The first is a matrix, where rows describe the texts, columns describe attributes of those texts and cells record each attribute's presence or absence. The attributes are identified by a Claude AI search and comparison of the texts' contents, operating within a user-defined context and scope. The completed matrix is supported by a full description of each text attribute (not shown here).
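Once the attributes have been identified, assembling the presence/absence matrix is mechanical. Here is a minimal Python sketch, assuming each text's attributes arrive as a set of short labels; the Lab's internal format isn't described in this post, and the function name, text names and attribute labels are all hypothetical.

```python
def build_matrix(text_attributes):
    """Convert {text name: set of attribute labels} into a 0/1 presence matrix.

    Returns (texts, attributes, rows), where rows[i][j] == 1 means text i
    was judged to have attribute j, and 0 means it was judged not to.
    """
    texts = sorted(text_attributes)
    attributes = sorted(set().union(*text_attributes.values()))
    rows = [[1 if a in text_attributes[t] else 0 for a in attributes] for t in texts]
    return texts, attributes, rows


# Hypothetical texts and attribute labels, for illustration only.
example = {
    "Report A": {"uses surveys", "rural focus"},
    "Report B": {"uses surveys"},
    "Report C": {"rural focus", "gender analysis"},
}
texts, attributes, rows = build_matrix(example)
for name, row in zip(texts, rows):
    print(name, row)
```

Sorting both axes keeps the matrix layout stable between runs, which makes it easier to compare and export.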
A second matrix is constructed from the results of a "back-translation"-type rater reliability test. This enables users to remove attributes that cannot be reliably identified, and to identify texts whose analyses are less reliable than others.
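The post doesn't spell out how the back-translation test works. One plausible reading, assumed here rather than taken from the Lab, is that a second, independent pass re-scores every text using only the written attribute descriptions, and the two 0/1 matrices are then compared. Simple percentage agreement per attribute (column) and per text (row) then flags the weak spots. The function name and the 0.8 threshold are hypothetical choices.

```python
def agreement_report(original, rescored, texts, attributes, threshold=0.8):
    """Compare two 0/1 text-by-attribute matrices scored independently.

    Returns per-attribute agreement, per-text agreement, and the attributes
    and texts whose agreement falls below `threshold` (a hypothetical cut-off).
    """
    n_texts, n_attrs = len(texts), len(attributes)
    by_attribute = {
        a: sum(original[i][j] == rescored[i][j] for i in range(n_texts)) / n_texts
        for j, a in enumerate(attributes)
    }
    by_text = {
        t: sum(original[i][j] == rescored[i][j] for j in range(n_attrs)) / n_attrs
        for i, t in enumerate(texts)
    }
    unreliable_attributes = [a for a, v in by_attribute.items() if v < threshold]
    less_reliable_texts = [t for t, v in by_text.items() if v < threshold]
    return by_attribute, by_text, unreliable_attributes, less_reliable_texts


# Hypothetical first-pass and re-scored matrices (rows = texts, columns = attributes).
texts = ["Report A", "Report B", "Report C"]
attributes = ["gender analysis", "rural focus", "uses surveys"]
first_pass = [[0, 1, 1], [0, 0, 1], [1, 1, 0]]
second_pass = [[0, 1, 0], [0, 0, 1], [1, 1, 1]]
report = agreement_report(first_pass, second_pass, texts, attributes)
```

Percentage agreement does not correct for chance. Cohen's kappa is a common alternative when an attribute is very rare or very common across the texts.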
The third output is a dendrogram, representing a nested classification of the texts based on the user's selection of relevant text attributes. This is built using an agglomerative process: first finding the pairs of texts that are most similar, then the most similar pairs of pairs, and so on. Each branch of the tree structure includes tool-tip information on how the attributes of that cluster of texts differ from its sibling branch. This is identified systematically by the app, and is humanly verifiable. It is not a subjective judgement, as is the case with a number of other types of clustering process.
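The agglomerative process and the sibling comparison can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the Lab's code. Similarity is measured as Jaccard overlap between attribute sets, clusters are merged by average linkage, and a "difference" is taken to be an attribute held by every text on one branch and by none on the other. The post names none of these choices, and all function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty attribute sets count as identical."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def agglomerate(text_attributes):
    """Average-linkage agglomerative clustering on Jaccard distance.

    Repeatedly merges the two closest clusters until one remains.
    Returns the merges in order as (left_members, right_members, distance).
    """
    clusters = [frozenset([t]) for t in sorted(text_attributes)]
    merges = []

    def average_distance(c1, c2):
        # Mean distance over every cross-cluster pair of texts.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(
            jaccard_distance(text_attributes[x], text_attributes[y]) for x, y in pairs
        ) / len(pairs)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), average_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        left, right = clusters[i], clusters[j]
        merges.append((left, right, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(left | right)
    return merges


def sibling_differences(left, right, text_attributes):
    """Attributes shared by every text on one branch and absent from the other."""
    def held_by_all(members):
        return frozenset.intersection(*(frozenset(text_attributes[m]) for m in members))

    def held_by_any(members):
        return frozenset().union(*(text_attributes[m] for m in members))

    return held_by_all(left) - held_by_any(right), held_by_all(right) - held_by_any(left)


# Hypothetical texts and attributes, for illustration only.
example = {
    "Report A": {"uses surveys", "rural focus"},
    "Report B": {"uses surveys"},
    "Report C": {"rural focus", "gender analysis"},
}
merges = agglomerate(example)
for left, right, d in merges:
    only_left, only_right = sibling_differences(left, right, example)
    print(sorted(left), "vs", sorted(right), f"distance={d:.2f}",
          "left only:", sorted(only_left), "right only:", sorted(only_right))
```

Because every sibling difference is computed from the 0/1 matrix, a reader can check each tool-tip claim against the matrix by hand.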
The fourth output is an open-ended, Claude chat-type query facility, where the user can ask questions about a cluster of texts on its own, in comparison to its sibling cluster in the dendrogram, or in comparison to the whole set of texts.
The fifth output is a set of exportable products, including a detailed provenance statement describing how all the products were produced, a listing of all the identified text attributes, a copy of the text × attribute matrix and the rater reliability assessment, a copy of the dendrogram, and a copy of any query dialogue. The results of the complete analysis can be saved as a JSON file and reimported for reuse.
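Saving and reloading an analysis can be as simple as a round trip through Python's standard `json` module. One wrinkle is worth handling: sets, a natural container for attribute labels, are not a JSON type. The sketch below is hypothetical. The function names and the keys in the example dictionary are illustrative, not the Lab's actual file format.

```python
import json


def to_jsonable(obj):
    """Recursively turn sets (not a JSON type) into sorted lists, tuples into lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(to_jsonable(x) for x in obj)
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(x) for x in obj]
    return obj


def save_analysis(analysis, path):
    """Write the whole analysis to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_jsonable(analysis), f, indent=2, ensure_ascii=False)


def load_analysis(path):
    """Read a previously saved analysis back into plain dicts and lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical structure; the Lab's real file format is not described in this post.
analysis = {
    "provenance": {"domain_statement": "Evaluation reports on rural programmes",
                   "models": {"attribute_identification": "model-large"}},
    "attributes": {"uses surveys": "The text reports primary survey data."},
    "text_attributes": {"Report B": {"uses surveys"}},
}
```

Sorting the converted sets means the same analysis always serialises to the same file, which keeps saved versions easy to compare.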
Other features of note
1. Users can choose which Claude model to use for each task in the workflow.
2. The estimated and actual token costs are tracked at a number of stages. These costs are incurred through the app's use of the user's own API key. In my experience so far, working with about a dozen different documents, these costs have ranged between US$1 and US$3 per complete analysis.
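Per-stage cost tracking reduces to multiplying token counts by per-token prices and summing across stages, each of which may use a different model. In this sketch the model names, prices, stage labels and function names are all hypothetical placeholders. They are not real Claude model names or current Anthropic prices, which should be taken from the provider's published price list.

```python
# Placeholder prices in US$ per million tokens. These are NOT real model names
# or current prices; substitute the provider's published rates.
PRICES = {
    "model-small": {"input": 1.00, "output": 5.00},
    "model-large": {"input": 3.00, "output": 15.00},
}


def stage_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Cost in US$ of one stage, given its input and output token counts."""
    rate = prices[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


def total_cost(stages, prices=PRICES):
    """Sum costs over (model, input_tokens, output_tokens) stages."""
    return sum(stage_cost(m, i, o, prices) for m, i, o in stages)


# Hypothetical workflow: each stage may use a different model.
stages = [
    ("model-large", 200_000, 40_000),  # e.g. attribute identification
    ("model-small", 100_000, 20_000),  # e.g. reliability re-scoring
]
print(f"Estimated cost: US${total_cost(stages):.2f}")
```

The same function serves for both estimates and actuals: estimated token counts go in before a run, and the counts reported back by the API go in afterwards.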
For much more detail, go to the Introduction tab on the Lab site.

