Dead Sea Scrolls Unsupervised Clustering

Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls

Abstract

We present a novel framework for authorial classification and clustering of the Qumran Dead Sea Scrolls (DSS). Our approach combines modern Hebrew BERT embeddings with traditional natural language processing features in a graph neural network (GNN) architecture.

Our method outperforms baseline methods on both the Dead Sea Scrolls and a validation dataset of the Hebrew Bible. We also use the model to shed light on long-standing debates, including the classification of texts as sectarian or non-sectarian and the division of the Hodayot collection of hymns.
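As a rough illustration of the idea in the abstract, the sketch below combines semantic and statistical features and passes them through a single graph-convolution step. It is not the authors' implementation. The abstract does not say how the two feature types enter the graph, so concatenating them as node features, building a k-nearest-neighbour graph from cosine similarity, and every name and dimension here are assumptions; random vectors stand in for real BERT embeddings and for real statistical features.

```python
import numpy as np


def standardize(matrix):
    """Scale each column to zero mean and unit variance."""
    return (matrix - matrix.mean(axis=0)) / (matrix.std(axis=0) + 1e-8)


def build_knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-8
    unit = features / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((len(features), len(features)))
    rows = np.repeat(np.arange(len(features)), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ weight, 0.0)


rng = np.random.default_rng(0)
n_chunks = 12  # hypothetical number of text chunks

# Placeholders (assumptions): real inputs would be BERT embeddings and
# statistical NLP features computed from the scroll texts.
semantic = rng.normal(size=(n_chunks, 768))
statistical = rng.normal(size=(n_chunks, 20))

node_features = np.hstack([standardize(semantic), standardize(statistical)])
adjacency = build_knn_adjacency(node_features, k=3)

# Untrained random weights; a real model would learn these.
weight = rng.normal(scale=0.1, size=(node_features.shape[1], 16))
embeddings = gcn_layer(adjacency, node_features, weight)
print(embeddings.shape)
```

In a setup like this, the resulting node embeddings could be handed to any standard clustering algorithm. The graph construction and the number of layers are the main design choices and would need to be tuned on real data.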


Clustering by Sectarian/Non-Sectarian Classification

The colors in this visualization show the distribution of text clusters across sectarian and non-sectarian texts.

Clustering by Composition

The colors in this visualization show the distribution of text clusters across different compositions.