Abstract
We present a novel framework for authorial classification and clustering of the Qumran Dead Sea Scrolls (DSS). Our approach combines modern Hebrew BERT embeddings with traditional natural language processing features in a graph neural network (GNN) architecture.
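The general idea of fusing semantic embeddings with statistical features and passing them through a graph neural network can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's architecture. The feature families, the cosine k-nearest-neighbour graph, the single untrained GCN layer (`gcn_propagate`, whose weight matrix would normally be learned), and the final k-means step are all hypothetical choices made for illustration. The inputs in the usage block are synthetic random placeholders, not scroll data.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score each column so both feature families share a comparable scale."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std


def knn_adjacency(features: np.ndarray, k: int) -> np.ndarray:
    """Build an undirected k-nearest-neighbour graph from cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    n = features.shape[0]
    adj = np.zeros((n, n))
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    for i in range(n):
        adj[i, neighbours[i]] = 1.0
    return np.maximum(adj, adj.T)  # symmetrize


def gcn_propagate(adj: np.ndarray, x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels


def embed_and_cluster(semantic, statistical, n_clusters, k=5, hidden=16, seed=0):
    """Fuse features, build a text graph, propagate once, then cluster."""
    fused = np.hstack([standardize(semantic), standardize(statistical)])
    adj = knn_adjacency(fused, k)
    rng = np.random.default_rng(seed)
    # Hypothetical: random (untrained) weights; a real model would learn these.
    weight = rng.normal(scale=1.0 / np.sqrt(fused.shape[1]), size=(fused.shape[1], hidden))
    node_repr = gcn_propagate(adj, fused, weight)
    return kmeans(node_repr, n_clusters, seed=seed), adj


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic placeholders only, NOT real Dead Sea Scrolls data:
    semantic = rng.normal(size=(30, 768))    # stand-in for BERT text embeddings
    statistical = rng.random(size=(30, 20))  # stand-in for e.g. word-frequency features
    labels, _ = embed_and_cluster(semantic, statistical, n_clusters=3)
    print(labels)
```

Concatenating standardized feature blocks is the simplest fusion strategy; a trained model might instead weight or project each family separately before combining them.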
Our method outperforms baseline methods on both the Dead Sea Scrolls and a validation dataset drawn from the Hebrew Bible. In particular, we use our model to provide significant insights into long-standing debates, including the classification of texts as sectarian or non-sectarian and the division of the Hodayot collection of hymns.
Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls
Clustering by Sectarian/Non-sectarian Classification
The colors in this visualization show how the text clusters are distributed across sectarian and non-sectarian texts.
Clustering by Composition
The colors in this visualization show how the text clusters are distributed across different compositions.