Dead Sea Scrolls Unsupervised Clustering

Abstract

We present a novel framework for authorial classification and clustering of the Qumran Dead Sea Scrolls (DSS). Our approach combines modern Hebrew BERT embeddings with traditional natural language processing features in a graph neural network (GNN) architecture.

Our model outperforms baseline methods on both the Dead Sea Scrolls and a validation dataset of the Hebrew Bible. In particular, we use the model to shed light on long-standing debates, including the classification of texts as sectarian or non-sectarian and the division of the Hodayot collection of hymns.
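
To make the combination of semantic and statistical features concrete, the following is a minimal sketch, not the authors' implementation. It assumes each text chunk is a graph node whose feature vector concatenates a BERT embedding with standardized hand-crafted features, that edges connect cosine-similarity nearest neighbours, and that a single standard graph-convolution step stands in for the GNN. The random arrays, dimensions, the value of k, and the helper names build_knn_adjacency and gcn_layer are all hypothetical placeholders.

```python
import numpy as np

def build_knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency based on cosine similarity."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # never pick a node as its own neighbour
    n = features.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

rng = np.random.default_rng(0)
n_chunks = 6

# Assumption: random placeholders stand in for real features.
semantic = rng.normal(size=(n_chunks, 8))     # e.g. Hebrew BERT embeddings
statistical = rng.normal(size=(n_chunks, 4))  # e.g. traditional NLP features

# Standardize the hand-crafted features so neither block dominates.
statistical = (statistical - statistical.mean(axis=0)) / statistical.std(axis=0)
node_features = np.concatenate([semantic, statistical], axis=1)

adjacency = build_knn_adjacency(node_features, k=2)
weight = rng.normal(size=(node_features.shape[1], 5))  # untrained weights
hidden = gcn_layer(adjacency, node_features, weight)
```

In this sketch, hidden holds one smoothed representation per chunk; in a trained model such representations could be passed to any clustering algorithm to group chunks by likely author.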

Integrating Semantic and Statistical Features for Authorial Clustering of Qumran Scrolls

Interactive Clustering Analysis

Interactive analysis of the Hodayot composition, comparing the Teacher Hymns with the Community Hymns.


Clustering by Sectarian/Non-Sectarian

The colors in this visualization show the distribution of text clusters across sectarian and non-sectarian texts.

Clustering by Composition

The colors in this visualization show the distribution of text clusters across different compositions.