Character Adjacency Graph

This paper describes a multi-channel approach to open-set cross-domain authorship attribution (AA) for the PAN-CLEF 2019 AA shared task.

The present work adapts the EACH-USP ensemble method presented at PAN-CLEF 2018 to an open-set scenario by defining a threshold value for unknown authors, and extends the previous architecture with an additional character ranking model built with the aid of the PageRank algorithm.
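The thresholding idea can be illustrated with a minimal sketch. It assumes the ensemble outputs one probability per candidate author and that a document is assigned to an unknown author when the best score falls below a cut-off. The function name `attribute_open_set`, the default threshold of 0.5 and the `<UNK>` label are illustrative assumptions, not values taken from the paper.

```python
def attribute_open_set(probabilities, threshold=0.5, unknown_label="<UNK>"):
    """Pick the most probable candidate author, or fall back to unknown.

    probabilities: dict mapping candidate author -> score in [0, 1]
                   (assumed to come from some closed-set classifier).
    threshold:     hypothetical cut-off below which no candidate is accepted.
    """
    if not probabilities:
        return unknown_label
    # Closed-set decision: the candidate with the highest score.
    best_author, best_prob = max(probabilities.items(), key=lambda kv: kv[1])
    # Open-set decision: reject the candidate if the score is too low.
    return best_author if best_prob >= threshold else unknown_label


if __name__ == "__main__":
    print(attribute_open_set({"candidate01": 0.7, "candidate02": 0.3}))
    print(attribute_open_set({"candidate01": 0.4, "candidate02": 0.35, "candidate03": 0.25}))
```

In practice the threshold would have to be tuned on held-out data; the excerpt does not say how the authors chose theirs.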

Of particular interest to the present work, a language model may be represented as a character adjacency graph, in which the influence of each node helps capture the character sequences that most strongly denote a particular author. Influence may be measured, for instance, with the PageRank algorithm [10,15].
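A rough sketch of this idea follows. It assumes that nodes are single characters, that a weighted directed edge a -> b counts how often b directly follows a, and that influence is the PageRank score with the conventional damping factor of 0.85. The excerpt confirms none of these details. The function names are hypothetical, and PageRank is written here as a plain power iteration so the example needs no third-party library; it is a stand-in for, not a copy of, the NetworkX implementation cited in [15].

```python
def char_adjacency_graph(text):
    """Build a weighted directed graph as a dict of dicts.

    graph[a][b] is the number of times character b directly follows a.
    Every character in the text is a node, even with no outgoing edge.
    """
    graph = {ch: {} for ch in set(text)}
    for a, b in zip(text, text[1:]):
        graph[a][b] = graph[a].get(b, 0) + 1
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration; returns node -> score."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node passes its score along its edges, in proportion to weight.
        for v in nodes:
            for w, weight in graph[v].items():
                new_rank[w] += damping * rank[v] * weight / out_weight[v]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    scores = pagerank(char_adjacency_graph(sample))
    # Print the five most influential characters in the sample text.
    for ch, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(repr(ch), round(score, 4))
```

The resulting scores could serve as per-author character rankings, for example as a feature vector for a classifier. How the paper turns them into an attribution model is not stated in this excerpt.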

Results are superior to a number of baseline systems, and remain generally comparable to those in the original closed-set ensemble approach.

~

CUSTÓDIO, José Eleandro and PARABONI, Ivandré, [no date]. Multi-channel Open-set Cross-domain Authorship Attribution.

10. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab (November 1999)

15. Schult, D.A.: Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy). pp. 11–15 (2008)