There has long been a need for more systematic work on how parameters such as the amount of data and the number of candidate authors affect authorship attribution. This study uses well-known features, including frequencies of words and syntactic elements, to investigate the impact of varying such parameters. The same methods are also applied to some tests of topic dependence. The results show that small feature sets are sufficient even for large numbers of candidates, but that a large amount of data is needed regardless of feature set size or number of candidates. There are also several indications that features previously regarded as topic-independent, such as function words, may be highly topic-dependent after all, and that syntactic methods may be somewhat less so.