Abstract
Recent machine learning approaches for classifying text as human-written or bot-generated rely on training sets that are large, labeled
diligently, and representative of the underlying domain. While valuable, these machine learning approaches ignore programs as
an additional source of such training sets. To address this problem of incomplete training sets, this paper proposes to systematically
supplement existing training sets with samples inferred via program analysis. In our preliminary evaluation, training sets enriched with
samples inferred via dynamic symbolic execution improved machine learning classifier accuracy for simple string-generating programs.