Conference Paper | Academic

Complementing Machine Learning Classifiers via Dynamic Symbolic Execution: Human vs. Bot Generated Tweets

Abstract. Recent machine learning approaches for classifying text as human-written or bot-generated rely on training sets that are large, diligently labeled, and representative of the underlying domain. While valuable, these approaches overlook programs as an additional source of training samples. To address the resulting problem of incomplete training sets, this paper proposes to systematically supplement existing training sets with samples inferred via program analysis. In our preliminary evaluation, training sets enriched with samples inferred via dynamic symbolic execution improved machine learning classifier accuracy for simple string-generating programs.
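To make the idea of enriching a training set from a program more concrete, the following minimal sketch may help. It is not the paper's implementation: it replaces dynamic symbolic execution with plain exhaustive enumeration of branch conditions, which is only feasible for very small programs. The names `bot_program`, `enumerate_outputs`, and `enrich`, as well as the sample tweets, are hypothetical and chosen purely for illustration.

```python
from itertools import product


def bot_program(is_morning, mentions_price):
    """Hypothetical string-generating bot with two branch conditions."""
    greeting = "Good morning!" if is_morning else "Hello!"
    body = "Prices are up today." if mentions_price else "Markets are quiet."
    return f"{greeting} {body}"


def enumerate_outputs(program, n_flags):
    """Run the program once per combination of branch conditions.

    Assumption: exhaustive enumeration stands in for dynamic symbolic
    execution, which would instead solve path constraints to reach each path.
    """
    outputs = []
    for flags in product([False, True], repeat=n_flags):
        text = program(*flags)
        if text not in outputs:  # keep distinct outputs, in discovery order
            outputs.append(text)
    return outputs


def enrich(training_set, program, n_flags):
    """Append program-derived samples labeled 'bot', skipping known texts."""
    seen = {text for text, _ in training_set}
    extra = [(text, "bot")
             for text in enumerate_outputs(program, n_flags)
             if text not in seen]
    return training_set + extra


# Illustrative, made-up training set that covers only one program path.
existing = [
    ("Just had a great coffee", "human"),
    ("Hello! Markets are quiet.", "bot"),
]
enriched = enrich(existing, bot_program, 2)
```

In this toy setting the original training set contains one of the four strings the program can emit, and the enrichment step adds the remaining three as bot-labeled samples. A classifier trained on `enriched` would therefore see every output path of the program rather than only the one that happened to be observed.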