Active automata learning is emerging as a highly effective technique for obtaining models of protocol implementations and other reactive systems. Many algorithms have been proposed in the literature, often with variations for different classes of models, e.g. DFAs, Moore machines, Mealy machines, interface automata, and various forms of register automata. Algorithms for the generation of conformance test suites play a crucial role within active automata learning, serving as an oracle to determine whether a learned model is correct or not, and here too a wide variety of algorithms has been proposed for different model classes.
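To illustrate the role of such an oracle, the sketch below shows how a conformance test suite can approximate an equivalence check between a hypothesis model and the system under learning. This is a minimal, hypothetical example, not code from the benchmark repository; all names (MealyMachine, random_walk_eq_oracle, and the random-walk test strategy) are illustrative assumptions.

```python
import random


class MealyMachine:
    """Minimal Mealy machine: initial state plus transition and output maps."""

    def __init__(self, initial, transitions, outputs):
        self.initial = initial
        self.transitions = transitions  # (state, input) -> next state
        self.outputs = outputs          # (state, input) -> output symbol

    def run(self, word):
        """Return the output sequence produced on an input word."""
        state, out = self.initial, []
        for symbol in word:
            out.append(self.outputs[(state, symbol)])
            state = self.transitions[(state, symbol)]
        return tuple(out)


def random_walk_eq_oracle(sul, hypothesis, alphabet, num_tests=1000, max_len=20):
    """Approximate equivalence oracle built from random-walk conformance tests.

    Returns an input word on which the system under learning (sul) and the
    hypothesis disagree, i.e. a counterexample, or None if all tests pass.
    A None result only means the hypothesis survived this particular test
    suite, not that it is provably correct.
    """
    for _ in range(num_tests):
        word = tuple(random.choices(alphabet, k=random.randint(1, max_len)))
        if sul.run(word) != hypothesis.run(word):
            return word
    return None
```

A learning algorithm would call such an oracle after constructing each hypothesis: a returned counterexample is fed back to refine the hypothesis, while a pass terminates learning (up to the power of the chosen test suite). Structured test-generation methods such as the W-method or Wp-method can be substituted for the random walk used here.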
Although there has been some excellent experimental work on evaluating algorithms for learning and conformance testing, the number of realistic models used for benchmarking is rather limited, and different papers use different industrial cases. Often the benchmarks used are either small academic examples, which do not properly test efficiency, or randomly generated models, and experiments show that the performance of algorithms on randomly generated models often differs radically from their performance on models of real systems that occur in practice.
A mature field is characterized by a rich set of shared benchmarks that can be used to compare different approaches, supported by tools for translation between different modelling frameworks. In 2019, we therefore set up a repository with a publicly available set of benchmarks of state machines that model real protocols and embedded systems, allowing researchers to compare the performance of learning and testing algorithms. Our benchmarks have already been used by several groups, for example by researchers from Université Grenoble Alpes, the University of Sao Paulo, the University of Sheffield, the University of Leicester, Universidad Complutense de Madrid, Sabanci University in Turkey, Graz University of Technology, IST Austria, the Indian Institute of Science Bangalore, and Kansas State University.
We invite all our colleagues to contribute and to send us (links to) other benchmarks they know of, for inclusion in the wiki.
This benchmark collection is described in the following article. Please cite this article when using these benchmarks: