
Evaluation criteria

We have many different criteria for evaluating the performance of algorithms and tools for learning and testing:

  • Does the algorithm/tool aim at fully learning the benchmark, or just at providing suitable aggregate information about the data that have been gathered?
  • Number of input events required for learning/testing a model
  • Number of test sequences required for learning/testing a model
  • Wall-clock time needed for learning/testing (a reset or certain inputs may require a lot of time)
  • Quality of intermediate hypotheses: how long does it take before you get a first reasonable model?
  • How interpretable are the results? (e.g., is the tool able to discover structure, such as hierarchy and parallel composition, and are the generated counterexamples minimal?)
  • How easy is it to parallelize learning/testing?
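Several of the cost criteria above (input events, test sequences, wall-clock time) are typically measured by instrumenting the interface to the system under learning. The sketch below illustrates one way to do this: a hypothetical `CountingSUL` wrapper that counts every input symbol sent, every reset (one per test sequence), and elapsed time. The interface names are assumptions for illustration; real learning tools expose similar counters.

```python
import time

class CountingSUL:
    """Wraps a system under learning (SUL) and tracks cost metrics:
    input events, test sequences (one reset each), and wall-clock time.
    Hypothetical interface, for illustration only."""

    def __init__(self, step_fn):
        self.step_fn = step_fn        # maps (state, input) -> (state, output)
        self.state = None
        self.inputs = 0               # total input events sent to the SUL
        self.resets = 0               # total resets = number of test sequences
        self.start = time.monotonic()

    def reset(self):
        # Bring the SUL back to its initial state; counts as one test sequence.
        self.state = 0
        self.resets += 1

    def step(self, symbol):
        # Send one input event and observe the output.
        self.inputs += 1
        self.state, out = self.step_fn(self.state, symbol)
        return out

    def report(self):
        return {"inputs": self.inputs,
                "resets": self.resets,
                "seconds": time.monotonic() - self.start}

# Toy SUL: a counter modulo 3 that outputs whether it wrapped to zero.
def toy_step(state, symbol):
    state = (state + 1) % 3
    return state, state == 0

sul = CountingSUL(toy_step)
for seq in [["a", "a"], ["a", "a", "a"]]:
    sul.reset()
    for s in seq:
        sul.step(s)

stats = sul.report()
print(stats["inputs"], stats["resets"])  # → 5 2
```

With such a wrapper in place, different learning/testing tools can be compared on the same benchmark by reading off the counters after each run, independently of how each tool schedules its queries.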