Evaluation criteria
We have many different criteria for evaluating the performance of algorithms and tools for learning and testing:
- Does the algorithm/tool aim to fully learn the benchmark, or only to provide suitable aggregate information about the data that have been gathered?
- Number of input events required for learning/testing a model (see the counting sketch after this list)
- Number of test sequences required for learning/testing a model
- Wall-clock time needed for learning/testing (a reset or certain inputs may take a lot of time)
- Quality of intermediate hypotheses: how long does it take before a first reasonable model is obtained?
- How interpretable are the results? (e.g., is the tool able to discover structure such as hierarchy and parallel composition, and are the generated counterexamples minimal?)
- How easy is it to parallelize learning/testing?
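
The counting criteria above (input events, test sequences/resets, wall-clock time) can be recorded independently of the learning algorithm by instrumenting the system under learning. The sketch below shows one possible way to do this; the `CountingSUL` wrapper and the `reset()`/`step()` interface it assumes are hypothetical and not tied to any particular tool.

```python
import time

class CountingSUL:
    """Wrapper around a system under learning (SUL) that records the
    metrics listed above: input events, resets (test sequences), and
    wall-clock time spent in the target system.

    Assumes the wrapped object exposes reset() and step(symbol);
    this interface is an assumption for the sketch, not a standard API.
    """

    def __init__(self, sul):
        self.sul = sul            # the actual system under learning
        self.input_events = 0     # total number of inputs sent
        self.resets = 0           # number of resets, i.e. queries/test sequences
        self.sul_time = 0.0       # wall-clock seconds spent inside the SUL

    def reset(self):
        # Each reset corresponds to the start of one query/test sequence.
        self.resets += 1
        start = time.perf_counter()
        self.sul.reset()
        self.sul_time += time.perf_counter() - start

    def step(self, symbol):
        # Each step corresponds to one input event sent to the SUL.
        self.input_events += 1
        start = time.perf_counter()
        output = self.sul.step(symbol)
        self.sul_time += time.perf_counter() - start
        return output

    def report(self):
        return {"input_events": self.input_events,
                "resets": self.resets,
                "sul_time_seconds": self.sul_time}
```

A learner or tester would then interact only with the wrapper; calling `report()` after the learning phase and again after the testing phase separates the cost of the two, which makes the per-phase comparison between tools possible.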