We applied learning with abstraction to actual TCP implementations. The aim was to infer control models of real
TCP implementations using an L*-based approach augmented by abstraction. The models are parameterized by
sequence/acknowledgement numbers and TCP flags. In many ways, this work is a follow-up to the ns-2 case study.
The main difference is that, instead of applying learning to a TCP simulator, we apply it to real TCP implementations.
We developed tcp-learner, a tool that uses LearnLib
(an automata learning library, used here to learn Mealy machines) to infer a Mealy machine model of a TCP component, either a client or a server.
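For readers unfamiliar with the model type: a Mealy machine maps each (state, input) pair to a next state and an output. The minimal Java sketch below stores such transitions and replays an input word. It is not LearnLib's representation, and the states and symbols used with it are a hypothetical toy, not part of a learned TCP model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal deterministic Mealy machine over string-labelled states and symbols.
final class MealyMachine {
    private final String initialState;
    // Both maps are keyed by state, then by input symbol.
    private final Map<String, Map<String, String>> successor = new HashMap<>();
    private final Map<String, Map<String, String>> output = new HashMap<>();

    MealyMachine(String initialState) {
        this.initialState = initialState;
    }

    // Adds the transition: from --input/out--> to.
    void addTransition(String from, String input, String out, String to) {
        successor.computeIfAbsent(from, k -> new HashMap<>()).put(input, to);
        output.computeIfAbsent(from, k -> new HashMap<>()).put(input, out);
    }

    // Replays an input word from the initial state and collects one output per input.
    List<String> run(List<String> word) {
        List<String> outputs = new ArrayList<>();
        String state = initialState;
        for (String input : word) {
            Map<String, String> next = successor.get(state);
            if (next == null || !next.containsKey(input)) {
                throw new IllegalStateException("no transition for " + input + " in state " + state);
            }
            outputs.add(output.get(state).get(input));
            state = next.get(input);
        }
        return outputs;
    }
}
```

For instance, a toy machine with transitions `q0 --SYN(V,V)/SYN+ACK(V,V)--> q1` and `q1 --ACK(V,V)/TIMEOUT--> q2` maps the input word `SYN(V,V) ACK(V,V)` to the output word `SYN+ACK(V,V) TIMEOUT`.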
The sequence and acknowledgement numbers carried by TCP packets have large domains (32 bits each). Since learning does not
scale well as the number of inputs grows, we need to reduce the input space.
For that purpose, we use an abstraction-based approach, similar to the one in the ns-2 case study.
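A minimal sketch of the idea, assuming a hypothetical two-valued abstraction: every concrete 32-bit sequence or acknowledgement number is mapped to either `V` (valid) or `INV` (invalid), so each flag combination contributes four abstract inputs instead of 2^64 concrete ones. The names `Abs`, `AbstractInput` and `AbstractAlphabet`, and the symbol syntax `FLAGS(seq,ack)`, are illustrative and not tcp-learner's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-valued abstraction of a 32-bit sequence/acknowledgement number.
enum Abs { V, INV }

// One abstract input symbol: a flag combination plus abstract seq and ack values.
final class AbstractInput {
    final String flags; // e.g. "SYN" or "ACK+PSH" (illustrative naming)
    final Abs seq;
    final Abs ack;

    AbstractInput(String flags, Abs seq, Abs ack) {
        this.flags = flags;
        this.seq = seq;
        this.ack = ack;
    }

    @Override
    public String toString() {
        return flags + "(" + seq + "," + ack + ")";
    }
}

final class AbstractAlphabet {
    // Enumerates every flag combination with every abstract seq/ack pair,
    // i.e. flagSets.size() * 2 * 2 symbols in total.
    static List<AbstractInput> build(List<String> flagSets) {
        List<AbstractInput> alphabet = new ArrayList<>();
        for (String flags : flagSets) {
            for (Abs seq : Abs.values()) {
                for (Abs ack : Abs.values()) {
                    alphabet.add(new AbstractInput(flags, seq, ack));
                }
            }
        }
        return alphabet;
    }
}
```

With three flag combinations, for example, the abstract alphabet has 3 × 2 × 2 = 12 symbols, starting with `SYN(V,V)`, `SYN(V,INV)`, `SYN(INV,V)` and `SYN(INV,INV)`.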
The learning setup comprises 3 components:
The Learner
* includes both LearnLib and the Mapper component for abstraction
* sends concrete inputs to the TCP Adapter
* retrieves concrete outputs from the TCP Adapter
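How the Learner's parts cooperate can be sketched as follows, assuming hypothetical `Mapper` and `Adapter` interfaces (tcp-learner's real classes may be organized differently). For each query from the learning algorithm, both are reset; then each abstract input is concretized by the mapper and sent through the TCP Adapter, and the concrete response is abstracted back before it is returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Mapper component.
interface Mapper {
    void reset();                                 // forget per-run state (e.g. tracked numbers)
    String concretize(String abstractInput);      // abstract input -> concrete input
    String abstractOutput(String concreteOutput); // concrete output -> abstract output
}

// Hypothetical stand-in for the connection to the TCP Adapter.
interface Adapter {
    void reset();                      // bring the TCP Component back to its initial state
    String send(String concreteInput); // send one concrete input, return the concrete output
}

// Answers queries of the learning algorithm by chaining mapper and adapter.
final class QueryRunner {
    private final Mapper mapper;
    private final Adapter adapter;

    QueryRunner(Mapper mapper, Adapter adapter) {
        this.mapper = mapper;
        this.adapter = adapter;
    }

    List<String> run(List<String> abstractWord) {
        mapper.reset();
        adapter.reset();
        List<String> abstractOutputs = new ArrayList<>();
        for (String abstractInput : abstractWord) {
            String concreteOutput = adapter.send(mapper.concretize(abstractInput));
            abstractOutputs.add(mapper.abstractOutput(concreteOutput));
        }
        return abstractOutputs;
    }
}
```

Keeping the mapper between the learning algorithm and the adapter means the algorithm only ever sees the small abstract alphabet, while the adapter only ever sees concrete values.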
The TCP Adapter
* generates request packets from concrete inputs
* sends them to the TCP Component (server or client)
* receives response packets from the TCP Component
* parses the responses into concrete outputs for the Learner
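The Adapter's translation step can be sketched as parsing a concrete input string into packet fields. The string format `FLAGS(seq,ack)` (for example `SYN+ACK(4000,1001)`) is a hypothetical choice for illustration; the flag bit values are the standard ones from the TCP header. Building and sending the raw packet itself is left out.

```java
// Concrete TCP packet fields as carried in a (hypothetical) input string "FLAGS(seq,ack)".
final class ConcretePacket {
    // Standard TCP header flag bits.
    static final int FIN = 0x01, SYN = 0x02, RST = 0x04, PSH = 0x08, ACK = 0x10;
    static final long MAX_U32 = 0xFFFFFFFFL;

    final String flags;
    final long seq; // unsigned 32-bit value held in a long
    final long ack;

    ConcretePacket(String flags, long seq, long ack) {
        this.flags = flags;
        this.seq = seq;
        this.ack = ack;
    }

    static ConcretePacket parse(String s) {
        int open = s.indexOf('(');
        int comma = s.indexOf(',', open + 1);
        int close = s.indexOf(')', comma + 1);
        if (open <= 0 || comma < 0 || close < 0) {
            throw new IllegalArgumentException("malformed input: " + s);
        }
        long seq = Long.parseLong(s.substring(open + 1, comma).trim());
        long ack = Long.parseLong(s.substring(comma + 1, close).trim());
        if (seq < 0 || seq > MAX_U32 || ack < 0 || ack > MAX_U32) {
            throw new IllegalArgumentException("number out of 32-bit range: " + s);
        }
        return new ConcretePacket(s.substring(0, open).trim(), seq, ack);
    }

    // Flag byte for the TCP header, e.g. "SYN+ACK" -> 0x12.
    int flagBits() {
        int bits = 0;
        for (String f : flags.split("\\+")) {
            switch (f.trim()) {
                case "FIN": bits |= FIN; break;
                case "SYN": bits |= SYN; break;
                case "RST": bits |= RST; break;
                case "PSH": bits |= PSH; break;
                case "ACK": bits |= ACK; break;
                default: throw new IllegalArgumentException("unknown flag: " + f);
            }
        }
        return bits;
    }

    // Inverse of parse, usable in the other direction to report a response to the Learner.
    String format() {
        return flags + "(" + seq + "," + ack + ")";
    }
}
```

Holding the 32-bit numbers in a `long` avoids Java's signed `int` misrepresenting values above 2^31 - 1.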
The TCP Component is either a TCP Server or a TCP Client, encapsulated in an adapter. This adapter
accepts socket command strings over a TCP connection and calls the corresponding socket methods on the encapsulated server/client.
The server/client also receives TCP packets sent to it directly. The TCP Component is therefore driven by two kinds of input:
TCP packets, which are sent directly to the encapsulated server/client, and higher-level socket command strings, which are
sent over a TCP connection to the adapter and translated into socket calls on the server/client.
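The socket-command side of the component adapter can be sketched as a line-based dispatcher: each received line names a command, which is mapped to an action. The command names and the `OK`/`UNKNOWN` replies are hypothetical; in a real server component the registered actions would wrap calls such as creating a `java.net.ServerSocket` and calling its `accept()` and `close()` methods, and the reader and writer would wrap the streams of the command connection's `java.net.Socket`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Line-based dispatcher for socket command strings (command names are hypothetical).
final class CommandDispatcher {
    private final Map<String, Runnable> handlers = new HashMap<>();

    // Associates a command string (e.g. "listen") with the socket call it should trigger.
    void register(String command, Runnable action) {
        handlers.put(command.toLowerCase(Locale.ROOT), action);
    }

    // Reads one command per line until the connection is closed, runs the matching
    // action and replies to every line (a hypothetical acknowledgement protocol).
    void serve(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            Runnable action = handlers.get(line.trim().toLowerCase(Locale.ROOT));
            if (action == null) {
                out.println("UNKNOWN " + line.trim());
            } else {
                action.run();
                out.println("OK");
            }
            out.flush();
        }
    }
}
```

For example, after registering `listen` and `close`, feeding the dispatcher the lines `listen`, `close` and `foo` runs both actions in order and answers the unrecognized `foo` with `UNKNOWN foo`.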
The TCP Server Component
* can be commanded to listen, accept and close on the used socket
* responds to TCP packets (from the client)
* in a simplified version, is a bare TCP Server without an encapsulating adapter, which listens in a loop and does nothing else
Note: the simplified version was used in our publication on TCP. Since it has no adapter, the input alphabet contained no higher-level socket commands, only packet inputs.
The TCP Client Component
* can be commanded to connect and close on the used socket
* responds to TCP packets (from the server)
A diagram of the learning setup used for a TCP Server is shown below.
The systems under test in our paper were the Windows 8 and Linux 12.04 implementations of the TCP stack.
We used a setup in which the TCP Server listened in a loop, with an alphabet describing only packets (not socket calls).
Hence, we did not need an adapter around the TCP Component.
The mapper is significantly more complex than the one used for ns-2. This is partly due to optimizations
which ensure that a sequence of VALID inputs triggers outputs whose parameters are also abstracted to VALID.
Details on the mapper can be found in our publication.
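A much-simplified sketch of how a mapper can keep VALID inputs and outputs aligned, assuming it tracks two values: the next sequence number the learner should send, and the next sequence number expected from the system under test (SUT). A VALID input parameter is concretized to the expected value, an output parameter is abstracted to VALID only if it matches the expectation, and SYN and FIN each consume one sequence number, with arithmetic modulo 2^32. The class name, the offset used for INVALID values, and the decision to ignore payload and the ACK flag are assumptions; the actual mapper handles many more cases.

```java
// Hypothetical abstract values for a sequence or acknowledgement number.
enum Val { VALID, INVALID }

// Much-simplified stateful mapper that keeps VALID inputs and outputs consistent.
final class SeqAckMapper {
    private static final long MASK = 0xFFFFFFFFL;           // TCP numbers wrap modulo 2^32
    private static final long INVALID_OFFSET = 0x40000000L; // assumption: far outside any window

    private long learnerNextSeq;    // next sequence number the learner should send
    private Long sutNextSeq = null; // next sequence number expected from the SUT (null = unknown)

    SeqAckMapper(long initialSeq) {
        this.learnerNextSeq = initialSeq & MASK;
    }

    // SYN and FIN each occupy one sequence number (payload is ignored in this sketch).
    private static boolean consumesSeq(String flags) {
        return flags.contains("SYN") || flags.contains("FIN");
    }

    // Abstract input -> concrete {seq, ack}.
    long[] concretize(String flags, Val seq, Val ack) {
        long expectedAck = (sutNextSeq == null) ? 0L : sutNextSeq.longValue();
        long cSeq = (seq == Val.VALID) ? learnerNextSeq : (learnerNextSeq + INVALID_OFFSET) & MASK;
        long cAck = (ack == Val.VALID) ? expectedAck : (expectedAck + INVALID_OFFSET) & MASK;
        if (seq == Val.VALID && consumesSeq(flags)) {
            learnerNextSeq = (learnerNextSeq + 1) & MASK;
        }
        return new long[] {cSeq, cAck};
    }

    // Concrete output -> abstract {seq, ack}.
    Val[] abstractOutput(String flags, long seq, long ack) {
        // The first number seen from the SUT (its initial sequence number) is accepted as VALID.
        Val aSeq = (sutNextSeq == null || seq == sutNextSeq.longValue()) ? Val.VALID : Val.INVALID;
        // The SUT should acknowledge everything the learner has sent so far.
        Val aAck = (ack == learnerNextSeq) ? Val.VALID : Val.INVALID;
        if (aSeq == Val.VALID) {
            sutNextSeq = (seq + (consumesSeq(flags) ? 1 : 0)) & MASK;
        }
        return new Val[] {aSeq, aAck};
    }
}
```

Starting from a learner sequence number of 1000, `SYN(VALID,VALID)` is sent with seq 1000 and ack 0; a `SYN+ACK` reply with seq 5000 and ack 1001 abstracts to `(VALID,VALID)`, after which `ACK(VALID,VALID)` is sent with seq 1001 and ack 5001.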
We obtained models for the Windows 8 and Linux 12.04 implementations.
Using an input alphabet in which all parameters are VALID, tcp-learner should also be able to infer models of
current operating systems, and to do so consistently. The main factor deciding whether learning succeeds is the mapper component,
which has been extracted into a single Java class.
Below we show the models obtained. Similar models can be found in our paper.