The data were preprocessed in three rounds to prepare them for training and testing the
neural network. In the first round of preprocessing, nine of the event record data elements
were selected from the available set. These nine elements were chosen because they are
typically present in network data packets and together provide a complete description of
the information transmitted by the packet:
· Protocol ID - The protocol associated with the event (TCP = 0, UDP = 1, ICMP = 2, and
Unknown = 3).
· Source Port - The port number of the source.
· Destination Port - The port number of the destination.
· Source Address - The IP address of the source.
· Destination Address - The IP address of the destination.
· ICMP Type - The type of the ICMP packet (Echo Request or Null).
· ICMP Code - The code field from the ICMP packet (None or Null).
· Raw Data Length - The length of the data in the packet.
· Raw Data - The data portion of the packet.
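As a minimal sketch of how the nine selected elements might be held in code, the Python fragment below defines a record type and the protocol encoding listed above. The names `EventRecord` and `encode_protocol`, the field names, and the field types are assumptions made for illustration; the paper does not describe the storage format at this stage.

```python
from dataclasses import dataclass

# Protocol codes as listed in the text: TCP = 0, UDP = 1, ICMP = 2, Unknown = 3.
PROTOCOL_IDS = {"TCP": 0, "UDP": 1, "ICMP": 2}
UNKNOWN_PROTOCOL_ID = 3


def encode_protocol(name: str) -> int:
    """Map a protocol name to its numeric code; unrecognized names become Unknown (3)."""
    return PROTOCOL_IDS.get(name.strip().upper(), UNKNOWN_PROTOCOL_ID)


@dataclass
class EventRecord:
    """The nine elements kept by the first preprocessing round (hypothetical field names)."""
    protocol_id: int
    source_port: int
    destination_port: int
    source_address: int        # numeric form of the IP address; encoding not stated in the text
    destination_address: int
    icmp_type: str             # e.g. "Echo Request" or "Null"
    icmp_code: str             # e.g. "None" or "Null"
    raw_data_length: int
    raw_data: bytes            # assumed to be kept as raw bytes until the second round
```

For example, `encode_protocol("udp")` yields 1, and any name outside the three listed protocols falls back to the Unknown code.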
The second round of preprocessing consisted of converting three of the nine data elements
(ICMP Type, ICMP Code and Raw Data) into a standardized numeric representation. A relational
table was created for each of the three data types, and a sequential number was assigned to
each unique value of that type. This was done by running a SELECT DISTINCT query for each of
the three data types and loading the results into tables that assigned a unique integer to each
entry. These three tables were then joined to the table that contained the event records. A
query was then used to select six of the nine original elements (Protocol ID, Source Port,
Destination Port, Source Address, Destination Address, and Raw Data Length) and the unique
identifiers that correspond to the remaining three elements (ICMP Type ID, ICMP Code ID, and
Raw Data ID). A tenth element (Attack) was assigned to each record based on a determination
of whether the event represented part of an attack on a network (Table 1). This element was
used during training as the target output of the neural network for each record.
Protocol ID  Source Port  Destination Port  Source Address  Destination Address  ICMP Type ID  ICMP Code ID  Raw Data Length  Raw Data ID  Attack
0            2314         80                1573638018      -1580478590          1             1             401              3758         0
0            1611         6101              801886082       -926167166           1             1             0                2633         1
Table 1: Sample of pre-processed events query
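The second preprocessing round can be sketched with an in-memory SQLite database driven from Python. This is an illustration under stated assumptions, not the authors' implementation: the paper does not name the database product, and the table names, column names, helper `build_lookup`, and the three sample events are all hypothetical. Only the three converted elements and the Attack label are shown to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical event table with made-up rows (not data from the paper).
cur.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, "
    "icmp_type TEXT, icmp_code TEXT, raw_data TEXT, attack INTEGER)"
)
cur.executemany(
    "INSERT INTO events (icmp_type, icmp_code, raw_data, attack) VALUES (?, ?, ?, ?)",
    [
        ("Null", "Null", "payload A", 0),
        ("Echo Request", "None", "payload B", 1),
        ("Null", "Null", "payload A", 0),
    ],
)


def build_lookup(cur, column):
    """Create <column>_lookup holding one sequential integer ID per distinct value."""
    table = f"{column}_lookup"
    cur.execute(
        f"CREATE TABLE {table} "
        f"({column}_id INTEGER PRIMARY KEY AUTOINCREMENT, {column} TEXT UNIQUE)"
    )
    # SELECT DISTINCT collapses repeated values so each one receives a single ID.
    cur.execute(
        f"INSERT INTO {table} ({column}) "
        f"SELECT DISTINCT {column} FROM events ORDER BY {column}"
    )


for column in ("icmp_type", "icmp_code", "raw_data"):
    build_lookup(cur, column)

# Join the lookup tables back to the events so that each record carries
# integer IDs in place of the original ICMP Type, ICMP Code and Raw Data values.
cur.execute(
    """
    SELECT t.icmp_type_id, c.icmp_code_id, d.raw_data_id, e.attack
    FROM events AS e
    JOIN icmp_type_lookup AS t ON t.icmp_type = e.icmp_type
    JOIN icmp_code_lookup AS c ON c.icmp_code = e.icmp_code
    JOIN raw_data_lookup AS d ON d.raw_data = e.raw_data
    ORDER BY e.event_id
    """
)
rows = cur.fetchall()
```

In this sketch the first and third events carry identical values, so they receive identical IDs, while the second event receives different ones. A full version would also select the six elements that are already numeric.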
The third round of data preprocessing converted the results of the query into an ASCII
comma-delimited format that could be used by the neural network (Table 2).
0,2314,80,1573638018,-1580478590,1,1,401,3758,0
0,1611,6101,801886082,-926167166,1,1,0,2633,1
Table 2: Sample of ASCII comma-delimited input strings
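The third round amounts to serializing each query row as one comma-separated line. The sketch below does this with Python's standard `csv` module, using the two sample rows from Table 1; the function name `to_comma_delimited` is hypothetical, and the tool actually used for the conversion is not stated in the paper.

```python
import csv
import io

# The two sample rows from Table 1, in column order (Attack is the last column).
rows = [
    (0, 2314, 80, 1573638018, -1580478590, 1, 1, 401, 3758, 0),
    (0, 1611, 6101, 801886082, -926167166, 1, 1, 0, 2633, 1),
]


def to_comma_delimited(rows):
    """Serialize rows of integers as comma-delimited ASCII text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = to_comma_delimited(rows)

# Split each record into the nine network inputs and the Attack target.
inputs = [row[:-1] for row in rows]
targets = [row[-1] for row in rows]
```

The lines in `text` match the strings shown in Table 2, and `targets` holds the tenth element that serves as the desired network output during training.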