Three levels of preprocessing were conducted to prepare the data for use in training and testing the neural network.

In the first round of preprocessing, nine of the event record data elements were selected from the available set. These nine elements were selected because they are typically present in network data packets and together provide a complete description of the information transmitted by the packet:

·    Protocol ID - The protocol associated with the event (TCP = 0, UDP = 1, ICMP = 2, and Unknown = 3).
·    Source Port - The port number of the source.
·    Destination Port - The port number of the destination.
·    Source Address - The IP address of the source.
·    Destination Address - The IP address of the destination.
·    ICMP Type - The type of the ICMP packet (Echo Request or Null).
·    ICMP Code - The code field from the ICMP packet (None or Null).
·    Raw Data Length - The length of the data in the packet.
·    Raw Data - The data portion of the packet.

The second round of preprocessing consisted of converting three of the nine data elements (ICMP Type, ICMP Code, and Raw Data) into a standardized numeric representation. The process involved creating a relational table for each of the three data types and assigning a sequential number to each unique value. SELECT DISTINCT queries were run for each of the three data types, and the results were loaded into tables that assigned a unique integer to each entry. These three tables were then joined to the table that contained the event records. A query was then used to select six of the nine original elements (Protocol ID, Source Port, Destination Port, Source Address, Destination Address, and Raw Data Length) together with the unique identifiers that correspond to the remaining three elements (ICMP Type ID, ICMP Code ID, and Raw Data ID).

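The lookup-table step described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the database or schema the authors used. The table names, column names, and sample rows are all assumptions invented for the example, and "Null" is stored as a plain string so that it can take part in the joins.

```python
import sqlite3

# Hypothetical sample events, invented for illustration (not the paper's data set).
# Columns: protocol_id, src_port, dst_port, src_addr, dst_addr,
#          icmp_type, icmp_code, raw_len, raw_data
SAMPLE_EVENTS = [
    (0, 2314, 80, 1573638018, -1580478590, "Null", "Null", 401, "GET / HTTP/1.0"),
    (0, 1611, 6101, 801886082, -926167166, "Null", "Null", 0, ""),
    (2, 0, 0, 801886082, -926167166, "Echo Request", "None", 8, "abcdefgh"),
]

# (source column in the events table, lookup table name); all names are assumptions.
LOOKUPS = [
    ("icmp_type", "icmp_types"),
    ("icmp_code", "icmp_codes"),
    ("raw_data", "raw_data_ids"),
]


def encode_events(events):
    """Replace the three non-numeric elements with integer IDs from lookup tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "protocol_id INTEGER, src_port INTEGER, dst_port INTEGER, "
        "src_addr INTEGER, dst_addr INTEGER, "
        "icmp_type TEXT, icmp_code TEXT, raw_len INTEGER, raw_data TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", events)

    # One lookup table per element: each distinct value receives a sequential integer.
    for column, table in LOOKUPS:
        conn.execute(
            f"CREATE TABLE {table} "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT UNIQUE)"
        )
        conn.execute(
            f"INSERT INTO {table} (value) "
            f"SELECT DISTINCT {column} FROM events ORDER BY {column}"
        )

    # Join the lookup tables back to the events and select the numeric columns only.
    rows = conn.execute(
        "SELECT e.protocol_id, e.src_port, e.dst_port, e.src_addr, e.dst_addr, "
        "t.id, c.id, e.raw_len, d.id "
        "FROM events e "
        "JOIN icmp_types t ON t.value = e.icmp_type "
        "JOIN icmp_codes c ON c.value = e.icmp_code "
        "JOIN raw_data_ids d ON d.value = e.raw_data "
        "ORDER BY e.rowid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in encode_events(SAMPLE_EVENTS):
        print(row)
```

Each returned row holds the six original numeric elements plus three integer IDs, so events that share an ICMP type, ICMP code, or payload also share the corresponding ID.
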
A tenth element (Attack) was assigned to each record based on a determination of whether the event represented part of an attack on a network (Table 1). This element was used during training as the target output of the neural network for each record.

Protocol ID  Source Port  Destination Port  Source Address  Destination Address  ICMP Type ID  ICMP Code ID  Raw Data Length  Data ID  Attack
0            2314         80                1573638018      -1580478590          1             1             401              3758     0
0            1611         6101              801886082       -926167166           1             1             0                2633     1

Table 1: Sample of pre-processed events query

The third round of data preprocessing involved converting the results of the query into an ASCII comma-delimited format that could be used by the neural network (Table 2).

0,2314,80,1573638018,-1580478590,1,1,401,3758,0
0,1611,6101,801886082,-926167166,1,1,0,2633,1

Table 2: Sample of ASCII comma-delimited input strings
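
The conversion to comma-delimited strings can be sketched as below, using Python's standard csv module. The function names are hypothetical, and the two records are the sample rows from Table 1. The second helper assumes, as the text states, that the final field (Attack) is the target output and the first nine fields are the network inputs. The actual input format expected by the authors' neural network tool is not specified here.

```python
import csv
import io

# The two sample records shown in Table 1 (ten fields each, Attack last).
TABLE_1_ROWS = [
    (0, 2314, 80, 1573638018, -1580478590, 1, 1, 401, 3758, 0),
    (0, 1611, 6101, 801886082, -926167166, 1, 1, 0, 2633, 1),
]


def to_csv_lines(records):
    """Serialize numeric records as ASCII comma-delimited strings, one per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(records)
    return buffer.getvalue().splitlines()


def split_record(line):
    """Split one comma-delimited string into (nine input values, Attack target)."""
    values = [int(field) for field in line.split(",")]
    return values[:-1], values[-1]


if __name__ == "__main__":
    for line in to_csv_lines(TABLE_1_ROWS):
        features, target = split_record(line)
        print(line, "->", len(features), "inputs, target", target)
```
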