#
# Copyright (C) 2001 by USC/ISI
# All rights reserved.
#
# Redistribution and use in source and binary forms are permitted
# provided that the above copyright notice and this paragraph are
# duplicated in all such forms and that any documentation, advertising
# materials, and other materials related to such distribution and use
# acknowledge that the software was developed by the University of
# Southern California, Information Sciences Institute.  The name of the
# University may not be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
#
# This text is a simple document that serves to explain how to use 
# SAMAN RAMP
#
# This work is supported by DARPA through SAMAN Project
# (http://www.isi.edu/saman/), administered by the Space and Naval 
# Warfare System Center San Diego under Contract No. N66001-00-C-8066
#

                  How to use RAMP (RApid Model Parameterization)
	         --------------------------------------

Kun-chan Lan                 Version 1.0                               05/5/02

1. Introduction

This is a quick guide to using RAMP (RApid Model
Parameterization) to rapidly generate a simulation model from a raw trace.
RAMP consists of a set of C/Perl/Tcl programs. It
takes a tcpdump file as input and outputs a set of CDF
(Cumulative Distribution Function) files that model the application-level
statistics and the underlying topology information of the targeted traffic.
For application traffic, RAMP currently supports Web and FTP traffic.
For network conditions, we focus on modeling RTT and bottleneck
bandwidth, which are important parameters for driving a simulation.
For details, please read paper.pdf, included in this package.

2. Usage

First, type "make all" at the Unix command line to build "http_connect"
and "http_active".

Then type
    RAMP [-f] [-c] <tcpdump file> <threshold for user think time> <network prefix>

RAMP takes three arguments. The first argument is the file name of the trace.
The default trace format is libpcap format (generated by the tcpdump -w option).
With the "-c" option, RAMP assumes the trace file is in CoralReef format.
With the "-f" option, RAMP filters traffic based on the prefixes specified in
model.conf and generates a separate model for each prefix.
The second argument is the threshold time value (in milliseconds)
used to identify idle periods in Web traffic, from which RAMP infers
user "think" times between requests for new top-level pages.
The last argument is the network prefix used to distinguish inbound
traffic from outbound traffic.
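
To illustrate the role of the second argument, the following Python sketch
groups one client's request timestamps into pages: a gap longer than the
threshold is treated as user think time and starts a new top-level page.
The function name and the sample values are hypothetical, and the logic is
a simplification; it is not the code that RAMP runs.

```python
def split_into_pages(request_times_ms, threshold_ms):
    """Group sorted request timestamps (in ms) into pages.

    A gap larger than threshold_ms between consecutive requests is treated
    as user think time, so the next request starts a new page.
    """
    pages = []
    for t in request_times_ms:
        if pages and t - pages[-1][-1] <= threshold_ms:
            pages[-1].append(t)   # short gap: object belongs to current page
        else:
            pages.append([t])     # idle gap exceeded: new top-level page
    return pages


# Hypothetical timestamps for one client, with a 1000 ms threshold.
pages = split_into_pages([0, 120, 300, 5300, 5400, 20000], 1000)
print(len(pages))
```

A larger threshold merges more requests into the same page, so the choice
of threshold directly affects the inferred page sizes and think times.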


3. List of Files

RAMP contains a set of programs, including
(1) http_connect
(2) http_active
(3) delay.*
(4) outputCDF
(5) time-series.pl
(6) BW*.pl
(7) dat2cdf
(8) flow.*
(9) io.*
(10) ftp.*
(11) getFTPclient.pl
(12) getFTP.pl
(13) ks.pl
(14) pair.tcl
(15) win.awk
(16) traffic-classify


The files http_connect and http_active are two C programs
written and contributed by Don Smith <smithfd@cs.unc.edu> and
Felix Hernandez Campos <fhernand@cs.unc.edu> of the University
of North Carolina at Chapel Hill. These two programs were also used
for their SIGMETRICS 2001 paper "What TCP/IP Protocol Headers Can Tell
Us About the Web". The program http_connect analyzes
tcpdump output and produces a summary of the TCP connections used for
HTTP. It assumes that the tcpdump output has been filtered for packets
from TCP source port 80 and that the result has been sorted so that packets
are in ascending time order within each TCP connection. The program
http_active creates an activity trace (in summary form) of Web
browsing clients with respect to three types of activity: the client
sending request data, the server sending response data, and the client
being idle. For a more detailed description of these two programs, please
see the accompanying files trace_processing.pdf and output_format.pdf (also
contributed by UNC).

The file delay.tcl is a Tcl script that estimates the delay between
each Web client and server pair. It takes a tcpdump trace as input
and assumes that the trace has been filtered to contain only packets
originating from or destined to port 80. It looks at the beginning of each
TCP connection and measures how long the SYN/SYN-ACK handshake takes
in order to estimate the round-trip time (RTT) of that
connection. Finally, it outputs a summary of the average RTT for
each source/destination pair (which might have several connections
within the entire trace).
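
The handshake-based estimate described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the tuple
layout of the input, the "S"/"SA" flag strings, and the function name are
hypothetical and do not reflect the input format or internals of delay.tcl.

```python
from collections import defaultdict


def average_rtt(packets):
    """Average SYN -> SYN-ACK delay per (client, server) pair.

    packets: iterable of (time_s, src, sport, dst, dport, flags) tuples in
    time order, where flags is "S" for a SYN and "SA" for a SYN-ACK
    (a hypothetical, pre-parsed representation of the trace).
    """
    syn_time = {}                 # connection 4-tuple -> time of first SYN
    samples = defaultdict(list)   # (client, server) -> RTT samples
    for time_s, src, sport, dst, dport, flags in packets:
        if flags == "S":
            # Keep the first SYN only; a retransmitted SYN does not reset it.
            syn_time.setdefault((src, sport, dst, dport), time_s)
        elif flags == "SA":
            # The SYN-ACK travels in the reverse direction of the SYN.
            start = syn_time.pop((dst, dport, src, sport), None)
            if start is not None:
                samples[(dst, src)].append(time_s - start)
    return {pair: sum(v) / len(v) for pair, v in samples.items()}


# Two hypothetical connections between the same client and server.
trace = [
    (0.000, "c", 1025, "s", 80, "S"),
    (0.050, "s", 80, "c", 1025, "SA"),
    (1.000, "c", 1026, "s", 80, "S"),
    (1.070, "s", 80, "c", 1026, "SA"),
]
print(average_rtt(trace))
```

Averaging over all connections of a pair smooths out per-connection
variation, at the cost of hiding changes in delay over the trace.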

The file outputCDF is a Perl script that takes the output of
http_active to infer a set of user-level statistics, including
(a) user session inter-arrival time
(b) number of pages per user session
(c) page inter-arrival time
(d) page size
(e) object inter-arrival time
(f) object size
(g) client request size
(h) ratio between persistent and non-persistent connections
(i) server popularity
The output is a set of CDF files that describe the above statistics
in a 3-column format. Taking pagesize.dat.cdf as an example:
1st column: page size (in KB)
2nd column: cumulative number of samples
3rd column: cumulative probability
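
As a rough illustration of this 3-column layout, the Python sketch below
builds the rows from a list of raw samples. The function name and the
sample values are hypothetical, and merging equal values into one row is
an assumption; the files written by outputCDF may differ in such details.

```python
from collections import Counter


def to_cdf_rows(samples):
    """Return (value, cumulative count, cumulative probability) rows."""
    counts = Counter(samples)
    total = len(samples)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]          # samples less than or equal to value
        rows.append((value, running, running / total))
    return rows


# Hypothetical page sizes in KB.
for value, count, prob in to_cdf_rows([4, 12, 4, 30]):
    print(f"{value} {count} {prob:.4f}")
```

The last row always has a cumulative probability of 1, which is a quick
sanity check when inspecting a generated CDF file.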

The file time-series.pl is a Perl script that takes the text output
of tcpdump (the ASCII lines that tcpdump prints to
stdout) and produces a time series of traffic volume (in bins of
1 millisecond) for use in further scaling analysis of the traffic.
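
The binning step can be sketched in Python as shown below. The sketch
assumes the tcpdump lines have already been parsed into (timestamp, size)
pairs with integer microsecond timestamps; that representation and the
function name are hypothetical and are not taken from time-series.pl.

```python
def volume_time_series(packets, bin_us=1000):
    """Sum packet bytes into fixed-width time bins.

    packets: iterable of (time_us, size_bytes) pairs, with integer
    timestamps in microseconds. The default bin width is 1000 us (1 ms).
    Returns a list whose i-th entry is the number of bytes seen in bin i,
    counted from the first packet. Empty bins are kept as zeros.
    """
    packets = sorted(packets)
    if not packets:
        return []
    start = packets[0][0]
    n_bins = (packets[-1][0] - start) // bin_us + 1
    series = [0] * n_bins
    for time_us, size in packets:
        series[(time_us - start) // bin_us] += size
    return series


# Hypothetical packets: (timestamp in us, size in bytes).
print(volume_time_series([(0, 100), (500, 50), (1200, 40), (3999, 10)]))
```

Keeping the empty bins matters for scaling analysis, since aggregating the
series over larger time scales relies on evenly spaced samples.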

The files BW*.pl are a set of Perl scripts that compute the
bottleneck bandwidth of the underlying topology of the trace.

dat2cdf converts a (one-column format) data file into its corresponding CDF.

The files flow.* compute the flow statistics of the traffic, including
flow duration, flow size, and flow inter-arrival time.
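
The three flow statistics can be illustrated with the Python sketch below.
It assumes each packet has already been assigned a flow identifier (for
example, derived from addresses and ports); how flow.* actually defines
and delimits a flow is not described here, so the input layout and the
function name are hypothetical.

```python
def flow_statistics(packets):
    """Compute per-flow duration and size, plus flow inter-arrival times.

    packets: iterable of (time_ms, flow_id, size_bytes) in time order.
    Returns (stats, inter_arrivals), where stats maps each flow_id to
    (duration_ms, total_bytes) and inter_arrivals lists the gaps between
    the start times of consecutive flows.
    """
    flows = {}
    for time_ms, flow_id, size in packets:
        if flow_id not in flows:
            flows[flow_id] = {"start": time_ms, "end": time_ms, "bytes": 0}
        flow = flows[flow_id]
        flow["end"] = time_ms      # last packet seen so far for this flow
        flow["bytes"] += size
    starts = sorted(f["start"] for f in flows.values())
    inter_arrivals = [b - a for a, b in zip(starts, starts[1:])]
    stats = {fid: (f["end"] - f["start"], f["bytes"])
             for fid, f in flows.items()}
    return stats, inter_arrivals


# Hypothetical packets belonging to two flows, "a" and "b".
stats, gaps = flow_statistics(
    [(0, "a", 100), (10, "b", 40), (30, "a", 60), (45, "b", 20)])
print(stats, gaps)
```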

The files io.* separate inbound and outbound traffic
in the trace based on the prefix given as an input argument.
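
A minimal Python sketch of prefix-based direction classification is shown
below. It assumes the prefix is written in CIDR notation and uses
documentation-only example addresses; the prefix syntax that RAMP itself
expects may differ, and the function name is hypothetical.

```python
import ipaddress


def direction(src, dst, prefix):
    """Classify a packet relative to the local network prefix."""
    net = ipaddress.ip_network(prefix)
    src_local = ipaddress.ip_address(src) in net
    dst_local = ipaddress.ip_address(dst) in net
    if src_local and not dst_local:
        return "outbound"   # leaves the local network
    if dst_local and not src_local:
        return "inbound"    # enters the local network
    return "other"          # both endpoints local, or both remote


# Hypothetical addresses from the documentation ranges.
print(direction("192.0.2.10", "198.51.100.7", "192.0.2.0/24"))
```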

getFTPclient.pl and getFTP.pl retrieve the FTP data
connections from the trace.

ks.pl performs the Kolmogorov-Smirnov goodness-of-fit test
on two CDF files to decide whether they are statistically different.
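
For reference, the two-sample Kolmogorov-Smirnov test can be sketched in
Python as below. The sketch works on raw samples rather than on CDF files
and uses the standard large-sample critical value; both choices, and the
function names, are assumptions made for illustration and may not match
what ks.pl does.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)   # empirical CDF of a at x
        fb = bisect_right(b, x) / len(b)   # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for sample sizes n and m."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical samples.
d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
print(d, ks_critical(4, 4))
```

The two distributions are judged statistically different at level alpha
when the statistic exceeds the critical value. The critical value is an
asymptotic approximation and is unreliable for very small samples such as
the ones in this example.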

win.awk computes the TCP window size.

traffic-classify extracts the protocol mix information
from the trace.

4. NS scripts

The file isigen.tcl is an ns script that demonstrates how to use
the output of RAMP (the set of CDF files) to simulate the
traffic collected at the ISI/USC gateway link. To run it,
assuming you already have ns (a snapshot after 10/10/01) installed
on your system, type "ns isigen.tcl" at the command line. It
takes about 1 hour on a Red Hat Linux 7.0 Pentium II Xeon 450 MHz PC
with 1 GB of physical memory. The complete example, including the
CDF files that model ISI traffic, can be found under
<your ns root>/tcl/ex/empweb/ in your ns distribution.



Contact person

    Kun-chan Lan, USC/ISI
    email: kclan@isi.edu
    phone: 310-448-8260

