Friday, January 20, 2012

FPGA Capabilities and Concepts


                I believe Dr. Aslan also wants to use his server’s NetFPGA capability for experimentation. In DSP, our prototyping board will be a Xilinx FPGA board.  In this report, I begin research into NetFPGA (starting with an introduction to FPGA in general, since I am not yet highly familiar with it) and into networking more broadly.
A general-purpose microprocessor has fixed hardware with an immutable instruction set that must be obeyed. By comparison, an FPGA (“field-programmable gate array”) is a form of integrated circuit whose dominant component is programmable interconnect, which the user can configure into gates. Although some fixed logic components may be present, the user defines much of the logic using a hardware description language such as VHDL or Verilog. The fact that the behavior of its hardware can be reprogrammed is a compelling advantage of this architecture.
However, the FPGA is a recent phenomenon compared to the CPU or its predecessor, the ASIC. The first patents related to programmable logic gates were granted in 1985. By the early 90s the market was still very small, but by the end of the decade it was worth billions of dollars. Present-day technology allows for millions of programmable gates.
The NetFPGA architecture is a method of using FPGA to create a networking tool such as a router. The task of using NetFPGA to create an IP router is a common one at various academic institutions; approximately 2,200 of these boards have been deployed. The research in this report draws on a number of links and resources I consulted online.
In 2009, I was employed for a few months in DSL tech support (for a local firm called teleNetwork) and we had one week of training, which I do remember being a bit frenetic. I am certain that I learned some information about the theory behind the internet and networking in general, but most of it has left me.
What is a router and how does it work? Routers are pieces of hardware that send and receive data packets between computer networks. A data packet is simply a quantity of formatted data, as distinct from the simple bit-by-bit transfer used on point-to-point links. The data is sent in octets, with a header followed by a body: the header carries the forwarding information, while the body is the actual data.
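As a concrete sketch of the header-plus-body idea, the following Python snippet packs and unpacks a made-up packet format; the field layout (destination, source, body length) is purely an assumption for illustration, not any real protocol:

```python
import struct

# Hypothetical header layout: 4-byte destination id, 4-byte source id,
# 2-byte body length, all big-endian ("network order"), as octets.
HEADER_FMT = "!IIH"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 10 octets

def make_packet(dst: int, src: int, body: bytes) -> bytes:
    """Prepend a fixed-size header to a variable-length body."""
    return struct.pack(HEADER_FMT, dst, src, len(body)) + body

def parse_packet(packet: bytes):
    """Split a packet back into (destination, source, body)."""
    dst, src, length = struct.unpack(HEADER_FMT, packet[:HEADER_LEN])
    return dst, src, packet[HEADER_LEN:HEADER_LEN + length]

pkt = make_packet(0x0A000001, 0x0A000002, b"hello")
print(parse_packet(pkt))  # (167772161, 167772162, b'hello')
```

A router only ever needs to look at the fixed-size header to decide where the packet goes; the body is opaque payload.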
Internet Protocol (IP) is the set of design principles for sending data packets over internetworks (collectively referred to as the internet). The Internet Protocol suite is abbreviated as TCP/IP (Transmission Control Protocol/Internet Protocol) because it contains another protocol (TCP) which controls the flow of information sent via IP, requests retransmission of lost data, and puts received data back in order so that throughput is maximized.
From that tech support job I also vaguely remember some of the levels of distribution involved. The idea of the internet goes back to the ARPANET of the early 1970s, which was designed to connect existing small networks nationwide and to allow the network as a whole to survive even if huge portions of the country were destroyed in a nuclear war. Internet access was privatized in 1992, and by the turn of the millennium most Americans had access to the internet, whether at home, at work, or in libraries, schools, or universities.
Nowadays the common home internet setup includes a DSL modem and a router, or an integrated unit called a gateway. A digital signal containing the requested information is sent (modulated) over the phone line, and the modem receives (demodulates) it. The purpose of a router is to distribute an incoming signal to a network, which implies multiple machines.
NetFPGA is available in 1G and 10G versions. The former has a standard PCI form factor and four gigabit Ethernet ports.

Fig. 1. NetFPGA board

The makers of NetFPGA claim that because the datapath is entirely implemented in hardware, it is capable of sending back-to-back packets at full gigabit speed, with processing latency of “just a few clock cycles”. It also has enough onboard memory to allow full line rate buffering.
NetFPGA.org has a set of detailed tutorial videos describing the product and its uses:
VIDEO 1: INTRODUCTION
There are three basic uses of the NetFPGA architecture:
1.       Running the router kit to achieve hardware acceleration.
An unmodified Linux machine can use the “hardware accelerated Linux router,” or router kit, to achieve this. It runs a single program called RKD (Router Kit Daemon), which monitors the Linux routing table and lets the user modify the route table using standard commands. If the routing software uses the NetFPGA interfaces for forwarding, hardware acceleration is provided without any further modifications.

2.       Enhancing existing reference designs.
The reference designs provided include: 1) network interface card (NIC); 2) Ethernet switch; 3) IPv4 router; 4) hardware-accelerated Linux router (discussed above); and 5) SCONE (Software Component of the NetFPGA), which uses a protocol called PW-OSPF to handle exceptions to the hardware forwarding path.  The reference designs for the NetFPGA 1G board have a pipeline of Verilog modules which can be augmented using the NetFPGA driver. The user could, for example, create a GUI this way to visualize the functionality of the NetFPGA hardware.

3.       Building entirely new systems.
This means using Verilog or VHDL to design, simulate, synthesize, and download a design to the board. Not every project is an adequate fit for the existing reference designs, so this path is available, though it is more involved than adding modules to a reference design.
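For use 1 above, the “standard commands” the Router Kit Daemon watches for are ordinary Linux routing-table manipulations. A sketch follows; the interface name nf2c0 and the addresses are assumptions for illustration:

```shell
# Add a route through one of the NetFPGA's interfaces; RKD notices the
# kernel routing-table change and mirrors it into the hardware.
ip route add 192.168.2.0/24 via 192.168.1.1 dev nf2c0

# Inspect and remove the route with the same standard tools.
ip route show
ip route del 192.168.2.0/24
```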

                In addition to the material provided by the NetFPGA organization, companies and universities have contributed designs, including an OpenFlow switch, a packet generator, a zFilter router, and a DFA-based regular-expression matching engine. There are also a wiki and forums on the website.
                The NetFPGA, as described by its makers, has two defining features.

I.                    It is a line-rate platform. It can process back-to-back packets and operate on packet headers for switching, routing, and firewall processing. It can also be used for content processing and intrusion prevention on packet payloads. 
II.                  It is open-source hardware. This is similar to open-source software in that all source code is available under a BSD license. But a hardware project is considered more difficult than a software one because hardware components must meet timing and may have complex interfaces. All levels of the design must receive adequate testing to ensure consistently correct and repeatable results.

VIDEO 2: HARDWARE
                The NetFPGA board contains a Xilinx Virtex-II Pro 50 FPGA with 53,000 logic cells. It also contains block RAMs and two embedded PowerPC processors, enabling software written in higher-level languages to run on the board as well.
                The board also has four onboard RJ-45 gigabit Ethernet ports.
                Onboard memory includes 4.5 MB of SRAM (for storing forwarding-table data) and 64 MB of DDR2 DRAM (for packet buffering).
                The board has a standard PCI connector to interface with the host PC. There is also SATA connectivity, which can be used to connect multiple boards.
                All reference designs were tested on 32-bit and 64-bit systems. To make one’s own designs, one needs Xilinx ISE and ModelSim. There are also complete prebuilt systems (sold without the NetFPGA card) such as the NetFPGA “Cube” (a desktop) and rackmount servers suitable for high-density, high-performance deployments.

VIDEO 3: NETWORKING
                Because the NetFPGA uses Ethernet, it places packets within Ethernet frames. IPv4 headers have many fields, including the version field; the source and destination address fields; the TTL (time to live) field, a counter that prevents packets from circulating endlessly; and the header checksum field, which verifies that the header has not been corrupted.
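The header checksum mentioned above is ordinary one’s-complement arithmetic over the header’s 16-bit words. A minimal Python sketch, assuming a standard 20-byte IPv4 header (checksum field at byte offset 10):

```python
def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, skipping
    the checksum field itself (bytes 10-11 of a standard header)."""
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:                      # checksum field counts as zero
            continue
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                   # fold carries back into low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Widely cited example header (its checksum field holds 0xB861):
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(ipv4_checksum(hdr)))           # 0xb861
```

A router recomputes this checksum at every hop, since decrementing the TTL changes the header.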
                A simplified view of the internet considers just routers and hosts, with routers forwarding traffic between the hosts. To send data, one host creates a packet whose header specifies where the data is to go (the destination IP address) and hands it to the router it is connected to. That router consults its forwarding table to find the best place to send the packet next, and each subsequent router passes it along in the same way until it reaches its destination.
                IPv4 addresses have 32 bits, which allows for about 4 billion unique addresses. The naïve approach is simply to create a forwarding table with billions of entries; although this is possible with current memory density, it would make routers more expensive and would make updating the table extremely costly and time-consuming.
                The actual method routers use is grouping. Grouping collects hosts that are “close” to each other (in terms of the steps required to reach them) into blocks of IP addresses, so that a single table entry can cover a whole block. There may still be several entries matching a given destination, some more specific than others.  The forwarding table can be improved by sorting the entries from most specific to least specific: a linear search will then always find the most specific entry first, yielding the best possible match.
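The sort-then-linear-search scheme described above can be sketched in a few lines of Python; the prefixes and port names are made up for illustration:

```python
import ipaddress

# Forwarding table as (prefix, next hop) pairs, sorted most specific
# (longest prefix) first, so a linear search returns the best match.
forwarding_table = [
    (ipaddress.ip_network("10.1.2.0/24"), "port 2"),
    (ipaddress.ip_network("10.1.0.0/16"), "port 1"),
    (ipaddress.ip_network("0.0.0.0/0"),  "port 0"),   # default route
]
forwarding_table.sort(key=lambda e: e[0].prefixlen, reverse=True)

def lookup(dst: str) -> str:
    """Return the next hop for a destination address."""
    addr = ipaddress.ip_address(dst)
    for prefix, port in forwarding_table:
        if addr in prefix:
            return port          # first hit is the most specific match
    raise LookupError("no route")

print(lookup("10.1.2.7"))    # port 2 (the /24 beats the /16)
print(lookup("10.1.9.9"))    # port 1
print(lookup("8.8.8.8"))     # port 0 (default route)
```

Real routers replace the linear search with specialized structures (tries, TCAMs) so lookups stay fast at line rate, but the most-specific-first principle is the same.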
                Going beyond the forwarding table, there is a switching element that sends data to the correct port, and further along there is a queue that buffers the data to be sent. The routing protocol then determines the topology of the network more closely and finds the shortest possible route to the destination. The routing protocol updates the forwarding table and maintains a routing table, which is more detailed than the forwarding table. Manual control is also available by CLI in a “management” block.
                 The elements responsible for forwarding traffic (the forwarding table, switching component, and queue) may collectively be called the data plane, which every packet passes through and which is implemented in hardware.  The other elements can be grouped together as the control plane, which is much slower but more complex than the data plane and is implemented in software. In the case of NetFPGA, the control plane runs on the host computer, while the data plane is implemented on the FPGA. On the FPGA, the roles of the forwarding table, switching component, and queue are carried out by an “output port lookup”, an “input arbiter”, and “output queues”, respectively. NetFPGA includes two versions of the control plane: SCONE and the router kit (which reuses the standard Linux routing software rather than implementing its own).
VIDEO 4: REFERENCE ROUTER
                The reference router uses FPGA hardware to achieve the functionality of the data plane. The control plane can be managed by SCONE and a Java GUI. The GUI is not strictly necessary but it makes understanding the routing table much easier.
In this example, the lecturer considers a setup of 5 computers with NetFPGA installed. They are not all connected to each other, but all of them are in the network. If one wants to stream video from one computer to another, the NetFPGA will use the router kit to stream the data along the shortest path. If a link on the shortest path is broken, the video will continue to play for a few seconds from buffered data, and then stop. SCONE will talk to the other computers on the network and recognize that the topology has changed. Each SCONE will update its routing table to reflect the new topology, find the next shortest path available, and resume the streaming. When the broken link is reconnected, SCONE will update the routing table and resume streaming via the shortest path, this time without interrupting the video.

VIDEO 5: BUFFER SIZING
                The router reference design can be modified to experiment with buffer sizing. Buffers are needed in routers to handle congestion, to handle internal contention, and to enable pipelining. Congestion buffers are the largest and most important. Congestion happens when a router receives packets faster than the output link can send them. The buffer holds as many as possible, in arrival order, and drops those that cannot fit. The “TCP sawtooth” refers to the drop in the TCP window size (the number of outstanding, unacknowledged packets) that occurs whenever a packet is lost. Buffers must be sized large enough to absorb the variations in the traffic arrival rate and ensure a constant departure rate equal to the output link capacity.
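The sawtooth can be reproduced with a toy additive-increase/multiplicative-decrease loop; the buffer capacity and the one-packet-per-RTT growth model here are deliberately simplistic assumptions, not TCP’s full behavior:

```python
BUFFER_PKTS = 32                # hypothetical congestion-buffer capacity

def sawtooth(rtts: int, start: int = 1):
    """Trace the TCP window (outstanding unacked packets) per RTT:
    grow by one packet each RTT, halve when the buffer overflows."""
    window, trace = start, []
    for _ in range(rtts):
        trace.append(window)
        if window > BUFFER_PKTS:            # overflow -> packet dropped
            window = max(window // 2, 1)    # multiplicative decrease
        else:
            window += 1                     # additive increase
    return trace

trace = sawtooth(100)
print(trace[:40])   # ramps up past capacity, halves, ramps again
```

Plotting the trace gives exactly the sawtooth waveform the NetFPGA event-capture experiment below produces from real packets.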
                Most commercially available equipment does not allow buffer sizes to be modified, as needed to experiment and arrive at ideal sizing for the network’s demands.
                With NetFPGA you can adjust the buffer sizes, capture packet events, and rate-limit a link (the latter two require adding modules) to experiment with buffer sizing. The lecturers add modules for a rate limiter and an event capturer, then use the Advanced Router GUI script to activate them and log the packet drops. The generated output is the “waveform” of TCP window size against time (as packets are received).
VIDEO 6: WHERE TO GET STARTED
                This video mentions resources available online, as well as hands-on teaching sessions held biannually at Stanford or Cambridge, where prospective users are trained on the FPGA and complete a project.
PROJECT VIDEO 1: BUFFER SIZING IN INTERNET ROUTERS
                Review of the purposes of buffers. A common buffer size for a router with a 10G link is 1 million packets. The rule of thumb is that buffer size = RTT × C, where:
RTT = average two-way delay between traffic sender and receiver
C = output link bandwidth
                Larger buffers are good for reducing the number of packets dropped, but oversized buffers are undesirable because of the higher complexity, greater queuing delay, cost, and power consumption associated with them.
                More than 90% of internet traffic is TCP, which has a closed-loop congestion control mechanism: TCP controls the transmission rate by modifying its “window size”. With a larger number of TCP flows, the required buffer size becomes smaller. If the number of flows is great enough, the buffer size can be reduced by a factor of the square root of the number of flows without hurting throughput. With 10,000 flows, the required buffer could be reduced to about 10,000 packets.
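The arithmetic checks out: starting from the quoted rule-of-thumb figure of one million packets for a 10G link, dividing by the square root of the flow count gives the reduced figure above.

```python
import math

RULE_OF_THUMB_PKTS = 1_000_000      # quoted B = RTT * C figure, in packets

def reduced_buffer(flows: int) -> float:
    # Sqrt-of-flows rule: B = (RTT * C) / sqrt(n)
    return RULE_OF_THUMB_PKTS / math.sqrt(flows)

print(reduced_buffer(10_000))       # 1e6 / sqrt(1e4) = 10000.0
```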
                This could be reduced even more on fast backbone networks connected to lower-speed networks. The buffer size in this case could be lowered to the order of log(W), where W is the window size, and it does not vary with the number of flows.  In this example, 20-50 packets would be enough for 90% throughput, rising to nearly 100% at around 10,000 packets, as before.
                NetFPGA’s buffers can be adjusted with extreme precision (one byte at a time, if desired). The lecturer goes on to demonstrate the event-capture software by showing how one can manually tune the buffer size and the number of TCP flows over a network, and have the system automatically create data points for the 99% throughput rate for a given flow count and buffer size. In their example, with 200 flows, just 20 packets of buffer space are required.
PROJECT VIDEO 2: OPENPIPES- HARDWARE SYSTEM DESIGN WITH OPENFLOW
                OpenPipes is a tool that will distribute complex designs among several subsystems, for example FPGAs and CPUs. OpenFlow is a tool that gives the user control over routing traffic through the network. A controller for the switches could be implemented in an FPGA, ASIC, or even CPU.
                OpenPipes can also assist with testing, by running software and hardware implementations of a design simultaneously and feeding the results into a comparison module that verifies they are the same.
                In an example, the lecturer shows how OpenPipes allows the user to modify a running system by changing the flow tables inside it. OpenPipes has a GUI that displays all the available hosts and switches on the network. From one location (say, Stanford) one could download hardware modules from the local host to locations in Houston and LA. This enables hardware on different physical machines to be utilized at the same time.
                In summary, the OpenPipes platform can perform the following functions:
1.       Partition a design across resources
2.       Modify a running system
3.       Utilize a mix of different resources
4.       Assist in the testing and verification process

                
