13 june 2004 William Welch

JAL ethernet project, theory of operation, and code walk-through.

This first attempt at a write-up may ramble around a bit. Eventually it will take some sort of shape and hopefully be of use.

What is the point of the project? To give the JAL community a starting point for ethernet software, and for internet software in particular.

What is the approach? Simplicity. We rely on the robustness of the "real" internet hosts, to make up for our lack of features and robustness.

Where it is needed, we will develop additional features to deal with problems that may happen. But we will see just how much we can accomplish, with as little software as possible.

The JAL ethernet does not use any hardware timers. Perhaps it should, but we will keep it simple for now. Also, the JAL ethernet is "single-threaded", in that we only process one packet at a time. We do allow seemingly concurrent access to ARP, Ping, UDP, Telnet, and HTTP, but it is done in a simple, single-threaded way.

You may be surprised, to find out that a "server" is simpler to make than a "client", but it is true, as we shall see. A server doesn't speak unless spoken to, generally. This means that we can get away without implementing the ARP request and cache.

Also, it means that we can, in nearly every case, just "turn around" an incoming packet and send it back to the client. We have to swap the source and destination fields, fill in our MAC, and of course add whatever application-specific data the client has requested, if any.
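To make the "turn around" idea concrete, here is a minimal sketch in C (not JAL, and not taken from net.jal). The name eth_turn_around is made up for illustration. It assumes the packet buffer starts with the 6-byte destination MAC followed by the 6-byte source MAC.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical helper, not from net.jal: turn an incoming ethernet
   header around so the same buffer can be sent back to the client.
   Bytes 0..5 hold the destination MAC, bytes 6..11 the source MAC. */
static void eth_turn_around(uint8_t *pkt, const uint8_t *our_mac)
{
    /* The sender's MAC becomes the new destination. */
    memcpy(pkt, pkt + MAC_LEN, MAC_LEN);
    /* Our own MAC becomes the new source. */
    memcpy(pkt + MAC_LEN, our_mac, MAC_LEN);
}
```

Note that the destination of the incoming packet may have been the broadcast address, which is why we fill in our own MAC rather than simply swapping the two fields.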

The chip we are using is the so-called "NE2000 compatible". The original NE2000 board used the National Semiconductor DP8390 chipset, and was a plug-in card for the IBM PC. One feature about the PC's "ISA" bus, was that it was, at first, only an 8-bit bus, so the NE2000 card had to support 8-bit mode. This is good news for us, to keep the interface simple-- 8 data lines, 5 address bits, read, write, and reset signals.

Note: Other cards and chips may be used, if we are willing to re-write a few subroutines in "net.jal". More about this topic later.

Why three versions of the code? Well, when I first started this project, I intended to implement only ARP, ICMP (Ping), and UDP. I had done a similar project, in C, a few years ago at a small company, so I had confidence that it could be done. Also, since JAL does not directly support 16bit indexes, I figured "small" packets would be OK. There was, however, the question of which picmicro chip to use. I had been looking for a chance to try out the 18F452, so I chose that chip. However, I did have in mind supporting the F877 also, in case the 18F452 didn't work out, and besides the F877 is cheaper.

The reaction to the first UDP-only version was fairly negative, I think partly because TCP is so widely used and taken for granted. Right away, it was noticed that a fast PC could over-run the NE2000 receive packet buffer.

So, I decided to attempt a "tiny" subset of TCP that would at least avoid the overflow problem. A fundamental part of TCP is that the receiving computer gets to specify some flow-control parameters, right at the start of a connection, that we could leverage to our advantage, and still be completely "legal". It didn't take long in discussing TCP, that folks started wondering if small packets would be acceptable, so I also began a version with "large packet" support.

I guess this might be a good place to talk a little bit more in depth about the NE2000. It does not handle overflow of the receive buffer very well. In fact, no two implementations, or even datasheets, seem to be alike in what to do. So, we can expect to have to work on this area of the software, from time to time.

And I guess this is a good place to describe the "key" routines in "net.jal", which I think of as the NE2000 "driver". I intended to be real strict with the contents of this file, so that someone could easily replace it for a different chipset, but it has grown to include a fair amount of general ethernet, and even some IP software. This is mostly due to laziness on my part, but also due to issues of stack levels, and possibly efficiency/performance.

So, we will hop around in the net.jal source code for a few minutes. If we ignore the boring stuff about the ISA bus cycles, chip init, reset, etc, we are left (I am referring to the UDP-only version of the code here) with net_txpkt, which transmits an ethernet packet, net_rxpkt, which receives a packet, and net_poll, which mostly just calls net_rxpkt when a packet is available, but also tries to deal with overflow problems and such.

Next, a few words about what is in a JAL ethernet packet. The NE2000 operates at a fairly low level, and actually, when the NE2000 originally came out, it was not used for TCP/IP at all. The "N" stands for "Novell", who had their own proprietary protocols. The TCP/IP protocols were still being hammered out in meetings, etc. Amazingly, it took 15 years or so before they got it all figured out and things really started taking off. I recall struggling very hard to find any software for TCP/IP in the 1980's, and even into the early 1990's it was not a standard part of Windows.

So, back to the NE2000. From its point of view, there is a 48 bit destination field, a 48 bit source field, some application-specific data, and a 32 bit hardware CRC. That's it. Well, there is a little more. Some of the bits in the destination field are special, and can be used for broadcasting, and also for multi-casting.

Next, this is where most folks start talking about the 7 layer ISO networking model. Sorry, not going to do that here. We'll just say that many, many more meetings were held, and we ended up with the 14 byte "ethernet" header that you will find us working with in JAL ethernet. The first 12 bytes we already covered just now (the destination and source addresses). The other two bytes are called the "ethertype" and originally just told you which proprietary network this packet belonged to. Of course, with only 16 bits, this didn't last long, so even more people went to meetings, and we ended up with only 2 different "ethertypes" that JAL ethernet cares about. Any others, well, sorry, you are on your own, have a nice day!

ARP has its own ethertype. As a "server", we have to listen for it, and promptly reply with our MAC address. Of course we make sure that it is actually intended for us, and not for someone else.
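As a rough illustration of the ARP reply, here is a sketch in C. It is not the JAL code; arp_make_reply and the offset names are made up for this example. The offsets assume the standard ARP layout for ethernet and IPv4, starting right after the 14-byte ethernet header, for a total of 42 bytes. The pointers our_mac and our_ip are assumed to point outside the packet buffer.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets into the frame (14-byte ethernet header, then ARP). */
#define ARP_OPER 20   /* operation: 1 = request, 2 = reply */
#define ARP_SHA  22   /* sender MAC (6 bytes) */
#define ARP_SPA  28   /* sender IP  (4 bytes) */
#define ARP_THA  32   /* target MAC (6 bytes) */
#define ARP_TPA  38   /* target IP  (4 bytes) */

/* Hypothetical helper: if pkt holds an ARP request for our IP,
   rewrite it in place into the reply and return 1, else return 0. */
static int arp_make_reply(uint8_t *pkt, const uint8_t *our_mac,
                          const uint8_t *our_ip)
{
    /* Only answer requests, and only those asking for our address. */
    if (pkt[ARP_OPER] != 0 || pkt[ARP_OPER + 1] != 1) return 0;
    if (memcmp(pkt + ARP_TPA, our_ip, 4) != 0) return 0;

    /* The original sender becomes the target of the reply. */
    memcpy(pkt + ARP_THA, pkt + ARP_SHA, 6);
    memcpy(pkt + ARP_TPA, pkt + ARP_SPA, 4);
    /* We become the sender. */
    memcpy(pkt + ARP_SHA, our_mac, 6);
    memcpy(pkt + ARP_SPA, our_ip, 4);
    pkt[ARP_OPER + 1] = 2;

    /* Ethernet header: back to the requester, from us. */
    memcpy(pkt, pkt + ARP_THA, 6);
    memcpy(pkt + 6, our_mac, 6);
    return 1;
}
```

Since the request usually arrives as a broadcast, the check on the target IP is what keeps us from answering on behalf of someone else.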

I forgot to say, the original ethernet was coax, shared by all, half-duplex, and collisions were commonplace. That may be part of the reason that the NE2000 requires a minimum packet length of 60 bytes. Nowadays, with "switches", you probably will never hear any packets that are not for JAL ethernet, except for "broadcasts". Windows "network neighborhood" is famous for blasting out periodic messages. Perfectly legal, but really wiped me out when their packet was bigger than my buffers. Ouch.

OK, now then we have the IP "ethertype" (0x0800). When we see this, we know that we can expect an "IP" header. You might be surprised at how short the RFC for "IP" is. But really, there isn't much to IP by itself. And, in keeping with our "simple" approach, we only support the most common form of IP packet. IP, version 4, is what we support. We also do not implement "fragmentation", and probably will never need to do so. It generally "doesn't happen" on an ethernet LAN, unless there is some sort of configuration error on the part of the system administrator.
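Here is a small sketch, in C rather than JAL, of how an incoming frame might be sorted by ethertype, with the IP restrictions described above. The ethertype values are the standard ones (0x0800 for IP, 0x0806 for ARP). The function and enum names are invented for this example and do not come from the JAL source.

```c
#include <stdint.h>

#define ETHERTYPE_IP  0x0800u
#define ETHERTYPE_ARP 0x0806u

enum frame_kind { FRAME_IGNORE, FRAME_ARP, FRAME_IP };

/* Hypothetical classifier: decide what to do with a received frame.
   pkt points at the first byte of the ethernet header. */
static enum frame_kind classify_frame(const uint8_t *pkt, uint16_t len)
{
    uint16_t ethertype, frag;

    if (len < 14) return FRAME_IGNORE;          /* no full ethernet header */
    ethertype = (uint16_t)((pkt[12] << 8) | pkt[13]);

    if (ethertype == ETHERTYPE_ARP) return FRAME_ARP;
    if (ethertype != ETHERTYPE_IP) return FRAME_IGNORE;

    if (len < 34) return FRAME_IGNORE;          /* 14 + minimum 20 byte IP header */
    if ((pkt[14] >> 4) != 4) return FRAME_IGNORE;   /* IP version 4 only */

    /* Bytes 6..7 of the IP header: flags and fragment offset.
       "More fragments" set, or a non-zero offset, means a fragment. */
    frag = (uint16_t)((pkt[20] << 8) | pkt[21]);
    if (frag & 0x3FFFu) return FRAME_IGNORE;

    return FRAME_IP;
}
```

The "don't fragment" bit is deliberately left out of the mask, since it is commonly set on perfectly ordinary packets.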

As a "simple" server, we are pretty much going to be at the "end" of a network. What I mean is, all we care about, are incoming IP packets that are addressed to us specifically. So, by simply "turning around" the IP packet, we can reply to it.

One important field in the IP header is the "protocol" field. This tells us if the packet is ICMP (ping), UDP, or TCP. Anything else-- you are on your own.

Network Byte Order-- when you are looking at the JAL ethernet source code, or studying an RFC, you will find some fields are more than 1 byte wide. Maybe 2 bytes, or 4 bytes. So, how do we know what to expect? Well, there is a rule, in the RFC for IP, that says that the most significant byte comes first in the packet or header. In JAL ethernet, if you study the "net_rxpkt" routine, you will see how the packet comes in via the 8-bit NE2000 I/O port, and then is stored in the "netbuf" (see below). Take for example an incoming IP packet that is "addressed" to our JAL ethernet board at 192.168.0.11. The IP "destination" address will have the "192" as the first byte that we read from the NE2000, then the "168", then the "0", and then the "11".
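The byte order rule is easy to show with a couple of helpers. This is a sketch in C, not the JAL code, and the names get_be16, get_be32 and put_be16 are made up for illustration. They build a multi-byte value from bytes stored in the order they came off the wire.

```c
#include <stdint.h>

/* Read a 16 bit field stored most significant byte first. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32 bit field stored most significant byte first. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 16 bit value most significant byte first. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}
```

With the address 192.168.0.11 stored as the four bytes 192, 168, 0, 11, get_be32 gives 0xC0A8000B, which is also a reminder of how the decimal and hex forms of the same address relate.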

Also, you will see, in the RFC, that pretty much everywhere, the decimal radix is used. But often in JAL ethernet, you will see hexadecimal. It is tedious, but just be careful and all will be OK.

Back in the early days, when the internet was small, and not available to anyone except businesses and universities, "PING" was required to be operational, and was used to troubleshoot networking problems. Nowadays, many systems ignore it completely, to keep hackers from even knowing that the computer exists. But we implement it, since it is handy for us to use within our own LAN.

I guess we may as well touch on the "internet checksum". This little problem has caused a lot of gray hairs over the years. It is a simple concept, but "the devil is in the details". After struggling several evenings to write a simple JAL version, we gave up, and with Javi's "add32" routine, we implemented the "textbook" version of the checksum.

You may well ask, if the ethernet does a hardware CRC, why bother with an IP checksum at all? Well, we can't assume that the IP packet that we received, has lived its entire life on our own LAN. It may have passed thru several different computers, routers, modems, etc before arriving in JAL ethernet. So it is good to checksum the incoming packet before we "trust" its contents.

By the way, our 'F877 version does *not* bother with checking the IP header checksum. The other two projects *do* check the checksum on incoming packets.

For transmitting, we have no choice, we must implement the IP header checksum. So, we do.
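For reference, here is a sketch in C of the usual "textbook" internet checksum, in the style described in RFC 1071: add up the data as 16 bit big-endian words in a 32 bit accumulator, fold the carries back in, and take the one's complement. This is not the JAL routine; inet_checksum is a name chosen for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Internet checksum over len bytes, returned in host order.
   The caller stores it most significant byte first. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as 16 bit words, most significant byte first. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    /* An odd trailing byte is padded with a zero byte on the right. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries out of the top 16 bits back into the bottom. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

To build an outgoing IP header, zero the checksum field, run the routine over the 20 header bytes, and store the result in the field. To check an incoming header, run the routine over the header as received; a good header gives zero.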

There is not much to say about UDP. We implement it, and it is (by design) connection-less and lossy. However, I have done a lot of good work with just UDP. Plenty good for simple projects. But beware that you may not get every packet, and some of your replies may get lost. You can develop your own custom protocol on top of UDP, probably with much simpler and smaller code than TCP.

TCP, well, we will give it a section of its own.

Turning back to the source code, we can discuss "netbuf.jal". This is one part in the project, that I wish had been done differently. I wanted to shield the rest of the code from the specifics of picmicro RAM, so we came up with netbuf_read and netbuf_write. But, it was probably a mistake to use those same two functions, both for ethernet packet contents, and variable storage, for example 32bit variables used for the internet checksum.

Anyway, let's take a look at the current layout of the "netbuf" memory (I am referring to the "small packet" TCP version of the project). The ethernet "packet" is in the lowest portion of the "netbuf". Originally, it was hard-wired at offset zero, but later I added "pkt_base" to allow experimenting with moving the packet around. As it turned out, maybe it wasn't that important anyway. What was, and is, important, is to guard against allowing an incoming packet to overflow the area defined for the packet, because if it does, it will write over variables and such, with disastrous results.

You may be wondering, OK, so what's the big deal? Well, the problem is that during the first steps of an incoming TCP "connection", the client at the other end may choose to supply all manner of extra TCP "options". Now then, we don't support much of anything in the way of options, except for the MSS, but we have no control over what the other guy may decide to "offer", so we must be ready to accept an extra large TCP "header". So, instead of the usual 20 byte TCP header, it might be much larger. Now then, we have 14 bytes for the ethernet header, 20 bytes for the IP header, and then 20 bytes minimum for TCP, giving us 54 bytes at least, devoted to headers. I increased this to 70 bytes to allow for "options".

Now then, there are a number of variables in the upper part of the "netbuf" area. Since we are using 8-bit indexes, this means the highest we can place the variables is up to an index of 255.

In the middle, sandwiched by "headers" below, and "variables" above, is the area for the TCP, and UDP "payload", that is, the area usable by applications to move data back and forth over the ethernet.

I refer to this area as the "tcp_maxsegsize", which we will discuss later in the TCP section. Anyway, all of the "variables", are defined as being relative to the end of the tcp_maxsegsize. As it turns out, as new variables needed to be added, we had to keep decreasing the size of tcp_maxsegsize.
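The arithmetic of the layout can be sketched as a few constants. This is C, not JAL, and only the 14/20/20 byte headers, the 70 byte header area, and the 8-bit index limit come from the description above. VARS_SIZE is a made-up figure purely for illustration, and the real variable area in netbuf.jal may be a different size.

```c
#include <stdint.h>

#define NETBUF_SIZE    256u  /* 8-bit index: offsets 0..255 */
#define PKT_BASE       0u    /* packet starts at the bottom of netbuf */
#define HDR_ETH        14u
#define HDR_IP         20u
#define HDR_TCP_MIN    20u
#define HDR_AREA       70u   /* 54 bytes of headers plus room for TCP options */
#define VARS_SIZE      60u   /* HYPOTHETICAL size of the variable area */

/* Whatever is left in the middle is the payload area. */
#define TCP_MAXSEGSIZE (NETBUF_SIZE - PKT_BASE - HDR_AREA - VARS_SIZE)
#define PKT_AREA       (HDR_AREA + TCP_MAXSEGSIZE)
#define VARS_BASE      (PKT_BASE + PKT_AREA)

/* Guard: an incoming frame must fit below the variables,
   otherwise storing it would overwrite them. */
static int rx_len_ok(uint16_t frame_len)
{
    return frame_len <= PKT_AREA;
}
```

With these example numbers the payload area comes out at 126 bytes. Every byte added to VARS_SIZE comes straight out of TCP_MAXSEGSIZE, which is the squeeze described above.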

You may have noticed that the "udp_pseudo" header area, and the "tcp_pseudo" header area, are actually just two names for the same memory area. This is OK, since we only use this area briefly, for the calculation of the UDP or TCP checksum.
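For readers who have not met the pseudo header before, here is a sketch in C of what goes into it. The 12 byte layout (source IP, destination IP, a zero byte, the protocol number, and the UDP or TCP length) is the standard one for both UDP and TCP, which is why one memory area can serve both. The function name build_pseudo is invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a 12 byte pseudo header from an IPv4 header.
   ip_hdr points at the first byte of the IP header.
   l4_len is the length in bytes of the UDP or TCP header plus data. */
static void build_pseudo(uint8_t out[12], const uint8_t *ip_hdr,
                         uint16_t l4_len)
{
    /* Source and destination addresses are adjacent at bytes 12..19. */
    memcpy(out, ip_hdr + 12, 8);
    out[8]  = 0;                        /* always zero */
    out[9]  = ip_hdr[9];                /* protocol: 6 = TCP, 17 = UDP */
    out[10] = (uint8_t)(l4_len >> 8);   /* length, most significant byte first */
    out[11] = (uint8_t)(l4_len & 0xFF);
}
```

The checksum is then taken over these 12 bytes followed by the UDP or TCP header and data, and the pseudo header itself is never transmitted.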

It may have been overly ambitious, but we decided early on, to support a telnet and a http connection at the same time. In order to do this, we need to save some "context" information. Since TCP has some 32 bit fields, this context area is fairly large, and in fact, may need to grow larger. When it does, sadly, the tcp_maxsegsize will have to be reduced again.

TCP is implemented in tcp.jal. In typical JAL fashion, the "action" takes place down near the bottom of the file, in the routine "tcp_rx_event_processor". Weird name, I know. Anyway, this routine is called every time an incoming TCP packet arrives. For the "state machine" and "variables", I tried to use names very similar to the terminology in RFC 793. There is a big section of the RFC devoted to "event processing". We implement a lot of it, and may need to implement more of it before we have a truly useful and robust system in JAL ethernet.

We have implemented "servers", or what the RFC calls "passive open". This means that we hang around in the "listen" state. In order to implement reliable, full-duplex transfer of data, TCP uses a somewhat complicated system of variables. For the moment, we will ignore the details of how a connection is established, and discuss "steady state" operation.

Whenever the program on either end of the connection wishes to send some data, it also sends an "acknowledgement" of the most recently, correctly received data from the opposite direction. In order to handle the possibility of duplicate, and also missing data, every packet of data is assigned a "sequence number". For some reason, the RFC decided to number every single byte of data with its own 32bit "sequence number". That's right, it is not a "packet number", but a "byte" number.

Because of a number of complicated scenarios, which I will try to explain later, the initial, starting "sequence number" is a "random" 32bit number. From then on, for every byte that is sent, the "sequence number" is incremented. Note that this number is unsigned 32bit, and it is OK, normal, even, for it to "wrap around" zero, at any time. Our JAL ethernet will need more work before we really completely handle this situation properly. But more about that later.
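One common way to compare sequence numbers so that "wrap around" works is to subtract them and look at the sign of the 32 bit result. Here is a sketch in C. It is a general technique, not what tcp.jal currently does, and the names are made up. It assumes the two numbers are less than 2^31 apart, and it relies on the usual two's complement conversion from unsigned to signed.

```c
#include <stdint.h>

/* True if sequence number a comes before b, allowing for wrap-around.
   Only meaningful when a and b are less than 2^31 apart. */
static int seq_lt(uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32; the sign of the
       difference then tells us which number is "behind". */
    return (int32_t)(a - b) < 0;
}

/* True if a comes before b, or is equal to it. */
static int seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}
```

For example, 0xFFFFFFF0 counts as "before" 0x00000010, because the sender only has to advance 32 bytes to get from one to the other.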

OK, so when we want to send some data, say part of a "web page", we include a "sequence number" in the packet (the RFC calls a "packet" a "segment". I don't know why, but they do). The sequence number in the TCP header, "seg_seq" (SEG.SEQ in the RFC), is the value of the "sequence number" matching the first byte of the segment/packet.

If the program on the other end of the connection, receives the segment OK, then the next time he sends us a message, his "acknowledgement" field, named "seg_ack", will contain the "sequence number" of the *last* byte he has successfully received, plus one. That is, his "seg_ack" has the value of the "sequence number" that he expects us to use, for the *next* packet/segment.

Are we having fun yet? No, well imagine how much fun I had. ha ha. Anyway, just go play with ethereal (you do have ethereal, right? it is a "must have" program for sniffing ethernet packets). Learn where the "seg_ack" and "seg_seq" fields are, and how they work.

One extra complexity, but very handy, is that every packet has some "flag bits". Certain important flag bits (named SYN and FIN), are treated as though they are data bytes. What I mean is, they are counted along with the data bytes, when computing and checking the "seg_seq" and "seg_ack". This may seem weird at first, but really, it is cool. It allows those important "flags" to get a "free ride", in the event that a packet gets lost and needs to be re-transmitted. These flag bits have an "imaginary position" in the packet/segment. Per RFC 793, the "SYN" is considered to come just before the first data byte in the packet, and the "FIN" just after the last data byte.
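Putting the sequence number rules together, here is a sketch in C of how much sequence space a segment uses, and what acknowledgement we should expect back for it. The flag bit values are the standard ones from the TCP header. The function names are invented for this example and are not from tcp.jal.

```c
#include <stdint.h>

#define TCP_FLAG_FIN 0x01u
#define TCP_FLAG_SYN 0x02u

/* Sequence space used by a segment: one per data byte,
   plus one each for SYN and FIN if they are set. */
static uint32_t seg_len(uint16_t data_len, uint8_t flags)
{
    uint32_t n = data_len;
    if (flags & TCP_FLAG_SYN) n++;
    if (flags & TCP_FLAG_FIN) n++;
    return n;
}

/* The "seg_ack" value the other end sends once it has received
   the whole segment: last sequence number used, plus one.
   Unsigned arithmetic wraps around zero as TCP requires. */
static uint32_t expected_ack(uint32_t seg_seq, uint16_t data_len,
                             uint8_t flags)
{
    return seg_seq + seg_len(data_len, flags);
}
```

So a SYN with no data, sent with sequence number 1000, is acknowledged with 1001, and 100 bytes of data starting at 1001 are acknowledged with 1101.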

Stay tuned, more later.