taikun.cloud

Taikun Logo

Taikun OCP Guide

Table of Contents

Basic networking

Ethernet

Ethernet is a networking protocol, specified by the IEEE 802.3
standard. Most wired network interface cards (NICs) communicate using
Ethernet.

In the OSI
model
of networking protocols, Ethernet occupies the second layer,
which is known as the data link layer. When discussing Ethernet, you
will often hear terms such as local network, layer 2,
L2, link layer and data link layer.

In an Ethernet network, the hosts connected to the network
communicate by exchanging frames. Every host on an Ethernet
network is uniquely identified by an address called the media access
control (MAC) address. In particular, every virtual machine instance in
an OpenStack environment has a unique MAC address, which is different
from the MAC address of the compute host. A MAC address has 48 bits and
is typically represented as a hexadecimal string, such as
08:00:27:b9:88:74. The MAC address is hard-coded into the
NIC by the manufacturer, although modern NICs allow you to change the
MAC address programmatically. In Linux, you can retrieve the MAC address
of a NIC using the ip command:

$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
     link/ether 08:00:27:b9:88:74 brd ff:ff:ff:ff:ff:ff

Conceptually, you can think of an Ethernet network as a single bus
that each of the network hosts connects to. In early implementations, an
Ethernet network consisted of a single coaxial cable that hosts would
tap into to connect to the network. However, network hosts in modern
Ethernet networks connect directly to a network device called a
switch. Still, this conceptual model is useful, and in network
diagrams (including those generated by the OpenStack dashboard) an
Ethernet network is often depicted as if it was a single bus. You’ll
sometimes hear an Ethernet network referred to as a layer 2
segment
.

In an Ethernet network, every host on the network can send a frame
directly to every other host. An Ethernet network also supports
broadcasts so that one host can send a frame to every host on the
network by sending to the special MAC address
ff:ff:ff:ff:ff:ff. ARP and DHCP are two notable protocols that use Ethernet
broadcasts. Because Ethernet networks support broadcasts, you will
sometimes hear an Ethernet network referred to as a broadcast
domain
.

When a NIC receives an Ethernet frame, by default the NIC checks to
see if the destination MAC address matches the address of the NIC (or
the broadcast address), and the Ethernet frame is discarded if the MAC
address does not match. For a compute host, this behavior is undesirable
because the frame may be intended for one of the instances. NICs can be
configured for promiscuous mode, where they pass all Ethernet
frames to the operating system, even if the MAC address does not match.
Compute hosts should always have the appropriate NICs configured for
promiscuous mode.

As mentioned earlier, modern Ethernet networks use switches to
interconnect the network hosts. A switch is a box of networking hardware
with a large number of ports that forward Ethernet frames from one
connected host to another. When hosts first send frames over the switch,
the switch doesn’t know which MAC address is associated with which port.
If an Ethernet frame is destined for an unknown MAC address, the switch
broadcasts the frame to all ports. The switch learns which MAC addresses
are at which ports by observing the traffic. Once it knows which MAC
address is associated with a port, it can send Ethernet frames to the
correct port instead of broadcasting. The switch maintains the mappings
of MAC addresses to switch ports in a table called a forwarding
table
or forwarding information base (FIB). Switches can
be daisy-chained together, and the resulting connection of switches and
hosts behaves like a single network.

VLANs

VLAN is a networking technology that enables a single switch to act
as if it was multiple independent switches. Specifically, two hosts that
are connected to the same switch but on different VLANs do not see each
other’s traffic. OpenStack is able to take advantage of VLANs to isolate
the traffic of different projects, even if the projects happen to have
instances running on the same compute host. Each VLAN has an associated
numerical ID, between 1 and 4094. We say “VLAN 15” to refer to the VLAN
with a numerical ID of 15.

To understand how VLANs work, let’s consider VLAN applications in a
traditional IT environment, where physical hosts are attached to a
physical switch, and no virtualization is involved. Imagine a scenario
where you want three isolated networks but you only have a single
physical switch. The network administrator would choose three VLAN IDs,
for example, 10, 11, and 12, and would configure the switch to associate
switchports with VLAN IDs. For example, switchport 2 might be associated
with VLAN 10, switchport 3 might be associated with VLAN 11, and so
forth. When a switchport is configured for a specific VLAN, it is called
an access port. The switch is responsible for ensuring that the
network traffic is isolated across the VLANs.

Now consider the scenario that all of the switchports in the first
switch become occupied, and so the organization buys a second switch and
connects it to the first switch to expand the available number of
switchports. The second switch is also configured to support VLAN IDs
10, 11, and 12. Now imagine host A connected to switch 1 on a port
configured for VLAN ID 10 sends an Ethernet frame intended for host B
connected to switch 2 on a port configured for VLAN ID 10. When switch 1
forwards the Ethernet frame to switch 2, it must communicate that the
frame is associated with VLAN ID 10.

If two switches are to be connected together, and the switches are
configured for VLANs, then the switchports used for cross-connecting the
switches must be configured to allow Ethernet frames from any VLAN to be
forwarded to the other switch. In addition, the sending switch must tag
each Ethernet frame with the VLAN ID so that the receiving switch can
ensure that only hosts on the matching VLAN are eligible to receive the
frame.

A switchport that is configured to pass frames from all VLANs and tag
them with the VLAN IDs is called a trunk port. IEEE 802.1Q is
the network standard that describes how VLAN tags are encoded in
Ethernet frames when trunking is being used.

Note that if you are using VLANs on your physical switches to
implement project isolation in your OpenStack cloud, you must ensure
that all of your switchports are configured as trunk ports.

It is important that you select a VLAN range not being used by your
current network infrastructure. For example, if you estimate that your
cloud must support a maximum of 100 projects, pick a VLAN range outside
of that value, such as VLAN 200–299. OpenStack, and all physical network
infrastructure that handles project networks, must then support this
VLAN range.

Trunking is used to connect between different switches. Each trunk
uses a tag to identify which VLAN is in use. This ensures that switches
on the same VLAN can communicate.

Subnets and ARP

While NICs use MAC addresses to address network hosts, TCP/IP
applications use IP addresses. The Address Resolution Protocol (ARP)
bridges the gap between Ethernet and IP by translating IP addresses into
MAC addresses.

IP addresses are broken up into two parts: a network number
and a host identifier. Two hosts are on the same
subnet if they have the same network number. Recall that two
hosts can only communicate directly over Ethernet if they are on the
same local network. ARP assumes that all machines that are in the same
subnet are on the same local network. Network administrators must take
care when assigning IP addresses and netmasks to hosts so that any two
hosts that are in the same subnet are on the same local network,
otherwise ARP does not work properly.

To calculate the network number of an IP address, you must know the
netmask associated with the address. A netmask indicates how
many of the bits in the 32-bit IP address make up the network
number.

There are two syntaxes for expressing a netmask:

  • dotted quad
  • classless inter-domain routing (CIDR)

Consider an IP address of 192.0.2.5, where the first 24 bits of the
address are the network number. In dotted quad notation, the netmask
would be written as 255.255.255.0. CIDR notation includes
both the IP address and netmask, and this example would be written as
192.0.2.5/24.

Note

Creating CIDR subnets including a multicast address or a loopback
address cannot be used in an OpenStack environment. For example,
creating a subnet using 224.0.0.0/16 or
127.0.1.0/24 is not supported.

Sometimes we want to refer to a subnet, but not any particular IP
address on the subnet. A common convention is to set the host identifier
to all zeros to make reference to a subnet. For example, if a host’s IP
address is 192.0.2.24/24, then we would say the subnet is
192.0.2.0/24.

To understand how ARP translates IP addresses to MAC addresses,
consider the following example. Assume host A has an IP address
of 192.0.2.5/24 and a MAC address of
fc:99:47:49:d4:a0, and wants to send a packet to host
B with an IP address of 192.0.2.7. Note that the
network number is the same for both hosts, so host A is able to
send frames directly to host B.

The first time host A attempts to communicate with host
B, the destination MAC address is not known. Host A
makes an ARP request to the local network. The request is a broadcast
with a message like this:

To: everybody (ff:ff:ff:ff:ff:ff). I am looking for the computer
who has IP address 192.0.2.7. Signed: MAC address
fc:99:47:49:d4:a0
.

Host B responds with a response like this:

To: fc:99:47:49:d4:a0. I have IP address 192.0.2.7. Signed: MAC
address 54:78:1a:86:00:a5.

Host A then sends Ethernet frames to host B.

You can initiate an ARP request manually using the arping command. For
example, to send an ARP request to IP address
192.0.2.132:

$ arping -I eth0 192.0.2.132
ARPING 192.0.2.132 from 192.0.2.131 eth0
Unicast reply from 192.0.2.132 [54:78:1A:86:1C:0B]  0.670ms
Unicast reply from 192.0.2.132 [54:78:1A:86:1C:0B]  0.722ms
Unicast reply from 192.0.2.132 [54:78:1A:86:1C:0B]  0.723ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)

To reduce the number of ARP requests, operating systems maintain an
ARP cache that contains the mappings of IP addresses to MAC address. On
a Linux machine, you can view the contents of the ARP cache by using the
arp command:

$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.0.2.3                ether   52:54:00:12:35:03   C                     eth0
192.0.2.2                ether   52:54:00:12:35:02   C                     eth0

DHCP

Hosts connected to a network use the Dynamic Host Configuration
Protocol (DHCP) to dynamically obtain IP addresses. A DHCP server hands
out the IP addresses to network hosts, which are the DHCP clients.

DHCP clients locate the DHCP server by sending a UDP packet from port 68 to address
255.255.255.255 on port 67. Address
255.255.255.255 is the local network broadcast address: all
hosts on the local network see the UDP packets sent to this address.
However, such packets are not forwarded to other networks. Consequently,
the DHCP server must be on the same local network as the client, or the
server will not receive the broadcast. The DHCP server responds by
sending a UDP packet from port 67 to port 68 on the client. The exchange
looks like this:

  1. The client sends a discover (“I’m a client at MAC address
    08:00:27:b9:88:74, I need an IP address”)
  2. The server sends an offer (“OK 08:00:27:b9:88:74, I’m
    offering IP address 192.0.2.112“)
  3. The client sends a request (“Server 192.0.2.131, I
    would like to have IP 192.0.2.112“)
  4. The server sends an acknowledgement (“OK
    08:00:27:b9:88:74, IP 192.0.2.112 is
    yours”)

OpenStack uses a third-party program called dnsmasq to
implement the DHCP server. Dnsmasq writes to the syslog, where you can
observe the DHCP request and replies:

Apr 23 15:53:46 c100-1 dhcpd: DHCPDISCOVER from 08:00:27:b9:88:74 via eth2
Apr 23 15:53:46 c100-1 dhcpd: DHCPOFFER on 192.0.2.112 to 08:00:27:b9:88:74 via eth2
Apr 23 15:53:48 c100-1 dhcpd: DHCPREQUEST for 192.0.2.112 (192.0.2.131) from 08:00:27:b9:88:74 via eth2
Apr 23 15:53:48 c100-1 dhcpd: DHCPACK on 192.0.2.112 to 08:00:27:b9:88:74 via eth2

When troubleshooting an instance that is not reachable over the
network, it can be helpful to examine this log to verify that all four
steps of the DHCP protocol were carried out for the instance in
question.

IP

The Internet Protocol (IP) specifies how to route packets between
hosts that are connected to different local networks. IP relies on
special network hosts called routers or gateways. A
router is a host that is connected to at least two local networks and
can forward IP packets from one local network to another. A router has
multiple IP addresses: one for each of the networks it is connected
to.

In the OSI model of networking protocols IP occupies the third layer,
known as the network layer. When discussing IP, you will often hear
terms such as layer 3, L3, and network
layer
.

A host sending a packet to an IP address consults its routing
table
to determine which machine on the local network(s) the packet
should be sent to. The routing table maintains a list of the subnets
associated with each local network that the host is directly connected
to, as well as a list of routers that are on these local networks.

On a Linux machine, any of the following commands displays the
routing table:

$ ip route show
$ route -n
$ netstat -rn

Here is an example of output from ip route show:

$ ip route show
default via 192.0.2.2 dev eth0
192.0.2.0/24 dev eth0  proto kernel  scope link  src 192.0.2.15
198.51.100.0/25 dev eth1  proto kernel  scope link  src 198.51.100.100
198.51.100.192/26 dev virbr0  proto kernel  scope link  src 198.51.100.193

Line 1 of the output specifies the location of the default route,
which is the effective routing rule if none of the other rules match.
The router associated with the default route (192.0.2.2 in
the example above) is sometimes referred to as the default
gateway
. A DHCP server typically transmits the
IP address of the default gateway to the DHCP client along with the
client’s IP address and a netmask.

Line 2 of the output specifies that IPs in the
192.0.2.0/24 subnet are on the local network associated
with the network interface eth0.

Line 3 of the output specifies that IPs in the
198.51.100.0/25 subnet are on the local network associated
with the network interface eth1.

Line 4 of the output specifies that IPs in the
198.51.100.192/26 subnet are on the local network
associated with the network interface virbr0.

The output of the route -n and netstat -rn commands are formatted in a slightly
different way. This example shows how the same routes would be formatted
using these commands:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.0.2.2       0.0.0.0         UG        0 0          0 eth0
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 eth0
198.51.100.0    0.0.0.0         255.255.255.128 U         0 0          0 eth1
198.51.100.192  0.0.0.0         255.255.255.192 U         0 0          0 virbr0

The ip route get
command outputs the route for a destination IP address. From the below
example, destination IP address 192.0.2.14 is on the local
network of eth0 and would be sent directly:

$ ip route get 192.0.2.14
192.0.2.14 dev eth0  src 192.0.2.15

The destination IP address 203.0.113.34 is not on any of
the connected local networks and would be forwarded to the default
gateway at 192.0.2.2:

$ ip route get 203.0.113.34
203.0.113.34 via 192.0.2.2 dev eth0  src 192.0.2.15

It is common for a packet to hop across multiple routers to reach its
final destination. On a Linux machine, the traceroute and
more recent mtr programs prints out the IP address of each
router that an IP packet traverses along its path to its
destination.

TCP/UDP/ICMP

For networked software applications to communicate over an IP
network, they must use a protocol layered atop IP. These protocols
occupy the fourth layer of the OSI model known as the transport
layer
or layer 4. See the Protocol
Numbers
web page maintained by the Internet Assigned Numbers
Authority (IANA) for a list of protocols that layer atop IP and their
associated numbers.

The Transmission Control Protocol (TCP) is the most commonly
used layer 4 protocol in networked applications. TCP is a
connection-oriented protocol: it uses a client-server model
where a client connects to a server, where server refers to the
application that receives connections. The typical interaction in a
TCP-based application proceeds as follows:

  1. Client connects to server.
  2. Client and server exchange data.
  3. Client or server disconnects.

Because a network host may have multiple TCP-based applications
running, TCP uses an addressing scheme called ports to uniquely
identify TCP-based applications. A TCP port is associated with a number
in the range 1-65535, and only one application on a host can be
associated with a TCP port at a time, a restriction that is enforced by
the operating system.

A TCP server is said to listen on a port. For example, an
SSH server typically listens on port 22. For a client to connect to a
server using TCP, the client must know both the IP address of a server’s
host and the server’s TCP port.

The operating system of the TCP client application automatically
assigns a port number to the client. The client owns this port number
until the TCP connection is terminated, after which the operating system
reclaims the port number. These types of ports are referred to as
ephemeral ports.

IANA maintains a registry
of port numbers
for many TCP-based services, as well as services
that use other layer 4 protocols that employ ports. Registering a TCP
port number is not required, but registering a port number is helpful to
avoid collisions with other services. See firewalls
and default ports
in OpenStack Installation Guide for the default
TCP ports used by various services involved in an OpenStack
deployment.

The most common application programming interface (API) for writing
TCP-based applications is called Berkeley sockets, also known
as BSD sockets or, simply, sockets. The sockets API
exposes a stream oriented interface for writing TCP
applications. From the perspective of a programmer, sending data over a
TCP connection is similar to writing a stream of bytes to a file. It is
the responsibility of the operating system’s TCP/IP implementation to
break up the stream of data into IP packets. The operating system is
also responsible for automatically retransmitting dropped packets, and
for handling flow control to ensure that transmitted data does not
overrun the sender’s data buffers, receiver’s data buffers, and network
capacity. Finally, the operating system is responsible for re-assembling
the packets in the correct order into a stream of data on the receiver’s
side. Because TCP detects and retransmits lost packets, it is said to be
a reliable protocol.

The User Datagram Protocol (UDP) is another layer 4 protocol
that is the basis of several well-known networking protocols. UDP is a
connectionless protocol: two applications that communicate over
UDP do not need to establish a connection before exchanging data. UDP is
also an unreliable protocol. The operating system does not
attempt to retransmit or even detect lost UDP packets. The operating
system also does not provide any guarantee that the receiving
application sees the UDP packets in the same order that they were sent
in.

UDP, like TCP, uses the notion of ports to distinguish between
different applications running on the same system. Note, however, that
operating systems treat UDP ports separately from TCP ports. For
example, it is possible for one application to be associated with TCP
port 16543 and a separate application to be associated with UDP port
16543.

Like TCP, the sockets API is the most common API for writing
UDP-based applications. The sockets API provides a
message-oriented interface for writing UDP applications: a
programmer sends data over UDP by transmitting a fixed-sized message. If
an application requires retransmissions of lost packets or a
well-defined ordering of received packets, the programmer is responsible
for implementing this functionality in the application code.

DHCP, the Domain Name System (DNS), the Network
Time Protocol (NTP), and VXLAN are examples of UDP-based protocols used in
OpenStack deployments.

UDP has support for one-to-many communication: sending a single
packet to multiple hosts. An application can broadcast a UDP packet to
all of the network hosts on a local network by setting the receiver IP
address as the special IP broadcast address
255.255.255.255. An application can also send a UDP packet
to a set of receivers using IP multicast. The intended receiver
applications join a multicast group by binding a UDP socket to a special
IP address that is one of the valid multicast group addresses. The
receiving hosts do not have to be on the same local network as the
sender, but the intervening routers must be configured to support IP
multicast routing. VXLAN is an example of a UDP-based protocol that uses
IP multicast.

The Internet Control Message Protocol (ICMP) is a protocol
used for sending control messages over an IP network. For example, a
router that receives an IP packet may send an ICMP packet back to the
source if there is no route in the router’s routing table that
corresponds to the destination address (ICMP code 1, destination host
unreachable) or if the IP packet is too large for the router to handle
(ICMP code 4, fragmentation required and “don’t fragment” flag is
set).

The ping and
mtr Linux
command-line tools are two examples of network utilities that use
ICMP.