
Programmer’s Documentation (Internals)

If you plan on hacking LSDN internals, this chapter is for you. It describes the available internal APIs and how they interact.

Project organization (components)

The core of LSDN is the lsdn library (liblsdn.so), which implements all of the C API – the netmodel handling and the individual network types. The library itself relies on the libmnl library for netlink communication helpers, on libjson for dumping the netmodel into JSON, and on uthash for hash tables.

The command-line tools (lsctl and lsctld) are built upon our lsdn-tclext library, which provides the lsctl language engine and is layered on the C API. For more info, see Command-line.

The lsdn library itself is composed of several layers/components (see Fig. 3 for illustration). At the bottom layer, we have several mostly independent utility components:

  • nettypes.c manipulates, parses and prints IP addresses and MAC addresses
  • nl.c provides functions to do more complex netlink tasks than libmnl provides - creating interfaces, manipulating QDiscs, filters etc.
  • names.c provides naming tables for netmodel objects, so that we can find physes, virts etc. by name
  • log.c provides simple logging to stderr, governed by the LSDN_DEBUG environment variable
  • errors.c contains the lsdn_err_t error codes and the infrastructure for reporting commit problems (which do not use simple lsdn_err_t errors). The actual problem reporting relies on the netmodel’s lsdn_context.
  • list.h provides an embedded linked-list implementation (every C project needs its own :) ); a generic sketch of the technique follows this list
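
The embedded-list idea is that the list node lives inside the linked object instead of pointing to it. The following is a generic sketch of the technique only; the actual names and API are those of list.h, which may differ:

#include <stddef.h>

/* Generic embedded-list sketch; not the real list.h API. */
struct list_entry {
    struct list_entry *next, *prev;
};

struct example_virt {
    const char *name;
    struct list_entry virts_entry; /* links this virt into its network's list */
};

/* Recover the containing object from an embedded entry (container_of idiom). */
#define entry_to_virt(ptr) \
    ((struct example_virt *)((char *)(ptr) - offsetof(struct example_virt, virts_entry)))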

The netmodel core (in net.c and lsdn.c) is responsible for maintaining the network model and managing its life-cycle (more info in Netmodel implementation).

For this, it relies on the rules system (in rules.c), which helps you manage a chain of TC flower filters and their rules. The system also allows the firewall rules (given by the user) and the routing rules (defined by the virtual network topology) to share the same flower table. However, this sharing is currently not done, because we opted to share the routing table among all virts connected through a given phys instead. Since firewall rules are per-virt, they cannot live in the shared table. Another function of this module is to help us overcome the kernel limit of at most 32 actions per filter, which our broadcast rules can exceed.

The netmodel core only manages the aspects common to all network types – life cycle, firewall rules and QoS – and calls back to a concrete network type plugin for constructing the virtual network. This is done through the lsdn_net_ops structure and is described more thoroughly in How to support a new network type.

The currently supported network types are in net_direct.c, net_vlan.c, net_vxlan.c (all types of VXLANs) and net_geneve.c. Depending on the type of the network (learning vs static), the network implementations rely on either lbridge.c, a Linux learning bridge, or sbridge.c, a static bridge constructed from TC rules. The Linux bridge is pretty self-explanatory, but you can read more about the TC rule madness in Static bridge.

Finally, liblsdn also supports dumping the netmodel in LSCTL and JSON formats (in dump.c), so the output can be used as configuration files or consumed by other applications.

digraph layering {
node [shape=record]
compound = true

lsctlc [label = <<i>lsctlc</i> program>]
lsctld [label = <<i>lsctld</i> program>]
lsctl [label = <<i>lsctl</i> program>]
tclext [label = <<i>lsdn-tclext</i> library>]

subgraph cluster_liblsdn {
    label = <<i>lsdn</i> library>
    color = black

    json_dump [label = "JSON dump"]
    lsctl_dump [label = "lsctl dump"]
    netmodel
    vlan
    vxlan_static [label = "static vxlan"]
    vxlan_e2e [label = "e2e vxlan"]
    vxlan_mcast [label = "mcast vxlan"]
    geneve
    direct
    sbridge
    lbridge
    rules
    subgraph cluster_util {
        label = <utility modules>;
        list
        error
        log
        names
        nl
        nettypes
    }
}

lsctl_dump -> json_dump
json_dump -> netmodel
lsctld -> tclext
lsctl -> tclext
tclext -> netmodel
netmodel -> {vlan vxlan_static vxlan_e2e vxlan_mcast geneve direct}
{vlan vxlan_e2e vxlan_mcast} -> lbridge
{vxlan_static geneve} -> sbridge
sbridge -> rules
netmodel -> rules


# Layout hacks

# Needed so that the tools are not rendered in parallel with the subgraph
tclext -> lsctl_dump [style=invis]

rules -> list [style=invis ltail=cluster_util]
}

Components and dependencies

Netmodel implementation

The network model (in lsdn.c) provides functions that are not specific to any network type. This includes QoS, firewall rules and basic validation.

Importantly, it also provides the state management needed to implement the commit functionality, which is important for the overall ease of use of the C API. The network model layer must keep track of both the current state of the network model and what has been committed. It also tracks which objects have changed attributes and need to be updated. Finally, it keeps track of objects that were deleted by the user, but are still committed.

For this, it is important to understand the life-cycle of an object, illustrated in Fig. 4.

digraph states {
T [shape = point ];
NEW; RENEW; DELETE; OK; FAIL; free
T -> NEW [color = "blue"];
NEW -> NEW [label = "update", color = "blue"];
NEW -> free [label = "free", color = "blue"];
NEW -> OK [label = "commit", color = "green"];
NEW -> NEW [label = "c. error", color = "orange" ];
NEW -> FAIL [label = "c. fail", color = "red"];
OK -> RENEW [label = "update", color = "blue"];
OK -> DELETE [label = "free", color = "blue"];
OK -> OK [label = "commit", color = "green"];
DELETE -> free [label = "commit", color = "green"];
DELETE -> free [label = "c. fail", color = "red"];
DELETE -> free [label = "c. error", color = "orange"];
RENEW -> RENEW [label = "update", color = "blue"];

RENEW -> DELETE [label = "free", color = "blue"];
RENEW -> NEW [label = "c. error", color = "orange"];
RENEW -> FAIL [label = "c. fail", color = "red"];
RENEW -> OK [label = "commit", color = "green"];
FAIL -> free [label = "free", color = "blue" ];
FAIL -> FAIL [label = "update", color = "blue" ];
FAIL -> FAIL [label = "c. fail", color = "red" ];
}

Object states. Blue lines denote user operations (attribute updates and frees), green lines successful commits, orange lines errors during commit, red lines errors where recovery has failed.

The objects always start in the NEW state, indicating that they will actually be created on the next commit. If a NEW object is freed, its memory is released immediately, since nothing exists in the kernel yet. Any update leaves it in the NEW state, since there is nothing to update yet.

Once a NEW object is successfully committed, it moves to the OK state. A commit has no effect on an OK object, since it is up-to-date.

If an OK object is freed, it is moved to the DELETE state, but its memory is retained until commit is called and the object is deleted from the kernel. Objects in the DELETE state cannot be updated, since they are no longer visible and should not be used by the user of the API. They also cannot be found by their name.

If an OK object is updated, it is moved to the RENEW state. This means that on the next commit, it is removed from the kernel, moved to the NEW state, and in the same commit added back to the kernel and moved once again to the OK state. Updating a RENEW object again does nothing and freeing it moves it to the DELETE state, since that takes precedence.

If a commit fails for some reason, LSDN tries to unroll all operations for that object and returns the object to a temporary ERR state. After the commit has ended, it moves all objects from the ERR state to the NEW state. This means that on the next commit, the operations will be retried, unless the user decides to delete the object.

If even the unrolling fails, the object is moved to the FAIL state. The only option left to the user is to release its memory. If the object was already being deleted, it bypasses the FAIL state and is freed directly.
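
The bookkeeping above boils down to a small per-object state field. The following is a simplified sketch of the update transition only, with state names mirroring Fig. 4 rather than the actual enum in the sources:

/* Illustration only; the real state tracking lives in the netmodel core. */
enum obj_state { ST_NEW, ST_OK, ST_RENEW, ST_DELETE, ST_ERR, ST_FAIL };

/* State after the user changes an attribute of an object. */
static enum obj_state state_after_update(enum obj_state s)
{
    switch (s) {
    case ST_NEW:    return ST_NEW;    /* nothing committed yet */
    case ST_OK:     return ST_RENEW;  /* delete and re-create on the next commit */
    case ST_RENEW:  return ST_RENEW;  /* already scheduled for re-creation */
    default:        return s;         /* DELETE objects must not be updated,
                                         FAIL objects can only be freed */
    }
}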

Note

If validation fails, the commit is not performed at all and object states do not change.

How to support a new network type

LSDN does not have an official stable extension API, but the network modules are intended to be mostly separate from the rest of the code. However, there are still a few places you will need to touch.

To support a new network type:

  • add your network to the lsdn_nettype enum (in private/lsdn.h)
  • add the settings for your network to the lsdn_settings struct (in private/lsdn.h). Place them in the anonymous union, where settings for other types are placed.
  • declare a function lsdn_settings_new_xxx (in include/lsdn.h)
  • create a new file net_xxx.c for all your code and add it to the CMakeLists.txt file

The settings_new function will inform LSDN how to use your network type. Do not forget to do the following things in your settings_new function:

  • allocate new lsdn_settings structure via malloc

  • initialize the settings using lsdn_settings_init_common function

  • fill in the:

    • nettype (as you have added above)
    • switch_type (static, partially static, or learning; purely informational, has no effect)
    • ops (lsdn_net_ops will be described shortly)
  • return the new settings

Also note that your function will be part of the C API and should use ret_err to return error codes (instead of plain return), to provide correct error handling (see Error codes and error handling).
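
Putting the steps together, a constructor for a hypothetical network type xxx could look roughly like this. This is only a sketch under assumptions: the exact signature, the error-reporting macros, the all-caps constants and the settings fields live in include/lsdn.h and private/lsdn.h and may differ in detail:

#include <stdlib.h>
/* plus the LSDN headers: include/lsdn.h and private/lsdn.h */

/* Sketch only; the real constructors use the ret_err machinery mentioned above. */
struct lsdn_settings *lsdn_settings_new_xxx(struct lsdn_context *ctx)
{
    struct lsdn_settings *s = malloc(sizeof(*s));
    if (!s)
        return NULL;                    /* real code reports the error via ret_err */

    lsdn_settings_init_common(s, ctx);  /* common initialization */
    s->nettype = LSDN_NET_XXX;          /* the value you added to lsdn_nettype */
    s->switch_type = LSDN_STATIC;       /* purely informational */
    s->ops = &lsdn_net_xxx_ops;         /* callbacks, see lsdn_net_ops below */
    /* network-specific settings go into the anonymous union, e.g. s->xxx */

    return s;                           /* "return the new settings" */
}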

However, the most important part of the settings is the lsdn_net_ops structure – the callbacks invoked by LSDN to let you construct the network. First let us have a quick look at the structure definition (the full commented definition is in the source code or in the Generated Doxygen Documentation):

struct lsdn_net_ops

Public Members

char* type
uint16_t(*get_port)(struct lsdn_settings *s)
lsdn_ip_t(*get_ip)(struct lsdn_settings *s)
lsdn_err_t(*create_pa)(struct lsdn_phys_attachment *pa)
lsdn_err_t(*add_virt)(struct lsdn_virt *virt)
lsdn_err_t(*add_remote_pa)(struct lsdn_remote_pa *pa)
lsdn_err_t(*add_remote_virt)(struct lsdn_remote_virt *virt)
lsdn_err_t(*destroy_pa)(struct lsdn_phys_attachment *pa)
lsdn_err_t(*remove_virt)(struct lsdn_virt *virt)
lsdn_err_t(*remove_remote_pa)(struct lsdn_remote_pa *pa)
lsdn_err_t(*remove_remote_virt)(struct lsdn_remote_virt *virt)
void(*validate_net)(struct lsdn_net *net)
void(*validate_pa)(struct lsdn_phys_attachment *pa)
void(*validate_virt)(struct lsdn_virt *virt)
unsigned int(*compute_tunneling_overhead)(struct lsdn_phys_attachment *pa)

The first callback that will be called is lsdn_net_ops::create_pa. PA is a shorthand for phys attachment and the call means that the physical machine this LSDN instance is managing has attached to a virtual network. Typically you will need to prepare the tunnel(s) connecting to the virtual network and a bridge connecting the tunnel(s) to the virtual machines (those will be connected later).

If your network does all packet routing by itself, use the lbridge.c module. It will create an ordinary Linux bridge and let you connect your tunnel interface to that bridge. We assume your tunnel has a Linux network interface. If not, you will have to come up with some other way of connecting it to the Linux bridge, or use something other than a Linux bridge. In that case, feel free not to use lbridge.c and do custom processing in lsdn_net_ops::create_pa.

If the routing in your network is static, use Static bridge. It will allow you to set up a set of flower rules for routing the packets, ending in custom TC actions. In these actions, you will typically set up the required routing metadata for the packet and send it off.

After the PA is created, you will receive other callbacks.

The lsdn_net_ops::add_virt callback is called when a new virtual machine has connected to the phys you are managing. Typically, you will add the virtual machine to the bridge you have created previously.

If your network is learning, you are almost done. But if it is static, you will want to handle lsdn_net_ops::add_remote_pa and lsdn_net_ops::add_remote_virt. These callbacks inform you about the other physical machines and virtual machines that have joined the virtual network. If the routing is static, you need to be informed about them to correctly set up the routing information (see Static bridge). Depending on the implementation of the tunnels in Linux, you may also need to create a tunnel for each remote machine. In that case, the lsdn_net_ops::add_remote_pa callback is the right place.

Finally, you need to fill in lsdn_net_ops::type with the name of your network type. This will be used as an identifier in the JSON dumps. At this point you might want to decide if your network should be supported in Lsctl Configuration Files and modify lsext.c accordingly. The network type names in LSCTL and JSON should match.

The other callbacks are mandatory. Naturally, you will want to implement the remove/destroy callbacks for all your add/create callbacks. There are also validation callbacks that allow you to reject an invalid network configuration early (see Validation). Finally, LSDN can check the uniqueness of the listening IP address/port combinations your tunnels use, if you provide them using lsdn_net_ops::get_ip and lsdn_net_ops::get_port.
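
For orientation, this is how the structure might be filled in for a hypothetical static network type. The field names come from the definition above, while the xxx_* callback implementations are placeholders whose prototypes are omitted:

/* Sketch: callback bodies omitted, names are placeholders. */
static struct lsdn_net_ops lsdn_net_xxx_ops = {
    .type = "xxx",                          /* identifier used in JSON dumps */
    .get_port = xxx_get_port,               /* used for uniqueness checks */
    .get_ip = xxx_get_ip,
    .create_pa = xxx_create_pa,             /* local phys attached to the net */
    .add_virt = xxx_add_virt,               /* local virt connected */
    .add_remote_pa = xxx_add_remote_pa,     /* static networks: remote phys joined */
    .add_remote_virt = xxx_add_remote_virt, /* static networks: remote virt joined */
    .destroy_pa = xxx_destroy_pa,
    .remove_virt = xxx_remove_virt,
    .remove_remote_pa = xxx_remove_remote_pa,
    .remove_remote_virt = xxx_remove_remote_virt,
    .validate_net = xxx_validate_net,
    .validate_pa = xxx_validate_pa,
    .validate_virt = xxx_validate_virt,
    .compute_tunneling_overhead = xxx_overhead,
};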

Since an example is the best explanation, we encourage you to look at some of the existing plugins – VLAN (net_vlan.c) for learning networks, Geneve (net_geneve.c) for static networks.

Static bridge

The static-bridge subsystem provides helpers for managing an L2 router built on TC flower rules and actions. Because it is based on TC, it can be integrated with the metadata-based Linux tunnels.

Metadata-based tunnels (sometimes called lightweight IP tunnels) are Linux tunnels that can choose their tunnel endpoint by looking at special per-packet metadata. This means you do not need to create a new network interface for each endpoint you want to communicate with; one shared interface can be used, with only the metadata changing. In our case, we use TC actions to set this metadata depending on the destination MAC address (since we know where a virtual machine with that MAC lives). The setup is illustrated in Fig. 5.

graph sbridge {
{VM1 VM2} -- sbridge1
{VM3 VM4} -- sbridge2
{sbridge1 sbridge2} -- sbridge_phys_if
{sbridge1 sbridge2} -- sbridge_phys_if
sbridge_phys_if -- phys_if
sbridge_phys_if -- phys_if
sbridge_phys_if -- phys_if
sbridge_phys_if -- phys_if

sbridge1 [label=<TC bridge for virtual network 1>]
sbridge2 [label=<TC bridge for virtual network 2>]
sbridge_phys_if [label=<Metadata tunnel>]
phys_if [label=<Physical network interface>]
}

Two virtual networks using static routing (via TC) and a shared metadata tunnel. Each line illustrates the connection of one VM.

The static bridge is not simply a reimplementation of the Linux bridge in TC. A Linux bridge is a virtual interface with multiple enslaved interfaces connected to it; the static bridge, however, also needs to deal with the tunnel metadata during its routing. For that, it provides the following C structures:

Struct lsdn_sbridge represents the bridge as a whole. Internally, it will create a helper interface to hold the routing rules.

Struct lsdn_sbridge_phys_if represents a Linux network interface connected to the bridge. This will typically be a virtual machine interface or a tunnel. Unlike with a classic bridge, a single interface may be connected to multiple bridges.

Struct lsdn_sbridge_if represents the connection of an sbridge_phys_if to the bridge. For virtual machines, sbridge_if and sbridge_phys_if will be in a one-to-one correspondence, since a virtual machine cannot be connected to multiple bridges. If the underlying interface is shared, you have to provide a criterion splitting up the traffic, usually the virtual network identifier.

Struct lsdn_sbridge_route represents a route through a given sbridge_if. For a virtual machine, there will be just a single route, but metadata tunnel interfaces can provide multiple routes, each leading to a different physical machine. The users of the static-bridge module must provide TC actions to set the correct metadata for that route.

Struct lsdn_sbridge_mac tells LSDN to use a given route when sending packets to a given MAC address. There will be one sbridge_mac for each VM on the physical machine the route leads to.

The structures above need to be created from the LSDN callbacks. For a network with static routing and metadata tunnels, the correspondence will look similar to this:

callback                  sbridge action
------------------------  -------------------------------------
create_pa (first call)    create phys_if for the tunnel
create_pa                 create sbridge and if for the tunnel
add_virt                  create if, route and mac
add_remote_pa             create route for the physical machine
add_remote_virt           create mac for the route
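
In code, a static network type would typically embed these structures in its private per-object data. The structure names below come from the list above; the wrapping xxx_* structs are hypothetical and only illustrate which sbridge object belongs to which callback:

/* Hypothetical containers, one per netmodel object handled by the callbacks. */
struct xxx_pa_data {                        /* filled in create_pa */
    struct lsdn_sbridge bridge;             /* the TC bridge for this attachment */
    struct lsdn_sbridge_phys_if tunnel_if;  /* the shared metadata tunnel */
    struct lsdn_sbridge_if bridge_if;       /* its connection to this bridge */
};

struct xxx_virt_data {                      /* filled in add_virt */
    struct lsdn_sbridge_if vif;             /* connection of the virt's interface */
    struct lsdn_sbridge_route route;        /* the virt's single local route */
    struct lsdn_sbridge_mac mac;            /* the virt's own MAC */
};

struct xxx_remote_pa_data {                 /* filled in add_remote_pa */
    struct lsdn_sbridge_route route;        /* route towards the remote phys */
};

struct xxx_remote_virt_data {               /* filled in add_remote_virt */
    struct lsdn_sbridge_mac mac;            /* MAC entry selecting that route */
};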

Command-line

The Lsctl Configuration Files are interpreted by the lsdn-tclext library. We have chosen to use the TCL language as the basis for our configuration language. Although it might seem a strange choice, it provides greater flexibility for creating DSLs than, say, JSON or YAML. Basically, TCL enforces just a single syntactic rule: balanced {} and [] brackets.

Originally, we had a YAML configuration parser, but the project changed direction significantly and the parser was left behind; it was later abandoned altogether. The TCL bindings started as a quick experiment and have aged quite well since then.

Naturally, there are advantages to JSON/YAML too. Since our language is Turing complete, it is not as easily analyzed by machines. However, it is always possible to just run the configuration scripts and examine the resulting network model afterwards. The TCL approach also brings a lot of features for free: conditionals, variables, loops etc.

The lsdn-tclext library is a collection of TCL commands. One way to use it is from a custom host program (that is what lsctl and lsctld are). The program uses libtcl to create a TCL interpreter and then calls lsdn-tclext to register the LSDN-specific commands.

lsctld creates the interpreter, registers the LSDN commands, binds to a Unix domain socket and listens for commands. The commands (received as plain strings) are fed to the interpreter and stdout and stderr are sent back.
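
At its core this is just standard TCL embedding. A simplified sketch of the evaluation step is shown below; the helper name is our own invention and the real daemon additionally captures stdout and stderr for the client:

#include <tcl.h>

/* Sketch: evaluate one received command string and return the interpreter
 * result (which on error contains the error message). */
static const char *eval_command(Tcl_Interp *interp, const char *cmd)
{
    Tcl_Eval(interp, cmd);
    return Tcl_GetStringResult(interp);
}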

lsctlc does not depend on TCL or lsdn-tclext: it is a netcat-like program that simply pipes its input to the running lsctld instance and receives the script output back.

lsctl is just a few lines, since it uses the Tcl_Main call. Tcl_Main is provided by TCL for quickly building a custom TCL interpreter and does the argument parsing and interpreter setup (tclsh is actually just a Tcl_Main call).
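
A tclsh-like host built on Tcl_Main can indeed be this short. The registration call name below is hypothetical; the real entry point is whatever lsdn-tclext exports:

#include <tcl.h>

extern int lsdn_tclext_register(Tcl_Interp *interp); /* hypothetical name */

static int app_init(Tcl_Interp *interp)
{
    if (Tcl_Init(interp) != TCL_OK)
        return TCL_ERROR;
    return lsdn_tclext_register(interp);  /* add the LSDN commands */
}

int main(int argc, char *argv[])
{
    /* Tcl_Main handles argument parsing, script execution or an
     * interactive shell, then exits. */
    Tcl_Main(argc, argv, app_init);
    return 0;
}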

The other way to use lsdn-tclext is as a regular TCL extension, from tclsh. A pkgIndex.tcl is provided by LSDN, so LSDN can be loaded using the package require command.

Test Environment

Our test environment is highly modular, extremely powerful, easy to use and has no complex dependencies. Thus it is easily extensible even by outsiders and people new to the project.

CTest

The core of the environment is the CTest testing tool from CMake. It provides a nice way to define all the tests in a modular fashion: we create test parts which can be combined into one complex test. For example, you can say that you want to use geneve as the network backend, test migrate (meaning that migration of virtual machines will be tested) and use ping as the verifier. The CTest configuration file is called CMakeLists.txt and tests composed from parts can be added with the test_parts(...) command. Examples follow, starting with the example described above:

test_parts(geneve migrate ping)

For a vlan and dhcp test:

test_parts(vlan dhcp)

For the backend without tunnelling, migration with the daemon’s help (keeping the state in memory) and ping:

test_parts(direct migrate-daemon ping)

For a complete list of all tests, see CMakeLists.txt in the test directory; all the parts usable for composing complex tests are in test/parts. To run all the tests with the CTest testing tool, just go to the test folder and run

ctest

Parts

In the previous section we described the big picture of test execution. Now we will describe what a part is and how to define one. A part is a simple bash script defining functions according to the API prescribed by our test environment.

The prepare() function is used for establishing the physical network environment unrelated to the virtual network we would like to manage. These are the “wires” we will use for our virtual networking.

connect() is the main phase for setting up the virtual network environment. LSDN is usually used in this function to configure all the virtual interfaces and virtual network appliances.

To test whether the applied configuration is working, i.e. has the expected behavior, the test() function is used. Most often ping is used here, but you can use anything to test the functionality.

If you want to do some special cleanup, you can use the cleanup() function.

Back to the part primitive: you can combine various parts together, but every reasonable test should define all the described functions, no matter how many parts are used.

CTest is pretty good at automated execution of complete tests, but if you want to debug a test or execute just part of it, there is the run script. This script allows you to execute just selected stages and combine parts in a comfortable way. Its usage is self-explanatory:

Usage:
        ./run -xpctz [parts]
  -x  trace all commands
  -p  run the prepare stage
  -c  run the connect stage
  -t  run the test stage
  -z  run the cleanup stage

Thus, to run the test from the example at the beginning, but with just the prepare and connect stages, you can call:

./run -pc geneve migrate ping

QEMU

Because we depend on fairly new versions of the Linux kernel, we provide scripts for executing the tests in a virtualized environment. This is useful when you use a traditional Linux distribution like Ubuntu with an older kernel and do not want to compile or install a custom recent kernel.

As the hypervisor we use QEMU with an Arch Linux user-space. Here are the steps you need to follow to run the tests in QEMU:

  1. Download a current Linux kernel to $linux-path.
  2. Run ./create_kernel.sh $linux-path. This will build a kernel with our custom .config file.
  3. Run ./create_rootfs.sh, which will create the user-space for the virtual machine with all dependencies. You need pacman to download the packages.
  4. Run ./run-qemu $kernel-path $userspace-path all, which will execute all tests and shut down.

The run-qemu script is more powerful than that: you can run all the examples described above and also debug in a shell inside the virtual machine. The usage is as follows:

usage: run-qemu [--help] [--kvm] [--gdb] kernel rootfs guest-command

Available guest commands: shell, raw-shell, all.

shell will execute just a shell and leave the test execution up to you; raw-shell is only for debugging the virtual machine user-space, because it does not mount the directories needed for the tests. all executes all the tests, as shown above.

Developmental Documentation

The LSDN project focuses on the problem of easily manageable networking setup in virtual machine and cloud environments generally. It fits large-scale deployments for managing complex virtual networks in data centers as well as small-scale deployments for complete control over containers in a software developer’s virtual environment. Naturally, the machines providing the network interfaces have to run the Linux kernel, as we use it for the real networking work.

The two core goals which LSDN addressed are:

  1. Make Linux Kernel Traffic Control (TC) Subsystem usable:
    • LSDN provides a library with a high-level C API for linking with recent orchestrators.
    • A Domain Specific Language (DSL) for standalone configuration is designed and can be used as is.
  2. Audit the TC subsystem and verify that it is viable for managing complex virtual network scenarios as is.
    • Bugs in the Linux kernel were found and fixed.
    • TC proved viable for complex virtual network management.

Problem Introduction

The biggest challenge in the cloud industry today is how to manage an enormous number of operating system instances in a feasible and transparent way. Whether containers or full machine virtualization is used, virtualizing the network brings several challenges which are not present in the world of classical physical networks, e.g. isolating customers’ networks inside a datacenter (multi-tenancy), sharing the bandwidth on top of the physical layer etc. All these problems have to be tackled in a thoughtful way. Furthermore, it would be nice to have a high-level domain specific language (DSL) for configuring a standalone network as well as a C API for linking with orchestrators.

The networking functionality required of such a tool is the following:
  • Support for Virtual Networks, Switches and Ports.
  • API for management via library.
  • DSL for stand-alone management.
  • Network Overlay.
  • Multi-tenancy.
  • Firewall.
  • QoS support.

Most of the requirements above are barely fulfilled by the vast majority of current products, which is the main motivation for the LSDN project.

Current Situation

The dominance of open-source technologies in the cloud environment is clear. Thus we do not focus on clouds based on closed, proprietary technologies such as the cloud services from Microsoft.

In the open-source world, the position of Open vSwitch (OVS) is dominant. It is a kernel module providing functionality for managing virtual networks and is used by all the big players in cloud technology, e.g. Red Hat. However, the Linux kernel provides almost identical functionality via its traffic control (TC) subsystem. Thus there is code duplication between TC and OVS, and furthermore the code base of OVS is not as clean as the code base of TC. Hence the effort to eliminate OVS in favor of TC and to focus improvements on just one place in the Linux networking world.

Although TC is extremely featureful, it has no documentation (literally zero) and its calls usually fail without any additional error information. Hence, for correct TC usage one has to read a lot of kernel source code. Add bugs in rarely used places and we have very powerful but almost unusable software. Thus a higher-level API is very attractive for everyone who wants to use the more advanced networking features of the Linux kernel.

Similar Projects

There is no direct competition among tools building on top of TC to make it much more usable (actually, to make it just usable). However, there are competitors to TC itself, which are either not as powerful or are modules full of hacks that are not viewed positively by the Linux mainline.

Open vSwitch

  • Similar level of functionality to TC.
  • Complex, hardly maintainable code and code duplication with respect to TC.
  • External module without any chance to be accepted to kernel mainline.
  • Slightly more user-friendly configuration than TC.

vSphere Distributed Switch

  • Out of the game because it is not open-source.
  • Not as featureful as TC, e.g. no firewall, Geneve etc.
  • Closed-source product of VMware.
  • Hardly applicable to heterogeneous open-source environment.

Hyper-V Virtual Switch

  • Out of the game because it is not open-source.
  • Not as featureful as TC, e.g. no firewall, Geneve etc.
  • Closed-source product of Microsoft.
  • Hardly applicable to heterogeneous open-source environment.

Development Environment

In this section we present the tools used in our project that are worth mentioning.

Development Tools

Platform-independent builds with all the dependency and version checks are done thanks to cmake in cooperation with pkgconfig. This is a much nicer and more featureful alternative to the autoconf tools.

Furthermore, we kept everything from the beginning in a Git repository on GitHub. We used it quite intensely, with all its features like branches etc. A VCS is a must for any project and Git is the most common choice.

When we were developing the daemon (for migration support), we found a library called libdaemon, which helps you write a system daemon properly, with all the signal handling, socket management and elimination of race-condition-prone code.

As code editors, only VIM and ed were allowed.

Testing Environment

Our testing environment is based on a highly modular collection of bash scripts, where every part to be tested defines prescribed functions, which are then executed together with other parts. This way we can create complex tests just by combining several parts.

For automatic test execution and its simplification, we used the ctest tool, which is part of the cmake package.

Continuous integration was provided by the Travis-CI service, which executed all the tests after every commit and sent automatic email notifications in case of failure.

We also have extensive support for testing on unsupported kernels via QEMU. Automated scripts are able to create a minimalistic, up-to-date Arch Linux root filesystem, boot an up-to-date kernel and run all the tests. This method is also used on Travis-CI, where only LTS versions of Ubuntu are available.

Of course various networking tools like dhcpd, dhcpcd, dhclient, tcpdump, iproute, ping etc. were used for diagnostics as well as directly in tests.

Note that during the tests we relied heavily on Linux namespaces; this allowed us to simulate several virtual machines without any overhead and speed up all the tests.

Communication Tools

Communication among all team members and leaders was performed via an old-school combination of mailing lists and IRC. We used our own self-hosted Mailman instance for several mailing lists:

  • lsdn-general for general talk, organization, communication with leaders and all important decisions.
  • lsdn-travis for automatic reports from Travis-CI notifying us about commits which break functionality.
  • lsdn-commits for a summary of every commit we made. This was a highly motivating element in our setup, because seeing your colleague committing the whole day can make you feel really bad. Furthermore, discussions about a particular commit were held in the same thread, which helped keep track of the decisions we made and why.

For real-time communication we used the IRC channel #lsdn on Freenode. This was especially useful for flame wars and arguing about the future design of the tool.

We have developed a simple bot for our mailing list to remind us of important deadlines and to nag people who have not committed to the repository for a long time. This helped us “feel” the schedule and kept us focused on work.

Documentation Tools

The project has a fairly nice documentation architecture. The C source code, including the API, is commented with Doxygen, which is the standard way to handle this kind of task. The Doxygen output is then combined with the various other documentation (user, developmental…) and processed with Sphinx.

Sphinx is a tool for creating really nice documentation and supports various output formats. This way we are able to keep the HTML and PDF documentation in sync, and both formats look fabulous.

Furthermore, we use readthedocs.io to automatically regenerate the documentation after every documentation commit. This also means that we always have up-to-date documentation online, in a browsable HTML version as well as a downloadable, printable PDF version. Note that the PDF generation uses LaTeX as the typesetting system, so the printed documentation looks great.

The whole documentation source is written in the reStructuredText (rst) markup language, which greatly simplified the process of creating such comprehensive documentation.

Open-source contributions

We have identified a few bugs in the Linux kernel during our development. We believe this is mainly because of the unusual setups we exercise and the new kernel features (such as goto chain, lwtunnels) we use. The following bugs were patched by us, or at least reported and then patched by someone else:

We have also identified a bug in iproute2:

Naturally, our tooling also has problems, so we also fixed a bug in Sphinx and Breathe.

Project Timeline

The project came from an idea of Jiri Benc (a Linux kernel networking developer) from Red Hat Czech, who wanted to create a proof-of-concept tool that would try to replace Open vSwitch with pure Linux kernel functionality and find any missing functionality or bugs in the Linux kernel that would block or slow down the effort to eliminate Open vSwitch.

At that time, Vojtech Aschenbrenner was an intern in Jiri’s team and also a student looking for a challenging Software Project topic in the systems field, the project being a mandatory part of studies at Charles University. Hence the topic arose.

Forming the team was not that straightforward. In the beginning the team was composed of 7 people with an interest in systems who were also great computer scientists. This excellence was actually the biggest problem of the team: at the beginning of the implementation phase, 3 people left their studies, and with them the team, because of much better offers (twice because of Google and once because of Showmax). That left 4 people in the team, which was still manageable.

However, further personal complications came with studies in the US and the jobs of the remaining members. Vojtech Aschenbrenner left for the University of Rochester and had almost no time to work on the project for many weeks. A similar situation befell Adam Vyškovský, who left for Paris because of a dream job in aviation. Jan Matějek still had a full-time job at SUSE, and it looked like the project was in serious trouble and would most probably fail. However, Roman Kápl showed his true determination and saved the project, even though he also has a part-time job at a systems company. It is certain that the project would have failed without his knowledge, his skills in systems programming and his diligence. When the remaining members saw how he kept working on the project, they came back from abroad and decided to finish the project, as well as their master’s studies, instead of continuing their careers elsewhere. All of the team members believe that Roman influenced their future lives in a positive way.

After this we managed to hold hackathons quite often and did the majority of the work in a few months. Because the problematic period when many people left came before the official start, the official timeline of the project went according to plan and we were able to meet our deadlines, which were the following:

  • Month 1:
    • Analysis of the requirements of cloud environments for software defined networking.
    • Analysis and introduction to Linux Kernel networking features, especially traffic control framework and networking layer of the Linux Kernel.
    • Description of detailed use-cases which will be implemented.
  • Month 2:
    • API design.
  • Months 3 - 7:
    • Implementation of the complete functionality of the project. This was the main development part.
  • Months 8 - 9:
    • Finalization.
    • Debugging.
    • Documentation.
    • Presentation preparation.
    • (The most intense part)

Team Members

The project originally started with people who are no longer in the team, for various reasons. We would like to honorably mention them, because the initial brainstorming of project topics was done with them.

  • Martin Pelikán left for Google Sydney a few weeks after the project started. Although he is a non-sleeper who can work on several projects at once, he was not able to find spare time for this one. This was a big loss, because his thesis was about TC.
  • David Krška left for Google London a few weeks after graduating from his bachelor studies.
  • David Čepelík left for Showmax one semester after graduating from his bachelor studies.

The rest of the people who started the project were able to stay as part of the team and finish it.

  • Vojtěch Aschenbrenner established the team and tried to lead the project. He also created the infrastructure, hosted and managed the communication platform, communicated officially with the University authorities and mediated the communication inside the team. He created the LSDN daemon and the way it communicates with the client. He also worked on the testing environment’s scripts and on the developmental and testing parts of the documentation, and maintains the Arch Linux package.
  • Roman Kápl is the main developer of the project. He participated in all parts of the project, most notably the internals that communicate directly with the kernel. There is no part of the project which Roman did not touch. He always solved the most difficult problems and fixed several bugs in the Linux kernel and in the tools used by the project. He maintains the package for Debian-based distributions.
  • Jan Matějek was mainly involved in writing the documentation generated by Doxygen and in code reviews. Thanks to this he fixed logical mistakes in the project and commented the whole source code in great detail. He was also partly involved in the non-generated documentation. He maintains the package for RPM-based distributions and was the original author of the CMake automated tests.
  • Adam Vyškovský was, together with Roman, the main developer of the internals and is the main author of the TCL/JSON exporter. He also wrote a big portion of the non-generated documentation and, most notably, periodically revised it and fixed a non-trivial number of mistakes. He spent an enormous amount of time debugging the netlink communication with the Linux kernel, which was absolutely crucial for the project.

Jiri Benc, the official leader from Red Hat Czech, should be mentioned here, because discussions with him were always full of knowledge and his overview of the Linux kernel and the open-source world is enormous. He always found the spare time to arrange a meeting with us and was always willing to help us move forward and motivate us.

Conclusion, Contribution and Future Work

The project was able to fulfill all the requirements set at the beginning and also to follow the plan created at the start. This means that all the requested functionality was implemented and properly tested. Furthermore, it was documented throughout, from both the programmer’s view and the user (API) view. Detailed use cases together with a quick-start guide were also described. The quick-start guide in particular showed how easy it is to create a complex virtual networking scenario in a few steps with very minimal configuration files.

In the end the whole project was thoroughly tested in virtual, physical as well as hybrid setups. Finally, a demo presentation showing the power of LSDN was created. This part of the work showed how capable LSDN (and the TC framework) is in terms of replacing Open vSwitch – it is capable, and the direction of TC framework development is going the right way to replace Open vSwitch in the future.

Another big success of the project was patching the upstream Linux kernel as well as tooling such as Sphinx and Breathe. Several more bugs were reported. This was the secondary, optional target of the project, and it was also fulfilled.

LSDN has the ambition to become the single tool needed for using the extremely powerful TC framework of the Linux kernel in a user-convenient way, with very minimal additional dependencies, for creating complex virtual network scenarios. The core of the tool is written efficiently in C, so there is no performance impact from using LSDN. Furthermore, we were able to push LSDN installation packages to the user repositories of Linux distributions, or at least create the packages. This makes installation as comfortable as possible, which helps fulfill the main goal of creating an easy-to-use management tool for complex networks.

Because of the very promising future of the tool, the LSDN team is willing to continue supporting the project, integrating future enhancements of the TC framework, fixing bugs found in production and adapting the project to the future needs of virtual networks.

Furthermore, there are some features that we consider useful and that could be worked on straight away. Some of them rely on things that the kernel learned to do in the last months of the project, or that we have discovered recently - the egress qdisc or better default queueing disciplines (CoDel was suggested). We would also like to improve the firewall (rewrite the rule engine and add support for ACCEPT actions).

The next challenging step is to integrate LSDN into the most popular virtualization orchestrators and eliminate Open vSwitch. This would attract more developers and make the project part of the state-of-the-art cloud ecosystem - that is the real goal!

Generated Doxygen Documentation

Doxygen (Generated documentation)

The documentation generated by Doxygen for LSDN can be found at https://asch.github.io/lsdn-doxygen . It contains all parts of LSDN (not only the public API).
