Prolixium Communications Network

From Prolixium Wiki
Prolixium Communications Network Logo

The Prolixium Communications Network (also known as PCN, mynet, My Network, and Prolixium .NET) is a collection of small, geographically dispersed computer networks that provide IPv4 and IPv6, VPN, and VoIP services to the Kamichoff family. Owned and operated solely by Mark Kamichoff, PCN often serves as a testbed for various network experiments. The majority of the PCN nodes are connected via residential data services (cable modem), while some located in data centers have Gigabit Ethernet connections to the Internet.

Current State

Overview

PCN WAN Architecture
PCN World Map

As of February 2, 2022, PCN is composed of several networks in the United States and across the globe, connected via OpenVPN and WireGuard, with the IPv6 backbone connected via 6in4 tunnels.

Each site has multiple OpenVPN tunnels to other locations supporting both IPv4 and IPv6. The network is primarily powered by Free Range Routing (FRR) with some sites using BIRD.

Routing

The routing infrastructure consists of several autonomous systems, taken from the IANA-allocated private range: 64512 through 65534. Each site runs IBGP, possibly with a route reflector, and its own IGP for local next-hop resolution. EBGP is used between sites and peering connections. IPv4 Internet connectivity for each site is achieved by advertisement of default routes from boxes performing NAT. The lab is connected to starfire (core router) in Ashburn, VA. The PCN used to use one large OSPF area with no EGP. It was converted to a BGP confederation setup, which was a bad idea (but educational!), then reconverted to its current state.
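The private AS range quoted above can be checked mechanically. The following is a minimal sketch, not part of the PCN tooling; the site names and their AS numbers are hypothetical, while AS395460 is the border transit AS mentioned elsewhere on this page.

```python
# Bounds of the private-use 16-bit AS range quoted in the text.
PRIVATE_ASN_FIRST = 64512
PRIVATE_ASN_LAST = 65534


def is_private_asn(asn: int) -> bool:
    """Return True when asn falls inside the private-use range."""
    return PRIVATE_ASN_FIRST <= asn <= PRIVATE_ASN_LAST


# Hypothetical per-site assignments, for illustration only.
site_asns = {"site-a": 64512, "site-b": 64513, "border": 395460}

for site, asn in site_asns.items():
    kind = "private" if is_private_asn(asn) else "public"
    print(f"{site}: AS{asn} ({kind})")
```

A check like this is mainly useful as a sanity test before a new site is numbered, so that a public AS number is never used internally by accident.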

BGP on PCN

IPv6 Connectivity

IPv6 connectivity is provided by five (5) direct connections to Vultr, Choopa (The Constant Company), ARP Networks, and Free Range Cloud. Hurricane Electric BGP tunnels are used as backups in LAX and EWR2 but are depreferenced. The border transit network piece of the PCN provides this connectivity.

IPv6 addressing is out of 2620:6:2000::/44, which is a direct allocation from ARIN.
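To give a sense of how much room a /44 provides, the sketch below uses Python's standard ipaddress module to split the allocation into /48 prefixes. How PCN actually assigns those /48s to sites is not documented here, so the helper name and the example prefixes are illustrative assumptions.

```python
import ipaddress

# The ARIN allocation named in the text.
allocation = ipaddress.ip_network("2620:6:2000::/44")

# Every /48 that fits inside the /44 (2**(48-44) = 16 of them).
site_prefixes = list(allocation.subnets(new_prefix=48))
print(len(site_prefixes))
print(site_prefixes[0], site_prefixes[-1])

# Number of /64 LAN segments available inside each /48.
lans_per_site = 2 ** (64 - 48)
print(lans_per_site)


def in_allocation(prefix: str) -> bool:
    """Hypothetical helper: is prefix a subnet of the allocation?"""
    return ipaddress.ip_network(prefix).subnet_of(allocation)


print(in_allocation("2620:6:2003::/48"))
print(in_allocation("2001:db8::/48"))
```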

Border Transit Network

The border transit network operates in AS395460 and consists of excalibur, trident, orca, pegasus, and concorde. Connectivity is provided by the following transit peers:

  • trident: AS25795 and AS6939
  • excalibur: AS20473 and AS6939
  • orca: AS20473
  • concorde: AS20473
  • pegasus: AS53356

This network injects a default route into the rest of the PCN, which can be referred to as PEN (Prolixium Enterprise Network). The border network receives a full table from all transits and advertises 2620:6:2000::/44 out each peer, along with some sites advertising /48 specifics for networks that are nearby.

Hurricane Electric (AS6939) is only used as backup because it is a tunneled connection and is suspected to be throttled.

Border Transit Network

Border Transit Network Map

The following hosts do not default route to the border transit network and use their own native IPv6 connectivity:

  • centauri
  • firefly
  • storm

The following hosts may have IPv6 connectivity but it's not currently enabled (at time of writing):

  • exodus
  • galactica
  • photonic

DNS

DNS is done with two views: internal and external. PCN has two external nameservers and four internal ones, all of which perform zone transfers from the master nameserver, ns3.antiderivative.net. antiderivative.net is used for all NS records, as well as glue records at the GTLD servers. The internal nameservers are ns{1-4} and the external ones are ns{2,3}. Each zone has two views, internal and external, and a common file that is included in both views (SOA, etc.). The zones include the following:

  • Internal view, answering to 10/8, 172.16/12, and 192.168/16 addresses
    • 3.10.in-addr.arpa. and 3.16.172.in-addr.arpa. reverse zones
    • prolixium.com, prolixium.net, antiderivative.net, etc.'s internal A/CNAME records
  • External view, answering to everything !RFC1918
    • prolixium.com, prolixium.net, antiderivative.net, etc.'s external A/CNAME records
  • Common information, answering for all hosts
    • 180/30.189.9.69.in-addr.arpa., 232/29.186.9.69.in-addr.arpa., 0.0.0.2.6.0.0.0.0.2.6.2.ip6.arpa., and other reverse zones
    • prolixium.com, prolixium.net, antiderivative.net, etc.'s common MX records
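The view split can be illustrated with a small sketch: a client address inside one of the three RFC 1918 ranges gets the internal view, and anything else gets the external one. This only mimics the matching rule described in the list; the real selection happens in the nameserver configuration, and the function names and sample addresses are assumptions. Treating IPv6 clients as external follows from "everything !RFC1918" and is also an assumption.

```python
import ipaddress

# The three ranges the internal view answers to, as listed above.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def select_view(client: str) -> str:
    """Return 'internal' for RFC 1918 clients, otherwise 'external'."""
    addr = ipaddress.ip_address(client)
    if addr.version == 4 and any(addr in net for net in RFC1918):
        return "internal"
    return "external"


def ptr_owner_name(address: str) -> str:
    """Return the reverse-DNS name that holds the PTR for address."""
    return ipaddress.ip_address(address).reverse_pointer


print(select_view("10.3.0.1"))
print(select_view("172.32.0.1"))
print(ptr_owner_name("10.3.0.1"))
```

The name returned by ptr_owner_name for 10.3.0.1 falls under the 3.10.in-addr.arpa. zone from the internal view.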

Previously, the Xicada DNS Service (developed by Mark Kamichoff) kept track of all the forward delegations as well as IPv4 reverse delegations on Xicada. The administrator of each node entered their zones into a web form, and then configured their DNS server to pull down a forwarders definition for all Xicada zones. It supported BIND and djbdns, but also output a CSV file if someone decided to use another DNS server. It was originally intended that each DNS server should pull down a fresh copy of the forwarders definition file nightly, but there were really no rules.

Mark Kamichoff has a policy on his network to have DNS entries (A, AAAA, and PTR) for each and every active IP address. If a host is offline, the DNS records should be immediately expunged. This removes the need for a host management system or a collection of poorly-maintained spreadsheets. If an IP is needed, the PTR should be checked first. All DHCP-assigned IP addresses are named using the format {site ID}-{lastoctet}.prolixium.com. Again, no confusion. DNS itself is a database, so why not use it?
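The "check the PTR before using an address" rule can be sketched as follows. This is an illustration under assumptions, not the procedure actually used on PCN: the function names, the site ID "sea", and the sample addresses are hypothetical, and the lookup function is injectable so the logic can be exercised without a live resolver.

```python
import socket


def ptr_name(ip, lookup=socket.gethostbyaddr):
    """Return the PTR name for ip, or None when no record exists."""
    try:
        return lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def is_free(ip, lookup=socket.gethostbyaddr):
    """Under the policy above, an address without a PTR is unused."""
    return ptr_name(ip, lookup) is None


def dhcp_name(site_id, ip):
    """Build a DHCP hostname from a site ID and the last IPv4 octet."""
    last_octet = ip.rsplit(".", 1)[1]
    return f"{site_id}-{last_octet}.prolixium.com"


# "sea" is a made-up site ID used only for this example.
print(dhcp_name("sea", "10.3.0.57"))
```

The approach only works as long as the policy is followed strictly, since a stale PTR makes a free address look occupied and a missing one makes a live host look free.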

All transit links on PCN are addressed using the prolixium.net domain. The format is {unit/VLAN}.{interface}.{host}.prolixium.net. For example, the xl1 interface on starfire would be: xl1.starfire.prolixium.net. There is a collection of DNS entries for every IPv4 and IPv6 transit link. There is not one hop in the network that lacks a PTR record (or has a PTR record without a corresponding A or AAAA record). Each router has a loopback interface with IPv4 and IPv6 addresses (if supported).
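The naming format lends itself to a one-line generator. The sketch below is hypothetical (the function name and the optional unit handling are assumptions); only the starfire/xl1 example comes from the text.

```python
def transit_name(host, interface, unit=None, domain="prolixium.net"):
    """Build {unit/VLAN}.{interface}.{host}.{domain}; unit is optional."""
    labels = [interface, host, domain]
    if unit is not None:
        # The unit or VLAN number becomes the leftmost label.
        labels.insert(0, str(unit))
    return ".".join(labels)


# Example from the text: the xl1 interface on starfire.
print(transit_name("starfire", "xl1"))

# Hypothetical VLAN 5 on the same interface.
print(transit_name("starfire", "xl1", unit=5))
```

Interface names containing characters that are not valid in DNS labels (slashes, for instance) would need to be rewritten first; this sketch does not attempt that.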

Ashburn-Specific Setup

Ashburn LAN

The network setup in Ashburn (formerly Seattle, WA and Charlotte, NC) is slightly different from the other sites, where there is a single router with a dynamic address. In the Ashburn location there are two ISPs and they're terminated in separate LXC instances (all with VPNs to at least one of interstellar, nox, dax, or elise - the "enterprise" network):

  • discovery (on evolution) - Verizon FiOS
  • sprint (on evolution) - Verizon Wireless (LTE)

starfire and evolution are the two core routers with multiple Gigabit Ethernet interfaces. The current routing setup is as follows:

  • IPv6 (Internet & internal) inbound & outbound traffic traverses discovery (Verizon FiOS) via VPN
  • IPv4 Internet inbound & outbound traffic traverses discovery (Verizon FiOS) via NAT
  • All LXCs above advertise an IPv4 default route into OSPFv2
  • LOCAL_PREF and AS_PATH prepending influence the traffic flow

In the case of backup, discovery is replaced with the LXC sprint.
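How LOCAL_PREF and AS_PATH prepending steer traffic toward discovery and fall back to sprint can be shown with a deliberately simplified model of BGP best-path selection. It covers only the first two decision steps (highest LOCAL_PREF, then shortest AS_PATH); the AS numbers and preference values are hypothetical and are not taken from the PCN configuration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    via: str
    local_pref: int
    as_path: Tuple[int, ...]


def best_path(paths):
    """Prefer the highest LOCAL_PREF, then the shortest AS_PATH."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


# Hypothetical values: the primary is preferred, the backup is prepended.
primary = Path("discovery", local_pref=200, as_path=(64512,))
backup = Path("sprint", local_pref=100, as_path=(64513, 64513, 64513))

print(best_path([primary, backup]).via)

# When the primary path is withdrawn, only the backup remains.
print(best_path([backup]).via)
```

LOCAL_PREF influences outbound decisions inside an AS, while prepending makes a path less attractive to neighbors, which is why the two are typically used together.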

In the past, NetFlow was used on atlantis, which was depicted in the drawing below:

PCN NetFlow Setup

The NetFlow collector ran ntop, but this was uninstalled due to instability.

Printing

The whole printing/CUPS/lpd setup is mostly an annoyance. Most people would want to run CUPS on every Unix client on the network. Mark Kamichoff believes it's better to have a lightweight client send a PostScript file via lpd to a CUPS server rather than sending a huge RAW raster stream across the network and having both the client and server do print processing. See the diagram below:

PCN Printing Setup

SmokePing

For monitoring, PCN uses a combination of Nagios, SmokePing, and MRTG. The SmokePing setup itself is a combination of slaves and masters, both IPv4 and IPv6.

SmokePing

nox is the master for a few slaves:

  • tiny - VPS connected to atlantic.net
  • storm - RPi 3 connected to AT&T Fiber
  • exodus - RPi 3 connected to AT&T DSL
  • galactica - RPi 3 B+ connected to Comcast Xfinity
  • photonic - RPi 4 B connected to Charter Spectrum

History

Warning: This entire section is written in the first-person (Mark Kamichoff's) point of view

Beginnings

After joining the Xicada network back at RPI, I decided to continue linking all of my networks and sites together via various VPN technologies. At first, the network was just a simple VPN between my network at home and a few computers in my dorm room at RPI. The connection tunnelled through RPI's firewall like a knife through warm butter, using OpenVPN's UDP encapsulation mode. Actually, a site to site UDP tunnel was the only thing OpenVPN offered, back then. My router at RPI was a blazing-fast Pentium 166MHz box running Debian GNU/Linux. At that point, my Xicada tunnels were terminated on another box I found in the trash, an old AMD K6-300, which eventually ran FreeBSD 4.

The network quickly started expanding, and I was able to move the K6-300 box (starfire) into the ACM's lab, which was given a 100mbit link, in the basement of the DCC. At this point in time, my network had three sites: home, the lab, and my dorm room. Since I didn't stick around RPI during most summers, I reterminated the Xicada links on starfire, since it sported a more permanent link.

Shortly after starfire was moved to the lab, I started toying with IPv6, and acquired a tunnel via Freenet6 (now Hexago, since they're actually trying to sell products, or something). RPI's firewall wouldn't allow IP protocol 41 through the firewall, and my attempts at getting this opened up for my IP failed. So, I terminated the IPv6 tunnel on my box at home, which sat on Optimum Online. Freenet6 gave me a /48 block out of the 3ffe::/16 6bone space, and I started distributing /64's out to all of my LAN segments. I started running Zebra's OSPFv3 daemon, and realized it was buggy as all get out. It mostly worked, though. Since Freenet6 gave me an ip6.int. delegation, I spent some time applying tons of patches to djbdns, my DNS server of choice, back then. After tons of patching, I got IPv6 support, which was fairly neat at the time. What did I use this new-found IPv6 connectivity for? IRC and web site hosting. www.prolixium.com has had an AAAA record since at least 2003.

Sometime in 2003 (I forget when), I moved my IPv6 tunnel to BTExact, British Telecom's free tunnel broker that actually gave out non-6bone /48's and ip6.arpa. DNS delegations. I quickly moved to them, and enjoyed quicker speeds than Freenet6 for about a year. Of course, after a year, my parents had a power outage at home, and my server lost the IP it had with OOL for the past two years. BTExact, at that time, had frozen their tunnel broker service, and didn't allow any modifications or new tunnels to be created. I went back to Freenet6, who had changed to 2001::/16 space.

After leaving RPI, and getting a job, I decided to purchase a dedicated server from SagoNet. I extended my network down to Tampa, FL, where the server was located.

Fast-forwarding to the present day, I currently have six sites, and native IPv6 from Voxel dot Net. Almost every host on the network is IPv6-aware, and the IPv6 connectivity is controlled completely by pf.

Xicada connectivity at this point has been terminated, due to lack of interest.

VLAN Conversion (Laundry Room Data Center)

VLAN Setup
I'm lucky to have CAT5(e?) cabled to every room in my condo, all aggregated in the laundry room, so I figured it was time to deploy a couple of different VLANs on my network. Initially, I just had a dumb switch connecting all of the various ports in different rooms together. Since that was too simple of a solution, I picked up a Cisco 2940 switch on eBay, and set up a 1Gbit trunk between starfire and the laundry room. I set up four VLANs:
  • 2: Various wall jacks
  • 3: Media center link (connected to kamikaze)
  • 4: Linksys link (connected to mercury)
  • 5: Lab link (connected to hysteresis)

I ended up throwing some other gear in the laundry room along with the switch, and ended up moving my lab (3.0) there.

BGP (Confederations) Conversion

History

Starting with the Xicada project, my network was one big OSPF backbone area. Entirely flat, except for some route redistribution for the lab connection. When I added OSPFv3 for IPv6 reachability, it was no different - one big area: no stub areas, no frills. It worked, but was boring, and didn't provide the flexibility required if I wanted to start redirecting Internet traffic.

After reading up on BGP, I realized I could make my network 1000% more complex, while gaining some real-world experience. Sounds like a plan, huh?

Preparation and Design

Due to some Quagga instability issues, I originally tested out some alternate BGP/OSPF implementations, including XORP. Unfortunately, none of them fit the bill, and XORP, although promising, was horribly unstable and appeared to suffer from configuration file parsing issues, more than anything else. So I decided to stick with Quagga. I also decided to keep two separate BGP connections, one for IPv4 and one for IPv6 (so I didn't run into any nasty next-hop accessibility problems).

One of the goals of the redesign was to eliminate the large network-wide IGP process and break down each site into sub-ASes, using BGP confederations and route reflectors. This required a partial mesh of CBGP (confederation BGP - like EBGP, but more attributes are retained) between all the sites, to take advantage of the tunnels. Unfortunately, this meant that I had to renumber all of my IPv6 tunnels, since they were all /128's. Not a big deal. I didn't want to do this with the IPv4 (OpenVPN) tunnels, since the documentation strongly recommended against the use of anything other than a 32-bit netmask. This required the use of the ebgp-multihop command, since according to most [E]BGP implementations, /32's or /128's connecting to each other are not classified as 'directly connected' for some reason. (This doesn't make sense to me, since even a TTL of 1 should theoretically allow communication to succeed.)

At each site, I wanted to run IBGP internally, and designate one box to be the route reflector, in order to loosen the IBGP full-mesh requirement. Some of the OpenWrt devices did not have loopbacks at the time, so I needed to shuffle around some addresses and fix this.

I'd still run an IGP internal to each site (not nox or dax, since they are only one router), and advertise a default route via OSPFv2 within the site, for Internet access. I could also advertise default routes from two different routers within a site, for redundancy and failover Internet access.

So, here's some of the tasks I performed prior to making any routing changes:

  1. Add loopbacks to all routers
  2. Redo all IPv6 tunnel interfaces, converted to /126's to avoid subnet-router anycast issues
  3. Redo tunnel naming standards (was too long before)
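The reasoning behind /126 tunnels in step 2 can be seen by enumerating one: the lowest address of any IPv6 subnet is the subnet-router anycast address, so a /126 still leaves room for two tunnel endpoints above it. The sketch uses the 2001:db8::/32 documentation prefix as a stand-in; the real tunnel addressing is not shown on this page.

```python
import ipaddress

# Documentation prefix used as a placeholder for a real tunnel subnet.
tunnel = ipaddress.ip_network("2001:db8:0:1::/126")

addresses = list(tunnel)       # all four addresses in the /126
anycast = addresses[0]         # subnet-router anycast (interface ID zero)
endpoints = addresses[1:3]     # one address for each end of the tunnel

print(anycast)
print(endpoints[0], endpoints[1])
```

A /127 would place one endpoint on the anycast address itself, which is the issue the /126 choice sidesteps.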

IPv6 Migration

I figured, since on most platforms, IGP routes take precedence over BGP routes, I could add all the peering relationships and get everything set up without skipping a beat. Quagga's zebra process wouldn't insert or remove anything from the FIB (the kernel routing table). Then I could remove OSPFv3 from all the WAN links, and zebra would just shuffle around the routes, but reachability would come back within a few minutes, maybe?

So I started building the BGP neighbors, and quickly ran into a problem. For some reason, no IPv6 BGP routes were being sent to other peers from Quagga's bgpd. I posted a message to the mailing list, and quickly got a helpful response. Apparently I was hitting a bug that's been in Quagga for a while (a typo) that dealt with the address-family negotiation between peers. The quick fix was to add 'override-capability' to each neighbor (or peer group) and it would accept all advertised address families.

After all the peers were set up, I disabled OSPFv3 on all the WAN links, and everything reconverged... oddly. It looked like BGP was doing path-selection based on tiebreakers, and picking the higher peer address as the best path for a destination, even if it meant not utilizing the directly connected link. After scratching my head for a few minutes, I realized my stupidity. Normal BGP treats AS_CONFED_SEQUENCE and AS_CONFED_SET as a length of one, so all paths through my network looked like they had an AS path length of *1*. Luckily, Quagga had a nice bgp bestpath as-path confed command that modified the path selection algorithm, and gave me what I wanted. I described this in a blog entry.

Since I wanted all loopbacks and transit interfaces reachable from anywhere, I added a ton of network statements to bgpd. It felt like a hack, but isn't too bad, since there's really no other way of doing it, without using a network-wide IGP.

IPv4 Migration

Since the IPv6 migration was successful, I figured the IPv4 migration would turn out the same - and it did, mostly.

I started setting up the IPv4 BGP neighbors, and ran into a strange issue with ScreenOS. I've documented it here. Basically, my two Juniper firewalls wouldn't establish IBGP connections unless they were configured as passive neighbors (wait for a connection).

After all the IPv4 BGP connections were up and running, I killed the network-wide IGP process entirely (shut off ospfd/ospf6d on dax and nox), and let everything reconverge. It worked out of the box - success!

I removed the static default routes on my OpenWrt routers, and advertised defaults at each site. No problem there.

Finish

Although I ran into a number of problems, and probably complicated troubleshooting of my network by an order of magnitude, I think the conversion was worth it. Now if anyone wants to start Xicada 2.0, we can do it right, this time...

EBGP Conversion

I got sick of confederations, so I just removed the confederation statements and converted all of the inter-site links to straight EBGP.

Applications

PCN enables several applications:

  • VoIP (via SIP / G.711u)
  • IPv6 Internet access
  • Streaming audio

Lab

Main Article: PCN Lab

The PCN lab is Mark Kamichoff's network proving ground and general hacking arena.

External Links