Debugging pihole-FTL

My primary project this week was rebuilding my file server on computer 13C06, the workstation I used at Olinia and which Bill gave to me when I left. Overall the rebuild went smoothly. But when I went to put it into service, I discovered DNS wasn’t working.

The culprit was in pihole-FTL, which forms the core of the Pi-hole DNS ad blocking service. In turn, pihole-FTL is a fork of dnsmasq, a very useful combination DNS/DHCP/TFTP server.

The Pi-hole folks wrote what is essentially a Pi-hole front-end to dnsmasq, which makes it much more difficult to see what dnsmasq is doing. For example, dnsmasq has a logging service built into it, but its operations are now configured through a simpler interface than the one dnsmasq provided. To get more detail from the dnsmasq parts of pihole-FTL, I had to review the source code where options are set, then modify the code in dnsmasq.c to send log output to a file:

daemon->log_file = "/r/dnsmasq.log.text";

With that line in place, I saw the following in the log file:

dnsmasq[6791]: started, version pi-hole-2.80 cachesize 150
dnsmasq[6791]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ips
dnsmasq[6791]: warning: using interface eno1 instead
dnsmasq[6791]: reading /etc/resolv.conf
dnsmasq[6791]: ignoring nameserver 192.168.1.1 - local interface
dnsmasq[6791]: ignoring nameserver 192.168.1.3 - local interface
dnsmasq[6791]: read /etc/hosts - 27 addresses

So the reason DNS wasn’t working was that the copy of pihole-FTL running on the new server decided to ignore both nameservers listed in /etc/resolv.conf because it considers them to be “local interface” (whatever that means).

Ignoring 192.168.1.1 was expected: that’s the address I told pihole-FTL to listen on, so if it tried to forward a non-resolved query to that address it would end up attempting to answer it itself, fail, and forward it again in an endless loop.

However, it should have accepted 192.168.1.3, because that’s where I’m running bind, which I’m using as my forwarding DNS server.

The code in question is in dnsmasq/network.c:

1531       for (iface = daemon->interfaces; iface; iface = iface->next)
1532         if (sockaddr_isequal(&serv->addr, &iface->addr))
1533           break;
1534       if (iface)
1535         {
1536           my_syslog(LOG_WARNING, _("ignoring nameserver %s - local interface"), daemon->namebuff);
1537           serv->flags |= SERV_MARK;
1538           continue;
1539         }

At this point I decided it was time to pick up a new skill: gdb, the GNU debugger.

[root@penguin FTL-master]# gdb --args pihole-FTL -f
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/tmp/FTL-master/pihole-FTL...done.
(gdb) break network.c:1532
Breakpoint 1 at 0x63195: file dnsmasq/network.c, line 1532.
(gdb) break network.c:1536
Breakpoint 2 at 0x631a4: file dnsmasq/network.c, line 1536.
(gdb) run
    ... lots of output ...
Breakpoint 1, check_servers () at dnsmasq/network.c:1532
1532                if (sockaddr_isequal(&serv->addr, &iface->addr))
(gdb) print/x serv.addr.in.sin_addr     <-- The "server" IP address
$1 = {s_addr = 0x0301a8c0}              <-- IP address C0A80103, or 192.168.1.3
(gdb) p/x iface.addr.in.sin_addr        <-- The "iface" address (loop variable)
$2 = {s_addr = 0x1701a8c0}              <-- IP address C0A80117, or 192.168.1.23
(gdb) continue
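
The s_addr values gdb prints are held in network byte order, so on a little-endian x86 machine the hex digits read back to front. The conversion done by hand in the annotations can be scripted. This is a minimal Python sketch, assuming the value was printed on a little-endian host; the function name decode_s_addr is made up for illustration and is not part of gdb or dnsmasq.

```python
import socket
import struct


def decode_s_addr(value: int) -> str:
    """Convert an s_addr integer, as printed by gdb on a little-endian host,
    into dotted-quad notation."""
    # Packing as a little-endian 32-bit word recovers the in-memory byte
    # order, which is network byte order (first octet first).
    raw = struct.pack("<I", value)
    return socket.inet_ntoa(raw)


if __name__ == "__main__":
    # Values copied from the gdb session; paste in whatever gdb prints.
    for value in (0x0301A8C0, 0x1701A8C0, 0x0100007F):
        print(f"{value:#010x} -> {decode_s_addr(value)}")
```

For example, decode_s_addr(0x0100007F) gives 127.0.0.1, the loopback address.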

In dnsmasq there are two lists of interest: daemon->servers and daemon->interfaces.

On the old penguin:

SERV (check against): 
  p/x serv.addr.in.sin_addr
  {s_addr = 0x301a8c0} -> C0 A8 01 03 -> 192.168.1.3

IFACE (loop var):
  p/x iface.addr.in.sin_addr
  {s_addr = 0x101a8c0} -> C0.A8.01.01 -> 192.168.1.1
  p/x iface.addr.in.sin_addr
  {s_addr = 0x100007F} -> 7F.00.00.01 -> 127.0.0.1

On the new penguin:

SERV (check against): 
  p/x serv.addr.in.sin_addr
  {s_addr = 0x1501a8c0} -> C0 A8 01 15 -> 192.168.1.21
  p/x serv.addr.in.sin_addr
  {s_addr = 0x1701a8c0} -> C0 A8 01 17 -> 192.168.1.23

IFACE (loop var):
  p/x iface.addr.in.sin_addr: 0x1701a8c0 -> C0.A8.01.17 -> 192.168.1.23
  p/x iface.addr.in.sin_addr: 0x1501a8c0 -> C0.A8.01.15 -> 192.168.1.21
  p/x iface.addr.in.sin_addr: 0x1701a8c0 -> C0.A8.01.17 -> 192.168.1.23
  p/x iface.addr.in.sin_addr: 0x0100007F -> 7F.00.00.01 -> 127.0.0.1
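
To see what the loop in network.c does with these two sets of addresses, the comparison can be modelled in a few lines of Python. This is only a simplified sketch of the logic, not dnsmasq's code: it compares IPv4 addresses alone, whereas sockaddr_isequal works on full socket addresses, and the function name check_servers is borrowed here purely for readability.

```python
from ipaddress import IPv4Address


def check_servers(servers, interfaces):
    """Split upstream servers into (usable, ignored), mimicking the
    'ignoring nameserver ... local interface' check."""
    local = {IPv4Address(addr) for addr in interfaces}
    usable, ignored = [], []
    for server in servers:
        # An upstream whose address matches one of the daemon's own
        # interface addresses is skipped, so queries are never
        # forwarded back to the daemon itself.
        if IPv4Address(server) in local:
            ignored.append(server)
        else:
            usable.append(server)
    return usable, ignored


# Addresses as decoded from the two gdb sessions.
old = check_servers(["192.168.1.3"], ["192.168.1.1", "127.0.0.1"])
new = check_servers(
    ["192.168.1.21", "192.168.1.23"],
    ["192.168.1.23", "192.168.1.21", "192.168.1.23", "127.0.0.1"],
)

print("old penguin:", old)
print("new penguin:", new)
```

On the old machine the single upstream server survives the check. On the new machine every upstream address also appears in the interface list, so nothing is left to forward queries to.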

The fact that 192.168.1.21 and 192.168.1.23 were appearing in both the serv and iface lists was a clue, but it took a bit more digging to finally figure out what was happening. Eventually I discovered that /etc/dnsmasq.conf was an empty file. I believe this was a consequence of some patches I made to the installer prior to running it. And so it was that after nearly two days of troubleshooting, the answer to the DNS issue was to set up a working /etc/dnsmasq.conf file:

conf-dir=/etc/dnsmasq.d
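
An empty config file is easy to miss, so a quick check for it could have saved some time. The sketch below is one hypothetical way to do that in Python. It assumes the usual dnsmasq config syntax, where blank lines and lines starting with # carry no directives; both function names are invented for illustration.

```python
from __future__ import annotations

from pathlib import Path


def effective_directives(text: str) -> list[str]:
    """Return the lines of a dnsmasq-style config that are neither
    blank nor comments."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(stripped)
    return lines


def config_is_effectively_empty(path: str) -> bool:
    """True if the file is missing or contains no directives at all."""
    p = Path(path)
    if not p.is_file():
        return True
    return not effective_directives(p.read_text())


if __name__ == "__main__":
    path = "/etc/dnsmasq.conf"
    if config_is_effectively_empty(path):
        print(f"{path} contains no directives")
    else:
        print(f"{path} looks populated")
```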