Debugging pihole-FTL
My primary project this week was rebuilding my file server on computer 13C06, the workstation I used at Olinia and which Bill gave to me when I left. Overall the rebuild went smoothly. But when I went to put it into service, I discovered DNS wasn’t working.
The culprit was in pihole-FTL, which forms the core of the Pi-hole DNS ad blocking service. In turn, pihole-FTL is a fork of dnsmasq, a very useful combination DNS/DHCP/TFTP server.
The Pi-hole folks wrote what is essentially a Pi-hole front-end to dnsmasq, which makes it much more difficult to see what dnsmasq is doing. For example, dnsmasq has a logging service built in, but it is now configured through a simpler interface than the one dnsmasq provided. To get more detail from the dnsmasq parts of pihole-FTL, I had to review the source code where options are set, then modify the code in dnsmasq.c to send log output to a file:
    daemon->log_file = "/r/dnsmasq.log.text";
With that line in place, I saw the following in the log file:
    dnsmasq[6791]: started, version pi-hole-2.80 cachesize 150
    dnsmasq[6791]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ips
    dnsmasq[6791]: warning: using interface eno1 instead
    dnsmasq[6791]: reading /etc/resolv.conf
    dnsmasq[6791]: ignoring nameserver 192.168.1.1 - local interface
    dnsmasq[6791]: ignoring nameserver 192.168.1.3 - local interface
    dnsmasq[6791]: read /etc/hosts - 27 addresses
So the reason DNS wasn’t working was that the copy of pihole-FTL running on the new server decided to ignore both nameservers listed in /etc/resolv.conf because it considers them to be “local interface” (whatever that means).
Ignoring 192.168.1.1 was expected: that’s the address I told pihole-FTL to listen on, so if it tried to forward an unresolved query to that address it would end up attempting to answer it itself, fail, and forward it again in an endless loop.
However, it should have accepted 192.168.1.3, because that’s where I’m running bind, which I’m using as my forwarding DNS server.
The code in question is in dnsmasq/network.c:
    1531   for (iface = daemon->interfaces; iface; iface = iface->next)
    1532     if (sockaddr_isequal(&serv->addr, &iface->addr))
    1533       break;
    1534   if (iface)
    1535     {
    1536       my_syslog(LOG_WARNING, _("ignoring nameserver %s - local interface"), daemon->namebuff);
    1537       serv->flags |= SERV_MARK;
    1538       continue;
    1539     }
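To make the logic of that check easier to follow, here is a minimal, self-contained sketch of the same idea. It is not dnsmasq's code: the struct names, the `ignored` field, and `mark_local_servers` are hypothetical stand-ins for `daemon->servers`, `daemon->interfaces`, and the `SERV_MARK` flag, and addresses are simplified to bare 32-bit values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for dnsmasq's interface and server lists. */
struct iface_node {
    uint32_t addr;                 /* IPv4 address the daemon listens on */
    struct iface_node *next;
};

struct server_node {
    uint32_t addr;                 /* IPv4 address of an upstream nameserver */
    bool ignored;                  /* plays the role of the SERV_MARK flag */
    struct server_node *next;
};

/* Mark every server whose address equals one of our own interface addresses.
 * Returns the number of servers that were marked as ignored. */
size_t mark_local_servers(struct server_node *servers,
                          const struct iface_node *ifaces)
{
    size_t count = 0;

    for (struct server_node *s = servers; s; s = s->next) {
        const struct iface_node *i;

        /* Walk the interface list until a match is found or the list ends. */
        for (i = ifaces; i; i = i->next)
            if (i->addr == s->addr)
                break;

        /* i is non-NULL only if the loop stopped early on a match. */
        s->ignored = (i != NULL);
        if (s->ignored)
            count++;
    }
    return count;
}
```

In this model, a nameserver is dropped as soon as its address shows up anywhere in the interface list, so a server list that overlaps completely with the interface list leaves nothing to forward queries to.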
At this point I decided it was time to pick up a new skill: gdb, the GNU debugger.
    [root@penguin FTL-master]# gdb --args pihole-FTL -f
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /var/tmp/FTL-master/pihole-FTL...done.
    (gdb) break network.c:1532
    Breakpoint 1 at 0x63195: file dnsmasq/network.c, line 1532.
    (gdb) break network.c:1536
    Breakpoint 2 at 0x631a4: file dnsmasq/network.c, line 1536.
    (gdb) run
    ... lots of output ...
    Breakpoint 1, check_servers () at dnsmasq/network.c:1532
    1532 if (sockaddr_isequal(&serv->addr, &iface->addr))
    (gdb) print/x serv.addr.in.sin_addr    <-- The "server" IP address
    $1 = {s_addr = 0x0301a8c0}             <-- IP address C0A80103, or 192.168.1.3
    (gdb) p/x iface.addr.in.sin_addr       <-- The "iface" address (loop variable)
    $2 = {s_addr = 0x1701a8c0}             <-- IP address C0A80117, or 192.168.1.23
    (gdb) continue
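Converting those hex values to dotted-quad addresses by hand is easy to get wrong. gdb prints s_addr as a host integer, and on a little-endian x86_64 machine the first octet of the address ends up in the lowest byte of that integer. The helper below is a small sketch of the conversion; `format_gdb_saddr` is a name made up for this example, and it assumes the value was printed on a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format an s_addr value, as shown by gdb's p/x on a little-endian host,
 * as dotted-quad text. The lowest byte of the integer is the first octet.
 * buf should hold at least 16 bytes ("255.255.255.255" plus the NUL). */
void format_gdb_saddr(uint32_t value, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)(value & 0xffu),
             (unsigned)((value >> 8) & 0xffu),
             (unsigned)((value >> 16) & 0xffu),
             (unsigned)((value >> 24) & 0xffu));
}
```

Passing 0x0301a8c0 yields 192.168.1.3, and 0x1701a8c0 yields 192.168.1.23.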
In dnsmasq there are two lists of interest: daemon->servers and daemon->interfaces.
On the old penguin:
    SERV (check against):
        p/x serv.addr.in.sin_addr {s_addr = 0x301a8c0} -> C0.A8.01.03 -> 192.168.1.3
    IFACE (loop var):
        p/x iface.addr.in.sin_addr {s_addr = 0x101a8c0} -> C0.A8.01.01 -> 192.168.1.1
        p/x iface.addr.in.sin_addr {s_addr = 0x100007F} -> 7F.00.00.01 -> 127.0.0.1
On the new penguin:
    SERV (check against):
        p/x serv.addr.in.sin_addr {s_addr = 0x1501a8c0} -> C0.A8.01.15 -> 192.168.1.21
        p/x serv.addr.in.sin_addr {s_addr = 0x1701a8c0} -> C0.A8.01.17 -> 192.168.1.23
    IFACE (loop var):
        p/x iface.addr.in.sin_addr: 0x1701a8c0 -> C0.A8.01.17 -> 192.168.1.23
        p/x iface.addr.in.sin_addr: 0x1501a8c0 -> C0.A8.01.15 -> 192.168.1.21
        p/x iface.addr.in.sin_addr: 0x1701a8c0 -> C0.A8.01.17 -> 192.168.1.23
        p/x iface.addr.in.sin_addr: 0x0100007F -> 7F.00.00.01 -> 127.0.0.1
The fact that 192.168.1.21 was appearing in both the serv and iface structures was a clue, but it took a bit more digging to finally figure out what was happening. Eventually I discovered that /etc/dnsmasq.conf was an empty file. I believe this was a consequence of some patches I made to the installer prior to running it. And so it was that after nearly two days of troubleshooting, the answer to the DNS issue was to set up a working /etc/dnsmasq.conf file:
conf-dir=/etc/dnsmasq.d