Troubleshooting the Juniper SRX jdhcpd

There is a new game in town when it comes to configuring your SRX to provide DHCP addresses.

The new method of configuration is using a new daemon called jdhcpd which is outlined in the following Juniper KB article.

Fair enough. I moved my configuration from the old method, to the new method to allow DHCP scopes to exist in routing-instances.

And yet – when I apply that configuration…my DHCP daemon doesn’t seem to be running.

Odd, right? I checked, and rechecked my configuration, rebooted, performed a “commit full” and still have the same results. A quick cull of all internet related posts also yields nothing significant but also a few other posts with no answers, which means either I’m quickly getting denser in my old age, or this behavior can be explained by a bug.

system {
  dhcp-local-server {
    group WIFI {
      interface fe-0/0/5.0;
    }
  }
}
access {
 address-assignment {
 pool WIFI-PUBLIC {
 family inet {
 network 172.29.0.0/24;
 range WIFI-PUBLIC-POOL {
 low 172.29.0.10;
 high 172.29.0.100;
 }
 }
 }
 }
}

root@SRX100-H2_Branch_1# run show version
Hostname: SRX100-H2_Branch_1
Model: srx100h2
JUNOS Software Release [12.1X46-D40.2]

root@SRX100-H2_Branch_1# commit full
Feb 12 06:31:32 init: can not access /usr/sbin/hostname-cached: No such file or directory
Feb 12 06:31:32 init: hostname-caching-process (PID 0) started
Feb 12 06:31:32 init: security-intelligence (PID 19821) started
Feb 12 06:31:32 init: can not access /usr/sbin/ipmid: No such file or directory
Feb 12 06:31:32 init: ipmi (PID 0) started
Feb 12 06:31:32 init: dhcp-service (PID 18286) exited with status=1 <-THIS IS BAD
Feb 12 06:31:32 init: dhcp-service (PID 19826) started
commit complete

Feb 12 06:30:42 SRX100-H2_Branch_1 mgd[2768]: UI_CHILD_START: Starting child '/usr/sbin/dhcpd'
Feb 12 06:30:42 SRX100-H2_Branch_1 mgd[2768]: UI_CHILD_STATUS: Cleanup child '/usr/sbin/dhcpd', PID 19436, status 0
Feb 12 06:30:44 SRX100-H2_Branch_1 mgd[2768]: UI_CHILD_START: Starting child '/usr/sbin/jdhcpd'
Feb 12 06:30:46 SRX100-H2_Branch_1 mgd[2768]: UI_CHILD_STATUS: Cleanup child '/usr/sbin/jdhcpd', PID 19441, status 0
Feb 12 06:31:32 SRX100-H2_Branch_1 /kernel: init: dhcp-service (PID 18286) exited with status=1
Feb 12 06:31:32 SRX100-H2_Branch_1 /kernel: init: dhcp-service (PID 19826) started
Feb 12 06:31:36 SRX100-H2_Branch_1 /kernel: setsockopt(RTS_ASYNC_NEED_RESYNC) ignored (dhcpd): client already active
Feb 12 06:31:31 SRX100-H2_Branch_1 jdhcpd: DH_SVC_UDP_SOCKET_EXISTS_FAILURE: UDP socket already established

root@SRX100-H2_Branch_1# run show system processes | match dhcp
19816 ?? S 0:00.38 /usr/sbin/dhcpd -N  <-- this should be jdhcpd!!!
19428 p0 S+ 0:00.07 egrep -i dhcp (grep)

root@SRX100-H2_Branch_1# run show dhcp server statistics
error: the dhcp-service subsystem is not running  <- sad panda

 

 

So I tried dumping just this bit of config on fresh SRX, and … it works! So there must be something in my existing configuration that’s causing this. But what is it.

By pasting blocks of config sequentially, and checking my running daemons I was able to narrow down what worked, and what didn’t. Then it was a matter of adding/removing bits of the config until I determined exactly which part was the culprit.

It turns out, that the autoinstallation process relies on the dhcpd daemon (I’m surmising) to get it’s assigned address on boot, except that process isn’t compatible with the *new* way of configuring DHCP. Sigh.

 

As soon as I inactivate this part of the config, DHCP works as it’s not trying to start both DHCP daemons. I suspect because autoinstallation is the first stanza in the config file, it’s likely that daemon starts first, followed by the jdhcpd daemon which can never bind to a socket.

root@SRX100-H2_Branch_1# show system autoinstallation
##
## inactive: system autoinstallation
##
usb {
 disable;
}
root@SRX100-H2_Branch_1# commit
commit complete
root@SRX100-H2_Branch_1# run show system processes | match dhcp
18286 ?? S 0:02.07 /usr/sbin/jdhcpd -N <-- Success!!
root@SRX100-H2_Branch_1# run show dhcp server binding
IP address Session Id Hardware address Expires State Interface
172.29.0.11 4 08:00:27:df:75:5f 81431 BOUND fe-0/0/5.0

 

Much better. Since I didn’t find any reference to this while exercising my Google Foo, hopefully this ends up helping someone else.

Advertisements

Generating Custom Juniper Syslog Messages

I wanted to focus on a lesser known feature which I’ve found useful over the years when trying to setup NMS alerting and logging which doesn’t typically garner much attention in the documentation.

This would be the logger utility available on any JunOS platform from the shell. This program allows you to generate virtually any syslog event at will to test your configurations and confirm you have the correct hosts, priority, facility and filtering set on your local host as well as confirm your alerting is working correctly w/o having to flap interfaces, or other intrusive testing.

Here’s a quick overview on our available Severity and Facility definitions.

So let’s say I want to generate a syslog message to a remote syslog collector, but I only want to send messages which are generated by the external facility with a severity of info or better.

This process is very well outlined in the Day One Book : Applying Junos Event Automation

But to save the hunting, here’s the relevant excerpt;


 

How to use the logger test utility

1. The logger utility is a shell command, and so the user must first start a system shell by invoking the start shell command:

    user@Junos> start shell %

2. The logger utility has the following command syntax: logger -e EVENT_ID -p SYSLOG_PRIORITY -d DAEMON -a ATTRIBUTE=VALUE MESSAGE. Only the EVENT_ID is required, and it must be entered entirely in uppercase:

    % logger -e UI_COMMIT

The above command causes a UI_COMMIT event to be generated, originated from the logger daemon, with no attributes, no message, and a syslog facility/severity of user/notice.

The default settings can be altered by using one of the optional command line arguments.

3. For an alternate syslog facility/severity use the -p argument and specify the facility/severity in the same facility.severity format used by the jcs:syslog() function:

    % logger -e UI_COMMIT –p external.info

MORE? See Day One: Applying Junos Operations Automation for a table that lists the valid syslog facilities and severities for the jcs:syslog() function. 4. To alter what daemon generated the event, use the -d argument:

    % logger -e UI_COMMIT –d mgd

5. Include attributes for the event by using the -a argument. Use the argument multiple times if more than one attribute is needed. The attribute name must be in lowercase and should be followed by an equal sign and the desired value:

    % logger -e UI_COMMIT -a username=user -a command=commit

6. The syslog message follows all the command line arguments. Quotes are not required but are recommended for clarity:

    % logger -e UI_COMMIT -d mgd "This is a fake commit."

The above command causes the following message to be shown in the syslog:

 

    Jul 22 12:47:03 Junos mgd: UI_COMMIT: This is a fake commit.

NOTE When using the logger utility the event ID must always be in uppercase and the attribute names must always be in lowercase.


 

So let’s test this out.

 

root@vSRX-NAT-GW% logger -e LICENSE_VIOLATION -p external.info "This is a test of the emergency broadcast system"
root@vSRX-NAT-GW% exit
exit
root@vSRX-NAT-GW> show log messages | match LICENSE_VIOLATION | last 10
Jan 27 04:53:59 vSRX-NAT-GW logger: LICENSE_VIOLATION: This is a test of the emergency broadcast system

 

You can further confirm this by checking the syslog configuration on your local device. In this case, I’ve specified I’m ONLY sending messages to external host 192.168.56.11 if the facility is ‘external’ AND the severity is ‘info’ or greater (ie. not debug) AND the regex of the message matches LICENSE. Otherwise, we’ll likely have a local catch-all configured with any-any to locally log messages we may not be explicitly interested in looking at on the remote server.

root@vSRX-NAT-GW> show configuration
## Last commit: 2016-01-27 04:59:04 UTC by root
version 12.1X47-D20.7;
system {
  syslog {
    host 192.168.56.11 {
      external info;
      match LICENSE;
    }
   file messages {
     any any;
     authorization info;
   }
   file interactive-commands {
     interactive-commands any;
   }
}

 

To check this is working correctly, we’ll throw a ‘monitor traffic’ on the management interface looking for outbound UDP traffic to my syslog host, and we’ll generate some messages.

root@vSRX-NAT-GW> monitor traffic interface ge-0/0/0 matching "host 192.168.56.11 and udp"
[...]
Use <no-resolve> to avoid reverse lookups on IP addresses.
^C
941 packets received by filter
root@vSRX-NAT-GW>

 

And here’s the various tests we ran;

barnesry-mbp:python barnesry$ ssh root@192.168.56.107
Password:
--- JUNOS 12.1X47-D20.7 built 2015-03-03 21:53:50 UTC
root@vSRX-NAT-GW% logger -e LICENSE_VIOLATION -p daemon.warning "This is a test"
root@vSRX-NAT-GW% logger -e VIOLATION -p external.info "This is a test"
root@vSRX-NAT-GW% logger -e VIOLATION -p external.info "This is a test"

You can see in the capture above we captured no packets leaving my local host as a result of any of the above tests we generated.

Now, let’s restart the packet capture again, and generate a message we’re pretty certain will match.

barnesry-mbp:python barnesry$ ssh root@192.168.56.107
Password:
--- JUNOS 12.1X47-D20.7 built 2015-03-03 21:53:50 UTC
root@vSRX-NAT-GW% logger -e LICENSE_VIOLATION -p external.info "This is a test"

 

Let’s check our packet capture again.

root@vSRX-NAT-GW> monitor traffic interface ge-0/0/0 matching "host 192.168.56.11 and udp"
[...]
05:09:17.723090 Out IP truncated-ip - 42 bytes missing! 192.168.56.107.syslog > 192.168.56.11.syslog: SYSLOG local2.info, length: 74
^C
121 packets received by filter
0 packets dropped by kernel
root@vSRX-NAT-GW>

Success!