
Problem

In an overlapping cluster setup, where a set of Linux nodes belongs to more than 10 clusters, the Check_MK service regularly shows the error


[agent] Empty output from agent at <ip>:6556.

This happens because the agent is queried in parallel for each cluster.
This leads to systemd, xinetd or cmk-agent-ctl blocking the connection once the per-source connection limit is reached.
Rescheduling the active checks of all related Check_MK services with "Spread over 0 minutes" also has a negative influence. Always spread them over 1 or more minutes.
check_tcp or host check commands against port 6556/tcp can also consume one connection each from the per-source limit.
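
The parallel access described above can be reproduced by hand. The following is a minimal sketch, not an official tool: the function name probe_agent is hypothetical, it relies on bash's /dev/tcp pseudo-device and on the timeout utility from coreutils, and it assumes the agent answers in plain text on the given port. It opens N connections at once from the same source and prints how many of them returned no data.

```shell
#!/usr/bin/env bash
# probe_agent HOST PORT N  (hypothetical helper, illustration only)
# Opens N TCP connections in parallel and prints how many returned no output.
probe_agent() {
    local host=$1 port=$2 n=$3 tmp i empty=0
    tmp=$(mktemp -d)
    for i in $(seq 1 "$n"); do
        # Each background job reads whatever the agent sends until it closes
        # the connection, or gives up after 10 seconds.
        timeout 10 bash -c 'cat < "/dev/tcp/$0/$1"' "$host" "$port" \
            > "$tmp/$i" 2>/dev/null &
    done
    wait
    for i in $(seq 1 "$n"); do
        # An empty file means the connection was refused, dropped or blocked.
        [ -s "$tmp/$i" ] || empty=$((empty + 1))
    done
    rm -rf "$tmp"
    echo "$empty"
}
```

Called as `probe_agent <ip> 6556 12` from the monitoring server, a result greater than 0 suggests that some connections were rejected, which matches the "Empty output from agent" symptom.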

Solution

This does not have to happen:

  • Use a Normal check interval for service checks on the cluster nodes that is lower than or equal to the one on the cluster hosts.
    In other words: check the cluster node more frequently than the cluster host, or at least at the same interval.
  • The cluster hosts will then use the cached agent output of the cluster node, provided it is recent enough.
  • Set Global Setting > Maximum cache file age for clusters to e.g. 1.5 times the Normal check interval for service checks.
  • With the defaults, a Normal check interval for service checks of 1 minute and a Maximum cache file age for clusters of 90 seconds, you are fine.
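
The 1.5 times rule of thumb is simple arithmetic. As a small sketch (max_cache_age is a hypothetical helper name, not part of Checkmk), the suggested Maximum cache file age for clusters can be derived from the check interval in seconds:

```shell
# max_cache_age INTERVAL_SECONDS  (hypothetical helper, illustration only)
# Prints 1.5 times the Normal check interval, in seconds.
max_cache_age() {
    interval_seconds=$1
    # Integer arithmetic: multiply by 3, then divide by 2.
    echo $(( interval_seconds * 3 / 2 ))
}
```

For the default interval of 60 seconds, `max_cache_age 60` yields the default cache age of 90 seconds mentioned above.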

Other alternatives

  • If you can't or do not want to set Normal check interval for service checks and Maximum cache file age for clusters as described above, you can configure a higher per-source connection limit for the agent.
  • Since there are at least 3 methods of serving the agent output on port 6556/tcp, there are also 3 different ways to raise the limit.

xinetd

Edit /etc/xinetd.conf and add this line to the defaults section:

 per_source = <the number of clusters that this node belongs to>

Restart the xinetd daemon after that change.
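
To double-check which value is actually set, the defaults section can be read back. This is a rough sketch under a few assumptions: the function name xinetd_per_source is hypothetical, the keyword defaults stands on a line of its own as in common stock configurations, and the value carries no trailing comment.

```shell
# xinetd_per_source [FILE]  (hypothetical helper, illustration only)
# Prints the per_source value found in the defaults section of FILE.
xinetd_per_source() {
    awk '
        # Start of the defaults section (keyword on its own line).
        /^[ \t]*defaults[ \t]*$/ { in_defaults = 1; next }
        # First per_source assignment inside the section: print its value.
        in_defaults && /^[ \t]*per_source[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")
            print
            exit
        }
        # A closing brace ends the section without a match.
        in_defaults && index($0, "}") > 0 { exit }
    ' "${1:-/etc/xinetd.conf}"
}
```

Running `xinetd_per_source` without arguments reads /etc/xinetd.conf. No output means the defaults section does not set per_source.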

systemd

  • We have to distinguish between Checkmk 2.0 and 2.1 here.
  • With 2.1, cmk-agent-ctl listens on 6556/tcp, not a systemd socket (see the cmk-agent-ctl section).

Edit /etc/systemd/system/check-mk-agent.socket and add this line to the [Socket] section:

 MaxConnectionsPerSource=<the number of clusters that this node belongs to>


Reload the systemd manager configuration by issuing

systemctl daemon-reload

Verify your change by executing

systemctl show check-mk-agent.socket | grep MaxConnectionsPerSource

cmk-agent-ctl

  • cmk-agent-ctl has its own per-source limit protection, which is not handled by systemd.
  • Currently it is not configurable via the bakery, but you can control it with an environment variable.
  • Edit the systemd unit to set an environment variable:
 systemctl edit cmk-agent-ctl-daemon.service
  • Set the environment variable DEBUG_MAX_CONNECTIONS:
# /etc/systemd/system/cmk-agent-ctl-daemon.service.d/override.conf
[Service]
Environment="DEBUG_MAX_CONNECTIONS=16"
  • Make systemd aware of this change.
  • Restart the cmk-agent-ctl-daemon unit so that it uses the environment variable DEBUG_MAX_CONNECTIONS:
systemctl daemon-reload 
systemctl restart cmk-agent-ctl-daemon.service
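
To confirm that the override is in place, the unit's Environment property can be inspected. A minimal sketch: extract_max_connections is a hypothetical helper that simply filters the value out of whatever text it receives on standard input.

```shell
# extract_max_connections  (hypothetical helper, illustration only)
# Reads text on stdin and prints the value assigned to DEBUG_MAX_CONNECTIONS.
extract_max_connections() {
    grep -o 'DEBUG_MAX_CONNECTIONS=[0-9][0-9]*' | cut -d= -f2
}
```

Typical usage would be `systemctl show -p Environment cmk-agent-ctl-daemon.service | extract_max_connections`, which should print 16 for the override shown above.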

1 Comment

  1. As far as I can remember, that was due to multiple fetchers trying to access the cached data, correct.