Problem
When a set of Linux nodes is shared by more than 10 overlapping clusters, the Check_MK service regularly shows the error
[agent] Empty output from agent at <ip>:6556
This happens because the agent is queried in parallel, once for each cluster.
Running Reschedule active checks on all related Check_MK services with Spread over set to 0 minutes makes this worse. Always spread the checks over 1 or more minutes.
The parallel queries lead to systemd, xinetd or cmk-agent-ctl blocking the connection.
In addition, check_tcp or host check commands that connect to port 6556/tcp can each consume one of the per-source connections.
Solution
This does not have to happen:
- Use a Normal check interval for service checks on the cluster nodes that is lower than or equal to the one on the cluster hosts. In other words: check the cluster node more frequently than the cluster host, or at least at the same interval.
- The cluster hosts will then use the cached agent output of the cluster node if it is recent enough.
- Set Global Setting > Maximum cache file age for clusters to e.g. 1.5 times the Normal check interval for service checks.
- If you use the defaults (Normal check interval for service checks of 1 minute and Maximum cache file age for clusters of 90 seconds), you are fine.
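The 1.5 factor mentioned above can be worked out with a small shell sketch. The function name max_cache_age_seconds is a hypothetical helper, not part of Checkmk, and the result is only a suggested value to enter in the global setting by hand.

```shell
# Suggest a "Maximum cache file age for clusters" value (in seconds)
# as 1.5 times the check interval, using integer arithmetic.
# max_cache_age_seconds is a hypothetical helper name, not a Checkmk command.
max_cache_age_seconds() {
    interval_seconds=$1
    echo $(( interval_seconds * 3 / 2 ))
}

# Default check interval of 1 minute
max_cache_age_seconds 60    # prints 90
```

With a check interval of 5 minutes (300 seconds), the same calculation suggests 450 seconds.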
Other alternatives
- If you can't or do not want to set Normal check interval for service checks and Maximum cache file age for clusters as described above, you can configure a higher per-source connection limit for the agent.
- Since the agent output on port 6556/tcp can be served in at least 3 ways, there are also 3 different ways to raise the limit.
xinetd
Edit /etc/xinetd.conf and add this line to the defaults section:
per_source = <the number of clusters that this node belongs to>
Restart the xinetd daemon after that change.
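To double-check the edit, the value can be read back from the file. This is a minimal sketch that assumes the usual xinetd layout, where defaults, the braces and the per_source line each sit on their own line and the = is surrounded by whitespace. xinetd_per_source is a hypothetical helper name.

```shell
# Print the per_source value from the defaults block of an xinetd config file.
# Assumes "per_source = N" with whitespace around "=" (hypothetical helper).
xinetd_per_source() {
    awk '
        $1 == "defaults"                               { in_defaults = 1; next }
        in_defaults && $1 == "}"                       { in_defaults = 0; next }
        in_defaults && $1 == "per_source" && $2 == "=" { print $3 }
    ' "$1"
}

# Usage:
# xinetd_per_source /etc/xinetd.conf
```

If the function prints nothing, no per_source line was found in the defaults section.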
systemd
- We have to distinguish between Checkmk 2.0 and 2.1 here.
- With 2.1, cmk-agent-ctl listens on 6556/tcp instead of a systemd socket, so the following applies to 2.0.
Edit /etc/systemd/system/check-mk-agent.socket and add this line to the Socket section:
MaxConnectionsPerSource=<the number of clusters that this node belongs to>
Reload the systemd manager configuration by issuing
systemctl daemon-reload
Verify your change by executing
systemctl show check-mk-agent.socket | grep MaxConnectionsPerSource
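The verification can also be turned into a pass/fail check against the number of clusters the node belongs to. This is a sketch only: socket_limit_sufficient is a hypothetical helper that reads systemctl show output in KEY=VALUE form from standard input, and the 12 in the usage line is an example value.

```shell
# Succeeds if the MaxConnectionsPerSource value found on stdin
# (in `systemctl show` KEY=VALUE format) is at least the required number.
# socket_limit_sufficient is a hypothetical helper name.
socket_limit_sufficient() {
    required=$1
    actual=$(sed -n 's/^MaxConnectionsPerSource=//p')
    [ -n "$actual" ] && [ "$actual" -ge "$required" ]
}

# Usage on a node that belongs to 12 clusters:
# systemctl show check-mk-agent.socket | socket_limit_sufficient 12 && echo "limit is sufficient"
```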
cmk-agent-ctl
- cmk-agent-ctl has its own per-source limit protection, which is not handled by systemd.
- Currently it is not configurable via the bakery, but you can control it with an environment variable.
- edit the systemd unit to set an environment variable
systemctl edit cmk-agent-ctl-daemon.service
- set Environment variable DEBUG_MAX_CONNECTIONS
# /etc/systemd/system/cmk-agent-ctl-daemon.service.d/override.conf
[Service]
Environment="DEBUG_MAX_CONNECTIONS=16"
- make systemd aware of this change
systemctl daemon-reload
- restart the cmk-agent-ctl-daemon unit to use the Environment variable DEBUG_MAX_CONNECTIONS
systemctl restart cmk-agent-ctl-daemon.service
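To confirm that the unit picked up the variable, the Environment property can be read back with systemctl show. The sketch below assumes the property is printed as a single Environment= line with space-separated assignments. debug_max_connections is a hypothetical helper name.

```shell
# Print the DEBUG_MAX_CONNECTIONS value from `systemctl show -p Environment`
# output on stdin, e.g. "Environment=FOO=bar DEBUG_MAX_CONNECTIONS=16".
# debug_max_connections is a hypothetical helper name.
debug_max_connections() {
    sed -n 's/^Environment=//p' | tr ' ' '\n' | sed -n 's/^DEBUG_MAX_CONNECTIONS=//p'
}

# Usage:
# systemctl show -p Environment cmk-agent-ctl-daemon.service | debug_max_connections
```

An empty result means the variable is not set for the unit.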
1 Comment
Lars Getwan
As far as I can remember, that was due to multiple fetchers trying to access the cached data, correct.