As you may know, NetEye Cloud is our multi-tenant SaaS solution for monitoring your infrastructure. Keeping every tenant aligned with the latest configurations and patches is crucial for us. We’ve managed to automate the deployment and alignment of the agents via Desired State Configuration (DSC) and Ansible, but we still had to check those agents’ configurations manually. Luckily, there’s a tool that helps tremendously in automating these configurations: Jinja2.
Prerequisites: none. The tool is already available in your NetEye Master/Satellite installation.
Use case: we need to collect metrics via telegraf on ServerIIS and ServerDB. Each server needs a different set of performance counters because each runs different applications. The telegraf agents will connect to the host on which we’re running this Ansible playbook.
For this use case, a simple inventory.yml like this is enough:
all:
  hosts:
    ServerIIS.lab.local:
      telegraf_role: IIS
    ServerDB.lab.local:
      telegraf_role: MSSQL
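The telegraf.j2 template below only adds role-specific counters when telegraf_role is exactly 'IIS' or 'MSSQL'. The comparison is case-sensitive, so a host with a typo such as `telegraf_role: iis` still gets a config, just without its application counters, and nothing fails. A cheap safeguard is to check the inventory before rendering. The sketch below is a hypothetical helper (`hosts_without_role_section` and `SUPPORTED_ROLES` are illustrative names); to stay standard-library only, it mirrors inventory.yml as a Python dict instead of parsing the YAML file.

```python
# Roles for which telegraf.j2 has an {% if %} branch (kept in sync by hand).
SUPPORTED_ROLES = {"IIS", "MSSQL"}

# Python mirror of inventory.yml. Loading the real file (for example with
# PyYAML) is left out so the sketch needs only the standard library.
INVENTORY = {
    "all": {
        "hosts": {
            "ServerIIS.lab.local": {"telegraf_role": "IIS"},
            "ServerDB.lab.local": {"telegraf_role": "MSSQL"},
        }
    }
}


def hosts_without_role_section(inventory: dict, supported: set[str]) -> list[str]:
    """Return hosts whose telegraf_role matches no branch in the template."""
    hosts = inventory["all"]["hosts"]
    return sorted(
        name
        for name, host_vars in hosts.items()
        # A missing role and a misspelled role are both reported.
        if host_vars.get("telegraf_role") not in supported
    )


if __name__ == "__main__":
    bad = hosts_without_role_section(INVENTORY, SUPPORTED_ROLES)
    print(bad or "every host has a role-specific section")
```

With the inventory above, the helper returns an empty list. A host declared as `telegraf_role: iis` would be reported.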
Next, create a templates folder and, inside it, a telegraf.j2 file.
NOTE: In Jinja, statements such as if blocks go between {% and %}, while expressions you want to print, such as variables, go between double curly brackets {{ and }}.
[agent]
interval = "5s"
round_interval = true
metric_buffer_limit = 1000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
debug = false
quiet = true
hostname = ""
[global_tags]
###############################################################################
# OUTPUTS #
###############################################################################
[[outputs.nats]]
servers = ["nats://{{ nats_server }}:4222"]
subject = "telegraf.metrics"
data_format = "influx"
secure = true
# ## TLS Config
tls_ca = "c:\\Program Files\\telegraf\\certs\\root-ca.crt"
tls_cert = "c:\\Program Files\\telegraf\\certs\\telegraf-agent.crt.pem"
tls_key = "c:\\Program Files\\telegraf\\certs\\private\\telegraf-agent.key.pem"
###############################################################################
# INPUTS #
###############################################################################
################# Monitor telegraf itself #################
[[inputs.win_perf_counters]]
UseWildcardsExpansion = true
LocalizeWildcardsExpansion = false
[[inputs.win_perf_counters.object]]
ObjectName = "Process"
Counters = ["% Processor Time","% Privileged Time","Handle Count","Thread Count","Page File Bytes","Working Set","Working Set - Private","IO Read Bytes/sec","IO Write Bytes/sec","ID Process"]
Instances = ["telegraf"]
Measurement = "agent"
############################################################################################
################# START WIN #################
[[inputs.win_perf_counters.object]]
ObjectName = "Memory"
Counters = ["Available KBytes","Commit Limit","Committed Bytes","Page Faults/sec","Page Reads/sec","Page Writes/sec","Pages Input/sec","Pages Output/sec","Pages/sec","Pool Nonpaged Bytes","Pool Paged Bytes","Standby Cache Reserve Bytes","System Cache Resident Bytes"]
Instances = ["------"]
Measurement = "Memory"
[[inputs.win_perf_counters.object]]
ObjectName = "Network Interface"
Counters = ["Bytes Received/sec","Bytes Sent/sec","Bytes Total/sec","Current Bandwidth","Offloaded Connections","Output Queue Length"]
Instances = ["*"]
Measurement = "Network_Interface"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "Paging File"
Counters = ["% Usage","% Usage Peak"]
Instances = ["*"]
Measurement = "Paging_File"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "PhysicalDisk"
Counters = ["Avg. Disk Bytes/Read","Avg. Disk Bytes/Transfer","Avg. Disk Bytes/Write","Avg. Disk Queue Length","Avg. Disk Write Queue Length","Avg. Disk sec/Read","Avg. Disk sec/Write","Disk Read Bytes/sec","Disk Reads/sec","Disk Write Bytes/sec","Disk Writes/sec"]
Instances = ["*"]
Measurement = "PhysicalDisk"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "Processor"
Counters = ["% Privileged Time","% Processor Time"]
Instances = ["*"]
Measurement = "Processor"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "System"
Counters = ["Context Switches/sec","Processes","Processor Queue Length","Threads"]
Instances = ["------"]
Measurement = "System"
################# END WIN #################
{% if telegraf_role == 'IIS' %}
################# START iis ###################
[[inputs.win_perf_counters.object]]
ObjectName = "HTTP Service Request Queues"
Counters = ["ArrivalRate","CacheHitRate","CurrentQueueSize","MaxQueueItemAge","RejectedRequests","RejectionRate"]
Instances = ["*"]
Measurement = "HTTP_Service_Request_Queues"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "HTTP Service Url Groups"
Counters = ["BytesReceivedRate","BytesSentRate","BytesTransferredRate","ConnectionAttempts","CurrentConnections","GetRequests","HeadRequests","MaxConnections"]
Instances = ["*"]
Measurement = "HTTP_Service_Url_Groups"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "HTTP Service"
Counters = ["CurrentUrisCached","TotalFlushedUris","TotalUrisCached","UriCacheFlushes","UriCacheHits","UriCacheMisses"]
Instances = ["------"]
Measurement = "HTTP_Service"
################# END iis ###################
{% endif %}
{% if telegraf_role == 'MSSQL' %}
################# START SQL #################
[[inputs.win_perf_counters.object]]
ObjectName = "SQLAgent:JobSteps"
Counters = ["Active steps","Total step retries"]
Instances = ["*"]
Measurement = "JobSteps"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "SQLAgent:Jobs"
Counters = ["Active jobs","Failed jobs","Jobs activated/minute"]
Instances = ["*"]
Measurement = "Jobs"
IncludeTotal = true
[[inputs.win_perf_counters.object]]
ObjectName = "SQLAgent:SystemJobs"
Counters = ["Active system jobs"]
Instances = ["*"]
Measurement = "SystemJobs"
IncludeTotal = true
################# END SQL #################
{% endif %}
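To see how the two delimiter types behave outside Ansible, here is a minimal sketch that renders a cut-down version of the template above with the jinja2 Python package (the engine Ansible’s template module uses). It assumes jinja2 is installed. The host name neteye.example.com is a placeholder, and `render` is a hypothetical helper, not part of any API.

```python
from jinja2 import Environment

# {{ ... }} prints an expression; {% ... %} holds a statement (an if block here).
SOURCE = (
    'servers = ["nats://{{ nats_server }}:4222"]\n'
    "{% if telegraf_role == 'IIS' %}"
    'ObjectName = "HTTP Service"\n'
    "{% endif %}"
)

template = Environment().from_string(SOURCE)


def render(role: str) -> str:
    """Render the snippet for one role; neteye.example.com is a placeholder."""
    return template.render(nats_server="neteye.example.com", telegraf_role=role)


if __name__ == "__main__":
    print(render("IIS"))    # includes the HTTP Service object
    print(render("MSSQL"))  # only the servers line
```

The `{{ nats_server }}` expression is replaced for every role. The `{% if %}` block only emits its content when the comparison is true, which is exactly how the IIS and SQL sections are switched on and off per host.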
As mentioned in the use case, the agents will connect to the host we are running the playbook from, so the playbook uses that host’s name as the NATS server. Let’s create a playbook.yml:
- name: Create telegraf configurations
  hosts: all
  connection: local
  vars:
    # The agents send their metrics to the host running this playbook
    nats_server: "{{ hostvars['localhost'].ansible_nodename }}"
  tasks:
    # Store the control node's facts under hostvars['localhost'] so nats_server resolves
    - name: Gather facts from localhost for later use
      setup:
      delegate_to: "localhost"
      delegate_facts: true

    - name: Create output folders
      file:
        path: "output/{{ inventory_hostname }}/"
        state: directory
      delegate_to: localhost

    - name: Generate conf locally
      template:
        src: templates/telegraf.j2
        dest: "output/{{ inventory_hostname }}/telegraf.conf"
      delegate_to: localhost
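To make the last two tasks less of a black box, here is a rough Python equivalent of what they do for each host: create output/<host>/ and render telegraf.j2 into it with that host’s variables. It is only a sketch and not how Ansible is implemented. Ansible adds its own variables, filters and settings. `render_configs` is a hypothetical name, and jinja2 is assumed to be installed. The sketch copies two Ansible-like behaviours: `trim_blocks=True` (the template module’s default) and failing on undefined variables via `StrictUndefined`.

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, StrictUndefined


def render_configs(inventory_hosts: dict, nats_server: str,
                   template_dir: str = "templates",
                   output_dir: str = "output") -> list[Path]:
    """Render telegraf.j2 once per host into output/<host>/telegraf.conf."""
    env = Environment(
        loader=FileSystemLoader(template_dir),
        trim_blocks=True,           # drop the newline after {% ... %} tags
        undefined=StrictUndefined,  # a missing variable raises instead of rendering ""
    )
    template = env.get_template("telegraf.j2")
    written = []
    for hostname, host_vars in inventory_hosts.items():
        # Same layout as the playbook's dest: output/<inventory_hostname>/telegraf.conf
        dest = Path(output_dir) / hostname / "telegraf.conf"
        dest.parent.mkdir(parents=True, exist_ok=True)  # like the file task
        context = {**host_vars, "nats_server": nats_server}
        dest.write_text(template.render(context), encoding="utf-8")
        written.append(dest)
    return written


if __name__ == "__main__":
    hosts = {
        "ServerIIS.lab.local": {"telegraf_role": "IIS"},
        "ServerDB.lab.local": {"telegraf_role": "MSSQL"},
    }
    # "neteye.example.com" is a placeholder for the playbook host's name.
    for path in render_configs(hosts, nats_server="neteye.example.com"):
        print(path)
```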
Run the playbook with:
ansible-playbook playbook.yml -i inventory.yml
You’ll now find one generated telegraf.conf per host in the newly created output folder, for example output/ServerIIS.lab.local/telegraf.conf.
With this simple playbook and template, we now have the telegraf configurations ready to install on the servers. It may seem like a lot of effort for only two servers, but the same template scales to any number of hosts: adding a server means adding one inventory entry, and changing a counter means editing one template and re-running the playbook. That makes it far easier to keep large environments up to date with the latest configurations! Furthermore, it’s possible to generate and install telegraf directly from the NetEye servers, but that’s a topic for another blog 😊.
Did you find this article interesting? Are you an “under the hood” kind of person? We’re really big on automation and we’re always looking for people in a similar vein to fill open roles here at Würth Phoenix.