I assume you have already configured the Splunk agent (Universal Forwarder) to collect all the needed metrics, so we can now create the first Splunk dashboard.

Here are the instructions for creating a dashboard: https://docs.splunk.com/Documentation/Splunk/7.1.2/SearchTutorial/Createnewdashboard

It can be done manually, but the whole definition is saved in XML format. When you create a dashboard, edit it, and then view its source, you will see the XML code. If you named the index correctly (perfmon), you should be able to use my XML file and see the results.
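For orientation, here is a minimal sketch of the overall shape such a dashboard source usually has in Simple XML. It is not the actual file from this post: the panel title "Processor" and the fixed host filter are placeholders chosen for illustration.

```xml
<form>
  <label>Citrix – Servers – Performance Details</label>
  <!-- Inputs (date picker, host picker) are placed inside the fieldset -->
  <fieldset submitButton="false">
    <!-- <input> elements go here -->
  </fieldset>
  <!-- Each row holds one or more panels; each panel holds a chart with its own search -->
  <row>
    <panel>
      <!-- Panel title is a placeholder, not taken from the original dashboard -->
      <title>Processor</title>
      <chart>
        <search>
          <query>index=perfmon host="CTX*" collection=Processor | timechart span=1m avg(Value) by counter</query>
          <earliest>-24h@h</earliest>
          <latest>now</latest>
        </search>
      </chart>
    </panel>
  </row>
</form>
```

A dashboard that contains inputs uses form as its root element; a dashboard without inputs can use dashboard instead.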

Dashboard: Citrix – Servers – Performance Details

Let's take a look at how the dashboard is built. At the top, we have two select boxes.

The first one is a date picker (with the default value Last 24 hours). The second is a host picker. I defined it to get all host names from the perfmon index.

<input type="dropdown" token="hostPicker">
  <label>Host</label>
  <fieldForLabel>host</fieldForLabel>
  <fieldForValue>host</fieldForValue>
  <search>
    <query>index=perfmon host="CTX*" | table host | dedup host | sort host</query>
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </search>
</input>
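The date picker itself is not listed above. A minimal sketch of how such a time input could look in Simple XML, assuming a token named timePicker (a hypothetical name, not taken from the original dashboard):

```xml
<!-- Time range input; the token name "timePicker" is an assumption -->
<input type="time" token="timePicker">
  <label>Time range</label>
  <default>
    <!-- Default value: Last 24 hours -->
    <earliest>-24h@h</earliest>
    <latest>now</latest>
  </default>
</input>
```

With a named token like this, a panel search would refer to the selected range as $timePicker.earliest$ and $timePicker.latest$.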

 

I have also added a filter: host="CTX*". This means the query selects all hosts whose names begin with "CTX".

Here are the remaining queries used to build the dashboard. In every query I explicitly set the time span to 1 minute.

index=perfmon host=$hostPicker$ collection=Processor | timechart span=1m avg(Value) by counter

I did this to get better data resolution. By default, Splunk chooses the span based on the selected time range. For a 60-minute range, data is aggregated in 1-minute windows; for a 4-hour range, in 5-minute windows. But when you want to see 24 hours of data, it is aggregated in 30-minute windows. Sometimes that is not enough, especially when an event lasts only 5 or 10 minutes.
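If a fixed 1-minute span turns out to be too detailed for longer ranges, one possible variation is to make the span selectable. This is only a sketch and not part of the original dashboard; the token name spanPicker and the list of choices are assumptions.

```xml
<!-- Hypothetical span selector; token name and choices are illustrative -->
<input type="dropdown" token="spanPicker">
  <label>Span</label>
  <choice value="1m">1 minute</choice>
  <choice value="5m">5 minutes</choice>
  <choice value="30m">30 minutes</choice>
  <default>1m</default>
</input>
```

A query would then use the token in place of the fixed value, for example timechart span=$spanPicker$ avg(Value) by counter.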

SPL queries ($hostPicker$ is the name of the host you would like to visualize):

index=perfmon host=$hostPicker$ collection=Processor | timechart span=1m avg(Value) by counter
index=perfmon host=$hostPicker$ collection=Memory counter="Available Mbytes" | timechart span=1m avg(Value) by counter
index=perfmon host=$hostPicker$ collection=Memory counter="% Committed Bytes In Use" | timechart span=1m avg(Value) by counter
index=perfmon host=$hostPicker$ collection="Paging File" counter="% Usage" | timechart span=1m avg(Value) by counter
index=perfmon host=$hostPicker$ collection=System counter="Context Switches/sec" | timechart span=1m avg(Value) by counter
index=perfmon host=$hostPicker$ counter="Bytes Sent/sec" | eval kB=round(Value/1024,2) | timechart span=1m avg(kB) by instance
index=perfmon host=$hostPicker$ counter="Bytes Received/sec" | eval kB=round(Value/1024,2) | timechart span=1m avg(kB) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk_latency counter="Avg. Disk sec/Read" NOT "instance=_Total" | timechart span=1m avg(Value) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk_latency counter="Avg. Disk sec/Write" NOT "instance=_Total" | timechart span=1m avg(Value) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk counter="Disk Reads/sec" NOT "instance=_Total" | timechart span=1m avg(Value) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk counter="Disk Writes/sec" NOT "instance=_Total" | timechart span=1m avg(Value) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk counter="Avg. Disk Bytes/Read" NOT "instance=_Total" | eval kB=round(Value/1024,2) | timechart span=1m avg(kB) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk counter="Avg. Disk Bytes/Write" NOT "instance=_Total" | eval kB=round(Value/1024,2) | timechart span=1m avg(kB) by instance
index=perfmon host=$hostPicker$ collection=LogicalDisk counter="Current Disk Queue Length" NOT "instance=_Total" | timechart span=1m avg(Value) by instance
index=perfmon host=$hostPicker$ sourcetype="Perfmon:LogicalDisk_space" counter="% Free Space" NOT "instance=_Total" | timechart span=5m avg(Value) by instance
index=perfmon host=$hostPicker$ sourcetype="WinEventLog:Application" "EventCode=1501" "SourceName=ING_Citrix" | rex "Message=\"(?<Message>.+)\", transformed=" | spath input=Message | eval wc_size=round(vDisk_cache_file_size/1000,2) | timechart span=15m avg(wc_size) as "Write Cache file size"
index=perfmon host=$hostPicker$ sourcetype="WinEventLog:Application" "EventCode=1501" "SourceName=ING_Citrix" | rex "Message=\"(?<Message>.+)\", transformed=" | spath input=Message | eval wc_file_used_perc=100-vDisk_cache_drive_free_size | eval threshold=100 | timechart span=15m avg(threshold) as limit, avg(wc_file_used_perc) as "% File used"
index=perfmon host=$hostPicker$ sourcetype="WinEventLog:Application" "EventCode=1501" "SourceName=ING_Citrix" | rex "Message=\"(?<Message>.+)\", transformed=" | spath input=Message | timechart span=15m avg(vDisk_RAM_cache_usage) as "RAM usage"
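To show how queries like these end up in the dashboard source, here is a hedged sketch of one row with two panels. The panel titles, the chart type option, and the timePicker token name are assumptions for illustration; only the queries and the hostPicker token come from this post. The second panel wraps its query in CDATA because the named capture group in rex contains angle brackets, which would otherwise have to be escaped in XML.

```xml
<row>
  <panel>
    <!-- Title is a placeholder -->
    <title>Processor</title>
    <chart>
      <search>
        <query>index=perfmon host=$hostPicker$ collection=Processor | timechart span=1m avg(Value) by counter</query>
        <!-- Assumes a time input with token="timePicker" -->
        <earliest>$timePicker.earliest$</earliest>
        <latest>$timePicker.latest$</latest>
      </search>
      <option name="charting.chart">line</option>
    </chart>
  </panel>
  <panel>
    <!-- Title is a placeholder -->
    <title>PVS RAM cache usage</title>
    <chart>
      <search>
        <!-- CDATA keeps the angle brackets of the capture group intact -->
        <query><![CDATA[index=perfmon host=$hostPicker$ sourcetype="WinEventLog:Application" "EventCode=1501" "SourceName=ING_Citrix" | rex "Message=\"(?<Message>.+)\", transformed=" | spath input=Message | timechart span=15m avg(vDisk_RAM_cache_usage) as "RAM usage"]]></query>
        <earliest>$timePicker.earliest$</earliest>
        <latest>$timePicker.latest$</latest>
      </search>
      <option name="charting.chart">line</option>
    </chart>
  </panel>
</row>
```

Tokens such as $hostPicker$ are still substituted inside CDATA, so the host selection applies to both panels.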

And here are the results (from a Citrix/Terminal Server):

Basic information about Processor, Memory and System:

Network interfaces:

Disks:

Disks again:

And the last charts – PVS Write Cache.

As I mentioned previously, all of these charts can be used for generic Windows servers, except the last three. I will describe how to collect that data from a PVS Target Device in a separate post.