gmond.conf(5) Ganglia Monitoring System gmond.conf(5)

NAME
gmond.conf - configuration file for ganglia monitoring daemon (gmond)

DESCRIPTION
The gmond.conf file is used to configure the ganglia monitoring daemon
(gmond) which is part of the Ganglia Distributed Monitoring System.

All sections and attributes are case-insensitive. For example, name or
NAME or Name or NaMe are all equivalent.

Some sections can be included in the configuration file multiple times
and some sections are singular. For example, you can have only one
cluster section to define the attributes of the cluster being
monitored; however, you can have multiple udp_recv_channel sections to
allow gmond to receive messages on multiple UDP channels.

cluster
There should only be one cluster section defined. This section
controls how gmond reports the attributes of the cluster that it is
part of.

The cluster section has four attributes: name, owner, latlong and url.

For example,

cluster {
  name = "Millennium Cluster"
  owner = "UC Berkeley CS Dept."
  latlong = "N37.37 W122.23"
  url = "http://www.millennium.berkeley.edu/"
}

The name attribute specifies the name of the cluster of machines. The
owner attribute specifies the administrators of the cluster. The
name/owner pair should be unique among all clusters in the world.

The latlong attribute gives the latitude and longitude (GPS
coordinates) of this cluster on earth. It is specified in decimal
degrees with two decimal places per axis, which is accurate to about
1 mile.

The url attribute gives a URL for more information on the cluster. It
is intended to give purpose, owner, administration, and account details
for this cluster.

These directives directly control the XML output of gmond. For
example, the cluster configuration example above would translate into
the following XML.

<CLUSTER NAME="Millennium Cluster" OWNER="UC Berkeley CS Dept."
         LATLONG="N37.37 W122.23" URL="http://www.millennium.berkeley.edu/">
...
</CLUSTER>

host
The host section provides information about the host running this
instance of gmond. Currently only the location string attribute is
supported. Example:

host {
  location = "1,2,3"
}

The numbers represent Rack, Rank and Plane respectively.

globals
The globals section controls general characteristics of gmond such as
whether it should daemonize, what user it should run as, whether it
should send/receive data and such. The globals section has the
following attributes: daemonize, setuid, user, debug_level, mute, deaf,
allow_extra_data, host_dmax, host_tmax, cleanup_threshold, gexec,
send_metadata_interval, override_hostname, override_ip and module_dir.

For example,

globals {
  daemonize = true
  setuid = true
  user = ganglia
  host_dmax = 3600
  host_tmax = 40
}

The daemonize attribute is a boolean. When true, gmond will daemonize.
When false, gmond will run in the foreground.

The setuid attribute is a boolean. When true, gmond will set its
effective UID to the uid of the user specified by the user attribute.
When false, gmond will not change its effective user.

The debug_level is an integer value. When set to zero (0), gmond will
run normally. A debug_level greater than zero will result in gmond
running in the foreground and outputting debugging information. The
higher the debug_level, the more verbose the output.

The mute attribute is a boolean. When true, gmond will not send data
regardless of any other configuration directives.

The deaf attribute is a boolean. When true, gmond will not receive
data regardless of any other configuration directives.
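
As a sketch of how mute and deaf combine, a node that reports its own
metrics but never listens for data from other nodes might use a globals
section like the one below. This is an illustrative arrangement, not a
recommended default; the first three values are copied from the example
above.

globals {
  daemonize = true
  setuid = true
  user = ganglia
  mute = no    /* this node still sends its own metrics */
  deaf = yes   /* but it does not receive data from other nodes */
}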

The allow_extra_data attribute is a boolean. When false, gmond will
not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This
might be useful if you are using your own frontend to the metric data
and would like to save some bandwidth.

The host_dmax value is an integer with units in seconds. When set to
zero (0), gmond will never delete a host from its list even when a
remote host has stopped reporting. If host_dmax is set to a positive
number then gmond will flush a host after it has not heard from it for
host_dmax seconds. By the way, dmax means "delete max".
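
For instance, to have gmond forget a host one day after its last
report, host_dmax could be set as below. The value 86400 (24 hours
expressed in seconds) is an arbitrary illustration, not a default.

globals {
  host_dmax = 86400  /* flush a host after 24 hours of silence */
}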

The host_tmax value is an integer with units in seconds. This value
represents the maximum amount of time that gmond should wait between
updates from a host. As messages may get lost in the network, gmond
will consider the host as being down if it has not received any
messages from it after 4 times this value. For example, if host_tmax is
set to 20, the host will appear as down after 80 seconds with no
messages from it. By the way, tmax means "timeout max".

The cleanup_threshold is the minimum amount of time before gmond will
clean up any hosts or metrics whose data has expired (that is, where
tn > dmax).

The gexec boolean allows you to specify whether gmond will announce the
host's availability to run gexec jobs. Note: this requires that gexecd
is running on the host and the proper keys have been installed.

The send_metadata_interval establishes an interval in which gmond will
send or resend the metadata packets that describe each enabled metric.
This directive is set to 0 by default, which means that gmond will only
send the metadata packets at startup and upon request from other gmond
nodes running remotely. If a new machine running gmond is added to a
cluster, it needs to announce itself and inform all other nodes of the
metrics that it currently supports. In multicast mode, this isn't a
problem because any node can request the metadata of all other nodes in
the cluster. However, in unicast mode, a resend interval must be
established. The interval value is the minimum number of seconds
between resends.
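
A minimal unicast sketch of this, assuming a hypothetical collector at
192.168.3.4 listening on port 8649 and an arbitrary resend interval of
30 seconds:

globals {
  send_metadata_interval = 30  /* at least 30 seconds between resends */
}
udp_send_channel {
  host = 192.168.3.4  /* hypothetical collector address */
  port = 8649
}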

The override_hostname and override_ip parameters let you specify an
arbitrary hostname and/or IP address to use when identifying metrics
coming from this host. The hostname can be specified without the IP.
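
As an illustration only, the following sketch reports metrics under a
fixed identity. Both values are made-up placeholders, not defaults.

globals {
  override_hostname = "web01.example.com"  /* hypothetical hostname */
  override_ip = "192.0.2.10"               /* hypothetical address */
}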

The module_dir is an optional parameter indicating the directory where
the DSO modules are located. If absent, the value used is the one set
at configure time with the --with-moduledir option, which defaults, if
omitted, to a subdirectory named "ganglia" in the directory where
libganglia is installed.

For example, on a 32-bit Intel-compatible Linux host that is usually:

/usr/lib/ganglia

udp_send_channel
You can define as many udp_send_channel sections as you like within the
limitations of memory and file descriptors. If gmond is configured as
mute these sections will be ignored.

The udp_send_channel has a total of seven attributes: mcast_join,
mcast_if, host, port, ttl, bind and bind_hostname. bind and
bind_hostname are mutually exclusive.

For example, a 2.5.x gmond would send on the following single channel
by default:

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
}

The mcast_join and mcast_if attributes are optional. When specified,
gmond will create the UDP socket, join the mcast_join multicast group
and send data out the interface specified by mcast_if.

You can use the bind attribute to bind to a particular local address,
which is then used as the source for the multicast packets sent.
Alternatively, set bind_hostname = yes to let gmond resolve the default
hostname and use that.

If only a host and port are specified then gmond will send unicast UDP
messages to the host specified.

You could specify multiple unicast hosts for redundancy, as gmond will
send UDP messages to all UDP channels.

Be careful though not to mix multicast and unicast attributes in the
same udp_send_channel definition.

For example,

udp_send_channel {
  host = host.foo.com
  port = 2389
}
udp_send_channel {
  host = 192.168.3.4
  port = 2344
}

would configure gmond to send messages to two hosts. The host
specification can be an IPv4/IPv6 address or a resolvable hostname.

The ttl attribute lets you modify the Time-To-Live (TTL) of outgoing
messages (unicast or multicast).
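
Putting the multicast attributes together, a send channel might look
like the sketch below. The interface name eth0 and the ttl value are
assumptions chosen for illustration; a TTL of 1 keeps multicast packets
on the local subnet.

udp_send_channel {
  mcast_join = 239.2.11.71
  mcast_if = eth0  /* hypothetical interface name */
  port = 8649
  ttl = 1          /* assumed value: do not cross routers */
}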

udp_recv_channel
You can specify as many udp_recv_channel sections as you like within
the limits of memory and file descriptors. If gmond is configured deaf
these sections will be ignored.

The udp_recv_channel section has the following attributes: mcast_join,
bind, port, mcast_if, family, retry_bind and buffer. The
udp_recv_channel can also have an acl definition (see ACCESS CONTROL
LISTS below).

For example, the 2.5.x gmond ran with a single UDP receive channel:

udp_recv_channel {
  mcast_join = 239.2.11.71
  bind = 239.2.11.71
  port = 8649
}

The mcast_join and mcast_if attributes should only be used if you want
this UDP channel to receive multicast packets from the multicast group
mcast_join on interface mcast_if. If you do not specify multicast
attributes then gmond will simply create a UDP server on the specified
port.

You can use the bind attribute to bind to a particular local address.

The family attribute is set to inet4 by default. If you want the port
to listen on inet6, you need to specify that in the family attribute.
Ganglia will not allow IPV6=>IPV4 mapping (for portability and security
reasons). If you want to listen on both inet4 and inet6 for a
particular port, explicitly state it with the following:

udp_recv_channel {
  port = 8666
  family = inet4
}
udp_recv_channel {
  port = 8666
  family = inet6
}

If you specify a bind address, the family of that address takes
precedence. If your IPv6 stack doesn't support IPV6_V6ONLY, a warning
will be issued but gmond will continue working (this should rarely
happen).

Multicast Note: for multicast, specifying a bind address with the same
value used for mcast_join will prevent unicast UDP messages to the same
port from being processed.
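
Following the multicast note, a receive channel that should accept both
multicast and unicast messages on the same port could simply omit bind.
A sketch derived from the 2.5.x example above:

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  /* no bind here, so unicast messages to port 8649 are also processed */
}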

The sFlow protocol (see http://www.sflow.org) can be used to collect a
standard set of performance metrics from servers. For servers that
don't include embedded sFlow agents, an open source sFlow agent is
available on SourceForge (see http://host-sflow.sourceforge.net).

To configure gmond to receive sFlow datagrams, simply add a
udp_recv_channel with the port set to 6343 (the IANA registered port
for sFlow):

udp_recv_channel {
  port = 6343
}

Note: sFlow is a unicast protocol, so don't include mcast_join.
Note: To use some other port for sFlow, set it here and then specify
the port in an sflow section (see below).

gmond will fail to run if it can't bind to all defined
udp_recv_channels. Sometimes, for example on machines configured by
DHCP, the gmond daemon starts before a network address is assigned to
the interface. Consequently, the bind fails and the gmond daemon does
not run. To assist in this situation, the boolean parameter retry_bind
can be set to true. The daemon will then not abort on failure; instead
it will enter a loop and repeat the bind attempt every 60 seconds:

udp_recv_channel {
  port = 6343
  retry_bind = true
}

If you have a large system with lots of metrics, you might experience
UDP drops. This happens when gmond is not able to process the UDP
packets from the network fast enough. In this case you might consider
changing your setup into a more distributed one using aggregator gmond
hosts. Alternatively you can choose to create a bigger receive buffer:

udp_recv_channel {
  port = 6343
  buffer = 10485760
}

The buffer attribute is specified in bytes; for example, 10485760 will
allow 10 MB of UDP data to be buffered in memory.

Note: increasing the buffer size will increase memory usage by gmond.

tcp_accept_channel
You can specify as many tcp_accept_channel sections as you like within
the limitations of memory and file descriptors. If gmond is configured
to be mute, then these sections are ignored.

The tcp_accept_channel has the following attributes: bind, port,
interface, family and timeout. A tcp_accept_channel may also have an
acl section specified (see ACCESS CONTROL LISTS below).

For example, a 2.5.x gmond would accept connections on a single TCP
channel.

tcp_accept_channel {
  port = 8649
}

The bind address is optional and allows you to specify which local
address gmond will bind to for this channel.

The port is an integer that specifies which port to answer requests for
data.

The family attribute is set to inet4 by default. If you want the port
to listen on inet6, you need to specify that in the family attribute.
Ganglia will not allow IPV6=>IPV4 mapping (for portability and security
reasons). If you want to listen on both inet4 and inet6 for a
particular port, explicitly state it with the following:

tcp_accept_channel {
  port = 8666
  family = inet4
}
tcp_accept_channel {
  port = 8666
  family = inet6
}

If you specify a bind address, the family of that address takes
precedence. If your IPv6 stack doesn't support IPV6_V6ONLY, a warning
will be issued but gmond will continue working (this should rarely
happen).

The timeout attribute allows you to specify how many microseconds to
block before closing a connection to a client. The default is set to
-1 (blocking IO) and will never abort a connection regardless of how
slow the client is in fetching the report data.
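
Because the unit is microseconds, a one second limit is written as
1000000. The sketch below uses that value purely as an illustration;
it is not a recommended setting.

tcp_accept_channel {
  port = 8649
  timeout = 1000000  /* 1,000,000 microseconds = 1 second */
}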

The interface attribute is not implemented at this time (use bind
instead).

collection_group
You can specify as many collection_group sections as you like within
the limitations of memory. A collection_group has the following
attributes: collect_once, collect_every and time_threshold. A
collection_group must also contain one or more metric sections.

The metric section has the following attributes: one of name or
name_match (name_match is only permitted if PCRE support is compiled
in), value_threshold and title. For a list of available metric names,
run the following command:

% gmond -m

Here is an example of a collection group for a static metric:

collection_group {
  collect_once = yes
  time_threshold = 1800
  metric {
    name = "cpu_num"
    title = "Number of CPUs"
  }
}

This collection_group entry would cause gmond to collect the cpu_num
metric once at startup (since the number of CPUs will not change
between reboots). The metric cpu_num would be sent every 1/2 hour
(1800 seconds). The default value for the time_threshold is 3600
seconds if no time_threshold is specified.

The time_threshold is the maximum amount of time that can pass before
gmond sends all metrics specified in the collection_group to all
configured udp_send_channels. A metric may be sent before this
time_threshold is met if during collection the value surpasses the
value_threshold (explained below).

Here is an example of a collection group for a volatile metric:

collection_group {
  collect_every = 60
  time_threshold = 300
  metric {
    name = "cpu_user"
    value_threshold = 5.0
    title = "CPU User"
  }
  metric {
    name = "cpu_idle"
    value_threshold = 10.0
    title = "CPU Idle"
  }
}

This collection group would collect the cpu_user and cpu_idle metrics
every 60 seconds (specified in collect_every). If cpu_user varies by
5.0% or cpu_idle varies by 10.0%, then the entire collection_group is
sent. If no value_threshold is triggered within time_threshold seconds
(in this case 300), the entire collection_group is sent anyway.

Each time the metric value is collected, the new value is compared with
the previously collected value. If the difference between the last
value and the current value is greater than the value_threshold, the
entire collection group is sent to the udp_send_channels defined.

It's important to note that all metrics in a collection group are sent
even when only a single value_threshold is surpassed.

In addition, a user-friendly title can be substituted for the metric
name by including a title within the metric section.

By using the name_match parameter instead of name, it is possible to
use a single definition to configure multiple metrics that match a
regular expression. The Perl-compatible regular expression (PCRE)
syntax is used. This approach is particularly useful for a series of
metrics that may vary in number between reboots (e.g. metric names that
are generated for each individual NIC or CPU core).

Here is an example of using the name_match directive to enable the
multicpu metrics:

metric {
  name_match = "multicpu_([a-z]+)([0-9]+)"
  value_threshold = 1.0
  title = "CPU-\\2 \\1"
}

Note that in the example above, there are two capture groups: the
alphabetical group matches the variations of the metric name (e.g.
idle, system) while the numeric group matches the CPU core number. The
second thing to note is the use of substitutions within the argument to
title.

If both name and name_match are specified, then name is ignored.
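
Since a metric section must live inside a collection_group, the
name_match example above would in practice be wrapped as in the
following sketch. The collect_every and time_threshold values are
arbitrary choices for illustration, and the sketch assumes gmond was
built with PCRE support.

collection_group {
  collect_every = 10    /* assumed polling interval in seconds */
  time_threshold = 50   /* assumed maximum time between sends */
  metric {
    name_match = "multicpu_([a-z]+)([0-9]+)"
    value_threshold = 1.0
    title = "CPU-\\2 \\1"
  }
}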

Modules
A modules section contains the parameters that are necessary to load a
metric module. A metric module is a dynamically loadable module that
extends the available metrics that gmond is able to collect. Each
modules section contains at least one module section. Within a module
section are the directives name, language, enabled, path and params.

The module name is the name of the module as determined by the module
structure if the module was developed in C/C++. Alternatively, the
name can be the name of the source file if the module has been
implemented in an interpreted language such as Python.

A language designation must be specified as a string value for each
module. The language directive must correspond to the source code
language in which the module was implemented (e.g. language =
"python"). If a language directive does not exist for the module, the
assumed language will be "C/C++".

The enabled directive allows a metric module to be easily enabled or
disabled through the configuration file. If the enabled directive is
not included in the module configuration, the enabled state will
default to "yes". One thing to note is that if a module has been
disabled yet the metric which that module implements is still listed as
part of a collection group, gmond will produce a warning message.
However, gmond will continue to function normally by simply ignoring
the metric.

The path is the path from which gmond is expected to load the module
(C/C++ compiled dynamically loadable module only).

The params directive can be used to pass a single string parameter
directly to the module initialization function (C/C++ module only).
Multiple parameters can be passed to the module's initialization
function by including one or more param sections. Each param section
must be named and contain a value directive.

Once a module has been loaded, the additional metrics can be discovered
by invoking gmond -m.

modules {
  module {
    name = "example_module"
    language = "C/C++"
    enabled = yes
    path = "modexample.so"
    params = "An extra raw parameter"
    param RandomMax {
      value = 75
    }
    param ConstantValue {
      value = 25
    }
  }
}
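
To switch the same module off without removing its configuration, the
enabled directive can be set to no, as in this sketch. It reuses the
example_module name and path from above; any collection group that
still lists the module's metrics would then trigger the warning
described earlier.

modules {
  module {
    name = "example_module"
    path = "modexample.so"
    enabled = no  /* module is disabled; its metrics are ignored */
  }
}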

sFlow
The sflow group is optional and has the following optional attributes:
udp_port, accept_vm_metrics, accept_http_metrics,
accept_memcache_metrics, accept_jvm_metrics,
multiple_http_instances, multiple_memcache_instances and
multiple_jvm_instances. By default, a udp_recv_channel on port 6343
(the IANA registered port for sFlow) is all that is required to accept
and process sFlow datagrams. Receiving sFlow on some other port
requires both a udp_recv_channel for the other port and a udp_port
setting here. For example:

udp_recv_channel {
  port = 7343
}

sflow {
  udp_port = 7343
}

An sFlow agent running on a hypervisor may also be sending metrics for
its local virtual machines. By default these metrics are ignored, but
the accept_vm_metrics flag can be used to accept those metrics too,
and prefix them with an identifier for each virtual machine.

sflow {
  accept_vm_metrics = yes
}

The sFlow feed may also contain metrics sent from HTTP or memcached
servers, or from Java VMs. Extra options can be used to ignore or
accept these metrics, and to indicate that there may be multiple
instances per host. For example:

sflow {
  accept_http_metrics = yes
  multiple_http_instances = yes
}

will allow the HTTP metrics, and also mark them with a distinguishing
identifier so that each instance can be trended separately. (If
multiple instances are reporting and this flag is not set, the results
are likely to be garbled.)
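
The memcached and Java VM options listed at the start of this section
presumably work the same way as the HTTP ones. Under that assumption, a
host running several memcached servers and several JVMs might use:

sflow {
  accept_memcache_metrics = yes
  multiple_memcache_instances = yes
  accept_jvm_metrics = yes
  multiple_jvm_instances = yes
}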

Include
This directive allows the user to include additional configuration
files rather than having to add all gmond configuration directives to
the gmond.conf file. The following example includes any file with the
extension of .conf contained in the directory conf.d as if the contents
of the included configuration files were part of the original
gmond.conf file. This allows the user to modularize their configuration
file. One usage example might be to load individual metric modules by
including module specific .conf files.

include ('/etc/ganglia/conf.d/*.conf')

ACCESS CONTROL LISTS
The udp_recv_channel and tcp_accept_channel directives can contain an
Access Control List (ACL). This ACL allows you to specify exactly
which hosts gmond processes data from.

An example acl entry looks like this:

acl {
  default = "deny"
  access {
    ip = 192.168.0.4
    mask = 32
    action = "allow"
  }
}

This ACL will by default reject all traffic that is not specifically
from host 192.168.0.4. To represent a single host, the mask size is 32
for an IPv4 address and 128 for an IPv6 address.

Here is another example:

acl {
  default = "allow"
  access {
    ip = 192.168.0.0
    mask = 24
    action = "deny"
  }
  access {
    ip = ::ff:1.2.3.0
    mask = 120
    action = "deny"
  }
}

This ACL will by default allow all traffic unless it comes from the two
subnets specified with action = "deny".
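
Since an acl lives inside a channel definition, a complete channel
might look like the sketch below. Restricting the TCP channel to the
local host (127.0.0.1) is an assumption made for illustration, not a
default behavior.

tcp_accept_channel {
  port = 8649
  acl {
    default = "deny"
    access {
      ip = 127.0.0.1  /* assumed policy: only the local host may connect */
      mask = 32
      action = "allow"
    }
  }
}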

The default behavior for a 2.5.x gmond would be specified as:

udp_recv_channel {
  mcast_join = 239.2.11.71
  bind = 239.2.11.71
  port = 8649
}
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
}
tcp_accept_channel {
  port = 8649
}

To see the complete default configuration for gmond simply run:

% gmond -t

gmond will print out its default behavior as a configuration file and
then exit. Capturing this output to a file can serve as a useful
starting point for creating your own custom configuration.

% gmond -t > custom.conf

Edit custom.conf to taste and then run:

% gmond -c ./custom.conf

SEE ALSO
gmond(1).

The ganglia web site is at http://ganglia.info/.

Copyright (c) 2005 The University of California, Berkeley

ganglia/3.7.2 2020-02-10 gmond.conf(5)