This technical note explores the scalability of centralized logging with Performance Co-Pilot (PCP).
PCP supports multiple deployment architectures, chosen according to the scale of the deployment. The most common architectures are described below.
Fully Distributed Setup¶
One way to set up decentralized logging is to run pmlogger(1) on each monitored host, retrieving metrics from a local pmcd(1) instance. A local pmproxy(1) daemon then imports the performance metrics into a central Redis database.
In cases where resource usage on the monitored hosts must be constrained, another deployment option is a pmlogger farm. In this setup, a single logger host runs multiple pmlogger(1) processes, each configured to retrieve performance metrics from a different remote pmcd(1) host. The centralized logger host also runs the pmproxy(1) daemon, which discovers the resulting PCP archive logs and loads the metric data into a Redis database.
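A pmlogger farm is driven by pmlogger control files on the centralized logger host, typically under /etc/pcp/pmlogger/control.d, with one line per remote pmcd. A minimal sketch, assuming hypothetical hostnames and using the conventional control-file fields (host, primary flag, socks flag, archive directory, pmlogger arguments):

```
# /etc/pcp/pmlogger/control.d/remotes -- illustrative farm configuration
# host              P?  S?  archive directory                   pmlogger args
web01.example.com   n   n   PCP_ARCHIVE_DIR/web01.example.com   -r -T24h10m -c config.default
web02.example.com   n   n   PCP_ARCHIVE_DIR/web02.example.com   -r -T24h10m -c config.default
db01.example.com    n   n   PCP_ARCHIVE_DIR/db01.example.com    -r -T24h10m -c config.default
```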
Federated pmlogger Farm¶
For large-scale deployments, we advise deploying multiple pmlogger(1) farms in a federated fashion, for example one pmlogger farm per rack or data center. Each pmlogger farm loads the metrics into a central Redis database.
Redis Database Deployment¶
The Redis database can run in a clustered fashion, where data is sharded across multiple hosts (see Redis Cluster for more details). Another viable option is to deploy a Redis cluster in the cloud, or to utilize a managed Redis cluster from a cloud vendor.
For PCP versions before 5.3.0, pmlogger farm is the only supported and tested deployment architecture. Other deployment architectures might work, but are not officially supported.
Remote system size¶
The number of CPUs, disks, network interfaces and other hardware resources affects the amount of data collected by each pmlogger on the centralized logging host. In the measurements below, every remote system has 64 CPUs, one disk and one network interface. In these tests, the remote pmcd hosts are all instances of a container running pmcd(1), exposing only the pmcd TCP port.
The number and types of logged metrics play an important role.
In particular, the per-process proc.* metrics require a large amount of disk space: with the standard pcp-zeroconf setup and a 10s logging interval, 11 MB without proc metrics versus 155 MB with proc metrics, i.e. more than ten times as much.
Additionally, the number of instances for each metric, for example the number of CPUs, block devices and network interfaces, also impacts the required storage capacity.
The logging interval (how often metrics are logged) dramatically affects the storage requirements.
The expected daily PCP archive file sizes are written to the
pmlogger.log file for each pmlogger instance.
These values are uncompressed estimates (see the -r option in pmlogger(1)).
Since PCP archives compress very well (approximately 10:1), the actual long-term disk space requirements can be estimated for a particular site.
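As a back-of-the-envelope illustration of this sizing, the following sketch (per-host figure taken from the pcp-zeroconf measurement above, with the compression ratio assumed to be 10:1) estimates compressed archive growth for a farm:

```python
# Illustrative sizing estimate. The defaults below are assumptions for this
# sketch: ~11 MB/day of uncompressed archives per host (pcp-zeroconf, 10s
# interval, no proc metrics) and roughly 10:1 archive compression.

def daily_storage_mb(hosts, mb_per_host_per_day=11.0, compression_ratio=10.0):
    """Estimated compressed archive growth per day, in MB, for a pmlogger farm."""
    return hosts * mb_per_host_per_day / compression_ratio

# 100 monitored hosts, 14 days of archive retention:
per_day = daily_storage_mb(100)   # 110.0 MB/day compressed
retained = per_day * 14           # 1540.0 MB held for the retention window
print(per_day, retained)
```

Swapping in your own measured per-host figure (from pmlogger.log) makes the estimate site-specific.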
sysctl and rlimit settings¶
When archive discovery is enabled, pmproxy requires 4 file descriptors for every pmlogger that it is monitoring/log-tailing, plus additional file descriptors for the daemon logs and pmproxy client sockets, if any. Each pmlogger process uses about 20 file descriptors for the remote pmcd socket, archive files, daemon logs and others. In total, this can exceed the default soft limit of 1024 on a system running around 200 pmlogger processes. The pmproxy daemon in pcp-5.3.0 and later automatically increases the soft limit to the hard limit. On earlier versions of PCP, tuning is required when deploying a large number of pmlogger processes.
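On those earlier PCP versions, one common tuning approach is a systemd drop-in raising the file descriptor limit for pmproxy; a sketch (the 8192 value is illustrative, sized from the roughly 4 descriptors per monitored pmlogger noted above):

```
# systemctl edit pmproxy -- contents of the resulting drop-in
[Service]
LimitNOFILE=8192
```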
The pmlogger(1) daemon stores metrics from local and remote pmcd(1) instances in PCP archives on the logger host.
To control the logging interval, update the corresponding pmlogger control file in /etc/pcp/pmlogger/control.d and add -t X to the arguments, where X is the logging interval in seconds.
To configure which metrics should be logged, run pmlogconf(1) on the relevant pmlogger configuration file.
To specify retention settings, i.e. when to purge old PCP archives, update the /etc/sysconfig/pmlogger_timers file and set PMLOGGER_DAILY_PARAMS="-E -k X", where X is the number of days to keep PCP archives.
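For example, to keep two weeks of archives (the value 14 is illustrative):

```
# /etc/sysconfig/pmlogger_timers (excerpt)
PMLOGGER_DAILY_PARAMS="-E -k 14"
```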
The pmproxy(1) daemon sends logged metrics from pmlogger(1) to a Redis database.
To update the logging interval or the logged metrics, see the section above.
Two options are available to specify the retention settings in the pmproxy configuration file:
stream.expire: specifies the duration after which stale metrics are removed, i.e. metrics which were not updated in the specified amount of time (in seconds)
stream.maxlen: specifies the maximum number of metric values for one metric per host. This setting should be the retention time divided by the logging interval, for example 20160 for 14 days of retention and a 60s logging interval (60*60*24*14/60)
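Putting the two settings together for 14 days of retention at a 60s logging interval, an excerpt of the pmproxy configuration might look like this sketch (the [pmseries] section name is an assumption about where these keys live):

```
# pmproxy configuration (excerpt; section name assumed)
[pmseries]
# remove metrics not updated for 14 days: 60*60*24*14 seconds
stream.expire = 1209600
# retention / interval: 60*60*24*14 / 60 = 20160 values per metric per host
stream.maxlen = 20160
```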
Results and Analysis¶
The following results were gathered on a pmlogger farm deployment with a default pcp-zeroconf 5.3.0 installation, where each remote host is an identical container instance running pmcd(1) on a server with 64 CPU cores, 376 GB RAM and one disk attached (as mentioned above, the 64 CPUs increase the volume of per-CPU metrics).
The logging interval is 10s,
proc metrics of remote nodes are not included, and the memory values refer to the RSS (Resident Set Size) value.
[Results table: storage per day and network traffic per day (inbound) for each deployment size]
There are known memory leaks in pmproxy(1) versions before 5.3.0, resulting in higher memory usage than expected.
As a workaround, you can limit the memory usage of pmproxy by running systemctl edit pmproxy and setting a memory limit in the resulting drop-in file.
After saving the file, restart pmproxy by running
systemctl restart pmproxy.
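The resulting drop-in might look like the following sketch (MemoryMax is the systemd directive for a hard cgroup memory limit; the 10G value is illustrative, not derived from the measurements above):

```
# drop-in created by systemctl edit pmproxy
[Service]
MemoryMax=10G
```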