Cluster Installation
Cluster setup
For production setups it is possible to run the Green Metrics Tool (GMT) in a cluster setup. You can find the current setup that our Hosted Service uses in the specification of our Measurement Cluster. A cluster makes it possible to use specialized hardware, benchmark different systems and offer Benchmarking as a Service (BaaS). We advise running your database and GMT backend on a different machine than the measurement machines, as these could interfere with the measurements. The cluster setup is designed to work on headless machines, and job submission is handled through the database. Job status can also be sent via email. This way, once a machine is configured, no further manual intervention is needed.
When submitting a job, you can specify a machine on which the job is then run exclusively. The configuration can be machine specific, as we want certain machines to have specific metric providers. We advise against keeping unutilized machines running all the time just in case a job might be submitted. Please consider switching a machine off when it is not in use and powering it up once a day or when the sun is shining.
There are two main ways to configure a cluster:
1) Systemd Service - Client mode
The tools/client.py program is a script that should be running constantly. It periodically checks the database for a new job queued for this specific machine. If no job can be retrieved, it sleeps for an amount of time set in the configuration file config.yml:
client:
  sleep_time_no_job: 300
  sleep_time_after_job: 300
You can also set a time that the script should wait after a job has finished execution (sleep_time_after_job) to give the system time to cool down. Please use the calibrate script to fine-tune this value.
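The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual client.py implementation: fetch_job and run_job are hypothetical callables standing in for the database lookup and the measurement run, and max_iterations exists only so the sketch can be stopped in a test.

```python
import time
from typing import Callable, Optional


def poll_for_jobs(
    fetch_job: Callable[[], Optional[dict]],  # hypothetical: next queued job for this machine, or None
    run_job: Callable[[dict], None],          # hypothetical: runs one measurement job
    sleep_time_no_job: float = 300,
    sleep_time_after_job: float = 300,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,     # None means "loop forever"
) -> None:
    """Poll for jobs; sleep when the queue is empty and after every finished job."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        job = fetch_job()
        if job is None:
            # Nothing queued for this machine: wait before asking again.
            sleep(sleep_time_no_job)
            continue
        run_job(job)
        # Give the machine time to cool down before the next measurement.
        sleep(sleep_time_after_job)
```

Passing the sleep function as a parameter is a design choice for this sketch only; it allows the loop to be exercised without actually waiting.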
After running a job, the client program executes the tools/cluster/cleanup.sh script, which does general housekeeping on the machine. The cleanup is batched at this point so that it never runs while a benchmark is in progress.
To make sure that the client is always running you can create a service that will start at boot and keep running.
To run the client as a systemd user service, create a file under ~/.config/systemd/user/green-coding-client.service:
[Unit]
Description=The Green Metrics Client Service
After=docker.target
[Service]
Type=simple
WorkingDirectory=/home/gc/green-metrics-tool/
ExecStart=/home/gc/green-metrics-tool/venv/bin/python3 /home/gc/green-metrics-tool/tools/client.py
StandardOutput=append:/var/log/green-metrics-client-service.log
Restart=always
RestartSec=30s
TimeoutStopSec=600
KillSignal=SIGINT
RestartKillSignal=SIGINT
FinalKillSignal=SIGKILL
[Install]
WantedBy=default.target
Then activate the service
sudo touch /var/log/green-metrics-client-service.log
sudo chown gc:gc /var/log/green-metrics-client-service.log
systemctl --user daemon-reload # Reload the systemd configuration
systemctl --user enable green-coding-client # enable on boot
systemctl --user start green-coding-client # start service
systemctl --user status green-coding-client # check status
Alternatively, to run the client as a system-wide service, create a file under /etc/systemd/system/green-coding-client.service:
[Unit]
Description=The Green Metrics Client Service
After=network.target
[Service]
Type=simple
User=gc
Group=gc
WorkingDirectory=/home/gc/green-metrics-tool/
ExecStart=/home/gc/green-metrics-tool/venv/bin/python3 /home/gc/green-metrics-tool/tools/client.py
StandardOutput=append:/var/log/green-metrics-client-service.log
Restart=always
RestartSec=30s
TimeoutStopSec=600
KillSignal=SIGINT
RestartKillSignal=SIGINT
FinalKillSignal=SIGKILL
Environment="DOCKER_HOST=unix:///run/user/1000/docker.sock"
[Install]
WantedBy=multi-user.target
Then activate the service
sudo systemctl daemon-reload # Reload the systemd configuration
sudo systemctl enable green-coding-client # enable on boot
sudo systemctl start green-coding-client # start service
sudo systemctl status green-coding-client # check status
You should now see the client reporting its status on the server. It is important to note that only the client ever talks to the server (polling). The server never tries to contact the client. This avoids creating any interrupts while a measurement might be running.
The tools/cluster/cleanup.sh script is run as root and thus needs an entry in the /etc/sudoers file or in a file under /etc/sudoers.d/. We recommend the following:
echo 'ALL ALL=(ALL) NOPASSWD:/home/gc/green-metrics-tool/tools/cluster/cleanup.sh ""' | sudo tee /etc/sudoers.d/green-coding-cluster-cleanup
sudo chmod 500 /etc/sudoers.d/green-coding-cluster-cleanup
2) Cronjob (DEPRECATED)
⚠️ We do not recommend using the cronjob implementation in production as it does not support temperature checking or system cleanups. This mode should only be used for local quick testing setups, when you cannot use NOP Linux. ⚠️
The Green Metrics Tool comes with an implemented queueing and locking mechanism. In contrast to the NOP Linux implementation this way of checking for jobs doesn’t poll with a process all the time but relies on cron which is not available on NOP Linux.
You can install a cronjob on your system to periodically call:
- python3 PATH_TO_GREEN_METRICS_TOOL/tools/jobs.py project to measure projects in the database queue
- python3 PATH_TO_GREEN_METRICS_TOOL/tools/jobs.py email to send all emails in the database queue
The jobs.py script uses the Python faulthandler mechanism and will also report to STDERR in case of a segfault.
When running the cronjob we advise you to append all the output combined to a log file like so:
* * * * * python3 PATH_TO_GREEN_METRICS_TOOL/tools/jobs.py project &>> /var/log/green-metrics-jobs.log
Be sure to give the green-metrics-jobs.log file write access rights.
Also be aware that our cronjob example assumes your crontab is using bash, as the &>> redirection is bash syntax. Consider adding SHELL=/bin/bash to your crontab if that is not the case.
General settings
Machine
When using the cluster you will need to configure the machine names in the machines table in the database and set the corresponding values in the config.yml:
machine:
  id: 1
  description: "My Machine Name"
  base_temperature_value: 30
  base_temperature_chip: "coretemp-isa-0000"
  base_temperature_feature: "Package id 0"
The id and the description must be unique so that they do not conflict with the other machines in the cluster.
If you are using the NOP Linux setup with the client.py service, you must also set up the temperature checking. Find out what temperature your system has when it is cool. You can either use our calibration script or just let the machine sit idle until the temperature no longer changes. Then set this value as base_temperature_value. The value carries no unit suffix; it is just a number in degrees (°) and should be in the same unit as the output of sensors on your Linux box.
Since we are using our lm_sensors provider to query the temperature, you must also set base_temperature_chip and base_temperature_feature to query from. Refer to the provider documentation for more details.
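To find suitable values for base_temperature_chip and base_temperature_feature, it can help to see how a chip and a feature map onto the text output of sensors. The following is a minimal sketch under assumptions, not the parser used by GMT or the lm_sensors provider: the sample output is illustrative, and read_temperature, is_cool and the tolerance parameter are hypothetical names introduced here.

```python
import re
from typing import Optional

# Illustrative sample in the usual layout of `sensors` output (values made up).
SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +29.0°C  (high = +80.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
"""


def read_temperature(sensors_output: str, chip: str, feature: str) -> Optional[float]:
    """Return the first number reported for `feature` on `chip`, or None if absent."""
    current_chip = None
    for line in sensors_output.splitlines():
        if not line.strip():
            current_chip = None  # a blank line ends a chip block
            continue
        if current_chip is None:
            current_chip = line.strip()  # first line of a block names the chip
            continue
        if current_chip != chip:
            continue
        name, sep, rest = line.partition(":")
        if sep and name.strip() == feature:
            match = re.search(r"[-+]?\d+(?:\.\d+)?", rest)
            return float(match.group()) if match else None
    return None


def is_cool(current: float, base: float, tolerance: float = 2.0) -> bool:
    """Hypothetical check: is the machine back near its base temperature?"""
    return current <= base + tolerance
```

With the sample above, read_temperature(SAMPLE, "coretemp-isa-0000", "Package id 0") picks the value from the "Package id 0" line of the coretemp block. On a real machine the same function could be fed the captured output of the sensors command.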
Profiling Machines
Machines that are intended to create a carbon profile as it would be seen in a user machine should have:
- Turbo Boost turned on
- Hyper Threading turned on
- DVFS turned on
- Allow C8-C0 states
This is the minimum set we deem reasonable. Please note that this configuration resembles a user machine best. For server machines some of these settings should be changed: on servers Hyper Threading is often turned on, whereas DVFS is often turned off.
Benchmarking Machines
For the most stable results on benchmarking machines, or if you want to use Container Energy Estimations based on CPU utilization, you should have:
- Turbo Boost turned off
- Hyper Threading turned off
- DVFS turned off
- Allow only C0 state
All of these settings are best changed in the BIOS. Additionally, for turning DVFS off we recommend booting the kernel with the intel_pstate CPU frequency driver deactivated and using the acpi one, which allows for setting the userspace governor.
$ sudo nano /etc/default/grub
# Change this line
# GRUB_CMDLINE_LINUX_DEFAULT=""
# to
# GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable acpi=force"
$ sudo update-grub
$ cat <<EOF | sudo tee /etc/systemd/system/fix-cpu-frequency.service
[Unit]
Description=Fix CPU Frequency
[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower frequency-set -g userspace
ExecStartPost=/usr/bin/cpupower frequency-set -f 2.1GHz
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
EOF
$ sudo systemctl enable fix-cpu-frequency
# Reboot here!
$ sudo systemctl start fix-cpu-frequency
$ cat /proc/cpuinfo | grep MHz # to check that frequency is fixed
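The manual check in the last command can also be scripted. The sketch below assumes the usual "cpu MHz : value" lines found in /proc/cpuinfo on x86 Linux; the function names and the 1 MHz tolerance are hypothetical choices for illustration and not part of GMT.

```python
import re
from typing import List


def cpu_frequencies_mhz(cpuinfo_text: str) -> List[float]:
    """Extract every 'cpu MHz' value from the text of /proc/cpuinfo."""
    return [
        float(value)
        for value in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    ]


def is_frequency_fixed(frequencies: List[float], target_mhz: float, tolerance_mhz: float = 1.0) -> bool:
    """True if at least one core was found and all cores sit at the target frequency."""
    return bool(frequencies) and all(
        abs(frequency - target_mhz) <= tolerance_mhz for frequency in frequencies
    )
```

For the 2.1GHz example above, one would read the text of /proc/cpuinfo, pass it to cpu_frequencies_mhz and call is_frequency_fixed with a target of 2100.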
Client
In the GMT configuration, client refers to the settings for client.py only.
When using the client mode the cluster expects a Measurement Control Workload to be set to determine if the cluster accuracy has deviated from the expected baseline.
Please see Accuracy Control for details.