The infrastructure is managed using modern tools to ensure consistency, scalability, and ease of maintenance. This section focuses on the use of Ansible, Prometheus, and Grafana for automation and monitoring.
We use Ansible to automate tasks and manage the infrastructure:
- Setup and Updates: Servers, containers, and services are configured automatically using scripts called playbooks. A playbook is similar to a shell script, but written in YAML: the master server (Ansible's control node) runs it over SSH against the hosts listed in an inventory (the list of managed servers). Playbooks also support more advanced logic, such as conditional tasks based on variables or the Linux distribution, and checks such as whether a given file exists. Running a playbook is as easy as:
```shell
# The user edits a file listing every hostname and its IP address.
# A playbook such as the one run below could copy that file to the relevant server (based on the service mapping), apply the changes, and restart the service.
ansible-playbook ./add_hostnames_to_dns_server.yaml -i inventories/production_servers
```
- Version Control: All changes to playbooks are tracked in Git, making it easy to update or roll back configurations.
- Scalability: Onboarding new servers or containers, or installing and configuring services, is quick and consistent thanks to reusable playbooks.
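For illustration, the `add_hostnames_to_dns_server.yaml` playbook from the command above might look roughly like the sketch below. It assumes the DNS server runs dnsmasq and reads the hostname list as an additional hosts file (dnsmasq's `addn-hosts` option); the `dns_servers` group, the file paths, and the task layout are hypothetical rather than taken from our actual playbooks. It shows the three ideas mentioned above: a distribution-based condition, a file-existence check, and a restart that only happens when something changed.

```yaml
# add_hostnames_to_dns_server.yaml (hypothetical sketch, assuming dnsmasq)
- name: Publish the hostname list on the DNS server
  hosts: dns_servers                    # hypothetical inventory group
  become: true
  vars:
    hostnames_src: files/hostnames      # hypothetical: the file the user edits ("IP hostname" per line)
    hostnames_dest: /etc/dnsmasq.hosts  # hypothetical: must match addn-hosts= in dnsmasq.conf

  tasks:
    # Conditional action based on the Linux distribution
    - name: Install dnsmasq (Debian family)
      ansible.builtin.apt:
        name: dnsmasq
        state: present
        update_cache: true
      when: ansible_facts['os_family'] == 'Debian'

    - name: Install dnsmasq (Red Hat family)
      ansible.builtin.dnf:
        name: dnsmasq
        state: present
      when: ansible_facts['os_family'] == 'RedHat'

    # Manual check: does the dnsmasq configuration file exist?
    - name: Look for the dnsmasq configuration file
      ansible.builtin.stat:
        path: /etc/dnsmasq.conf
      register: dnsmasq_conf

    - name: Stop if dnsmasq is not configured on this host
      ansible.builtin.assert:
        that: dnsmasq_conf.stat.exists
        fail_msg: "/etc/dnsmasq.conf is missing on {{ inventory_hostname }}"

    - name: Copy the hostname list to the DNS server
      ansible.builtin.copy:
        src: "{{ hostnames_src }}"
        dest: "{{ hostnames_dest }}"
        owner: root
        group: root
        mode: "0644"
        backup: true                    # keep the previous version on the server
      notify: Restart dnsmasq

  handlers:
    # Runs at the end of the play, and only if the copy task reported a change
    - name: Restart dnsmasq
      ansible.builtin.service:
        name: dnsmasq
        state: restarted
```

Because the restart is a handler, re-running the playbook with an unchanged file does not restart dnsmasq. Adding `--check --diff` to the `ansible-playbook` command previews the changes without applying them.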
Monitoring with Prometheus and Grafana
We use Prometheus to collect data and Grafana to display it:
- Prometheus: Tracks server health, resource usage, and network performance.
- Grafana: Shows this data in simple, customizable dashboards for easy monitoring.
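As a minimal sketch of how Prometheus collects this data, a `prometheus.yml` could scrape the standard node_exporter on each server, which listens on port 9100 by default. The hostnames are hypothetical placeholders, and this assumes node_exporter is already installed on every host (a task a playbook could handle).

```yaml
# prometheus.yml (minimal sketch; target hostnames are hypothetical)
global:
  scrape_interval: 15s        # how often Prometheus pulls metrics from each target

scrape_configs:
  # Prometheus monitoring its own health
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Host metrics (CPU, RAM, disk, network) exposed by node_exporter on each server
  - job_name: node
    static_configs:
      - targets:
          - "server1.example.internal:9100"
          - "server2.example.internal:9100"
```

Grafana then only needs a Prometheus data source pointing at the Prometheus server (port 9090 by default) to build dashboards from these series.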
The dashboards cover:
- Server uptime and resource usage (CPU, RAM, disk).
- Container activity (current usage) versus quotas (theoretical combined peak usage).
- Network traffic and speed.
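The host metrics listed above could be precomputed with Prometheus recording rules so that dashboard panels stay simple. This is a sketch assuming the data comes from node_exporter: the metric names are node_exporter's, while the rule names, file name, and 5-minute window are our own illustrative choices. Container usage versus quotas depends on how the containers are hosted and is not covered here.

```yaml
# rules/node_dashboard.yml (hypothetical file name; load it via rule_files in prometheus.yml)
groups:
  - name: node_dashboard
    rules:
      # Fraction of scrapes over the last day in which each target was reachable
      - record: instance:up:avg_over_time_1d
        expr: avg_over_time(up[1d])

      # Seconds since the host last booted
      - record: instance:node_uptime:seconds
        expr: node_time_seconds - node_boot_time_seconds

      # CPU usage: 1 minus the average idle fraction across all cores
      - record: instance:node_cpu_utilisation:ratio
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

      # RAM usage: share of memory not available to new processes
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

      # Disk usage per mounted filesystem, ignoring in-memory filesystems
      - record: instance:node_filesystem_utilisation:ratio
        expr: |
          1 - node_filesystem_avail_bytes{fstype!~"tmpfs|ramfs"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|ramfs"}

      # Network throughput in bytes per second, excluding loopback
      - record: instance:node_network_receive_bytes:rate5m
        expr: sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[5m]))
      - record: instance:node_network_transmit_bytes:rate5m
        expr: sum by (instance) (rate(node_network_transmit_bytes_total{device!="lo"}[5m]))
```

The same expressions can also be pasted directly into Grafana panels; recording rules simply move the computation to Prometheus so every dashboard reuses one definition.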
We plan to document additional practices, such as:
- User access controls.
- Backup and recovery processes.
- Advanced container management workflows.
By using tools like Ansible, Prometheus, and Grafana, we simplify administration and improve reliability.