Migrate Oracle Linux Automation Manager to a Clustered Deployment
Introduction
Whether upgrading from a previous release or starting with a single-host installation, either environment can migrate to a clustered deployment. Administrators need to plan their topology before migrating, as the cluster may consist of a combination of control plane and execution plane nodes and a remote database.
The following tutorial provides instructions for migrating a single-host installation to a clustered deployment with a remote database.
Objectives
In this lab, you'll learn how to:
- Set up a remote database
- Migrate to a clustered deployment
Prerequisites
A system with Oracle Linux Automation Manager installed.
For details on installing Oracle Linux Automation Manager, see the Oracle Linux Automation Manager Installation Guide.
Verify the Installation
Note: When using the free lab environment, see Oracle Linux Lab Basics for connection and other usage instructions.
Information: The free lab environment deploys a running Oracle Linux Automation Manager installation. The deployment takes approximately 15-20 minutes to finish after launch. Therefore, you might want to step away while this runs and promptly return to complete the lab.
Open a terminal and configure an SSH tunnel to the deployed Oracle Linux Automation Manager instance.
In the free lab environment, Oracle Linux Automation Manager deploys to the control-node VM.

ssh -L 8444:localhost:443 oracle@<hostname or ip address>
Open a web browser and enter the URL.
https://localhost:8444
Note: Approve the security warning based on the browser used. For Chrome, click the Advanced button and then the Proceed to localhost (unsafe) link.

Log in to Oracle Linux Automation Manager with the Username admin and the Password admin created during the automated deployment.

After login, the WebUI displays.
Migrate to a Cluster Deployment
While Oracle Linux Automation Manager runs as a single-host deployment, it also supports running as a cluster with a remote database and separate control plane and execution plane nodes. The installation configures the single-host instance as a hybrid node. The first step in migrating to a cluster deployment is converting this instance to a control plane node.
For more information on different installation topologies, see the Planning the Installation chapter of the Oracle Linux Automation Manager Installation Guide.
Prepare the Control Plane Node
Switch to the terminal connected to the control-node instance running Oracle Linux Automation Manager.

Note: From now on, we'll refer to this instance as the control plane node.
Stop the Oracle Linux Automation Manager service.
sudo systemctl stop ol-automation-manager
Create a backup of the database.
sudo su - postgres -c 'pg_dumpall > /tmp/olamv2_db_dump'
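Before copying the dump to another host, a quick optional check that the file exists and is not empty can save troubleshooting later; this is not part of the official procedure:

# Optional: confirm the dump file is present and non-empty
ls -lh /tmp/olamv2_db_dump
head -n 3 /tmp/olamv2_db_dump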
Install the Remote Database
Copy the database backup from the control plane node to the new remote database host.
scp /tmp/olamv2_db_dump oracle@10.0.0.160:/tmp/
The address 10.0.0.160 is the internal IP address of the remote database host defined in the free lab environment. This connection is possible because the free lab environment configures passwordless SSH logins between the instances.

Open a new terminal and connect via ssh to the remote-db instance. Use the external IP address referenced on the Luna Lab Resources page; direct connections using the internal IP address 10.0.0.160 are not possible from the Luna Desktop.

ssh oracle@<hostname or ip address>
Enable the database module stream.
Oracle Linux Automation Manager supports PostgreSQL database version 12 or 13. We'll enable the version 13 module stream in this lab environment.
sudo dnf -y module reset postgresql
sudo dnf -y module enable postgresql:13
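To optionally confirm which stream is now enabled before installing, list the module; the enabled stream is marked with [e]:

sudo dnf module list postgresql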
Install the database server.
sudo dnf -y install postgresql-server
Add the database firewall rule.
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload
Initialize the database.
sudo postgresql-setup --initdb
Set the database's default password encryption algorithm.
sudo sed -i "s/#password_encryption.*/password_encryption = scram-sha-256/" /var/lib/pgsql/data/postgresql.conf
For more details regarding this database functionality, see Password Authentication in the upstream PostgreSQL documentation.
Update the database host-based authentication file.
echo "host all all 0.0.0.0/0 scram-sha-256" | sudo tee -a /var/lib/pgsql/data/pg_hba.conf > /dev/null
This additional line uses SCRAM-SHA-256 authentication to verify a user's password when connecting from any IPv4 address.
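If you want to double-check that both edits landed before starting the service, grepping the configuration files is a quick optional sanity check:

# Optional: confirm the password hashing and host-based authentication settings
sudo grep '^password_encryption' /var/lib/pgsql/data/postgresql.conf
sudo tail -n 1 /var/lib/pgsql/data/pg_hba.conf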
Update the IP address on which the database listens.
sudo sed -i "/^#port = 5432/i listen_addresses = '"$(hostname -i)"'" /var/lib/pgsql/data/postgresql.conf
Start and enable the database service.
sudo systemctl enable --now postgresql
Import the database dump file.
sudo su - postgres -c 'psql -d postgres -f /tmp/olamv2_db_dump'
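Optionally, confirm the import created the awx database before setting its password; listing the databases is enough for a quick check:

# Optional: the awx database should appear in this list
sudo su - postgres -c 'psql -l'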
Set the password for the Oracle Linux Automation Manager database user account.
sudo su - postgres -c "psql -U postgres -d postgres -c \"alter user awx with password 'password';\""
This command sets the awx user's password to password. Choose a more secure password if running this command outside the free lab environment.

Close the terminal window connected to remote-db, as that completes the necessary steps to set up the remote database.
Add the Remote Database Settings
Switch back to the control plane node terminal running on the control-node instance and reconnect if necessary.

Add the remote database settings to a new custom configuration file.
cat << EOF | sudo tee /etc/tower/conf.d/db.py > /dev/null
DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'awx.main.db.profiled_pg',
        'NAME': 'awx',
        'USER': 'awx',
        'PASSWORD': 'password',
        'HOST': '10.0.0.160',
        'PORT': '5432',
    }
}
EOF
Use the same password set previously for the awx database user account.

Stop and disable the local database on the control plane node.
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Start Oracle Linux Automation Manager.
sudo systemctl start ol-automation-manager
Verify connection to the new remote database.
sudo su -l awx -s /bin/bash -c "awx-manage check_db"
The output returns the remote database version details if a connection is successful.
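If check_db reports a problem, a direct psql connection from the control plane node helps separate network and authentication issues from application configuration. This hedged example assumes the local psql client is still installed (it is removed in the next section) and uses the lab's example password when prompted:

# Optional troubleshooting: connect directly to the remote database as the awx user
psql "host=10.0.0.160 port=5432 user=awx dbname=awx" -c 'SELECT version();'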
Remove the Local Database Instance
Removing the original local database is safe after confirming the connection to the remote database is working.
Remove the database packages.
sudo dnf -y remove postgresql
Remove the pgsql directory containing the old database data files.

sudo rm -rf /var/lib/pgsql
Change the Node Type of the Control Plane Node
When converting to a clustered deployment, switch the single-host instance's node_type from hybrid to control.
Confirm the current node type of the control plane node.
sudo su -l awx -s /bin/bash -c "awx-manage list_instances"
The output shows the node_type set to a value of hybrid.

Remove the default instance group.
sudo su -l awx -s /bin/bash -c "awx-manage remove_from_queue --queuename default --hostname $(hostname -i)"
Define the new instance and queue.
sudo su -l awx -s /bin/bash -c "awx-manage provision_instance --hostname=$(hostname -i) --node_type=control"
sudo su -l awx -s /bin/bash -c "awx-manage register_queue --queuename=controlplane --hostnames=$(hostname -i)"
Add the default queue name values to the custom settings file.
cat << EOF | sudo tee -a /etc/tower/conf.d/olam.py > /dev/null
DEFAULT_EXECUTION_QUEUE_NAME = 'execution'
DEFAULT_CONTROL_PLANE_QUEUE_NAME = 'controlplane'
EOF
Update Receptor settings.
cat << EOF | sudo tee /etc/receptor/receptor.conf > /dev/null
---
- node:
    id: $(hostname -i)

- log-level: info

- tcp-listener:
    port: 27199

- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock

- work-command:
    worktype: local
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false
EOF
Restart Oracle Linux Automation Manager.
sudo systemctl restart ol-automation-manager
The conversion of the single-host hybrid node to a control plane node with a remote database is complete. Now we'll add an execution plane node to make this cluster fully functional.
Add an Execution Plane Node to the Cluster
Before the cluster is fully functional, add one or more execution nodes. Execution nodes run standard jobs using ansible-runner, which runs playbooks within an OLAM EE Podman container-based execution environment.
Prepare the Execution Plane Node
Open a new terminal and connect via ssh to the execution-node instance.

ssh oracle@<hostname or ip address>
Install the Oracle Linux Automation Manager repository package.
sudo dnf -y install oraclelinux-automation-manager-release-el8
Disable the repository for the older release.
sudo dnf config-manager --disable ol8_automation
Enable the current release's repository.
sudo dnf config-manager --enable ol8_automation2
Install the Oracle Linux Automation Manager package.
sudo dnf -y install ol-automation-manager
Add the Receptor firewall rule.
sudo firewall-cmd --add-port=27199/tcp --permanent
sudo firewall-cmd --reload
Edit the Redis socket configuration.
sudo sed -i '/^# unixsocketperm/a unixsocket /var/run/redis/redis.sock\nunixsocketperm 775' /etc/redis.conf
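You can optionally confirm the two lines were inserted as expected:

sudo grep -n '^unixsocket' /etc/redis.conf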
Copy the secret key from the control plane node.
ssh oracle@10.0.0.150 "sudo cat /etc/tower/SECRET_KEY" | sudo tee /etc/tower/SECRET_KEY > /dev/null
Important: Every cluster node requires the same secret key.
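A quick optional way to confirm the keys match is to compare checksums on both nodes; identical hashes mean identical keys:

# Run on the execution node; compare with the same command on the control plane node
sudo sha256sum /etc/tower/SECRET_KEY
ssh oracle@10.0.0.150 "sudo sha256sum /etc/tower/SECRET_KEY"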
Create a custom settings file containing the required settings.
cat << EOF | sudo tee /etc/tower/conf.d/olamv2.py > /dev/null
CLUSTER_HOST_ID = '$(hostname -i)'
DEFAULT_EXECUTION_QUEUE_NAME = 'execution'
DEFAULT_CONTROL_PLANE_QUEUE_NAME = 'controlplane'
EOF
The CLUSTER_HOST_ID is a unique identifier of the host within the cluster.

Create a custom settings file containing the remote database configuration.
cat << EOF | sudo tee /etc/tower/conf.d/db.py > /dev/null
DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'awx.main.db.profiled_pg',
        'NAME': 'awx',
        'USER': 'awx',
        'PASSWORD': 'password',
        'HOST': '10.0.0.160',
        'PORT': '5432',
    }
}
EOF
Deploy the ansible-runner execution environment.
Open a shell as the awx user.

sudo su -l awx -s /bin/bash
Migrate any existing containers to the latest podman version while keeping the unprivileged namespaces alive.
podman system migrate
Pull the Oracle Linux Automation Engine execution environment for Oracle Linux Automation Manager.
podman pull container-registry.oracle.com/oracle_linux_automation_manager/olam-ee:latest
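While still in the awx user's shell, you can optionally list the image to confirm the pull completed:

podman images container-registry.oracle.com/oracle_linux_automation_manager/olam-ee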
Exit out of the awx user shell.

exit
Generate the SSL certificates for NGINX.
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/tower/tower.key -out /etc/tower/tower.crt
Enter the requested information or just hit the ENTER key.
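To optionally confirm the certificate was generated and review its validity window, inspect it with openssl:

sudo openssl x509 -in /etc/tower/tower.crt -noout -subject -dates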
Replace the default NGINX configuration with the configuration below.
cat << 'EOF' | sudo tee /etc/nginx/nginx.conf > /dev/null
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;
}
EOF
Update the Receptor configuration file.
cat << EOF | sudo tee /etc/receptor/receptor.conf > /dev/null
---
- node:
    id: $(hostname -i)

- log-level: debug

- tcp-listener:
    port: 27199

- tcp-peer:
    address: 10.0.0.150:27199
    redial: true

- control-service:
    service: control
    filename: /var/run/receptor/receptor.sock

- work-command:
    worktype: ansible-runner
    command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: false
EOF
node:id is the hostname or IP address of the current node. tcp-peer:address is the hostname or IP address and port of the Receptor listener on the control plane node.
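Before starting the service, you can optionally verify that this node can reach the control plane node's Receptor listener on port 27199. This sketch uses bash's built-in /dev/tcp redirection, so it needs no extra packages:

# Optional: a zero exit status (and the echo) means the TCP port is reachable
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/10.0.0.150/27199' && echo "receptor port reachable"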
Start and enable the Oracle Linux Automation Manager service.
sudo systemctl enable --now ol-automation-manager.service
Provision the Execution Plane Node
Switch to the terminal connected to the control plane node running on the control-node instance.

The provisioning step must be run on one of the control plane nodes within the cluster and applies to all clustered instances of Oracle Linux Automation Manager.
Define the execution instance and queue.
sudo su -l awx -s /bin/bash -c "awx-manage provision_instance --hostname=10.0.0.151 --node_type=execution"
sudo su -l awx -s /bin/bash -c "awx-manage register_default_execution_environments"
sudo su -l awx -s /bin/bash -c "awx-manage register_queue --queuename=execution --hostnames=10.0.0.151"
register_queue takes a queuename to create or update and a comma-delimited list of hostnames where jobs run.
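As an illustration of the comma-delimited form, registering a queue that spans two execution nodes would look like the following; the 10.0.0.152 address is hypothetical and not part of this lab:

# Hypothetical example only: 10.0.0.152 is an illustrative second execution node
sudo su -l awx -s /bin/bash -c "awx-manage register_queue --queuename=execution --hostnames=10.0.0.151,10.0.0.152"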
Register the service mesh peer relationship.
sudo su -l awx -s /bin/bash -c "awx-manage register_peers 10.0.0.151 --peers $(hostname -i)"
Verify the Execution Plane Node Registration
Switch to the terminal connected to the execution node running on the execution-node instance.

Verify the Oracle Linux Automation Manager mesh service is running.
sudo systemctl status receptor-awx
Check the status of the service mesh.
sudo receptorctl --socket /var/run/receptor/receptor.sock status
Example Output:
[oracle@execution-node ~]$ sudo receptorctl --socket /var/run/receptor/receptor.sock status
Node ID: 10.0.0.151
Version: +g
System CPU Count: 2
System Memory MiB: 15713

Connection   Cost
10.0.0.150   1

Known Node   Known Connections
10.0.0.150   {'10.0.0.151': 1}
10.0.0.151   {'10.0.0.150': 1}

Route        Via
10.0.0.150   10.0.0.150

Node         Service   Type       Last Seen             Tags
10.0.0.151   control   Stream     2022-11-06 19:46:53   {'type': 'Control Service'}
10.0.0.150   control   Stream     2022-11-06 19:46:06   {'type': 'Control Service'}

Node         Work Types
10.0.0.151   ansible-runner
10.0.0.150   local
For more details about Receptor, see the upstream documentation.
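Beyond status, receptorctl can also confirm end-to-end reachability of a specific node across the mesh; an optional ping from the execution node to the control plane node looks like this:

sudo receptorctl --socket /var/run/receptor/receptor.sock ping 10.0.0.150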
Verify the running cluster instances and show the available capacity.
sudo su -l awx -s /bin/bash -c "awx-manage list_instances"
The output appears green once the cluster establishes communication across all instances. If the results appear red, wait 20-30 seconds and try rerunning the command.

Example Output:
[oracle@control-node ~]$ sudo su -l awx -s /bin/bash -c "awx-manage list_instances"
[controlplane capacity=136]
        10.0.0.150 capacity=136 node_type=control version=19.5.1 heartbeat="2022-11-08 16:24:03"

[default capacity=0]

[execution capacity=136]
        10.0.0.151 capacity=136 node_type=execution version=19.5.1 heartbeat="2022-11-08 17:16:45"
That completes the migration of Oracle Linux Automation Manager to a clustered deployment.
(Optional) Verify the Cluster is Working
Refresh the web browser window used to display the previous WebUI, or open a new web browser window and enter the URL.
https://localhost:8444
The port used in the URL must match the local port of the SSH tunnel.
Note: Approve the security warning based on the browser used. For Chrome, click the Advanced button and then the Proceed to localhost (unsafe) link.

Log in to Oracle Linux Automation Manager again with the Username admin and the Password admin.

After login, the WebUI displays.
Using the navigation menu on the left, click Inventories under the Resources section.

In the main window, click the Add button and then select Add inventory.

On the Create new inventory page, enter the required information. For Instance Groups, select the search icon to display the Select Instance Groups pop-up dialog. Click the checkbox next to the execution group and then click the Select button.

Click the Save button.

From the Details summary page, click the Hosts tab.

From the Hosts page, click the Add button.

On the Create new host page, enter the IP address or hostname of an available instance. In the free lab environment, we'll use the IP address 10.0.0.160, which is the internal IP address of the remote-db VM.

Click the Save button.

Navigate in the menu on the left and click on Credentials.

On the Credentials page, click the Add button.

On the Create New Credential page, enter the required information. For the Credential Type, click the drop-down menu and select Machine. That displays the credential's Type Details.

Enter a Username of oracle and browse for the SSH Private Key. Clicking the Browse... button displays an Open File dialog window.

Right-click on the main window of that dialog and then select Show Hidden Files.

Then select the .ssh folder and the id_rsa file. Clicking the Open button copies the contents of the private key file into the SSH Private Key dialog box. Scroll down and click the Save button.

Navigate in the menu on the left and click on Inventories.

From the Inventories page, click on the Test inventory.

From the Details summary page, click the Hosts tab.

On the Hosts page, click the checkbox next to the 10.0.0.160 host. Then click the Run Command button.

From the Run command dialog, select the ping module from the Modules list-of-values and click the Next button.

Select the OLAM EE (latest) execution environment and click the Next button.

Select the remote-db machine credential and click the Next button.

A preview of the command will display. After reviewing the details, click the Launch button.

The job will launch and display the job Output page.

If everything ran successfully, the output shows a SUCCESS message indicating the Oracle Linux Automation Manager execution plane node contacted the remote-db VM using the Ansible ping module.
For More Information
Oracle Linux Automation Manager Documentation
Oracle Linux Automation Manager Training
Oracle Linux Training Station