Some Woody solutions such as IN2IT exchange, IN2IT social and IN2IT live can be deployed in a clustered environment, to enable data replication and load balancing across multiple nodes.
This article describes how clustering works in a Woody environment.
To learn more about installing a clustered environment, please refer to the Woody installation guide:
TABLE OF CONTENTS
- Main concepts in a clustered Woody environment
- Prerequisites to deploy a clustered Woody environment
- Examples of deployments
- Load balancing
- Data replication and failover
- High availability
- Working with resource pools
- Rebooting a Woody cluster
Main concepts in a clustered Woody environment
- Node: A node is a server hosting Woody software. If your installation has more than one node, you can build a cluster (a farm). In a cluster, the same software is installed on every node. Each node's role in the cluster is determined when the IN2IT installation wizard runs.
- First node: This server is the primary entry point for front-end connections (access to the Woody UI, configuration pages, API, etc.) and hosts the primary (active) database. The first node also handles the cluster directory and the messaging across all IN2IT nodes, including job load balancing. Like all other nodes in the cluster, the first node also executes jobs.
- Additional node: An additional node is an IN2IT server configured to join an existing cluster. It executes jobs and can host a replica (passive) instance of the database for data replication. An additional node hosting a replica database can become the first node by running the IN2IT wizard again.
- Database: IN2IT solutions rely on ArangoDB as their database. ArangoDB is installed automatically with the Woody software. The database contains all configuration data (profiles, watchfolders, setup, users, etc.) and some information about the executed workflows and jobs. In a cluster, only one node runs the active database. However, this database can be replicated to additional nodes.
- IN2IT wizard: This utility handles IN2IT deployment and configuration. It runs during MSI package installation and can be run again afterwards to change the configuration. The IN2IT wizard also lets you perform maintenance tasks (configuration export, license update) without a full installation run.
- Cluster endpoint: The cluster DNS hostname is used by clients to access IN2IT user interfaces and API.
- Cluster IP address: The cluster IP address is used by additional nodes to communicate with the first node.
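The roles described above can be summarised in a small model. This is only an illustrative sketch: the `Node` record, its field names and the two helper functions are hypothetical and are not part of the Woody software or its API. It encodes two rules from the list: a cluster has a single first node, and only an additional node hosting a replica database can become the first node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative node record; field names are not Woody identifiers."""
    hostname: str
    role: str                     # "first" or "additional"
    has_replica_db: bool = False  # only meaningful for additional nodes


def validate_topology(nodes):
    """Raise ValueError unless exactly one node has the "first" role."""
    firsts = [n for n in nodes if n.role == "first"]
    if len(firsts) != 1:
        raise ValueError("a cluster has exactly one first node")


def promotion_candidates(nodes):
    """Hostnames of additional nodes that host a replica database."""
    return [n.hostname for n in nodes
            if n.role == "additional" and n.has_replica_db]


# Hypothetical three-node cluster.
nodes = [
    Node("woody-01", "first"),
    Node("woody-02", "additional", has_replica_db=True),
    Node("woody-03", "additional"),
]
validate_topology(nodes)
print(promotion_candidates(nodes))
```

In this example only woody-02 is listed, because woody-03 does not host a replica database.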
Prerequisites to deploy a clustered Woody environment
A few precautions are necessary before deploying a clustered environment.
- Please make sure that the licenses you have purchased allow clustered deployment. A standalone license will not allow the deployment of a cluster.
- An SMB shared storage space is required to store temporary files. This shared space must be permanently accessible to all nodes. Ideally, it should not be located on one of the nodes but on third-party storage. For some workflows, media is written to the temporary storage (e.g. download before processing, or processing before upload).
- It is recommended, but not mandatory, to define a DNS alias for the cluster to make its management easier, and to create an SSL certificate for access to the IN2IT user interfaces. You can find the procedure to create and import a certificate here: Install custom SSL Certificate for your Woody installation.
- It is recommended, but not mandatory, to run all Woody nodes on similar hardware. At the moment, it is not possible to set the number of concurrent processing streams independently for each node (the same value applies to all nodes).
- All nodes must run the same version of the IN2IT software. Different versions cannot coexist in the same cluster.
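Some of the points above can be checked before starting a deployment. The following is a minimal sketch of such a pre-flight check, not a tool shipped with Woody. The `NodeInfo` record, the hostnames and the version strings are made up for illustration, and the inventory is assumed to be collected by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical description of one planned node."""
    hostname: str
    version: str    # installed IN2IT version (example values only)
    cpu_cores: int  # rough stand-in for "similar hardware"


def check_cluster_prerequisites(nodes, temp_share_host):
    """Return (errors, warnings) for a planned cluster."""
    errors, warnings = [], []

    # Mandatory: every node runs the same software version.
    versions = {n.version for n in nodes}
    if len(versions) > 1:
        errors.append(f"mixed software versions: {sorted(versions)}")

    # Recommended: the temporary SMB share is not hosted on a cluster node.
    if temp_share_host in {n.hostname for n in nodes}:
        warnings.append("temporary storage is hosted on a cluster node")

    # Recommended: similar hardware, since the stream limit is shared.
    if len({n.cpu_cores for n in nodes}) > 1:
        warnings.append("nodes differ in hardware; the stream limit is shared")

    return errors, warnings


nodes = [
    NodeInfo("woody-01", "4.2.1", 16),
    NodeInfo("woody-02", "4.2.0", 32),
]
errors, warnings = check_cluster_prerequisites(nodes, temp_share_host="woody-01")
print(errors)
print(warnings)
```

Errors correspond to the mandatory requirement (identical versions), while warnings correspond to the recommendations, which do not block a deployment.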
Examples of deployments
Load balancing
Front-end connections, watchfolders and jobs are load balanced across the available nodes in a clustered environment.
- Front-end and API client connections are redirected to one of the cluster nodes by the reverse proxy integrated in the Woody installation.
- Watchfolders are distributed equally across all available nodes when the first node starts. Additional nodes started after the first node will not run watchfolders until you restart the first node.
- Jobs triggered from user interfaces, watchfolders and API are executed on one of the available nodes.
Please note that, within a single job, one step can be executed by one node and the next step by another, depending on the current workload.
The maximum number of simultaneous streams for each node can be configured in the setup tab of the configuration. The same value applies to all nodes.
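To illustrate why two steps of the same job can land on different nodes, here is a toy model of per-step dispatching. It assumes a "least-loaded node first" rule, which is only one plausible policy; the article does not describe the actual scheduling algorithm, and the `Dispatcher` class is hypothetical.

```python
class Dispatcher:
    """Toy model of per-step dispatching; not Woody's actual scheduler."""

    def __init__(self, node_names, max_streams):
        self.max_streams = max_streams  # one value shared by all nodes
        self.running = {name: 0 for name in node_names}

    def assign_step(self):
        """Pick the least-loaded node, or None if every node is full."""
        name = min(self.running, key=self.running.get)
        if self.running[name] >= self.max_streams:
            return None
        self.running[name] += 1
        return name

    def finish_step(self, name):
        """Free one stream on the given node."""
        self.running[name] -= 1


d = Dispatcher(["node-1", "node-2"], max_streams=2)
a_step1 = d.assign_step()  # job A, step 1
b_step1 = d.assign_step()  # job B, step 1
c_step1 = d.assign_step()  # job C, step 1
d.finish_step(b_step1)     # job B completes
d.finish_step(a_step1)     # job A finishes its first step
a_step2 = d.assign_step()  # job A, step 2
print(a_step1, a_step2)
```

In this run, job A's first step goes to node-1, but by the time its second step is submitted node-2 is idle, so the second step goes there.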
Data replication and failover
A clustered installation allows various levels of failover depending on which node or component fails.
- If a processing component fails on any node, the ongoing task (step) is automatically re-submitted to another component of the same type on another node.
- If an additional node fails, all ongoing and future tasks will be automatically re-submitted to another node, and front-end connections will not be redirected to this node anymore. Watchfolders will be automatically restarted on another node.
- If the first node fails, manual intervention is necessary to restore the cluster to a running state. This consists of promoting one of the additional nodes hosting a replica database to first node of the cluster. The procedure is available here: Woody database failover procedure. In this case, configuration data and job history are preserved. However, running tasks and watchfolders require a manual restart.
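The difference between the last two cases can be sketched as follows. This is a simplified model under stated assumptions: the cluster is a plain dictionary, the `fail_over` function is hypothetical, and tasks are re-submitted to the surviving nodes in turn, a policy chosen for the example because the article only says they go to "another node".

```python
from itertools import cycle


def fail_over(cluster, failed):
    """Apply the failover rules to a toy cluster description.

    `cluster` is {"first_node": str, "tasks": {node: [task, ...]}}.
    Returns (tasks_per_node, manual_action_required).
    """
    if failed == cluster["first_node"]:
        # No automatic election: an operator must promote a replica node
        # and restart running tasks and watchfolders by hand.
        return cluster["tasks"], True

    survivors = [n for n in cluster["tasks"] if n != failed]
    new_tasks = {n: list(cluster["tasks"][n]) for n in survivors}
    # Re-submit the failed node's tasks to the remaining nodes in turn.
    for task, target in zip(cluster["tasks"][failed], cycle(survivors)):
        new_tasks[target].append(task)
    return new_tasks, False


cluster = {
    "first_node": "node-1",
    "tasks": {"node-1": ["t1"], "node-2": ["t2", "t3"], "node-3": ["t4"]},
}
after_additional, manual_a = fail_over(cluster, "node-2")
after_first, manual_f = fail_over(cluster, "node-1")
print(after_additional, manual_a)
print(manual_f)
```

Losing node-2 is absorbed automatically by node-1 and node-3, whereas losing node-1 only flags that manual intervention is required.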
High availability
High availability means ensuring continuity of operations whichever node fails, without manual intervention.
In such an architecture, the system itself elects a first node.
This is not possible in the current Woody version.
Our software architecture team is working to provide this feature soon. It will be available for clusters of three nodes or more.
Working with resource pools
In a clustered installation, all servers can belong to the same pool (the default), or you can create different pools, that is, different groups of servers in the same farm.
This feature is useful if you want to dedicate some servers of the farm to particular jobs.
At the moment, pools are only available for jobs triggered through API integration. The pool name can be passed as a parameter in API jobs. This parameter is optional.
This feature will be extended to other types of jobs soon.
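As an illustration of how an optional pool parameter narrows the set of servers that can run a job, consider the sketch below. The `pool` key, the pool names and the `eligible_nodes` function are hypothetical, not the actual API field or implementation. The sketch also assumes that a job submitted without a pool name runs on the default pool, which this article does not state explicitly.

```python
DEFAULT_POOL = "default"


def eligible_nodes(node_pools, job):
    """Return the nodes allowed to run `job`; the pool name is optional."""
    pool = job.get("pool", DEFAULT_POOL)
    return sorted(n for n, p in node_pools.items() if p == pool)


# Hypothetical farm: two general-purpose servers and one dedicated server.
node_pools = {"node-1": "default", "node-2": "default", "node-3": "dedicated"}

with_pool = eligible_nodes(node_pools, {"name": "transcode", "pool": "dedicated"})
without_pool = eligible_nodes(node_pools, {"name": "ingest"})
print(with_pool)
print(without_pool)
```

Here the job that names the "dedicated" pool is restricted to node-3, while the job without a pool name can run on node-1 or node-2.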
Rebooting a Woody cluster
A dedicated article describes how to reboot a Woody cluster: