2.2.3. Parsl and RADICAL-Pilot Integration
RADICAL-Pilot (RP) is a runtime system that executes heterogeneous MPI workloads (functions and executables) on heterogeneous HPC resources (GPUs and CPUs). The integration gives RP access to Parsl's flexible programming model and workflow management capabilities, which can be used to build dynamic workflows. In the other direction, RadicalPilotExecutor gives Parsl RP's heterogeneous runtime capabilities, so that Parsl can run many MPI computations more efficiently.
For this tutorial, we need to replace the default Parsl package with a Parsl build that includes the RP integration files. (The Parsl-RP integration will be released in Parsl soon.)
[ ]:
%%capture capt
# remove the existing Parsl from conda
!conda remove --force parsl -y
# install Parsl with the RP integration from the master branch of the fork
!pip install git+https://github.com/AymenFJA/parsl.git@master
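Because the install cell's output is captured, you may want to confirm that the new build is actually in use. You may first need to restart the kernel so that Python picks up the reinstalled package. The sketch below checks that `parsl.executors` exposes `RadicalPilotExecutor`. The helper name `rpex_available` is hypothetical and is not part of Parsl.

```python
import importlib


def rpex_available():
    """Return True if the installed Parsl exposes RadicalPilotExecutor (hypothetical helper)."""
    try:
        executors = importlib.import_module('parsl.executors')
    except ImportError:
        # Parsl itself, or one of its dependencies, is not importable here
        return False
    # the integration build adds RadicalPilotExecutor to parsl.executors
    return hasattr(executors, 'RadicalPilotExecutor')


print('RadicalPilotExecutor available:', rpex_available())
```

If this prints `False`, the stock Parsl package is probably still being imported.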
Next, we need to locate the installed nwchem executable in our environment.
[ ]:
# locate the NWChem executable path
nwchem_path = !which nwchem
nwchem = nwchem_path[0]
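The `!which` shell escape works only inside IPython. If no executable is found, it also produces an empty list, so `nwchem_path[0]` then fails with a confusing `IndexError`. A minimal plain-Python alternative, assuming only the standard library, uses `shutil.which` and raises an explicit error. The helper name `find_executable` is hypothetical.

```python
import os
import shutil


def find_executable(name, search_path=None):
    """Return the full path of `name` on PATH (or on `search_path`), or raise."""
    path = shutil.which(name, path=search_path)
    if path is None:
        raise FileNotFoundError('{0} not found on PATH'.format(name))
    return path
```

In the notebook, `nwchem = find_executable('nwchem')` would replace the two lines that use `!which`.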
Gather the MongoDB server information and set the RADICAL_PILOT_DBURL environment variable.
[2]:
%%capture capt
import os
mdb_host = os.environ.get('MDB_SERVER', 'mongodb')
mdb_port = os.environ.get('MDB_PORT', '27017')
mdb_name = os.environ.get('MDB_NAME', 'guest')
mdb_pswd = os.environ.get('MDB_PSWD', 'guest')
mdb_dtbs = os.environ.get('MDB_DTBS', 'default')
%env RADICAL_PILOT_DBURL=mongodb://$mdb_name:$mdb_pswd@$mdb_host:$mdb_port/$mdb_dtbs
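The `%env` magic above depends on IPython's `$variable` expansion. Below is a sketch of the same step in plain Python, using the same environment variables and defaults. One change is an addition that is not in the original cell: the user name and password are percent-encoded, because MongoDB connection strings require characters such as `@`, `:` and `/` in credentials to be escaped. The helper name `build_dburl` is hypothetical.

```python
import os
from urllib.parse import quote


def build_dburl(user, password, host, port, database):
    """Assemble a MongoDB connection URL, percent-encoding the credentials."""
    # safe='' makes quote() also escape '/', which it leaves alone by default
    return 'mongodb://{0}:{1}@{2}:{3}/{4}'.format(
        quote(user, safe=''), quote(password, safe=''), host, port, database)


os.environ['RADICAL_PILOT_DBURL'] = build_dburl(
    os.environ.get('MDB_NAME', 'guest'),
    os.environ.get('MDB_PSWD', 'guest'),
    os.environ.get('MDB_SERVER', 'mongodb'),
    os.environ.get('MDB_PORT', '27017'),
    os.environ.get('MDB_DTBS', 'default'))
```

With the defaults, this produces the same URL as the `%env` cell: `mongodb://guest:guest@mongodb:27017/default`.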
2.2.3.1. Example: MPI NWChem Workflow
The following example application runs an MP2 geometry optimization, followed by a CCSD(T) energy evaluation at the converged geometry. It uses a Dunning correlation-consistent triple-zeta basis (cc-pVTZ). NWChem uses Cartesian basis functions by default, so the keyword spherical on the BASIS directive is needed to override this. The 1s core orbitals are frozen in both the MP2 and the coupled-cluster calculations. Note that freezing must be specified separately for each of these calculations.
First, we need to write the NWChem example to a file so that we can use it as an input for the NWChem executable.
[ ]:
input = """
start n2
geometry
symmetry d2h
n 0 0 0.542
end
basis spherical
n library cc-pvtz
end
mp2
freeze core
end
task mp2 optimize
ccsd
freeze core
end
task ccsd(t)
"""
# write the input deck to mp2_optimization.nw in the current directory
nwchem_input = os.path.join(os.getcwd(), 'mp2_optimization.nw')
with open(nwchem_input, 'w') as f:
    f.write(input)
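To rerun the same workflow with a different basis set or starting bond geometry, you can generate the input deck from a function instead of a fixed string. The sketch below reproduces the deck above line by line and parameterizes only the values it exposes. The function `make_nwchem_input` and its parameter names are hypothetical.

```python
def make_nwchem_input(basis='cc-pvtz', z=0.542, symmetry='d2h', title='n2'):
    """Build the N2 MP2-optimization + CCSD(T) input deck as a string."""
    return '\n'.join([
        'start {0}'.format(title),
        'geometry',
        'symmetry {0}'.format(symmetry),
        'n 0 0 {0}'.format(z),
        'end',
        'basis spherical',
        'n library {0}'.format(basis),
        'end',
        'mp2',
        'freeze core',
        'end',
        'task mp2 optimize',
        'ccsd',
        'freeze core',
        'end',
        'task ccsd(t)',
        '',
    ])
```

For example, `make_nwchem_input(basis='cc-pvqz')` swaps in the quadruple-zeta basis and leaves the rest of the deck unchanged.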
Now we import the Parsl and RP Python modules into our application, along with RadicalPilotExecutor (RPEX) from Parsl.
[3]:
import parsl
import radical.pilot as rp
from parsl.config import Config
from parsl.executors import RadicalPilotExecutor
RadicalPilotExecutor can execute both functions and executables concurrently. Its function execution layer is based on the manager-worker paradigm. Managers manage a set of workers and can also execute function tasks themselves. Workers only execute function tasks. This paradigm requires a set of input parameters for resource distribution, including:
1. Number of managers and workers per node.
2. Number of ranks per manager and worker.
3. Number of nodes per manager and worker.
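To see how these parameters interact, you can do some simple resource arithmetic. The sketch below is a hypothetical model and is not the rpex.cfg schema. It covers the single-node case and assumes that every manager and worker rank occupies its own cores. From that, it computes how many cores a layout needs per node and in total.

```python
from dataclasses import dataclass


@dataclass
class ManagerWorkerLayout:
    """Hypothetical manager-worker layout; field names are illustrative only."""
    nodes: int
    managers_per_node: int
    workers_per_node: int
    ranks_per_manager: int
    ranks_per_worker: int
    cores_per_rank: int = 1

    def cores_per_node(self):
        # assume each manager and worker rank occupies its own cores
        ranks = (self.managers_per_node * self.ranks_per_manager
                 + self.workers_per_node * self.ranks_per_worker)
        return ranks * self.cores_per_rank

    def total_cores(self):
        return self.nodes * self.cores_per_node()

    def fits(self, cores_available_per_node):
        return self.cores_per_node() <= cores_available_per_node


layout = ManagerWorkerLayout(nodes=1, managers_per_node=1, workers_per_node=2,
                             ranks_per_manager=1, ranks_per_worker=1)
print(layout.total_cores(), layout.fits(4))  # 3 True
```

One manager and two single-rank workers need 3 cores, so under these assumptions the layout fits the 4-core local allocation used below.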
To specify this information, we create a configuration file, rpex.cfg, that describes these parameters and pass it to RadicalPilotExecutor. In the cell below, we ask RadicalPilotExecutor to allocate 4 cores for all tasks.
[ ]:
# ask Parsl to start the executor locally with 4 cores
rpex_cfg = 'configs/rpex.cfg'
config = Config(
    executors=[RadicalPilotExecutor(
        rpex_cfg=rpex_cfg, bulk_mode=True,
        resource='local.localhost', login_method='local',
        walltime=30, managed=True, cores=4
    )])
parsl.load(config)
Create a simple Parsl @bash_app to invoke the NWChem task. The bash_app requires the type of the task and the number of cpu_processes to run on. In this case, the task type is MPI and cpu_processes is 2, meaning 2 MPI ranks that each use 1 core.
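The core budget for this example works out as follows. Each NWChem task needs 2 ranks × 1 core = 2 cores, and the pilot has 4 cores. The sketch below assumes simple packing of identical tasks. It ignores any cores that the runtime itself or the rpex.cfg managers and workers may occupy, so treat the result as an upper bound. The helper name is hypothetical.

```python
def max_concurrent_tasks(pilot_cores, ranks_per_task, cores_per_rank=1):
    """Upper bound on identical MPI tasks that fit side by side in one pilot."""
    cores_per_task = ranks_per_task * cores_per_rank
    if cores_per_task > pilot_cores:
        raise ValueError('task needs more cores than the pilot provides')
    return pilot_cores // cores_per_task


print(max_concurrent_tasks(pilot_cores=4, ranks_per_task=2))  # 2
```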
Once the bash_app (an executable task) is invoked, RadicalPilotExecutor submits the task to the runtime system and waits for it to execute. RadicalPilotExecutor creates a dedicated sandbox folder that contains the tasks and their stdout/stderr files.
[ ]:
@parsl.bash_app
def nwchem_mp2_optimization(cpu_processes=2, cpu_process_type=rp.MPI):
    # the returned string is the command line that RP runs as an MPI task
    return '{0} {1}'.format(nwchem, nwchem_input)


# invoke nwchem_mp2_optimization; this returns a future immediately
future = nwchem_mp2_optimization()

# wait for the NWChem task (a bash_app's result is its exit code)
if future.result() == 0:
    print('Parsl task {0} finished'.format(future.tid))

# RP task IDs are zero-padded to six digits (task.000000),
# so pad the Parsl task ID to match
task_id = str(future.tid).zfill(6)
# the RP task output is located in the sandbox folder
task_path = '{0}/radical.pilot.sandbox/{1}/pilot.0000/task.{2}/task.{2}.out'.format(
    os.path.expanduser('~'), config.executors[0].session.uid, task_id)
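The sandbox path above can also be built with `os.path.join`, which is easier to reuse and test. The sketch below follows the layout used in the cell above. Like that cell, it assumes a single pilot named `pilot.0000` and a Parsl task ID that maps directly to the RP task number. The function `rp_task_stdout_path` is hypothetical.

```python
import os


def rp_task_stdout_path(session_uid, parsl_task_id, sandbox_root=None,
                        pilot_uid='pilot.0000'):
    """Rebuild the path of an RP task's stdout file in the local sandbox."""
    if sandbox_root is None:
        sandbox_root = os.path.join(os.path.expanduser('~'), 'radical.pilot.sandbox')
    # RP task directories and files use zero-padded IDs, e.g. task.000000
    task_uid = 'task.{0}'.format(str(parsl_task_id).zfill(6))
    return os.path.join(sandbox_root, session_uid, pilot_uid,
                        task_uid, task_uid + '.out')
```

In the notebook, you would call it as `rp_task_stdout_path(config.executors[0].session.uid, future.tid)`.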
Print the task output from the task's stdout file.
[ ]:
with open(task_path, 'r') as f:
    print(f.read())
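NWChem output files can be long. If you only want to inspect the end of the file, the sketch below uses only the standard library: it streams the file and keeps only the last `n` lines in a bounded `collections.deque`. The helper name `tail` is hypothetical.

```python
from collections import deque


def tail(path, n=20):
    """Return the last `n` lines of a text file without holding it all in memory."""
    with open(path, 'r') as f:
        # a deque with maxlen=n discards older lines as new ones are read
        return ''.join(deque(f, maxlen=n))
```

For example, `print(tail(task_path, 40))` shows the last 40 lines of the task's stdout.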
Finally, shut down the executor. Otherwise, it stays alive waiting for more tasks.
[ ]:
config.executors[0].shutdown()
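If a task raises an exception before this cell runs, the executor is never shut down. One way to guard against this is a small context manager that always calls `shutdown()` in a `finally` clause. The sketch below assumes only that the executor has a `shutdown()` method, as used above. The name `shutting_down` is hypothetical and is not part of Parsl.

```python
from contextlib import contextmanager


@contextmanager
def shutting_down(executor):
    """Yield `executor` and always call its shutdown(), even if the body raises."""
    try:
        yield executor
    finally:
        executor.shutdown()
```

For example, you could wrap the submission in `with shutting_down(config.executors[0]):` and call `nwchem_mp2_optimization().result()` inside the block. The executor is then released whether the task succeeds or fails.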