Publication Details
Framework for Planning, Running and Monitoring Cooperating Computations
Realistic simulations require very powerful computers. Computing infrastructures are growing in parallelism and becoming more diverse, which calls for more sophisticated computational techniques to take full advantage of the available machine power. Describing a scientific problem typically involves a number of different, cooperating models, and users are forced to construct, execute, validate and analyse these models themselves. The situation is considerably more complicated if the user is not an IT specialist: a great deal of human effort goes into actions that may be outside a scientist's scope or that could be performed automatically. This work presents a tool for the automated planning, execution and monitoring of extensive cooperating computations. The approach offers HPC as a service. A modular design enables extensions and unifies access to different HPC systems through a simple client-server interface based on standard web services. The dispatch server detects and enables concurrent execution of tasks and offers a level of fault tolerance.
@INPROCEEDINGS{FITPUB11475,
   author = "Marta \v{C}udov\'{a}",
   title = "Framework for Planning, Running and Monitoring Cooperating Computations",
   pages = "20--23",
   booktitle = "Po\v{c}\'{i}ta\v{c}ov\'{e} architekt\'{u}ry \& diagnostika PAD 2017",
   year = 2017,
   location = "Bratislava, SK",
   publisher = "Slovak University of Technology in Bratislava",
   ISBN = "978-80-972784-0-3",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11475"
}