About

I graduated from North China Electric Power University (NCEPU) with a BS in Network Engineering, and then obtained an ''M2 Recherche'' (research master's degree) in Computer Science from the University of Paris-Sud. I then began my PhD, ''A parallel iterative solver for large sparse linear systems enhanced with randomization and GPU accelerator, and its resilience to soft errors'', at the ''Laboratoire de Recherche en Informatique'' (LRI). After that, I spent two years as a postdoctoral fellow at the ''Laboratoire AstroParticule et Cosmologie'' (APC), where I did research on high-performance numerical techniques for Cosmic Microwave Background (CMB) data analysis.

During my thesis, I worked on pARMS (parallel Algebraic Recursive Multilevel Solver), a parallel sparse iterative solver. At the algorithmic level, I improved this solver by implementing randomized algorithms, and evaluated its performance on supercomputers with large-scale sparse problems (from the SuiteSparse collection and from PDEs). The results showed an improvement of about 10% in execution time.
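Judging by the RBT papers listed under Publications, the randomization mentioned here relies on random butterfly transformations (RBT). The following is a minimal sketch, under stated assumptions, of how a depth-1 butterfly B = (1/√2)[R0 R1; R0 -R1] can be applied to a vector in O(n) operations without ever forming B. The diagonal entries exp(r/10), with r uniform in [-1/2, 1/2], follow a common choice in the RBT literature and are assumed here. The names `Butterfly`, `make_butterfly`, `butterfly_apply` and `butterfly_apply_transpose` are hypothetical and not taken from pARMS.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Depth-1 random butterfly B = (1/sqrt(2)) [R0 R1; R0 -R1], where R0 and R1
// are (n/2) x (n/2) random diagonal matrices stored as vectors (n even).
struct Butterfly {
    std::vector<double> r0;  // diagonal of R0
    std::vector<double> r1;  // diagonal of R1
};

// Assumed choice of entries: exp(r / 10) with r drawn uniformly in [-1/2, 1/2).
Butterfly make_butterfly(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-0.5, 0.5);
    Butterfly b;
    b.r0.resize(n / 2);
    b.r1.resize(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        b.r0[i] = std::exp(dist(gen) / 10.0);
        b.r1[i] = std::exp(dist(gen) / 10.0);
    }
    return b;
}

// y = B x in O(n) operations; B is never formed explicitly.
std::vector<double> butterfly_apply(const Butterfly& b, const std::vector<double>& x) {
    const std::size_t h = b.r0.size();
    const double s = 1.0 / std::sqrt(2.0);
    std::vector<double> y(2 * h);
    for (std::size_t i = 0; i < h; ++i) {
        const double u = b.r0[i] * x[i];      // (R0 * top half of x)_i
        const double v = b.r1[i] * x[h + i];  // (R1 * bottom half of x)_i
        y[i] = s * (u + v);
        y[h + i] = s * (u - v);
    }
    return y;
}

// z = B^T y, using B^T = (1/sqrt(2)) [R0 R0; R1 -R1].
std::vector<double> butterfly_apply_transpose(const Butterfly& b, const std::vector<double>& y) {
    const std::size_t h = b.r0.size();
    const double s = 1.0 / std::sqrt(2.0);
    std::vector<double> z(2 * h);
    for (std::size_t i = 0; i < h; ++i) {
        z[i] = s * b.r0[i] * (y[i] + y[h + i]);
        z[h + i] = s * b.r1[i] * (y[i] - y[h + i]);
    }
    return z;
}
```

In an RBT approach, a system Ax = b is transformed into (Uᵀ A V) y = Uᵀ b with random butterflies U and V, and the solution is recovered as x = V y. Deeper recursive butterflies nest this same construction.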

At the architectural level, I integrated GPU kernels into the sparse solver to improve its performance. The resulting hybrid CPU/GPU solver reduced the execution time by about 25% to 30%.

I also studied the resilience of pARMS to soft errors by evaluating the negative effects of two soft-fault models on the convergence of the solver. The experiments were carried out on a supercomputer using 400 cores, on a 2D elliptic problem of size up to 64×10⁶.
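Soft faults are usually simulated by corrupting solver data in software. The following sketch shows two models commonly used for this purpose: a single bit flip in an IEEE 754 double, and a small multiplicative perturbation. Both are assumed for illustration only, and they are not necessarily the two models evaluated in the work above. The names `flip_bit`, `perturb` and `inject_bit_flip` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Bit-flip model (assumed): flip bit `bit` of the IEEE 754 binary64
// representation of x (bit 0 = lowest mantissa bit, 52..62 = exponent,
// 63 = sign). memcpy avoids undefined behaviour from type punning.
double flip_bit(double x, unsigned bit) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    u ^= std::uint64_t{1} << bit;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Perturbation model (assumed): x * (1 + eps * r) with r uniform in [-1, 1).
double perturb(double x, double eps, std::mt19937& gen) {
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    return x * (1.0 + eps * dist(gen));
}

// Hypothetical helper: corrupt one randomly chosen entry of v, standing in
// for a piece of solver data, with the bit-flip model. Returns its index.
std::size_t inject_bit_flip(std::vector<double>& v, unsigned bit, std::mt19937& gen) {
    std::uniform_int_distribution<std::size_t> pick(0, v.size() - 1);
    const std::size_t i = pick(gen);
    v[i] = flip_bit(v[i], bit);
    return i;
}
```

Flipping a high exponent bit can change a value by many orders of magnitude. Flipping a low mantissa bit, by contrast, changes it only in the last digits.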

I used one of these supercomputers (Edison, located at NERSC and ranked 18th in the 2014 TOP500 list) to run performance benchmarks. I am familiar with HPC software such as numerical libraries (LAPACK, MAGMA, SuperLU, etc.) and their characteristics.

During my postdoctoral research, I studied the CMB map-making problem and the PCG solver in the MIDAPACK library, a parallel software tool for high-performance CMB data analysis. I implemented a two-level preconditioner (M2lvl) in the PCG solver and ran numerical tests on Cori (also located at NERSC). These tests compared it with the Block Jacobi preconditioner already provided in the PCG solver, and M2lvl performed better. In general, it reduces the number of iterations required for convergence by a factor of 3 to 5, and the time to convergence by a factor of 2 to 3.
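To make the preconditioner comparison concrete, the following minimal sketch shows a textbook preconditioned conjugate gradient (PCG) loop. The preconditioner enters only through `apply_Minv`, which is why replacing Block Jacobi with a two-level preconditioner changes the iteration count without touching the rest of the solver. This is not MIDAPACK's implementation, and M2lvl is not shown. For brevity, the Block Jacobi here uses fixed 2×2 diagonal blocks of a small dense matrix; this is an assumption, and all names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // small dense SPD matrix, row-major

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Block Jacobi with 2x2 diagonal blocks (n assumed even): z = M^{-1} r,
// where M keeps only the 2x2 blocks on the diagonal of A.
static Vec block_jacobi_2x2(const Mat& A, const Vec& r) {
    Vec z(r.size());
    for (std::size_t k = 0; k + 1 < r.size(); k += 2) {
        const double a = A[k][k], b = A[k][k + 1];
        const double c = A[k + 1][k], d = A[k + 1][k + 1];
        const double det = a * d - b * c;  // > 0 for SPD blocks
        z[k] = (d * r[k] - b * r[k + 1]) / det;
        z[k + 1] = (-c * r[k] + a * r[k + 1]) / det;
    }
    return z;
}

// Preconditioned CG; stops when ||r|| <= tol * ||b||. Returns iterations used.
int pcg(const Mat& A, const Vec& b, Vec& x,
        const std::function<Vec(const Vec&)>& apply_Minv,
        double tol, int max_iter) {
    Vec r = b;
    const Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= Ax[i];
    Vec z = apply_Minv(r);
    Vec p = z;
    double rz = dot(r, z);
    const double bnorm = std::sqrt(dot(b, b));
    for (int it = 0; it < max_iter; ++it) {
        if (std::sqrt(dot(r, r)) <= tol * bnorm) return it;
        const Vec Ap = matvec(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        z = apply_Minv(r);  // the only place the preconditioner is used
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
    }
    return max_iter;
}
```

Passing an identity preconditioner (`[](const Vec& r) { return r; }`) reduces the same function to plain CG, which gives a baseline for iteration counts.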

Recent Interests

High Performance Computing (HPC)
Linear algebra libraries (solvers and preconditioners)
Parallel programming (MPI, OpenMP)
GPU computing (CUDA)
CMB data analysis
CMB map-making problem

Publications

MAPPRAISER: A massively parallel map-making framework for multi-kilo pixel CMB experiments
Hamza El Bouhargani, Aygul Jamal, Dominic Beck, Josquin Errard, Laura Grigori and Radek Stompor
Astronomy and Computing, Volume 39, April 2022
MIDAPack project on GitHub

A parallel iterative solver for large sparse linear systems enhanced with randomization and GPU accelerator, and its resilience to soft errors
Ph.D. Thesis, University of Paris-Saclay, France, September 2017

A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES
Evan Coleman, Aygul Jamal, Marc Baboulin, Amal Khabou and Masha Sosonkina
12th International Conference on Parallel Processing and Applied Mathematics, Lublin, Poland, September 2017

A Hybrid CPU/GPU Approach for the Parallel Algebraic Recursive Multilevel Solver pARMS
Aygul Jamal, Marc Baboulin, Amal Khabou and Masha Sosonkina
18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania, September 2016

Using Random Butterfly Transformation in parallel Schur complement-based preconditioning
Aygul Jamal, Marc Baboulin and Masha Sosonkina
Federated Conference on Computer Science and Information Systems, Lodz, Poland, September 2015

Using random butterfly transformations in iterative methods for sparse linear systems
Master's degree internship report, Université Paris-Sud, Orsay, September 2014

Teaching

Polytech Paris-Sud

Architectures matérielles et parallèles (Hardware and Parallel Architectures, ET4)
→ Source code for lab session TP3: tp3.tar.gz
→ Source code for lab session TP6: tp6.tar.gz
→ Source code for lab session TP7: tp7.tar.gz

→ sum_bench.cpp : sum_bench.cpp

Introduction aux bases de données (Introduction to Databases, PEIP2)

Université Paris-Sud

Programmation impérative (Imperative Programming, L1)

Programmation impérative avancée (Advanced Imperative Programming, L1)

Hobbies

Robots, Drones