"Equal work results in equal execution time" is an assumption that has fundamentally driven design and implementation of parallel applications for decades. However, increasing hardware variability on current architectures (e.g., through Turbo Boost, dynamic voltage and frequency scaling or thermal effects) necessitate a revision of this assumption. Expecting an increase of these effects on future (exascale-)systems, in this paper, we present reactive work stealing across nodes on distributed memory machines using only MPI and OpenMP. We develop a novel distributed work stealing concept that - based on on-line performance monitoring - selectively steals and remotely executes tasks across MPI boundaries. This concept has been implemented in the parallel adaptive mesh refinement (AMR) framework sam(oa) 2 for OpenMP tasks of traversing a grid section. Corresponding performance measurements in the presence of enforced CPU clock frequency imbalances demonstrate that a state-of-the-art cost-based (chains-on-chains partitioning) load balancing mechanism is insufficient and can even degrade performance while distributed work stealing successfully mitigates the frequency-induced imbalances. Furthermore, our results indicate that our approach is also suitable for load balancing work-induced imbalances in a realistic AMR test case.