"Equal work results in equal execution time" is an assumption that has fundamentally driven design and implementation of parallel applications for decades. However, increasing hardware variability on current architectures (e.g., through Turbo Boost, dynamic voltage and frequency scaling or thermal effects) necessitate a revision of this assumption. Expecting an increase of these effects on future (exascale-)systems, in this paper, we present reactive work stealing across nodes on distributed memory machines using only MPI and OpenMP. We develop a novel distributed work stealing concept that - based on on-line performance monitoring - selectively steals and remotely executes tasks across MPI boundaries. This concept has been implemented in the parallel adaptive mesh refinement (AMR) framework sam(oa) 2 for OpenMP tasks of traversing a grid section. Corresponding performance measurements in the presence of enforced CPU clock frequency imbalances demonstrate that a state-of-the-art cost-based (chains-on-chains partitioning) load balancing mechanism is insufficient and can even degrade performance while distributed work stealing successfully mitigates the frequency-induced imbalances. Furthermore, our results indicate that our approach is also suitable for load balancing work-induced imbalances in a realistic AMR test case.