As the world transitions to intermittent renewable energy sources such as wind and solar in response to global warming, it becomes beneficial to consume electricity when renewable energy is available. This applies across sectors, but aligning computing with low-carbon energy generation is particularly critical. The ICT infrastructure needed for computing – including data centres, devices, and networks – consumes massive amounts of energy, already rivalling aviation at an estimated 2-3% of global consumption. Around a third of this can be attributed to cloud data centres, and more workloads are expected to migrate to the cloud in the future.
Clouds provide uniquely flexible compute environments, particularly for scalable batch processing applications such as data analytics, machine learning, and scientific workflows. The elasticity of these environments and applications allows computational loads to be shaped to closely match the availability of low-carbon energy. That is, applications can be scheduled and scaled so that they consume most of their cloud resources when the carbon intensity of energy grids is lowest.
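To make this idea concrete, the following minimal sketch shows one simple form of such temporal shifting: given an hourly carbon-intensity forecast, a delay-tolerant batch job is started in the window with the lowest average intensity that still meets its deadline. The function name, the hourly granularity, and the example forecast are illustrative assumptions, not part of Casper's design.

```python
from typing import Sequence

def pick_start_hour(forecast_gco2_per_kwh: Sequence[float],
                    job_hours: int,
                    deadline_hour: int) -> int:
    """Return the start hour (index into the forecast) that minimises the
    average grid carbon intensity over the job's runtime while still
    finishing before the deadline. Purely illustrative."""
    latest_start = min(deadline_hour, len(forecast_gco2_per_kwh)) - job_hours
    candidates = range(latest_start + 1)
    return min(candidates,
               key=lambda s: sum(forecast_gco2_per_kwh[s:s + job_hours]))

# Example: a 3-hour batch job that must finish within 12 hours is shifted
# into the hours where the forecast intensity (gCO2/kWh) is lowest.
forecast = [450, 430, 410, 380, 300, 220, 180, 170, 190, 260, 340, 420]
print(pick_start_hour(forecast, job_hours=3, deadline_hour=12))  # -> 6
```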
Carbon-aware cloud computing has been proposed repeatedly over the last few years, and the emission savings are expected to be substantial. However, the potential has so far not been realised beyond simulations, laboratory experiments, and specific internal applications. Much of the work to date ignores the need for fine-grained insights into application performance – such as the runtimes and scalability of individual computational steps, or the overhead of pausing applications while carbon intensity is high. In addition, the basic idea has been criticised for ignoring the risk of further increasing peak cloud loads, with negative economic and environmental consequences.
Addressing these challenges, Casper aims to develop the first Stepwise Performance-, Interruption-, Resource-, Carbon-Aware Scheduler (SPIRCS) for large scalable batch data processing applications in elastic compute clusters. For this, we will research how the availability of spare resources in public and private clouds can be optimally anticipated to reduce the footprint of delay-tolerant applications (WP1). We will furthermore develop fine-grained performance models that accurately capture how individual processing steps of large-scale batch data processing applications can be executed, paused, and resumed to maximise cloud usage when low-carbon energy is available (WP2). Finally, we will implement these results in one integrated SPIRCS system for data analytics and workflow applications (WP3) and will evaluate its effectiveness in use cases covering both public and private cloud services (WP4).
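As a rough illustration of the kind of decision a stepwise, interruption-aware scheduler has to make, the sketch below defers an individual processing step only when spare capacity is expected later and the forecast carbon savings outweigh the energy overhead of pausing and resuming. The data model, the decision rule, and all names are simplifying assumptions for illustration, not Casper's actual scheduling policy.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One processing step of a batch application (illustrative model)."""
    runtime_h: float          # predicted runtime in hours
    energy_kwh: float         # predicted energy use of the step
    pause_resume_kwh: float   # overhead of checkpointing and restoring state

def should_defer(step: Step,
                 intensity_now: float,      # gCO2/kWh right now
                 intensity_later: float,    # forecast gCO2/kWh in the low-carbon window
                 spare_capacity_later: bool) -> bool:
    """Defer a step only if spare resources are expected later and the carbon
    saved by waiting exceeds the carbon cost of pausing and resuming."""
    if not spare_capacity_later:
        return False
    emissions_now = step.energy_kwh * intensity_now
    emissions_later = (step.energy_kwh * intensity_later
                       + step.pause_resume_kwh * intensity_now)
    return emissions_later < emissions_now

step = Step(runtime_h=2.0, energy_kwh=40.0, pause_resume_kwh=3.0)
print(should_defer(step, intensity_now=420.0, intensity_later=180.0,
                   spare_capacity_later=True))  # True: the saving outweighs the overhead
```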
Our objectives are to develop the methods needed for reducing the emissions of large scalable batch data processing applications in clouds by 10-25% (Objective A), to implement a prototype for widely used open-source cluster computing frameworks (Objective B), and to evaluate the prototype on top of public and private cloud services (Objective C). In addition, we will strive to raise awareness of the carbon footprints of cloud applications and to support overall demand-side management, for example by sharing insights into how spare cloud capacity aligns with low-carbon energy availability.
To ensure that Casper is set up to make pioneering contributions beyond the confines of a university laboratory, we will actively engage with our partners, who represent two different routes to real-world impact: AWS and the BBC bring in the viewpoints and expertise of a major public cloud provider and a major cloud user, respectively, while HU Berlin does so for a private cloud managed and used by a major research organisation.