<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:contributor>Tom Augspurger</dc:contributor>
  <dc:contributor>Anderson Banihirwe</dc:contributor>
  <dc:contributor>Charles C. Blackmon-Luca</dc:contributor>
  <dc:contributor>Timothy Crone</dc:contributor>
  <dc:contributor>Chelle Gentemann</dc:contributor>
  <dc:contributor>Joseph Hamman</dc:contributor>
  <dc:contributor>Naomi Henderson</dc:contributor>
  <dc:contributor>Chiara Lepore</dc:contributor>
  <dc:contributor>Theo McCaie</dc:contributor>
  <dc:contributor>Niall Robinson</dc:contributor>
  <dc:contributor>Richard P. Signell</dc:contributor>
  <dc:creator>Ryan Abernathey</dc:creator>
  <dc:date>2021</dc:date>
  <dc:description>&lt;div class="abstract-text row"&gt;&lt;div class="col-12"&gt;&lt;div class="u-mb-1"&gt;&lt;div&gt;Scientific data have traditionally been distributed via downloads from data server to local computer. This way of working suffers from limitations as scientific datasets grow toward the petabyte scale. A “cloud-native data repository,” as defined in this article, offers several advantages over traditional data repositories—performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</dc:description>
  <dc:format>application/pdf</dc:format>
  <dc:identifier>10.1109/MCSE.2021.3059437</dc:identifier>
  <dc:language>en</dc:language>
  <dc:publisher>IEEE</dc:publisher>
  <dc:title>Cloud-native repositories for big scientific data</dc:title>
  <dc:type>article</dc:type>
</oai_dc:dc>