<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:contributor>Aaron M. Jubb</dc:contributor>
  <dc:contributor>Samuel Saxe</dc:contributor>
  <dc:contributor>Emil D. Attanasi</dc:contributor>
  <dc:contributor>Alexei Milkov</dc:contributor>
  <dc:contributor>Mark A. Engle</dc:contributor>
  <dc:contributor>Philip A. Freeman</dc:contributor>
  <dc:contributor>Christopher Shaffer</dc:contributor>
  <dc:contributor>Madalyn S. Blondes</dc:contributor>
  <dc:creator>Jenna L. Shelton</dc:creator>
  <dc:date>2021</dc:date>
  <dc:description>&lt;p&gt;&lt;span&gt;Understanding the geochemistry of waters produced during petroleum extraction is essential to informing the best treatment and reuse options, which can potentially be optimized for a given geologic basin. Here, we used the US Geological Survey’s National Produced Waters Geochemical Database (PWGD) to determine whether major ion chemistry could be used to accurately assign a produced water sample to a geologic basin based on similarities to a training dataset. Two datasets were derived from the PWGD: one with seven features but more samples (PWGD7), and another with nine features but fewer samples (PWGD9). The seven-feature dataset, prior to being randomly split into training and testing (i.e., validation) datasets, contained 58,541 samples from 20 basins and was classified based on total dissolved solids (TDS), bicarbonate (HCO&lt;/span&gt;&lt;sub&gt;3&lt;/sub&gt;&lt;span&gt;), Ca, Na, Cl, Mg, and sulfate (SO&lt;/span&gt;&lt;sub&gt;4&lt;/sub&gt;&lt;span&gt;). The nine-feature dataset, prior to being randomly split into training and testing (i.e., validation) datasets, contained 33,271 samples from 19 basins and was classified based on TDS, HCO&lt;/span&gt;&lt;sub&gt;3&lt;/sub&gt;&lt;span&gt;, Ca, Na, Cl, Mg, SO&lt;/span&gt;&lt;sub&gt;4&lt;/sub&gt;&lt;span&gt;, pH, and specific gravity. Three supervised machine learning algorithms (Random Forest, k-Nearest Neighbors, and Naïve Bayes) were used to develop multi-class classification models to predict a basin of origin for produced waters using major ion chemistry. After training, the models were tested on three different datasets: Validation7, Validation9, and one based on data absent from the PWGD. Prediction accuracies across the models ranged from 23.5 to 73.5% when tested on the two PWGD-based datasets. A model using the Random Forest algorithm was the most accurate of all models tested. The models generally predicted basin of origin more accurately on the PWGD7-based dataset than on the PWGD9-based dataset. An additional dataset, which contained data not in the PWGD, was used to test the most accurate model; results suggest that some basins may lack geochemical diversity or may not be well described, while others may be geochemically diverse or well described. A compelling result of this work is that the basin of origin of a produced water can be determined using major ions alone and, therefore, deep basinal fluid compositions may not be as variable within a given basin as previously thought. Applications include predicting the geochemistry of produced fluid prior to drilling at different intervals and assigning historical produced water data to a producing basin.&lt;/span&gt;&lt;/p&gt;</dc:description>
  <dc:format>application/pdf</dc:format>
  <dc:identifier>10.1007/s11053-021-09949-8</dc:identifier>
  <dc:language>en</dc:language>
  <dc:publisher>Springer Link</dc:publisher>
  <dc:title>Machine learning can assign geologic basin to produced water samples using major ion geochemistry</dc:title>
  <dc:type>article</dc:type>
</oai_dc:dc>