Integration of historic groundwater data into the Continent Scale Geochemistry initiative

ABSTRACT The groundwater dataset developed by Angela Giblin comprises over 5000 samples across much of Australia and is a useful contribution to the mapping of groundwater chemistry across Australia (the ‘Continental Scale Hydrogeochemistry’ initiative). Sampling and analytical methodology used by Giblin differed from many other protocols. In particular, water samples were not filtered in the field, or indeed prior to analysis. However, given the number of samples and their potential utility, it would be advantageous to modify data from these samples so that they can be readily integrated with other studies. The combined data were initially sorted to ensure consistent detection limits. Data that did not pass a thorough QA/QC assessment were rejected. Correction, and where necessary removal, of data (owing to the analytical artefacts from solids contamination during analysis) was applied. The degree of contamination in bailed samples, and what data therefore to reject, is calculated using an algorithm developed and tested for Western Australian groundwaters. Geochemical changes between sampling and delayed analysis (days or weeks later) cannot be directly quantified. Other research predicted which elements should be most affected by this. This was further tested by comparing overlapping results from Giblin with more recent data from the same sites in Western Australia and Queensland. Results showed good agreement for salinities and major ions, and for the ratio indices (K:Na, etc.) determined from such data. Saturation Index calculations for sulfates gypsum, celestine and barite also closely matched between differing datasets. There was good agreement between datasets for F, HCO3, Si, V, As, Mo, Ba and U, and moderate agreement for Li, Cr and Au. 
Weak agreement for pH meant that saturation indices for carbonate minerals such as calcite, dolomite, magnesite, siderite and rhodochrosite did not align, and there was very poor agreement for P, Co, Ni, Cu, Zn, Pb, Mn and Fe. However, in general, these comparisons indicate groundwater to be a robust geochemical medium. Based on this study, these modified data should now be readily usable and 'seamlessly' comparable with other datasets. Combining data, across varying sources if necessary, allows hydrogeochemistry to be used to map geology, alteration, prospectivity and geomorphological factors from mine scale to the size of Australia.


Introduction
As the culmination of over 40 years of hydrogeochemical research, CSIRO is developing a Hydrogeochemical Atlas of Australia as a component of the 'Continental Scale Hydrogeochemistry' initiative (Figure 1). The atlas aims to integrate useful groundwater chemical data from varying sources. Specific expected outcomes are: groundwater chemistry as an input into geological modelling; discrimination of basin and crystalline rock aquifers, and the capacity to create mixing models; geochemical 'background' for mineral prospecting using groundwater; and establishment of environmental background hydrogeochemistry.
A critical input to this compilation will be the dataset compiled by the late Ms Giblin of CSIRO. Over a long and productive career, Angela Giblin and co-workers (including this author) carried out extensive groundwater sampling across Australia (e.g., Deutscher, Mann, & Giblin, 1980; Giblin, 1996; Giblin & Dickson, 1984; Giblin & Mazzucchelli, 1996; Pirlo & Gibson, 2004; Figure 1), including investigations into isotope geochemistry (Giblin & Dickson, 1997). She worked closely with companies, consultants and geological surveys, and a summary of her work was published but is not widely known (Giblin, 2001). Ms Giblin sadly passed away in 2011. This paper, and the accompanying data access, aims to enable the Giblin dataset to be more widely circulated and combined with other compilations.
In any data compilation, the sample media and analytical data need to be readily comparable. Giblin (2001) did not filter waters in the field or prior to analysis; instead, the sampling protocols were developed as fit for purpose for the mineral exploration industry. Other groundwater experts (such as groups within the US EPA) have also encouraged the use of unfiltered waters for monitoring contaminants in specified circumstances (Vail, 2013). Regardless of the viewpoint on this debate, if these data are to be incorporated with results from filtered samples within the Continental Scale database, then the differences in final results caused by these differing methods need to be understood. Based on such comparisons, the Giblin dataset is being adjusted to be consistent with other datasets based on filtered samples. These data will be publicly available (e.g., https://data.csiro.au/dap/search?q=Hydrogeochemistry).
Another issue for both Giblin's and Gray's research within CSIRO is that of bailed vs pumping water samples (Gray, Noble, Reid, Sutton, & Pirlo, 2016). Therefore, understanding the issues of contamination and influences on analytical results is essential.
By combining the Giblin data into one file, with consistent detection limits, correcting for sampling and analytical differences, and comparing with recent data, these data should now be readily usable and 'seamlessly' comparable. The results shown here also give strong support for the robustness of hydrogeochemistry with respect to most dissolved elements, for different times of sampling and differing methods. Finally, we will indicate how combining data across many sources allows hydrogeochemistry to be used to map geology, alteration, prospectivity and geomorphological factors from mine scale to the size of Australia. This move towards hydrogeochemical mapping of Australia has been initiated by the author as 'Continental Scale Hydrogeochemistry' (Figure 1).

Sample collection and analysis protocols
Sample collections were varied (e.g., Giblin, 1996, 2001) but dominantly came from exploration bores and farm wells and bores. Bailed samples were collected using a flow-through tube sampler that closed at the selected sample depth. If possible, groundwater was sampled at least 5 m below the water-table.
Three samples were collected at each site:
1. a 1 litre sample for Au, made alkaline with lime, with 1.2 g of NaCN or KCN added, followed by a 1 g sachet of activated charcoal; after adsorption, the Au from the 1 litre sample is contained in 1 g of charcoal, giving a 1000-fold increase in concentration for subsequent analysis;
2. a 500 mL sample for other analyses;
3. a small sub-sample for field measurements, commonly conductivity, pH, Eh, Fe2+ and dissolved O2.
Figure 1 (caption partly recovered). Giblin (2001) samples, along with more recent sampling by CSIRO and Geological Surveys (Forbes et al., 2013; Gray, Reid, Dick, & Flitcroft, 2012, 2015, 2016), Geoscience Australia (GA) Curnamona (de Caritat et al., 2005) and Great Artesian Basin (GAB) sampling (Radke et al., 2000), and data downloaded and processed from State Government datasets (Gray & Bardwell, 2016a-c).
The 500 mL sample was returned to the North Ryde laboratories where, commonly, a portion was left for solids to settle, or centrifuged if necessary to obtain a clear extract. Various reports (e.g., Giblin, 1996) indicate that this extract was acidified with high-purity concentrated HNO3. In some reports, this step is not described, but it is possible this was still conducted. Filtering was not a part of the sample preparation.
Unacidified portions were analysed for Cl and SO4 by ion chromatography, total CO3 with a thermal conductivity analyser, F with an ion-selective electrode and, for some projects, As with a hydride generator system.
Commonly, a standard was included as every 10th sample, and a spiked sample as every 11th sample.
The charcoal sachets were washed with distilled water, dried and sent to Becquerel Laboratories for Neutron Activation Analysis for Au.
The combined dataset included 5543 samples, which was reduced to 5086 in the final processed data.

Initial meta-analysis
Data archives were checked, and where reports were found, they were checked for information on sample treatment and analysis protocols.
The data include several decades of sampling, and it is critical to remove issues related to differing detection limits. Table 1 lists the selected element detection limits in the corrected dataset. For some earlier datasets, detection limits for some elements were higher than these selected limits. In those cases, the affected element suites were deleted from the datasets concerned.

Removal of spurious data
Groundwater pH values less than 2.5 and greater than 11.5 were removed, as previous research (Gray, 2001) suggests such values to be potentially erroneous and/or indicating altered zones (e.g. acid mine drainage). The Lake Tyrrell data were not used: these were a specific series of seeps and surface waters, as well as groundwaters, and had high detection limits for most elements. The TASAR river samples were removed, as these are not groundwaters. Any other samples described as 'seeps' and 'sumps' were removed.
A common test of analytical accuracy is the electrical balance, which checks whether the calculated sum of positive charges equals the sum of negative charges. For samples with all major-element data, the electrical balance (Bal) is calculated according to:
Bal = [equation not recoverable from the source text]
In solution, cation charge will match anion charge, and the balance will be zero. Unless there are significantly high levels of other elements such as Al or Fe, a positive deviation indicates that one or more cation values are erroneously high, and/or an anion value is erroneously low. Similarly, a negative balance indicates cation(s) being under-determined and/or erroneously high anion(s). For saline samples, a Bal value between -0.05 and 0.05 is deemed acceptable, with higher errors acceptable in fresher waters owing to other solutes becoming important. This is indicated by the red lines in Figure 2. Balances derived from the original data (small circles in Figure 2) indicate a small proportion of samples with unacceptable errors, and each of these samples was investigated to see if particular values were erroneous and could be taken out of the dataset, or whether the entire data point needed to be removed.
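The balance test can be sketched as follows. The normalised form used here, (sum of cations - sum of anions) / (sum of cations + sum of anions) in charge equivalents, is a common convention and is an assumption, as is the list of ions; it was chosen only because it agrees with the sign convention and the -0.05 to 0.05 acceptance range described above.

```python
# Charge and molar mass (g/mol) of the major ions.
# ASSUMPTION: the ion list and the normalised form of Bal are illustrative,
# not taken from the original equation.
IONS = {
    "Na": (1, 22.990), "K": (1, 39.098), "Mg": (2, 24.305), "Ca": (2, 40.078),
    "Cl": (-1, 35.453), "SO4": (-2, 96.06), "HCO3": (-1, 61.017),
}


def electrical_balance(sample_mg_per_l):
    """Return (cations - anions) / (cations + anions) in charge equivalents."""
    cations = 0.0
    anions = 0.0
    for ion, (charge, molar_mass) in IONS.items():
        # mg/L -> mmol/L -> milli-equivalents per litre
        meq = sample_mg_per_l[ion] / molar_mass * abs(charge)
        if charge > 0:
            cations += meq
        else:
            anions += meq
    return (cations - anions) / (cations + anions)


def balance_acceptable(bal, limit=0.05):
    """Acceptance window quoted in the text for saline samples."""
    return abs(bal) <= limit
```

A positive result flags a cation excess and a negative result an anion excess, matching the interpretation given in the text.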
Of particular note is the 'RU' sample set, which shows a systematic cation excess. When checked, measured Na concentrations appeared higher than would be expected from the field conductivity and relative to other cations, compared with other groundwaters. It is probable there was an analytical error with regard to Na, and so the Na data were excised from the 'RU' samples.
Similar tests were done on all erroneous samples. In some samples, there appeared to be transcription errors for a particular ion value. If that particular, potentially erroneous, value was then changed by a factor of 10, the new calculated balance was close to zero. In these cases, that particular value [...] Figure 2 represent the final results after error checking.
Alkalinity was checked against pH. Alkalinity will be below detection when pH is below 4.5, and generally detectable when pH > 5. If one parameter was obviously wrong, that value was removed. If it was not obvious which parameter was incorrect, both the pH and alkalinity values were removed. Additionally, high alkalinity values (>150 mg/L HCO3 equivalent) for pH < 5.3 are not thermodynamically possible, and in these cases, the alkalinity data were removed.
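The alkalinity versus pH screening rules can be expressed as a small classifier. This is a minimal sketch: the function name, the return labels and the default detection limit of 1 mg/L are hypothetical, and the 'inconsistent' outcome stands in for the manual judgement of which value to remove.

```python
def check_alkalinity(ph, alkalinity, detection_limit=1.0):
    """Classify a pH/alkalinity pair (alkalinity in mg/L HCO3 equivalent).

    ASSUMPTION: detection_limit is a placeholder, not a value from the text.
    """
    if ph is None or alkalinity is None:
        return "not_checked"
    # High alkalinity at pH < 5.3 is not thermodynamically possible.
    if ph < 5.3 and alkalinity > 150:
        return "remove_alkalinity"
    # Alkalinity should be below detection when pH is below 4.5.
    if ph < 4.5 and alkalinity >= detection_limit:
        return "inconsistent"
    return "ok"
```

Pairs returned as 'inconsistent' would still need inspection to decide whether the pH, the alkalinity, or both should be removed.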
The 'JA' sample set commonly had Si values greater than 50, possibly because these are measured as SiO 2 rather than Si. This could not be confirmed, so all Si values were removed for that sample set.
Dissolved Ga data were removed, as Ga commonly has an analytical interference from Ba using ICP-MS. Any other observed erroneous values were removed. For example, one sample had a pH of 3.5 and an Eh of -350 mV. This appears highly unlikely (i.e. water would be unstable relative to H2) and is possibly a transcription error; both the pH and Eh values were removed.
Note that these changes represent a small proportion (<2%) of the data.
Salinity is measured as total dissolved solids (TDS) and would generally be calculated as the sum of all major ions (i.e. first formula below; the 0.49 × factor for HCO3 is the ratio of molecular weights, accounting for two HCO3 converting to one CO3 during drying). However, in a few cases, not all major-element data were available. To deal with this, the calculated TDS was plotted against other partial data (e.g., Cl + SO4 + HCO3) for the samples with complete major-element data. The best linear correlation was for (Cl + SO4 + HCO3) vs TDS, so where cation data were incomplete, TDS was calculated as (Cl + SO4 + HCO3) / 0.635, and so on. Therefore, salinity was determined, in order (number of samples calculated in parentheses), according to: [...]
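The ordered fallback can be sketched as below. Only the 0.49 factor for HCO3 and the 0.635 divisor for the anion sum come from the text; the exact list of ions in the full sum is an assumption, and the further fallbacks implied by 'and so on' are not reproduced.

```python
MAJOR_IONS = ("Na", "K", "Mg", "Ca", "Cl", "SO4", "HCO3")  # ASSUMED ion list
ANIONS = ("Cl", "SO4", "HCO3")


def estimate_tds(sample):
    """Return (tds_mg_per_l, method) for a dict of concentrations in mg/L.

    Missing values are represented by absent keys or None.
    """
    if all(sample.get(ion) is not None for ion in MAJOR_IONS):
        tds = (sample["Na"] + sample["K"] + sample["Mg"] + sample["Ca"]
               + sample["Cl"] + sample["SO4"] + 0.49 * sample["HCO3"])
        return tds, "sum_of_major_ions"
    if all(sample.get(ion) is not None for ion in ANIONS):
        # Regression-based estimate quoted in the text.
        tds = (sample["Cl"] + sample["SO4"] + sample["HCO3"]) / 0.635
        return tds, "anion_sum"
    return None, "insufficient_data"
```

Returning the method alongside the value keeps a record of which samples carry a regression-based rather than a summed salinity.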

Effect of suspended solids in analyses
Groundwater samples were not filtered prior to analysis, leaving the possibility of analytical contamination from suspended solids. Below pH 4.5, Al concentrations are expected to be above detection and commonly increase with acidity (red circles in Figure 3), as previously observed (Gray, 2001). However, sporadic high to very high 'dissolved' Al concentrations are also observed for neutral groundwaters (blue squares in Figure 3). These high Al results correlate with other high 'dissolved' values for elements such as Ti (Figure 4). Titanium is expected to have very low dissolved concentrations across the pH range observed, and indeed the 'normal' high-Al acidic groundwaters have dissolved Ti below detection (red dots in Figure 4). In contrast, for the circum-neutral samples, high 'dissolved' Ti correlates with Al, and the Ti:Al ratio matches results for surficial Australian materials (Figure 4; Cornelius, Robertson, Cornelius, & Morris, 2008; GSWA, 2014). Thus, neutral samples with high dissolved Al and Ti are postulated as representing solids contamination in the analysis (blue squares in Figure 4). Potential sources for such contamination are varied, including minerals from the aquifer, surficial soil falling into the bore or well during sampling, or colloidal materials, which can have similar Al:Si and Al:Fe ratios (Taylor, 1988) to those observed for these data (Figure 5).
Analogous phenomena are observed for Si (Figure 5). In surficial groundwater, dissolved Si can reach a maximum concentration of approximately 53 mg/L, at which stage water is saturated with respect to amorphous silica (Gray, 2001). High-Al samples designated as high solid contamination samples (Figure 4) also had anomalous 'dissolved' Si levels (up to 320 mg/L; Figure 5). The slope of Si vs Al (solid line in Figure 5) corresponds approximately to a molar ratio of 1:1, as would be observed in clay minerals such as kaolinite [Al2Si2O5(OH)4] or for particular Al-Si colloids (Taylor, 1988). Deviation from this trend could be due to other phases that are either Al- or Si-rich.
Based on this hypothesis, a correction factor for dissolved Si (Figure 6) in non-acid samples was derived: [...] where Si-c is 'corrected' Si, Si-m is measured Si, and Al-m is measured Al. Because of various 'noise' factors such as variation in the contaminating solid phase, it was considered that this correction would not be sufficiently accurate at higher degrees of solid contamination, and Si data were removed for [Al-m] > 10 mg/L (solid vertical line in Figure 6). Given a median [Si-c] of 13.7 mg/L (Table 2), the changes to the remaining Si data were minor.
Measured Fe had a smaller response to solids contamination (Figure 7), with a correction of: [...] Fe data were removed for [Al-m] > 5 mg/L (solid vertical line in Figure 6). The highest Fe correction (for [Al-m] = 5 mg/L) is 2.5 mg/L. Thus, for these samples, if [Fe-c] is greater than 1 mg/L, data should still be usable. However, given the [...] (Table 2), effects on much of the lower concentration Fe data will be critical. Most trace elements seemed to have only minor solid contamination issues, with the strongest effect observed for V (Figure 8; Table 2): [...] At the cutoff of [Al-m] = 5 mg/L, this correction would be 0.006 mg/L of V. Given a V detection limit of 0.005 mg/L, this effect will be significant, but dissolved V data, albeit with lower precision, are still considered to be useful.
The solids contamination corrections are summarised in Table 2. Note that the magnitude of the correction, once data were culled, is commonly insignificant to minor. Lead and V show stronger effects, but data are potentially usable with caution. Dissolved Fe will be problematic below 0.1 mg/L. Dissolved Ti is not usable.
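The cull-then-correct logic can be illustrated with a short sketch. The slopes below are not the published correction equations: they are back-calculated from the stated maximum corrections (2.5 mg/L Fe and 0.006 mg/L V at [Al-m] = 5 mg/L) under the assumption that each correction is linear in measured Al and passes through the origin. The function and constant names are hypothetical.

```python
# ASSUMPTION: linear-through-origin corrections inferred from the quoted
# maximum corrections at the Al cutoff; not the published equations.
FE_SLOPE = 2.5 / 5.0      # mg/L Fe per mg/L Al
V_SLOPE = 0.006 / 5.0     # mg/L V per mg/L Al
FE_AL_CUTOFF = 5.0        # mg/L Al, from the text
V_AL_CUTOFF = 5.0         # mg/L Al, from the text
SI_AL_CUTOFF = 10.0       # mg/L Al, from the text (Si slope not recoverable)


def correct_for_solids(measured, al_measured, slope, al_cutoff):
    """Return the corrected concentration, or None when the value is culled.

    Intended for non-acid samples only; acidic waters carry genuinely
    dissolved Al and should not be corrected this way.
    """
    if al_measured > al_cutoff:
        return None  # too contaminated for the correction to be reliable
    return measured - slope * al_measured
```

For example, a non-acid sample with 3.0 mg/L measured Fe and 2.0 mg/L measured Al would be corrected to 2.0 mg/L Fe under these assumed slopes, while the same sample with 6.0 mg/L Al would have its Fe value removed.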

Element solubility changes in unfiltered samples
In-solution effects of not filtering are more difficult to determine. In normal groundwater sampling, waters are filtered (commonly to <0.45 µm) as quickly as possible and then acidified, so as to give as accurate a determination as possible for the element concentration in the groundwater prior to interaction with the atmosphere. The lack of filtering for the Giblin samples potentially creates changes between sampling and analysis. This is because secondary phases can form as the water equilibrates with the atmosphere. For example, the lower CO2 partial pressure of the atmosphere can lead to degassing of CO2: [...] This results in increased pH, and potentially the formation of secondary carbonates: [...] These phases tend to be relatively pure, so they are not expected to have a strong effect on other elements, although they have the potential to reduce dissolved Mg, Ca, Sr or Ba. Based on geochemical modelling, in normal circumstances any effect will be minor, relative to the original dissolved compositions, with the possible exception of Ba. Other minor elements such as F and PO4 could be affected by potential precipitation of fluorite and apatite, respectively.
Additionally, equilibration of groundwater sampled at depth with atmospheric O2 can be a major issue if Fe and/or Mn are present in solution. These elements are generally present as divalent ions, which are precipitated when the groundwater is oxidised.
Oxidation can cause various changes in the water chemistry. Decreased pH could cause carbonates, and possibly other alkaline minerals such as oxides, to dissolve. Base metals (e.g. Cu, Zn, Pb), and other elements associated with Fe and Mn oxide surfaces such as PO4, could be affected in varying ways, depending on initial concentrations, pH and oxidation conditions. These elements could be released from surfaces if pH decreases. Conversely, a proportion could be co-precipitated with secondary Fe and Mn oxides.
This suggests that base metals and PO4 could be strongly affected by leaving suspensions untreated until analysis. Gray (1993) tested filtering effects at Granny Smith, in the NE Yilgarn Craton, Western Australia. Groundwaters include both Fe- and Mn-rich samples, and salinities up to 7% (70 000 mg/L). Salinities and salinity-controlled elements (e.g., Na, K, Mg, Ca, Sr, Cl, SO4) show no differences between treatments. However, when waters were left unfiltered for a week, there was a major decrease (up to 90%) in Fe concentration (Figure 9), presumably owing to oxidation/hydrolysis (Eq 7). Dissolved Al was commonly, but not always, lower in the unfiltered samples (Figure 10), with little variation for Mn.
Base metal concentrations tended to be higher in the unfiltered samples (possibly owing to release from the surface of solids), with the highest effect for Cu (Figure 11), then Zn and Ni, with Co, Pb and Cd showing little difference. The elements Si, Ba, Cs, U, Au and oxy-anions As, Sb and Mo showed little difference between treatments. Other elements, including W, rare earth elements, Ag, Ti, Cr, Hg, Tl, Bi, Th and Ga, were too close to detection to make accurate checks but did not indicate any significant effects. These data are summarised in Table 3.
Figure 9. Dissolved Fe for unfiltered vs filtered groundwaters from Granny Smith (Gray, 1993). The line is for equality between treatments.
Figure 10. Dissolved Al for unfiltered vs filtered groundwaters from Granny Smith (Gray, 1993). The line is for equality between treatments.
Data for filtered vs unfiltered groundwaters from the Atacama Desert (Leybourne & Cameron, 2008; Table 3) indicate a very similar loss of Fe and Al. However, there are significant differences in the behaviour of the base metals between the two sites: base metals are higher in the unfiltered treatments from Granny Smith groundwaters, suggesting release from surfaces of mineral grains and suspended solids, yet lower in the Atacama, suggesting adsorption. These differences are entirely feasible, owing to the potential differing behaviours depending on the pH-Eh of the solution, amounts of Fe and Mn in solution and mineralogy of the suspended solids.
A semi-quantitative estimate of the potential variation from not filtering was obtained by adding the variation for the two studies when they are in opposite directions and using the greater of the two when they are in the same direction (Table 3). The magnitude is Fe > Al = Pb > Cu >> Zn > Mn > Si > Ba > Co > As > V, and less than 20% for other elements listed.
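The combination rule is simple enough to state as a function. This is an illustrative sketch; the function name is hypothetical and the inputs are signed percentage differences (unfiltered relative to filtered) for one element from each of the two studies.

```python
def combined_variation(study_a_pct, study_b_pct):
    """Combine two signed percentage differences into one magnitude.

    Opposite signs: the magnitudes are added.
    Same sign (or one value zero): the larger magnitude is used.
    """
    if study_a_pct * study_b_pct < 0:
        return abs(study_a_pct) + abs(study_b_pct)
    return max(abs(study_a_pct), abs(study_b_pct))
```

With made-up inputs, an element that rose by 40% at one site and fell by 30% at the other would be assigned a potential variation of 70%, whereas losses of 90% and 60% would give 90%.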
In summary, there are two issues in comparing the Giblin data with other 'normal' analyses: the solids contamination for analysis, which we can compensate for; and the in-solution issue of solid-water interaction in unfiltered samples. An assessment of the magnitude of these issues leads to the initial assessments that: all dissolved Fe values below 0.1 mg/L will not be usable; dissolved Al data should not be used in non-acid samples; Pb and Cu should be viewed with caution; Mn and Zn are unlikely to be accurate.
Data to test for F and PO4 were unavailable.

Derived indices
In addition to elemental data, various derived major ion ratio parameters have utility in hydrogeochemical modelling (Gray et al., 2016). These parameters are included with the Giblin data (Supplementary Papers).
Figure 11. Dissolved Cu for unfiltered vs filtered groundwaters from Granny Smith (Gray, 1993). The line is for equality between treatments.
Table 3. Estimated variation for unfiltered vs filtered treatments (as percentage difference) for Granny Smith (Gray, 1993) and Atacama, Chile (Leybourne & Cameron, 2008).

Mineral saturation indices
Solution chemical speciation was determined from solution compositions using the program PHREEQE (Parkhurst, Thorstenson, & Plummer, 1980). This program can then calculate the degree of saturation with respect to specific minerals as Saturation Indices {SI; specifically log10[(ion activity product)/(solubility constant)]}. If the SI for a mineral is within the zero range, the water is in equilibrium with that mineral, under the conditions specified. The zero range is estimated for every mineral based on stoichiometry, thermodynamic accuracy and analytical issues, ranging from -0.2 to 0.2 for major-element minerals such as halite or gypsum to (for example) -1.5 to 1.5 for a complex minor-element mineral such as carnotite (KUO2VO4). Where the SI is below the zero range, the solution is under-saturated with respect to that mineral, so that, if present, the phase may dissolve. If the SI is above the zero range, the solution is over-saturated with respect to this mineral, which could potentially precipitate from solution.
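The interpretation of an SI against its mineral-specific zero range can be sketched as follows. This does not reproduce PHREEQE's speciation calculation: the ion activity product is taken as a given input, and the function names and labels are hypothetical.

```python
import math


def saturation_index(ion_activity_product, solubility_constant):
    """SI = log10(ion activity product / solubility constant)."""
    return math.log10(ion_activity_product / solubility_constant)


def classify_si(si, zero_range=0.2):
    """Classify an SI against a symmetric zero range.

    zero_range is the half-width: 0.2 for major-element minerals such as
    halite or gypsum, up to 1.5 for a mineral such as carnotite.
    """
    if si < -zero_range:
        return "under-saturated"
    if si > zero_range:
        return "over-saturated"
    return "equilibrium"
```

An SI of 1.0 would therefore be read as over-saturated for gypsum (zero range 0.2) but as within equilibrium for carnotite (zero range 1.5).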

Element ratios
Comparing various major elements, some data showed element excess or deficit relative to the ratio observed for sea water. The distance away from the sea water dilution/evaporation line was determined, and provided a numerical measurement of the excess or depletion. This was done for K, Mg, Ca and B with respect to Na, Mg and Sr with respect to Ca, Rb with respect to K, and Br and SO4 (Figure 12) with respect to Cl. At close scales (<1 km; Gray & Noble, 2006), sulfate excess was particularly important for evaluating changes related to weathering sulfide ore bodies in shallow groundwater. At broader sampling (> km spacing), sulfate excess is subdued and more related to faults and other geological structures (Gray et al., 2016).
The other major element indices are strongly controlled by lithology and hydrothermal alteration. For example, Sr relative to Ca is useful in distinguishing basic and acid lithologies (Gray et al., 2016). The derived formulas are listed below (all in mg/L except Rb in µg/L). Note that the ratio used in each equation is the relevant ratio between the two elements in sea water. Two different equations are used for each ratio calculation; the variant for lower salinity (e.g., Na < 500 mg/L for KNaSW, etc.) is derived so as to minimise errors at low values of the denominator.
[...]
The different calculation methods for lower ion concentrations are to minimise skewing of data owing to analytical errors close to detection limits. At higher concentrations, these become a ratio difference, e.g. for Na > 500 mg/L:
KNaSW = 2 means the K/Na sample ratio is 3 × sea water
KNaSW = 1 means the K/Na sample ratio is 2 × sea water
KNaSW = 0 means the K/Na sample ratio is equal to that of sea water
KNaSW = -0.5 means the K/Na sample ratio is half that of sea water
KNaSW = -0.75 means the K/Na sample ratio is one-quarter that of sea water
KNaSW = -0.95 means the K/Na sample ratio is one-twentieth that of sea water
These indices can be considered a form of major-element 'signature'. While care should be taken in interpretation, as a change in ratio can be due to a variety of factors, the variations in these ratios are readily mappable and can indicate geological/geomorphological transformations.
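The worked KNaSW values are consistent with an index of the form (sample ratio / sea water ratio) - 1 at higher concentrations. A minimal sketch of that high-concentration form is given below; the sea water concentrations are approximate, commonly quoted values and are an assumption, as are the function and parameter names. The low-concentration variant is not reproduced.

```python
# ASSUMPTION: approximate sea water concentrations in mg/L.
SEAWATER = {"Na": 10770.0, "K": 399.0}


def seawater_ratio_index(numerator, denominator, seawater_ratio,
                         min_denominator=500.0):
    """Return (sample ratio / sea water ratio) - 1.

    Only the high-concentration form is implemented, so denominators at or
    below min_denominator (mg/L) are rejected rather than guessed at.
    """
    if denominator <= min_denominator:
        raise ValueError("low-concentration variant not implemented")
    return (numerator / denominator) / seawater_ratio - 1.0


def k_na_sw(k_mg_per_l, na_mg_per_l):
    """KNaSW for samples with Na > 500 mg/L."""
    return seawater_ratio_index(k_mg_per_l, na_mg_per_l,
                                SEAWATER["K"] / SEAWATER["Na"])
```

A sample with a K/Na ratio three times that of sea water returns 2, and one with half the sea water ratio returns -0.5, matching the values listed in the text.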

WA and Queensland samples used for comparison
Samples from the Giblin dataset overlapped with two other datasets: the north Yilgarn (Western Australia) groundwater sampling (Gray et al., 2016), with 221 sites sampled in common, which has good meta-data and extensive analytical suites; and north central Queensland, for which there are 103 sites sampled in common with Queensland government water sampling (www.data.qld.gov.au).

Quantifying contamination at the sample site
Groundwater samples from both the Giblin dataset and the Gray et al. (2016) sampling in WA (hereafter denoted as the Gray dataset) involve a mix of pumping bores and/or stagnant wells. Stagnant wells could have organic or anthropogenic contaminants, and this needs to be accounted for in comparisons of data (Gray et al., 2016). For hydrogeochemical sampling, the greatest concerns are additions from the decomposition of organic matter in the water and/or metals from the pipe/bore materials. To determine which samples were contaminated and to what degree, samples were separated based upon whether they were free flowing or bailed, and a contamination value (CV) calculated, according to (Gray et al., 2016):
$$\mathrm{CV} = \mathrm{mean}\left[\log_{10}\frac{\mathrm{P}}{0.277},\ \log_{10}\frac{\mathrm{OC}}{6.94},\ \log_{10}\frac{\mathrm{Fe}}{0.234},\ \log_{10}\frac{\mathrm{Mn}}{0.059},\ \log_{10}\frac{\mathrm{Zn}}{0.546}\right] \quad \text{(all solutes in mg/L)}$$
As described in detail in Gray et al. (2016), the CV was used to characterise samples into contamination factor (CF) classes:
Actively flowing: CF1
CV < -0.8: CF1.8 (uncontaminated)
-0.8 < CV < -0.417: CF2.2 (very slightly contaminated)
-0.417 < CV < -0.1: CF3 (slightly contaminated)
-0.1 < CV < 0.3: CF4 (contaminated)
0.3 < CV: CF5 (highly contaminated)
This relatively simple metric was able to effectively rank contamination effects for Yilgarn groundwaters (Gray et al., 2016). A broad summary of element sensitivity to contamination is given in Table 4.
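The contamination value and its classes can be sketched as below, reading the CV as the mean of five log10 ratios of measured concentration to a reference concentration. The function names are hypothetical, and the treatment of values that fall exactly on a class boundary is an assumption, since the class limits are given as strict inequalities.

```python
import math

# Reference concentrations (mg/L) used to normalise each solute.
REFERENCE = {"P": 0.277, "OC": 6.94, "Fe": 0.234, "Mn": 0.059, "Zn": 0.546}


def contamination_value(sample_mg_per_l):
    """Mean of log10(concentration / reference) over P, OC, Fe, Mn and Zn.

    All five concentrations must be positive; values below detection would
    need to be substituted before calling this function.
    """
    logs = [math.log10(sample_mg_per_l[k] / ref) for k, ref in REFERENCE.items()]
    return sum(logs) / len(logs)


def contamination_class(cv, actively_flowing=False):
    """Map a CV to a CF class; boundary values fall into the higher class."""
    if actively_flowing:
        return "CF1"
    if cv < -0.8:
        return "CF1.8"
    if cv < -0.417:
        return "CF2.2"
    if cv < -0.1:
        return "CF3"
    if cv < 0.3:
        return "CF4"
    return "CF5"
```

A bailed sample whose five solutes all equal the reference concentrations has a CV of 0 and so falls in CF4 (contaminated).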
This understanding is critical for effective comparison of the Giblin and Gray data. In Figure 13, red circles indicate samples that have low CV values and therefore very low contamination for both the Giblin and Gray sampling. Thus, these are bores that were probably actively pumping both times samples were collected. In contrast, purple circles in Figure 13 are samples that had a low CV (i.e. uncontaminated) when sampled for the Giblin dataset but were contaminated when sampled more than 10 years later by Gray et al. (2016). Similarly, the blue circles are samples with a high CV (i.e. contaminated) when sampled for Giblin, but with lower CVs for the Gray sampling. Grey circles are samples that were contaminated in both sampling rounds. This colour coding is used for Figures 14-22.
This contamination coding is important in comparing data. Thus, when uncontaminated (red) and very slightly contaminated (green, yellow and orange) data are considered, there is very good agreement between salinity measured from the Giblin sampling and that for Gray sampling of the same bores (Figure 14). However, when the Gray sample was highly contaminated (purple; Figure 14), TDS tends to be higher than for the same Giblin sample. Similarly, when the Giblin sample was contaminated (blue; Figure 14), the TDS for the Giblin sample tended to be higher than the other subsets. This is possibly caused by evaporation from open stagnant wells and/or release of ions by biological activity. Other major ions related to salinity (Na, K, Mg, Ca, Sr, Cl, SO4) show a similar agreement between datasets, as did dissolved F.
There is poorer agreement between pH values, with a variation of up to 1 pH unit or more. It is highly likely that Giblin and co-workers consistently measured pH in the field, with measurement errors likely to be <0.1 pH unit, suggesting that this variation between sample sets represents temporal and/or sampling effects. In contrast, alkalinity (Figure 15) shows a good correlation between the datasets, although the Giblin data are marginally lower. This may reflect that HCO3 in the Giblin data was commonly determined by C analysis, whereas the comparative Gray data were determined by alkalinity titration. There were significant effects from contamination, with higher HCO3 for contaminated (purple) samples. These comparisons between datasets yield very similar conclusions to statistical comparisons within the northern Yilgarn Craton sample set (Gray et al., 2016). Many minor and trace elements showed good to moderate agreement between datasets (e.g., Figures 16, 17), when allowing for contamination effects, again consistent with previous conclusions (Gray et al., 2016).
Some elements show extreme differences between the Giblin dataset and other sample sets. There was very poor agreement for dissolved Fe between the Giblin and other datasets (Figure 18). This may have various causes, including temporal variation for an element strongly affected by pH, Eh and other factors, and/or the lack of pre-filtering for the Giblin dataset (Table 3). The importance of the pre-filtering issue is supported by the commonly higher field Fe2+ concentration values, relative to laboratory Fe, for the least contaminated samples (i.e. red dots; Figure 18) in the Giblin dataset. Less extreme effects were observed for Mn, although this is further complicated by (expected) effects from contamination (Table 4). Dissolved P also had poor agreement between sample sets, along with major increases in contaminated samples. Base metals Cu (Figure 19), Zn and Pb also showed poor agreement. These effects are expected for these elements on the basis of the previous examination of the implications of not filtering after sampling. That is, those elements showing poor repeatability between sample treatments (Fe, Mn, P, Cu, Zn and Pb) are elements commonly active in surficial environments, highly sensitive to bore/well contamination, and already considered among the least robust for hydrogeochemistry (Gray et al., 2016).
If there are errors in analytical data, this could affect calculations of mineral solubility. Comparisons between datasets of the saturation index for sulfate minerals such as gypsum (CaSO 4 Figure 13 shows colour coding. for uncontaminated waters (red circles in Figure 20). This is consistent with internal testing by the author demonstrating that SI calculations for these minerals are insensitive to errors in pH, although requiring accurate Ca, SO 4 and HCO 3 determinations. In contrast, carbonate minerals such as calcite (CaCO 3 ; Figure 21), dolomite [CaMg(CO 3 ) 2 ] and magnesite (MgCO 3 ) have weaker agreement between sample sets, presumably owing to the poorer correlation for pH measurements.
Additionally, there is good agreement for all of the element ratio indices, although some are sensitive to contamination, as expected (Gray et al., 2016). For example, SO4:Cl index comparisons indicate SO4 loss in highly contaminated samples (purple dots in Figure 22). When only uncontaminated data are used (as recommended by Gray et al., 2016), the agreement between datasets is very good.
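As an illustration of how a ratio index of this kind can be formed, the sketch below expresses the SO4:Cl mass ratio of a sample relative to the seawater ratio on a log scale. This is an assumed, generic formulation and not necessarily the definition of the SO4ClSW index in Gray et al. (2016); the seawater reference values are approximate, and the function name is hypothetical.

```python
import math

# Approximate seawater concentrations in mg/L (assumed reference values).
SEAWATER_MG_L = {"SO4": 2700.0, "Cl": 19350.0}


def seawater_ratio_index(so4_mg_l, cl_mg_l):
    """log10 of the sample SO4:Cl mass ratio divided by the seawater ratio.

    0 means a seawater-like ratio; negative values mean SO4 is depleted
    relative to Cl, positive values mean SO4 is enriched.
    """
    sample_ratio = so4_mg_l / cl_mg_l
    seawater_ratio = SEAWATER_MG_L["SO4"] / SEAWATER_MG_L["Cl"]
    return math.log10(sample_ratio / seawater_ratio)


# Hypothetical sample that has lost 90% of its SO4 relative to seawater.
print(round(seawater_ratio_index(270.0, 19350.0), 2))
```

With an index of this form, SO4 loss in a contaminated sample would show up as a shift to more negative values while Cl stays unchanged.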

Sample set comparisons and sampling sensitivity
Previous research indicated that major elements, and also oxy-anions such as Mo, may be little affected when the groundwater sample is not filtered (Table 3). Elements such as As, Ba and Si show moderate effects, and base metals such as Cu, Pb, Zn and particularly Fe were very sensitive to not filtering. Such indications are in broad agreement with the comparisons between sample sets: good agreement for TDS and majors, and the ratio indices (K:Na, etc.); Saturation Index calculations for sulfates gypsum, celestine and barite also showed close agreement; reasonable agreement for F, HCO3, Si, V, As, Mo, Ba and U; moderate agreement for Li, Cr and Au (although with similar ranges); weaker agreement for pH and therefore saturation indices for carbonate minerals calcite, dolomite, magnesite, siderite and rhodochrosite; poor agreement for P, Co, Ni, Cu, Zn and Pb; and Mn and particularly Fe show very poor agreement between datasets.
Figure 17. Dissolved V concentration for Giblin vs other sampling. Figure 13 shows colour coding.
Figure 18. Dissolved Fe concentrations for Giblin vs other sampling. Figure 13 shows colour coding.
These results yield useful observations about the robustness of hydrogeochemistry in general. These comparisons were between data collected sometimes decades apart, with the Giblin samples left unfiltered for days or weeks before analysis. Suspended solids/colloids only affected a few of the samples, and the corrections were minor apart from a few, normally insoluble, elements such as Al and Ti. Despite differing sampling times (commonly decades apart), collection and water treatments, the concentrations of major elements, Si, HCO3, F and many trace elements (particularly those that occur as oxy-anions) match between the datasets, and therefore appear robust. This is an excellent result for assessing the long-term utility of hydrogeochemistry for lithological and prospectivity mapping.
Cobalt, Ni, Cu, Zn, Pb and particularly Fe and Mn are most readily affected by Fe and Mn oxy-hydroxide precipitation as O2 enters the water sample after sampling, and have poor repeatability between the Giblin and other datasets. Thus, these elements cannot be included when combining the Giblin and other datasets, and need to be interpreted for each dataset independently. These elements are commonly problematic for any hydrogeochemical survey, being strongly affected by surficial reactions and sensitive to bore/well contamination (Gray et al., 2016). Data for these elements are unlikely to be dependable, and oxy-anions such as As, Mo or W yield more dependable results (Gray et al., 2016). Thus, for example, Cr gives a more dependable indicator for ultramafic rocks because it commonly occurs as the CrO4²⁻ oxy-anion (Gray, 2003), compared with Ni, which occurs in solution as Ni²⁺.
Figure 22. SO4ClSW values for Giblin vs other sampling. Figure 13 shows colour coding.
Metrics for contamination (Gray et al., 2016) improve the robustness of the sample set comparisons significantly. For HCO3, there is a close correlation between datasets for uncontaminated samples (red circles in Figure 15). When the later sampling by Gray et al. (2016) was contaminated (pink-purple circles in Figure 15), the HCO3 was higher than expected. Other elements are affected by contamination in contrasting ways. Dissolved V and SO4 are lower than expected when samples are contaminated (Figures 17 and 22). All these observations of contamination effects match previous conclusions based on statistical analysis (Gray et al., 2016), and the contamination parameter CV is therefore used to filter out any data presumed to be altered by sample contamination.
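The effect of screening on dataset agreement can be sketched as follows. The example is illustrative only: the paired values are made up, the `cv` field and its threshold are hypothetical stand-ins for the contamination parameter, and the correlation of log-transformed concentrations is one assumed way to express agreement, not the statistical treatment used by Gray et al. (2016).

```python
import math


def log_correlation(pairs):
    """Pearson correlation of log10 concentrations for (giblin, other) pairs."""
    xs = [math.log10(a) for a, _ in pairs]
    ys = [math.log10(b) for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def uncontaminated(samples, max_cv):
    """Keep paired results whose contamination metric is at or below max_cv."""
    return [(s["giblin"], s["other"]) for s in samples if s["cv"] <= max_cv]


# Synthetic paired results for one element at four sites (not real data).
samples = [
    {"giblin": 1.0, "other": 1.1, "cv": 0.0},
    {"giblin": 10.0, "other": 9.0, "cv": 0.1},
    {"giblin": 100.0, "other": 120.0, "cv": 0.0},
    {"giblin": 50.0, "other": 2.0, "cv": 3.0},  # flagged as contaminated
]

r_all = log_correlation([(s["giblin"], s["other"]) for s in samples])
r_clean = log_correlation(uncontaminated(samples, max_cv=1.0))
print(f"all pairs: r = {r_all:.2f}; uncontaminated only: r = {r_clean:.2f}")
```

In this toy case the single contaminated pair dominates the scatter, so removing it raises the correlation markedly, which mirrors the pattern described for the red versus pink-purple points in the comparison figures.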
Thus, these results give us further confidence in the validity of hydrogeochemical analysis, and support previous conclusions on the robust elements in hydrogeochemistry (Gray et al., 2016). The specific observations of this data set and the manipulations made enable us to combine much of the Giblin data with other datasets for further mapping and modelling.

Examples of data integration
This section shows some uses of the Giblin dataset for regional analysis, in conjunction with other data (Figure 1). It is not meant to be exhaustive, but is intended to give a flavour of the utility of this dataset. The reader can also re-examine the Giblin reports and papers, or download the data, for more site-specific studies.

Northern Yilgarn
There is a significant overlap between recent CSIRO sampling (Gray et al., 2016) and the earlier sampling for the Giblin dataset (Figure 1). Many of the groundwater parameters compare well, and spatially the data plot seamlessly (e.g. Figure 23). Strontium relative to Ca (i.e. the SrCaSW index) is very useful in discriminating granites from greenstones in this region (Gray et al., 2016), with high Ca:Sr over mafic rocks and high Sr:Ca over granites. The Giblin data work equally well and seamlessly with Gray et al. (2016) data, thus spatially extending the range of lithological discrimination (Figure 23).
Figure 23. Ca relative to Sr (using the SrCaSW Index), for Giblin samples (triangles), along with more recent sampling (Gray et al., 2016) in the northern Yilgarn Craton, Western Australia (see Figure 1 for location). High Ca (blue symbols) correlates with greenstones and high Sr (orange and red) with granitoid lithologies.
Dissolved Cr is specifically enriched in ultramafic and other basic rocks (Gray, 2003). This effect is seen for both datasets. Combining U, V and Cr data into a single 'Lithol1' index (Gray et al., 2016) can improve discrimination, and this is seamless across datasets (Figure 24).

Northern Queensland
The northern Queensland example includes data from the Giblin dataset, GA (Radke, Ferguson, Cresswell, Ransley, & Habermehl, 2000) and the Queensland State Government (Figure 1), although only the Giblin dataset includes many of the trace and minor elements. Use of the SO4ClSW index (calculated for all three datasets; Figure 25) demonstrates a clear boundary between SO4-rich groundwaters within the Mount Isa Inlier and SO4-poor to very poor groundwaters in the more recent basins to the east. Within the Mount Isa Inlier area, the Giblin data indicate major areas of high-U groundwaters (Figure 26). These may be lithological effects and/or may indicate potential for dissolved U as an indicator for mineral systems such as iron oxide copper gold (IOCG).
The northern granites (approximately 19°S, 141-142°E) are characterised by particularly low SO4:Cl (Figure 25), very high dissolved F (>20 mg/L; Figure 27) and W (>100 mg/L; Figure 28) and moderate dissolved Cr (>5 mg/L), reflecting a distinct chemistry of the parent rocks. These elements alone show geochemical distinctions within a single rock group, which are enhanced using other elements and ion ratios.

Southern Australia
A final example integrates data from Giblin, CSIRO/GSSA sampling (Forbes et al., 2013) and SA State data within SA, together with GA sampling in the Curnamona Province (de Caritat, Kirste, Carr, & McCulloch, 2005) straddling the SA/NSW border (Figure 1). Dissolved U is high in the Curnamona Province (Figure 29; de Caritat et al., 2005) owing to weathering of U-rich rocks. Solubility analyses indicate these groundwaters approach secondary U (carnotite) saturation, suggesting potential for secondary U in this region. The Forbes et al. (2013) and Giblin data show similar dissolved U contents in the northern Stuart Shelf (Figure 29) at approximately 137°E/31°S. This suggests secondary U potential in this region, as well as indicating U sources from the rocks themselves.
There are several high concentrations of dissolved Au in the Gawler and Stuart Shelf (Figure 30), as would be expected given the occurrence of IOCGs and economic Au. Such data at a regional scale provide lithological backgrounds useful for developing thresholds for exploration. Significant parts of the Giblin dataset are for Au and other commodity mine-sites, which also provide useful data for developing exploration protocols across Australia.
Figure 24. Calculated Lithol1 Index for Giblin samples (triangles), along with more recent sampling (Gray et al., 2016) in the northern Yilgarn Craton, Western Australia (see Figure 1 for location). High Lithol1 (blue symbols) indicates V and/or Cr excess, and correlates with greenstones. Negative Lithol1 (orange and red) indicates U excess and correlates with granitoid lithologies.
Figure 25. SO4ClSW Index data (using the same scaling as shown in Figure 12) for Giblin, GAB (Radke et al., 2000) and Queensland datasets, northern Queensland (see Figure 1 for location).
Other elements such as Mo (Figure 31) are high in specific regions, such as the western Curnamona and sporadically on the Stuart Shelf. Such data for varying elements such as As and W will become useful for lithological discrimination and detection of hydrothermal dispersion (e.g. Gray et al., 2016).
The Curnamona Province has waters reaching gypsum saturation (Figure 32), suggesting S sources. This effect is more marked for the Stuart Shelf. Gypsum saturation can arise from various processes: 1. higher salinity (i.e. >3.5%) owing to evaporation (not expected in this region); 2. dissolution of gypsum already within the sediments; or 3. high S owing to release from sulfides.
Thus, these data again suggest that the Stuart Shelf groundwaters have anomalous characteristics. These data are available to readers and can be examined in more detail as an input to exploration in the region.

Conclusion
The groundwater dataset developed by Angela Giblin comprises over 5000 samples across much of Australia, and is a useful component of the 'Continental Scale Hydrogeochemistry' initiative. The sampling and analytical methodology differed from that of many other protocols (Gray et al., 2016). In particular, water samples were not filtered in the field, or indeed prior to analysis. However, given the number of samples, analyses for minor and trace elements, and some of the specific regions targeted, it is important to continue to use the data from these samples.
All data were integrated into one file with consistent detection limits and were subject to a thorough QA/QC assessment. Correction, and where necessary removal, of data (owing to the analytical artefacts from solids contamination during analysis) was applied. The degree of contamination in bailed samples, and what data therefore to reject, is calculated using an algorithm developed and tested for Western Australian groundwaters. Geochemical changes between sampling and delayed analysis (days or weeks later) cannot be directly quantified. Other research predicted which elements should be most affected by this. This was further tested by comparing overlapping results from Giblin with more recent data from the same sites in Western Australia, South Australia and Queensland. Results closely followed predictions: a number of parameters compared well, whereas others compared badly. Agreement was good for salinity and major elements, as well as for elements that occur as oxy-anions in solution. The poorest results were for Cu, Zn, Pb and P, and particularly Mn and Fe, though most other elements showed good agreement.
Utilising this information, the Giblin data are being suitably combined with other datasets, to map the groundwater chemistry of major regions within Australia. This successful integration of data treated differently from many 'standard' methods indicates the surprisingly robust nature of hydrogeochemical data (when specific issues are dealt with) and insensitivity to temporal variation. It enhances our present data resources in hydrogeochemistry and encourages the integration of other sources.