Signed Off

ARC PROJECT MANAGEMENT

Beaches

last updated on Fri, 10/26/2007 - 12:33

General Information

Description

This project supported a public health research study that sought to determine the number of people who became ill after swimming at beaches in southern California.

We performed the tasks needed to produce datasets for beach attendance and bacterial counts from the raw data the researchers provided. The datasets we produced were subsequently used for modeling illness rates.

Purpose and goal of the project

The goals of this project were to:
  1. Convert two very large and piecemeal datasets into usable form
  2. Adjust the data to account for detection limits and sampling frequency
  3. Impute missing values so data would be available at a daily interval


Client contact

Gajapathi SharavanaKumar - Grad student - PSU Community Health (Gajas@pdx.edu)
Mitch Brinks - Grad student - OSU (brinksm@yahoo.com)
Jan Semenza - Professor - PSU Community Health (503-725-8262, semenzaj@pdx.edu)

Personnel

Josh Caplan (data management)
Mahmoud El-Gohary (imputation)

Scope and Scale

Customer

PSU Community Health Department.

Users/audience

The final product of this work (after modelling) was published in a peer-reviewed journal. Readers likely included academics and public health professionals from around the world.

Scope

The study directly pertains to southern California, though the results of the study will be available (and potentially of interest) nationally, and to some degree internationally.

Partners

The research team includes members from PSU's School of Community Health, OSU, and UC Irvine's Department of Environmental Health, Science and Policy.

Architecture/Design

Project Specification

For attendance data:

  • Wrote a VB script to compile values from different years on a single sheet, where each site had a single column and each day had a row.
  • Where two beaches had correlated attendance values (or two years were correlated at one beach), regression was used for imputation. Where there were small gaps in the data, cubic spline interpolation was used for imputation.
  • For beaches where only monthly totals were available, the mean daily attendance values at all other beaches were scaled such that the known monthly total would be achieved.
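
The regression and scaling steps above can be sketched as follows. This is an illustrative Python sketch only: the original work used VBA and Matlab, and the function names, the simple one-predictor least-squares fit, and the sample numbers are all assumptions made for the example, not the project's actual scripts or data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def impute_by_regression(target, predictor):
    """Fill None entries in `target` using a line fitted against `predictor`.

    Only days where both series have a value are used for the fit.
    A day stays None if the predictor is missing as well.
    """
    pairs = [(p, t) for p, t in zip(predictor, target)
             if p is not None and t is not None]
    slope, intercept = fit_line([p for p, _ in pairs], [t for _, t in pairs])
    filled = []
    for t, p in zip(target, predictor):
        if t is None and p is not None:
            t = slope * p + intercept
        filled.append(t)
    return filled


def scale_to_monthly_total(daily_pattern, monthly_total):
    """Rescale a daily pattern so that its sum equals the known monthly total."""
    factor = monthly_total / sum(daily_pattern)
    return [value * factor for value in daily_pattern]


# Hypothetical daily attendance at two correlated beaches (None = missing).
beach_a = [100, 200, 300, 400]
beach_b = [210, None, 610, 810]
filled = impute_by_regression(beach_b, beach_a)

# Hypothetical beach with only a monthly total: take the mean daily value
# across the other beaches, then scale it to match that total.
other_beaches = [[5, 10, 20, 30], [15, 30, 40, 50]]
daily_pattern = [mean(day) for day in zip(*other_beaches)]
scaled = scale_to_monthly_total(daily_pattern, 300)
print(scaled)  # [30.0, 60.0, 90.0, 120.0]
```

Scaling by a single factor keeps the day-to-day shape of the other beaches while forcing the sum to the reported total. Cubic spline interpolation for small gaps is not shown here.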

For bacterial data:

  • Wrote VB scripts to spread data values across all dates of interest (leaving blanks for dates with no data), with a separate column for each monitoring station. The scripts also did the following:
  1. Where date fields included time, the month/day/year was extracted.
  2. Where values fell below detection limits, zero or half the limit replaced the original value (on separate sheets)
  3. Where two samples were taken on the same day, the values were averaged
  • Where two beaches had correlated bacterial counts, regression was used for imputation. Where there were small gaps in the data, cubic spline interpolation was used for imputation.
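
The three script steps for the bacterial data (date extraction, detection-limit substitution, same-day averaging) and the one-column-per-station layout can be sketched as follows. This is a hedged Python illustration rather than the VB scripts themselves: the record layout, the function names, the single shared detection limit, and the choice to substitute before averaging are all assumptions, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def substitute_below_limit(value, limit, mode):
    """Replace a value below the detection limit with 0 or with limit / 2."""
    if value >= limit:
        return value
    return 0.0 if mode == "zero" else limit / 2


def daily_table(records, limit, mode="half"):
    """Build a day-by-station table from (timestamp, station, count) records.

    Returns (stations, table), where table maps each calendar day in the
    covered range to a list of values, one per station, with None for
    days that have no sample.
    """
    by_day = defaultdict(list)
    for stamp, station, count in records:
        day = stamp.date()  # step 1: drop the time, keep month/day/year
        # Step 2 (assumed to happen before averaging): detection limit.
        by_day[(day, station)].append(substitute_below_limit(count, limit, mode))

    stations = sorted({station for _, station in by_day})
    days = [day for day, _ in by_day]
    table = {}
    day = min(days)
    while day <= max(days):
        row = []
        for station in stations:
            values = by_day.get((day, station))
            # Step 3: average samples taken on the same day.
            row.append(sum(values) / len(values) if values else None)
        table[day] = row
        day += timedelta(days=1)
    return stations, table


# Hypothetical samples; the detection limit of 10 is also an assumption.
records = [
    (datetime(2000, 7, 1, 8, 30), "S1", 5.0),
    (datetime(2000, 7, 1, 14, 0), "S1", 35.0),
    (datetime(2000, 7, 3, 9, 0), "S1", 50.0),
    (datetime(2000, 7, 1, 9, 0), "S2", 12.0),
]
stations, half_table = daily_table(records, limit=10, mode="half")
_, zero_table = daily_table(records, limit=10, mode="zero")
```

Calling the function once per mode mirrors keeping the zero and half-limit versions on separate sheets. The None entries mark the gaps that regression or spline imputation would then fill.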


Requirements

Because datasets were given to us as Excel files, we used Visual Basic for Applications for data management and for performing simple calculations (everything but imputation). Matlab was used for imputing missing values.

Project home and resource storage

Files are stored in: I:/students/pojects/geodata/arc/beaches

Related Projects

None.

Project status

Timeline

As of early summer 2006, all of our tasks were completed. Modelling proceeded, and a manuscript will be sent for review once results become available and are interpreted.