Signed Off
Beaches
last updated on Fri, 10/26/2007 - 12:33
General Information
Description
This project supported a public health research study that sought to determine the number of people who became ill after swimming at beaches in southern California. We performed the tasks needed to produce datasets for beach attendance and bacterial counts from the raw data the researchers provided. The datasets we produced were subsequently used for modeling illness rates.
Purpose and goal of the project
The goals of this project were to:
- Convert two very large and piecemeal datasets into usable form
- Adjust the data to account for detection limits and sampling frequency
- Impute missing values so data would be available at a daily interval
Client contact
Gajapathi SharavanaKumar - Grad student - PSU Community Health (Gajas@pdx.edu)
Mitch Brinks - Grad Student - OSU (brinksm@yahoo.com)
Jan Semenza - Professor - PSU Community Health (503-725-8262, semenzaj@pdx.edu)
Personnel
Josh Caplan (data management)
Mahmoud El-Gohary (imputation)
Scope and Scale
Customer
PSU Community Health Department.
Users/audience
The final product of this work (after modelling) was published in a peer-reviewed journal. Readers likely included academics and public health professionals from around the world.
Scope
The study directly pertains to southern California, though the results of the study will be available (and potentially of interest) nationally, and to some degree internationally.
Partners
The research team includes members from PSU's School of Community Health, OSU, and UC Irvine's Department of Environmental Health, Science and Policy.
Architecture/Design
Project Specification
For attendance data:
- Wrote a VB script to compile values from different years onto a single sheet, with one column per site and one row per day.
- Where two beaches had correlated attendance values (or two years were correlated at one beach), regression was used for imputation. Where there were small gaps in the data, cubic spline interpolation was used for imputation.
- For beaches where only monthly totals were available, the mean daily attendance values at all other beaches were scaled so that they summed to the known monthly total.
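The monthly-total scaling step can be illustrated with a short sketch. This is written in Python for illustration only, not the VB or Matlab code used on the project. The function names and the use of None for missing values are assumptions, as is the choice to average across whichever other beaches have a value on a given day.

```python
from statistics import mean


def mean_daily_profile(other_beaches):
    """Average attendance across the other beaches for each day of the month.

    other_beaches: list of per-beach lists, one value per day.
    Missing values are assumed to be stored as None and are skipped.
    """
    profile = []
    for day_values in zip(*other_beaches):
        known = [v for v in day_values if v is not None]
        profile.append(mean(known) if known else 0.0)
    return profile


def scale_to_monthly_total(daily_profile, monthly_total):
    """Rescale a daily profile so that its sum equals the known monthly total."""
    profile_sum = sum(daily_profile)
    if profile_sum == 0:
        raise ValueError("daily profile sums to zero; cannot scale")
    factor = monthly_total / profile_sum
    return [value * factor for value in daily_profile]


# Hypothetical three-day "month" with two reference beaches.
reference = mean_daily_profile([[100, 200, 300], [300, 400, 500]])
estimated = scale_to_monthly_total(reference, 1800)
```

The scaled series keeps the day-to-day pattern of the reference beaches while matching the target beach's reported monthly total.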
For bacterial data:
- Wrote VB scripts to spread data values across all dates of interest (leaving blanks for dates with no data), with a separate column for each monitoring station. The scripts also did the following:
- Where date fields included time, the month/day/year was extracted.
- Where values fell below detection limits, the original value was replaced with zero or with half the limit (one version on each of two separate sheets)
- Where two samples were taken on the same day, the values were averaged
- Where two beaches had correlated bacterial counts, regression was used for imputation. Where there were small gaps in the data, cubic spline interpolation was used for imputation.
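The data-management steps for the bacterial data can be sketched as follows. This is a Python illustration, not the original VB scripts. It assumes samples arrive as (station, timestamp, value) records, that a single detection limit applies, that a below-limit result is recognizable as a value smaller than that limit, and that substitution happens before same-day averaging. None of these details is stated in the project notes.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def substitute_below_limit(value, detection_limit, mode="half"):
    """Replace a below-limit value with zero or with half the detection limit."""
    if value >= detection_limit:
        return value
    return 0.0 if mode == "zero" else detection_limit / 2.0


def build_daily_table(samples, start, end, detection_limit, mode="half"):
    """Return {day: {station: value or None}} for every day from start to end."""
    samples = list(samples)
    by_day = defaultdict(list)
    for station, stamp, value in samples:
        day = stamp.date()  # keep month/day/year, drop the time of day
        adjusted = substitute_below_limit(value, detection_limit, mode)
        by_day[(day, station)].append(adjusted)

    stations = sorted({station for station, _, _ in samples})
    table = {}
    day = start
    while day <= end:
        row = {}
        for station in stations:
            values = by_day.get((day, station))
            # Average same-day samples; leave a blank (None) where no data exist.
            row[station] = sum(values) / len(values) if values else None
        table[day] = row
        day += timedelta(days=1)
    return table


# Hypothetical records: two samples at station A on one day, one at station B.
records = [
    ("A", datetime(2000, 7, 1, 9, 30), 4.0),
    ("A", datetime(2000, 7, 1, 15, 0), 20.0),
    ("B", datetime(2000, 7, 3, 8, 0), 30.0),
]
half_table = build_daily_table(records, date(2000, 7, 1), date(2000, 7, 3), 10.0, "half")
zero_table = build_daily_table(records, date(2000, 7, 1), date(2000, 7, 3), 10.0, "zero")
```

Running the function once per substitution mode mirrors the separate zero and half-limit sheets described above.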
Requirements
Because datasets were given to us as Excel files, we used Visual Basic for Applications for data management and for performing simple calculations (everything but imputation). Matlab was used for imputing missing values.
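As a rough illustration of the regression-based imputation described under Project Specification, the sketch below fills gaps in one series from a correlated series using a simple least-squares line. It is written in Python rather than Matlab and is not the project's code. The function names, the use of None for missing values, and the choice of a single-predictor linear fit are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation; cannot fit a line")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def impute_from_neighbor(target, neighbor):
    """Fill None entries in target using a line fitted on days both series have data."""
    paired = [(n, t) for t, n in zip(target, neighbor)
              if t is not None and n is not None]
    if len(paired) < 2:
        raise ValueError("need at least two overlapping observations")
    slope, intercept = fit_line([p[0] for p in paired], [p[1] for p in paired])

    filled = []
    for t, n in zip(target, neighbor):
        if t is None and n is not None:
            filled.append(intercept + slope * n)  # predicted from the neighbor
        else:
            filled.append(t)  # keep observed values; leave gaps with no predictor
    return filled


# Hypothetical series in which the target is exactly twice the neighbor.
target_series = [10.0, None, 30.0, 40.0]
neighbor_series = [5.0, 10.0, 15.0, 20.0]
filled_series = impute_from_neighbor(target_series, neighbor_series)
```

Days where both series are missing stay blank in this sketch; in the project, small gaps of that kind were handled with cubic spline interpolation.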
Project home and resource storage
Files are stored in: I:/students/pojects/geodata/arc/beaches
Related Projects
None.
Project status
Timeline
As of early summer 2006, all of our tasks were completed. Modelling proceeded, and a manuscript will be sent for review once results become available and are interpreted.
