In Progress

ARC PROJECT MANAGEMENT

Tramp

last updated on Mon, 10/22/2007 - 11:51

General Information

Description

The TRAMP project (TexAQS II Radical Measurement Project) seeks to create a website where a defined group of users can access and add data from the TRAMP component of the TexAQS II air quality study. It will enable users to merge, download, visualize, and run statistical analyses on the data that they and others in the study have collected.

The TRAMP component of the TexAQS II air quality study contains a wide variety of information. The project seeks to enable users to access and share their own data and the data of others. The database will also enable those involved in the study to analyze their data with Geographic Information Systems (GIS) tools. These tools will support a deeper level of interpretation than is typically available.

Purpose and goal of the project

The purpose of the TRAMP project is to enable a multitude of users to make available and merge all of the data from the TexAQS II air quality study. The database will give the study participants access to the data and allow them to merge, download, visualize, and statistically analyze the stored data.

The project seeks to improve significantly on the traditional analysis of this extremely broad and dense data resource and to shorten the time to publication of this critical air quality data. It is believed that this database, with its integral GIS capabilities, will allow for new types of spatio-temporal analysis of the chemical and physical data and will allow us to link our data with census and epidemiological data.

Should the TRAMP data merge / analysis project prove effective, the TCEQ (Texas Commission on Environmental Quality) may seek to use this resource as a data merge for the entire regional study.

Client contact

PSU:
Monica Wright
wrightm@pdx.edu

Dean Atkinson
atkinsond@pdx.edu

TCEQ:
Raj Nadkarni
RNadkarn@tceq.state.tx.us

UH:
Barry Leffer
bleffer@uh.edu

Bernhard Rappenglück
brappenglueck@uh.edu

Personnel

Cris Holm
holmc@pdx.edu

Scope and Scale

Customer

TCEQ (Texas Commission on Environmental Quality)

Users/ audience

The TRAMP site itself will be available to the study participants and to members of NOAA (National Oceanic and Atmospheric Administration), HARC (Houston Advanced Research Center), DOE (U.S. Department of Energy), and a wide variety of universities throughout the country.

Scope

The results from the work done on the website will eventually be passed on to the public and be seen locally, nationally, and even internationally by a wide range of people, schools, and government agencies.

Partners

NOAA (National Oceanic and Atmospheric Administration)
HARC (Houston Advanced Research Center)
DOE (U.S. Department of Energy)
Many colleges and universities throughout the country

Architecture/Design

Project Specification

  1. User authentication/administration
    1. Authentication of a limited number of known users, stored in the database.
    2. Ability to add/modify/remove users by an "admin" user.
    3. Ability for a given user to alter their own profile (change password, etc).
  2. Upload
    1. Ability to upload a single data file for storage in the db.
    2. Ability to set up some kind of batch upload.
      1. By connecting to an ftp repository.
      2. By selecting multiple files on the client's machine.
    3. Ability to parse the file(s) for upload and identify the metadata associated with them.
      1. Should check if the metadata already exists in the db.
      2. If not, should present the metadata to the user for modification/validation.
        1. Allow the metadata to be changed before it's stored in the db.
        2. Allow the user to flag variables found in the file as wind direction data that depend on wind speed data (also found in the file).
      3. If not, should add to the db after validation.
    4. If in non-batch mode, need to present data in the upload file to the user for validation before adding to the db.
    5. If in batch mode, data that is error-free will be added to the db, and any errors that occur will be logged.
    6. Associate data with georeference information (lat, long).
  3. Analysis (averaging data over a defined time interval)
    1. Ability to select data repositories by PI, PI organization and/or db location (db table name).
    2. Ability to specify the fields of interest from the tables identified in step 1.
    3. Ability to specify a date range over which analysis will occur.
    4. Ability to break this date range into a series of intervals.
    5. Ability to specify the minimum number of data points necessary for each interval.
    6. Ability to calculate averages over each interval for each field selected.
    7. Ability to specify the number of results to be displayed at a given time.
    8. Ability to download the results in a comma or tab delimited text file, and possibly download the results as a command file (text) for use in R.
    9. Ability to retrieve and view metadata associated with a given table being analyzed.
  4. Mapping
    1. Ability to create a digital, interactive map that displays data over time from the data stored in the db that is georeferenced.
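
The analysis steps in the specification (a date range, a series of intervals, a minimum number of data points per interval, and an average per field) can be sketched roughly as follows. This is only an illustration, not the project's implementation: the function name interval_averages, its parameters, and the use of None for a missing value or a rejected average are all assumptions made for this example.

```python
def interval_averages(rows, field_names, start, end, interval_seconds, min_points=1):
    """Average selected fields over fixed-width time intervals.

    rows: iterable of (unix_timestamp, {field_name: value}) pairs.
    Returns a list of (interval_start, {field_name: mean_or_None}).
    All names here are hypothetical; they are not taken from the project.
    """
    span = end - start
    n_intervals = int(span // interval_seconds)
    if span % interval_seconds:
        n_intervals += 1  # keep a final, shorter interval

    sums = [dict.fromkeys(field_names, 0.0) for _ in range(n_intervals)]
    counts = [dict.fromkeys(field_names, 0) for _ in range(n_intervals)]

    for ts, fields in rows:
        if not (start <= ts < end):
            continue  # outside the requested date range
        i = int((ts - start) // interval_seconds)
        for name in field_names:
            value = fields.get(name)
            if value is None:
                continue  # assumed convention: None marks a missing value
            sums[i][name] += value
            counts[i][name] += 1

    results = []
    for i in range(n_intervals):
        averages = {}
        for name in field_names:
            n = counts[i][name]
            # Report an average only when the interval has enough points.
            enough = n > 0 and n >= min_points
            averages[name] = sums[i][name] / n if enough else None
        results.append((start + i * interval_seconds, averages))
    return results
```

For example, calling interval_averages(rows, ["o3"], 0, 120, 60, min_points=2) on three samples at 0, 30 and 70 seconds would average the first two samples into the first interval and report None for the second interval, which holds only one sample.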

 

Database Structure and Interaction

There are 2 types of tables in the database: data tables and reserved tables. Data tables contain the measurement data that was uploaded. Reserved tables contain meta-data, including meta-data about the data tables and their fields.

When data from a new source is initially uploaded, the upload file contains meta-data about the owner of the data, its location, dates, and variables, followed by the data itself. A new data table is created for this data, with field names equivalent to the variable names found in the file. Then meta-data about the data table is added to 2 of the reserved tables -- the variables table and the appropriate meta-data table (so far only icartt_metadata, but the system is set up to track other kinds of meta-data).

The reserved tables are:

  1. reserved_tables

    This table stores the names of all tables that are considered 'reserved', including itself.
  2. variables

    This table stores data about all variables that exist for all data tables. Each variable record contains a variety of data on the variable, including:
    1. name (the original name of the variable, which may have been altered to make it a valid field name in the data table).
    2. units
    3. is_independent
    4. is_time (only non-time variables are used in analysis)
    5. scale_factor (a multiplicative number used in units conversion -- defaults to 1)
    6. missing_data_flag (indicates that data is missing -- usually -9999)
    7. ulod_value (upper limit of detection value; a value to use if a data value is the ulod flag -- usually -8888)
    8. llod_value (lower limit of detection value; a value to use if a data value is the llod flag -- usually -7777)
    9. tablename (the name of the table that this variable belongs to)
    10. wind_dependency (the name of another variable from the same table that this variable is dependent on -- used in calculating wind direction/speed vector averages)
  3. users
  4. icartt_metadata

    This table stores the information that is typically found in the header of an ICARTT file. The ICARTT specification can be found at http://www-air.larc.nasa.gov/missions/etc/IcarttDataFormat.htm
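
As an illustration of how some of the variables fields listed above might be used during analysis, the sketch below drops values equal to missing_data_flag, applies scale_factor, and computes a vector average for a wind direction variable paired with the wind speed variable named in its wind_dependency. The function names and the exact averaging formula are assumptions for this example; the source only states that wind direction/speed averages are vector averages. The sketch also assumes that direction and speed samples have already been paired and filtered together.

```python
import math


def clean_values(raw_values, missing_data_flag=-9999, scale_factor=1):
    """Drop flagged-missing values, then apply the units scale factor.

    The flag is compared against the raw (unscaled) value.
    """
    return [v * scale_factor for v in raw_values if v != missing_data_flag]


def vector_average_wind(directions_deg, speeds):
    """Return (mean_direction_deg, mean_vector_speed) for paired samples.

    Hypothetical helper: each sample is resolved into two components,
    the components are averaged, and the mean vector is converted back
    to a direction in [0, 360) and a speed.
    """
    if len(directions_deg) != len(speeds) or not speeds:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(speeds)
    pairs = list(zip(directions_deg, speeds))
    mean_x = sum(s * math.sin(math.radians(d)) for d, s in pairs) / n
    mean_y = sum(s * math.cos(math.radians(d)) for d, s in pairs) / n
    direction = math.degrees(math.atan2(mean_x, mean_y)) % 360.0
    speed = math.hypot(mean_x, mean_y)
    return direction, speed
```

A plain arithmetic mean of directions such as 350 and 10 degrees would give 180, which points the wrong way; averaging the components avoids that wrap-around problem, which is presumably why the dependency on wind speed is recorded.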

All data tables have a structure like:

  1. Table name is derived from the ICARTT style data ID and location ID, concatenated by an underscore (example: crdtn_uhmt).
  2. Field names are the same as the variable names found in the uploaded data files. A field name may have been altered to make it valid for a table, which means that it contains only alphanumeric characters and underscores and begins with either a letter or an underscore. The original variable names are stored in the variables table.
  3. There are at least 3 fields:
    1. A unique system-generated ID.
    2. The independent variable (a UNIX timestamp).
    3. At least one dependent (non-time) variable.
  4. There may be additional fields in the table, consisting of:
    1. More time variables (these are typically ignored and are only added to the table for completeness because they were in the original data file).
    2. More data variables.
  5. All time and data fields have the following structure:
    1. Time fields are PostgreSQL numeric(13, 3) data types, which store values exactly (no loss of precision) with up to 10 digits before the decimal point and up to 3 digits after it (for a total of 13). Time values are UNIX timestamps with a precision of milliseconds.
    2. Data fields are numeric(20, 10) data types, having a maximum of 10 digits before and 10 digits after the decimal point.
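
The naming and typing rules above can be sketched as a small helper that builds a CREATE TABLE statement. This is a hedged illustration only: the function names, the replacement of illegal characters with underscores, the lowercasing of the table name, and the use of a serial primary key for the system-generated ID are assumptions, not details confirmed by the project. Collisions between altered names are not handled here.

```python
import re


def sanitize_field_name(variable_name):
    """Map an uploaded variable name to a legal field name.

    Assumption: illegal characters become underscores, and a leading
    underscore is added when the name does not start with a letter
    or underscore.
    """
    name = re.sub(r"[^0-9A-Za-z_]", "_", variable_name)
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name


def table_name(data_id, location_id):
    """Join the ICARTT-style data ID and location ID with an underscore."""
    return sanitize_field_name(f"{data_id}_{location_id}").lower()


def create_table_sql(data_id, location_id, time_variable, data_variables):
    """Build a CREATE TABLE statement following the structure above."""
    columns = ["id serial PRIMARY KEY"]  # assumed form of the generated ID
    # Independent variable: UNIX timestamp with millisecond precision.
    columns.append(f"{sanitize_field_name(time_variable)} numeric(13, 3)")
    for variable in data_variables:
        columns.append(f"{sanitize_field_name(variable)} numeric(20, 10)")
    body = ",\n    ".join(columns)
    return f"CREATE TABLE {table_name(data_id, location_id)} (\n    {body}\n);"
```

For instance, create_table_sql("CRDTN", "UHMT", "Start_UTC", ["O3 (ppbv)"]) would produce a statement for a table named crdtn_uhmt with one time field and one data field. The variable names in this call are made up for the example.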

Requirements

Project home and resource storage

TCEQ website
Project Home

Related Projects

Project status

Timeline

March 1, 2007: Phase One completed
End of May: A demonstration of the functionality of the site
July 1, 2007: Phase Two completed
August 1, 2007: Phase Three completed