pool,
food,
film.
Pictures from ICDE 2005 in Tokyo.
Teaching
CS410/510: Scientific Data Management
Research
Science and Engineering databases, unstructured metadata
Scientific Data
The large datasets produced by simulations typically have a grid structure that is not amenable to storage within traditional database systems. We've developed an algebra of GridFields that allows convenient manipulation of grid-structured datasets, much as the relational algebra allows convenient manipulation of table-structured data. This work is in the context of CORIE, an Environmental Observation and Forecasting System. The data management arm of the CORIE project has been dubbed cormorant.
Do you manage or process simulation results? We want to help!
Unstructured Metadata
In a related project, we've developed Quarry, a storage system for large-scale unstructured metadata. Given a stream of resource-property-value triples, we identify simple patterns in the data. These patterns are used to build an index that supports a generic API. Unlike a relational database, the system requires no up-front data modeling effort and can be used right out of the box.
We've used Quarry to manage metadata for a scientific repository, and with guidance from Nick Rayner, we've begun deploying it for a medical informatics application. For both applications, we find Quarry to be useful when exploring an unfamiliar dataspace. You can use it to test hypotheses about the data (and find exceptions) without writing complicated queries.
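As a rough illustration of the kind of simple pattern one might look for in a stream of resource-property-value triples, the sketch below groups resources by the set of properties they carry. This is an assumption made purely for illustration: the function name, the sample triples, and the grouping rule are hypothetical and are not taken from Quarry itself.

```python
from collections import defaultdict


def group_by_signature(triples):
    """Group resources by the set of properties they carry.

    Hypothetical illustration only; not Quarry's actual algorithm or API.
    """
    # Collect each resource's properties and values.
    props = defaultdict(dict)
    for resource, prop, value in triples:
        props[resource][prop] = value

    # Resources that share the same property set fall into one group.
    groups = defaultdict(list)
    for resource, pv in props.items():
        groups[frozenset(pv)].append(resource)
    return props, groups


# Made-up sample triples: (resource, property, value).
triples = [
    ("run1", "variable", "salinity"),
    ("run1", "year", "2004"),
    ("run2", "variable", "temperature"),
    ("run2", "year", "2005"),
    ("doc1", "author", "unknown"),
]

props, groups = group_by_signature(triples)
for signature, resources in groups.items():
    print(sorted(signature), resources)
```

Each group behaves like a table discovered after the fact, with the shared properties as columns. A resource that lands in a group of its own is a candidate exception worth a closer look.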
Document Metadata
A few years ago, I worked on the Forest project building a tool called MDX that used a controlled vocabulary to extract metadata from documents and present results with XML and XSLT. MDX became a component in a document retrieval system called Metadata++, the dissertation work of Mathew Weaver.
Dissertation
Publications
Other publications
Journal articles
- Algebraic Manipulation of Scientific Datasets (extended)
Bill Howe, David Maier
VLDB Journal, 14(4), November 2005
- A Language for Spatial Data Manipulation
(pdf)
Bill Howe, David Maier, Antonio Baptista
Journal of Environmental Informatics, 2(2), December 2003
Other Refereed publications
- Smoothing the ROI Curve for Scientific Data Management Applications
(pdf)
Bill Howe, David Maier, Laura Bright
Third Biennial Conference on Innovative Data Systems Research (CIDR 2007)
- Retrofitting a Data Model to an Existing Environmental Repository
(pdf)
Bill Howe, David Maier
17th International Statistical and Scientific Database Management Conference (SSDBM 2005)
- Querying and Visualizing Gridded Datasets for e-Science
(pdf)
(quality color pdf handout)
(smaller bw pdf handout)
Bill Howe, David Maier
21st International Conference on Data Engineering (ICDE 2005) (demo)
- Algebraic Manipulation of Scientific Datasets
(pdf)
Bill Howe, David Maier
30th International Conference on Very Large Data Bases (VLDB 2004)
- Emergent Semantics: Towards Self-Organizing Scientific Metadata
(pdf)
Bill Howe, Kuldeep Tanna, Paul Turner, David Maier
International Conference on Semantics for a Networked World (SFNW 2004), co-located with SIGMOD 2004.
- Representing, Exploiting, and Extracting Metadata using Metadata++
(doc)
Mathew Weaver, Bill Howe, Lois Delcambre, Tim Tolle and David Maier
Digital Government Conference (DG.O), May 2002, Los Angeles, CA
Non-refereed publications
- Logical and Physical Data Independence for Native Scientific Data Repositories
(pdf)
Bill Howe and David Maier
IEEE Data Engineering Bulletin, 27(4), December 2004
- Modeling Data Product Generation
(pdf)
Bill Howe, David Maier
Workshop on Data Derivation and Provenance, August 2002, Chicago, IL
Internal documents
Some Talks (powerpoint)
Some of these talks contain macros that require a visualization ActiveX control that you don't have, so you may safely respond with "disable macros" if prompted with a dialog.
All movies will appear as still images by default. If you want the movies to play, download them, unzip them in the same directory as the presentation, and make sure you open the presentation with the correct working directory (i.e., by double-clicking the file rather than by using File->Open).
- Smoothing the ROI Curve for Scientific Data Management Applications, presented at CIDR 2007, January 17th, 2006.
- Thesis Defense, presented December 8th, 2006.
- Ten Things I Like About CS Graduate School (and Five I Don't), an informal talk given to undergraduates at Clark Atlanta University on February 20, 2006.
- Downloading the World: Middleware for Computational Science, a talk on GridFields given while visiting Georgia Tech on February 21, 2006.
- GridFields: Model-Driven Data Transformation for the Physical Sciences, my thesis proposal, presented January 17, 2006 at Portland State University. Some folks have asked for the Thesis Proposal document itself.
- GridFields: Algebraic Manipulation of Scientific Datasets, presented September 3, 2004 at VLDB 2004 in Toronto, Ontario, Canada.
- Emergent Semantics: Towards Self-Organizing Scientific Metadata, presented June 2004 at SFNW 2004 in Paris, France (co-located with SIGMOD 2004).
- GridField Results, presented internally, November 18, 2003.
- Three Flavors of Scientific Data, presented internally, August 14, 2003.
- Data Products and Product Management, a lecture I gave to Antonio Baptista's class, "Environmental Observation and Forecasting Systems", May 2002.
- Modeling Data Product Generation, presented at the Data Derivation and Provenance Workshop, August 2002, Chicago, IL.
Professional Service
- Program Committee, dg.o 2006
- Program Committee, dg.o 2005
- Demonstrations Program Committee, SIGMOD 2005
- Student Session Program Committee, dg.o 2004
Educational background
I have a Bachelor's degree in Industrial and Systems Engineering from Georgia Tech. All the problems seemed to be about automation and optimization, and software seemed to be required for both. So I started studying Computer Science.
Professional background
I've been working with databases since 1995, when I worked for Delta Air Lines as a co-op in their Technical Operations facility. Since I graduated from Georgia Tech, I've worked at Deloitte Consulting designing and building Customer Relationship Management (CRM) systems, mainly with Siebel. I also did some contracting at Microsoft, working on an internal time and expense system for Microsoft's own consulting practice. I've also done some independent consulting work building systems for companies as diverse as newly deregulated telecommunications carriers and providers of oil field exploration support services.