Abstract
The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state of the art in data management for data lakes. We consider how data lakes are introducing new problems, including dataset discovery, and how they are changing the requirements for classic problems, including data extraction, data cleaning, data integration, data versioning, and metadata management.
Download the slides.
Authors
- Fatemeh Nargesian (University of Toronto),
- Erkang Zhu (University of Toronto),
- Renée J. Miller (Northeastern University),
- Ken Q. Pu (Ontario Tech University),
- Patricia C. Arocena (TD Bank Group)