University of Waterloo  ·  CS 848  ·  Summer 2026

Advanced Topics in Databases: Data Lake and Model Lake Management

Instructor: Renée J. Miller
Term: Summer 2026
Format: Seminar, Mon 1–4 PM
Location: DC 2568

Syllabus

This graduate seminar examines the research frontiers in large-scale data management, with a focus on data lakes — repositories that store raw, heterogeneous data at scale — and the emerging concept of model lakes, which manage collections of trained machine learning models alongside the data used to produce them.

Students will read and critically evaluate recent research papers, lead seminar discussions, and complete an original research project. Topics span data integration, metadata management, data discovery, provenance, versioning, and the intersection of data management with modern AI pipelines.

Prerequisites

CS 448/648 (Databases) or equivalent graduate-level background in data management systems.

Format

Paper-based seminar. Each week you will submit reviews of two papers (more papers may be presented in class), and students rotate as discussion leads.

Textbook

No required textbook. All readings distributed as PDFs via the course page.

Communication

Announcements via Piazza. Email for private matters only.

Component | Description | Weight
Paper Reviews | Written critiques/reviews (one PDF page each), submitted before each class | 15%
Discussion Leading | Lead the seminar discussion on assigned paper(s) | 20%
Participation | Active, prepared engagement in all sessions | 15%
Research Project | Original research: proposal, progress report, final paper, and talk | 50%

Paper reviews

Paper reviews should be one page, with three strong points and three weak points, each backed up by evidence. An example LaTeX template for the reviews is here. You are not required to use LaTeX, though; you may use whatever editor you like.

Lectures & Readings

The schedule below is subject to revision. Readings will be posted at least one week in advance.

May 11
What are data lakes and model lakes? Scope, challenges, open problems.
May 25
Overview of joinable table discovery. Reading list →
June 1
Union Search
Overview of unionable table discovery. Reading list →

Projects

The course project is the central deliverable. Projects may be done individually or in pairs. You are encouraged to pursue work that could lead to a workshop or conference submission. Topics should relate to data lake or model lake management; for ideas in adjacent areas, consult the instructor for approval.

Due June 12

Proposal (2 pages)

State the problem, motivation, related work, and proposed approach. Include a plan with milestones.

Due July 10

Progress Report (3 pages)

Summarize work completed, preliminary results or observations, revised timeline, and any changes from the proposal.

July 20, July 27, August 4

Project Presentations (30 min)

In-class presentations of your current direction and results. Feedback from instructor and peers.

August (TBA)

Final Paper (5–8 pages, ACM format)

Research paper describing problem, related work, technical approach, experiments, and conclusions.

Contact

Instructor

Renée J. Miller

University Research Chair
Cheriton School of Computer Science
University of Waterloo

rjmiller@uwaterloo.ca

Office hours: Wed 11 AM, DC 3355, or by appointment

Course Information

Discussion & Announcements
Piazza (enroll with UW email)

Paper Submissions
TBA