Syllabus
This graduate seminar examines research frontiers in large-scale data management, with a focus on data lakes (repositories that store raw, heterogeneous data at scale) and the emerging concept of model lakes, which manage collections of trained machine learning models alongside the data used to produce them.
Students will read and critically evaluate recent research papers, lead seminar discussions, and complete an original research project. Topics span data integration, metadata management, data discovery, provenance, versioning, and the intersection of data management with modern AI pipelines.
Prerequisites
CS 448/648 (Databases) or equivalent graduate-level background in data management systems.
Format
Paper-based seminar. Each week you will submit reviews of two papers (more papers may be presented in class), and students rotate as discussion leads.
Textbook
No required textbook. All readings distributed as PDFs via the course page.
Communication
Announcements via Piazza. Email for private matters only.
| Component | Description | Weight |
|---|---|---|
| Paper Reviews | Written one-page critique (PDF) submitted before each class | 15% |
| Discussion Leading | Lead seminar discussion on assigned paper(s) | 20% |
| Participation | Active, prepared engagement in all sessions | 15% |
| Research Project | Original research; proposal, progress report, final paper & talk | 50% |
Paper reviews
Each paper review should be one page and identify three strong points and three weak points, each backed up by evidence from the paper. An example LaTeX template for the reviews is here. You are not required to use LaTeX; you may use whatever editor you like.
Lectures & Readings
The schedule below is subject to revision. Readings will be posted at least one week in advance.
Projects
The course project is the central deliverable. Projects may be done individually or in pairs. You are encouraged to pursue work that could lead to a workshop or conference submission. Topics should relate to data lake or model lake management; see the instructor for approval of adjacent ideas.
Proposal (2 pages)
State the problem, motivation, related work, and proposed approach. Include a plan with milestones.
Progress Report (3 pages)
Summarize work completed, preliminary results or observations, revised timeline, and any changes from the proposal.
Project Presentations (30 min)
In-class presentations of your current direction and results. Feedback from instructor and peers.
Final Paper (5–8 pages, ACM format)
Research paper describing problem, related work, technical approach, experiments, and conclusions.
Contact
Instructor
Renée J. Miller
University Research Chair
Cheriton School of Computer Science
University of Waterloo
Office hours: Wed 11 AM, DC 3355
or by appointment
Course Information
CS 848 — Summer 2026
Mon 1–4 PM
DC 2568
Discussion & Announcements
Piazza (enroll with UW email)
Paper Submissions
TBA