© PLOS ONE PHYLOGENY CC BY 2.0 (https://creativecommons.org/licenses/by/2.0/)


In this MSc-level seminar we will cover computational methods for genomic data, ranging from algorithms to deep learning.

This is an MSc-level seminar in the general area of computational methods for genomic data, with a wide range of topics to choose from according to your own interests. The main idea is to choose a scientific article presenting such a computational method, understand it, and explain it in detail through an oral presentation and a 15-page seminar report. You will also need to present in depth the basic definitions and methods used in the article, which may require reading some related papers or looking up some computational methods in a textbook. A tentative list of articles is given below. You can also propose your own topic.


[Taken by PTM] G1. Variation graph toolkit improves read mapping by representing genetic variation in the reference [Algorithms, Data Structures] https://www.nature.com/articles/nbt.4227
G2. Efficient Construction of a Complete Index for Pan-Genomics Read Alignment [Algorithms, Data Structures] https://link.springer.com/chapter/10.1007%2F978-3-030-17083-7_10
G3. On the Complexity of Sequence to Graph Alignment [Algorithms, Complexity] https://link.springer.com/chapter/10.1007%2F978-3-030-17083-7_6
G4. An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search [Algorithms, Data Structures] https://link.springer.com/chapter/10.1007%2F978-3-030-17083-7_1

M1. Dna2vec: Consistent vector representations of variable-length k-mers [Machine Learning, Neural Networks] https://arxiv.org/abs/1701.06279
[Taken by KM] M2. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks [Machine Learning, Neural Networks] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171410
[Taken by KH] M3. Predicting effects of noncoding variants with deep learning–based sequence model [Machine Learning, Neural Networks] https://dx.doi.org/10.1038%2Fnmeth.3547
M4. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks [Machine Learning, Neural Networks] https://genome.cshlp.org/content/26/7/990.long

[Taken by LO] P1. Inferring Cancer Progression from Single-cell Sequencing while Allowing Mutation Losses [Phylogenies, Simulated Annealing] https://www.biorxiv.org/content/10.1101/268243v2
[Taken by YB] P2. APPLES: Distance-based Phylogenetic Placement for Scalable and Assembly-free Sample Identification [Phylogenies, Dynamic Programming] https://www.biorxiv.org/content/10.1101/475566v2

- 5.9.2019 - 9.9.2019: You choose a paper and are nominated as opponent of a colleague. Send me 3 choices by e-mail (alexandru.tomescu@helsinki.fi), ranked in order of preference. Deadline 9.9.2019.
- 19.9.2019: You hold a 5-minute presentation briefly outlining your topic.
- 7.11.2019, 14.11.2019, 21.11.2019: Oral presentations (35 minutes + 10 minutes of questions), two presentations per meeting. We decide the exact days based on the number of participants. One week before your presentation, you send a draft version of the report and of the slides to your opponent. Your opponent starts the 10-minute question session.
- 21.11.2019: You submit your report, possibly taking into account comments you received during your presentation.
- 21.11.2019 - 5.12.2019: Colleagues write reviews for your report (thus, you also review your colleagues' reports).
- 5.12.2019 - 12.12.2019: You revise your report based on the reviews, and submit the final version.

- 7.11.2019: KM (paper M2), opponent LO. YB (paper P2), opponent PTM.
- 14.11.2019: PTM (paper G1), opponent YB. LO (paper P1), opponent KH.
- 21.11.2019: KH (paper M3), opponent KM.

12.8.2019 at 09:00 - 12.12.2019 at 23:59



Alexandru Tomescu

Published, 9.9.2019 at 22:48

We now have a nice selection of papers from all three topics. I marked the paper assignments as [Taken by INITIALS] next to the paper title.

If you haven't chosen a paper yet, you still have time to do so, before the next meeting on 19.9.2019.


Alexandru Tomescu

Published, 5.9.2019 at 13:49

As discussed at today's meeting, please send me three choices of papers, ranked in order of preference, by Monday 9.9. You can also propose your own paper, and I will decide if it fits the seminar.

The next meeting is two weeks from now, on 19.9.2019, when you will hold a 5-minute presentation briefly outlining your topic.


Here is the course’s teaching schedule. Check the description for possible other schedules.

- Thu 5.9.2019, 12:15 - 14:00
- Thu 19.9.2019, 12:15 - 14:00
- Thu 7.11.2019, 12:15 - 14:00
- Thu 14.11.2019, 12:15 - 14:00
- Thu 21.11.2019, 12:15 - 14:00


Seminar Report

A seminar report is a short review paper (some 10-15 pages): you explain some interesting results in your own words. A typical seminar report will consist of the following parts:
- an informal introduction,
- a formally precise definition of the problem that is studied,
- a brief overview of very closely related work - here you might cite approx. 3–10 papers and explain their main contributions,
- a more detailed explanation of one or two interesting results, with examples, and
- conclusions.

Superficially, your report should look like a typical scientific article. However, it will not contain any new scientific results, just a survey of previously published work.

Review of Seminar Report

Submit a complete and finished version of your report for peer review. A rough draft makes the reviewers' work unpleasant. The reviewers will be asked to comment on and evaluate (on a scale from 1 to 5) the following aspects:

Does the report do a good job of summarizing some of the key contributions of the work(s) it is based on? Has the focus of the report been well chosen? Does the report present enough details to understand what it is about and the key technical contents, while at the same time not giving too many technical details so that the main message is lost? Does the report explain (in the introduction / conclusions / review of related work) how the discussed content fits the "big picture", i.e., how the discussed content is situated more broadly in the context of the original works?

Does the report do a good job of explaining the original contributions in the author's own words (rather than directly copying passages from the original works without showing the author's own understanding or insights)? Are additional references beyond the main original work appropriately referenced and discussed? Can you suggest further related articles to discuss? Are original examples of the key concepts and technical contents given appropriately?

Is the report clearly written? Is it well structured? Is it easy to read (considering the complexity of the topic)? Is the language good? Are enough examples and intuition given to easily grasp the main technical terminology and contributions discussed?

Overall, does the report succeed (considering all of the above) in giving a well thought through and proper overview of its subject of study, or does it need to be significantly revised to reach this goal? Evaluate with one of the following:
- A solid report, does not need revising
- Minor revision needed
- Major revision needed

Concretely, what are the main parts that could be revised to improve on the current version of the report?

How much time did you put into writing this review, including the time used to read the report and other background articles (in hours)? How confident are you about your written assessment overall (very confident - somewhat confident - made educated guesses) and why?

Conduct of the course

The following elements are required:
- Giving a preliminary topic presentation (~5 min) during period I
- Giving a seminar presentation (~40 min) on your topic during period II
- Acting as an opponent for the presentation of one other student
- Active participation in the seminar meetings
- Writing a seminar report (~15 pages) on your topic to accompany the presentation
- Peer-reviewing the seminar reports of two other students
- Revising your seminar report based on the reviewers' comments

Grading (on a scale of 1-5) is based on an overall assessment of all the required components.


An ability to give scientific presentations. An ability to peer-review and give feedback on written work and on presentations. Improved scientific writing skills in computational biology. In-depth theoretical understanding of an advanced topic in genomic data science and computational biology.