
16.8.2016 at 08:00 - 11.9.2016 at 23:59


This section contains the teaching schedule for the course. Check the description for any other schedules.

Ke 7.9.2016
14:15 - 15:45
To 8.9.2016
14:15 - 15:45
Ke 14.9.2016
14:15 - 15:45
To 15.9.2016
14:15 - 15:45
Ke 21.9.2016
14:15 - 15:45
To 22.9.2016
14:15 - 15:45
Ke 28.9.2016
14:15 - 15:45
To 29.9.2016
14:15 - 15:45
Ke 5.10.2016
14:15 - 15:45
To 6.10.2016
14:15 - 15:45
Ke 12.10.2016
14:15 - 15:45
To 13.10.2016
14:15 - 15:45
Ke 19.10.2016
14:15 - 15:45
To 20.10.2016
14:15 - 15:45



Homework 1

Search for corpora that interest you. Write a short report (max. 1 page) on what you found and how you found it. (Try to characterize each corpus using vocabulary from the slides: open access, downloadable, tagged, size, type of text, etc.) You can also refer to the Glossary.pdf: http://cass.lancs.ac.uk/wp-content/uploads/2013/12/CASS-Gloss-final1.pdf
Homework deadline: 13.09.2016, 23:59

Homework 2 (Readings for the discussion)

Read a chapter on early corpus linguistics from Corpus linguistics: an introduction by Tony McEnery and Andrew Wilson, and one of the articles by Anatol Stefanowitsch.

Focus on the following questions:
1. What were the main points of Chomsky's critique of corpus linguistics?
2. Is negative evidence possible in corpus linguistics?

Homework 3

Dotko (http://www.dolnoserbski.de/korpus/) is a corpus of Lower Sorbian. The adjective „Sorbian“ is „serbski“ in Lower Sorbian, but the word has many spellings:
"S" can be: s, ss, ſ or ß
The consonant „b“ does not always appear
Try to find all the possible spellings in Dotko. Send me a list of them, the query formula, and the number of results returned by the query.
Deadline: 20.09.2016 23:59
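
As a rough illustration of how the two variations described in Homework 3 can be combined into one pattern, here is a minimal Python sketch. It is not Dotko's query syntax, which may differ, and the sample string is invented for the example. The pattern models only the two variations named in the task (the initial "s" and the optional "b") and only the base form of the adjective, so treat it as a starting point rather than a complete answer.

```python
import re
from collections import Counter

# Assumption: only the variations named in the task are modelled.
#   (?:ss|s|ſ|ß)  the initial "s" written as ss, s, ſ or ß
#   b?            the consonant "b" is optional
# \b restricts matches to whole words, so inflected forms and
# compounds such as "dolnoserbski" are deliberately not matched.
SERBSKI_VARIANTS = re.compile(r"\b(?:ss|s|ſ|ß)erb?ski\b")


def find_variants(text: str) -> list[str]:
    """Return every whole-word match of the variant pattern in `text`."""
    return SERBSKI_VARIANTS.findall(text)


# Invented sample text, not taken from the corpus.
sample = "serbski sserbski ſerbski ßerbski serski ſerski nimski dolnoserbski"

matches = find_variants(sample)
counts = Counter(matches)  # number of hits per spelling

for spelling, hits in counts.items():
    print(spelling, hits)
```

The same idea, one alternation for the first letter plus one optional character, should carry over to the regular expressions accepted by a corpus manager, but the exact attribute names and quoting rules have to be checked in the corpus interface itself.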


Target group: Anyone interested in learning Slavic languages or pursuing studies related to Slavic languages (including cultural and literary studies).

Closer familiarity with a particular phenomenon of a language or culture.

Usually taken in the second or third year of study. The unit can also be included in advanced studies if these are completed with a scope larger than the minimum (90 credits).

During the course, the participants will be introduced to Slavic corpora that can support language and culture learning, research, and translation work. The course will give a detailed insight into the types of corpora (e.g. national, reference, synchronic, diachronic, web corpora) as well as methods of working with them (mainly through on-line corpus managers). No prior knowledge of language technology is required. The course is most suitable for students who have at least basic knowledge of one Slavic language or who have completed some other course in Slavic linguistics. Participants should be able to define their language, research and study interests at the beginning of the course so that the choice of corpora discussed can be adjusted more precisely.

Preliminary course calendar:


Introduction to corpus linguistics, strengths and limitations of working with corpora


Regular expressions & query structure. Fundamentals of working with a corpus manager. Part 1.


Discussions: criticism of corpus linguistics. Is negative evidence possible in corpus linguistics?

Regular expressions & query structure. Fundamentals of working with a corpus manager. Part 2.


NoSketch Engine, KonText and their functions. Fundamentals of working with a corpus manager. Part 3.


National corpora and other important monolingual corpora


National corpora and other important monolingual corpora


Web as Corpus – a useful approach for studying under-resourced Slavic languages


Slavic Treebanks


Multilingual corpora: parallel corpora (InterCorp, ParaSol, EU databases)


Finnish-Slavic corpora


Other interesting multilingual sources, comparable corpora


Digital sources for diachronic studies


Miscellany – dialects, heritage speakers, records, videos etc.


Final conclusions

Lecture course. Alternatively, an essay on a topic agreed with the teacher.

Completion: participation in discussions, homework, and a short paper demonstrating independent work with a corpus chosen by the participant.

Prof. Lindstedt.