Matukar Panau corpus building for the study of language use in context
Matukar Panau is a highly endangered Oceanic language spoken near Madang, Papua New Guinea. Although most children are no longer learning Matukar Panau, current speakers (approximately 300) form a vibrant community of multilinguals in dense social networks. As an Oceanic language on the PNG coast, Matukar Panau has many interesting Papuan features. No language of this area has a large corpus available. This project aims to produce 47 hours of audio-visual recordings and a 200,000+ word corpus. Recordings will focus on conversations where participants are varied by age, gender, clan, social connections and differing language portfolios to document speaker interaction. Pairs of speakers will discuss life, customs, and family and will respond to stimuli-based activities. Transcription and translation of data into the PNG lingua franca, Tok Pisin, will be done with an existing trained local team.
This collection will contain recordings from speakers of Matukar village and the neighbouring hamlet of Surumarang. These villages are the only places where the language is spoken. They are located on the North Coast Highway in Madang Province, about 45 km north of Madang. The majority of Matukar Panau speakers are older. People of Matukar and Surumarang live in clan- and family-based groups in small “neighbourhoods” of the villages. Speaker metadata will include age, clan and neighbourhood information for future study of sociolinguistic variation.
This documentation project, and the production of a text corpus, is only possible with the help of the Matukar Panau Transcription and Translation team (Rudof Raward, Justin Willie, Alfred Sangmei, Amos Sangmei, Micheal Balias, and Zebedee Kreno†) and of the consultants who help edit the data. My primary consultant is Kadagoi Rawad Forepiso. Other consulting help has come from Kennedy Barui, Cathy Samun Williang, Taleo Kreno, Berry Barui and John Bogg.
The transcription and translation team transcribes the data using ELAN, developing both software-specific and general computer skills as well as an awareness of spelling, variation, and language structure. These younger members of the Matukar community are in their 20s to early 40s. They are not fluent speakers of the language, so they will often pair with language experts to transcribe the material. The language experts are often their relatives, and by working with them, the transcribers learn more Matukar Panau. Some older members of the community are thereby involved in language teaching. This kind of pairing of community members is good both for the project and for language maintenance and revitalization.
The planned contents include audiovisual recordings, a text corpus which will be updated over the course of the project, a lexicon and community materials.
Previously collected data will comprise part of the forthcoming text corpus. Data was collected in 2010, 2011, 2013 and 2016 and funded by the Living Tongues Institute for Endangered Languages, National Geographic Enduring Voices, Firebird Foundation for Anthropological Research and the Australian Research Council Centre of Excellence for the Dynamics of Language.
Matukar Panau is a highly endangered Oceanic language spoken around 45 km north of Madang, Papua New Guinea. Matukar is a village with around 500 people and Surumarang is a smaller hamlet with around 200 people. Of these 700 people, most (540) are under 30 years old and are unlikely to speak more than very basic Matukar Panau. Their first and dominant language is the English-based creole Tok Pisin. Another 130 or so people are between 30 and 50 years old. Their first language is Matukar Panau, but many now speak primarily Tok Pisin instead. The dominant language for most of these people is certainly Tok Pisin, but they can and will still use Matukar Panau, and will often code-switch with Tok Pisin. Around 25 people are over the age of 50, and while these speakers are also Tok Pisin speakers, they still speak Matukar Panau often and well.
Although the youngest adults and children speak primarily Tok Pisin, there is still societal value in conducting small social rituals in Matukar Panau as opposed to Tok Pisin. For instance, greetings are often given in Matukar Panau, such as good morning (tidom mami uyan), good day (sabi uyan), good afternoon (raurau uyan) or good night (tidom uyan), and how are you doing? (uyan madonggo [sitting good] or uyan turago [standing good] or mateng ti, abab ti [no sickness, no wounds]?). Someone who primarily speaks Tok Pisin may still ask in Matukar Panau if someone has betel nut (mariu), lime (kambang), mustard (ful) or cigarettes (kas). The person asked is expected to give one or two small items to the asker.
In addition to Matukar Panau-Tok Pisin bilingualism, many villagers, especially older villagers, speak another indigenous language. There are many exogamous marriages, and so some people have learned the language of their spouse or parent from another village. People may speak a Papuan language like Bargam, or a closely related Oceanic language like Takia, or both. Some spouses of native Matukar villagers have also learned Matukar Panau to some extent. Other languages people speak include: Gedaged, English, Manam, Ngain, Pelipoai, Riwo, Waskia, Widar, Yamai, and Yoidik.
The language situation is therefore complex: multilingualism is prevalent and children are unlikely to learn the language productively, but many people still have a strong association between language and belonging.
Matukar Panau has interesting typological features such as multiple complex clause types, complex nominalizations, multiple possession strategies and variation in phonology and semantics due to gender and social network. Many of these features seem to have developed through long contact and multilingualism with Papuan languages. Further study is required; a goal of this project is to produce a large text corpus that can be searched for determining contexts and used for later comparison with both Austronesian and Papuan languages.
Acknowledgement and citation
Users of any part of the deposit should acknowledge Danielle Barth as the data collector and researcher. Users should also acknowledge the Endangered Languages Documentation Programme (ELDP) as the funder. Further, individual speakers whose words and/or images are used should be acknowledged by name. This information on contributors is available in the metadata.
To refer to any data from the corpus, please cite the corpus in this way:
Barth, Danielle. 2018-. Matukar Panau corpus building for the study of language use in context. London: SOAS, Endangered Languages Archive. URL: [insert link here]. Accessed on [insert date here].