Dataset

The Audio-Video Australian-English Speech Data Corpus

Also known as: AVOZES
University of Canberra
Dr Roland Goecke (Owned by)
Subjects: Computer Vision; Information and Computing Sciences; Artificial Intelligence and Image Processing; Image Processing. Type: dataset. Language: English. Date: 2012.

Please use the contact information below to request access to this data.

Licence & Rights:

Other
Unknown

This licence agreement governs the relationship between the copyright and intellectual property owner (“the licensor”) of the audio-video speech data corpus known as the AVOZES data corpus and “the licensee”. This licence regulates the use of the AVOZES data corpus for non-commercial, in particular academic research, purposes ONLY by the licensee. If at any stage a commercial use of the AVOZES data corpus is intended or actively pursued, a commercial licence must be obtained.
http://users.cecs.anu.edu.au/~roland/AVOZES/avozesNoncommercialLicence.pdf

This licence is primarily intended for users at academic or other non-commercial research institutions for the purpose of evaluation and internal research. It is most likely of particular use for research in AVSP, especially on AuE, but may also be useful for research in other fields. However, the data is provided "as is" and no warranty is given that it is useful or appropriate for the user's research. Under the non-commercial licence, the user is allowed to make as many copies of the data as is reasonably necessary for back-up purposes and use. The user is not allowed to alter or modify the data in any way, unless prior written permission has been given by the licensor. The user is also not allowed to combine the data with or incorporate the data in any other data for the purpose of publication or external use of such data, unless prior written permission has been given. However, the user is explicitly granted the right to perform experiments with and analyses on the data, use the data in conjunction with other (self-recorded or otherwise acquired) data, and may publish the results of any such work, under the obligation to explicitly make a reference to using AVOZES and cite this paper. Furthermore, the user is allowed to extract video frames ("frame grabbing") and audio samples for the purpose of including them in the user's research publications (journal papers, conference papers, student theses) and presentations (conferences, lectures, seminars, web pages), provided that no such publication or presentation contains more than 50 video (still) frames and / or more than 10 audio or AV sequences (where sequence refers to the AVI- and WAV-files provided on the DVDs), unless prior written permission has been given by the licensor.

Access:

Other

Use of the AVOZES data corpus requires a licence. If you do not yet have a licence but would like to use AVOZES, please contact the author. A licence can be acquired by individuals, institutions, or commercial entities.

Contact Information

Roland.Goecke@canberra.edu.au

roland.goecke@ieee.org

Ph: 61 2 6201 2114

Fax: 61 2 6201 5231

University Drive, Bruce, ACT 2617

Brief description

The Audio-Video OZstralian English Speech (AVOZES) data corpus has been made publicly available to other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers, and the sequences provide both systematic coverage of the phonemes and visemes of Australian English and some application-driven utterances. AVOZES is also the first audio-video speech data corpus with stereo-video recordings, which enable more accurate measurement of geometric facial features.
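The advantage of stereo over monocular recording can be illustrated with a minimal sketch. The record does not describe the AVOZES camera calibration or how its 3D coordinates are actually computed, so the example assumes an ideal, rectified stereo pair with a hypothetical focal length, baseline and principal point; `RectifiedStereoRig`, `yaw` and all numeric values are illustrative only. It uses the standard depth-from-disparity relation Z = fB/d to triangulate two lip corners before and after a simulated 30° head turn.

```python
import math
from dataclasses import dataclass


@dataclass
class RectifiedStereoRig:
    """Hypothetical rectified stereo pair; these are NOT AVOZES calibration values."""
    focal_px: float    # focal length in pixels (same for both cameras)
    baseline_m: float  # horizontal distance between the camera centres, metres
    cx: float          # principal point, x (pixels)
    cy: float          # principal point, y (pixels)

    def project(self, point):
        """Project a 3D point (left-camera frame, metres) into both images."""
        x, y, z = point
        x_left = self.focal_px * x / z + self.cx
        x_right = self.focal_px * (x - self.baseline_m) / z + self.cx
        y_img = self.focal_px * y / z + self.cy  # same row in both rectified images
        return x_left, y_img, x_right

    def triangulate(self, x_left, y_img, x_right):
        """Recover the 3D point from a matched pixel pair via depth = f * B / disparity."""
        disparity = x_left - x_right
        if disparity <= 0:
            raise ValueError("disparity must be positive for a point in front of the rig")
        z = self.focal_px * self.baseline_m / disparity
        return ((x_left - self.cx) * z / self.focal_px,
                (y_img - self.cy) * z / self.focal_px,
                z)


def yaw(point, centre, degrees):
    """Rotate a point about a vertical axis through `centre` (a head turn)."""
    t = math.radians(degrees)
    dx, dy, dz = (p - c for p, c in zip(point, centre))
    return (centre[0] + dx * math.cos(t) + dz * math.sin(t),
            centre[1] + dy,
            centre[2] - dx * math.sin(t) + dz * math.cos(t))


rig = RectifiedStereoRig(focal_px=800.0, baseline_m=0.12, cx=360.0, cy=240.0)
mouth_centre = (0.0, 0.05, 0.6)                          # hypothetical, 60 cm from camera
lip_corners = [(-0.025, 0.05, 0.6), (0.025, 0.05, 0.6)]  # hypothetical 5 cm mouth width

for angle in (0, 30):
    turned = [yaw(c, mouth_centre, angle) for c in lip_corners]
    observed = [rig.project(c) for c in turned]
    recovered = [rig.triangulate(*o) for o in observed]
    width_3d_cm = 100 * math.dist(*recovered)
    width_2d_px = abs(observed[1][0] - observed[0][0])  # left image only
    print(f"yaw {angle:2d} deg: 3D width {width_3d_cm:.2f} cm, 2D width {width_2d_px:.1f} px")
```

The triangulated mouth width stays at 5 cm for both poses, while the pixel width in the left image drops from roughly 67 px to roughly 58 px; a single-camera 2D measurement would misread the head turn as a change in lip shape.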

Full description

For testing and comparing results published by various research groups in the field of Audio Visual Speech Processing (AVSP), a common basis in the form of a comprehensive, systematically designed AV speech data corpus would be of great value. Many corpora appear to have been designed with a specific application in mind, rather than being based on a general phonemic and visemic analysis. The Audio-Video OZstralian English Speech (AVOZES) data corpus was designed and recorded with two major goals in mind. Firstly, a new framework for the design of comprehensive, well-structured, multiple-use AV speech data corpora was proposed and followed in the production of the AVOZES data corpus. Secondly, the first publicly available, comprehensive AV speech data corpus for Australian English (AuE) was produced. In addition, it is the first AV speech data corpus to use a stereo vision system. A stereo vision system has the advantage over monocular systems that 3D coordinates can be recovered accurately. Thus, 3D distances can be measured, not just distances in 2D image coordinates, which makes the measurements robust against rotations of the face.

Other design factors concern the recording process itself. One can argue that recordings made in laboratories do not exactly mirror real-world conditions. However, being able to control the experimental conditions makes experimental results easier to interpret. These conditions include the recording equipment, the possible use of markers, the layout of the recording room (e.g. background), the seating arrangement, the illumination arrangement, and the level of acoustic noise. Systematically recording every possible combination of these conditions would make the corpus grow exponentially and quickly become impractical. Instead, the approach suggested here is to hold all conditions but one constant at a time and study the effects of changing that one condition, rather than mixing the effects of several changing conditions in one recording.
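The cost argument can be made concrete by counting recording setups. The level counts below are hypothetical, since the description names the conditions but not how many settings each would take. The sketch contrasts a full factorial design (every combination) with the one-condition-at-a-time approach; either count would then be multiplied by the number of speakers and utterances.

```python
from math import prod

# Hypothetical number of levels per recording condition; the corpus
# description lists the conditions but not how many levels each would take.
CONDITION_LEVELS = {
    "recording equipment": 2,
    "markers": 2,
    "room layout / background": 3,
    "seating arrangement": 2,
    "illumination": 3,
    "acoustic noise": 4,
}


def full_factorial_setups(levels):
    """Setups needed to record every combination of condition levels."""
    return prod(levels.values())


def one_at_a_time_setups(levels):
    """One baseline setup, plus one setup per alternative level of each condition,
    with every other condition held at its baseline value."""
    return 1 + sum(n - 1 for n in levels.values())


print("full factorial:", full_factorial_setups(CONDITION_LEVELS))  # 288
print("one at a time: ", one_at_a_time_setups(CONDITION_LEVELS))   # 11

# With k conditions of 3 levels each, the gap widens quickly: 3**k vs 1 + 2k.
for k in (2, 4, 6, 8):
    uniform = {f"c{i}": 3 for i in range(k)}
    print(k, full_factorial_setups(uniform), one_at_a_time_setups(uniform))
```

The full factorial count is the product of the level counts and so grows exponentially with the number of conditions, whereas the one-at-a-time count grows only linearly. The trade-off is that interactions between conditions are not observed.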

AVOZES currently contains recordings of 20 native speakers of AuE. The group is gender-balanced, with ten female and ten male speakers. Six speakers wear glasses, three wear lip make-up, and two have beards. At the time of recording, the speakers were between 23 and 56 years old. The recording assistant tentatively classified the speakers into the three speech varieties of AuE (broad, general, cultivated), which produced groups of 6 speakers for broad AuE, 12 for general AuE, and only 2 for cultivated AuE.

Video information is encoded in the NTSC format at 720×480 pixels and a frame rate of 29.97 Hz. The AVOZES AVI files use the Adaptec DVSoft codec, which most media players, such as RealPlayer and Windows Media Player, have pre-installed.
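When aligning annotations with video frames, it helps to use the exact frame rate rather than the rounded figure. This sketch assumes the nominal 29.97 Hz is the standard NTSC rate of exactly 30000/1001 frames per second; the timing actually stored in the AVI files should be checked before relying on it. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumption: the nominal "29.97 Hz" is the standard NTSC rate of exactly 30000/1001 fps.
NTSC_FPS = Fraction(30000, 1001)
FRAME_WIDTH, FRAME_HEIGHT = 720, 480


def frame_start_time(frame_index):
    """Start time, in seconds, of a zero-based video frame (exact rational)."""
    return Fraction(frame_index) / NTSC_FPS


def frame_at_time(seconds):
    """Zero-based index of the frame on screen at a non-negative time in seconds."""
    return int(Fraction(seconds) * NTSC_FPS)  # int() floors non-negative values


print(float(frame_start_time(1)))  # duration of one frame, about 0.0334 s
print(frame_at_time(60))           # frame shown at the one-minute mark
```

Treating the rate as exactly 30 fps would introduce a 0.1% timing error, about 3.6 s per hour of video; exact fractions also avoid floating-point drift over long sequences.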

Audio information is encoded as 48 kHz, 16-bit stereo.
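A sketch for checking that a WAV file matches the documented audio format, and for finding the audio samples that correspond to a given video frame. It uses Python's standard `wave` module; `check_wav_format` and `audio_samples_for_frame` are hypothetical helpers, and the frame mapping assumes an NTSC rate of exactly 30000/1001 fps with audio and video starting together, which the record does not confirm.

```python
import wave
from fractions import Fraction

EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
EXPECTED_CHANNELS = 2             # stereo
NTSC_FPS = Fraction(30000, 1001)  # assumed exact value of the nominal 29.97 Hz

# Raw audio data rate: samples/s * bytes/sample * channels.
BYTES_PER_SECOND = EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the documented format."""
    with wave.open(path, "rb") as wav:
        found = {
            "sample rate (Hz)": (wav.getframerate(), EXPECTED_RATE_HZ),
            "sample width (bytes)": (wav.getsampwidth(), EXPECTED_SAMPLE_WIDTH_BYTES),
            "channels": (wav.getnchannels(), EXPECTED_CHANNELS),
        }
    return [f"{name}: got {got}, expected {want}"
            for name, (got, want) in found.items() if got != want]


def audio_samples_for_frame(frame_index):
    """Half-open range [start, end) of 48 kHz sample indices spanning one video frame."""
    samples_per_frame = Fraction(EXPECTED_RATE_HZ) / NTSC_FPS  # exactly 1601.6
    start = int(frame_index * samples_per_frame)
    end = int((frame_index + 1) * samples_per_frame)
    return start, end


print(BYTES_PER_SECOND)
print([audio_samples_for_frame(i) for i in range(5)])
```

At 48 kHz one NTSC frame spans 1601.6 samples, so frame boundaries fall on whole samples only every five frames (8008 samples); rounding each boundary down keeps consecutive ranges contiguous. The raw audio rate is 48,000 × 2 bytes × 2 channels = 192,000 bytes per second.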

Data time period: 2000 to 2001

Spatial coverage (polygon, longitude,latitude pairs): 108.703124,-10.216907 155.109374,-10.562705 154.757811,-44.142274 107.296874,-43.635547 108.703124,-10.216907

Point: 131.203124,-27.1795905

Place name: Australia
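The record does not explain the single coordinate pair listed after the polygon, but it coincides with the centre of the polygon's bounding box (longitude first, latitude second). Reading it as a bounding-box centre is an inference, not a documented rule; the sketch below reproduces it with hypothetical helper functions.

```python
# Polygon copied from the spatial-coverage field: space-separated "longitude,latitude" pairs.
COVERAGE = ("108.703124,-10.216907 155.109374,-10.562705 154.757811,-44.142274 "
            "107.296874,-43.635547 108.703124,-10.216907")


def parse_lon_lat_pairs(text):
    """Split space-separated 'lon,lat' pairs into (longitude, latitude) floats."""
    points = []
    for pair in text.split():
        lon, lat = pair.split(",")
        points.append((float(lon), float(lat)))
    return points


def bounding_box_centre(points):
    """Midpoint of the longitude range and of the latitude range."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2


centre_lon, centre_lat = bounding_box_centre(parse_lon_lat_pairs(COVERAGE))
print(f"{centre_lon:.6f},{centre_lat:.7f}")
```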

Subjects

User Contributed Tags    

Australian English, English Language, Linguistics


Identifiers
  • Local: canberra.edu.au/Collection/Davozes001