Track 3 – 14:00-17:30
Duration: 3 hours, plus a 30-minute coffee break
Location: Conference 2+3
Presenter: Rafael E. Banchs (1)
(1) Human Language Technology Department, Institute for Infocomm Research, Singapore
Target Audience: This tutorial is intended for new researchers in the field and for research students interested in learning about the use of the vector space model framework in speech and language processing applications.
The tutorial first introduces some fundamental concepts of distributional semantics and vector space models. More specifically, it reviews the distributional hypothesis, term-document matrices, term frequency, and inverse document frequency, followed by a brief discussion of linear and non-linear dimensionality reduction techniques and their implications for semantic cognition.
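As a rough illustration of the term-document matrix and TF-IDF weighting mentioned above, the following minimal sketch builds a count matrix from tokenized documents and reweights it with the plain tf × log(N/df) form. The function names `term_document_matrix` and `tfidf` are hypothetical, and the tutorial itself may use a different TF-IDF variant.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Build a term-document matrix of raw term frequencies.

    docs: list of token lists. Returns (vocab, matrix) where matrix[i][j]
    is the number of times vocab[i] occurs in docs[j].
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

def tfidf(matrix):
    """Reweight a term-document count matrix with tf * log(N / df)."""
    n_docs = len(matrix[0]) if matrix else 0
    weighted = []
    for row in matrix:
        # Document frequency: number of documents in which the term occurs.
        df = sum(1 for x in row if x > 0)
        idf = math.log(n_docs / df) if df else 0.0
        weighted.append([x * idf for x in row])
    return weighted

docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "dog"]]
vocab, counts = term_document_matrix(docs)
weights = tfidf(counts)
```

With this weighting, a term that appears in every document receives an IDF of zero, while rare terms such as "log" (one document out of three) are boosted by log 3. Common variants add smoothing or use logarithmic term frequency.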
Next, some classical examples of vector spaces in monolingual natural language processing applications are briefly discussed, covering information retrieval (Deerwester et al 1998), related term identification (Sahlgren 2006, Banchs 2009), and semantic compositionality (Mikolov et al 2013a, Baroni and Zamparelli 2010).
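The retrieval and related-term examples above both rest on comparing vectors by cosine similarity. The sketch below shows that core operation on a raw term-document matrix: ranking documents (columns) against a query vector, and ranking terms (rows) against a given term. It deliberately omits the dimensionality reduction (for example, the SVD used in latent semantic analysis) applied in the cited works; the helper names and toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def related_terms(term, vocab, matrix, k=3):
    """Rank other terms by cosine similarity of their rows in a term-document matrix.

    Under the distributional hypothesis, terms that occur in similar sets of
    documents receive similar row vectors.
    """
    i = vocab.index(term)
    scores = [(vocab[j], cosine(matrix[i], matrix[j]))
              for j in range(len(vocab)) if j != i]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

def rank_documents(query_vec, matrix):
    """Rank document columns by cosine similarity to a query vector in term space."""
    n_docs = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_docs)]
    scores = [(j, cosine(query_vec, col)) for j, col in enumerate(columns)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy matrix: rows are terms, columns are three documents.
vocab = ["car", "auto", "engine", "banana"]
matrix = [[2, 1, 0], [1, 2, 0], [1, 1, 0], [0, 0, 3]]
print(related_terms("car", vocab, matrix))
print(rank_documents([1, 0, 0, 0], matrix))  # query containing only "car"
```

In this toy matrix, "engine" and "auto" rank as related to "car" because they occur in the same documents, while "banana" scores zero. A reduced space would additionally let terms that never co-occur directly become similar through shared neighbours.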
Then, the tutorial focuses on the use of the vector space model paradigm in cross-language applications. To this end, several recent examples are presented and discussed in detail, addressing the problems of cross-language information retrieval (Littman et al 1998, Banchs and Kaltenbrunner 2008, Gupta et al 2014), cross-language sentence matching (Banchs and Costa-jussà 2010), and machine translation (Wälchli 2010, Banchs and Costa-jussà 2011, Mikolov et al 2013b).
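One simple way to see how a parallel corpus makes two monolingual vector spaces comparable is sketched below. It is not the cross-language LSI or MDS projection methods cited above; it is a simpler, swapped-in approach, similar in spirit to cross-language explicit semantic analysis, in which each text is represented by its similarity to every training document of its own language. Because dimension k corresponds to the same aligned document pair in both languages, the resulting vectors can be compared directly. The class name `ParallelSpace` and the English-Spanish toy data are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def bag_of_words(tokens, vocab):
    """Count vector over a fixed vocabulary; out-of-vocabulary tokens are dropped."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

class ParallelSpace:
    """Map texts in languages "a" and "b" into a shared space indexed by aligned document pairs."""

    def __init__(self, parallel_pairs):
        # parallel_pairs: list of (tokens_a, tokens_b), each pair being mutual translations.
        self.vocab_a = sorted({t for a, _ in parallel_pairs for t in a})
        self.vocab_b = sorted({t for _, b in parallel_pairs for t in b})
        self.docs_a = [bag_of_words(a, self.vocab_a) for a, _ in parallel_pairs]
        self.docs_b = [bag_of_words(b, self.vocab_b) for _, b in parallel_pairs]

    def project(self, tokens, lang):
        """Similarity of `tokens` to every training document of language `lang`."""
        if lang == "a":
            vocab, docs = self.vocab_a, self.docs_a
        else:
            vocab, docs = self.vocab_b, self.docs_b
        vec = bag_of_words(tokens, vocab)
        return [cosine(vec, d) for d in docs]

    def match(self, query_a, candidates_b):
        """Rank language-B candidates by similarity to a language-A query in the shared space."""
        q = self.project(query_a, "a")
        scores = [(i, cosine(q, self.project(c, "b"))) for i, c in enumerate(candidates_b)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)

pairs = [
    (["the", "cat", "sleeps"], ["el", "gato", "duerme"]),
    (["the", "dog", "runs"], ["el", "perro", "corre"]),
    (["a", "red", "car"], ["un", "coche", "rojo"]),
]
space = ParallelSpace(pairs)
print(space.match(["cat"], [["coche", "rojo"], ["gato"]]))
```

The English query "cat" and the Spanish candidate "gato" both activate only the first aligned pair, so "gato" is ranked first even though the two texts share no tokens. Methods such as cross-language LSI replace this explicit document space with a learned low-dimensional one.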
Finally, the tutorial concludes with a discussion of future research problems and practical applications related to the use of vector space models in a cross-language setting. Future avenues for scientific research are described, with major emphasis on the extension from vector and matrix representations to tensors (Baroni and Lenci 2010) and on the problem of encoding word position information into vector-based representations (Erk and Pado 2008, Recchia et al 2010).
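To make the word-position problem more concrete, the sketch below illustrates the random-permutation idea, one of the two order-encoding techniques compared by Recchia et al (2010). It assumes sparse ternary random index vectors (as in random indexing) and uses cyclic shifts as the permutations; the dimensionality, sparsity, window size, and function names are hypothetical choices, not the tutorial's own setup.

```python
import random

DIM = 512  # hypothetical vector dimensionality

def index_vector(rng, dim=DIM, nonzeros=10):
    """Sparse ternary random vector: a few randomly placed +1/-1 entries."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def rotate(vec, shift):
    """Cyclic shift to the right by `shift` places (negative values shift left)."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)

def order_vectors(sentences, window=2, seed=0):
    """Accumulate, for each word, its neighbours' index vectors rotated by their relative offset.

    Without the rotation this would be a plain bag-of-context vector; the
    rotation makes "b after a" and "b before a" contribute different vectors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: index_vector(rng) for w in vocab}
    memory = {w: [0] * DIM for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(s):
                    continue
                shifted = rotate(index[s[j]], offset)
                memory[w] = [m + x for m, x in zip(memory[w], shifted)]
    return index, memory

_, before = order_vectors([["a", "b"]])
_, after = order_vectors([["b", "a"]])
print(before["a"] != after["a"])
```

The same neighbour "b" yields different memory vectors for "a" depending on whether it precedes or follows it, while an unpermuted sum would make the two cases indistinguishable. Holographic reduced representations, the other technique in that comparison, bind position information with circular convolution instead.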
The tutorial materials are available at http://www.rbanchs.com/documents/TUTORIAL_BANCHS_IS2015.pdf
References:
R.E. Banchs, A. Kaltenbrunner (2008) “Exploring MDS projections for cross-language information retrieval”, in Proceedings of the 31st Annual International ACM SIGIR Conference, Singapore.
R.E. Banchs (2009) “Semantic mapping for related term identification”, in A. Gelbukh (Ed.), Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2009, Lecture Notes in Computer Science 5449, pp. 111-124, Springer.
R.E. Banchs, M.R. Costa-jussà (2010) “A non-linear semantic mapping technique for cross-language sentence matching”, in Proceedings of the 7th International Conference on Advances in Natural Language Processing (IceTAL), pp. 57-66, Reykjavik, Iceland.
R.E. Banchs, M.R. Costa-jussà (2011) “A Semantic Feature for Statistical Machine Translation”, in Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, ACL HLT 2011, pp. 126-134, Portland, Oregon, USA.
P. Gupta, K. Bali, R.E. Banchs, M. Choudhury, P. Rosso (2014) “Query Expansion for Cross-script Information Retrieval”, in Proceedings of the 37th Annual ACM SIGIR Conference.