Friday, December 01, 2006 7:07 AM by garykac
Publications
Normally, my list of publications lives on http://research.microsoft.com/~garykac, but since that page may be disappearing soon, here is a list (as of 1 December 2006 - with links to papers when available).
- G.Kacmarcik, M.Gamon, Obfuscating Document Stylometry to Preserve Author Anonymity, In The Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, COLING/ACL 06, Sydney, Australia, 2006. (Poster)
Abstract: This paper explores techniques for reducing the effectiveness of standard authorship attribution techniques so that an author A can preserve anonymity for a particular document D. We discuss feature selection and adjustment and show how this information can be fed back to the author to create a new document D’ for which the calculated attribution moves away from A. Since it can be labor intensive to adjust the document in this fashion, we attempt to quantify the amount of effort required to produce the anonymized document and introduce two levels of anonymization: shallow and deep. In our test set, we show that shallow anonymization can be achieved by making 14 changes per 1000 words to reduce the likelihood of identifying A as the author by an average of more than 83%. For deep anonymization, we adapt the unmasking work of Koppel and Schler to provide feedback that allows the author to choose the level of anonymization.
- G.Kacmarcik, Using Natural Language to Manage NPC Dialog, In Artificial Intelligence and Interactive Digital Entertainment, AIIDE 06, Marina del Rey, California, 2006. (Poster)
Abstract: In this document, we describe our work applying natural language (NL) technologies to improve non-player character (NPC) dialog interactions in games, specifically role-playing games (RPGs). Our approach is to adapt the standard dialog menu interaction so that the menu items are dynamically-generated during game runtime rather than scripted during development time. In our system, menu items are constructed by manipulating abstract semantic representations stored in the NPC knowledgebase, converting them into NL text, and then ranking them so that the most relevant items are placed at the top of the menu. We demonstrate our approach in the context of a small RPG.
- H.Suzuki, G.Kacmarcik, RefRef: A Tool for Viewing and Exploring Coreference Space, In The 5th International Conference on Language Resources and Evaluation, LREC 06, Genoa, Italy, 2006.
Abstract: We present RefRef, a tool for viewing and exploring coreference space, which is publicly available for research purposes. Unlike similar tools currently available whose main goal is to assist the annotation process of coreference links, RefRef is dedicated to viewing and exploring coreference-annotated data, whether manually tagged or automatically resolved. RefRef is also highly customizable, as the tool is being made available with the source code. In this paper we describe the main functionalities of RefRef as well as some possibilities for customization to meet the specific needs of the users of such coreference-annotated text.
MSR RefRef (the tool described in this paper) can be downloaded here.
- G.Kacmarcik, Multi-Modal Question Answering: Questions without Keyboards, In The 2nd International Joint Conference on Natural Language Processing, IJCNLP 05, Jeju Island, Republic of Korea, 2005. (Poster)
Abstract: This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are guaranteed to be relevant, grammatical and unambiguous. While the range of queries is more limited than a text input system would permit, in the gaming environment that we are exploring these limitations are offset by the practical concerns that make text input inappropriate.
- G.Kacmarcik, Question-Answering in Role-Playing Games, In Papers from the AAAI Workshop on Question Answering in Restricted Domains, Technical Report WS-05-10, AAAI Press, Pittsburgh, Pennsylvania, pp.51-55, 2005.
Abstract: In this paper, we give a general description of the issues associated with performing basic Question-Answering (QA) tasks against non-player characters (NPCs) within a simple role-playing game (RPG) or virtual world environment. We describe the aspects of this kind of QA system and provide an overview of our initial explorations into the implementation and evaluation of a QA system that is appropriate for RPG environments. We also introduce Keystone, a simple RPG environment that provides a small virtual world that allows different NPC controller backends to be plugged in, thus allowing these systems to be evaluated in a more realistic game environment.
- H.Suzuki, G.Kacmarcik, L.Vanderwende, A.Menezes, Mindnet/mnex:意味関係データベースの自動構築と解析のためのツール (Mindnet and mnex: An Environment for Exploring Semantic Space), In 言語処理学会第11回全国大会論文集 (Proceedings of the 11th Annual Meeting of the Society of Natural Language Processing), Takamatsu, Japan, 2005. (in Japanese)
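Abstract: Mindnet is a database of semantic relations between words, extracted directly and automatically from ordinary text data, and mnex is a web tool for displaying and exploring it from a variety of perspectives. In addition to the functions of a conventional thesaurus, it can display not only the semantic relations of a single word but also the semantic relations between two words, with the type of relation specified. Mindnet's semantic relations are extracted without manual effort in two stages: identification of semantic relations through syntactic parsing, and weighting of those relations. The weighting is applied to the semantic-relation paths that connect words, with the aim of extracting only useful semantic relations. Mindnet was first developed for English using dictionaries and an encyclopedia as source data, and to date a Mindnet based on dictionary and encyclopedia text has also been built for Japanese.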
- G.Kacmarcik, Making Use of Furigana, In The 1st International Joint Conference on Natural Language Processing, IJCNLP 04, Sanya, Hainan Island, China, pp.159-164, 2004.
Abstract: An interesting aspect of written Japanese that has not been well studied is the use of furigana, or reading cues, to assist linguistic processing of text. Difficulties in processing this material have led to the situation where it is sometimes considered more convenient to simply remove the parenthetical material rather than to process it. This paper describes a system that makes use of the furigana to assist with various tasks, including segmentation, word sense disambiguation and support for OOV items. The system reports an F-measure score of 93.3% on the task of matching the base text with its furigana.
- E.Brill, G.Kacmarcik, C.Brockett, Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs, In Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, NLPRS 2001, Tokyo, Japan, pp.393-399, 2001. (Alternate link)
Abstract: This paper describes a method of extracting katakana words and phrases, along with their English counterparts from non-aligned monolingual web search engine query logs. The method employs a trainable edit distance function to find <katakana, English> pairs that have a high probability of being equivalent. These pairs can then be used to further bootstrap training of the edit distance function, resulting in improved back-transliteration from katakana to English. In addition, this is an effective method for mining large numbers of katakana strings to enhance a bilingual lexicon. The improved edit distance function and enhanced lexicon can be used for more accurate alignment of bitexts, and for application during runtime MT and multilingual IR.
- G.Kacmarcik, C.Brockett, H.Suzuki, Robust Segmentation of Japanese Text into a Lattice for Parsing, In Proceedings of the 18th International Conference on Computational Linguistics, COLING 2000, Saarbrücken, Germany, pp.390-396, 2000.
Abstract: We describe a segmentation component that utilizes minimal syntactic knowledge to produce a lattice of word candidates for a broad coverage Japanese NL parser. The segmenter is a finite state morphological analyzer and text normalizer designed to handle the orthographic variations characteristic of written Japanese, including alternate spellings, script variation, vowel extensions and word-internal parenthetical material. This architecture differs from conventional Japanese wordbreakers in that it does not attempt to simultaneously attack the problems of identifying segmentation candidates and choosing the most probable analysis. To minimize duplication of effort between components and to give the segmenter greater freedom to address orthography issues, the task of choosing the best analysis is handled by the parser, which has access to a much richer set of linguistic information. By maximizing recall in the segmenter and allowing a precision of 34.7%, our parser currently achieves a breaking accuracy of ~97% over a wide variety of corpora.
- H.Suzuki, C.Brockett, G.Kacmarcik, Using a Broad-Coverage Parser for Word-Breaking in Japanese, In Proceedings of the 18th International Conference on Computational Linguistics, COLING 2000, Saarbrücken, Germany, pp.822-827, 2000.
Abstract: We describe a method of word segmentation in Japanese in which a broad-coverage parser selects the best word sequence while producing a syntactic analysis. This technique is substantially different from traditional statistics- or heuristics-based models which attempt to select the best word sequence before handing it to the syntactic component. By breaking up the task of finding the best word sequence into the identification of words (in the word-breaking component) and the selection of the best sequence (a by-product of parsing), we have been able to simplify the task of each component and achieve high accuracy over a wide variety of data. Word-breaking accuracy of our system is currently around 97~98%.
- G.Kacmarcik, Optimizing PowerPC Code: Programming the PowerPC Chip in Assembly Language, Addison-Wesley, 1995. (ISBN 0201408392)
- G.Kacmarcik, Assembly Language Programming and Optimization Techniques for the POWER Architecture, In The 8th Annual MacHack Conference - The (B)leading Edge, Ann Arbor, Michigan, pp.7-38, June 1993.
- G.Kacmarcik, A Neuroethological Model of the Cockroach Escape Response, Master's Thesis, Case Western Reserve University, CAISR Technical Report TR 91-133, June 1991.
- R.D.Beer, G.J.Kacmarcik, S.Chai, R.E.Ritzmann, H.J.Chiel, Ventral giant interneuron wind fields in the cockroach modeled with constrained back-propagation, Society for Neuroscience Abstracts, 17:1245, 1991.
- R.D.Beer, G.J.Kacmarcik, R.E.Ritzmann, H.J.Chiel, A model of distributed sensorimotor control in the cockroach escape turn, In Advances in Neural Information Processing Systems 3, NIPS 1990, R.P.Lippmann, J.Moody, D.S.Touretzky (eds), Morgan Kaufmann Publishers, 1990.
- R.D.Beer, G.J.Kacmarcik, R.E.Ritzmann, H.J.Chiel, A computer model for escape in the cockroach, Society for Neuroscience Abstracts, 16:759, 1990.