Hang's pic


Research is fun!




Professional Services



Highlights of my research (in reverse chronological order)

Soft Pattern Matching Models for Definitional Question Answering

Lexico-syntactic pattern matching can be applied to many natural language retrieval tasks, such as information extraction and question answering. Previous work tries to induce rigid, hard-matching text patterns through manual construction or supervised learning. In contrast, we demonstrate flexible, soft matching models for lexico-syntactic patterns. We propose two formal soft matching models: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound way to model pattern matching as a probabilistic process that generates token sequences. Such generic soft matching models can be extended to other applications that use surface text patterns.
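To give a feel for the bigram-based idea, here is a minimal sketch of soft matching as sequence generation. It is an illustration under my own assumptions, not the model from the papers: the class name `BigramSoftMatcher`, the `<s>` start marker, the interpolation weight `lam`, and the add-one smoothed unigram fallback are all hypothetical choices. The point is only that a token sequence close to the training patterns gets a higher probability than an unrelated one, instead of a hard match or no match.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sequence-start marker


class BigramSoftMatcher:
    """Scores token sequences with an interpolated bigram/unigram model."""

    def __init__(self, training_sequences, lam=0.7):
        self.lam = lam              # weight of the bigram estimate (assumed value)
        self.unigrams = Counter()   # token counts
        self.contexts = Counter()   # how often a token appears as a left context
        self.bigrams = Counter()    # (previous token, token) counts
        for seq in training_sequences:
            tokens = list(seq)
            padded = [START] + tokens
            self.unigrams.update(tokens)
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # one extra slot for unseen tokens

    def token_prob(self, prev, tok):
        # Add-one smoothed unigram estimate, so unseen tokens keep a small probability.
        p_uni = (self.unigrams[tok] + 1) / (self.total + self.vocab_size)
        seen = self.contexts[prev]
        p_bi = self.bigrams[(prev, tok)] / seen if seen else 0.0
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def log_score(self, seq):
        """Length-normalized log probability of generating `seq`."""
        tokens = list(seq)
        if not tokens:
            return float("-inf")
        padded = [START] + tokens
        logp = sum(math.log(self.token_prob(p, t))
                   for p, t in zip(padded, padded[1:]))
        return logp / len(tokens)


# Toy definition patterns; "NN" stands for the slot of the term being defined.
matcher = BigramSoftMatcher([
    ["NN", ",", "also", "known", "as"],
    ["NN", ",", "known", "as"],
    ["NN", "is", "a"],
])
exact = matcher.log_score(["NN", ",", "known", "as"])
close = matcher.log_score(["NN", ",", "also", "called"])
far = matcher.log_score(["of", "the", "in"])
```

In this toy setup the partially matching sequence scores between the exact pattern and the unrelated one, which is the graded behavior that hard-matching patterns cannot give.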

Related papers and software:
[TOIS] [WWW2004] [SIGIR2005-1] [IR4QA]

Fuzzy Matching of Dependency Relations for Factoid QA Passage Retrieval

Most current question answering (QA) systems employ term-density ranking to retrieve answer passages. Such methods often retrieve incorrect passages because relationships among question terms are not considered. Previous studies attempted to address this problem by matching dependency relations between questions and answers. They used strict matching, which fails when semantically equivalent relationships are phrased differently. We propose fuzzy relation matching based on statistical models. We present two methods for learning relation mapping scores from past QA pairs: one based on mutual information and the other on expectation maximization. Experimental results show that our method significantly outperforms density-based passage retrieval methods by up to 78% in mean reciprocal rank. Relation matching also brings about a 50% improvement in a system enhanced by query expansion.
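As a rough illustration of the mutual-information variant, the sketch below learns mapping scores between relation labels from paired question and answer relation paths, then scores a new pair fuzzily. Everything here is an assumption made for the example: the function names, the pointwise mutual information estimate, the `floor` score for unseen pairs, and the toy relation labels. It is not the exact formulation used in the papers.

```python
import math
from collections import Counter
from itertools import product


def learn_relation_mapping(path_pairs):
    """Pointwise mutual information between question and answer relation labels.

    `path_pairs` is a list of (question_path, answer_path) tuples, where each
    path is a list of dependency relation labels taken from a past QA pair.
    """
    pair_counts, q_counts, a_counts = Counter(), Counter(), Counter()
    for q_path, a_path in path_pairs:
        q_rels, a_rels = set(q_path), set(a_path)
        q_counts.update(q_rels)
        a_counts.update(a_rels)
        pair_counts.update(product(q_rels, a_rels))
    n = len(path_pairs)
    return {
        (rq, ra): math.log((c / n) / ((q_counts[rq] / n) * (a_counts[ra] / n)))
        for (rq, ra), c in pair_counts.items()
    }


def fuzzy_match(q_path, a_path, scores, floor=-5.0):
    """Average, over question relations, of the best mapping score in the answer path.

    `floor` is an assumed penalty for relation pairs never seen together.
    """
    if not q_path or not a_path:
        return floor
    best = [max(scores.get((rq, ra), floor) for ra in a_path) for rq in q_path]
    return sum(best) / len(best)


# Toy training data with made-up relation labels.
training = [
    (["subj"], ["subj"]),
    (["subj"], ["s"]),
    (["obj"], ["obj"]),
    (["obj"], ["pcomp-n"]),
]
scores = learn_relation_mapping(training)
related = fuzzy_match(["subj"], ["s"], scores)
unrelated = fuzzy_match(["subj"], ["obj"], scores)
```

A strict matcher would reject both candidate paths because neither is identical to the question path; the learned scores let the differently phrased but related path rank above the unrelated one.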

Related papers and software:
[SIGIR2005-2] [SIGIR2005-3]

Probabilistic Query Expansion with User Logs

Queries to search engines on the Web are usually short. They do not provide sufficient indications for an effective selection of relevant documents. Previous research has proposed query expansion to deal with this problem. However, in those approaches expansion terms are determined only from the analysis of documents. In this study, we propose a new method for query expansion based on user interaction information recorded in web query logs. The central idea is to extract correlations between query terms and document terms by analyzing query logs. These correlations are then used to select high-quality expansion terms for new queries. In comparison with previous query expansion methods, our method takes advantage of the user judgments implied in user logs. Our experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
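The central idea can be sketched in a few lines. The code below is a simplified illustration, assuming that each log session is a (query terms, clicked document id) pair and that documents are available as token lists; the function names, the data layout, and the way scores are combined across query terms are my own assumptions for the example rather than the exact method in the papers. It estimates how strongly a document term is associated with a query term through clicked documents, then picks the top-scoring terms for a new query.

```python
import math
from collections import Counter, defaultdict


def term_correlations(sessions, documents):
    """Estimate P(doc_term | query_term) through clicked documents.

    P(w_d | w_q) = sum over clicked docs D of P(w_d | D) * P(D | w_q),
    where P(D | w_q) is the share of clicks on D among sessions containing w_q
    and P(w_d | D) is the relative term frequency of w_d in D.
    """
    clicks = defaultdict(Counter)  # query term -> clicked doc id -> count
    for query_terms, doc_id in sessions:
        for wq in set(query_terms):
            clicks[wq][doc_id] += 1

    corr = defaultdict(lambda: defaultdict(float))
    for wq, doc_counts in clicks.items():
        total_clicks = sum(doc_counts.values())
        for doc_id, c in doc_counts.items():
            tf = Counter(documents[doc_id])
            length = sum(tf.values())
            if length == 0:
                continue  # skip empty documents
            for wd, f in tf.items():
                corr[wq][wd] += (f / length) * (c / total_clicks)
    return corr


def expand(query_terms, corr, k=3):
    """Return up to k expansion terms, combining evidence from all query terms."""
    scores = Counter()
    for wq in query_terms:
        for wd, p in corr.get(wq, {}).items():
            if wd not in query_terms:
                scores[wd] += math.log1p(p)
    return [w for w, _ in scores.most_common(k)]


# Toy click log and document collection (made up for illustration).
documents = {
    "d1": ["windows", "xp", "install", "guide"],
    "d2": ["java", "jdk", "install"],
    "d3": ["windows", "update"],
}
sessions = [(["windows"], "d1"), (["windows"], "d3"), (["java"], "d2")]
corr = term_correlations(sessions, documents)
suggested = expand(["windows"], corr, k=1)
```

Because the association runs through documents that users actually clicked for a query, the expansion terms reflect the implicit user judgments in the log rather than document statistics alone.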

Note: this work was finished when I was interning at Microsoft Research Asia, with Dr. Ji-Rong Wen.

Related papers:
[TKDE] [WWW2002]

Question Answering Track at TREC

I was a main contributor to our participation in the question answering track at TREC (Text Retrieval Conference) in 2003 and 2004. I was less involved in TREC 2005 because of my summer internship.

NUS @ TREC 2004 - I was the team leader of the NUS team participating in TREC QA 2004. Our team was officially ranked second among all participating teams. The definitional question answering part, which I developed myself, obtained the best performance in the official evaluations.

NUS @ TREC 2003 - I was solely responsible for the definitional question answering sub-task. My system was ranked second among all teams.

Related papers and software:
[TREC2003] [TREC2004] [TREC2005]