TikZ-dependency

TikZ-dependency allows you to draw dependency graphs in LaTeX documents with little or no effort. It also comes with many styling facilities that let you customize the look and feel of the graphs to your liking.

Main features:

  • Intuitive syntax to draw dependency graphs directly in LaTeX
  • High-level interface to define the look-and-feel of the graphs
  • Based on PGF/TikZ
  • Convenient macros to enrich the graphs with arbitrary TikZ elements
You can download it directly from CTAN, but the most up-to-date version is more likely to be found on SourceForge.
Please do not hesitate to send me feedback, bug reports or feature requests!
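
To give an idea of the syntax, here is a minimal sketch of a complete document. It assumes the package is installed as tikz-dependency and uses the dependency and deptext environments together with the \depedge and \deproot macros as described in the package documentation. The sentence, the labels and the column spacing are only illustrative choices.

```latex
\documentclass{article}
\usepackage{tikz-dependency}

\begin{document}

\begin{dependency}
  % The words of the sentence: columns are separated by \& and
  % the row is closed by \\ (spacing value chosen for illustration).
  \begin{deptext}[column sep=1em]
    A \& hearing \& is \& scheduled \\
  \end{deptext}
  % \deproot{word}{label}: mark word 3 ("is") as the root.
  \deproot{3}{ROOT}
  % \depedge{from}{to}{label}: draw a labeled arc between two words.
  \depedge{2}{1}{ATT}
  \depedge{3}{2}{SBJ}
  \depedge{3}{4}{VC}
\end{dependency}

\end{document}
```

Words are addressed by their position in the deptext row, so the arcs can be listed in any order.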

FLinK - A Framework for the Linearization of Kernel Functions

FLinK stands for “a framework for the linearization of kernel functions”.

The software was developed as part of my PhD to support our research on the linearization of kernel functions. The main results so far are published in the papers listed under References below [1],[2],[3], which I co-authored with my co-advisor Alessandro Moschitti.

(My thesis contains more in-depth information, but I have not made it public yet because I am still fixing things here and there.)

Although it could be extended to support a large variety of kernel functions, at the moment the framework only supports the Syntactic Tree Kernel [Collins and Duffy, 2001] and the Partial Tree Kernel [Moschitti, 2006].

SVM-Light-TK by Alessandro Moschitti is used for learning and classification in Tree Kernel spaces.

For more information about the linearization process, please refer to the aforementioned papers. More details, as well as comprehensive documentation of the user interface and the programmer API can be found here.

FLinK is distributed under a dual licensing scheme. For personal, teaching or research uses, the software is available under the GNU Lesser GPL (LGPL) v.3 license. The text of the license is available at http://www.fsf.org/licensing/licenses/lgpl.html. If you use this software for research, please consider referencing these papers [1],[2],[3] in your publications. Please note that research uses do NOT include those involving the development of technology to be employed for commercial or any other kind of revenue purposes. These include selling, releasing, or providing commercial services based on the software. For any other use, the software is released under a commercial license, whose terms are defined on a per-request basis. You can contact me by email for more information.

Follow this link to download the latest version of the software.


References

  1. On Reverse Feature Engineering of Syntactic Tree Kernels, Pighin, Daniele, and Moschitti, Alessandro, Conference on Natural Language Learning (CoNLL-2010), 08/2010, Uppsala, Sweden (2010)
  2. Reverse Engineering of Tree Kernel Feature Spaces, Pighin, Daniele, and Moschitti, Alessandro, EMNLP'09: Empirical Methods of Natural Language Processing, 08/2009, Singapore (2009)
  3. Efficient Linearization of Tree Kernel Functions, Pighin, Daniele, and Moschitti, Alessandro, CoNLL'09: Thirteenth Conference on Computational Natural Language Learning, 06/2009, Boulder, CO, USA (2009)

OpenMT-2 Workshop on Using Linguistic Information for Hybrid Machine Translation - First Call for Papers

Friday, November 18, 2011. Barcelona, Spain. 

http://ixa2.si.ehu.es/lihmt2011

Background

Akin to the OpenMT Workshop on Mixing Approaches to Machine Translation in 2008 (http://ixa2.si.ehu.es/matmt-2008), the aim of the OpenMT-2 Workshop on Using Linguistic Information for Hybrid Machine Translation (HMT) is to promote corpus-based methods and technologies that combine resources and algorithms from the three general approaches to MT: rule-based (RBMT), example-based (EBMT) and statistical (SMT).

The boundaries between these three approaches are becoming narrower:

  1. String-based SMT models are being augmented with morphological, syntactic or semantic information.
  2. RBMT systems are using parallel corpora to improve their results by enriching their lexicons and grammars and creating new methods for disambiguation.
  3. Previous projects have shown that benefits can be accrued by simple combination of MT systems created using different MT approaches.

At the same time, data-driven Machine Translation (EBMT and SMT) is nowadays prevalent within the MT research community and translation results obtained with these approaches have now reached a reasonably useful level of quality, especially when the target language is English.

But such data-driven MT systems base their knowledge on bilingual aligned corpora, and the accuracy of their output depends strongly on the quality and the size of those corpora. Large and reliable bilingual corpora are unavailable for many language pairs. In addition, translating into a morphologically rich target language makes the training of data-driven systems a lot more difficult.

Workshop Programme

The one-day workshop is being organised as part of the dissemination effort of the OpenMT-2 project, a three-year, multisite research effort funded by the Spanish government. The project addresses, on the one hand, approaches to integrating structural information (morphological, syntactic and semantic) into open-source SMT and, on the other, the development of novel automatic MT evaluation using linguistically motivated metrics. Thus, the central issues to be addressed during the workshop include:

  • methods and techniques for integrating structural information (syntactic and semantic) into HMT,
  • methods and techniques for handling morphologically rich languages (e.g. Basque) within HMT,
  • alternative approaches to automatic MT evaluation relying on linguistic criteria.

The programme will include three invited plenary talks, each addressing one of the central issues above, and the presentation of a number of refereed contributions on related topics. The invited speakers include:

  • Lucia Specia (University of Wolverhampton, UK),
  • Ondrej Bojar (Charles University, Czech Republic),
  • TBA (TBA).

The workshop will conclude with a brief panel discussion summarising the results of the presentations as they impact the central issues.

Topics of Interest

We are particularly interested in papers describing research and development in the following areas:

  • methods to compare and combine translation-outputs obtained from different MT systems,
  • methods for dealing with languages with rich morphology within data-driven approaches,
  • approaches to developing morphologically, syntactically or semantically augmented SMT models,
  • new automatic (or manual) MT evaluation methods based on linguistically motivated metrics,
  • descriptions of open-source or free language resources that are available for developing hybrid MT systems.

All contributions will be published in the workshop proceedings.

Important Dates

  • Paper submission deadline: Sept. 9, 2011,
  • Notification of acceptance: Oct. 7, 2011,
  • Final version of paper: Oct. 21, 2011,
  • Workshop: Nov. 18, 2011.

Submissions

Papers should be in English and up to a maximum of 8 pages long. Please follow the ACL HLT 2011 formatting requirements for long papers found at: http://www.acl2011.org/call.shtml

To submit contributions, please follow the instructions at the EasyChair conference management system submission website at: http://www.easychair.org/conferences/?conf=lihmt2011.

The deadline for submission is September 9, 2011. The contributions will undergo a double-blind review by members of the programme committee.

Please address queries to lihmt@easychair.org

Programme committee (Tentative)

Co-Chair: David Farwell (Technical University of Catalonia, TALP, Barcelona)

Co-Chair: Gorka Labaka (University of the Basque Country, Donostia)

  • Iñaki Alegria (University of the Basque Country, Donostia)
  • Ondrej Bojar (Charles University, Czech Republic)
  • Arantza Díaz de Ilarraza (University of the Basque Country, Donostia)
  • Chris Dyer (Carnegie Mellon University, US)
  • Cristina España (Technical University of Catalonia, TALP, Barcelona)
  • Marcello Federico (Fondazione Bruno Kessler, Italy)
  • Mikel Forcada (University of Alacant, Alicante)
  • Adrià de Gispert (University of Cambridge, UK)
  • Kevin Knight (Information Sciences Institute, US)
  • Philipp Koehn (University of Edinburgh, UK)
  • José Mariño (Technical University of Catalonia, TALP, Barcelona)
  • Lluís Màrquez (Technical University of Catalonia, TALP, Barcelona)
  • Hermann Ney (RWTH-Aachen, Germany)
  • Daniele Pighin (Technical University of Catalonia, TALP, Barcelona)
  • Aarne Ranta (Chalmers University of Technology, Gothenburg, Sweden)
  • Marta R. Costa-jussà (Barcelona Media, Spain)
  • Felipe Sánchez-Martínez (University of Alacant, Alicante)
  • Kepa Sarasola (University of the Basque Country, Donostia)
  • Lucia Specia (University of Wolverhampton, UK)
  • Dekai Wu (Hong Kong University of Science and Technology, China)

Local organization

Centre for Speech and Language Applications and Technologies (TALP), Technical University of Catalonia (UPC).

Committee members: David Farwell (Chair), Amarin Deemagarn, Cristina España, Meritxell González, Lluís Màrquez, Daniele Pighin.

About the OpenMT-2 project

The main goal of the OpenMT-2 project is the development of Open Source Machine Translation Architectures based on hybrid models and advanced semantic processors. These architectures will be open-source systems combining the three main Machine Translation frameworks -- Rule-Based MT (RBMT), Statistical MT (SMT) and Example-Based MT (EBMT) -- into hybrid systems. The architectures and results of the project will be Open Source, which will allow rapid development and adaptation of new advanced Machine Translation systems for other languages. We will test the functionality of this system with four languages (English, Spanish, Catalan and Basque) so that the architectures are evaluated in different contexts. While there are many corpus resources for English and Spanish, there are not so many for Catalan and Basque. While the structure of some of these languages is very similar (Catalan and Spanish), others are very different (English and Basque). Basque is an agglutinative and highly inflecting language, unlike English, Catalan and Spanish.

In parallel, there has been extensive work on developing an automatic evaluation platform that allows for the introduction of linguistically motivated morphological, syntactic and semantic metrics into the design of MT evaluation methodologies, as well as on the development and testing of concrete, linguistically based evaluation techniques.

The main innovative points of the OpenMT-2 project are:

  • The design of hybrid systems combining traditional linguistic rules, example-based methods and statistical methods.
  • The development of MT evaluation methods based on linguistically motivated metrics.
  • Open Source Systems.
  • The use of advanced syntactic and semantic processing in MT.

For further details, see the OpenMT-2 website:

http://ixa.si.ehu.es/openmt2

CoNLL 2010 - Relevant Fragments for QC, RE and SRL

This package contains the most relevant syntactic tree kernel fragments identified for each class on three different linguistic benchmarks:

  • question classification (QC)
  • relation extraction (RE)
  • semantic role labeling (SRL)

The fragments were isolated by reverse engineering SVM models, as described in:

On Reverse Feature Engineering of Syntactic Tree Kernels, Pighin, Daniele, and Moschitti, Alessandro, Conference on Natural Language Learning (CoNLL-2010), 08/2010, Uppsala, Sweden (2010)

YouTube. Chopin Etude op.10 n.12 by Horowitz.

One of my favourite pieces ever, in the best interpretation that I have ever heard.

TED Talk. Benjamin Zander on music and passion

Please don't miss this one.

TED Talk. Richard Dawkins on militant atheism

A stunning and incredibly persuasive talk by Richard Dawkins on why atheists should be proud of their condition.

Mixed Features for Semantic Role Labeling

From this page you can download mixed structured (AST1m)/linear features data files that we used in our experiments on Semantic Role Labeling. The features are extracted from Charniak automatic parses as provided for the CoNLL 2005 shared task on SRL. The task and the extraction process are detailed in this paper.

Relevant Fragments for Question Classification

From this page it is possible to download the most relevant tree fragments (structured features) identified for the Coarse Grained Question Classification task. The fragments were selected using the Tree-Kernel model reverse engineering technique which we described in this paper.
