How to use MaltParser in Python NLTK

As part of my academic project I need to parse a bunch of arbitrary sentences into dependency graphs. After a lot of searching, I found that I can use MaltParser to parse text with its pre-trained grammar.

I have downloaded the pre-trained model (engmalt.linear-1.7.mco) from http://www.maltparser.org/mco/mco.html, but I don't know how to parse my sentences using this grammar file and MaltParser (through the Python interface for Malt). I have downloaded the latest version of MaltParser (1.7.2) and moved it to '/usr/lib/'.

import nltk
parser = nltk.parse.malt.MaltParser()
txt = "This is a test sentence"
parser.train_from_file('/home/rohith/malt-1.7.2/engmalt.linear-1.7.mco')
parser.raw_parse(txt)

After executing the last line, the following error message is displayed:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    parser.raw_parse(txt)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 88, in raw_parse
    return self.parse(words, verbose)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 75, in parse
    return self.tagged_parse(taggedwords, verbose)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 122, in tagged_parse
    return DependencyGraph.load(output_file)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/dependencygraph.py", line 121, in load
    return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory: '/tmp/malt_output.conll'

Please help me to parse that sentence using this malt parser.

Answers


A couple of problems with your setup:

  • The input to train_from_file must be a file in CoNLL format, not a pre-trained model. For an mco file, you pass it to the MaltParser constructor using the mco and working_dir parameters.
  • The default Java heap allocation is not large enough to load that particular mco file, so you'll have to tell Java to use more heap space with the -Xmx parameter. Unfortunately this wasn't possible with the existing code, so I just checked in a change that adds a constructor parameter for additional Java arguments. See here.
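
To make the first point concrete, here is a minimal sketch of what a CoNLL-style training file looks like: one token per line, ten tab-separated columns, and a blank line after each sentence. The helper name to_conll and the toy annotation are invented for illustration; real training data would come from an annotated treebank, not from hand-written tuples like these.

```python
# Hypothetical helper: format one annotated sentence as 10-column CoNLL-X text.
# Columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL
def to_conll(tokens):
    """tokens: list of (word, pos_tag, head_index, relation); head 0 is the root."""
    lines = []
    for i, (word, tag, head, rel) in enumerate(tokens, start=1):
        # Unknown fields are written as "_" by convention.
        cols = [str(i), word, "_", tag, tag, "_", str(head), rel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line ends the sentence

# Toy annotation, made up for this example only.
sentence = [
    ("Dogs", "NNS", 2, "nsubj"),
    ("bark", "VBP", 0, "ROOT"),
]

conll_text = to_conll(sentence)
print(conll_text)
```

A file built from many such sentences is the kind of input train_from_file expects, whereas an .mco file is the already-trained result.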

So here's what you need to do:

First, get the latest NLTK revision:

git clone https://github.com/nltk/nltk.git

(NOTE: If you can't use the git version of NLTK, then you'll have to update the file malt.py manually or copy it from here to have your own version.)

Second, make the jar file available as malt.jar, which is the name NLTK expects (a symlink is enough):

cd /usr/lib/
ln -s maltparser-1.7.2.jar malt.jar

Then add an environment variable pointing to your MaltParser directory (substitute your own path):

export MALTPARSERHOME="/Users/dhg/Downloads/maltparser-1.7.2"
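
Before starting Python it can help to confirm that the setup is visible from the process. The following is only a sketch: check_malt_setup is a hypothetical helper, not part of NLTK, and it assumes malt.jar sits directly inside the directory named by MALTPARSERHOME, which may not match your layout.

```python
import os

def check_malt_setup(environ=None):
    """Return a list of problems found; an empty list means the setup looks usable."""
    environ = os.environ if environ is None else environ
    problems = []
    home = environ.get("MALTPARSERHOME")
    if not home:
        problems.append("MALTPARSERHOME is not set")
    elif not os.path.isdir(home):
        problems.append("MALTPARSERHOME is not a directory: %s" % home)
    elif not os.path.isfile(os.path.join(home, "malt.jar")):
        # Assumption: the jar (or a symlink to it) lives directly in that directory.
        problems.append("malt.jar not found in %s" % home)
    return problems

for problem in check_malt_setup():
    print(problem)
```

If this prints nothing, the variable is set and the jar is where this sketch expects it.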

Finally, load and use MaltParser in Python:

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/home/rohith/malt-1.7.2", 
...                                     mco="engmalt.linear-1.7", 
...                                     additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
'(This (sentence is a test))'
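
To see why the extra Java arguments matter, it helps to picture the command line that ends up being run. The sketch below is an approximation, not NLTK's actual code: build_malt_command is a hypothetical helper, and the input and output file names are placeholders. The -w, -c, -i, -o and -m options are MaltParser's own command-line flags.

```python
def build_malt_command(jar, working_dir, mco, input_path, output_path,
                       additional_java_args=None):
    """Hypothetical helper: assemble the argument list for a MaltParser run."""
    java_args = list(additional_java_args or [])
    return (["java"] + java_args +        # JVM options must come before -jar
            ["-jar", jar,
             "-w", working_dir,           # working directory holding the .mco file
             "-c", mco,                   # configuration (model) name
             "-i", input_path,            # CoNLL input to parse
             "-o", output_path,           # where the parsed CoNLL output goes
             "-m", "parse"])              # mode: parse rather than learn

cmd = build_malt_command("malt.jar", "/home/rohith/malt-1.7.2",
                         "engmalt.linear-1.7",
                         "malt_input.conll", "malt_output.conll",
                         additional_java_args=["-Xmx512m"])
print(" ".join(cmd))
```

Placing -Xmx512m before -jar is what makes it a JVM option rather than an argument passed to MaltParser itself.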
