lxml memory problem

I'm trying to parse large XML files (>3GB) like this:

import lxml.etree

context = lxml.etree.iterparse(path)  # path: the large XML file
for action, el in context:
    # do something with el
    pass

I thought that with iterparse the data would not be loaded into RAM all at once, but according to this article I'm wrong: http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ (see Listing 4). However, when I apply that solution to my code, some elements that have not been parsed yet are apparently cleared (especially child elements of el).
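A minimal sketch of the clear-as-you-go pattern the article describes (here called `fast_iter`; this is not a verbatim copy of its listing). The detail that matters for the symptom above is the `tag` argument: with the default `events=("end",)` and no tag filter, every child element reports its own end event before its parent does, so clearing each element as it arrives empties the children before the parent is processed. Filtering on the record tag (`item` here, a hypothetical name, as is the sample data) means only complete records are handed to `func` and then cleared.

```python
from io import BytesIO
from lxml import etree

def fast_iter(context, func):
    """Call func on each element the iterparse context yields, then free it."""
    for event, elem in context:
        func(elem)
        # Drop the element's children and text now that func is done with it.
        elem.clear()
        # Delete already-processed siblings that precede elem; otherwise the
        # (now empty) elements stay attached to their parent and pile up.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

# Hypothetical sample data standing in for the >3GB file.
xml = b"<root><item><name>a</name></item><item><name>b</name></item></root>"
names = []
# tag="item": only <item> end events are reported, so each <item>'s
# children are still intact when func runs.
context = etree.iterparse(BytesIO(xml), events=("end",), tag="item")
fast_iter(context, lambda el: names.append(el.findtext("name")))
print(names)  # ['a', 'b']
```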

Is there any other solution to this memory problem?

Thanks in advance!

Answers


Don't forget to use clear(), optionally also clearing the root element, as explained here. As I understand it, you're already doing this, but apparently you are trying to access content that you have already cleared or that has not been parsed yet. It would be helpful if you could provide something more than "do sth. with el". Are you using getnext() or getprevious()? XPath expressions?
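A sketch of the "clear the root element" variant, assuming the repeated records (hypothetically `<record>`) are direct children of the document element and carry no namespace. Two points address the symptom in the question: inspect an element only on its `end` event, because at its `start` event the children have not been parsed yet, and read everything you need from a record before clearing. `iter_records` and the sample data are illustrative, not part of lxml. If the records are nested deeper, `root.clear()` would detach a still-open ancestor, so the tag-filtered approach is the safer choice there.

```python
from io import BytesIO
from lxml import etree

def iter_records(source, tag):
    """Yield the child texts of each <tag> element, clearing the tree as we go.

    Assumes <tag> elements are direct children of the document element.
    """
    context = etree.iterparse(source, events=("start", "end"))
    # The first event is the start of the document element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        # At a "start" event the children are not parsed yet, so only
        # look at the element once its "end" event arrives.
        if event == "end" and elem.tag == tag:
            yield [child.text for child in elem]
            # Detach everything parsed under the root so far, including elem.
            root.clear()

# Hypothetical sample data.
xml = b"<root><record><a>1</a><b>2</b></record><record><a>3</a><b>4</b></record></root>"
rows = list(iter_records(BytesIO(xml), "record"))
print(rows)  # [['1', '2'], ['3', '4']]
```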

Another option, if you really don't want to build a tree, is to use the target parser interface, which is like SAX for lxml/etree (but easier).
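A minimal sketch of the target parser interface: lxml calls the target's `start`, `end`, `data` and `close` methods while parsing, no tree is built, and `parser.close()` returns whatever the target's `close()` returns. Feeding the input in chunks keeps memory bounded regardless of file size. `CountingTarget`, `count_tags`, the chunk size and the tag counting itself are hypothetical stand-ins for whatever "do something" means in your code.

```python
from io import BytesIO
from lxml import etree

class CountingTarget:
    """Hypothetical parser target that counts start tags without building a tree."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # Whatever close() returns becomes the result of parser.close().
        return self.counts

def count_tags(stream, chunk_size=1 << 16):
    parser = etree.XMLParser(target=CountingTarget())
    # Feed the input in chunks so it is never held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)
    return parser.close()

# For a real file: with open(path, "rb") as f: result = count_tags(f)
print(count_tags(BytesIO(b"<root><item/><item/><other/></root>")))
# {'root': 1, 'item': 2, 'other': 1}
```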

