Python's immutable strings and their slices

The strings in Python are immutable and do support the buffer interface. So it would be efficient to return not the new strings, but the parts of the old string when using slices or split() method. But, so far as I know, the new string object is constructed each time. Why is it so? The single reason I see is that it can make a garbage collection a little bit more difficult.

True: in regular sutuations the memory overhead is linear and isn't noticeably: copying is fast, so is, I believe, the allocation. But there is too much done in python to say just that it doesn't worth the effort!

EDIT:

It seems that using this way would make the memory management really more complicate. The case, where 1/5 of the arbitrary string is only used, and we can't deallocate the whole string, is a simle example. We can improve the memory allocalor, so it would be able to deallocate strings partially, but it would be probably mostly disprovement. All the standard functions can be anyway emulated with buffer or memoryview, if the memory usage is critically important. Yes, the code will not be still so concise, but we have to give up something in order to get something.

Answers


The underlying string representation is null-terminated, even though it keeps track of the length, hence you cannot have a string object that references a sub-string that isn't a suffix. This already limits the usefulness of your proposal since it would add a lot of complications to deal differently with suffices and non-suffices (and giving up with null-terminating strings brings other consequences).

Allowing to refer to sub-strings of a string means to complicate a lot garbage collection and string handling. For every string you'd have to keep track how many objects refer to each character, or to each range of indices. This means complicating a lot the struct of string objects and any operation that deals with them, meaning a, probably big, slow down.

Add the fact that starting with python3 strings have 3 different internal representations, and things are going to be too messy to be maintainable, and your proposal probably doesn't give enough benefits to be accepted.


An other problem with this kind of "optimization" is when you want to deallocate "big strings":

a = "Some string" * 10 ** 7
b = a[10000]
del a

After this operations you have the substring b that prevents a, a huge string, to be deallocated. Surely you could do copies of small strings, but what if b = a[:10000](or another big number)? 10000 characters looks like a big string which ought to use the optimization to avoid copying, but it is preventing to realease megabytes of data. The garbage collector would have to keep checking whether it is worth to deallocate a big string object and make copies or not, and all these operations must be as fast as possible, otherwise you end up decreasing time-performances.

99% of the times the strings used in the programs are "small"(max 10k characters), hence copying is really fast, while the optimizations you propose start to become effective with really big strings(e.g. take substrings of size 100k from huge texts) and are much slower with really small strings, which is the common case, i.e. the case that should be optimized.


If you think important then you are free to propose a PEP, show an implementation and the resultant changes in speed/memory usage of your proposal. If it is really worth the effort it may be included in a future version of python.


That's how slices work. Slices always perform a shallow copy, allowing you to do things like

>>> x = [1,2,3]
>>> y = x[:]

Now it would be possible to make an exception for strings, but is it really worth it? Eric Lippert blogged about his decision not to do that for .NET; I guess his argument is valid for Python as well.

See also this question.


Need Your Help

How do you change the class of a TableViewController?

iphone ios xcode controller tableview

I have been trying to change the class of a TableViewController in the identity of it, but when I type in the name of the custom class that I want to change it to, it reverts back to

I cannot get One-to-one relationship and inheritance to work simultaneously in Hibernate

java hibernate jpa orm

I am new to Hibernate and learning all the different annotations can be frustrating at times. Currently, I am stuck on making Doctor extend Person and also have a 1-to-1 relationship between Doctor...

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.