Remove characters starting with x and ending with x, and everything between them, in a string

The problem: I want to remove a specific code within a question. The code changes position from question to question so I cannot rely on the position of the code to remove it.

Here is what it looks like:

Now thinking specifically about the home improvement brand _Everest<.br/>On a scale of 0 to 10, where 0 is "Not at all familiar/ knowledgeable" and 10 is "Very familiar/ knowledgeable", how familiar / knowledgeable do you consider yourself to be with..."

The code, <.br/>, is always attached to the words before and after it.

Solution: I would like to know whether there is a function that removes a set of characters that starts with x and ends with x, along with everything between them.

I hope this makes sense.

Answers


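import re

def remove_between_anchors(text, anchor):
    # Escape the anchor so regex metacharacters in it match literally.
    # The non-greedy .*? stops at the nearest closing anchor (and also matches an empty span).
    pattern = r'{0}.*?{0}'.format(re.escape(anchor))
    return re.sub(pattern, '', text)

remove_between_anchors('123aa456aa789', 'aa') # returns '123789'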
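One caveat when an anchor is pasted straight into a pattern: characters such as `*`, `.` or `$` are treated as regex syntax rather than literal text, and an anchor like `**` makes `re` raise an error. A minimal sketch of the literal-matching variant, assuming the anchors should always be taken verbatim (the name `remove_between_literal` is made up for this example):

```python
import re

def remove_between_literal(text, anchor):
    # re.escape turns the anchor into a pattern that matches it character for character.
    escaped = re.escape(anchor)
    # Non-greedy match from one anchor to the nearest following one.
    return re.sub(escaped + r'.*?' + escaped, '', text)

try:
    # Unescaped, '**' is read as two repetition operators with nothing to repeat.
    re.compile('**.*?**')
    raw_pattern_compiles = True
except re.error:
    raw_pattern_compiles = False

result = remove_between_literal('123**456**789', '**')
print(raw_pattern_compiles, result)
```

With the escaped anchor the call gives '123789', while the raw pattern does not compile at all.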

EDIT: if the start/end anchors are different:

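def remove_between_anchors(text, start, end):
    # Escape both anchors so characters like '(' or '[' match literally.
    pattern = r'{0}.*?{1}'.format(re.escape(start), re.escape(end))
    return re.sub(pattern, '', text)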

remove_between_anchors('123<abc>456', '<', '>') # returns '123456'
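Applied to the text from the question, this could look roughly like the sketch below. It assumes the code always sits between `<` and `>`, and because the code is glued to the words on both sides it puts a space in its place so that "_Everest" and "On" do not run together. The helper name `strip_code` and its `replacement` parameter are made up for this example.

```python
import re

def strip_code(text, start='<', end='>', replacement=' '):
    # Hypothetical helper: removes every start...end span (non-greedy) and
    # puts `replacement` where the span was, so neighbouring words do not fuse.
    pattern = re.escape(start) + r'.*?' + re.escape(end)
    # re.DOTALL lets the span cross line breaks, in case a code is split over lines.
    return re.sub(pattern, replacement, text, flags=re.DOTALL)

question = 'home improvement brand _Everest<.br/>On a scale of 0 to 10'
print(strip_code(question))
# home improvement brand _Everest On a scale of 0 to 10
```

Note that this removes anything between `<` and `>`, not only `<.br/>`. If the code is always exactly `<.br/>`, a plain `question.replace('<.br/>', ' ')` would do the same job without a regex.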
