Improve algorithm for finding URLs in a body of text - obj-c

I'm trying to come up with an algorithm to find URLs in a body of text. I currently have the following code (a quick first pass that I hacked together; I know there has to be a better way):

    statusText.text = @"http://google.com http://www.apple.com www.joshholat.com";

    NSMutableArray *urlLocations = [[NSMutableArray alloc] init];

    NSRange currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@"http://"];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + x)]];
    }
    currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@"http://www."];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + x)]];
    }
    currentLocation = NSMakeRange(0, statusText.text.length);
    for (int x = 0; x < statusText.text.length; x++) {
        currentLocation = [[statusText.text substringFromIndex:(x + currentLocation.location)] rangeOfString:@" www." options:NSLiteralSearch];
        if (currentLocation.location > statusText.text.length) break;
        [urlLocations addObject:[NSNumber numberWithInt:(currentLocation.location + 1 + x)]];
    }

    //Get rid of any duplicate locations
    NSSet *uniqueElements = [NSSet setWithArray:urlLocations];
    [urlLocations release];
    //allObjects returns an autoreleased array, so no separate alloc/init is needed
    NSArray *finalURLLocations = [uniqueElements allObjects];

    //Parse out the URLs of each of the locations
    for (int x = 0; x < [finalURLLocations count]; x++) {
        NSRange temp = [[statusText.text substringFromIndex:[[finalURLLocations objectAtIndex:x] intValue]] rangeOfString:@" "];
        int length = temp.location + [[finalURLLocations objectAtIndex:x] intValue];
        if (temp.location > statusText.text.length) length = statusText.text.length;
        length = length - [[finalURLLocations objectAtIndex:x] intValue];
        NSLog(@"URL: %@", [statusText.text substringWithRange:NSMakeRange([[finalURLLocations objectAtIndex:x] intValue], length)]);
    }

I feel like it could be improved by using regular expressions or something similar. Any help improving it would be greatly appreciated.

Answers


If you target iOS 4.0+, let Apple do the work for you and use the built-in data detectors. Create an instance of NSDataDetector with the NSTextCheckingTypeLink option and run it over your string. The NSDataDetector documentation has some good usage examples.

If you can't or don't want to use data detectors for any reason, John Gruber posted a good regex pattern for detecting URLs a few months ago: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
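
As a rough illustration of the regex route, the sketch below runs NSRegularExpression over a string with a deliberately simplified pattern. It is not Gruber's pattern, which handles many more cases. The helper name FindURLStrings and the pattern itself are assumptions made for this example. Note that NSRegularExpression also requires iOS 4.0 or later, so on older systems a third-party regex library would be needed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring of `text` that looks like a URL.
// The pattern is a simple stand-in: it matches anything starting with
// http://, https:// or www. and runs until whitespace, <, > or a double quote.
static NSArray *FindURLStrings(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b(?:https?://|www\\.)[^\\s<>\"]+"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return [NSArray array];
    }

    NSMutableArray *urls = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:text
                                      options:0
                                        range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *match in matches) {
        // Each result's range refers back to the original string.
        [urls addObject:[text substringWithRange:[match range]]];
    }
    return urls;
}
```

A pattern this simple will also swallow trailing punctuation (for example the period in "see www.example.com.") and ignores bare domains such as google.com, which is the kind of edge case Gruber's pattern is designed to handle.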


Just as a follow-up, here's what I changed my code to:

    statusText.text = @"http://google.com http://www.apple.com www.joshholat.com hey there google.com";

    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error];

    // Scan the whole string for links
    NSArray *matches = [detector matchesInString:statusText.text
                                         options:0
                                           range:NSMakeRange(0, statusText.text.length)];

    for (NSTextCheckingResult *match in matches) {
        if ([match resultType] == NSTextCheckingTypeLink) {
            NSLog(@"URL: %@", [[match URL] absoluteURL]);
        }
    }
