cs193p – Assignment #6 Extra Task #2

Please note, this blog entry is from a previous course. You might want to check out the current one.

Loading Flickr information into your database can be ridiculously inefficient if, for each photo you download, you query to see if it is already in the database, add it if it is not (and maybe update it if it is). Enhance your application to make this download much more efficient, preferably only doing two or three queries total in the database for each “batch” of Flickr photos you download (instead of hundreds, which is what Photomania does). The predicate operator IN might be of value here.

Let’s start by limiting the photo queries, but only if there are any photos. If there are none, a request containing a predicate with an IN operator and nil will most likely crash … A helper method (described a little later) generates the array of photo IDs used to query the database. If there are matches, create a new array of Flickr photo dictionaries containing only the photos not yet in the database, and replace the current photos array with the new, reduced one:

+ (void)loadPhotosFromFlickrArray:(NSArray *)photos // of Flickr NSDictionary
         intoManagedObjectContext:(NSManagedObjectContext *)context
{
    if ([photos count]) {
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Photo"];
        request.predicate = [NSPredicate predicateWithFormat:@"unique IN %@", [FlickrHelper IDsforPhotos:photos]];
        NSError *error;
        NSArray *matches = [context executeFetchRequest:request error:&error];
        if (!matches || ![matches count]) {
            // nothing to do ...
        } else {
            NSArray *existingPhotoIDs = [matches valueForKeyPath:@"unique"];
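            // No capacity hint: [photos count] - [matches count] is unsigned and would
            // wrap around if the database ever returned more matches than photos.
            NSMutableArray *newPhotos = [NSMutableArray array];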
            for (NSDictionary *photo in photos) {
                if (![existingPhotoIDs containsObject:[FlickrHelper IDforPhoto:photo]]) {
                    [newPhotos addObject:photo];
                }
            }
            photos = newPhotos;
        }
    }
    ...
}
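
The filtering step can be tried without Core Data. The following is only a minimal sketch: plain dictionaries stand in for the Flickr photo dictionaries, a hard-coded array stands in for the IDs found in the database, and the function name PhotosNotIn is hypothetical. It applies the same IN operator to an in-memory array instead of looping by hand:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the assignment code: returns the entries of
// `photos` whose value for `idKey` is not contained in `existingIDs`.
static NSArray *PhotosNotIn(NSArray *photos, NSArray *existingIDs, NSString *idKey)
{
    // %K substitutes a key path, %@ substitutes the collection tested by IN.
    NSPredicate *isNew = [NSPredicate predicateWithFormat:@"NOT (%K IN %@)", idKey, existingIDs];
    // filteredArrayUsingPredicate: keeps the original order of the remaining entries.
    return [photos filteredArrayUsingPredicate:isNew];
}
```

With three dictionaries whose id values are 1, 2 and 3 and an existing-ID array containing only 2, the helper should return the first and the third dictionary.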

The helper method is not that tricky. We already used another helper, IDforPhoto:. There we requested the value for a key path of a dictionary and got a string as the result. Now we query an array of dictionaries instead and get an array of strings as the result:

+ (NSArray *)IDsforPhotos:(NSArray *)photos
{
    return [photos valueForKeyPath:FLICKR_PHOTO_ID];
}
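
To see what the key-path call does to an array, here is a small sketch with made-up data. The function name StringValuesForKeyPath is hypothetical, and the extra type check is an assumption of mine, not something the original helper does: it drops anything that is not a string (for example a placeholder for a dictionary that lacks the key), so that only usable IDs reach the IN predicate.

```objc
#import <Foundation/Foundation.h>

// Hypothetical variant of IDsforPhotos: that takes the key path as a parameter.
static NSArray *StringValuesForKeyPath(NSArray *dictionaries, NSString *keyPath)
{
    NSMutableArray *strings = [NSMutableArray array];
    // NSArray applies the key path to every element and collects the results.
    for (id value in [dictionaries valueForKeyPath:keyPath]) {
        // Keep only real strings; skip placeholders for missing values.
        if ([value isKindOfClass:[NSString class]]) {
            [strings addObject:value];
        }
    }
    return strings;
}
```

Called with the key path id on an array where one dictionary has no id entry, the sketch should return just the IDs of the other dictionaries, in their original order.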

Because only photos not yet in the database are left, it should no longer be necessary to query for each of them separately, so the per-photo request can be commented out. Nevertheless, what happens if two download processes start at the same time and both conclude that the same photo has not been stored yet?

+ (Photo *)photoWithFlickrInfo:(NSDictionary *)photoDictionary
        inManagedObjectContext:(NSManagedObjectContext *)context
{
    ...
//    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Photo"];
//    request.predicate = [NSPredicate predicateWithFormat:@"unique = %@", unique];    
//    NSError *error;
//    NSArray *matches = [context executeFetchRequest:request error:&error];    
//    if (!matches || error || ([matches count] > 1)) {
//    } else if ([matches count]) {
//        photo = [matches firstObject];
//    } else {
        ...
//    }    
    ...
}

To limit photographer and region requests, create two mutable arrays to hold existing photographers and regions. They are mutable because we need to change them, e.g. when two new photos have the same new photographer or region. When processing the second photo, we should access the objects newly created for the first photo.

The requests work like the one for the photos above, using two new helper methods. Pass both arrays to the method handling the photo generation:

+ (void)loadPhotosFromFlickrArray:(NSArray *)photos // of Flickr NSDictionary
         intoManagedObjectContext:(NSManagedObjectContext *)context
{
    NSMutableArray *existingPhotographers = [NSMutableArray array];
    NSMutableArray *existingRegions = [NSMutableArray array];
    if ([photos count]) {
        ...        
        request = [NSFetchRequest fetchRequestWithEntityName:@"Photographer"];
        request.predicate = [NSPredicate predicateWithFormat:@"name IN %@", [FlickrHelper ownersOfPhotos:photos]];
        existingPhotographers = [[context executeFetchRequest:request error:&error] mutableCopy];        
        request = [NSFetchRequest fetchRequestWithEntityName:@"Region"];
        request.predicate = [NSPredicate predicateWithFormat:@"placeID IN %@", [FlickrHelper placeIDsforPhotos:photos]];
        existingRegions = [[context executeFetchRequest:request error:&error] mutableCopy];
    }    
    for (NSDictionary *photo in photos) {
        [self photoWithFlickrInfo:photo
           inManagedObjectContext:context
            existingPhotographers:existingPhotographers
                  existingRegions:existingRegions];
    }
    ...
}

The helper methods work like above with one tiny addition. The new photos could contain duplicates of photographers and regions. Thus we want only distinct results:

+ (NSArray *)placeIDsforPhotos:(NSArray *)photos
{
    return [photos valueForKeyPath:[NSString stringWithFormat:@"@distinctUnionOfObjects.%@", FLICKR_PHOTO_PLACE_ID]];
}

+ (NSArray *)ownersOfPhotos:(NSArray *)photos
{
    return [photos valueForKeyPath:[NSString stringWithFormat:@"@distinctUnionOfObjects.%@", FLICKR_PHOTO_OWNER]];
}
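
A quick way to check what the collection operator returns is to run it on a few hand-made dictionaries. This is a minimal sketch under my own assumptions: the key owner and the function name DistinctValues are hypothetical, and the real FLICKR_PHOTO_OWNER key may differ.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: distinct values stored under `key` in an array of dictionaries.
static NSArray *DistinctValues(NSArray *dictionaries, NSString *key)
{
    // Builds e.g. "@distinctUnionOfObjects.owner"; the operator removes duplicates.
    NSString *keyPath = [NSString stringWithFormat:@"@distinctUnionOfObjects.%@", key];
    return [dictionaries valueForKeyPath:keyPath];
}
```

For three photos owned by ann, bob and ann, the result should contain ann and bob once each. The order of the result should not be relied on, so compare it as a set. As far as I know, Apple's Key-Value Coding guide also warns that these operators can raise an exception when a leaf value is nil, so the input dictionaries in this sketch all carry the key.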

Adjust the interface of the photo-generation method to accept the two new parameters, and pass them on to the photographer- and region-generation methods:

+ (Photo *)photoWithFlickrInfo:(NSDictionary *)photoDictionary
        inManagedObjectContext:(NSManagedObjectContext *)context
         existingPhotographers:(NSMutableArray *)photographers
               existingRegions:(NSMutableArray *)regions
{
        ...
        photo.photographer = [Photographer photographerWithName:[FlickrHelper ownerOfPhoto:photoDictionary]
                                         inManagedObjectContext:context
                                          existingPhotographers:photographers];        
        photo.region = [Region regionWithPlaceID:[FlickrHelper placeIDforPhoto:photoDictionary]
                                 andPhotographer:photo.photographer
                          inManagedObjectContext:context
                                 existingRegions:regions];
    ...
}

Add the new parameter to the interface of the photographer-generation method. Instead of querying Core Data, filter the existing-photographers array to see if the current photographer is already available. If we need to create a new one, add it to the existing-photographers array:

+ (Photographer *)photographerWithName:(NSString *)name
                inManagedObjectContext:(NSManagedObjectContext *)context
                 existingPhotographers:(NSMutableArray *)photographers
{
    ...    
    if ([name length]) {
        NSArray *matches = [photographers filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"name = %@", name]];        
        if (!matches || ([matches count] > 1)) {
            ...
        } else if (![matches count]) {
            ...
            [photographers addObject:photographer];
        } else {
            ...
        }
    }
    ...
}

… and follow the same steps for the region-generation method:

+ (Region *)regionWithPlaceID:(NSString *)placeID
              andPhotographer:(Photographer *)photographer
       inManagedObjectContext:(NSManagedObjectContext *)context
              existingRegions:(NSMutableArray *)regions
{
    ...    
    if ([placeID length]) {
        NSArray *matches = [regions filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"placeID = %@", placeID]];        
        if (!matches || ([matches count] > 1)) {
            ...
        } else if (![matches count]) {
            ...
            [regions addObject:region];
        } else {
            ...
        }
    }
    ...
}
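
The find-or-create pattern used in both methods can be exercised in isolation. The sketch below is not the assignment code: plain dictionaries stand in for the managed objects, and the function name FindOrCreateNamed and the counter parameter are hypothetical, added only so the number of creations can be observed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for photographerWithName:...: looks `name` up in the
// cache and creates a new entry only when nothing matches.
static NSDictionary *FindOrCreateNamed(NSString *name, NSMutableArray *cache, NSUInteger *created)
{
    NSPredicate *byName = [NSPredicate predicateWithFormat:@"name = %@", name];
    NSDictionary *match = [[cache filteredArrayUsingPredicate:byName] firstObject];
    if (!match) {
        // Cache miss: create the object and remember it for later photos.
        match = @{@"name": name};
        [cache addObject:match];
        if (created) {
            (*created)++;
        }
    }
    return match;
}
```

Asking for ann, bob and ann again against an initially empty cache should create two entries, and the third call should hand back the object created by the first. Filtering the array scans every entry on each call; for large batches a dictionary keyed by name might be a cheaper cache, at the cost of changing the method signatures shown above.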

This way we have reduced the number of Core Data queries per batch from a couple of hundred to three. It might be a good idea to allow a couple more, to prevent database inconsistencies …

The complete code for extra task #2 is available on GitHub.
