
How to iterate a collection of NSManagedObject items concurrently for boosting performance?

Posted on 2020-11-29 11:37:12

Following is my use case:

I need to export a large Core Data store to some format (e.g., CSV or JSON). This requires fetching all objects of the main entity, then iterating over them and serializing each one to the desired format. Here is my code:

NSError *error = nil;
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"MyEntity"];
NSArray<NSManagedObject *> *allItems = [managedObjectContext executeFetchRequest:request error:&error];
for (NSManagedObject *item in allItems) {
    [self exportItem:item];
}

Since the for-loop code is running synchronously and in a single thread, it might take a long time to complete. This is especially true when dealing with a large database containing thousands of records.

I wonder if there is a way to iterate the array concurrently, in a way that will take full advantage of the multiple cores available on the iOS device. This would probably boost the performance significantly.

I was thinking of replacing the for-loop above with something like this:

[allItems enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(NSManagedObject *item, NSUInteger idx, BOOL *stop) {
    [self exportItem:item];
}];

However, this would obviously crash the app, because it violates Core Data's concurrency rules.

I wonder if there is any way to handle this use case.

Questioner: Joshua
Tom Harrington 2020-12-05 07:58:24

You'll have to process them in batches, where each batch is fetched by a separate background context and where the exporting happens in that context's queue. Here's one way you might do that, for an entity named Event. The general approach is to fetch the object IDs for all objects you want to export, then split those into groups that can each be handled by a separate background context.

Since managed objects don't work across queues, start by getting the object IDs and breaking them up into batches. First get all object IDs.

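NSManagedObjectContext *context = [self.fetchedResultsController managedObjectContext];
NSFetchRequest<Event *> *fetchRequest = Event.fetchRequest;
// Ask for object IDs only; the managed objects themselves are fetched later, per batch.
fetchRequest.resultType = NSManagedObjectIDResultType;
NSError *fetchError = nil;

NSArray<NSManagedObjectID *> *allObjectIDs = [context executeFetchRequest:fetchRequest error:&fetchError];
if (allObjectIDs == nil) {
    NSLog(@"Fetching object IDs failed: %@", fetchError);
}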

Then loop through that array with subranges. For every batch, create a new background context. Use that context to fetch the managed objects for that batch of object IDs. Then handle exporting the managed objects.

NSUInteger batchSize = 100;
NSRange currentRange = NSMakeRange(0, batchSize);
AppDelegate *appDelegate = (AppDelegate *) [[UIApplication sharedApplication] delegate];
NSPersistentContainer *persistentContainer = appDelegate.persistentContainer;

while (currentRange.location < allObjectIDs.count) {
    // Clamp the last batch: subarrayWithRange: raises NSRangeException if the range runs past the end.
    currentRange.length = MIN(batchSize, allObjectIDs.count - currentRange.location);
    NSArray<NSManagedObjectID *> *batchObjectIDs = [allObjectIDs subarrayWithRange:currentRange];

    // Each batch gets its own background context, and all work on it happens on that context's queue.
    NSManagedObjectContext *batchContext = [persistentContainer newBackgroundContext];
    [batchContext performBlock:^{
        NSFetchRequest<Event *> *fetchRequest = Event.fetchRequest;
        fetchRequest.predicate = [NSPredicate predicateWithFormat:@"self in %@", batchObjectIDs];
        NSError *fetchError = nil;
        NSArray<Event *> *batchEvents = [batchContext executeFetchRequest:fetchRequest error:&fetchError];
        if (batchEvents == nil) {
            NSLog(@"Batch fetch failed: %@", fetchError);
            return;
        }

        // Put your export code here, for the objects that were just fetched.

    }];

    currentRange.location += batchSize;
}

You should experiment with the batch size to see what works best for you.
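
To make that experimenting easier, the range arithmetic can be pulled out into a small helper that takes the batch size as a parameter. The following is only a sketch: SplitIntoBatches is a hypothetical name, not something from Core Data or Foundation. It clamps the final batch, since subarrayWithRange: raises NSRangeException when a range runs past the end of the array.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation or Core Data API): splits `items` into
// consecutive batches of at most `batchSize` elements each.
static NSArray<NSArray *> *SplitIntoBatches(NSArray *items, NSUInteger batchSize) {
    NSCParameterAssert(batchSize > 0);
    NSMutableArray<NSArray *> *batches = [NSMutableArray array];
    for (NSUInteger location = 0; location < items.count; location += batchSize) {
        // The final batch is usually shorter, so clamp its length to what is left.
        NSUInteger length = MIN(batchSize, items.count - location);
        [batches addObject:[items subarrayWithRange:NSMakeRange(location, length)]];
    }
    return [batches copy];
}
```

With a helper like this, the batching loop could iterate over SplitIntoBatches(allObjectIDs, batchSize) and hand each sub-array to its own background context.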

This gets trickier though, because your export code might be running on multiple queues at the same time, and you need to make sure that your export file doesn't end up as a corrupt mess. One way to deal with that is using NSFileCoordinator to make sure that only one queue is allowed to write at a time. Create the coordinator before the loop above:

NSFileCoordinator *coordinator = [[NSFileCoordinator alloc] init];

Then where the code above says to put your export code, do something like this:

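        NSError *coordinatorError = nil;
        [coordinator coordinateWritingItemAtURL:[self exportFileURL] options:0 error:&coordinatorError byAccessor:^(NSURL * _Nonnull newURL) {
            NSFileHandle *exportHandle = [self createExportFileHandle];
            for (Event *event in batchEvents) {
                NSData *exportData = [[event exportString] dataUsingEncoding:NSUTF8StringEncoding];
                NSError *writeError = nil;
                // Check the return value; the error is only meaningful when the write fails.
                if (![exportHandle writeData:exportData error:&writeError]) {
                    NSLog(@"Write error: %@", writeError);
                }
            }
            // Close the handle so each batch does not leave a file descriptor open.
            [exportHandle closeAndReturnError:nil];
        }];
        if (coordinatorError != nil) {
            NSLog(@"Coordination error: %@", coordinatorError);
        }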

That code assumes that you have a method called exportFileURL that returns an NSURL for the place you want to export the data. It also assumes that your managed object has a method called exportString that returns whatever string you want to export for an object. The createExportFileHandle method uses exportFileURL and (this is important) seeks to the end of the file before writing. Something like this:

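- (NSFileHandle *)createExportFileHandle {
    NSFileManager *fileManager = [NSFileManager defaultManager];
    NSString *path = [[self exportFileURL] path];
    // Create an empty file on first use so that opening it for writing succeeds.
    if (![fileManager fileExistsAtPath:path]) {
        [fileManager createFileAtPath:path contents:nil attributes:nil];
    }
    NSError *error = nil;
    NSFileHandle *handle = [NSFileHandle fileHandleForWritingToURL:[self exportFileURL] error:&error];
    if (handle == nil) {
        NSLog(@"Could not open export file: %@", error);
    }
    // Append to the existing contents instead of overwriting them.
    [handle seekToEndOfFile];
    return handle;
}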

You need to create the handle inside the file coordinator block, since the end of file location keeps changing and you want to get the current one before you start writing data.

The need to coordinate file access might limit how much of a speedup you get from this. That could probably be improved. For example, rework the code so that the calls to exportString are outside of the coordinator block. Collect them all into one big string for the batch, and coordinate writing just that string. Be careful the batch's string doesn't get too huge though, since it'll be in memory.
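
As a rough sketch of that rework, the helper below appends one batch's pre-built text in a single coordinated write. AppendBatchText is a hypothetical name, and the sketch assumes iOS 13 / macOS 10.15 or later for the error-returning NSFileHandle methods. It opens the file inside the accessor block, for the same reason given above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: appends `batchText` to the file at `url` in one coordinated write.
// Returns YES only if the data was written.
static BOOL AppendBatchText(NSString *batchText, NSURL *url, NSFileCoordinator *coordinator) {
    NSData *data = [batchText dataUsingEncoding:NSUTF8StringEncoding];
    __block BOOL success = NO;
    NSError *coordinatorError = nil;
    [coordinator coordinateWritingItemAtURL:url options:0 error:&coordinatorError byAccessor:^(NSURL * _Nonnull newURL) {
        NSFileManager *fileManager = [NSFileManager defaultManager];
        if (![fileManager fileExistsAtPath:newURL.path]) {
            [fileManager createFileAtPath:newURL.path contents:nil attributes:nil];
        }
        NSError *error = nil;
        NSFileHandle *handle = [NSFileHandle fileHandleForWritingToURL:newURL error:&error];
        if (handle == nil) {
            NSLog(@"Could not open export file: %@", error);
            return;
        }
        // Seek to the current end of file so earlier batches are not overwritten.
        unsigned long long offset = 0;
        if ([handle seekToEndReturningOffset:&offset error:&error]) {
            success = [handle writeData:data error:&error];
        }
        if (!success) {
            NSLog(@"Write error: %@", error);
        }
        [handle closeAndReturnError:nil];
    }];
    if (coordinatorError != nil) {
        NSLog(@"Coordination error: %@", coordinatorError);
    }
    return success;
}

// Possible use inside the batch context's performBlock: (Event and exportString as above):
//     NSMutableString *batchText = [NSMutableString string];
//     for (Event *event in batchEvents) {
//         [batchText appendString:[event exportString]];
//     }
//     AppendBatchText(batchText, [self exportFileURL], coordinator);
```

Building the string outside the coordinated block keeps the serialized section down to one open, seek, write, and close per batch.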

Note that this doesn't attempt to put the export file in any specific order. All objects get exported, but the order is unpredictable. Since you didn't use a sort descriptor in your question, I'm guessing it doesn't matter. If it does, the asynchronous processing means you'll have some more work to do.
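
If order does matter, one possible approach (an assumption on my part, not the only option) is to give each batch an index, let the batches run concurrently, store each batch's text in a slot for that index, and write the slots out in order once everything has finished. The sketch below shows only that ordering part, using plain Grand Central Dispatch. RenderBatchesInOrder and the render block are hypothetical names. In the Core Data case, the render block would create its own background context and use performBlockAndWait: to fetch the batch and build its text. Note that this holds the whole export in memory until the end.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: calls `render` once per batch index, concurrently, and
// returns the rendered strings in batch order regardless of which finished first.
static NSArray<NSString *> *RenderBatchesInOrder(NSUInteger batchCount,
                                                 NSString *(^render)(NSUInteger batchIndex)) {
    // Pre-fill one slot per batch so results can be stored by index.
    NSMutableArray *slots = [NSMutableArray arrayWithCapacity:batchCount];
    for (NSUInteger i = 0; i < batchCount; i++) {
        [slots addObject:@""];
    }
    // NSMutableArray is not thread-safe, so all slot writes go through a serial queue.
    dispatch_queue_t slotQueue = dispatch_queue_create("export.slots", DISPATCH_QUEUE_SERIAL);
    // dispatch_apply returns only after every iteration has completed.
    dispatch_apply(batchCount, dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^(size_t i) {
        NSString *text = render(i) ?: @"";
        dispatch_sync(slotQueue, ^{
            slots[i] = text;
        });
    });
    return [slots copy];
}
```

The returned array can then be written to the export file sequentially, which also removes the need to coordinate concurrent writers.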