This article is reproduced from stackoverflow.com.
Tags: multithreading, spring-batch

Spring Batch multithreading using partitioning

Published on 2020-03-28 23:12:54

My problem statement is that I have to pass a large number of files to the Spring Batch reader, and the readers run in parallel. If we use grid-size = 100 then there will be 100 threads, which is not logical. What is the way to solve this issue, i.e. process many files with a limited number of threads?

@Bean
public Step orderStep1() throws IOException {
    return stepBuilderFactory.get("orderStep1")
            .partitioner("slaveStep", partitioner()) // splits the input into partitions
            .step(slaveStep())
            .gridSize(100)                           // requested number of partitions
            .taskExecutor(taskExecutor())            // executor that runs the worker steps
            .build();
}

The task executor is:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor(); // creates a new thread per task
    return taskExecutor;
}

The partitioner is:
public Map<String, ExecutionContext> partition(int gridSize) {
    Map<String, ExecutionContext> partitionData = new HashMap<String, ExecutionContext>();
    for (int i = 0; i < gridSize; i++) {
        ExecutionContext executionContext = new ExecutionContext();
        executionContext.putString("file", fileList.get(i)); // passing the file list; assumes fileList holds at least gridSize entries
        executionContext.putString("name", "Thread" + i);
        partitionData.put("partition: " + i, executionContext);
    }
    return partitionData;
}

The files are passed to the reader dynamically through the stepExecutionContext.
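For context, the worker-side reader would typically pick the file up from the step execution context through a step-scoped bean. Below is a minimal sketch that assumes each file is a flat file read line by line; the bean name, the String item type and the use of FlatFileItemReaderBuilder are illustrative assumptions, not part of the original question:

@Bean
@StepScope
public FlatFileItemReader<String> slaveReader(
        @Value("#{stepExecutionContext['file']}") String file) { // "file" is the key written by the partitioner
    return new FlatFileItemReaderBuilder<String>()
            .name("slaveReader")
            .resource(new FileSystemResource(file))  // each partition reads its own file
            .lineMapper(new PassThroughLineMapper()) // returns every line as a plain String
            .build();
}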

Questioner: priya
Answered by Mahmoud Ben Hassine on 2020-01-31 17:25

If we use grid-size = 100 then there will be 100 threads, which is not logical

The grid size and thread pool size are two different things. You can have 100 partitions to process but only 10 worker threads available.

The issue in your case is that you are using the SimpleAsyncTaskExecutor, which does not reuse threads (see its Javadoc). So a new thread is created for each partition, and you end up with 100 threads for the 100 partitions.

What is the way to solve this issue, i.e. process many files with a limited number of threads?

Consider using a ThreadPoolTaskExecutor so you can limit the number of worker threads.
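A minimal sketch of such an executor, with an illustrative limit of 10 worker threads (the pool and queue sizes are assumptions to tune for your environment):

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(10);   // at most 10 threads process partitions concurrently
    taskExecutor.setMaxPoolSize(10);
    taskExecutor.setQueueCapacity(100); // remaining partitions wait in the queue
    taskExecutor.setThreadNamePrefix("partition-worker-");
    return taskExecutor;                // Spring calls initialize() on this bean automatically
}

With this executor plugged into the step, the grid size can stay at 100: the partitioner still produces 100 partitions, but they are processed by 10 reusable threads instead of 100 short-lived ones.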