Warm tip: This article is reproduced from stackoverflow.com, please click
arrays parallel-processing ruby sorting

Sorting Ruby array of array items by length evenly

发布于 2020-03-27 10:23:38

In Ruby, how can I sort an array such that its items (also arrays) are arranged by their length size, but not just simply sorted by ascending/descending in length.

I'd like to make the array items distributed evenly so that there are some items that contain large number of objects intermixed with smaller arrays.

For example, I have this array with array items that contain the number of objects shown in the comment. I've broken them in chunks for clarity and calculated their total size (see motivation below).

[
  # chunk 1, inner total length 5
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  # chunk 2, inner total length 11
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 9
  [{...}], # 3
  [{...}], # 3
  [{...}], # 1
  [{...}], # 2
  # chunk 4, inner total length 15
  [{...}], # 4
  [{...}], # 3
  [{...}], # 4
  [{...}], # 4
]

I'd like to arrange the array so that it looks more like the below. Note: that this example has them ordered smallest to largest (1..4), but that is not necessary. I'd just like to have them chunked so that the inner array cumulative length are comparable.

[
  # chunk 1, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 2, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 4, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
]

My motivation for this is to slice up the outer array so I can process the inner arrays in parallel. I don't want one of the parallel processes to get a slice of small chunks, and another process get a slice of really large chunks.

Note: I know that I'll have 4 parallel processes so that may help inform how to arrange the chunks in the array. Thanks!

Questioner
mfink
Viewed
110
Glyoko 2019-07-04 01:40

The algorithm I would use to get a roughly even distribution of size, per my comment on OP:

unchunked_data = [
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}]
]

sorted_data = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  # each_with_index would put data in an array with its index in sorted_data. Calling map(&:first) removes that index.
  data_without_index = data.map(&:first)
  send_data_to_process(process_index, data_without_index)
end

If the data is as it appears in OP's example, this results in a perfect distribution.


Per discussion in the comments, you can get back all the data in single array, as formatted in the original but grouped with this method, by doing:

grouped_data.values.flatten(1)