Note: this post is translated from stackoverflow.com.
Tags: arrays parallel-processing ruby sorting

Sorting Ruby array of array items by length evenly

Posted 2020-03-27 11:16:13

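In Ruby, how can I sort an array so that its items (which are also arrays) are arranged by their length, but not simply sorted in ascending or descending order of length?

I would like the array items to be distributed evenly, so that items containing a large number of objects are intermixed with smaller arrays.

For example, I have this array of array items, each containing the number of objects shown in the comment. I've broken them into chunks for clarity and calculated their total size (see the explanation below).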

[
  # chunk 1, inner total length 5
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  # chunk 2, inner total length 11
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 9
  [{...}], # 3
  [{...}], # 3
  [{...}], # 1
  [{...}], # 2
  # chunk 4, inner total length 15
  [{...}], # 4
  [{...}], # 3
  [{...}], # 4
  [{...}], # 4
]

I'd like to arrange the array so that it looks more like the one below. Note that this example has them ordered smallest to largest (1..4), but that is not necessary. I'd just like to have them chunked so that the cumulative lengths of the inner arrays in each chunk are comparable.

[
  # chunk 1, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 2, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 4, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
]

My motivation for this is to slice up the outer array so I can process the inner arrays in parallel. I don't want one of the parallel processes to get a slice of small chunks while another process gets a slice of really large chunks.

Note: I know that I'll have 4 parallel processes so that may help inform how to arrange the chunks in the array. Thanks!

Asked by: mfink
Views: 228
Glyoko 2019-07-04 01:40

The algorithm I would use to get a roughly even distribution of size, per my comment on OP:

unchunked_data = [
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}]
]

sorted_data = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  # each_with_index would put data in an array with its index in sorted_data. Calling map(&:first) removes that index.
  data_without_index = data.map(&:first)
  send_data_to_process(process_index, data_without_index)
end

If the data is as shown in OP's example, this yields a perfect distribution.
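As a quick check of that claim, here is a minimal runnable sketch that applies the same sort-then-modulo steps to the inner-array sizes from the question. The helper name distribute_round_robin and the placeholder integers standing in for the elided hashes are assumptions for illustration, not part of the original answer.

```ruby
# Hypothetical helper (not from the original answer): sort by size, then deal
# the inner arrays out round-robin across `process_count` groups.
def distribute_round_robin(arrays, process_count)
  arrays
    .sort_by(&:size)
    .each_with_index
    .group_by { |_, index| index % process_count }
    # group_by keeps [inner_array, index] pairs; keep only the inner arrays.
    .transform_values { |pairs| pairs.map(&:first) }
end

# Inner-array sizes in the order given in the question. Each inner array is
# filled with placeholder integers instead of the elided hashes.
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { |i| i } }

grouped = distribute_round_robin(unchunked_data, 4)
grouped.each do |process_index, data|
  puts "process #{process_index}: sizes #{data.map(&:size).inspect}, total #{data.sum(&:size)}"
end
```

With these sizes every one of the four groups ends up with a total length of 10. For less regular data the totals should only be expected to come out roughly even, which is all the answer promises.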


Per the discussion in the comments, you can get all the data back in a single array, formatted as originally but grouped with this method, by doing:

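# grouped_data still holds [inner_array, index] pairs, so drop the index after flattening.
grouped_data.values.flatten(1).map(&:first)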
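A minimal sketch of that recombination, under the same assumptions as before (placeholder integers instead of the elided hashes, and a hypothetical helper name regroup_evenly that does not appear in the original answer). Because group_by runs on each_with_index, each grouped value is an [inner_array, index] pair, so the sketch strips the index after flattening one level.

```ruby
# Hypothetical helper (not from the original answer): returns one flat outer
# array in which consecutive runs of items belong to the same group.
def regroup_evenly(arrays, process_count)
  grouped = arrays
    .sort_by(&:size)
    .each_with_index
    .group_by { |_, index| index % process_count }
  # Flatten one level to merge the groups, then drop the index from each pair.
  grouped.values.flatten(1).map(&:first)
end

# Inner-array sizes from the question, filled with placeholder integers.
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { |i| i } }

regrouped = regroup_evenly(unchunked_data, 4)

# 16 items divide evenly among 4 processes, so slices of 4 line up with the groups.
regrouped.each_slice(4).with_index do |slice, process_index|
  puts "slice #{process_index}: sizes #{slice.map(&:size).inspect}, total #{slice.sum(&:size)}"
end
```

For this data each slice contains sizes 1, 2, 3 and 4, matching the layout the question asks for. If the item count does not divide evenly by the number of processes, fixed-size slices no longer line up with the groups, and it is probably simpler to keep the grouped hash and hand each value to its process directly.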