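In Ruby, how can I sort an array so that its items (which are themselves arrays) are arranged by their sizes, but not simply in ascending or descending order of length?
I want the items spread out evenly, so that items containing many objects are mixed in with smaller arrays.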
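For example, I have an array of array items, where the number of objects each item contains is shown in the comment. For clarity, I have split them into chunks and computed each chunk's total size (see the comments below).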
[
  # chunk 1, inner total length 5
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  # chunk 2, inner total length 11
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 9
  [{...}], # 3
  [{...}], # 3
  [{...}], # 1
  [{...}], # 2
  # chunk 4, inner total length 15
  [{...}], # 4
  [{...}], # 3
  [{...}], # 4
  [{...}], # 4
]
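The chunk totals in the comments are the sums of each consecutive group of four sizes, which is also the load each of the 4 processes would receive if the outer array were simply sliced in its current order. A quick sketch that checks this, assuming a plain sizes array standing in for outer_array.map(&:size) (both names are illustrative, not from the question):

```ruby
# Sizes of the inner arrays in their current order (taken from the comments above).
# Hypothetical stand-in for outer_array.map(&:size).
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]

# Slice into consecutive chunks of 4 and sum each chunk: this is the load
# each worker would get without any rearranging.
chunk_totals = sizes.each_slice(4).map(&:sum)
p chunk_totals # => [5, 11, 9, 15]
```

The spread from 5 to 15 is the imbalance the question is trying to avoid.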
I'd like to arrange the array so that it looks more like the example below. Note that this example orders them from smallest to largest (1..4), but that is not necessary. I'd just like them chunked so that the cumulative lengths of the inner arrays are comparable.
[
  # chunk 1, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 2, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 4, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
]
My motivation is to slice up the outer array so I can process the inner arrays in parallel. I don't want one parallel process to get a slice of small chunks while another gets a slice of really large chunks.
Note: I know I'll have 4 parallel processes, which may help inform how to arrange the chunks in the array. Thanks!
Here is the algorithm I would use to get a roughly even distribution of sizes, per my comment on the OP:
# {...} stands for the objects from the question.
unchunked_data = [
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}]
]

sorted_data = unchunked_data.sort_by(&:size)
# Deal the sorted arrays round-robin across the 4 processes.
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  # each_with_index pairs every element with its index in sorted_data;
  # map(&:first) strips that index back off.
  data_without_index = data.map(&:first)
  # send_data_to_process is a placeholder for however you hand work to a process.
  send_data_to_process(process_index, data_without_index)
end
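A runnable sketch of the same round-robin approach, assuming stand-in data: each inner array is filled with empty hashes so that its size matches the numbers in the question, and puts stands in for the hypothetical send_data_to_process. The PROCESS_COUNT constant and the workloads name are illustrative additions.

```ruby
# Stand-in data: each inner array holds n empty hashes, where n is the size
# shown in the question's comments. Real data would hold actual objects.
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { {} } }

PROCESS_COUNT = 4 # number of parallel workers, from the question

sorted_data = unchunked_data.sort_by(&:size)

# Element i of the sorted list goes to worker i % PROCESS_COUNT.
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % PROCESS_COUNT }

# Drop the indices added by each_with_index, keeping only the inner arrays.
workloads = grouped_data.transform_values { |pairs| pairs.map(&:first) }

workloads.each do |process_index, data|
  # puts is a stand-in for dispatching the work (send_data_to_process in the answer).
  puts "process #{process_index}: sizes #{data.map(&:size).inspect}, total #{data.sum(&:size)}"
end
```

With these sizes, every process receives one array each of size 1, 2, 3 and 4.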
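With data like the OP's example, this gives the ideal distribution: each process receives arrays of sizes 1, 2, 3 and 4, for a total of 10 each.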
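Following the discussion in the comments, you can get all of the data back as a single array, in the original format but grouped by this method, with:
grouped_data.values.flatten(1).map(&:first)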
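A sketch of that flattening step on the same stand-in data as the question (empty hashes sized to match the comments; the rearranged and slices names are illustrative). Because each group ends up contiguous, slicing the flat array into equal parts hands each process one group:

```ruby
# Stand-in data sized to match the question's comments.
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { {} } }

grouped_data = unchunked_data.sort_by(&:size)
                             .each_with_index
                             .group_by { |_, index| index % 4 }

# values is one list of [inner_array, index] pairs per group; flatten(1) joins
# the groups into a single list of pairs, and map(&:first) drops the indices.
rearranged = grouped_data.values.flatten(1).map(&:first)

# 16 items / 4 processes = 4 items per slice, one slice per group.
slices = rearranged.each_slice(rearranged.size / 4).to_a
p slices.map { |slice| slice.sum(&:size) } # => [10, 10, 10, 10]
```

This lines up only when the item count is a multiple of 4; otherwise the groups differ in size by one, and iterating over grouped_data.values directly is safer than slicing by a fixed count.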
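Thanks @Glyoko, I'm leaning toward this solution. One thing: I don't want my array to have to be grouped (or nested in a third array). Can it be returned in the form of your unchunked_data (a single array)? Also, I'm leaning toward using grouped_data.map instead of grouped_data.each, since I'll be processing the data outside the block that strips those pesky index numbers (per your each_with_index comment).

None of this code changes unchunked_data, so afterwards you can still work with it in its original order. As for grouped_data, I suppose you could use map, but be aware that grouped_data is a hash, not an array. Its keys will be the process_index values (0, 1, 2, 3), and its values will be their respective sorted and grouped chunks. (Don't forget that the note about using map(&:first) on these values still applies here.)

Note that this approach isn't necessarily more even than my answer. It depends entirely on the input. For example, suppose the original array sizes are [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4]. My answer would chunk these as [[1, 1, 1, 2, 2], [2, 2, 3], [3, 3], [3, 4]], i.e. total sizes of [7, 7, 6, 7], whereas this answer would chunk them as [[1, 2, 3], [1, 2, 3], [1, 2, 3], [2, 3, 4]], i.e. total sizes of [6, 6, 6, 9]. I'm not sure which algorithm is most likely to give the best spread.

In that case, @mfink could do something like grouped_data.values.flatten(1).map(&:first). @TomLord Yes. The problem is actually NP-hard (it is a form of multiway number partitioning), so both of our answers are "best guesses". Depending on the data, one may work better than the other.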
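To make the comparison concrete, here is a sketch on the sizes from the comment above, using plain integers as stand-ins for the inner arrays. It reproduces the round-robin totals, and also shows a third heuristic that appears in neither answer: the common greedy "largest first" rule (often called LPT, longest processing time first), which hands each size, from largest to smallest, to the worker with the smallest running total. Like the two answers, it is a best guess with no guarantee of an optimal split.

```ruby
# Sizes from the comment above (integers stand in for the inner arrays).
sizes = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4]
workers = 4

# Round-robin over the sorted sizes, as in the answer above.
round_robin = Array.new(workers) { [] }
sizes.sort.each_with_index { |size, i| round_robin[i % workers] << size }
p round_robin.map(&:sum) # => [6, 6, 6, 9]

# Greedy largest-first (LPT): always give the next-largest size to the worker
# with the smallest running total; min_by picks the first worker on ties.
greedy = Array.new(workers) { [] }
sizes.sort.reverse_each do |size|
  greedy.min_by(&:sum) << size
end
p greedy.map(&:sum) # => [7, 7, 7, 6]
```

On this input the greedy rule matches the maximum load of 7 from the chunking described in the comment, but on other inputs any of the three approaches may come out ahead.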