In Ruby, how can I sort an array whose items (also arrays) are arranged by their length, but not simply sorted ascending/descending by length?
I'd like the array items distributed evenly, so that items containing a large number of objects are intermixed with smaller arrays.
For example, I have this array whose items contain the number of objects shown in the comments. I've broken them into chunks for clarity and calculated their total sizes (see motivation below).
[
# chunk 1, inner total length 5
[{...}], # 2
[{...}], # 1
[{...}], # 1
[{...}], # 1
# chunk 2, inner total length 11
[{...}], # 2
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 3, inner total length 9
[{...}], # 3
[{...}], # 3
[{...}], # 1
[{...}], # 2
# chunk 4, inner total length 15
[{...}], # 4
[{...}], # 3
[{...}], # 4
[{...}], # 4
]
I'd like to arrange the array so that it looks more like the below. Note: this example has them ordered smallest to largest (1..4), but that is not necessary. I'd just like them chunked so that the inner arrays' cumulative lengths are comparable.
[
# chunk 1, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 2, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 3, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 4, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
]
My motivation for this is to slice up the outer array so I can process the inner arrays in parallel. I don't want one of the parallel processes to get a slice of small chunks, and another process get a slice of really large chunks.
Note: I know that I'll have 4 parallel processes so that may help inform how to arrange the chunks in the array. Thanks!
The algorithm I would use to get a roughly even distribution of size, per my comment on the OP:
unchunked_data = [
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}]
]
# Sort the inner arrays by size, then deal them round-robin into 4 groups.
sorted_data = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }
grouped_data.each do |process_index, data|
# each_with_index would put data in an array with its index in sorted_data. Calling map(&:first) removes that index.
data_without_index = data.map(&:first)
send_data_to_process(process_index, data_without_index)
end
If the data is as it appears in OP's example, this results in a perfect distribution.
Per discussion in the comments, you can get all the data back in a single array, formatted as in the original but grouped by this method, by doing:
grouped_data.values.flatten(1)
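To make the balancing concrete, here is a runnable sketch of the same steps on stand-in data whose inner-array sizes match the question's first example; since send_data_to_process isn't defined here, a simple printout takes its place:

```ruby
# Stand-in data: inner arrays whose sizes match the question's first
# example (four each of sizes 1, 2, 3 and 4; 40 objects in total).
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { {} } }

# Sort the inner arrays by size, then deal them round-robin into 4 groups.
sorted_data  = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  data_without_index = data.map(&:first)
  inner_sizes = data_without_index.map(&:size)
  # On this input every group ends up with inner sizes [1, 2, 3, 4], total 10.
  puts "process #{process_index}: sizes #{inner_sizes}, total #{inner_sizes.sum}"
end
```

In real code the puts line would be replaced by the answer's send_data_to_process call.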
Thanks @Glyoko, I'm leaning towards this solution. One thing is I don't want my array necessarily grouped (or nested in a third array); can you return it back to the form of your unchunked_data (one array of arrays)? Also, I'm inclined to use grouped_data.map instead of grouped_data.each, as I'm going to process it outside of that block that strips those pesky index numbers (per your each_with_index comment).

None of this code changes unchunked_data in place, so you can still manipulate it after the fact, in the same order it started in. As for grouped_data, I suppose you can use map, but be aware that grouped_data is a hash, not an array. Its keys would be the process_indexes (0, 1, 2, 3), and the values would be their respective sorted and grouped chunks. (Don't forget that the comment about using map(&:first) on the values still applies here too.)

Note that this method won't necessarily give a more even spread than my answer; it depends entirely on the input. For example, suppose the original array sizes are [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4]. My answer will chunk these as [[1, 1, 1, 2, 2], [2, 2, 3], [3, 3], [3, 4]], i.e. total sizes of [7, 7, 6, 7]; whereas this answer will chunk them as [[1, 2, 3], [1, 2, 3], [1, 2, 3], [2, 3, 4]], i.e. total sizes of [6, 6, 6, 9]. I'm not certain which algorithm is most likely to give the best spread.

@mfink in that case you can do something like grouped_data.values.flatten(1).

@TomLord That's correct. The problem is actually NP-complete, so both of our answers are just "best guesses". One may be better than the other depending on the data.
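Prompted by the NP-completeness point above: a common heuristic for this kind of balanced-partition problem is greedy "longest processing time" scheduling, which sorts the inner arrays largest-first and always hands the next one to the group with the smallest running total. This is not from either answer above, just a sketch (balance_by_size is a made-up helper name):

```ruby
# Greedy LPT sketch: distribute arrays into group_count bins, always
# placing the next-largest array into the currently lightest bin.
def balance_by_size(arrays, group_count)
  groups = Array.new(group_count) { [] }
  arrays.sort_by { |a| -a.size }.each do |a|
    lightest = groups.min_by { |g| g.sum(&:size) }
    lightest << a
  end
  groups
end

# Using the example sizes from the comments above:
data   = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4].map { |n| Array.new(n) { {} } }
groups = balance_by_size(data, 4)
totals = groups.map { |g| g.sum(&:size) }
```

On this input the group totals come out as [7, 7, 7, 6], matching the [7, 7, 6, 7] spread of the chunking answer and avoiding the [6, 6, 6, 9] skew of round-robin, though as noted above no cheap heuristic is optimal on every input.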