In Ruby, how can I sort an array whose items (also arrays) are arranged by their length, but not simply sorted ascending/descending by length?
I'd like the array items distributed evenly, so that items containing a large number of objects are intermixed with smaller arrays.
For example, I have this array whose items contain the number of objects shown in the comments. I've broken them into chunks for clarity and calculated their total sizes (see motivation below).
[
# chunk 1, inner total length 5
[{...}], # 2
[{...}], # 1
[{...}], # 1
[{...}], # 1
# chunk 2, inner total length 11
[{...}], # 2
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 3, inner total length 9
[{...}], # 3
[{...}], # 3
[{...}], # 1
[{...}], # 2
# chunk 4, inner total length 15
[{...}], # 4
[{...}], # 3
[{...}], # 4
[{...}], # 4
]
I'd like to arrange the array so that it looks more like the below. Note: this example has them ordered smallest to largest (1..4), but that is not necessary. I'd just like them chunked so that the inner arrays' cumulative lengths are comparable.
[
# chunk 1, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 2, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 3, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
# chunk 4, inner total length 10
[{...}], # 1
[{...}], # 2
[{...}], # 3
[{...}], # 4
]
My motivation for this is to slice up the outer array so I can process the inner arrays in parallel. I don't want one of the parallel processes to get a slice of small chunks, and another process get a slice of really large chunks.
Note: I know that I'll have 4 parallel processes so that may help inform how to arrange the chunks in the array. Thanks!
The algorithm I would use to get a roughly even distribution of size, per my comment on the OP:
unchunked_data = [
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}],
[{...}]
]
# Sort the inner arrays by size, then deal them round-robin into 4 groups.
sorted_data = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }
grouped_data.each do |process_index, data|
# each_with_index would put data in an array with its index in sorted_data. Calling map(&:first) removes that index.
data_without_index = data.map(&:first)
send_data_to_process(process_index, data_without_index)
end
If the data is as it appears in OP's example, this results in a perfect distribution.
Per discussion in the comments, you can get all the data back in a single array, formatted as in the original but grouped by this method, by doing:
grouped_data.values.flatten(1)
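To make the balancing concrete, here is a runnable sketch of the same steps on stand-in data whose inner-array sizes match the question's first example; since send_data_to_process isn't defined here, a simple printout takes its place:

```ruby
# Stand-in data: inner arrays whose sizes match the question's first
# example (four each of sizes 1, 2, 3 and 4; 40 objects in total).
sizes = [2, 1, 1, 1, 2, 2, 3, 4, 3, 3, 1, 2, 4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { {} } }

# Sort the inner arrays by size, then deal them round-robin into 4 groups.
sorted_data  = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  data_without_index = data.map(&:first)
  inner_sizes = data_without_index.map(&:size)
  # On this input every group ends up with inner sizes [1, 2, 3, 4], total 10.
  puts "process #{process_index}: sizes #{inner_sizes}, total #{inner_sizes.sum}"
end
```

In real code the puts line would be replaced by the answer's send_data_to_process call.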
Thanks @Glyoko, I'm leaning towards this solution. One thing is I don't want my array necessarily grouped (or nested in a third array); can you return it back to the form of your unchunked_data (one array of arrays)? Also, I'm inclined to use grouped_data.map instead of grouped_data.each, as I'm going to process it outside of that block that strips those pesky index numbers (per your each_with_index comment).

None of this code changes unchunked_data in place, so you can still manipulate it after the fact, in the same order it started in. As for grouped_data, I suppose you can use map, but be aware that grouped_data is a hash, not an array. Its keys would be the process_indexes (0, 1, 2, 3), and the values would be their respective sorted and grouped chunks. (Don't forget that the comment about using map(&:first) on the values still applies here too.)

Note that this method won't necessarily give a more even spread than my answer; it depends entirely on the input. For example, suppose the original array sizes are [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4]. My answer will chunk these as [[1, 1, 1, 2, 2], [2, 2, 3], [3, 3], [3, 4]], i.e. total sizes of [7, 7, 6, 7]; whereas this answer will chunk them as [[1, 2, 3], [1, 2, 3], [1, 2, 3], [2, 3, 4]], i.e. total sizes of [6, 6, 6, 9]. I'm not certain which algorithm is most likely to give the best spread.

@mfink in that case you can do something like grouped_data.values.flatten(1).

@TomLord That's correct. The problem is actually NP-complete, so both of our answers are just "best guesses". One may be better than the other depending on the data.
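Prompted by the NP-completeness point above: a common heuristic for this kind of balanced-partition problem is greedy "longest processing time" scheduling, which sorts the inner arrays largest-first and always hands the next one to the group with the smallest running total. This is not from either answer above, just a sketch (balance_by_size is a made-up helper name):

```ruby
# Greedy LPT sketch: distribute arrays into group_count bins, always
# placing the next-largest array into the currently lightest bin.
def balance_by_size(arrays, group_count)
  groups = Array.new(group_count) { [] }
  arrays.sort_by { |a| -a.size }.each do |a|
    lightest = groups.min_by { |g| g.sum(&:size) }
    lightest << a
  end
  groups
end

# Using the example sizes from the comments above:
data   = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4].map { |n| Array.new(n) { {} } }
groups = balance_by_size(data, 4)
totals = groups.map { |g| g.sum(&:size) }
```

On this input the group totals come out as [7, 7, 7, 6], matching the [7, 7, 6, 7] spread of the chunking answer and avoiding the [6, 6, 6, 9] skew of round-robin, though as noted above no cheap heuristic is optimal on every input.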