feat: bigframes.bigquery.array_agg(SeriesGroupBy|DataFrameGroupby) #663

Merged
merged 16 commits into from
May 16, 2024

Conversation

chelsea-lin
Contributor

@chelsea-lin chelsea-lin commented May 6, 2024

This change introduces the bigframes.bigquery.array_agg method for SeriesGroupBy and DataFrameGroupBy. By default, each aggregated array is ordered by the underlying sorting columns. array_agg is also the inverse operation of (Series|DataFrame).explode().
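To make the inverse relationship with explode() concrete, here is a minimal pure-Python sketch of the semantics described above. It is not the bigframes implementation and does not call BigQuery; the helper names `array_agg_sketch` and `explode_sketch` are hypothetical and exist only for illustration. It assumes the input rows already arrive in the desired sort order, grouped by key.

```python
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, int]


def array_agg_sketch(rows: List[Row]) -> Dict[Hashable, List[int]]:
    """Collect values per key, keeping the incoming row order in each array.

    Hypothetical helper: illustrates the semantics only, not the library code.
    """
    groups: Dict[Hashable, List[int]] = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return groups


def explode_sketch(groups: Dict[Hashable, List[int]]) -> List[Row]:
    """Expand each array back into one (key, value) row per element."""
    return [(key, value) for key, values in groups.items() for value in values]


# Rows are assumed to be sorted by key already, as they would be after a group-by.
rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5)]

aggregated = array_agg_sketch(rows)
round_trip = explode_sketch(aggregated)

print(aggregated)
print(round_trip == rows)
```

Under these assumptions, exploding the aggregated arrays reproduces the original rows, which is the sense in which the two operations are inverses. In bigframes itself the aggregation is applied to a groupby object, for example bigframes.bigquery.array_agg(series.groupby(...)), and runs against BigQuery rather than in local Python.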

Fixes internal bug: 338232748🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels May 6, 2024
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from 6e92dd6 to c30464f Compare May 8, 2024 16:57
@chelsea-lin chelsea-lin marked this pull request as ready for review May 8, 2024 17:03
@chelsea-lin chelsea-lin requested review from a team as code owners May 8, 2024 17:03
@chelsea-lin chelsea-lin requested a review from shobsi May 8, 2024 17:03
@chelsea-lin chelsea-lin requested review from TrevorBergeron and removed request for shobsi May 8, 2024 17:03
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from f703d45 to 333d0ac Compare May 8, 2024 17:37
@chelsea-lin chelsea-lin requested a review from tswast May 8, 2024 22:12
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from 1c44cef to e2b0854 Compare May 8, 2024 23:28
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from 824c27b to c5cc12e Compare May 10, 2024 19:59
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from c5cc12e to 3e071ad Compare May 14, 2024 18:36
Collaborator

@tswast tswast left a comment

A nit regarding the helper function naming, but otherwise looks good!

@@ -163,6 +169,54 @@ def get_column_type(self, key: str) -> bigframes.dtypes.Dtype:
bigframes.dtypes.ibis_dtype_to_bigframes_dtype(ibis_type),
)

def _aggregate_helper(
Collaborator

Could we get a better name for this method, please? "helper" is very generic, so it's hard to understand what this method is doing.

Contributor Author

Renamed it to _aggregate_base to match the object name BaseIbisIR.

@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from a66d01b to 55c9d4f Compare May 14, 2024 22:51
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from 0115c73 to ea3614e Compare May 16, 2024 01:25
@chelsea-lin chelsea-lin added the automerge Merge the pull request once unit tests and other checks pass. label May 16, 2024

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label May 16, 2024
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_arrayagg branch from ea3614e to 5596095 Compare May 16, 2024 17:35
@chelsea-lin
Contributor Author

The end-to-end tests that failed are not caused by this particular change.

@chelsea-lin chelsea-lin merged commit 412f28b into main May 16, 2024
20 of 21 checks passed
@chelsea-lin chelsea-lin deleted the main_chelsealin_arrayagg branch May 16, 2024 20:33
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
4 participants