feat: add DataFrame.eval, DataFrame.query #361

TrevorBergeron · 2024-01-30T02:38:37Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

tswast · 2024-01-30T16:02:28Z

bigframes/eval.py

+from typing import Optional
+
+import pandas
+import pandas.core.computation.parsing as pandas_eval_parsing


This imports a private pandas module. We'll need to be very careful with this if we want to go with this implementation.

Please instead pull this function into third_party so we can avoid surprises.

Done. Was even considering vendoring the entire eval implementation. That way, I could get rid of the series.values.dtype calls and also be more resilient to any pandas changes.

> Was even considering vendoring the entire eval implementation.

Let's do that. Even though eval is public, it does spook me to pass in anything but pandas objects into it.

I have now vendored the pandas eval implementation.

tswast · 2024-01-30T21:22:55Z

tswast

Awesome! A few questions to resolve before we merge.

tswast · 2024-03-22T19:48:06Z

bigframes/dataframe.py

+    def eval(self, expr: str) -> DataFrame:
+        import bigframes.core.eval as bf_eval
+
+        return bf_eval.eval(self, expr, target=self)


Shouldn't we only set target if inplace=True?

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html

eval uses .copy() on the target if inplace=False
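
For readers following along, the copy-versus-mutate behavior being discussed can be sketched without pandas or bigframes. This is only a toy illustration on plain dicts; `Frame` and `assign_column` are hypothetical names and not part of either library.

```python
from typing import Dict, List

Frame = Dict[str, List[int]]  # toy stand-in for a DataFrame: column name -> values


def assign_column(
    target: Frame, name: str, values: List[int], inplace: bool = False
) -> Frame:
    """Hypothetical helper: write a computed column to `target` or to a copy of it."""
    # Copy unless the caller explicitly asked to mutate the original,
    # mirroring the "copy the target if inplace=False" behavior noted above.
    result = target if inplace else {col: list(vals) for col, vals in target.items()}
    result[name] = list(values)
    return result


frame = {"a": [1, 2], "b": [3, 4]}
summed = [x + y for x, y in zip(frame["a"], frame["b"])]

copied = assign_column(frame, "c", summed)                 # original left untouched
mutated = assign_column(frame, "d", summed, inplace=True)  # original gains column "d"
```

Under this reading, always passing the frame as the target is safe only as long as the non-inplace path copies before assigning.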

tswast · 2024-03-22T19:55:23Z

tests/system/small/test_dataframe.py

+)
+def test_df_query(scalars_dfs, expr):
+    # local_var is referenced in expressions
+    local_var = 3  # NOQA


Wow! Didn't realize it went as far as snatching up locals to pass through.

Yeah, it's kind of scary: you tell it how many stack frames to look up, and it brings all of those variables into scope for the evaluation.
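
A rough sketch of the frame-walking idea, for anyone curious. It uses CPython's `sys._getframe` and plain `eval` as stand-ins; `capture_caller_locals`, `evaluate`, and `demo` are hypothetical names, and this is not the vendored implementation.

```python
import sys
from typing import Any, Dict


def capture_caller_locals(level: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: snapshot the local variables `level` frames up the stack."""
    # Level 0 is this function's own frame, 1 is its caller, 2 is the caller's caller.
    # sys._getframe is a CPython implementation detail.
    frame = sys._getframe(level)
    try:
        return dict(frame.f_locals)
    finally:
        del frame  # drop the frame reference promptly to avoid reference cycles


def evaluate(expr: str) -> Any:
    # level=2 skips evaluate()'s own frame and reads the locals of whoever called it.
    scope = capture_caller_locals(level=2)
    # Builtins are emptied so only the captured names are visible to the expression.
    return eval(expr, {"__builtins__": {}}, scope)


def demo() -> Any:
    local_var = 3  # only referenced inside the expression string
    return evaluate("local_var + 1")
```

Because the variable is referenced only from inside a string, a linter sees it as unused, which is presumably why the test above carries a `# NOQA` marker.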

tswast · 2024-03-22T19:58:45Z

third_party/bigframes_vendored/pandas/core/common.py

+    """
+    for element in line:
+        if iterable_not_string(element):
+            yield from flatten(element)


TIL: `yield from` (https://stackoverflow.com/a/26109157/101923). I always just iterated and yielded.

Edit: Added in Python 3.3 (https://docs.python.org/3/whatsnew/3.3.html#pep-380). I had to be compatible with Python 2.x for far too long, so I missed a lot of these features.
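
For comparison, here is a small self-contained sketch of the two styles side by side. `iterable_not_string` below is a stand-in written for this example, and the two `flatten_*` names are hypothetical.

```python
from collections.abc import Iterable
from typing import Any, Iterator


def iterable_not_string(obj: Any) -> bool:
    """Stand-in helper: True for iterables other than str."""
    return isinstance(obj, Iterable) and not isinstance(obj, str)


def flatten_with_loop(line: Iterable) -> Iterator[Any]:
    """Pre-3.3 style: iterate the nested generator and re-yield each item."""
    for element in line:
        if iterable_not_string(element):
            for item in flatten_with_loop(element):
                yield item
        else:
            yield element


def flatten_with_yield_from(line: Iterable) -> Iterator[Any]:
    """Same traversal, delegating to the nested generator with `yield from`."""
    for element in line:
        if iterable_not_string(element):
            yield from flatten_with_yield_from(element)
        else:
            yield element


nested = [1, [2, (3, 4)], "ab", [[5]]]
flat = list(flatten_with_yield_from(nested))
```

Both functions produce the same sequence for simple iteration; `yield from` additionally forwards `send()` and `throw()` to the inner generator, which the manual loop does not.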

tswast · 2024-03-22T20:08:41Z

third_party/bigframes_vendored/pandas/core/computation/engines.py

+
+    if overlap:
+        s = ", ".join([repr(x) for x in overlap])
+        raise NumExprClobberingError(


I imagine we don't actually use the NumExpr package (https://github.com/pydata/numexpr), right? Maybe we can delete this function?

I wonder how much the "python" engine has changed over the years? If it's been pretty stable, then I think there's probably more benefit in cutting the vendored code down to just the "python" engine code. If it's changing pretty actively, then keeping this file as closely in sync with upstream as possible makes more sense.

It seems the python engine itself is pretty stable. I went ahead and removed the remaining NumExpr stuff.

tswast · 2024-03-22T20:09:51Z

third_party/bigframes_vendored/pandas/core/computation/eval.py

+    str
+        Engine name.
+    """
+    from pandas.core.computation.check import NUMEXPR_INSTALLED


Ditto, I doubt we actually want to use numexpr, right?
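
One possible shape for an engine check that never consults numexpr is sketched below. It is only an illustration under the assumption that "python" is the sole supported engine; `SUPPORTED_ENGINES` and `check_engine` are hypothetical names, not the vendored code.

```python
from typing import Optional

# Assumption for this sketch: numexpr support is dropped entirely.
SUPPORTED_ENGINES = ("python",)


def check_engine(engine: Optional[str] = None) -> str:
    """Hypothetical simplified engine check with no numexpr detection."""
    if engine is None:
        # With a single engine there is nothing to probe for; just default to it.
        return "python"
    if engine not in SUPPORTED_ENGINES:
        raise KeyError(
            f"Invalid engine {engine!r}; valid engines are {SUPPORTED_ENGINES}"
        )
    return engine
```

Dropping the detection step would also remove the need to import `NUMEXPR_INSTALLED` from pandas internals.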

tswast · 2024-03-22T20:26:50Z

third_party/bigframes_vendored/pandas/core/computation/ops.py

+    use Python.
+    """
+    try:
+        return x.isin(y)


Aside: if y is a Series of an ARRAY column, would this do an element-wise "is in" check?

I don't think so

tswast · 2024-03-22T20:29:17Z

third_party/bigframes_vendored/pandas/core/computation/ops.py

+        return pprint_thing(f"{self.op}({self.operand})")
+
+    @property
+    def return_type(self) -> np.dtype:


Funny to see this pattern here too. We're gonna have so many expression trees, lol.

tswast · 2024-03-22T20:30:24Z

third_party/bigframes_vendored/pandas/core/computation/parsing.py

+BACKTICK_QUOTED_STRING = 100
+
+
+def create_valid_python_identifier(name: str) -> str:


Aside: I wonder if we should use this function when creating column names? Could make them more interpretable.

We could borrow this approach for SQL special characters.
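
To make the idea concrete, here is a loose sketch of spelling out special characters so that an arbitrary column label becomes a readable, valid identifier. It is not the vendored `create_valid_python_identifier`; the replacement table and the `to_identifier` name are hypothetical.

```python
import keyword
import re

# Hypothetical replacement table; a real one would cover every character of interest.
SPECIAL_CHARACTER_NAMES = {
    " ": "_SPACE_",
    "-": "_MINUS_",
    ".": "_DOT_",
    "%": "_PERCENT_",
}


def to_identifier(name: str) -> str:
    """Hypothetical helper: map an arbitrary column label to a valid identifier."""
    # Spell out known special characters so the result stays interpretable.
    replaced = "".join(SPECIAL_CHARACTER_NAMES.get(ch, ch) for ch in name)
    # Any remaining character outside ASCII letters, digits, and "_" becomes "_".
    replaced = re.sub(r"[^0-9A-Za-z_]", "_", replaced)
    # Identifiers cannot be empty or start with a digit.
    if not replaced or replaced[0].isdigit():
        replaced = "_" + replaced
    # Reserved words such as "class" need a suffix to be usable as names.
    if keyword.iskeyword(replaced):
        replaced += "_"
    return replaced


examples = {label: to_identifier(label) for label in ["total sales", "2024-rate", "class"]}
```

Note that a scheme like this is not guaranteed to be collision-free: two different labels can map to the same identifier, so a real implementation would need a disambiguation step.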

tswast · 2024-03-22T20:31:54Z

third_party/bigframes_vendored/pandas/core/frame.py

+
+        Operates on columns only, not specific rows or elements.  This allows
+        `eval` to run arbitrary code, which can make you vulnerable to code
+        injection if you pass user input to this function.


tswast · 2024-03-22T20:32:02Z

third_party/bigframes_vendored/pandas/core/frame.py

+
+    def query(self, expr: str) -> DataFrame | None:
+        """
+        Query the columns of a DataFrame with a boolean expression.


* feat: add DataFrame.eval, DataFrame.query
* address pr comments
* add docstring, disable new tests for legacy pandas
* vendor the pandas eval implementation
* amend eval docstring
* fix doctest expectation
* amend doctest
* pr comments
* Fix doctest for eval

feat: add DataFrame.eval, DataFrame.query

fb7eb1d

product-auto-label bot added the size: m (Pull request size is medium.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) labels Jan 30, 2024

TrevorBergeron marked this pull request as ready for review January 30, 2024 02:46

TrevorBergeron requested review from a team as code owners January 30, 2024 02:46

TrevorBergeron requested a review from shobsi January 30, 2024 02:46

tswast requested changes Jan 30, 2024

address pr comments

7e5d266

product-auto-label bot added the size: l label (Pull request size is large.) and removed the size: m label (Pull request size is medium.) Jan 30, 2024

Merge branch 'main' into eval2

6069ced

tswast reviewed Jan 30, 2024

TrevorBergeron added 4 commits March 6, 2024 20:50

Merge remote-tracking branch 'github/main' into eval2

78dad7c

add docstring, disable new tests for legacy pandas

4d85e0e

Merge remote-tracking branch 'github/main' into eval2

8d708a9

vendor the pandas eval implementation

2b0d902

product-auto-label bot added the size: xl label (Pull request size is extra large.) and removed the size: l label (Pull request size is large.) Mar 21, 2024

TrevorBergeron requested a review from tswast March 21, 2024 20:11

TrevorBergeron added 7 commits March 21, 2024 21:20

Merge remote-tracking branch 'github/main' into eval2

8cec454

Merge remote-tracking branch 'github/main' into eval2

7b3a4ca

amend eval docstring

fc4b26a

fix doctest expectation

59888b4

Merge remote-tracking branch 'github/main' into eval2

79fe94f

Merge remote-tracking branch 'github/main' into eval2

ca5b670

amend doctest

838ff14

tswast reviewed Mar 22, 2024

TrevorBergeron added 2 commits March 22, 2024 23:32

Merge remote-tracking branch 'github/main' into eval2

b9de8f9

pr comments

b143097

TrevorBergeron added 3 commits March 23, 2024 00:34

Merge remote-tracking branch 'github/main' into eval2

89f6abb

Fix doctest for eval

6887fc7

Merge remote-tracking branch 'github/main' into eval2

29f0fd2

TrevorBergeron requested a review from tswast March 23, 2024 20:48

Merge branch 'main' into eval2

f8dcce1

tswast approved these changes Mar 26, 2024

tswast merged commit 5e28ebd into main Mar 26, 2024
16 checks passed

tswast deleted the eval2 branch March 26, 2024 19:06

release-please bot mentioned this pull request Mar 26, 2024

chore(main): release 1.1.0 #509

Merged
