
Consider making Column inputs from a Python DF numpy arrays instead of pandas.Series #3

Open
bbassett-tibco opened this issue Feb 12, 2021 · 0 comments


https://stackoverflow.com/questions/66148445/why-isnt-my-new-column-being-named-correctly-when-using-a-python-data-function

Customer observed that column inputs retain their column name from the original input even after transformation.

This differs from how the equivalent TERR data function works, because Python and TERR handle a Column input differently internally. In both, we pass inputs (and outputs) over as a table: a data.frame in TERR and a pandas.DataFrame in Python. In TERR, though, if the Data Function declares the input as a Column, we convert it from a 1-column data.frame to a vector of the equivalent type; similarly, for a Value we convert it from its 1x1 data.frame to a scalar type. In Python we do the same for Value inputs, but Column inputs are left as a pandas.Series, which retains the column name from the original input column.
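A minimal sketch of the name retention described above, using plain pandas outside Spotfire. The table, the column name "Sales", and the variable names are made up for illustration, and the final `to_frame()` call only stands in for whatever conversion the data function framework actually performs on outputs.

```python
import pandas as pd

# Hypothetical stand-in for a Spotfire input table with one numeric column.
table = pd.DataFrame({"Sales": [1.0, 2.0, 3.0]})

# A Column input arrives as a pandas.Series, which carries the column name.
column_input = table["Sales"]

# Arithmetic on a Series returns a new Series with the same name.
output = column_input * 2
print(output.name)

# Assumption: if the output is turned back into a table, pandas uses the
# Series name as the column name, not the Python variable name.
print(list(output.to_frame().columns))
```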

We could probably do something different there. We wouldn't want to convert it to a standard Python list, because x2*2 would then repeat the list and make the column twice as long rather than perform a vectorized arithmetic operation. But I suppose we could make it a plain numpy array instead (the equivalent of adding x2 = x2.to_numpy() at the start of the user's example). I'm not sure whether that would make an unnecessary copy of the data. It doesn't look like it does (it appears to be just a reference to the underlying data), so that might be a better approach overall, and might be closer to what customers are expecting.
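The trade-offs between the three representations can be sketched as follows. The Series `x2` and its name "Sales" are made up for illustration. Whether `to_numpy()` returns a view or a copy depends on the dtype and the pandas version, so the memory check at the end is something to run and inspect rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Hypothetical Column input as it currently arrives in Python.
x2 = pd.Series([1.0, 2.0, 3.0], name="Sales")

# Option 1: plain Python list. Multiplying repeats the list.
as_list = x2.tolist()
doubled_list = as_list * 2
print(len(doubled_list))  # 6 elements, not 3

# Option 2: numpy array. Multiplying is elementwise, and the array
# has no name attribute to leak into the output column.
as_array = x2.to_numpy()
doubled_array = as_array * 2
print(doubled_array.tolist())
print(hasattr(doubled_array, "name"))

# Check whether the array shares memory with the Series data.
# For a simple float64 Series this is expected to be a view, but
# object or extension dtypes may force a copy.
print(np.shares_memory(as_array, x2.to_numpy()))
```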

REPRO:
Create a data function like:

output = input * 2

Define both Output and Input as numeric Columns, and hook input up to a column from a Spotfire data table.
Set output to go to a new data table
Expected: column in new Spotfire table is named "output"
Actual: column in new Spotfire table is named the same as the original input column

Issue migrated from TIBCO Software JIRA [PYSRV-260] created by jorobert
@bbassett-tibco bbassett-tibco removed this from the Product Backlog milestone Jun 30, 2021