Skip to content
This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Eliminate offset and length in BaseColumn #177

Open
scotts opened this issue Feb 4, 2022 · 0 comments
Open

Eliminate offset and length in BaseColumn #177

scotts opened this issue Feb 4, 2022 · 0 comments

Comments

@scotts
Copy link
Contributor

scotts commented Feb 4, 2022

We should remove the _offset and _length in BaseColumn:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.h#L223-L224
There are multiple places where we do not properly track this, such as in expression evaluation:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/column.cpp#L236-L238
We should be able to not track these in the BaseColumn anymore without losing any functionality.

We also may want to support UDF evaluation for different offsets, such as:

a = ta.Column([1, 2, 3])
b = ta.Column([10, 20, 30])

a[:2] + b[2:]

Slicing the vector with the BufferView might be the right solution.

cc: @wenleix

@scotts scotts added good first issue Good for newcomers and removed good first issue Good for newcomers labels Feb 4, 2022
@scotts scotts changed the title Eliminate offset and length in in BaseColumn Eliminate offset and length in BaseColumn Feb 11, 2022
YLGH pushed a commit to YLGH/torcharrow that referenced this issue May 7, 2022
Summary:
Pull Request resolved: pytorch/torchrec#177

Perf units are calculated with bandwidth being in terms if GB/sec. To accurately model perfs we want to put it in terms of bytes/ms

Reviewed By: dstaay-fb

Differential Revision: D34758134

fbshipit-source-id: 46a0560ec37972f92a60dd78024bb6b949e702dc
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant