-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duplicate column names coming back as array, instead of using the last column #1384
Comments
So the behaviour for duplicate columns being presented as arrays was introduced in #1240 (although this is only in v8 and no v7). This only changed the values for the values presented and didn't affect the metadata (which is what your quote regarding the documentation). At the moment, the behaviour is expected, so from that point of view this is not a valid bug. However, if the mis-match in returned metadata and the actual returned data structures, then that seems valid. I'll look into it a bit to see what is going on there. |
Oh, I may have mis read it then. If it's only dealing with the metadata instead of the actual values coming back, that would make sense. Would it be possible to introduce an option that returns just the last value for multiple columns names? Basically, using the last columns meta data as well as the last columns value, instead of just using the last columns metadata while grouping all the values into a collection? This would bring the
If we could get an option (not even defaulted to avoid breaking changes), then we could implement that option within TypeORM and bring the returned resultset in line with what we expect from other drivers. Thoughts? |
The change to return an array of values was brought in deliberately in v8 of the library. I'm not sure that going back would be a good idea, nor adding a feature flag for this. Whilst I appreciate other libraries appear to be handling this in a defacto-standard way, I'm not sure standardised data loss is a good idea and, instead, users should be writing better queries where these duplicates aren't there OR it's dealt with on the application layer. It should be fairly trivial to process the returned rows to extract the last value of the array if the value is an array. I suppose the complexity is determining the difference between intended arrays (parsed JSON data, though I'm not sure we parse it, that might be something the drivers do) vs constructed arrays from multiple values, which is why the metadata structure should better represent the recordset data-structure. |
Then there must be some confusion here. I've tested on both v7 and v8, and both return an array when multiple columns of the same name return. Thoughts? Please see below version as reported by
I disagree with the notion that we shouldn't do this because "data loss is bad" but also lose metadata associated with all but the last duplicate column anyway, according to the documentation...? It seems to me that it is intuitive to think that, as a js driver for a RDMS, the result would be an array of records, with objects only a single level deep (as this is how RDMS return data). I agree it is trivial to check for an array in some sort of post-processing step, and take the last value. I also understand that we could even use However, I still believe that the current default is unexpected behavior, even if it's technically accurate, and requires you to dig into
This way, it's documented when first becoming acquainted with the library and the user will then be able to determine form the get go what they would prefer. I hope you reconsider, but at the same time understand that you guys are the experts and know best. If the above proposal is out of scope, feel free to close the issue 🙂 Thanks! |
Firstly, it definitely looks like there's something not quite right because the returned metadata and the returned data don't match up quite right. At the moment you'll get the metadata from both columns but they aren't shown as arrays, so it's difficult to actually associate the metadata with the row data. That's something to look into. Secondly, regarding v7.3 + v8, maybe this was always acting this way, or it was brought in earlier. But that's a bit of a distraction - the point is it is currently intentional behaviour. To address the main crux of this issue: Should duplicate columns return as an array of values or just replace the last seen value? In my day-to-day use of databases, I don't encounter duplicate column names like this, so it's entirely an academic exercise for me. My unsympathetic response is "write better SQL", but I appreciate with automated query building this can be harder - however I am involved in automated DB building and SQL builders that are capable of not having duplicate column names. If duplicate names are unavoidable, (rhetorically) why is the last column the one you want? I can see from a naive implementation you'd do: const record = {};
dbquery.on('column', (name, value) => {
record[name] = value;
}); and you'd end up with the last column there... but if you were slightly more thorough and did: const record = {};
dbquery.on('column', (name, value) => {
if (!(name in record)) {
record[name] = value;
}
}); then you have different behaviour where the first value is retained and I would say that's sensible too. As a developer who knows they will have duplicate names in their queries, do I automatically want the first or the last value? I'm not sure it's obvious, that feels like a toss-up. What if I want both/all? What if I want the first not the last? What if I want the second (if there are more than two)? It just all feels like we have to make an opinionated choice where there's no obvious correct answer. We could add an option to always return the last item, but then what if someone wants an option to always return the first one, or another configuration? This just feels like a spiral into endless options for every edge-case. With the current implementation (an array of values) the consumer of the library can make this choice freely because they have all the values at their disposal and their DB interface can just pluck out the values they want/need fairly trivially.
It's not right to say the RDMS returns data this way, the RDMS returns all the columns, including the duplicated ones, which is how we are able to present them all... It just appears other libraries have made the decision to override the data on duplicate cols. I'll definitely think on this, as I appreciate if lots of other DB libs do it this way, it's not fun to be the odd one out. For my understanding, can you justify to me why it makes sense to return the last duplicate named value over the first? What circumstances can you suggest where this is just so more obvious than any other value (eg: typical queries, etc where this is the obvious choice)? |
and
I definitely see where you're coming from, it absolutely is an opinionated decision regardless of which direction is taken, and I appreciate the concern about getting into the potential madness of "option for everything. The only thing that I can offer as justification of taking the last column vs taking any other column is simply for uniformity with other packages and protocols, where this seems to be the default and expected behavior. And again, I'm not advocating for it to be defaulted, but rather an option that the user can opt into. If I was issuing direct commands, I can't think of a situation where I would want to return two of the same column other than it being a mistake. But I confess that my background in this issue is driven in part due to my use of a query builder that is built on top of TypeORM, where something like this is more likely to happen. Specifically, I'm referencing a querybuilder that takes an OData string and builds a query representing it using TypeORM. The current functionality is problematic because a) the user is the one determining the query, not the developer, and b) some generalizations have to be made with how the query is actually built, in which it's very difficult to account for all edge cases (for example, selecting the joined column twice in certain circumstances). Of course, these could definitely be seen as bugs on the part of the third-party library and how they generate the query, but could also be considered just a byproduct of unexpected behavior by
Sorry, I was more talking about how they conceptually return results, not technically. RDMS's return their data in a 2-dimentional result which lends itself well to an array (resultset) of objects (row) with properties (column) set to the value. It seems to me to be counter-intuitive to thrown in additional dimensions by default 😉 As you said, and I agree, it is trivial to iterate through and just take the last column in order to emulate the other packages, or use |
It seems that either the documentation on how duplicate column names are handled is incorrect, or the implementation is incorrect. Documentation notes:
Expected behaviour:
I would expect that, by default, any query with duplicate column names would take the last column and return the keyed object using that data.
Actual behaviour:
The actual behavior seems to group the two data sets together into an array.
Script:
Software versions
mcr.microsoft.com/mssql/server:2017-latest-ubuntu
)The text was updated successfully, but these errors were encountered: