Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for retrieving Columns Schema from DbDataReader #15269

Closed
NickCraver opened this issue Sep 24, 2015 · 72 comments
Closed

API for retrieving Columns Schema from DbDataReader #15269

NickCraver opened this issue Sep 24, 2015 · 72 comments
Assignees
Milestone

Comments

@NickCraver
Copy link
Member

This is to unwrap the specific issue of resultset schemas from the many involved in #14302.
Back on May 5th @davkean noted:

We want GetSchemaTable in some form (we should get a separate bug filed on that, however, as you could imagine bringing that back with a different design that doesn't pull in DataSet/DataTable).

While it doesn't seem that anyone cares what method exists for of getting the schema of a resultset, we seem to agree on needing a method of doing it in .Net Core. For example, if a user runs a simple query such as:

SELECT 'bob' Name, 1 Id

Then there's no way to see what the types are. While getting the .Net type (via .GetFieldType(), e.g. DateTime) is possible and getting part the SQL type (via .GetDataTypeName(), e.g. NVARCHAR) is possible, other vital attributes of the schema shape currently aren't possible. For example there's no way to distinguish a varchar(20) from a varchar(max). There's also no way to tell if the column was nullable. Previously, this functionality was provided by .GetSchemaTable() which returned a DataTable class.

Currently I have a very simple use case: query data here, create a table with that data elsewhere. The query can be something like: SELECT * FROM sys.database_files. I point this out to illustrate that INFORMATION_SCHEMA (as proposed by some in #14302 as an alternative) is not a viable option. The same issue presents on any stored procedure which can also return any result set shape depending on the code paths. We need a way to see the scheme of a result set, these database-query alternatives are both expensive and non-covering.

This is a hard blocker for projects of myself and others. I have uses in both one-off projects as well as Opserver (which monitors SQL) and Dapper.Net. I know from #14302 that I'm not alone on this block. The portability of .Net core is extremely appealing for certain projects, especially those where we need to monitor the innards of SQL without depending on a .Net install (many if not most existing SQL servers only have .Net 3.5). Unfortunately though, this one missing piece of the puzzle may prevent using it altogether, forcing full CLR installs.

While SqlClient specifically serves my blocking use case, it's important to point out as @FransBouma notes in the other issue: this needs to be in the base API, not specifically up in SqlClient. For example, I also need this for Dapper features and it works across many providers - it is not at all SQL Server specific.

So how about it, can we please get an API for result set schemas added back? I'm happy to contribute to designing and provide feedback on a new API here if that's the best route. I know many others are as well.

@FransBouma
Copy link
Contributor

Additionally: DbConnection.GetSchema(). This too relies currently on DataTable and should get a proper alternative in .NET core going forward. There are no alternatives for some schemas retrievable through this method, for some providers. If there's no alternative in .NET core, there's no way to write generic code to obtain metadata about a schema aspect. E.g. on Firebird, some metadata is very hard to retrieve from the schema (read: very complex metadata queries) and the schemas provided by this method are very helpful. I think the solution chosen for DbDataReader.GetSchemaTable should also be applied to DbConnection.GetSchema as it too simply has to return a datatype with tabular data which is only known at runtime.

@william-gross
Copy link

This is a blocker for me as well, as I commented in #14302.

@NickCraver
Copy link
Member Author

Responding to @YoungGah's note above:

Absolutely! We are tossing around ideas of the API returning schema in JSON format. Any opinions?

Given that JSON serialization is not included in core, wouldn't it be a bit odd to have libraries implementing DataReader taking dependencies to support it? I wouldn't need JSON serialization in several apps aside from this. I'd love to hear what others hear, but from the SQL side, this option isn't very appealing. That being said, I certainly see why it's being discussed for other providers.

With tabular data it can usually be expressed by positional or named arguments (e.g. a dictionary, list of an interface, etc.). I think that's a better approach for SQL, but I also recognize how that doesn't really work for a document type data store.

@roji
Copy link
Member

roji commented Sep 25, 2015

Am the maintainer of Npgsql, this is definitely a very important discussion, and it seems important to have this for .NET Core RTM.

Here are some thoughts:

  • It definitely seems that dropping DataTable/View/Set is a good approach as long as an alternative schema-fetching mechanism is provided.
  • Re resultset metadata, as has been mentioned before a limited subset is available via GetFieldType, GetDataTypeName. Although this API approach could be extended (e.g. IsNullable), this doesn't seems like an optimal API for getting the results. It would also probably be different from whatever new API will replace DbConnection.GetSchema. If so, you may consider removing GetFieldType, GetDataTypeName from .NET Core as well as these will be a subset of whatever full new metadata API will be introduced.
  • Re JSON, I agree with @NickCraver that it seems a bit out place in the ADO.NET API:
    • It would force a particular JSON API (Newtonsoft?) on all ADO.NET consumers (and producers).
    • It would be a weakly-typed API
    • It is somewhat cumbersome for provider writers, since relational databases typically don't communicate schema information in JSON.
  • Why not introduce a simple .NET object-based model, e.g. DbColumn, DbTable? Providers could extend these as usual in ADO.NET for provider-specific metadata attributes (e.g. SqlDbColumn)...

@FransBouma
Copy link
Contributor

JSON is a silly format in this context, it's an untyped text blob that needs deserializing using an external library. So -1 for JSON. The main issue is besides the external library needed that it's weakly typed: a piece of code calling GetDbSchema has to be sure the data it receives back is of the same type and the elements embedded inside it is stored in the same types. Always. With JSON that's up in the air, hence: bad choice.

@nvivo
Copy link

nvivo commented Sep 26, 2015

@YoungGah

We are tossing around ideas of the API returning schema in JSON format. Any opinions?

I'm not sure about the idea of json for this. What would be the advantage over a simple object that can be easily serialized if needed? This is supposed to be used in the context of .NET calls, there is hardly a need for this information to be that interoperable to require a json blob.

On https://github.com/dotnet/corefx/issues/1039#issuecomment-142271673 I proposed a very a simple interface that has the basic information that have been proven to be needed, and a dictionary for additional data.

interface IColumnSchema {
    int ColumnOrder;
    string ColumnName;
    string ProviderDataType;
    Type ClrMappedType;
    int MaxLength;
    bool CanBeNull;
    IReadOnlyDictionary<string, object> CustomProviderData;
}

interface IReaderSchema()
{
     IReadOnlyList<IColumnSchema> Columns { get; }
}

This is supposed to be used as:

var command = CreateDbCommand("select 1 from bob");
var reader = command.ExecuteReader();

var schema = reader.GetResultSchema(); // returns IReaderSchema

foreach (var c in schema.Columns) {
    ....
}

I'm not proposing this exact api, but something along the lines of this idea, with interfaces describing this data instead of the datatable.

@YoungGah and others, what do you think?

@YoungGah
Copy link

YoungGah commented Oct 2, 2015

Thanks for all your feedbacks and JSON is out, then. We will get back to you with another proposal.

@bricelam
Copy link
Contributor

bricelam commented Oct 2, 2015

In case it's useful, I did an EF feature spec on a new Schema Information API a few years ago. We never got around to implementing it, and it was definitely skewed toward EF6, but it may be interesting to look at in the context of this issue.

@YoungGah
Copy link

YoungGah commented Oct 2, 2015

@bricelam Thanks for the spec! We will look into it.

@roji
Copy link
Member

roji commented Oct 3, 2015

@bricelam, @YoungGah, this Schema Information definitely seems like a step in the right direction! Some immediate points that come to mind:

  • The proposal handles schema table/view metadata, but not resultset metadata (i.e. from a DbDataReader). Of course it would be pretty trivial to extend the proposal for that.
  • It's worth thinking about allowing providers to extend these classes to provide provider-specific information
  • At least in PostgreSQL there are function features not covered (e.g. variadic)
  • Unique constraint is missing

But I'm guessing this is just a starting point, I'll be glad to provide more feedback on an up-to-date proposal.

@roji
Copy link
Member

roji commented Oct 7, 2015

Note the current effort in EF7 to define a relational model produced by a reverse engineering provider. Ideally this could be the same model returned by an ADO.NET provider.

@saurabh500
Copy link
Contributor

cc @jcyang75

@saurabh500 saurabh500 assigned saurabh500 and unassigned YoungGah Dec 15, 2015
@saurabh500
Copy link
Contributor

For the schema information return type we are considering adding a List of Dictionaries.
The dictionary looks like Dictionary<string, object>
The key will represent that column and the object will be the value of the column.
Each item in the list will represent an item in the schema.

This data structure can be used for the replacement of the DbConnection.GetSchema() API as well.
Let us know what you think.

For the DataReader API we are considering the following names:

  1. GetSchemaDefinition()
  2. GetSchemaInfo()
  3. GetDefinition()

Suggestions, votes?

@NickCraver
Copy link
Member Author

@saurabh500 It's a bit hard to picture this in the abstract, can you give us an example of what this would look like? Let's say if it were serialized as JSON for the purposes of discussion.

Thoughts without the layout example for discussion: I would want constants for the known things such as size, type, etc. (as opposed to magical matching strings on every consumer's side), but the lack of strong typing by the nature of the API is a pretty significant downside. We're talking about shifting to runtime breaks rather than compile time breaks on an API that differs between CLR vs. CoreCLR. Every consumer would be casting object to the string, int, Type, etc. (and causing a lot of boxing in the process, though to be fair: that's not a major concern here I think).

While GetSchemaTable() isn't the best option, it is typed and most issues are compile time rather than runtime. I wouldn't want to lose this ease of use (and safety) vs. a very generic dictionary approach. There should be a more strongly typed API. If this is an attempt to describe both document and tabular data in the same APIs, I think that's a bad approach. There is a fundamental disconnect even at the "number of levels" definition, and those should be distinct APIs. The tabular schema definitions can then be much more strongly typed, and I can certainly see a string indexer on that description class/type for providers to add any additional properties they want above and beyond the base, strongly typed set.

@roji
Copy link
Member

roji commented Dec 15, 2015

I second @NickCraver here, if you're doing a new schema API, there's no reason whatsoever not to have it typed. Why not create classes such as DbSchemaTable, DbSchemaColumn, DbSchemaFunction and include simple strongly-typed properties exposing relevant information (e.g. Precision, Scale)? Specific providers could extend these (e.g. SqlSchemaColumn) to add provider-specific properties, as is already done with other areas of the ADO.NET API...

@roji
Copy link
Member

roji commented Dec 15, 2015

Also, as I mentioned above, note the EF7 reverse-engineering effort currently underway, which does exactly this: represent a database schema in a strongly-typed way. Any chance the ADO.NET and EF7 teams could get together and see if a single API could be used here?

@FransBouma
Copy link
Contributor

Agree with @roji's example and @NickCraver 's points. An untyped Dictionary bucket is useless. I also wonder how much time has been put into this on MS' side. We're debating this since May this year and the most naive approach is to use an untyped Dictionary and MS comes up after all this time with exactly that.

Not a good show, people.

@saurabh500
Copy link
Contributor

@FransBouma, @roji, @NickCraver Thanks for the feedback.

While proposing the loosely typed Dictionary there were 3 considerations:

  1. Allowing schema to be returned by DataReader
  2. Allowing DbConnection.GetSchema(), DbConnection.GetSchema(String), DbConnection.GetSchema(String, String[]) to return a schema in a manner similar to point 1.
  3. Allowing another provider to provide the schema using the same API

The DataReader's GetSchemaTable API provided the same Column Names every time it is used.
However the DbConnection's GetSchema() and the overloads provide different Column Names based on what is being queried.

If the DbConnection.GetSchema is called, then the Columns returned are CollectionName, NumberOfRestrictions, NumberOfIdentifierParts
If the schema information of the Databases is queried, then the returned columns are database_name, dbid, create_date
In case the schema information of all Tables is queried then the columns are TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE

The column names are different based on different contexts in which the Schema is being retrieved.

In the interest of keeping the Data Structure same for DbConnection and DataReader, I recommended an untyped Dictionary with relevant information present in it.
The providers can also provide their own Keys if they choose to use constants that System.Data.Common exposes.

@NickCraver following is a close JSON representation of what I was recommending:

[{
    ColumnName : "CourseID",
    ColumnOrdinal : "0",
    ColumnSize : "10",
    NumericPrecision : "255",
    NumericScale : "255",
    IsUnique : "False",
    BaseColumnName : "CourseID",
    DataType : "System.String",
    AllowDBNull : "False",
    ProviderType : "12",
    IsAliased : "",
    IsExpression : "",
    IsIdentity : "False",
    IsAutoIncrement : "False",
    IsRowVersion : "False",
    IsLong : "False",
    IsReadOnly : "False",
    ProviderSpecificDataType : "System.Data.SqlTypes.SqlString",
    DataTypeName : "nvarchar",
    ...
},
{
    ColumnName : "Year",
    ColumnOrdinal : "1",
    ColumnSize : "2",
    NumericPrecision : "5",
    NumericScale : "255",
    IsUnique : "False",
    BaseColumnName : "Year",
    DataType : "System.Int16",
    AllowDBNull : "False",
    ProviderType : "16",
    IsAliased : "",
    IsExpression : "",
    IsIdentity : "False",
    IsAutoIncrement : "False",
    IsRowVersion : "False",
    IsLong : "False",
    ProviderSpecificDataType : "System.Data.SqlTypes.SqlInt16",
    DataTypeName : "smallint",
    ...
}]

Having explained this, the consensus is definitely towards having a strongly typed API

Requests / Questions
If the same return type is to be used for the Schema APIs of DbConnection and DataReader, what are the relevant set of strongly typed properties that should be included?
Are you recommending adding different data structures for schema API for the DbConnection and DataReader?
If the schema returned by DbConnection and DataReader, then there is hardly any overlap in the strongly typed properties.

This issue was opened for DataReader. @FransBouma has mentioned the DbConnection.GetSchema(...) API.
It will be a good idea to discuss both DbConnection and DataReader on this thread.

The schema returned by the DataReader is the Column schema however the DbConnection covers a much wider range of schema at different levels.

@roji
Copy link
Member

roji commented Dec 15, 2015

@saurabh500, I understand the motivation for weakly-typed results you explain above. If a single API method (i.e. DbConnection.GetSchema()) needs to return different results based on the query its given, then there's indeed a problem with strongly-typed results.

I think the answer is to replace the single, multi-result GetSchema() with multiple methods, each of which returns a single result type. In other words, instead of having one method returning different things based on its input (the query), have several methods which each return only one thing. So you would have GetDatabaseSchema, GetTableSchema, and so on. Note the JDBC approach of retrieving a DatabaseMetadata object from the connection, which you can then query with specific methods; this has the advantage of concentrating all the schema-related methods in a single place that's separate from DbConnection (however, note that in JDBC these methods return a weakly-typed ResultSet which isn't good).

Regarding an overlap between DbConnection.GetSchema() and DbDataReader.GetSchema()... IMHO, an overlap isn't necessary related to making the API strongly-typed or not - even if there isn't any overlap anywhere a strongly-typed API is still preferable to a weakly-typed one. I think still there's some potential for overlap, e.g. with DbSchemaColumn with may be returned from both.

Finally, the JSON representation seems good in principle (assuming it gets translated to a strongly-typed API). However, please consider a clean distinction between SqlServer properties and general properties which make sense for all relational database. So DataTypeName and AllowDBNull make sense for all databases, but IsIdentity and IsRowVersion are entirely SqlServer-specific. The latter probably belong in an Sql* subclass, much like SqlConnection is a subclass of DbConnection.

@ErikEJ
Copy link

ErikEJ commented Dec 15, 2015

I am all for:
1: Why not introduce a simple .NET object-based model, e.g. DbColumn, DbTable? Providers could extend these as usual in ADO.NET for provider-specific metadata attributes (e.g. SqlDbColumn)...
2: And corresponding methods to replace the magic "GetSchema" - So you would have GetDatabaseSchema, GetTableSchema

The challenge being of course to agree on what belongs in the "general" model

@FransBouma
Copy link
Contributor

@saurabh500

The DataReader's GetSchemaTable API provided the same Column Names every time it is used.
However the DbConnection's GetSchema() and the overloads provide different Column Names based on what is being queried.

I don't really see how they should be related. DbConnection.GetSchema returns a different schema based on input and ADO.NET provider: not every provider provides the same schemas and not every provider provides the same layout in the returned schemas. The method is currently not that useful if you don't know what provider you're dealing with.

If the DbConnection.GetSchema is called, then the Columns returned are CollectionName, NumberOfRestrictions, NumberOfIdentifierParts
If the schema information of the Databases is queried, then the returned columns are database_name, dbid, create_date
In case the schema information of all Tables is queried then the columns are TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE

Not true. Perhaps on SQL Server, but ODP.NET doesn't return that information, as it doesn't even have the concept of a catalog.

The column names are different based on different contexts in which the Schema is being retrieved.

and ADO.NET provider. Please look over the fence what other databases there are. It's useless to create a 'generic' API which is actually designed for SQL Server.

In the interest of keeping the Data Structure same for DbConnection and DataReader, I recommended an untyped Dictionary with relevant information present in it.

You're limiting both with this not-needed restriction. DbDataReader.GetSchemaTable is used to work with the resultset. It's always the same, for every ADO.NET provider, so you can write generic code to deal with it.

DbConnection.GetSchema() differs per ADO.NET provider and also per input. It can't be used to write generic code as it's specific per ADO.NET provider. Even different ADO.NET providers for the same DB type (e.g. MS' own Oracle ADO.NET provider and ODP.NET from Oracle itself) return different schemas and information per schema.

Any API exposed which hasn't been designed with at least the major databases in mind is a lost cause and a missed chance. This is a generic API so it has to be usable across all major databases. That means that a concept like a strongly typed UDT with schema info has to be available, it shouldn't rely on catalogs (or schemas (as a concept of container for database elements, like 'dbo' on sql server), as MySQL doesn't support schemas but does support catalogs etc.) alone, to name an example.

And again, there are 2 different schemas here: one being about meta-data of the schema in the RDBMS itself (through DbConnection.GetSchema()) and one being about the meta-data of the active resultset (through DbDataReader.GetSchemaTable()). They are different, as they provide different information and the former is flexible and can return anything, the latter is fixed as it returns always the same thing, and has to be predictable so generic code can be written. Ideally it would be best if DbConnection.GetSchema provided a fixed set of schemas to query and each resultset for a schema requested has a pre-defined layout. If that layout is left to the ADO.NET provider, we'll get what we have today, which is a poorly designed, unusable mess (the DbConnection.GetSchema method that is), as some ADO.NET providers provide not enough info or totally different info than others. E.g. it's not possible today to use DbConnection.GetSchema to obtain meta-data for a database schema to do reverse engineering of schema to object models on any major RDBMS.

But, I'm wondering, why are we even having this debate when there has been at least one Microsoft employee years ago who was clever enough to include GetSchemaTable() in the API in the first place. The reasoning behind it then might be a good start, instead of doing all that work again and worse: making all the same mistakes.

@roji
Copy link
Member

roji commented Dec 15, 2015

@FransBouma just one small comment...

Although database schema concepts vary across databases (catalogs, schemas), there's still a lot of common ground to capture in a generic API. Concepts such as tables, columns, indices/restrictions etc. seem to be pretty universal. As an example, Npgsql implements the current DataTable-based DbConnection.GetSchema() in a way that closely resembles SqlClient, allowing generic code to use that method to some extent. Another good example is EF7's reverse engineering, which models schemas in a database-independent way; they're doing a pretty good job at it and I actually think it's pretty much a duplication of what this issue is about (see dotnet/efcore#4091). So I really think both APIs being discussed here (both the DbConnection one and the DbDataReader one) should strive to be generic.

@FransBouma
Copy link
Contributor

@roji I've written (many years ago) a general purpose system for reverse engineering metadata to entity models for LLBLGen Pro and it's indeed very doable to capture a lot of the common elements across DB types in a general purpose class library. Most databases indeed contain common concepts like a table, a procedure/function, a type, table field, perhaps even views etc.

That's the easy part ;). With this, the devil is in the details. Like I described earlier: UDTs which both have a .NET type and also an underlying DB specific type (catalog/schema/elementname) on sqlserver, but not on oracle. A value in a resultset which is a BFILE type and should be read using a reader, preferably a generic one so generic code can deal with that value as more DBs support a BFILE like type (but not all of them). Just a few things that come to mind.

As I know what's needed to reverse engineer relational models to entity models, I also know it's not a 100% sufficient model for all cases: not all meta-data is needed for reverse engineering, and it can very well drive decisions in the datastructures: why bother with meta-data like packages which contain overloaded stored procedure definitions in Oracle if they're not needed for reverse engineering relational model data to entity definitions? :)

So the scope is (much) wider than that. It's a good start, for sure. It's not a solution that can be used to implement what's currently lacking (as more is needed).

I also want to stress that schema meta-data info is nice for DbConnection.GetSchema, but for the topic here, DbDataReader.GetSchemaTable(), the actual type info is needed per column, for the active resultset, which is IMHO a total different thing: for GetSchemaTable you don't need table/view/proc metadata, but type information, and additionally e.g. to read a BFILE value if a column is of type BFILE, in a generic manner.

@NickCraver
Copy link
Member Author

I support @FransBouma's sentiment here on them really being separate APIs. The DbConnection schema may warrant some generic components, but the DataReader schema can be much more strongly typed.

@saurabh500 To answer your question:

what are the relevant set of strongly typed properties that should be included?

Let be get the ball rolling here (and iterate), here's what I would want to carry over these items from GetSchemaTable() today:

These would be on a base type, something like DbColumn or DbColumnInfo:

  • AllowDBNull
  • BaseCatalogName
  • BaseColumnName
  • BaseSchemaName
  • BaseServerName
  • BaseTableName
  • ColumnName
  • ColumnOrdinal
  • ColumnSize
  • IsAliased
  • IsAutoIncrement
  • IsExpression
  • IsHidden
  • IsIdentity
  • IsKey
  • IsLong
  • IsReadOnly
  • IsUnique

These would move into a SQL Server inherited type, like SqlColumn or SqlColumnInfo which inherits from the above:

  • NumericPrecision
  • NumericScale
  • ProviderType
  • IsColumnSet (this may warrant a move to base - I'm unsure which other providers implement it)
  • IsRowVersion
  • UdtAssemblyQualifiedName (there may be other core issues on this one)

...and I have no strong thoughts on NonVersionedProviderType - perhaps others can chime in there.

In addition to these strongly typed properties, there should be a [string property] indexer on DbColumn which can store any number of additional provider properties, though that return type would have to be object to be usable. There's a tradeoff here of references (which no library author wants) vs. every provider subclassing. It allows the usage of both models, since any property subclassed can be exposed both ways. Then again, there's the base argument of "anything shared should have just been on the base class" - which is absolutely true. So maybe the former point is a moot one.

What other base properties would others like to see added? Disclaimer: Dapper already takes a reference to SQL Server packages for other reasons, so our thoughts on what is and isn't in the SqlColumn in the above proposal admittedly aren't as strong as others may have.

Thoughts?

@mgravell
Copy link
Member

@NickCraver needs DataType (.NET type) in the base

@FransBouma
Copy link
Contributor

precision/scale needs to be in base (all db's support it) and udt type too (as some DBs support it). The thing is that if one needs to cast to an ADO.NET provider specific column type (e.g. SqlColumn) it has to be because there's no other DB supporting that feature. I think the set of these features is very small, and if even 2 databases support a given feature it should be in the base class.

it otherwise makes things very cumbersome to use in generic code: the current DataTable based result is usable in generic code, no matter what DB is used, and I think that is a use-case that has to be met with the new system as well.

@mgravell
Copy link
Member

@FransBouma accessing in a datatable by name shouldn't be much different to accessing via the indexer Nick mentioned - especially if it simply returns null for keys that don't exist.

@mgravell
Copy link
Member

@roji In theory I agree; there's a slight gotcha there that the matching column
in the old API is "DataType", but I don't think we should feel overly
constrained by that.

@MihaMarkic
Copy link

@NickCraver Well, are we talking only about ORMs or ado.net in general?

The advantage of dynamic approach (or hybrid with dynamic/properties and indexer) has the advantage of strong typed/intellisense approach when needed. I can imagine supporting a single database and then strong typing comes handy.

@MihaMarkic
Copy link

@mgravell Or you can still check if a property exists through reflection. I agree it is not ideal.

@ErikEJ
Copy link

ErikEJ commented Dec 16, 2015

@NickCraver Well, I am that expert 😄 - and they are two distinct database engines with separate ado.net providers.

@saurabh500
Copy link
Contributor

Thanks for the inputs folks.

We will incorporate the mentioned list attributes summarized by @NickCraver to model the Data structure for this. Thanks for the inputs.

To put this in perspective of the DbConnection.GetSchema(), DbColumn represents the schema details of a column.

We are considering creating base Types for the different schema objects and creating relationships between these types to represent the Database schema.

E.g.
To name a few types and their relationships

DatabaseSchema represents the Database
DbTable represents the Tables in the database
DbColumn (or DbColumn) represents the columns.
DbForeignKey represents the foreign keys.
DbIndex represents the Indexes in the database.

Each DatabaseSchema contains a list of DbTable, DatabaseName (and other properties)
DbTable contains a list of DbColumn, a list of DbForeignKey and references the DbTable it is a part of.

There are more concepts that are needed for the DbConnection.GetSchema()
We are still looking into this would like your feedback for the design of the Types are we refine it.

This thread is focused around the DataReader.GetSchemaTable()

I will open another issue to discuss the DbConnection.GetSchema()

@saurabh500
Copy link
Contributor

I have added dotnet/corefx#5024 for the DbConnection.GetSchema discussion

@NickCraver
Copy link
Member Author

@saurabh500 where can we see progress on the API scheme discussed here? Given the major problem has been communication and RTM rapidly approaching, I know that at least for myself, C# is a lot clearer than any words here. To be honest, I'm not even sure what this means (especially given ongoing discussion about what should be in the base class):

We will incorporate the mentioned list attributes summarized by @NickCraver to model the Data structure for this. Thanks for the inputs.

...can we track actual dev or API review progress somewhere? I know several of us would be happy to help via PRs, etc.

@saurabh500
Copy link
Contributor

@NickCraver I will put the pull request or the dev branch link here to track progress of the code.

@NickCraver
Copy link
Member Author

@saurabh500 any update?

1 similar comment
@NickCraver
Copy link
Member Author

@saurabh500 any update?

@saurabh500
Copy link
Contributor

@NickCraver I have added the DbColumn to the System.Data.Common. I have exposed a few fields in SqlClient to implement this API.
I have to add an API to expose List in System.Data.Common and I am almost done with my implementation for SqlClient.
I will be linking this issue in subsequent PRs.

@NickCraver
Copy link
Member Author

@saurabh500 thanks - we look forward it!

@NickCraver
Copy link
Member Author

Linking initial PR discussion here: dotnet/corefx#5609

@terrajobst
Copy link
Contributor

We discussed this today. We believe the current proposal isn't ready yet.

@saurabh500: once you updated the proposal, please edit the description of the issue and copy & paste it there. Also, remove the api-needs-work label and add the api-ready-for-review label. Thanks!

@saurabh500 saurabh500 changed the title A replacement API for DataReader.GetSchemaTable() API for retrieving Columns Schema from DbDataReader Feb 4, 2016
@saurabh500
Copy link
Contributor

@terrajobst PTAL

@saurabh500
Copy link
Contributor

API for GetColumnSchema

Problem

DbDataReader on .Net Framework, provides an API called DataTable GetSchemaTable() which was returns the metadata of the columns
being read using the DbDataReader object. Because of the removal of DataTable in .Net Core this API had to be removed.
DbDataReader.GetSchemaTable() needs a replacement in .Net core.

Progress

The data structure DbColumn along with the necessary attributes which should be present
for the various ADO.Net providers to effectively provide the metadata for the columns is provided in the issue.

namespace System.Data.Common
{
   // DbColumn contains the base attributes that are common to most Database columns. An indexer 
  // has been provided in case a property added by a provider needs to be accessed without a dependency on the provider implementation. 
    public class DbColumn
    {
        public virtual bool AllowDBNull { get; set; }
        public virtual string BaseCatalogName { get; set; }
        public virtual string BaseColumnName { get; set; }
        public virtual string BaseSchemaName { get; set; }
        public virtual string BaseServerName { get; set; }
        public virtual string BaseTableName { get; set; }
        public virtual string ColumnName { get; set; }
        public virtual int ColumnOrdinal { get; set; }
        public virtual int ColumnSize { get; set; }
        public virtual bool IsAliased { get; set; }
        public virtual bool IsAutoIncrement { get; set; }
        public virtual bool IsExpression { get; set; }
        public virtual bool IsHidden { get; set; }
        public virtual bool IsIdentity { get; set; }
        public virtual bool IsKey { get; set; }
        public virtual bool IsLong { get; set; }
        public virtual bool IsReadOnly { get; set; }
        public virtual bool IsUnique { get; set; }
        public virtual int NumericPrecision { get; set; }
        public virtual int NumericScale { get; set; }
        public virtual string UdtAssemblyQualifiedName { get; set; }
        public virtual Type DataType { get; set; }
        public virtual string DataTypeName { get; set; }

        public virtual object this[string property] {} 


    }
}

The next part of the API is to expose a ReadOnlyCollection from the DbDataReader

The right way of exposing the API would be to add an API in the DbDataReader class called
ReadOnlyCollection<DbColumn> DbDataReader.GetColumnSchema()

This is however not possible in .Net Core current version as it would hinder portability of apps created on .Net core with .Net framework.

https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/breaking-change-rules.md

Based on the contents of the link above, the following rule is allowed
Adding an abstract member to a public type when there are no accessible (public or protected) constructors, or the type is sealed

The restriction is due to type forwarding, which needs the calls to DbDataReader to be forwarded to the .Net framework's DbDataReader.

Proposal

We introduce an interface called IDbColumnSchemaGenerator, in .Net core which can be implemented by a type which can provide the Column Metadata.
For DbDataReader, this interface needs to be implemented by the subclasses of the DbDataReader to provide the Column metadata provided by the DbDataReader

namespace System.Data.Common
{
    public interface IDbColumnSchemaGenerator
    {
        System.Collections.ObjectModel.ReadOnlyCollection<DbColumn> GetColumnSchema();
    }
}

The subclasses of DbDataReader in the ADO.Net providers implement this interface.

public class SqlDataReader : DbDataReader, IDbColumnSchemaGenerator 
{
  public ReadOnlyCollection<DbColumn> GetColumnSchema(){
      return ... ; 
  }
}

In .Net core, to allow the Column Metadata to be exposed from the DbDataReader, an extension to the DbDataReader will be provided. This extension has the same signature as the interface method.

using System.Collections.ObjectModel;

namespace System.Data.Common
{
    public static class DbDataReaderExtension
    {
        public static System.Collections.ObjectModel.ReadOnlyCollection<DbColumn> GetColumnSchema(this DbDataReader reader)
        {
            if(reader is IDbColumnSchemaGenerator)
            {
                return ((IDbColumnSchemaGenerator)reader).GetColumnSchema();
            }
            throw new NotImplementedException());
        }
    }
}

It is recommended that the extension be used by consumers, for compatibility with .Net Framework. The details are mentioned in the compatibility section below.

Producer

The ADO.Net providers are the producers of the Column Schema. The interface will be implemented by the sub-classes of DbDataReader in the ADO.Net providers.

E.g.

class SqlDataReader : DbDataReader, IDbColumnSchemaGenerator
{
    ...

        public ReadOnlyCollection<DbColumn> GetDbColumnSchema()
        {
            ReadOnlyCollection<DbColumn> metadata = ... ; 
            // Build the list of columns metadata. Each columns metadata is represented by the DbColumn
            return metadata; // 
        }
    ...
}

Consumer

The consumers can retrieve the column metadata using an extension method on the DbDataReader. This extension will be provided in System.Data.Common

For a library consuming the DbDataReader the code will look like the following:

using System.Data.Common;

namespace org.example.my
{
        ...
            public void ProcessMetadata(DbDataReader reader)
            {
                ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
            }
        ...
}

Compatibility of libraries written on .Net Core with .Net Framework

The extension method will have a different implementation in the partial facade to get the Schema. The implementation will use the DbDataReader.GetSchemaTable() api which returns the schema as a Datatable.
This Datatable will be mapped to a ReadOnlyCollection and returned by the implementation of the extension in the partial facade.

vNext

Change in .Net Framework (Desktop)

In the next version of .Net Framework we add a virtual method on the DbDataReader to allow access to schema using ReadOnlyCollection<DbColumn>
The method will have the same signature as the extension method.

namespace System.Data.Common
{
    public abstract class DbDataReader : ... 
    {
        ...
        public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
        {
            // Map Datatable to the DbColumn as default implementation
            // Providers can override the implementation for their requirement.
            // The implementation will be similar to the implementation in 
            // partial facade implementation for .Net core v1.0 
        }
    }

}

.Net Core: vNext

For a vNext of .Net core which targets .Net Framework with the API added to it, we can introduce the same API on .Net core with a default implementation in the base class DbDataReader. At this point the providers have the option of providing the metadata either by overriding the method in the base class or by implementing the interface.
The base class implementation would check if the interface is implemented by the subclass and will provide the interface implementation while querying the base class API. The providers which do not implement the interface will have the option of overriding the base class method or implementing the interface.

The base class virtual function will have the same signature as the interface.

namespace System.Data.Common
{
    public abstract class DbDataReader : ... 
    {
        ...
        public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
        {
            if(this is IDbColumnSchemaGenerator)
            {
                return ((IDBColumnSchemaGenerator)this).GetColumnSchema();
            }
            throw new NotImplementedException();
        }
    }

}

The vNext of .Net core will have a changed implementation of the Extension method where the extension method will call the base class function to retrieve the schema. This will allow consumer DLL compiled on .Net core v1.0 to be able to run on vNext without any changes.

Any code compiled on .Net Core vNext will compile against the base class function as the member methods get precedence over the extensino method.

Examples

Consumers

A consumer written on .Net Core on v1.0

In this case, the code will be compiled with the Extension method.

using System.Data.Common;

namespace org.example.my
{
        ...
            public void ProcessMetadata(DbDataReader reader)
            {
                ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
            }
        ...
}

In vNext of .Net core

If the consumer is recompiled, the code will be compiled with the base class DbDataReader which has the virtual method added to it. The base class virtual method takes precedence

using System.Data.Common;

namespace org.example.my
{
        ...
            public void ProcessMetadata(DbDataReader reader)
            {
                ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
            }
        ...
}

Producers

The producers in the .Net Core v1.0 don't have to implement the interface IDbColumnSchemaGenerator in v1.0

In vNext, they can continue to implement the interface and not change any code. However, they can choose to override the base class method to provide the Column Schema as well. Either ways the consumer will not need to change their code to consume the Schema

For new provider implementation, it is however recommended that they override the base class virtual method in DbDataReader for Columns Metadata.

@NickCraver
Copy link
Member Author

@saurabh500 Thanks for writing all this up. Just a few minor improvements I see:
I think the .Net desktop version is missing the conversion bits from the proposal, specifically this example both now (extension) and vNext (virtual):

namespace System.Data.Common
{
    public abstract class DbDataReader : ... 
    {
        public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
        {
            throw new NotImplementedException();
        }
    }
}

I think it'd be better to put a comment (rather than the throw) about converting the existing GetSchemaTable() to ReadOnlyCollection<DbColumn> in these examples to be more accurate and convey the providers-need-to-do-nothing to get the benefits of today's interfaces, but they can to add more detail later if they choose to override. It's correct in this section of the notes, but the code doesn't align:

The extension method will have a different implementation in the partial facade to get the Schema. The implementation will use the DbDataReader.GetSchemaTable() api which returns the schema as a Datatable. This Datatable will be mapped to a ReadOnlyCollection and returned by the implementation of the extension in the partial facade.

One other item: I'd use something along the lines of IDbDataReaderSchemaGenerator for the interface name, I think IDbColumnSchemaGenerator is too generic/confusing when factoring in the entire-database schema getter that may have a similar interface approach.

@saurabh500
Copy link
Contributor

@NickCraver
For the first part. Thanks for pointing it out. I will make changes to align the code with the description using comments.

The second part about the name of the interface
IDbColumnSchemaGenerator: I chose the name because I wanted to convey that the implementing class is a Column Schema Generator and a ProviderDbReader can implement this class to become a Column Schema Generator.

In case another class is internally created by the provider to generate Column Schema, this interface can be used by them. E.g. If there is an class or technique which generates the schema for the database this interface can be used to populate the columns using a different implementation than the DbDataReader.

This was the thought process behind the name and mapping it to DbColumn Schema rather than the interface.

IDbDataReaderSchemaGenerator gives the sense that the object should be a reader and can apply to this usecase.
I did think about calling the interface IDbDataReaderSchemaGenerator but chose IDbColumnSchemaGenerator so that the concept of DbColumn schema generation can be decoupled from a reader and can be used in a more generic manner.

Thoughts?

@terrajobst
Copy link
Contributor

@saurabh500

Thanks for writing this up. In order to keep things organized I propose that:

  • We close this issue.
  • @saurabh500 creates a new issue and copy as pastes the above write up in there.

As refinements are made, we update the description. The reason I say this is because finding the latest version of a proposal is otherwise a daunting task.

@saurabh500
Copy link
Contributor

@terrajobst I have opened issue dotnet/corefx#5915 for API review
That would help track feedback for the API.

@NickCraver Can you close this issue? We can follow this up on dotnet/corefx#5915

@terrajobst
Copy link
Contributor

Thanks!

@werddomain
Copy link

werddomain commented May 29, 2016

where i can download this api?
Im translating my mcv 5 app and this block me, i cant finish the conversion :(

    `public static List<string> GetColumnList(DbDataReader reader)
    {

        List<string> columnList = new List<string>();

        DataTable readerSchema = reader.GetSchemaTable();

        for (int i = 0; i < readerSchema.Rows.Count; i++)

            columnList.Add(readerSchema.Rows[i]["ColumnName"].ToString());

        return columnList;

    }`

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests