-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API for retrieving Columns Schema from DbDataReader #15269
Comments
Additionally: DbConnection.GetSchema(). This too relies currently on DataTable and should get a proper alternative in .NET core going forward. There are no alternatives for some schemas retrievable through this method, for some providers. If there's no alternative in .NET core, there's no way to write generic code to obtain metadata about a schema aspect. E.g. on Firebird, some metadata is very hard to retrieve from the schema (read: very complex metadata queries) and the schemas provided by this method are very helpful. I think the solution chosen for DbDataReader.GetSchemaTable should also be applied to DbConnection.GetSchema as it too simply has to return a datatype with tabular data which is only known at runtime. |
Responding to @YoungGah's note above:
Given that JSON serialization is not included in core, wouldn't it be a bit odd to have libraries implementing DataReader taking dependencies to support it? I wouldn't need JSON serialization in several apps aside from this. I'd love to hear what others hear, but from the SQL side, this option isn't very appealing. That being said, I certainly see why it's being discussed for other providers. With tabular data it can usually be expressed by positional or named arguments (e.g. a dictionary, list of an interface, etc.). I think that's a better approach for SQL, but I also recognize how that doesn't really work for a document type data store. |
Am the maintainer of Npgsql, this is definitely a very important discussion, and it seems important to have this for .NET Core RTM. Here are some thoughts:
|
JSON is a silly format in this context, it's an untyped text blob that needs deserializing using an external library. So -1 for JSON. The main issue is besides the external library needed that it's weakly typed: a piece of code calling GetDbSchema has to be sure the data it receives back is of the same type and the elements embedded inside it is stored in the same types. Always. With JSON that's up in the air, hence: bad choice. |
I'm not sure about the idea of json for this. What would be the advantage over a simple object that can be easily serialized if needed? This is supposed to be used in the context of .NET calls, there is hardly a need for this information to be that interoperable to require a json blob. On https://github.com/dotnet/corefx/issues/1039#issuecomment-142271673 I proposed a very a simple interface that has the basic information that have been proven to be needed, and a dictionary for additional data. interface IColumnSchema {
int ColumnOrder;
string ColumnName;
string ProviderDataType;
Type ClrMappedType;
int MaxLength;
bool CanBeNull;
IReadOnlyDictionary<string, object> CustomProviderData;
}
interface IReaderSchema()
{
IReadOnlyList<IColumnSchema> Columns { get; }
} This is supposed to be used as: var command = CreateDbCommand("select 1 from bob");
var reader = command.ExecuteReader();
var schema = reader.GetResultSchema(); // returns IReaderSchema
foreach (var c in schema.Columns) {
....
} I'm not proposing this exact api, but something along the lines of this idea, with interfaces describing this data instead of the datatable. @YoungGah and others, what do you think? |
Thanks for all your feedbacks and JSON is out, then. We will get back to you with another proposal. |
In case it's useful, I did an EF feature spec on a new Schema Information API a few years ago. We never got around to implementing it, and it was definitely skewed toward EF6, but it may be interesting to look at in the context of this issue. |
@bricelam Thanks for the spec! We will look into it. |
@bricelam, @YoungGah, this Schema Information definitely seems like a step in the right direction! Some immediate points that come to mind:
But I'm guessing this is just a starting point, I'll be glad to provide more feedback on an up-to-date proposal. |
Note the current effort in EF7 to define a relational model produced by a reverse engineering provider. Ideally this could be the same model returned by an ADO.NET provider. |
cc @jcyang75 |
For the schema information return type we are considering adding a List of Dictionaries. This data structure can be used for the replacement of the DbConnection.GetSchema() API as well. For the DataReader API we are considering the following names:
Suggestions, votes? |
@saurabh500 It's a bit hard to picture this in the abstract, can you give us an example of what this would look like? Let's say if it were serialized as JSON for the purposes of discussion. Thoughts without the layout example for discussion: I would want constants for the known things such as size, type, etc. (as opposed to magical matching strings on every consumer's side), but the lack of strong typing by the nature of the API is a pretty significant downside. We're talking about shifting to runtime breaks rather than compile time breaks on an API that differs between CLR vs. CoreCLR. Every consumer would be casting object to the While |
I second @NickCraver here, if you're doing a new schema API, there's no reason whatsoever not to have it typed. Why not create classes such as DbSchemaTable, DbSchemaColumn, DbSchemaFunction and include simple strongly-typed properties exposing relevant information (e.g. Precision, Scale)? Specific providers could extend these (e.g. SqlSchemaColumn) to add provider-specific properties, as is already done with other areas of the ADO.NET API... |
Also, as I mentioned above, note the EF7 reverse-engineering effort currently underway, which does exactly this: represent a database schema in a strongly-typed way. Any chance the ADO.NET and EF7 teams could get together and see if a single API could be used here? |
Agree with @roji's example and @NickCraver 's points. An untyped Dictionary bucket is useless. I also wonder how much time has been put into this on MS' side. We're debating this since May this year and the most naive approach is to use an untyped Dictionary and MS comes up after all this time with exactly that. Not a good show, people. |
@FransBouma, @roji, @NickCraver Thanks for the feedback. While proposing the loosely typed Dictionary there were 3 considerations:
The DataReader's GetSchemaTable API provided the same Column Names every time it is used. If the DbConnection.GetSchema is called, then the Columns returned are CollectionName, NumberOfRestrictions, NumberOfIdentifierParts The column names are different based on different contexts in which the Schema is being retrieved. In the interest of keeping the Data Structure same for DbConnection and DataReader, I recommended an untyped Dictionary with relevant information present in it. @NickCraver following is a close JSON representation of what I was recommending:
Having explained this, the consensus is definitely towards having a strongly typed API Requests / Questions This issue was opened for DataReader. @FransBouma has mentioned the DbConnection.GetSchema(...) API. The schema returned by the DataReader is the Column schema however the DbConnection covers a much wider range of schema at different levels. |
@saurabh500, I understand the motivation for weakly-typed results you explain above. If a single API method (i.e. DbConnection.GetSchema()) needs to return different results based on the query its given, then there's indeed a problem with strongly-typed results. I think the answer is to replace the single, multi-result GetSchema() with multiple methods, each of which returns a single result type. In other words, instead of having one method returning different things based on its input (the query), have several methods which each return only one thing. So you would have GetDatabaseSchema, GetTableSchema, and so on. Note the JDBC approach of retrieving a DatabaseMetadata object from the connection, which you can then query with specific methods; this has the advantage of concentrating all the schema-related methods in a single place that's separate from DbConnection (however, note that in JDBC these methods return a weakly-typed ResultSet which isn't good). Regarding an overlap between DbConnection.GetSchema() and DbDataReader.GetSchema()... IMHO, an overlap isn't necessary related to making the API strongly-typed or not - even if there isn't any overlap anywhere a strongly-typed API is still preferable to a weakly-typed one. I think still there's some potential for overlap, e.g. with DbSchemaColumn with may be returned from both. Finally, the JSON representation seems good in principle (assuming it gets translated to a strongly-typed API). However, please consider a clean distinction between SqlServer properties and general properties which make sense for all relational database. So |
I am all for: The challenge being of course to agree on what belongs in the "general" model |
I don't really see how they should be related. DbConnection.GetSchema returns a different schema based on input and ADO.NET provider: not every provider provides the same schemas and not every provider provides the same layout in the returned schemas. The method is currently not that useful if you don't know what provider you're dealing with.
Not true. Perhaps on SQL Server, but ODP.NET doesn't return that information, as it doesn't even have the concept of a catalog.
and ADO.NET provider. Please look over the fence what other databases there are. It's useless to create a 'generic' API which is actually designed for SQL Server.
You're limiting both with this not-needed restriction. DbDataReader.GetSchemaTable is used to work with the resultset. It's always the same, for every ADO.NET provider, so you can write generic code to deal with it. DbConnection.GetSchema() differs per ADO.NET provider and also per input. It can't be used to write generic code as it's specific per ADO.NET provider. Even different ADO.NET providers for the same DB type (e.g. MS' own Oracle ADO.NET provider and ODP.NET from Oracle itself) return different schemas and information per schema. Any API exposed which hasn't been designed with at least the major databases in mind is a lost cause and a missed chance. This is a generic API so it has to be usable across all major databases. That means that a concept like a strongly typed UDT with schema info has to be available, it shouldn't rely on catalogs (or schemas (as a concept of container for database elements, like 'dbo' on sql server), as MySQL doesn't support schemas but does support catalogs etc.) alone, to name an example. And again, there are 2 different schemas here: one being about meta-data of the schema in the RDBMS itself (through DbConnection.GetSchema()) and one being about the meta-data of the active resultset (through DbDataReader.GetSchemaTable()). They are different, as they provide different information and the former is flexible and can return anything, the latter is fixed as it returns always the same thing, and has to be predictable so generic code can be written. Ideally it would be best if DbConnection.GetSchema provided a fixed set of schemas to query and each resultset for a schema requested has a pre-defined layout. If that layout is left to the ADO.NET provider, we'll get what we have today, which is a poorly designed, unusable mess (the DbConnection.GetSchema method that is), as some ADO.NET providers provide not enough info or totally different info than others. E.g. it's not possible today to use DbConnection.GetSchema to obtain meta-data for a database schema to do reverse engineering of schema to object models on any major RDBMS. But, I'm wondering, why are we even having this debate when there has been at least one Microsoft employee years ago who was clever enough to include GetSchemaTable() in the API in the first place. The reasoning behind it then might be a good start, instead of doing all that work again and worse: making all the same mistakes. |
@FransBouma just one small comment... Although database schema concepts vary across databases (catalogs, schemas), there's still a lot of common ground to capture in a generic API. Concepts such as tables, columns, indices/restrictions etc. seem to be pretty universal. As an example, Npgsql implements the current DataTable-based DbConnection.GetSchema() in a way that closely resembles SqlClient, allowing generic code to use that method to some extent. Another good example is EF7's reverse engineering, which models schemas in a database-independent way; they're doing a pretty good job at it and I actually think it's pretty much a duplication of what this issue is about (see dotnet/efcore#4091). So I really think both APIs being discussed here (both the DbConnection one and the DbDataReader one) should strive to be generic. |
@roji I've written (many years ago) a general purpose system for reverse engineering metadata to entity models for LLBLGen Pro and it's indeed very doable to capture a lot of the common elements across DB types in a general purpose class library. Most databases indeed contain common concepts like a table, a procedure/function, a type, table field, perhaps even views etc. That's the easy part ;). With this, the devil is in the details. Like I described earlier: UDTs which both have a .NET type and also an underlying DB specific type (catalog/schema/elementname) on sqlserver, but not on oracle. A value in a resultset which is a BFILE type and should be read using a reader, preferably a generic one so generic code can deal with that value as more DBs support a BFILE like type (but not all of them). Just a few things that come to mind. As I know what's needed to reverse engineer relational models to entity models, I also know it's not a 100% sufficient model for all cases: not all meta-data is needed for reverse engineering, and it can very well drive decisions in the datastructures: why bother with meta-data like packages which contain overloaded stored procedure definitions in Oracle if they're not needed for reverse engineering relational model data to entity definitions? :) So the scope is (much) wider than that. It's a good start, for sure. It's not a solution that can be used to implement what's currently lacking (as more is needed). I also want to stress that schema meta-data info is nice for DbConnection.GetSchema, but for the topic here, DbDataReader.GetSchemaTable(), the actual type info is needed per column, for the active resultset, which is IMHO a total different thing: for GetSchemaTable you don't need table/view/proc metadata, but type information, and additionally e.g. to read a BFILE value if a column is of type BFILE, in a generic manner. |
I support @FransBouma's sentiment here on them really being separate APIs. The @saurabh500 To answer your question:
Let be get the ball rolling here (and iterate), here's what I would want to carry over these items from GetSchemaTable() today: These would be on a base type, something like
These would move into a SQL Server inherited type, like
...and I have no strong thoughts on In addition to these strongly typed properties, there should be a What other base properties would others like to see added? Disclaimer: Dapper already takes a reference to SQL Server packages for other reasons, so our thoughts on what is and isn't in the Thoughts? |
@NickCraver needs |
precision/scale needs to be in base (all db's support it) and udt type too (as some DBs support it). The thing is that if one needs to cast to an ADO.NET provider specific column type (e.g. SqlColumn) it has to be because there's no other DB supporting that feature. I think the set of these features is very small, and if even 2 databases support a given feature it should be in the base class. it otherwise makes things very cumbersome to use in generic code: the current DataTable based result is usable in generic code, no matter what DB is used, and I think that is a use-case that has to be met with the new system as well. |
@FransBouma accessing in a datatable by name shouldn't be much different to accessing via the indexer Nick mentioned - especially if it simply returns |
@roji In theory I agree; there's a slight gotcha there that the matching column |
@NickCraver Well, are we talking only about ORMs or ado.net in general? The advantage of dynamic approach (or hybrid with dynamic/properties and indexer) has the advantage of strong typed/intellisense approach when needed. I can imagine supporting a single database and then strong typing comes handy. |
@mgravell Or you can still check if a property exists through reflection. I agree it is not ideal. |
@NickCraver Well, I am that expert 😄 - and they are two distinct database engines with separate ado.net providers. |
Thanks for the inputs folks. We will incorporate the mentioned list attributes summarized by @NickCraver to model the Data structure for this. Thanks for the inputs. To put this in perspective of the DbConnection.GetSchema(), DbColumn represents the schema details of a column. We are considering creating base Types for the different schema objects and creating relationships between these types to represent the Database schema. E.g. DatabaseSchema represents the Database Each DatabaseSchema contains a list of DbTable, DatabaseName (and other properties) There are more concepts that are needed for the DbConnection.GetSchema() This thread is focused around the DataReader.GetSchemaTable() I will open another issue to discuss the DbConnection.GetSchema() |
I have added dotnet/corefx#5024 for the DbConnection.GetSchema discussion |
@saurabh500 where can we see progress on the API scheme discussed here? Given the major problem has been communication and RTM rapidly approaching, I know that at least for myself, C# is a lot clearer than any words here. To be honest, I'm not even sure what this means (especially given ongoing discussion about what should be in the base class):
...can we track actual dev or API review progress somewhere? I know several of us would be happy to help via PRs, etc. |
@NickCraver I will put the pull request or the dev branch link here to track progress of the code. |
@saurabh500 any update? |
1 similar comment
@saurabh500 any update? |
@NickCraver I have added the DbColumn to the System.Data.Common. I have exposed a few fields in SqlClient to implement this API. |
@saurabh500 thanks - we look forward it! |
Linking initial PR discussion here: dotnet/corefx#5609 |
We discussed this today. We believe the current proposal isn't ready yet. @saurabh500: once you updated the proposal, please edit the description of the issue and copy & paste it there. Also, remove the |
@terrajobst PTAL |
API for GetColumnSchemaProblemDbDataReader on .Net Framework, provides an API called ProgressThe data structure DbColumn along with the necessary attributes which should be present namespace System.Data.Common
{
// DbColumn contains the base attributes that are common to most Database columns. An indexer
// has been provided in case a property added by a provider needs to be accessed without a dependency on the provider implementation.
public class DbColumn
{
public virtual bool AllowDBNull { get; set; }
public virtual string BaseCatalogName { get; set; }
public virtual string BaseColumnName { get; set; }
public virtual string BaseSchemaName { get; set; }
public virtual string BaseServerName { get; set; }
public virtual string BaseTableName { get; set; }
public virtual string ColumnName { get; set; }
public virtual int ColumnOrdinal { get; set; }
public virtual int ColumnSize { get; set; }
public virtual bool IsAliased { get; set; }
public virtual bool IsAutoIncrement { get; set; }
public virtual bool IsExpression { get; set; }
public virtual bool IsHidden { get; set; }
public virtual bool IsIdentity { get; set; }
public virtual bool IsKey { get; set; }
public virtual bool IsLong { get; set; }
public virtual bool IsReadOnly { get; set; }
public virtual bool IsUnique { get; set; }
public virtual int NumericPrecision { get; set; }
public virtual int NumericScale { get; set; }
public virtual string UdtAssemblyQualifiedName { get; set; }
public virtual Type DataType { get; set; }
public virtual string DataTypeName { get; set; }
public virtual object this[string property] {}
}
} The next part of the API is to expose a ReadOnlyCollection from the DbDataReader The right way of exposing the API would be to add an API in the DbDataReader class called This is however not possible in .Net Core current version as it would hinder portability of apps created on .Net core with .Net framework. Based on the contents of the link above, the following rule is allowed The restriction is due to type forwarding, which needs the calls to DbDataReader to be forwarded to the .Net framework's DbDataReader. ProposalWe introduce an interface called IDbColumnSchemaGenerator, in .Net core which can be implemented by a type which can provide the Column Metadata. namespace System.Data.Common
{
public interface IDbColumnSchemaGenerator
{
System.Collections.ObjectModel.ReadOnlyCollection<DbColumn> GetColumnSchema();
}
} The subclasses of DbDataReader in the ADO.Net providers implement this interface. public class SqlDataReader : DbDataReader, IDbColumnSchemaGenerator
{
public ReadOnlyCollection<DbColumn> GetColumnSchema(){
return ... ;
}
} In .Net core, to allow the Column Metadata to be exposed from the DbDataReader, an extension to the DbDataReader will be provided. This extension has the same signature as the interface method. using System.Collections.ObjectModel;
namespace System.Data.Common
{
public static class DbDataReaderExtension
{
public static System.Collections.ObjectModel.ReadOnlyCollection<DbColumn> GetColumnSchema(this DbDataReader reader)
{
if(reader is IDbColumnSchemaGenerator)
{
return ((IDbColumnSchemaGenerator)reader).GetColumnSchema();
}
throw new NotImplementedException());
}
}
} It is recommended that the extension be used by consumers, for compatibility with .Net Framework. The details are mentioned in the compatibility section below. ProducerThe ADO.Net providers are the producers of the Column Schema. The interface will be implemented by the sub-classes of E.g. class SqlDataReader : DbDataReader, IDbColumnSchemaGenerator
{
...
public ReadOnlyCollection<DbColumn> GetDbColumnSchema()
{
ReadOnlyCollection<DbColumn> metadata = ... ;
// Build the list of columns metadata. Each columns metadata is represented by the DbColumn
return metadata; //
}
...
} ConsumerThe consumers can retrieve the column metadata using an extension method on the DbDataReader. This extension will be provided in System.Data.Common For a library consuming the using System.Data.Common;
namespace org.example.my
{
...
public void ProcessMetadata(DbDataReader reader)
{
ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
}
...
} Compatibility of libraries written on .Net Core with .Net FrameworkThe extension method will have a different implementation in the partial facade to get the Schema. The implementation will use the DbDataReader.GetSchemaTable() api which returns the schema as a Datatable. vNextChange in .Net Framework (Desktop)In the next version of .Net Framework we add a virtual method on the DbDataReader to allow access to schema using namespace System.Data.Common
{
public abstract class DbDataReader : ...
{
...
public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
{
// Map Datatable to the DbColumn as default implementation
// Providers can override the implementation for their requirement.
// The implementation will be similar to the implementation in
// partial facade implementation for .Net core v1.0
}
}
} .Net Core: vNextFor a vNext of .Net core which targets .Net Framework with the API added to it, we can introduce the same API on .Net core with a default implementation in the base class The base class virtual function will have the same signature as the interface. namespace System.Data.Common
{
public abstract class DbDataReader : ...
{
...
public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
{
if(this is IDbColumnSchemaGenerator)
{
return ((IDBColumnSchemaGenerator)this).GetColumnSchema();
}
throw new NotImplementedException();
}
}
} The vNext of .Net core will have a changed implementation of the Extension method where the extension method will call the base class function to retrieve the schema. This will allow consumer DLL compiled on .Net core v1.0 to be able to run on vNext without any changes. Any code compiled on .Net Core vNext will compile against the base class function as the member methods get precedence over the extensino method. ExamplesConsumersA consumer written on .Net Core on v1.0 In this case, the code will be compiled with the Extension method. using System.Data.Common;
namespace org.example.my
{
...
public void ProcessMetadata(DbDataReader reader)
{
ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
}
...
} In vNext of .Net core If the consumer is recompiled, the code will be compiled with the base class DbDataReader which has the virtual method added to it. The base class virtual method takes precedence using System.Data.Common;
namespace org.example.my
{
...
public void ProcessMetadata(DbDataReader reader)
{
ICollection<DbColumn> columnMetadata = reader.GetColumnSchema();
}
...
} ProducersThe producers in the .Net Core v1.0 don't have to implement the interface IDbColumnSchemaGenerator in v1.0 In vNext, they can continue to implement the interface and not change any code. However, they can choose to override the base class method to provide the Column Schema as well. Either ways the consumer will not need to change their code to consume the Schema For new provider implementation, it is however recommended that they |
@saurabh500 Thanks for writing all this up. Just a few minor improvements I see: namespace System.Data.Common
{
public abstract class DbDataReader : ...
{
public virtual ReadOnlyCollection<DbColumn> GetColumnSchema()
{
throw new NotImplementedException();
}
}
} I think it'd be better to put a comment (rather than the
One other item: I'd use something along the lines of |
@NickCraver The second part about the name of the interface In case another class is internally created by the provider to generate Column Schema, this interface can be used by them. E.g. If there is an class or technique which generates the schema for the database this interface can be used to populate the columns using a different implementation than the DbDataReader. This was the thought process behind the name and mapping it to DbColumn Schema rather than the interface.
Thoughts? |
Thanks for writing this up. In order to keep things organized I propose that:
As refinements are made, we update the description. The reason I say this is because finding the latest version of a proposal is otherwise a daunting task. |
@terrajobst I have opened issue dotnet/corefx#5915 for API review @NickCraver Can you close this issue? We can follow this up on dotnet/corefx#5915 |
Thanks! |
where i can download this api?
|
This is to unwrap the specific issue of resultset schemas from the many involved in #14302.
Back on May 5th @davkean noted:
While it doesn't seem that anyone cares what method exists for of getting the schema of a resultset, we seem to agree on needing a method of doing it in .Net Core. For example, if a user runs a simple query such as:
Then there's no way to see what the types are. While getting the .Net type (via
.GetFieldType()
, e.g.DateTime
) is possible and getting part the SQL type (via.GetDataTypeName()
, e.g.NVARCHAR
) is possible, other vital attributes of the schema shape currently aren't possible. For example there's no way to distinguish avarchar(20)
from avarchar(max)
. There's also no way to tell if the column was nullable. Previously, this functionality was provided by.GetSchemaTable()
which returned aDataTable
class.Currently I have a very simple use case: query data here, create a table with that data elsewhere. The query can be something like:
SELECT * FROM sys.database_files
. I point this out to illustrate thatINFORMATION_SCHEMA
(as proposed by some in #14302 as an alternative) is not a viable option. The same issue presents on any stored procedure which can also return any result set shape depending on the code paths. We need a way to see the scheme of a result set, these database-query alternatives are both expensive and non-covering.This is a hard blocker for projects of myself and others. I have uses in both one-off projects as well as Opserver (which monitors SQL) and Dapper.Net. I know from #14302 that I'm not alone on this block. The portability of .Net core is extremely appealing for certain projects, especially those where we need to monitor the innards of SQL without depending on a .Net install (many if not most existing SQL servers only have .Net 3.5). Unfortunately though, this one missing piece of the puzzle may prevent using it altogether, forcing full CLR installs.
While SqlClient specifically serves my blocking use case, it's important to point out as @FransBouma notes in the other issue: this needs to be in the base API, not specifically up in SqlClient. For example, I also need this for Dapper features and it works across many providers - it is not at all SQL Server specific.
So how about it, can we please get an API for result set schemas added back? I'm happy to contribute to designing and provide feedback on a new API here if that's the best route. I know many others are as well.
The text was updated successfully, but these errors were encountered: