.NET Forum / Languages / C# / August 2006
Object oriented method is inefficient with database calling
|
|
Thread rating:  |
TS - 10 Aug 2006 21:58 GMT Say i have a class car with properties: Color, Make, Model, Year, DriverID And a Driver class with properties: DriverID, Name
The driverID PRIVATE property is the id of the driver from say a driver table (t_driver). This has a PUBLIC property accessor called Driver
My understanding of OO using the composition model is that when you want to load up a car class, you would access the DB to get Color, make, Model, Year, DriverID and load the Car class. When the PUBLIC property Driver is accessed, it would use the PRIVATE property DriverID to call the DB and load the Driver class.
In this example, 2 separate DB calls would be made to load the Car and Driver classes. I have DB/network people at work saying that these calls should be done at once to cut down on network traffic and DB calls.
Any comments as to why my way is not OO sound or firepower I could use to tell to my peers?
Thanks so much!
Jon Skeet [C# MVP] - 10 Aug 2006 22:55 GMT > Say i have a class car with properties: Color, Make, Model, Year, DriverID > And a Driver class with properties: DriverID, Name [quoted text clipped - 14 lines] > Any comments as to why my way is not OO sound or firepower I could use to > tell to my peers? Depending on the ORM system involved, you can often eagerly fetch things - issue a query which will load the car and the driver at the same time using a join.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Linda Liu [MSFT] - 11 Aug 2006 03:48 GMT Hi,
I think you should take different strategies depending on the amount of records a database query to the table t_car returns.
If a database query to the table t_car returns a great deal of records a time, I think you'd better load a Car object for each record and leave the Driver property in the Car object aside. As you said, when the public property Driver is accessed, use the private property DriverID to access the database and load a Driver object.
In the above scenario, if you load the Driver object when you load the Car object, the network traffic would be very heavy because there're a great deal of Car objects after the query and you are going to load a Driver object for each Car object.
I don't think you should use a join to query the database in this instance either, because the join operation will consume a lot of resources as well.
The advantage of loading a Driver object when the public property Driver is accessed is that this avoids the amount of querying database being too much a time.
On the other hand, if you query the table t_car and get one record a time, you could load a Car object for the record and then access the table t_driver by the value of DriverID field in the Car object and load a Driver object. This won't cause network traffic being heavy.
Hope this helps. If you have anything unclear, please feel free to let me know.
Sincerely, Linda Liu Microsoft Online Community Support
================================================== Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif ications. Note: The MSDN Managed Newsgroup support offering is for non-urgent issues where an initial response from the community or a Microsoft Support Engineer within 1 business day is acceptable. Please note that each follow up response may take approximately 2 business days as the support professional working with you may need further investigation to reach the most efficient resolution. The offering is not appropriate for situations that require urgent, real-time or phone-based interactions or complex project analysis and dump analysis issues. Issues of this nature are best handled working with a dedicated Microsoft Support Engineer by contacting Microsoft Customer Support Services (CSS) at http://msdn.microsoft.com/subscriptions/support/default.aspx. ================================================== This posting is provided "AS IS" with no warranties, and confers no rights.
Jon Skeet [C# MVP] - 11 Aug 2006 06:22 GMT > I think you should take different strategies depending on the amount of > records a database query to the table t_car returns. [quoted text clipped - 12 lines] > I don't think you should use a join to query the database in this instance > either, because the join operation will consume a lot of resources as well. You think it's better to do 501 queries to retrieve 500 cars rather than 1 which has a join? I have to disagree. If you're pretty sure you're going to need to use the driver, then joining the two tables is the logical approach IMO.
> The advantage of loading a Driver object when the public property Driver is > accessed is that this avoids the amount of querying database being too much [quoted text clipped - 4 lines] > t_driver by the value of DriverID field in the Car object and load a Driver > object. This won't cause network traffic being heavy. It'll cause far more queries though. I would have thought that would actually mean heavier network traffic than fewer queries returning the same total amount of data in effectively bigger chunks.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Frans Bouma [C# MVP] - 11 Aug 2006 10:34 GMT > > I think you should take different strategies depending on the > > amount of records a database query to the table t_car returns. [quoted text clipped - 19 lines] > you're going to need to use the driver, then joining the two tables > is the logical approach IMO. Joins aren't the solution. It might be in a simple 2 node graph with a m:1 fetch (this particular situation) but once you have multiple graph paths, it's not the way to go as a join will make it impossible to formulate several common scenario's of prefetch paths in 1 query.
What's faster are 2 queries: one for the cars and one for the drivers. You then use a small trick. - if the # of master rows (cars in this case) is below a given threshold, do: select ... from drivers where driverid in (@D1, @D2, @D3.... @Dn) where @Dx is a driverid from the cars. - if the # of master rows is equal or above a given threshold, do: select ... from drivers where driverid in (select driverid from cars where <filter on cars>)
and you merge them on the client with hashvalues, which is pretty simple and straightforward and fast.
This gives the least amount of roundtrips and overhead and the queries always succeed, no matter how complex the graph of paths is, so you can create generic code to produce these queries.
Joins seem a logical choice, eventually with UNIONs, though they aren't. Also, when pulling master-detail data in a 1:n scenario from the db, it's often more efficient to use a subquery, due to the duplicates on the master-side it will give. (one of the serious performance issues in DLinq, if they fail to implement a subquery directive hint)
> > The advantage of loading a Driver object when the public property > > Driver is accessed is that this avoids the amount of querying [quoted text clipped - 9 lines] > actually mean heavier network traffic than fewer queries returning > the same total amount of data in effectively bigger chunks. I agree, lazy loading on the car object to pull the driver from the db is not the way to go if you want to load ALL drivers associated with a loaded set of cars.
Lazy loading is only useful if you want to pull related data of a small subset of entities on an occasional basis from the db, not in graph-oriented fetches.
Frans
 Signature ------------------------------------------------------------------------ Lead developer of LLBLGen Pro, the productive O/R mapper for .NET LLBLGen Pro website: http://www.llblgen.com My .NET blog: http://weblogs.asp.net/fbouma Microsoft MVP (C#) ------------------------------------------------------------------------
Jon Skeet [C# MVP] - 11 Aug 2006 17:55 GMT > > You think it's better to do 501 queries to retrieve 500 cars rather > > than 1 which has a join? I have to disagree. If you're pretty sure [quoted text clipped - 5 lines] > paths, it's not the way to go as a join will make it impossible to > formulate several common scenario's of prefetch paths in 1 query. Absolutely - there are certainly situations where it doesn't work. However, where it *does* work I don't see why it's not an appropriate solution - and as you've said, the situation specified in the question will work.
Of course, it helps to have an ORM solution which allows you to specify this kind of thing.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Bruce Wood - 11 Aug 2006 06:55 GMT > Say i have a class car with properties: Color, Make, Model, Year, DriverID > And a Driver class with properties: DriverID, Name [quoted text clipped - 11 lines] > Driver classes. I have DB/network people at work saying that these calls > should be done at once to cut down on network traffic and DB calls. We use exactly this system where I work, but we have some tricks to cut down on the number of queries.
1. If we know that the number of records will always be small (say, a few hundred at most) then we design the data O-O layer to fetch the entire table when the first item is requested for that table. So, if you know that you're only ever going to have a hundred or so Drivers, then the first request for a driver via a DriverId fetches all drivers and caches them. This transfers more data, but the query is fast (no conditions) and there is one query and then that's it.
2. We have collection fetches: "get me the drivers for all of these cars" that result in a single query. For example:
CarCollection cars = new CarCollection(); ... add five cars to the collection ... Drivers driversForCars = cars.GetDrivers();
results in a query like this:
SELECT * FROM DRIVERS WHERE DRIVER_ID IN ( 15, 24, 17, 2, 63 );
Again, one query gets you the drivers for all cars in the collection. Compare this with:
foreach (Car c in cars) { Driver d = c.Driver; ... }
Here, we either do this:
Drivers driversForCars = cars.GetDrivers(); foreach (Driver d in driversForCars) { ... do something ... }
or this:
cars.GetDrivers(); foreach (Car c in cars) { ... do something with c.Driver ... }
The latter, of course, assumes caching: once you have fetched a car's driver you don't fetch it again. Caching may or may not be viable depending upon whether multiple users need to be able to see each others' changes in a timely manner.
Frans Bouma [C# MVP] - 11 Aug 2006 10:46 GMT > Say i have a class car with properties: Color, Make, Model, Year, > DriverID And a Driver class with properties: DriverID, Name [quoted text clipped - 16 lines] > Any comments as to why my way is not OO sound or firepower I could > use to tell to my peers? 'why my way isn't OO' is a question for a discussion between pundits ;), why or why isn't something OO and if it's not, it therefore must be bad is something not useful to waste time on. So let's drop that.
the thing you're worrying about is that because you're targeting the problem with an OO approach will result in a lot of queries. This in general is the case, IF you use the easy-way-out method where you simply fetch a related driver when you're asked to do that from a car instance.
Though you can also use a little more advanced approach: define fetch paths for prefetching, or in short prefetch paths. These are paths along which related entities have to be fetched, eventually with filters.
So if you want to fetch a set of car entities, and their associated driver entities, you effectively define a path: Car - Driver. This should effectively result in 2 queries: one for Car and one for Driver, which resultsets are then merged on the client. There are some tricks you can use to do this even more efficiently, please see my other post in this thread.
Frans
 Signature ------------------------------------------------------------------------ Lead developer of LLBLGen Pro, the productive O/R mapper for .NET LLBLGen Pro website: http://www.llblgen.com My .NET blog: http://weblogs.asp.net/fbouma Microsoft MVP (C#) ------------------------------------------------------------------------
Bruce Wood - 11 Aug 2006 18:02 GMT > the thing you're worrying about is that because you're targeting the > problem with an OO approach will result in a lot of queries. This in > general is the case, IF you use the easy-way-out method where you > simply fetch a related driver when you're asked to do that from a car > instance. Note that this isn't an O-O problem at all. Once you add in tricks like caching and prefetching, you have to start wrestling with problems of data going stale: "If I prefetch / cache, how do I keep up with changes in the database? How do I make sure that the data within my application is current with the data in the database?"
Depending upon your application, the answer may range from "I don't care. I don't need to keep current with changes to the DB," to "I have to fetch every time because I need to be as current as possible with the DB." At this stage, the questions and problems are the same whether you're writing an O-O application or a 3GL application.
The difference I would submit, is that O-O requires more careful design in this regard, as it's harder to immediately understand what operations will result in database fetches. If you design your objects (as I did mine) so that a property reference:
Driver d = car.Driver;
may result in a database fetch, it's very difficult to know, looking at the code, where trips back to the database may occur. This doesn't make the O-O approach somehow inferior. All it means is that you have to take more care in architecting your solution. We're using a horribly slow ODBC connection, and we've managed to optimize our trips back to the DB. It can be done.
One change I would recommend is this: don't use a property, use a method:
Driver d = car.GetDriver();
I suggest this because in several cases we discovered that we wanted to be able to regulate the amount of information coming back for an object. In the case of a Driver, you may want merely a minimal object that stands in for a driver and knows the driver ID, but contains no additional information. Or, you might want everything about the driver. The first requires no trip to the database, while the second does. Wouldn't it be ugly if your architecture required client code to force a trip to the database when all it wanted was a marker representing a driver? So, we did something like this:
Driver d = car.GetDriver(DriverPart.Minimal);
or to get a full driver:
Driver d = car.GetDriver(DriverPart.All);
We have other objects that allow fetching of various subsets for various purposes. Of course, if you're doing a full read up front and caching all drivers, then this level of control isn't necessary. However, for objects that are always fetched on first read, it can be very useful. It also provides a visual clue of potential trips back to the DB.
Frans Bouma [C# MVP] - 12 Aug 2006 10:54 GMT > > the thing you're worrying about is that because you're targeting > > the problem with an OO approach will result in a lot of queries. [quoted text clipped - 7 lines] > up with changes in the database? How do I make sure that the data > within my application is current with the data in the database?" I don't think caching and prefetching are related. They're completely different things, and I don't thing what applies to caching applies to prefetching as well. UNLESS! you're defining 'prefetching' as 'fetching data way before it's perhaps needed'. In that case we're talking about two different things ;). I call 'prefetching' the fetch action you're doing together with another fetch action because you know you'll need it in the very near future, e.g.: you need both car and driver, thus you fetch both up front.
Stale data is always a problem where data is consumed outside the system where it's stored/kept. Though as that's a given, a developer has to realize that as soon as s/he fetches data from a table / view in the db, the data IS stale.
> Depending upon your application, the answer may range from "I don't > care. I don't need to keep current with changes to the DB," to "I have > to fetch every time because I need to be as current as possible with > the DB." At this stage, the questions and problems are the same > whether you're writing an O-O application or a 3GL application. true, but that wasn't the OO problem at hand I think. The OO problem at hand was more in the form of: - in an OO world you would focus on fetching each Car object individually and every Driver object individually vs. - in a set oriented world, you'd focus on the set of data you have to fetch, e.g. a joined list or 2 sets with a mapping between them (e.g. dataset with two datatables and a datarelation).
If you want to do the latter, but want to use OO objects, there's a friction, because they're not equal.
> The difference I would submit, is that O-O requires more careful > design in this regard, as it's harder to immediately understand what [quoted text clipped - 5 lines] > may result in a database fetch, it's very difficult to know, looking > at the code, where trips back to the database may occur. trips back as in, persisting data?
I see it like this: say the algorithm A consumes an X amount of data. That data is consumed in a period of time P. If it's ok for A that X is available before P starts, you could opt for an optimization to fetch X up front, and then proceed into P. If it's not ok for A to have X up front, you have to fetch X during P, and perhaps the amount you need at that given moment.
A is then efficient if A has the data amount X' available at time T when it needs X'. This thus means that when X' is needed, no delay in fetching should occur. IF fetching-on-demand (lazy loading) is enough to feed A with X' amount of data at any given time T, why bother with prefetching? IF fetching-on-demand isn't enough, prefetching is recommended.
You see, in the statement you gave, it's not important for the developer what happens below the hood. What's important is that after that line, 'd' points to the Driver entity of car, as that's what 'reality' is on the abstraction level the developer works on.
> This doesn't > make the O-O approach somehow inferior. All it means is that you have > to take more care in architecting your solution. We're using a > horribly slow ODBC connection, and we've managed to optimize our > trips back to the DB. It can be done. Oh, I didn't want to imply that an OO solution would be inferior, not at all. I just wanted to imply that the hard-core 'This is not OO and therefore crap!'-slogans aren't practical to work with ;) so it should be a basis to base a decision on.
> One change I would recommend is this: don't use a property, use a > method: > > Driver d = car.GetDriver(); It should be a method, indeed, because the operation can be expensive (if the related driver hasn't been read yet) and therefore the MS guidelines say you should create a method. The sad thing is though, what would you suggest for the other side of the relation: Cars c = d.Cars;
It then should be logical to have a method as well, correct? However, then you run into the problem with databinding. What if you want to bind the Cars collection of the selected driver in a grid to another grid? You can't do that with design time databinding or other simple constructs, it requires ugly glue code, not something you want to write, trust me ;)
Therefore, I think a property which calls the method (which is also there) is best. (I use that scheme). The method then can also have overloads which accept additional filters and other bells / whistles, and the property is there for easy access and databinding access.
FB
 Signature ------------------------------------------------------------------------ Lead developer of LLBLGen Pro, the productive O/R mapper for .NET LLBLGen Pro website: http://www.llblgen.com My .NET blog: http://weblogs.asp.net/fbouma Microsoft MVP (C#) ------------------------------------------------------------------------
Registered User - 12 Aug 2006 13:55 GMT >Say i have a class car with properties: Color, Make, Model, Year, DriverID >And a Driver class with properties: DriverID, Name [quoted text clipped - 14 lines] >Any comments as to why my way is not OO sound or firepower I could use to >tell to my peers? This is more about the relationship between Car and Driver than anything else. First I question the Driver type. it would seem a car's driver property would be null if the car's state property IsBeingDriven returns false. Although only one person at a time can drive a car [1] through its lifetime a car can have many drivers. Perhaps a type Owner might be more suitable.
In any case I would argue that the car type should know nothing about its owner or driver. In the real world the VIN [2] is used as the key to ownership while the driver's key is the physical key. In both cases there is a layer between the car and owner or driver. This suggests Car table Ownership table Owner table where the Ownership table contains the pink slip [3].
But this doesn't resolve the query question.
Now it might be wise to consider two car types keeping the existing Car type with the VIN. The second car type would be derived from Car and named FullyQualifiedCar.The FullyQualifiedCar type would be additionally populated with desired values from the owner table. What a particular piece of code does can determine which type of object to use. Overloading a FullyQualifiedCar c'tor to take a Car as an argument will be useful for situations where a Car exists and a FullyQualifiedCar may be conditionally needed.
regards A.G.
[1] excluding driver education vehicles with redundant controls. Isn't there is always a contrary real world example? [2] vehicle identification number, unique identifier [3] vehicle title, CarID, OwnerID
>Thanks so much! Chris Darnell - 13 Aug 2006 04:57 GMT The truely OO approach is that each class has the knowledge to load and save its own information. As an object is filled (vehicle, for example), it tells its Driver subclass to load its data based on the provided Driver ID. This is a very sound OO design, but can adversely impact the performance of the system.
There is no shame in having the Vehicle class also load/save the Driver information, also. The Vehicle class's Fill() method would also instantiate and fill a Driver object.
Arguments for the truely OO approach will depend upon the real design of your system. A simple Vehicle/Driver sample can easily be argued both ways. Consider the "bigger picture" that your system represents. What are the real tradeoffs for your design if you don't do the true OO approach? How many users will be using the system at the same time? How often is any given vehicle loaded? How many vehicles are returned at a time? Will the users only be working with one vehicle at a time?
Chris
Registered User - 13 Aug 2006 15:45 GMT >The truely OO approach is that each class has the knowledge to load and >save its own information. As an object is filled (vehicle, for >example), it tells its Driver subclass to load its data based on the >provided Driver ID. This is a very sound OO design, but can adversely >impact the performance of the system. You probably chose the wrong word with subclass but I pretty much agree with you.
>There is no shame in having the Vehicle class also load/save the Driver >information, also. The Vehicle class's Fill() method would also [quoted text clipped - 3 lines] >of your system. A simple Vehicle/Driver sample can easily be argued >both ways. Absolutely correct.
>Consider the "bigger picture" that your system represents. >What are the real tradeoffs for your design if you don't do the true OO >approach? How many users will be using the system at the same time? >How often is any given vehicle loaded? How many vehicles are returned >at a time? Will the users only be working with one vehicle at a time? Excellent points. All too often the first design consideration becomes the only design consideration.
regards A.G.
Chris Darnell - 14 Aug 2006 17:17 GMT <snip/>
> You probably chose the wrong word with subclass but I pretty much > agree with you. <snip/>
> regards > A.G. Thanks, A.G. I was a little tired when I wrote that. I reckon the Driver would NOT be a subclass of Vehicle. It would probably be considered a "child class".
Chris
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|