String Theory Live Stalk
Sep 09

A little while ago, I was reading up on Ruby on Rails and started thinking about database performance. Dynamic SQL that happens auto-magically is great, but should the application tier really always be pulling over a bunch of database records into memory everytime it needs to manipulate data? Even if that data never needs to be displayed? I thought there might be a way to save some of those roundtrips.

Rails

The Rails framework does a fantastic job making it easy and obvious how to build and structure web applications. By making some assumptions about how you should construct your domain model and codebase, Rails can piece together your application without requiring you to explicitly define which files, modules, pages and tables should wire together.

This philosophy can be summarized by two major ideas: DRY (Don’t Repeat Yourself), and Convention over Configuration.

“Don’t repeat yourself” means that information is located in a single, unambiguous place. For example, using ActiveRecord, the developer does not need to specify column names in class definitions. Instead, Ruby can retrieve this information from the database.

“Convention over Configuration” means a developer only needs to specify unconventional aspects of his application. For example, if there’s a class Sale in the model, the corresponding table in the database is called sales by default. It is only if someone deviates from this convention, such as calling the table “products_sold”, that he needs to write code regarding these names.

Ruby On Rails: Wikipedia

By following those 2 general principles, Rails can get you most of the way towards a running application with almost ZERO work on your part. The framework owes a lot of its magical abilities to the language it is written in, Ruby.

Ruby

Ruby is an Object Oriented, Dynamic Scripting Language that allows near English pseudo code to run as actual code. This follows from the general philosophy of the language’s creator, Yukihiro “Matz” Matsumot, who said:

Often people, especially computer engineers, focus on the machines. They think, “By doing this, the machine will run faster. By doing this, the machine will run more effectively. By doing this, the machine will something something something.” They are focusing on machines. But in fact we need to focus on humans, on how humans care about doing programming or operating the application of the machines. We are the masters. They are the slaves.

Ruby on Rails

Essentially, the combination of Ruby and Rails create a development environment with no compiling, no deploying, no restarting servers, no running sql scripts, no table/model mappings, and NO wiring together of your application.

Dynamic server refreshes, integrated database migrations, integrated testing, and generators to create your placeholders for new Models, Views or Controllers are the norm.

Using Rails feels like magic at first, and then feels utterly obvious after a short time.

Database vs Application Responsibility

Rails is firmly entrenched in the idea that the application tier should encapsulate ALL of your business logic, and ALL of your data integrity. The only database managed constraint that is part of Rails are Primary Keys.

Why? Well, it’s actually a pretty long argument, but the crux of it is that Rails’ inventor, David Heinemeier Hansson, designed the framework to work with “application” databases. The theory goes that true integration between applications should happen through a Message Oriented Middleware, REST, or some other “Over Application” integration pattern. So your database should only be accessed by a single application, which should be responsible for the data integrity. This database management should not happen inside the database tier. The database is for persisting and retrieving data, that’s it. He makes no apologies about it — the framework is simply not designed to do that. Rails is often self referred to as Opinionated software, and this is one of the reasons.

Short Database Tier History

As some history, the debate about whether or not the database should encapsulate any logic and/or data integrity is a long and heated one that often resembles a religious debate.

Check these out for a few reading examples (note that comments go for 2 years on one of these posts):

Prior to Dynamically Generated SQL being the norm (through ORM tools in Java, Rails, C#, etc) the public debate mainly won out with Stored Procedures being the preferred approach. At this point, the mainly used options were embedded SQL in the application tier, or Stored Procs for CRUD. This was certainly not a “set” opinion or a “consensus” opinion, but it was a frequently used pattern.

The main reasons for and against using Stored Procedures for database interactions were:

  1. Pro - Performance:
    • + Network Bandwidth: Data did not have to be sent across the wire
    • + Database Caching: Stored procedure commands could be loaded up with query plans ready to go
    • + Precompilation: Some databases could actually go ahead and compile stored procedures
  2. Pro - Security:
    • + Permissions Tied to Data Manipulation: In a database accessed by many users or applications, stored procedures could restrict access to the correct user types
  3. Pro - Code Reuse:
    • + Cross Application API: The database creates an API to be used to modify data. This API can be used across applications to ensure data integrity
    • + Data Integrity: Stored procedures, triggers, keys, etc can be used to ensure that data cannot be corrupted at a data tier level, no matter which users or applications access them.
  4. Con - Code Maintainability
    • - Modifications: When stored procedures are used for CRUD, changes to tables require many modifications upstream through all tiers of the application. The same is true for API changes that must be migrated across all applications.
    • - Troubleshooting: Databases tend to give ambiguous error messages that are often difficult to track down. This can make troubleshooting extremely difficult.
    • - Business Rules Encapsulation: Putting logic in the database represented another tier to spread business logic across. Logic was difficult to track down and view all at once. Anyone who has ever worked with triggers has seen this first hand… many wasted hours with a strong feeling of stupidity when you finally figure out that your changes were not magically disappearing, but were coded to do so.

Dynamic SQL Hits the Mainstream

The pro-stored procedure line of arguing seems to change pretty drastically once Dynamic SQL hits the mainstream. After the last couple arguments pro stored procedure are made (circa 2003-2005), ORM technology starts really taking off. In the Java world, Hibernate is becoming the database interaction “pattern” of choice. Also speed up to 2005-6, when Rails starts to become popular. As a whole, holistic OOP solutions seem to be the overwhelming choice for building applications (scary how people seem to still be arguing this up to even now). Using procedural languages, even for smaller portions of applications, is now starting to be universally discouraged as archaic and extremely difficult to maintain. I know this sounds odd, but read through some of those posts and check the dates.

Modern thinking among the giants (Martin Fowler, etc) is that the way to integrate across applications is not through the database, but through interfaces at the application tier (REST, MoM, etc).

Hardware philosophy is also changing. Databases continue to be somewhat difficult to scale, while Web Server Farms are becoming increasingly cheap and simple to scale horizontally. In fact, some databases (Oracle, allegedly), instruct that massive scaling requires removal of stored procedures and constraints from the data tier. Many Data Centers also connect Application and Database tiers with multi GB/sec LAN connections, making the network bandwidth a far lesser concern.

So, to summarize, the argument in favor of/against using stored procedures now looks like:

Arguments for using Stored Procedures in Applications (circa 2007):

  1. Pro - Performance:
    • + Network Bandwidth: Data sent across the wire is still a concern, but it doesn’t raise as many red flags with modern ethernet connections. However, LESS data IS sent and serialized.
    • +/- Database Caching: Dynamic SQL is often now cached by database servers, becoming less of a concern. Stored procedures also cache. This argument becomes a wash.
    • +/- Precompilation: This is not cited as an argument as frequently
  2. Con - Scaling:
    • - Application Tier Scaling: Since the web farm is simpler to scale than the database, it is cited as a reason to make the web tier responsible for data integrity and encapsulating ALL business logic.
    • - Database Load: It is now sometimes cited as a negative for stored procedures that they will actually increase the database load (where scaling is more difficult).
  3. Con - Security:
    • - Permissions Tied to Data Manipulation: Tying permissions and creating many database user roles seems to have gone out of style
  4. Con - Code Reuse:
    • - Cross Application API: Most modern integration patterns do not include this technique. There ARE still MANY people working with systems like this, but modern framework design seems to abstain from creating legacy system support a focus.
    • +/- Data Integrity: Data integrity tier-responsibility continues to be a fierce debate. Since I will make the assumption that the single application database is the more popular pattern going forward, this argument will just be a wash. Many people argue that frameworks such as Rails should at least derive their relationships from the foreign keys in the database to be more DRY compliant. However, this is not high on the list of activities for the Rails Core Team, as the existing system seems to perform acceptably to the founder and Core team.
  5. Con - Maintenance:
    • - API Changes: Sometimes still cited as a positive for stored procedures, but mostly cited as a nightmare in very large systems. Among the negatives are the high number of changes needed to be made if any input or output parameters change. These changes will force modifications across any applications using them. This can also cause duplicated code to creep into the API. Pro stored procedure arguers often seem to continue to counter using embedded SQL, instead of domain model encapsulation. This pre-2005 argument will still pop up in newer posts.
    • - Business Logic Encapsulation: Overall, the lack of “encapsulation” of business logic and allowing it to spread across tiers makes it very difficult to infer what is happening across those tiers and modify the code appropriately.
    • - Database Portability: Stored procedures are very lacking in standards and must be coded for the database in question. This makes it very difficult to be “database agnostic”.
    • - Unit Testing: Stored procedures tend to be very hard to isolate with unit tests. The tools to unit test are often provided in a completely different layer (ie, Java SQLUnit)
    • + Deployment: Procedural code changes can be deployed easily to the database, often without impacting overall application performance much. This is often cited as a “real world” argument. However, it is often specified with a caveat acknowledging it goes around more extensive restrictions for deploying new application tier packages.
  6. Pro - Special Needs:
    • + Performance Hacks: I’m not really sure what to call this, but in a lot of the discussions, there seems to be a consensus that in some special cases where performance cannot be achieved in any other way, stored procedures are the way to go.
    • + One Size Does Not Fit All: The other logical argument seems to go that ORM does not work in every situation, nor does using Stored Procedures for everything. In general, a broad sweeping, absolutely always DO, or absolutely always DON’T use one or the other will not give you the best solution to your problem.

The general agreement seems to be that stored procedures still hold a place where performance must be achieved at all costs (this performance gain is still disputed by some people), or where some special database API must be used.

Other than that, the best way to summarize the current state of affairs is that people looking to the future of development seem to be leaving stored procedures well out of the picture.

Of note, people working with large legacy systems that integrate using the database still interact heavily with stored procedures.

Caveats:

There are a few caveats worth mentioning.

Oracle: Users of Oracle will often mention that Java code can be deployed to the database. This is not a widespread usage from what I can tell, primarily because it usually requires using an old version of Java, and doesn’t fundamentally change interaction with the database. Running code in a JVM on the database still requires serialization to communicate. This seems to strongly resemble a normal Java tier, with only network bandwidth saved. All of the other advantages and disadvantages seem to still apply.

MS SQL Server: In several threads, users of Microsoft SQL Server brought up that C# code can be run in the database tier. In general, this seemed to have the drawbacks of poor organization of the code, continued requirements of keeping the stored procs and server code in sync (although Visual Studio seems to make this somewhat easy), and it still has to be run in a separate environment (CLR) from the database engine. To be fair, I haven’t used this, so I may be missing some things here.

Ok. Now you are up to speed. Please inform me of any arguments I left out.

Is it Time for Stored Procedures Sans Stored Procedures?

Before researching this post, I was convinced I had this great idea to improve performance of modern web applications using Rails. I even think it follows the Rails and Ruby philosophies.

Now I need some feedback.

If you could offload logic to the database that requires multiple transactions or loops through large data sets, but you could write all of your code in plain old Ruby, never even being aware of the database code being generated, would you? You could reuse your existing Domain Model classes and encapsulate all of your logic in these models using some conventions and some extra parameters.

In the middle of writing up this history, a popular Ruby/Rails blog called errtheblog posted a way to interact with the database using plain Ruby code called Ambition. This was an encouraging step towards what I had envisioned - a pure Ruby way to interact with a database where the framework does any work necessary to translate code to the database in question. The next step, if any, was to see if there was a still a worthwhile reason to send logic or data manipulation to the database?

What would this look like?

Imagine an API like Ambition, but supplying Blocks of code that translate to stored procedures behind the scenes. The database interaction would be abstracted so that plain ruby methods would indicate to loop through a cursor or to perform calculations, while Rails Active Record objects still serve as the database abstraction tier.

If any stored procedure activities were called within a method, the encapsulating method would be assumed to be a stored procedure. Any database activities contained within an enclosing method block would attempt to be translated to a single procedure. Since all procedure generating logic would be written in OO code, you could leverage and extend existing ruby modules to interact with the database. The actual database code would be built using the encapsulating method’s name to define the stored procedure, its name, and its parameters. Parameters needing to be passed to the database would be extracted by parsing out the procedure logic in the calling tree of the method. After a second pass through the call, the needed parameters would be extracted.

The way I envision it, the code would not reuse existing stored procedures… but these wouldn’t be maintained by anyone but Rails, so why bother?

During development runtime, stored procedures would be generated using naming conventions and called behind the scenes. Your code would look like it was calling a regular ruby method. In production, a migration could be called to send all procs to the database during application upgrade.

Code modules could be unit tested just like regular old ruby.

So, what would you use this for?

I mean this as a question. I honestly don’t know what use cases make sense for dynamic stored procedures. What changes (if anything), if these are happening auto-magically? Here are some interesting questions:

1) Business Logic/Calculations: Would you write business logic in it? The code would still be written in Ruby for all intents and purposes of the developer. Does this make it OK to put business logic inside? It would be located alongside your existing logic and model classes.

2) Looping Sub Selects: Would you send deep selects within selects to the database? This could help with performance in some cases where joining requires too many clauses, too many tables and too much embedded logic.

3) Table Spanning Transactions: Would you send multiple updates and inserts to the database if they must occur in sync? (ie, history records, audit records, or just multi-table transactions)

4) Cursors for Updates: Would you send logic where you need to perform complicated updates that require retrieving and chunking through a ton of reference rows that don’t need to be displayed in any way to a user? (ie, billing updates, cross site subscription changes, or social network graph building)

So, what’s next? I think it’s a neat idea, but don’t have the time, resources or immediate need to build it right now. I’d love to hear implementation ideas and/or syntax for actually using a library like this. I have some specific implementation ideas, but it’s a little much to include in this long post.

What are your specific use cases where this would be useful/necessary. How often do you really need stored procs vs. just refactoring your data model and code?

I don’t want to start another huge debate on the merits of stored procedures. I do think it is worthwhile to see a debate on whether dynamic stored procedures are a course worth pursuing.

Blake


Bookmark and Share:



2 Responses to “Stored Proc-livities”

  1. Lucas Ward Says:

    That was an extremely well thought out description of the history and problems associated with using Stored Procedures. However, I’m still not 100% I understand what your proposing, although that could be because of my admitted ignorance of Ruby.

  2. Blake Byrnes Says:

    Hi Lucas,
    Thanks for the response.

    This isn’t really a Ruby thing, since you could theoretically do this in Java or another language as well. Ruby (or Python or Perl) just gives me a toolset to operate a little more dynamically with a DSL that could generate code that would run in the database. And using something like Rails makes it a little more natural to make convention over configuration choices and people are comfortable with a little “magic.”

    What I’m proposing building is a Domain Specific Language that would let you code logic that will run in the database (ie, a stored procedure) just like you use an Object Relational Mapping tool to generate dynamic SQL.

    The idea is to figure out if there is enough of a use-case to justify doing so. Clearly, you need to use SQL to do basic CRUD against the database. And we are used to using joins, subselects, etc to perform “deeper”, better performing queries. So we’re clearly not against leveraging the database to store and retrieve information, even if it is complicated to piece together. What I’m curious about is if there is a place for using Object Oriented code (within the normal confines of your domain models) to dynamically generate stored procedures that take advantage of the database.

    Essentially, I am trying to take advantage of “faster” processing in the database tier without the headache of maintaining and coding it in a database’s proprietary language. The “faster” element seems to still be an open question for some people, given the current state of Data Centers and Scaling that I listed above.

    Does this make more sense?

Leave a Reply