<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Brent Ozar PLFsqlserver | Brent Ozar PLF</title>
	<atom:link href="http://www.brentozar.com/archive/tag/sqlserver/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.brentozar.com</link>
	<description>Your technology pain-relief experts.</description>
	<lastBuildDate>Wed, 08 Feb 2012 14:15:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Understanding Database Choice</title>
		<link>http://www.brentozar.com/archive/2012/01/understanding-database-choice/</link>
		<comments>http://www.brentozar.com/archive/2012/01/understanding-database-choice/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 14:30:46 +0000</pubDate>
		<dc:creator>Jeremiah Peschka</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[bigdata]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://www.brentozar.com/?p=13331</guid>
		<description><![CDATA[Everyone needs a distributed database right? Wrong. It&#8217;s easy to get so excited about trying out a new technology that you close your eyes to the problem you&#8217;re trying to solve. I came across a question on Stack Overflow where the poster was asking what kind of distributed database they needed. Rather than jump right...<p>...<br /><i>Upcoming free webcasts: <a href="https://brentozarevents.webex.com/brentozarevents/onstage/g.php?t=a&d=663314175">SQL and SSDs: A Valentine's Day Love Story</a> and <a href="https://brentozarevents.webex.com/brentozarevents/onstage/g.php?t=a&d=664876357">Anatomy of the SQL Server Log File</a></i>.</p>
]]></description>
			<content:encoded><![CDATA[<p>Everyone needs a distributed database, right? Wrong. It&#8217;s easy to get so excited about trying out a new technology that you close your eyes to the problem you&#8217;re trying to solve. I came across a question on Stack Overflow where the poster was asking what kind of distributed database they needed. Rather than jump right into the fray and say &#8220;You don&#8217;t need it,&#8221; I took a step back and asked myself, &#8220;What&#8217;s the real question here?&#8221;</p>
<div id="attachment_13336" class="wp-caption alignright" style="width: 210px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://www.flickr.com/photos/oakleyoriginals/4251240010/"><img src="http://www.brentozar.com/wp-content/uploads/2012/01/untested-code.jpg" alt="" title="untested code" width="200" height="261" class="size-full wp-image-13336" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Oh boy, it&#039;s an untested database!</p></div>
<h3 id="understandyourrequirements">Understand Your Requirements</h3>
<p>We all hope that some day our business will be as successful as Google or Facebook or whatever the next big thing is. Those businesses got that way by first concentrating on doing something incredibly well and then growing as needed. Nobody reasonable wakes up and says &#8220;We&#8217;re going to build this incredibly robust application and here&#8217;s how we&#8217;re going to do it.&#8221; The internet is full of articles about how different companies have dealt with their problems of scaling. Do your job well and worry about success when it happens.</p>
<p>In the Stack Overflow question, Ted (we&#8217;ll call this person Ted) was asking which distributed database they should use to scale their system. They gave some vague requirements for data throughput, but left out why the system needed to be distributed across multiple servers.</p>
<p>This triggered my buzzword detector. I think distributed databases are incredibly cool, but they have a place; Ted&#8217;s requirements didn&#8217;t match at all. Without much explanation for why a distributed database would be important here, it was hard even to refute the argument for using one.</p>
<p>Distributed databases have some operational advantages &#8211; they <em>tend</em> to be more robust and tolerant of equipment failures, but that&#8217;s based on certain configuration details like using multiple server replicas. RDBMSes, out of the box, aren&#8217;t distributed across multiple servers, but there are a lot of features that have been built to make it possible to replicate data across data centers or to shard the database across multiple servers. The closest thing to a business requirement was that the database needed to be free; open source would be nice, too.</p>
<h3 id="understandyourhardware">Understand Your Hardware</h3>
<p>Business requirements only matter so much. Eventually somebody has to write data to a disk. Once that drive head starts moving, the best designed software won&#8217;t matter if the underlying hardware can&#8217;t keep up with the load. That&#8217;s why it&#8217;s important to know what your hardware is capable of handling. It&#8217;s just as important to know what your application is capable of producing.</p>
<p>Ted needed a system that would be handling less than 100 transactions per second and would probably end up writing data at a rate of around 400 kilobytes per second. Neither of these requirements is a show stopper. Assuming that the server was going to be writing at a constant rate, the amount of data generated and kept would be around 10 terabytes of data a year. While it&#8217;s nothing to scoff at, it&#8217;s not an unheard of data generation rate. The thing is, almost any off the shelf database software can handle this kind of load. Almost any off the shelf server can handle this kind of data throughput.</p>
<p><a href="http://www.brentozar.com/wp-content/uploads/2012/01/many-ways-to-store-data.png"><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.brentozar.com/wp-content/uploads/2012/01/many-ways-to-store-data.png" alt="" title="many ways to store data" width="160" height="89" class="alignright size-full wp-image-13334" /></a>The requirement to handle ~100 requests per second at around 4 kilobytes per record isn&#8217;t a matter of choosing a database product; it&#8217;s a matter of designing a storage solution that can handle the ongoing needs of the business. When SAN space can be purchased from around $15,000 per terabyte, 10TB per year becomes a minor budget line item for all but the most cash-strapped startup.</p>
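<p>The sizing arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a perfectly constant write rate, and the function and constant names are made up for illustration.</p>

```python
# Back-of-the-envelope sizing using the figures quoted above.
# Assumption: 1 KB = 1,000 bytes and 1 TB = 10**12 bytes (decimal units).
TRANSACTIONS_PER_SECOND = 100
RECORD_SIZE_KB = 4
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
SAN_COST_PER_TB = 15_000  # dollars per terabyte, from the figure above


def yearly_volume_tb(tps: int, record_kb: int) -> float:
    """Return the data written in one year, in terabytes, at a constant rate."""
    kb_per_second = tps * record_kb
    return kb_per_second * 1_000 * SECONDS_PER_YEAR / 10**12


volume = yearly_volume_tb(TRANSACTIONS_PER_SECOND, RECORD_SIZE_KB)
print(f"{volume:.1f} TB per year")
print(f"${volume * SAN_COST_PER_TB:,.0f} per year in SAN space")
```

<p>At the full 100 transactions per second this works out to roughly 12.6 TB a year, the same order of magnitude as the 10TB figure used here; a system running below that ceiling lands lower.</p>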
<h3 id="understandyourdata">Understand Your Data</h3>
<p>There was one feature I left out until now. It&#8217;s important to understand how your data will be used. Graph databases excel at helping users explore the relationship between different data points. Relational databases make it easy to build incredibly flexible, good enough solutions to most problems. Key value databases make it possible to do single key lookups in near constant time. The way that data is read limits the playing field. </p>
<p>Ted mentioned that almost all of the data lookups were going to be by primary key. If this were the only requirement for reading data, this problem could be solved by any database. Then he threw in a little hook &#8211; there would be some joins in the data. In the world of databases, once you use the j-word, your options get very limited very quickly. You have to start thinking about querying patterns, potential optimizations, and the trade offs of read vs write optimizations.</p>
<p>If you do need joins, you can take one of two approaches &#8211; let the database do it, or write it yourself. One approach is certainly easier than the other, but both are feasible &#8211; heck, the hard part has already been done for you: someone else <a href="http://en.wikipedia.org/wiki/Hash_join#Grace_hash_join">came up with the algorithm</a>.</p>
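<p>To give a feel for the &#8220;write it yourself&#8221; option, here is a minimal sketch of a classic in-memory hash join in Python. It is not the Grace hash join from the link (that variant partitions both inputs to disk first), and the function name and sample rows are hypothetical.</p>

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Join two lists of dicts on a key using a simple in-memory hash join."""
    # Build phase: index the (ideally smaller) left input by its join key.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[left_key]].append(row)

    # Probe phase: look up each right-hand row and emit the merged matches.
    joined = []
    for row in right:
        for match in buckets.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical data: users fetched by primary key, orders fetched separately.
users = [{"user_id": 1, "name": "Ted"}, {"user_id": 2, "name": "Ann"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},
]

for row in hash_join(users, orders, "user_id", "user_id"):
    print(row)
```

<p>Only the two orders belonging to user 1 come back; the order for the unknown user 3 is dropped, which is inner join behavior. Everything the database would otherwise handle (memory limits, spilling to disk, choosing which side to build on) becomes your problem.</p>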
<h3 id="understandobscenity">Understand Obscenity</h3>
<p>Justice Potter Stewart, in an attempt to classify obscenity, said &#8220;I shall not today attempt further to define the kinds of material I understand to be embraced&#8230; but <a href="http://en.wikipedia.org/wiki/I_know_it_when_I_see_it">I know it when I see it</a>.&#8221; Right now, there&#8217;s no good definition of what makes data into Big Data. Some people say that you&#8217;ve hit Big Data when you can no longer predict query performance. Some people use hard and fast numbers of data volume in bytes, in data churn rate, or in the massively parallel nature of the database. There&#8217;s no right or wrong answer and Big Data is something that varies from organization to organization.</p>
<p>It&#8217;s important to understand what problem you&#8217;re trying to solve, understand the volume of data, and understand how the data is going to be used before making the final selection. There are many ways to store data. </p>
<p>What would I have done in this situation? Taking into account that I know SQL Server well, I would use SQL Server. SQL Server can perform admirably as a glorified key value store. B+trees are pretty quick in most use cases and they balance many of the problems of simultaneously reading and writing data to provide a good enough solution to the problem (with great management tools on top). When business users demand better querying capability, it&#8217;s easy enough to start adding non-clustered indexes on top of the solution.</p>
<div class="wp-about-author-containter-top" style="background-color:#FFEAA8;"><div class="wp-about-author-pic"><img alt='' src='http://1.gravatar.com/avatar/740378c166b627c54c0341a4ee155c0f?s=100&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D100&amp;r=R' class='avatar avatar-100 photo' height='100' width='100' /></div><div class="wp-about-author-text"><h3><a href='http://www.brentozar.com/archive/author/jeremiah-peschka/' title='Jeremiah Peschka'>Jeremiah Peschka</a></h3><p>Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.</p><p><a href='http://facility9.com' title='Jeremiah Peschka'>Website</a> - <a href='http://twitter.com/peschkaj' title='Jeremiah Peschka on Twitter'>Twitter</a> - <a href='http://www.facebook.com/peschkaj' title='Jeremiah Peschka on Facebook'>Facebook</a> - <a href='http://www.brentozar.com/archive/author/jeremiah-peschka/' title='More posts by Jeremiah Peschka'>More Posts</a> </p></div></div>]]></content:encoded>
			<wfw:commentRss>http://www.brentozar.com/archive/2012/01/understanding-database-choice/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Availability, Data Locality, and Peer-to-Peer Replication</title>
		<link>http://www.brentozar.com/archive/2012/01/highly-available-local-replication/</link>
		<comments>http://www.brentozar.com/archive/2012/01/highly-available-local-replication/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 15:00:24 +0000</pubDate>
		<dc:creator>Jeremiah Peschka</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[high availability]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://www.brentozar.com/?p=13292</guid>
		<description><![CDATA[I want to make something clear: high availability is not load balancing. The two options don&#8217;t have to be mutually exclusive, but they aren&#8217;t the same thing. Several months ago, I wrote about resolving write conflicts. Some of the approaches I mentioned for resolving write conflicts (such as taking a &#8220;last write wins&#8221; approach) involved...
]]></description>
			<content:encoded><![CDATA[<p>I want to make something clear: high availability is not load balancing. The two options don&#8217;t have to be mutually exclusive, but they aren&#8217;t the same thing. Several months ago, I wrote about <a href="http://www.brentozar.com/archive/2011/06/resolving-conflicts-database/">resolving write conflicts</a>. Some of the approaches I mentioned for resolving write conflicts (such as taking a &#8220;last write wins&#8221; approach) involved using peer-to-peer replication. It&#8217;s important to understand conflict resolution and peer-to-peer replication. Since I&#8217;ve already talked about <a href="http://www.brentozar.com/archive/2011/06/resolving-conflicts-database/">conflict resolution</a>, let&#8217;s dig into how peer-to-peer replication fits into the mix.</p>
<h3>Peer-to-Peer Replication and You</h3>
<p>Peer-to-peer replication is a special and magical kind of replication: it works in a ring or mesh to make sure that one row&#8217;s updates will magically spread to all servers. You&#8217;d think that this would mean every server is equal, right?</p>
<p>In some distributed databases actions can take place on any server. Using Riak as an example, when you want to add a new record to the database you can write the record to any server and the record will be routed to the server responsible for handling that data. This is part of the beauty of a specific type of distributed database: the database is a collection of nodes that serve reads and writes without regard for hardware failure. There&#8217;s a lot of software trickery that goes into making this work, but it works quite well.</p>
<p>SQL Server&#8217;s peer-to-peer replication is a distributed database, just not in the sense that I&#8217;ve used the term previously. In SQL Server peer-to-peer replication every node is an exact copy of every other node: the same tables, rows, indexes, and views exist on every node. This is where the difficulty begins &#8211; if every row exists on every node, how do we know where to update data? The problem is that we don&#8217;t know where to update a row. There is no out of the box mechanism for determining row ownership.</p>
<p>Distributed database systems like Riak, Cassandra, HBase, and Dynamo work by assigning an owner to every record. Shouldn&#8217;t we do the same thing with SQL Server? When we&#8217;re spreading data across a number of servers, we have to ensure that writes go to the correct location. If they don&#8217;t, we need to build in a large number of checks to ensure that all nodes have the appropriate updates and that everyone is working on the correct version of the data, or else we run into conflicts. This is the reason I hinted at using peer-to-peer replication combined with write partitioning and a last write wins method of conflict detection. If changes to a row can only occur on server A, we don&#8217;t need to worry about updates on other servers &#8211; those updates can be ignored since they did not occur in the correct location.</p>
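<p>The combination of write partitioning and last write wins can be sketched in a few lines of Python. This is an illustration of the idea only, not SQL Server&#8217;s conflict resolver: the <code>Change</code> type, the <code>resolve</code> function, and the server names are all hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    key: str
    value: str
    origin: str        # server where the change was made
    written_at: float  # timestamp of the write


def resolve(current: Optional[Change], incoming: Change, owner: str) -> Optional[Change]:
    """Pick the surviving version of a row.

    Changes made anywhere other than the owning server are ignored;
    among changes from the owner, the most recent write wins.
    """
    if incoming.origin != owner:
        return current
    if current is None or incoming.written_at >= current.written_at:
        return incoming
    return current


first = Change("row-1", "draft", origin="A", written_at=1.0)
stray = Change("row-1", "oops", origin="B", written_at=2.0)
later = Change("row-1", "final", origin="A", written_at=3.0)

row = resolve(None, first, owner="A")
row = resolve(row, stray, owner="A")   # ignored: not written on the owner
row = resolve(row, later, owner="A")   # accepted: newer write from the owner
print(row.value)
```

<p>The important design choice is that ownership is checked before timestamps: a stray write on the wrong server never wins, no matter how recent it is.</p>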
<p>The difficulty lies in finding a way to do all of this. SQL Server&#8217;s replication offers no routing functionality; it just replicates data to the appropriate subscribing servers. In order to make sure that data gets to the right place, there must be another piece to the puzzle. There must be a way to correctly locate data.</p>
<h3>Record Ownership</h3>
<p>If you absolutely must use peer-to-peer replication as a form of load balancing, record ownership is an important concept to consider. Regardless of whether the distributed database is relational or not, software still needs to be aware of where the definitive version of a record can be found. If there&#8217;s no way to determine which version of a record is the definitive version of a record, two updates can occur in different locations. This will undoubtedly lead to painful conflict scenarios. Instead of worrying about handling conflicts, we worry about getting data to the right place. Once we know that the data is in the right place, we can trust our database to be as accurate as possible.</p>
<p>I use the term record instead of row for an important reason: a record represents a complete entity in a system. A row may be part of a record (e.g. a line item in an order) but the record is the complete order.</p>
<p>Record ownership is a tricky thing to think about; how do you determine who owns any single row? What&#8217;s a fair and efficient way to handle this? Let&#8217;s take a look at different techniques and see how they stack up. Here&#8217;s a quick list of possible ways we can distribute row ownership in a database:</p>
<ul>
<li>Random</li>
<li>Range-based</li>
<li>Static</li>
</ul>
<h4>Random Record Ownership</h4>
<p>Randomness is frequently used to ensure an even distribution. Randomly bucketizing data turns out to be a very effective way of ensuring that data will be split very close to evenly across any arbitrary number of locations. The difficulty is in ensuring randomness.</p>
<p>Some systems like Riak and Cassandra use a hash function to distribute data ownership around the database cluster. Different nodes are assigned a range of values &#8211; if there are four servers in the distributed database, each one is roughly responsible for 1/4 of the data in the database (I&#8217;m simplifying, of course, but you get the drift). Special routing code takes care of getting data to clients and sending writes to the appropriate place. The location of a record is typically determined by applying a hashing function to the record&#8217;s key. In this way, we can always find a row at a later date: by applying a function to the key we can quickly find the row even if the number of servers in the cluster has changed.</p>
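<p>Here is a toy consistent-hash ring in Python that shows why most keys stay put when the cluster grows. It is a generic sketch, not how Riak or Cassandra actually implement their rings; the class name, the server names, and the choice of MD5 with 100 points per server are all assumptions made for the example.</p>

```python
import bisect
import hashlib


class HashRing:
    """A toy consistent-hash ring mapping record keys to owning servers."""

    def __init__(self, servers, points_per_server=100):
        # Each server is placed at many points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(value):
        # MD5 is used only for its even spread of values, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        # Walk clockwise to the first server point at or after the key's hash.
        index = bisect.bisect_left(self._positions, self._hash(key))
        return self._ring[index % len(self._ring)][1]


keys = [f"user:{n}" for n in range(1000)]
before = HashRing(["A", "B", "C", "D"])
after = HashRing(["A", "B", "C", "D", "E"])
moved = sum(1 for key in keys if before.owner(key) != after.owner(key))
print(f"{moved} of {len(keys)} keys changed owner after adding server E")
```

<p>Only the keys that land on the new server change owner; everything else is found exactly where it was before, which is what lets the cluster grow without reshuffling all of the data.</p>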
<p>This mechanism provides a reliable way to uniquely identify data and distribute it among many servers. This technique is difficult to accomplish with SQL Server. There is no peer-to-peer replication functionality in the SQL Server space that makes it easy to say &#8220;This record belongs on server A and this record belongs on server B.&#8221; There&#8217;s a reason for this: peer-to-peer replication is a high availability technology. It exists to make life easier in the unfortunate event that your data center slides into the ocean. It&#8217;s possible to build some kind of load balancing layer in SQL Server using SQL Server Service Broker (or just about any other technology), but the point remains that SQL Server doesn&#8217;t provide out of the box functionality to automatically implement random record based ownership.</p>
<h4>Range-Based Record Ownership</h4>
<p>Range-based ownership is far simpler than random record ownership. In range-based ownership a range of records is claimed by a single server. This could be users 1 through 100,000 or it could be users whose names start with &#8216;A&#8217; through users whose names start with &#8216;K&#8217;. At a quick glance range-based record ownership seems like it doesn&#8217;t have many down sides: it&#8217;s easy to determine where an appropriate record goes. My data goes to server A, your data goes to server B, his data goes to server C.</p>
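<p>A range lookup like this takes only a few lines. The sketch below is in Python and assumes a hypothetical layout of three servers, each owning a block of 100,000 user ids; the server names and boundaries are invented for the example.</p>

```python
import bisect

# Hypothetical layout: each server owns user ids up to and including its bound.
UPPER_BOUNDS = [100_000, 200_000, 300_000]
SERVERS = ["server-a", "server-b", "server-c"]


def owner_for(user_id):
    """Return the server that owns writes for the given user id."""
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    if index == len(SERVERS):
        raise ValueError(f"no server owns user id {user_id}")
    return SERVERS[index]


print(owner_for(1))
print(owner_for(100_000))
print(owner_for(100_001))
```

<p>The first two lookups go to server-a and the third to server-b. The lookup is trivial; deciding where the boundaries should sit is the hard part.</p>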
<p>Range-based record ownership has a major flaw: some servers will experience more load than others. For example, if we&#8217;re partitioning by name we will quickly discover that first names aren&#8217;t evenly distributed, at least not in Western cultures. In a survey of first names conducted in the UK, one quarter of women were likely to have only one of ten first names in 1994. One in three women was likely to be named Emily. Needless to say, this uneven data distribution will skew the activity on different servers. If one server accumulates a clump of very active users (e.g. a group of active early adopters), that server may experience a higher load than the others.</p>
<p>Designing an effective range-based record ownership scheme for SQL Server peer-to-peer replication is possible but very difficult. The effectiveness of the scheme depends on intimate knowledge of write patterns. Most of us don&#8217;t have the time to develop a deep understanding of how data is written and then develop a scheme that takes into account those patterns.</p>
<h4>Static Record Ownership</h4>
<p>With static record ownership, we assign each record to a server when it is created. This could be as simple as assigning a user to the closest server or it could mean assigning records to a server by some other arbitrary means. However this is accomplished, it&#8217;s important to remember that some piece of code must still be able to determine where a record should go and that the mechanism for identifying that initial location should be general purpose enough to meet your users&#8217; needs in the long term.</p>
<p>There are several common ways to split out data. If you have a system that&#8217;s multi-tenant, it becomes easy to assign ownership for all of a single client/customer&#8217;s data to a single server. If that customer grows, you can buy a separate server or move them onto a different server with fewer users. Every record ends up having a composite key made up of the record identifier and the client identifier, but this is a small price to pay for clearly being able to separate data responsibility by client.</p>
<p>Another way to split out data is geographically. If I sign up for a service, it&#8217;s nice if the primary place to write my data is as close to me as possible. In this case, the service might have three data centers: one in LA, one in New York, and one in London. Much like using a multi-client architecture, a geographic method to determine ownership would use the location as part of the key for each record &#8211; records stored in LA would use a composite key with the data center location (&#8216;LA&#8217;) and some other arbitrary key value to identify a unique record.</p>
<p>No matter what scheme you decide to use, static record ownership is an easy way to determine which SQL Server should be responsible for writes to a single record. An advantage of static record ownership is that routing can be handled either in the application or by a sufficiently sophisticated router. With a router, writes can be directed without adding any code to the application &#8211; just a few load balancer rules need to be created or changed.</p>
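<p>Routing on a composite key could look something like the Python sketch below when it is handled in the application. The server names, the location codes, and the <code>RecordKey</code> type are hypothetical; the point is only that the owning location travels with the key, so any piece of code can work out where a write belongs.</p>

```python
from dataclasses import dataclass

# Hypothetical mapping from data center code to the SQL Server that owns it.
WRITE_SERVERS = {"LA": "sql-la-01", "NY": "sql-ny-01", "LON": "sql-lon-01"}


@dataclass(frozen=True)
class RecordKey:
    """Composite key: the owning location plus a locally unique identifier."""
    location: str
    record_id: int


def write_server_for(key: RecordKey) -> str:
    """Route a write to the server that statically owns the record."""
    try:
        return WRITE_SERVERS[key.location]
    except KeyError:
        raise ValueError(f"unknown location {key.location!r}") from None


order = RecordKey(location="LA", record_id=42)
print(write_server_for(order))
```

<p>The same mapping could just as easily live in load balancer rules instead of application code, as long as the location is visible in the request.</p>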
<h3>In Summation</h3>
<p>Here&#8217;s the trick: throughout all of this we&#8217;ve ignored that order of events is important. We&#8217;ve just assumed that when data is being written, we&#8217;re guaranteeing the order of events. If the data is being written to random servers, there&#8217;s no guarantee of event order. In a naive system, a record might be written to one server and an update applied to a second server before the original record even shows up! Distributing data is difficult. Randomly distributing data is even more difficult. No matter how you distribute your data or distribute writes, remember that distributing data in SQL Server through peer-to-peer replication is a high availability technology. It can be co-opted for scale out performance improvement, but there are some design decisions that must be made.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.brentozar.com/archive/2012/01/highly-available-local-replication/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A DBA&#8217;s Guide to ORMs Webcast</title>
		<link>http://www.brentozar.com/archive/2012/01/dbas-guide-orms-webcast/</link>
		<comments>http://www.brentozar.com/archive/2012/01/dbas-guide-orms-webcast/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 14:00:50 +0000</pubDate>
		<dc:creator>Jeremiah Peschka</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[orm]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://www.brentozar.com/?p=13342</guid>
		<description><![CDATA[Developers often use tools, like Entity Framework or NHibernate, to make it easier to work with the database. These tools sometimes cause problems for developers and DBAs alike. In this talk, we&#8217;ll talk about the terminology and techniques used with an ORM. We&#8217;ll also uncover ways to help DBAs work with developers, detect problematic queries,...
]]></description>
			<content:encoded><![CDATA[<p>Developers often use tools, like Entity Framework or NHibernate, to make it easier to work with the database. These tools sometimes cause problems for developers and DBAs alike. In this talk, we&#8217;ll cover the terminology and techniques used with an ORM. We&#8217;ll also uncover ways to help DBAs work with developers, detect problematic queries, and improve performance in both the database and the application.</p>
<p>This 30 minute session is for DBAs who are unfamiliar with ORMs and who aren&#8217;t sure where to start.</p>
<p><iframe width="600" height="450" src="http://www.youtube.com/embed/pUWaOoXjGnY?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<h3>Using ORMs with Stored Procedures</h3>
<p>Most ORMs can use stored procedures instead of writing their own SQL. This is important when data is more complex than a single object to table mapping.</p>
<ul>
<li><a href="http://ayende.com/blog/1692/using-nhibernate-with-stored-procedures">Using NHibernate with Stored Procedures</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/bb896279.aspx">Entity Framework &#8211; How to Define a Model with a Stored Procedure</a></li>
</ul>
<h3>The n+1 Selects problem</h3>
<p>The n+1 selects problem frequently occurs when displaying a list of items to a user. This can happen through a combination of looping in application code and lazy loading (only loading data when it&#8217;s explicitly needed). The ORM will generate multiple calls to the database, one for each object that&#8217;s used. Solving this problem depends on the particulars of the ORM that you&#8217;re using.</p>
<ul>
<li><a href="http://ayende.com/blog/3732/solving-the-select-n-1-problem">Solving the Select N+1 Problem</a></li>
<li><a href="http://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem">What is the N+1 Selects Problem?</a></li>
</ul>
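<p>To make the n+1 pattern concrete, here is a small Python sketch that uses the standard library&#8217;s <code>sqlite3</code> module in place of a real ORM. The tables, the data, and the <code>run</code> helper are made up for the example; the loop simply imitates what lazy loading does behind the scenes.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO posts VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO comments VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, standing in for an ORM's database call."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# n+1 pattern: one query for the list, then one more per item in the loop.
for post_id, _title in run("SELECT id, title FROM posts"):
    run("SELECT body FROM comments WHERE post_id = ?", (post_id,))
lazy_queries = query_count

# Eager alternative: fetch the same data with a single joined query.
query_count = 0
run("""
    SELECT p.id, p.title, c.body
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.id
""")
eager_queries = query_count

print(f"lazy loop: {lazy_queries} queries, join: {eager_queries} query")
```

<p>With three posts the loop issues four queries where the join issues one. Scale that list up to a few hundred rows per page view and the extra round trips become very visible on the database server.</p>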
<h3>Query Cache Pollution</h3>
<p>ORMs can cause the same problems that ad hoc SQL can cause &#8211; many plans will be generated and consume SQL Server&#8217;s memory. Grant Fritchey (<a href="http://scarydba.com">blog</a> | <a href="https://twitter.com/#!/gfritchey">twitter</a>) documented how this problem appears in NHibernate and how to detect it in <a href="http://www.scarydba.com/2008/04/29/nhibernate-recompiles-and-execution-plans/">NHibernate Recompiles and Execution Plans</a>. Solutions abound and there&#8217;s an excellent write up of the history of this problem in <a href="http://zvolkov.com/clog/2009/10/28?s=NHibernate+parameter+sizes+controversy">NHibernate Parameter Sizes Controversy</a>.</p>
<h3>General ORM Links</h3>
<ul>
<li><a href="http://mynerditorium.blogspot.com/2010/04/how-to-fail-at-orm.html">How to Fail at ORM</a></li>
<li>Ted Neward&#8217;s essay <a href="http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx">The Vietnam of Computer Science</a></li>
<li>Jeff Atwood&#8217;s follow up to Neward&#8217;s essay: <a href="http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html">Object-Relational Mapping is the Vietnam of Computer Science</a></li>
<li><a href="http://en.wikipedia.org/wiki/Active_record_pattern">The Active Record Design Pattern</a> &#8211; This pattern is used in most ORM frameworks</li>
</ul>
<h3>Links to Common ORM Tools</h3>
<ul>
<li><a href="http://nhforge.org">NHibernate</a> &#8211; a commonly used .NET ORM that is based on Hibernate</li>
<li><a href="http://hibernate.org">Hibernate</a> &#8211; the granddaddy of Java ORMs and the inspiration for many others.</li>
<li><a href="http://llblgen.com">LLBLGen Pro</a> &#8211; this is the Cadillac of ORMs. If there&#8217;s something you wished an ORM could do, odds are LLBLGen Pro can do it. It even provides tools to generate code for other ORMs.</li>
<li><a href="http://nhprof.com">NHibernate Profiler</a> &#8211; a profiler that developers can run locally to capture only their own queries to the database.</li>
<li><a href="http://rubyonrails.org">Ruby on Rails</a> &#8211; Ruby on Rails uses an ORM named ActiveRecord to do the heavy lifting.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.brentozar.com/archive/2012/01/dbas-guide-orms-webcast/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Notes on Scalability</title>
		<link>http://www.brentozar.com/archive/2012/01/notes-on-scalability/</link>
		<comments>http://www.brentozar.com/archive/2012/01/notes-on-scalability/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 15:00:41 +0000</pubDate>
		<dc:creator>Jeremiah Peschka</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://www.brentozar.com/?p=13179</guid>
		<description><![CDATA[We all hope that we&#8217;re going to succeed beyond our wildest expectations. Startups long for multi-billion dollar IPOs or scaling to hundreds, or even thousands, of servers. Every hosting provider is touting how their new cloud offering will help us scale up to unheard of heights. I&#8217;ve built things up and torn them down a...
]]></description>
			<content:encoded><![CDATA[<p>We all hope that we&#8217;re going to succeed beyond our wildest expectations. Startups long for multi-billion-dollar IPOs or scaling to hundreds, or even thousands, of servers. Every hosting provider is touting how their new cloud offering will help us scale up to unheard-of heights. I&#8217;ve built things up and torn them down a few times over my career.</p>
<h3 id="build_it_to_break">Build it to Break</h3>
<p>Everything you make is going to break; plan for it.</p>
<p>Whenever possible, design the individual layers of an application to operate independently and redundantly. Start with two of everything &#8211; web servers, application servers, even database servers. Once you realize that everything can and will fail, you&#8217;ll be a lot happier with your environment, especially when something goes wrong. Well-designed applications are built to fail. Architects accept that failure is inevitable. It&#8217;s not something that we want to consider, but it&#8217;s something that we have to consider.</p>
<p>Distributed architecture patterns help move workloads out across many autonomous servers. Load balancers and web farms help us manage failure at the application server level. In the database world, we can manage failure with clustering, mirroring, and read-only replicas. Not every computer has to be duplicated, but we have to be aware of what can fail and how we respond.</p>
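<p>As a rough sketch of what planning for failure can look like in code, the Python below tries a request against each redundant server in turn and moves on when one fails. The server names and the <code>flaky_send</code> function are hypothetical stand-ins for illustration; a real load balancer does far more than this.</p>

```python
class AllServersFailed(Exception):
    """Raised when every redundant server has failed."""


def call_with_failover(servers, send):
    """Try each server in order and return the first successful response.

    `servers` is a list of hypothetical server names; `send` is any callable
    that takes a server name and either returns a response or raises.
    """
    errors = {}
    for server in servers:
        try:
            return send(server)
        except Exception as exc:  # a real system would catch narrower errors
            errors[server] = exc
    raise AllServersFailed(f"all servers failed: {sorted(errors)}")


def flaky_send(server):
    # Hypothetical transport: pretend the first web server is down.
    if server == "web-1":
        raise ConnectionError("web-1 is unreachable")
    return f"handled by {server}"


# The request survives the loss of web-1 because web-2 exists.
result = call_with_failover(["web-1", "web-2"], flaky_send)
```

<p>With only one server in the list, the same outage would surface as an error to the user, which is the argument for starting with two of everything.</p>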
<h3 id="everything_is_a_feature">Everything is a Feature</h3>
<p>As Jeff Atwood has famously said, <a href="http://www.codinghorror.com/blog/2011/06/performance-is-a-feature.html">performance is a feature</a>. The main thrust of Jeff&#8217;s article is that making an application fast is a decision that you make during development. Along the same lines, it&#8217;s a conscious decision to make an application fault tolerant. </p>
<p>Every decision has a trade-off. Viewing the entire application as a series of trade-offs leads to a better understanding of how the application will function in the real world. The difference between being able to scale up and being able to scale out can often come down to understanding key decisions that were made early on.</p>
<h3 id="scale_out_not_up">Scale Out, Not Up</h3>
<p>This isn&#8217;t as axiomatic as it sounds. Consider this: cloud platforms like Azure and AWS are at their most flexible when we can dynamically add servers in response to demand. To scale out effectively, we also need to be able to scale back in.</p>
<p>Adding capacity usually happens in the application tier: just add more servers. What happens when we need to scale the database? The current trend is to buy a faster server with faster disks and more memory. This process keeps repeating itself. Hopefully your demand for new servers will grow no faster than the pace of innovation. There are other problems with scaling up. As performance increases, hardware gets more expensive for smaller and smaller gains. The difference in cost between the fastest CPU and the second-fastest CPU is much larger than the difference in performance &#8211; scaling up often comes at a tremendous cost.</p>
<p><a href="http://www.brentozar.com/wp-content/uploads/2012/01/change-everything.png"><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.brentozar.com/wp-content/uploads/2012/01/change-everything.png" alt="don&#039;t be afraid to change everything" title="don&#039;t be afraid to change everything" width="194" height="90" class="alignright size-full wp-image-13181" /></a></p>
<p>The flip side to scaling up is scaling out. In a scale-out environment, extra commodity servers are added to handle additional load. One of the easiest ways to scale out the database is to use read-only replica servers to scale out reads. Writes are handled on a master server because scaling out writes can get painful. But what if you need to scale out writes? Thankfully, there are many techniques available for horizontally scaling the database layer &#8211; features can be broken into distinct data silos, metadata can be replicated between all servers while line-of-business data is sharded, or automated techniques like SQL Azure&#8217;s federations can be used.</p>
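<p>To make the two ideas above a little more concrete, here is a minimal Python sketch: a router that sends writes to a master and spreads reads across read-only replicas, plus a helper that picks a shard from a key. The class, function, and server names are all made up for illustration, and this is not how SQL Azure&#8217;s federations work internally; it is only one simple way to think about the routing.</p>

```python
import hashlib
import itertools


class Router:
    """Route writes to a master and spread reads across read-only replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._reads = itertools.cycle(replicas or [master])

    def server_for(self, is_write):
        return self.master if is_write else next(self._reads)


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string can differ from one Python process to the next.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical server names, purely for illustration.
router = Router("db-master", ["db-replica-1", "db-replica-2"])
write_target = router.server_for(is_write=True)
read_target = router.server_for(is_write=False)
customer_shard = shard_for("customer-42", shard_count=4)
```

<p>One trade-off worth noting: with plain modulo sharding like this, changing the number of shards moves most keys to a different shard, so it pays to think about how you will re-shard before you need to.</p>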
<p>The most important thing to keep in mind is that it&#8217;s just as important to be able to contract as it is to expand. As a business grows, it&#8217;s easiest to keep purchasing additional servers in response to load. Purchasing more hardware is faster and usually cheaper than tuning code. Once the application reaches a certain level of maturity, it&#8217;s important to tune the application to run on fewer resources. Less hardware equates to less maintenance. Less hardware means less cost. There is also a possibility that nobody wants to face &#8211; the business may shrink. A user base may erode. A business&#8217;s ability to respond to changing costs can be the difference between a successful medium-sized business and a failed large business.</p>
<h3 id="buy_more_storage">Buy More Storage</h3>
<p>In addition to scaling out your servers, scale out your storage. If you have the choice between a few huge disks and a large number of small, fast disks, give serious thought to buying the small, fast disks. A large number of small, fast drives can respond rapidly to I/O requests. More disks working in concert means that less data needs to be read off of each disk.</p>
<p>The trick here is that modern databases are capable of spreading a workload across multiple database files and multiple disks. If multiple files/disks/spindles/logical drives are involved in a query, then it&#8217;s possible to read data from disk even faster than if only one very large disk were involved. The principle of scaling out vs. scaling up applies even at the level of scaling your storage &#8211; many small disks are typically going to be faster than a few large disks.</p>
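<p>The arithmetic behind this is simple enough to sketch in Python. The example below is an idealized model that assumes pages are spread round-robin across disks; real database engines use their own allocation algorithms, and the page and disk counts here are made up for illustration.</p>

```python
def stripe_pages(page_ids, disk_count):
    """Assign pages to disks round-robin and return {disk: [pages]}."""
    layout = {disk: [] for disk in range(disk_count)}
    for position, page in enumerate(page_ids):
        layout[position % disk_count].append(page)
    return layout


# Hypothetical workload: a query that has to read 1,000 pages.
layout = stripe_pages(range(1000), disk_count=8)
pages_per_disk = {disk: len(pages) for disk, pages in layout.items()}
# Each of the 8 disks holds 125 pages, so each one reads 125 pages
# instead of a single disk reading all 1,000.
```

<p>In this model every disk does one eighth of the work, and the reads can happen at the same time, which is where the speedup comes from.</p>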
<h3 id="you8217re_going_to_do_it_wrong">You&#8217;re Going to Do It Wrong</h3>
<p>No matter how smart or experienced your team is, be prepared to make mistakes. There are very few hard and fast implementation guidelines about scaling the business. Be prepared to rapidly iterate through multiple ideas before finding the right mix of techniques and technologies that work well. You may get it right on the first try. It may take a number of attempts to get it right. But, in every case, be prepared to revisit ideas. </p>
<p>On that note, be prepared to re-write the core of your application as you scale. Twitter was originally built with Ruby on Rails. Over time they <a href="http://blog.evanweaver.com/2009/03/13/qcon-presentation/">implemented different parts of the application with different tools</a>. Twitter&#8217;s willingness to re-write core components of their infrastructure led them to their current levels of success. </p>
<p>Don&#8217;t be afraid to change everything.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.brentozar.com/archive/2012/01/notes-on-scalability/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

