Mixed Mode and Windows Authentication Considerations

Given that Azure doesn’t use domain authentication, SQL Server credentials become important to a number of activities, like backing up databases to Azure storage: https://msdn.microsoft.com/en-us/library/jj720558.aspx
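
To make that concrete, here is a minimal sketch of backing up to Azure blob storage with a stored credential (the access-key style of backup to URL); the storage account, container, database name, and key are placeholders, not a prescription:

    -- Names and the key below are placeholders.
    CREATE CREDENTIAL AzureBackupCredential
    WITH IDENTITY = 'mystorageaccount',                -- storage account name
         SECRET    = '<storage account access key>';   -- access key, not a domain password

    BACKUP DATABASE SalesDb
    TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDb.bak'
    WITH CREDENTIAL = 'AzureBackupCredential',
         COMPRESSION,
         STATS = 10;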

Thomas [W] Marshall

One question that comes up when you are installing and configuring a SQL Server instance is whether to use Windows Authentication or Mixed Mode Authentication. Mixed Mode Authentication allows the use of both SQL Server and Windows Authentication. In SQL Azure there is only SQL Server authentication.

In Windows Authentication the Windows local or domain user account is authenticated by the Windows OS or Domain Controller. In the case of a local account that account is effectively a domain account with the domain being the local machine. Windows Authentication can be further differentiated into service accounts and user accounts. Service accounts should have an SPN and either be trusted for delegation or a managed service account. Fortunately, those decisions and some of the more complex scenarios involving cross domain authentication and Kerberos are externalized to Windows. That is to say that those concerns are removed from the database and put…

View original post 351 more words

Word VBA Macro for Journal Entries

Never underestimate the value of a record of events, nor of a simple time-saving macro.

Thomas [W] Marshall

I have been using this macro for years. I have rewritten it several times. I find it to be quite useful. It consists of four subroutines. The first two do the work I need done. The third evaluates the conditional that work is predicated on and calls the second, or the first and second. The fourth calls the third and is bound to a keyboard shortcut in order to decouple the execution from the keyboard binding.

The first subroutine is called DateHeader. As the name suggests it inserts a date header. I use Heading 1 for the date. For the purpose of keeping a journal this is useful, as it displays at the top level of the navigation pane (which has a checkbox on the View tab).

Sub DateHeader()
'
' DateHeader Macro
' Insert H1 Current Date
'
    Selection.Style = ActiveDocument.Styles("Heading 1")
    Selection.InsertDateTime DateTimeFormat:="dddd, MMMM dd…

View original post 371 more words

Extract Transform Load Design

Earlier today someone asked me about the steps I take when creating an SSIS package. That was a great question. The creative process involved is often decidedly non-linear, and it helps to force a more linear form. The answer I gave was a list of points grouped into broader categories. It went something like this:

  • Define the flow between the end points.

For some this might seem to be the whole business of designing an ETL package. This is where we list what we are extracting, where we are loading it, and what transformations have to take place for that load to be successful. Sometimes this will involve simply copying data to a staging system and triggering a stored procedure to load transactional data into a warehouse; this approach is sometimes called ELT. Other times we are building ETL in support of data integration and need to match data from multiple source systems to get a single version of it. Sometimes we are extracting data from a system to distribute it to multiple consumers.
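
As a rough illustration of the ELT variant described above, a destination-side load procedure might look something like this minimal sketch; the staging and warehouse table names are hypothetical:

    -- ELT sketch: data has already been copied into a staging table;
    -- the load into the warehouse is done set-based on the destination server.
    -- Table and column names are hypothetical.
    CREATE PROCEDURE dbo.LoadFactSales
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.FactSales (OrderDateKey, CustomerKey, Quantity, Amount)
        SELECT d.DateKey,
               c.CustomerKey,
               s.Quantity,
               s.Amount
        FROM   staging.Sales   AS s
        JOIN   dbo.DimDate     AS d ON d.FullDate   = CAST(s.OrderDate AS date)
        JOIN   dbo.DimCustomer AS c ON c.CustomerBK = s.CustomerId;

        TRUNCATE TABLE staging.Sales;   -- staging is temporary by design
    END;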

  • List the source(s).

Our sources could be any number of things. They may be database servers, spreadsheets, flat files, WMI providers, web services, XML files, or any sort of custom source. Databases could be from multiple vendors and can require specific drivers and permissions. The account that the ETL runs as will need read permissions to extract data (not to mention connection permissions). It is important to get details about the sources up front to avoid a lot of back and forth getting connections to work.
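
For example, granting a hypothetical ETL service account connect and read permissions on a SQL Server source might look like this sketch (the login and database names are placeholders):

    -- Grant the ETL service account connect and read permissions on a source database.
    -- The login name and database are hypothetical.
    USE SourceDb;

    CREATE LOGIN [CORP\svc_etl] FROM WINDOWS;            -- server-level connection permission
    CREATE USER  [CORP\svc_etl] FOR LOGIN [CORP\svc_etl];
    ALTER ROLE db_datareader ADD MEMBER [CORP\svc_etl];  -- read permission on all tables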

  • List the destination(s).

Like our sources, destinations could be any number of the same types of things. There are also additional destinations specific to ETL tools such as SSIS; SSIS has destinations related to SSAS and data warehouse dimensions. Knowing the destinations and their capabilities can help to make transformation choices less burdensome.

  • List the transformation(s).

Different sources and destinations will have different specific implementations of data types. They may likewise have differing code pages and collations. These sorts of transformations may be necessary even when loading to a staging environment. Other transformations can include matching data between separate sources, decoding industry codes, extracting nested data, adding audit and lineage data, or capturing statistics about data flowing between systems. One of the more common types of transformation will be the schema transformations necessary to load transaction data into a warehouse form.
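
A minimal sketch of some of the transformations described here, folded into a single load statement; the tables, columns, and collation are illustrative assumptions:

    -- Illustrative transformations during a load: explicit type conversion,
    -- collation matching, and added audit/lineage columns. Names are hypothetical.
    INSERT INTO dbo.Customer (CustomerBK, CustomerName, CreditLimit, LoadedAt, SourceSystem)
    SELECT CAST(src.cust_id AS int),
           src.cust_name COLLATE SQL_Latin1_General_CP1_CI_AS,  -- align collations
           TRY_CONVERT(decimal(18, 2), src.credit_limit),       -- guard against bad numerics
           SYSUTCDATETIME(),                                    -- audit: when it was loaded
           'ERP'                                                 -- lineage: where it came from
    FROM   staging.Customer AS src;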

  • Define the environmental parameters.

The environmental parameters address more specific issues about the source and destination as well as performance requirements and scheduling. It is good to list these items, even those which may not make it into an initial design, because they expose options and show the due diligence undertaken in selecting techniques from those available.

    1. List the preparation(s).

Does a process have to execute to generate the data? The account the ETL package is running as will need execute permissions, or will need to be called by something that executes the data-generating process. Is there a list that can be used to limit operations, or variables that must be set, before proceeding to the data flow?

    2. List the cleanup(s).

Do connections need to be closed? Does a completion message need to be logged or sent to a receiver? Is there another process that should be started, as in the case of an ELT-style load which should trigger stored procedures on the destination side?

    3. List the logging.

To what extent should the ETL package perform its own logging? Are there custom events for the package that need to be logged? Does the ETL controller log for the package? Should the logging be pushed to the system logs or to custom ones?
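
If the package does some of its own logging, one simple approach is a custom log table along these lines (the structure and names are illustrative, not a prescribed design):

    -- A minimal custom logging table and entry, if the package logs for itself.
    CREATE TABLE dbo.PackageLog (
        LogId       int IDENTITY(1,1) PRIMARY KEY,
        PackageName sysname        NOT NULL,
        EventType   varchar(20)    NOT NULL,   -- e.g. Start, RowCount, Error, Complete
        EventDetail nvarchar(2000) NULL,
        LoggedAt    datetime2(3)   NOT NULL DEFAULT SYSUTCDATETIME()
    );

    INSERT INTO dbo.PackageLog (PackageName, EventType, EventDetail)
    VALUES (N'LoadFactSales', 'Start', N'Nightly run');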

    4. List the error handling.

How does the package handle errors? If the volume of data is large, the load should be discretized to enable rolling back to an intermediary state rather than all the way to the start. Should the package be broken into several packages that can be run on separate servers (e.g. the source system has an extractor and the destination a loader), how should communication of the errors be handled?
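
One way to discretize a large load so that a failure only rolls back the current batch is a batched loop like this sketch; the table names, matching table shapes, and batch size are assumptions:

    -- Discretized load: commit in batches so a failure rolls back only the current
    -- batch rather than the whole run. Names and the batch size are illustrative.
    DECLARE @BatchSize int = 50000,
            @RowsMoved int = 1;

    WHILE @RowsMoved > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;

            DELETE TOP (@BatchSize) src
            OUTPUT deleted.* INTO dbo.TargetTable   -- assumes matching column layout
            FROM staging.SourceTable AS src;

            SET @RowsMoved = @@ROWCOUNT;
            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            THROW;   -- surface the error to the caller / ETL controller
        END CATCH;
    END;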

    5. List the configuration(s)/environment(s).

While developing the package, what systems will be used? How will this change when moving the package to the testing and production environments? What variables and package elements will be affected by this move?

    6. List the relevant infrastructure element(s).

This can be broken down into several subordinate questions about the physical locations of the source and destination systems, as well as those systems themselves. In many respects this is a list of opportunities and conjectures related to performance enhancement. Listing these items can help to identify major issues to work around up front, but it is often best treated as a place to revisit ideas for performance after the ETL is actually working.

      1. Same server

Sometimes ETL is a matter of moving some data around on the same server. It still pays to be aware of what disks the files are residing on and how they are formatted, as well as what kind of IO load can be expected.

      2. Server to Server (with ETL controller)

Whether or not there is an ETL controller can determine how a package is run. A separate ETL controller will have its own scheduling system and will not run against limits the source or destination servers might have owing to their own scheduled operations. It might be desirable to have the source and destination each handle their own scheduling and operate asynchronously (i.e. the source system outputs to a file on the ETL server, which is read by the destination server when resources become available on the destination).

      3. Server to Multi-server (with ETL controller)

As with a server-to-server ETL, a broadcast-type ETL might benefit from a separation of the extract and load stages. It might also benefit from a tee-like operation, where the flow between the source and at least one destination runs directly, but a copy is put down to a share that can be picked up by asynchronous destinations.

      4. Multi-site (with SAN Replication)

When the source and destination server(s) are located in separate buildings, understanding the WAN dynamics can help in avoiding frustrations. There are also opportunities with asynchronous ETL where SAN replication is involved (i.e. we can have an extract write to the SAN and a load monitor its local SAN replica for the file and load it when it arrives). Bandwidth restrictions can cause SLA headaches, and it is best to note these things up front rather than fail when exceeding the limitations of the network. Even with fiber between the sites there are still going to be routers and switches, which are electronic and therefore not moving at the speed of light (and even light is not instantaneous transmission). ETL is also not the only traffic, and your network admin can help you understand how much can be used, if only by getting you some data to ETL so you can analyze it.

    7. Scheduling

How often is the ETL going to run? What other processes will be run that can interfere with it? Does it have to run 24/7 or are there windows of outage? How and when are updates to the source system going to be tested for their impact on the ETL?

    8. Synchronicity

How much delay between source system and destination system changes is tolerable (ideally and realistically, weighing cost against benefit)? If this is low, can a subset of the ETL’d data be considered higher priority (i.e. if certain columns in a table contribute to an aggregate, shouldn’t those arrive first, followed by the drill-down detail data)?

  • Define the data quality parameters.
    1. Encoding

What are the meanings of numbers and letters in fields? How will they be stored on the destination side?

    2. Standards

Are there organizations that govern any aspect of the data and how it is to be represented for the industry (e.g. ICD9, ICD10, IATA, NAIC, etc.)?

    3. Code Page

What code page translation issues must be specifically handled (e.g. squares or question marks for unknown characters)?

    4. Data Types

What data types are supported by the destinations? What are the default mappings to the source data types? What will be the process for changing a data type or its scope for either the source or destination?

    5. Traceability (Data Lineage)

What source system data will be retained as evidence of the who, what, where, when, and why of the source system data? What data will be added in ETL as evidence of the who, what, where, when, and why of the load?
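
One common (illustrative) way to capture that lineage is a load batch table referenced from every loaded row; the structure below is a sketch, not a standard:

    -- A load batch table plus a batch key on each loaded row, so every row can be
    -- traced back to a specific extract. Names and types are illustrative.
    CREATE TABLE dbo.LoadBatch (
        LoadBatchId    int IDENTITY(1,1) PRIMARY KEY,
        SourceSystem   varchar(50)  NOT NULL,                            -- where the data came from
        ExtractedAtUtc datetime2(0) NOT NULL,                            -- when it was pulled from the source
        LoadedAtUtc    datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME(),   -- when it was loaded
        LoadedBy       sysname      NOT NULL DEFAULT SUSER_SNAME(),      -- who ran the load
        RowsLoaded     int          NULL
    );

    -- Each destination table then carries a LoadBatchId column referencing dbo.LoadBatch.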

Dataflow, BCP, ETL, Replication, et al.

There are a lot of techniques for getting data from one system to another. Each of these will have advantages and restrictions depending on the parameters of the data flow. Some techniques are specifically for disaster recovery and high availability, many more are for distributing and restructuring data for specific uses such as reporting and analytics. Some are just for loading data.

Disaster recovery type data flows include such techniques as log shipping, Always On Availability Groups, Peer to Peer Replication, and even automated restores of backups. Each of these involves making data available to an additional server (or site) which will take over in the event of a failure on the primary. Some are more automatic than others. Some have smaller windows of data loss. As a rule, the more automatic the failover and the smaller the potential data loss, the more expensive the solution will be. We can refer to both of these with the metrics time to recover and time unrecovered.

Distribution type data flows include such techniques as Extract Transform Load (ETL) and replication (transactional, merge, snapshot) as well as techniques that don’t necessarily move the data but make it available remotely such as Linked Servers. Some of the techniques used in disaster recovery can be used to balance read loads and thereby also serve the purpose of distribution (e.g. Always On Availability Groups and Peer to Peer Replication).

Some data flows, such as BCP, only serve to load data into the database from a non-database source. ETL technologies can also facilitate this. We can also pull data into the database using OPENROWSET, OPENXML, OPEN…
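
For illustration, pulling a flat file into a table from T-SQL might look like either of these sketches; the file paths and table names are placeholders:

    -- Two sketched ways to pull a file into the database from T-SQL.
    BULK INSERT dbo.StagingCustomer
    FROM 'C:\ImportData\customers.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- Read a whole file as a single blob (e.g. an XML document) via OPENROWSET.
    SELECT doc.BulkColumn
    FROM OPENROWSET(BULK 'C:\ImportData\customers.xml', SINGLE_BLOB) AS doc;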

There are also service and message based actions which while capable of transmitting data are better utilized to start, stop, or simulate data flows. Change data capture and change tracking can be used to detect the need for a data flow to be initiated. Data flows in turn can be initiated from a source, destination or intermediary. Likewise these can be scheduled or on demand.

The long and short of it is this: The lines between the servers on your enterprise application diagrams shouldn’t be neglected. There is more there than just the network connection. This is why as architects we will speak of a logical and conceptual model as well as a physical model and implementation plan. How we get from point A to point B is an important decision to be made after we’ve decided to go between them.

Disaster Recovery Metrics

Metrics are important. By measuring different aspects of our systems’ performance we can better understand and rationalize the costs and benefits associated with different solutions. Some disaster recovery solutions can be prohibitively expensive. Others can leave you exposed to loss. Understanding a few key metrics can help you determine which risks are cost effective to endure and which ones it would be wasteful to endure.

Some of the most important metrics for (database) disaster recovery are Mean Time Between Failure (MTBF), Mean Time to Recovery (MTTR), and Mean Time Unrecovered (MTU). The first two are generic to business and information systems.

Given the dependent nature of the business and its systems, the MTBF and MTTR will be measured at several levels. The business MTTR will tend to be longer than the application MTTR, which will be longer than the database MTTR, which will be longer than the server/network MTTR. This is because one cannot work without the systems it is dependent on. An organization may have multiple systems and may be adopting strategies supportive of so-called “graceful failure”, which might cause us to use MTTTR (mean time to total recovery) and a collection of MTTRs specific to functionality, descending from the business operations they support.

MTBF in a scenario with a large number of specific failure types needs to be properly defined as a policy, to help prevent unnecessary strife over minor failures. In other words, if we call it mean time between total failures but define it as any failure, something like an automatic acknowledgement of customer messages ceasing to work could easily be misconstrued as a “total” failure. It can be useful, when defining that policy, to adopt a system of failure levels.

This level of abstraction allows the business stakeholder to articulate statements that can be clarified as to their specifics over several iterations. For example:

  • Level 1 Failures are failures of systems that support auxiliary business functions.
  • Level 2 failures are failures of systems that support important time-insensitive business functions.
  • Level 3 failures are failures of systems that support critical time-insensitive business functions.
  • Level 4 failures are failures of systems that support important time-sensitive business functions.
  • Level 5 failures are failures of systems that support critical time-sensitive business functions.

In turn the organization can designate a liaison to interview business stakeholders to determine which business functions fit into each category. Systems designers can review these specifications to determine which systems then fit into each category and prescribe appropriate technologies for meeting these needs. Project management can then determine budgets and submit them for review; ideally the project managers will mark some functionality as candidates for demotion and catch any for promotion, as the strategic needs of the organization require (sometimes people performing important business functions won’t know how important that functionality is, perhaps owing to the availability of personnel capable of performing it).

The third metric measures lost work, not the lost opportunity for work which the MTTR measures. I use mean time unrecovered because I believe it succinctly communicates what is being measured. The acronym MTU is most commonly used in networking to describe the maximum transmission unit, which might be more aptly called maximum packet size, but I digress. Each organization is going to have its own collection of acronyms. Ideally these will have both the property of being widely accepted and of aptly describing what they represent. Ultimately, this metric measures how much data loss is incurred in the event of a failure, in units of time. This can be estimated based on average bytes (or transactions) per unit time, which in turn can be used to estimate the dollar value of that loss based on average dollars per byte (or transaction).
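
A back-of-the-envelope version of that estimate, with placeholder figures rather than measurements, might look like this:

    -- Rough estimate of the cost of unrecovered work. All figures are placeholders.
    DECLARE @AvgTransactionsPerHour    decimal(18,2) = 1200,
            @AvgDollarsPerTransaction  decimal(18,2) = 35.00,
            @MeanTimeUnrecoveredHours  decimal(18,2) = 0.25;   -- e.g. 15 minutes of lost work

    SELECT @AvgTransactionsPerHour
         * @AvgDollarsPerTransaction
         * @MeanTimeUnrecoveredHours AS EstimatedLossDollars;  -- 1200 * 35 * 0.25 = 10,500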

Cloud, On-premise, or Hybrid Architecture?

The answer to this question, like so many other technical and infrastructure questions, is a resounding “it depends”. Whether to use a pure cloud-based solution, an on-premise solution, or both will depend on where your organization, or more specifically the systems you are making decisions about, are in their development. The cloud is great, to be sure. If you are a startup, my reflex would be to advise you to take it to the cloud.

The best aspects of the cloud are twofold. With a solution like SQL Azure your costs are based on usage. Cloud solutions also have greatly simplified high availability. There are drawbacks. Depending on your provider and how you engage them, a cloud solution can be complicated. Any approach you choose has the risk of becoming overly complicated.

An on-premise solution has two primary advantages. The first is one you want to discuss with your accountant or CFO: the hardware is a capital expenditure and (as I understand it) can be depreciated. The second advantage is in most respects the polar opposite of usage-based costs. With an on-premise solution the variability of your operating costs is limited. That is to say, with an on-premise solution only your power (and occasionally personnel) costs will fluctuate. This simplifies forecasting and budgeting.

The advantages of a hybrid solution will depend on how and why you implement it. One approach that I have seen touted is using the cloud as a disaster recovery secondary or just as storage for backups. Another approach is to use the cloud to provision for usage above on-premise capabilities, whether that be locality (for availability or load balancing), processor, memory, or storage. In short, the advantages of a hybrid solution are the advantages of the cloud in addition to the advantages of on-premise.

As stated earlier I would advise start-ups to use the cloud, if only to determine their needs in planning for an on-premise solution. This same advice is also worth considering for an organization that is growing or working on projects that are either temporary or exploratory in nature.

The Cloud Simplified

At some point, owing to the typically lower unit cost of an on-premise solution (SQL Azure bills in Database Transaction Units; a good consultant can help you get a handle on your capacity in those units), it will make sense to transition from cloud to on-premise, once the organization consumes those units in sufficient quantities to justify unit cost as the rationale for infrastructure decisions. As your organization approaches the limits of its on-premise solution, the demand above those limits returns it to a state similar to a startup (at least in terms of unit consumption relative to the cost of unused units). With adequate planning and analysis, your organization can make the best use of its technology budget by bridging the scale-up to its next on-premise solution with cloud services.

Let’s Talk About Dataflow Architecture

This morning I (Thomas Marshall) put together this drawing to facilitate a discussion on generic high level aspects of data architecture.

 

2015-1-9 Conceptual Generic High Level Data Flow

  • Application
    • These will be the primary business applications.
  • OLTP Data store
    • These are the databases behind the applications used by your production employees, sales, and customer service people. Some may be purely transactional, some will be more hybrid, and others will tend to be special purpose. Not all of these will have a relational database backend. Deciding what to include in this category of server is an important step, as these are the systems we will be bringing together to give you a more comprehensive picture of your organization’s relationship with its industry, customers, vendors, profits, sales, and competition.
  • External Data Source
    • External data sources will include government data, data from credit bureaus, industry organizations, even vendors and customers. This data will need to be processed both to fit it to your system and to clean it to prevent it containing an attack payload. This data is often important in predictive analytics. Knowing things like how market conditions affect customer credit ratings, and being able to correlate that to marketing campaign success or customer loyalty, can help a business turn crisis into opportunity.
  • Integration Staging Data store
    • The integration staging data store will serve as an intermediary and temporary store for the data that comes in from multiple sources. This can also be a staging source for transforming the OLTP data to the warehouse model. The reason for putting a staging store in has to do with contention for resources as well as the timing of delivery and validation of external data. The OLTP system will tend to be tied directly to revenue, and we will want to minimize our time reading and processing data on the OLTP server. External data will be out of an organization’s control and its model can change without notice (especially when it’s free data).
  • Integration ETL
    • Integration ETL extracts data from transactional (application) and non-transactional source systems (industry publications, government and vendor supplied data). In larger organizations this might also include data stores for enterprise monitoring, service tickets, and asset management software which can help when it comes to generating reports on IT operations.
  • Data Warehouse
    • Many people see the data warehouse as synonymous with a star schema. It will actually tend to be a constellation of stars and snowflakes, as well as more specifically purpose-oriented models. That is to say that the stars are generally optimal for analyzing data over specific dimensions (e.g. time), but some data can be included without transformation (this is part of the impetus for big data). The primary purpose of the data warehouse is to serve as a hub for distributing data throughout an organization. It will ideally be tended by data stewards applying techniques to deduplicate and standardize the data. (A minimal dimensional sketch follows after this list.)
  • Dissemination ETL
    • This ETL process transforms the more generic read friendly model of the data warehouse into forms more specific to the consumers. These forms will be specific to reports and analytic structures. It will also tend to produce small reports about the delta on the warehouse.
  • Report Data store
    • Depending on the scale of the organization, the report data store can reside on its own database instance, on several instances, or even on the same server as the data warehouse. Initially an organization will begin by running its reports against its OLTP server, and later against another server for load balancing. It might call this second server (or group of servers) a replica, data warehouse, data mart, or reporting server. This data store will evolve over time in the interest of report performance, and will do so along with the ETL that feeds it. As the reports will tend to have been written against the OLTP server, they will have complicated joins that reflect a system modeled for insert and update efficiency rather than read efficiency. As the ETL starts to fill a model specific to the reports, the report logic will tend to become simpler and the reports will subsequently run faster.
  • Descriptive reports
    • Descriptive reports are the standard reports most business stakeholders are familiar with. These tell the organization where it has been, what it has done, how many sales, how much money, and where other key performance indicators sit.
  • Analytic Data store
    • The analytic data store is an important component of predictive analytics. In the Microsoft BI stack this will be your SQL Server Analysis Services server(s), tabular or cube. The analytic data store will have the results of calculations stored at several levels of aggregation, which helps to speed the returning of results, as those calculations aren’t performed for each request.
  • Predictive reports
    • These reports will tend to be the most complex to create. They will have underlying data mining projects which generate the trends, clusters, and lift metrics which help your organization to determine future trends. Some of these will need to be designed to match up with descriptive reports. These will help to produce trend lines. Others won’t match up as they will be making predictions that contribute to the assumptions that contribute to those trend lines, or be used to create marketing plans.
  • Production reports
    • These reports will tend to be simple and serve to alert business stakeholders to issues with the dataflow. They can also alert to reasons for delays in processing, and subsequently in the descriptive and predictive reports. They also provide talking points and serve to delineate or codify the aspects of the dataflow that the stakeholders are concerned with.
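
To make the dimensional vocabulary above concrete, here is a minimal star-schema sketch of the kind of structure the dissemination ETL would feed; all names are hypothetical:

    -- A minimal star: one fact table with foreign keys to two dimensions.
    CREATE TABLE dbo.DimDate (
        DateKey  int      PRIMARY KEY,   -- e.g. 20150109
        FullDate date     NOT NULL,
        [Year]   smallint NOT NULL,
        [Month]  tinyint  NOT NULL
    );

    CREATE TABLE dbo.DimCustomer (
        CustomerKey  int IDENTITY(1,1) PRIMARY KEY,
        CustomerBK   varchar(20)   NOT NULL,   -- business key from the source system
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.FactSales (
        OrderDateKey int NOT NULL REFERENCES dbo.DimDate (DateKey),
        CustomerKey  int NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        Quantity     int           NOT NULL,
        Amount       decimal(18,2) NOT NULL
    );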

Have you checked your disk recently?

Jimmy May and Denny Lee produced a great white paper for SQL Server 2008, entitled “Disk Partition Alignment Best Practices for SQL Server”. The paper covers a number of recommendations for setting up storage for SQL Server. It can be a pretty heady read for both SQL Server database administrators and SAN administrators. One thing I have taken away from it time and again, in spite of it giving me pause to question whether or not I know the first thing about disks, is that I want to put database files on a disk with 64 KB allocation units. The reason for this is that SQL Server stores data in collections of eight 8 KB pages called extents.

What this translates to is dramatically fewer input/output operations (IOPs). The default allocation unit size for Windows disks under 16 TB is 4 KB. This is half the size of a data page, which means that two read or write operations are needed for each page read or write. When reading ranges of data, SQL Server will look for the extents that contain the pages the indexes point to (depending on whether they are clustered indexes). If we assume (for simplicity) that each row you expect to read in a range is one page wide, changing your disk allocation unit to 64 KB will cause your queries to use as little as 1/16th as much IO. When reading thousands of rows, that change has the potential to improve performance more than switching to solid state disks (of course you can configure the allocation units on an SSD and realize the same sort of performance multiplier).
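
For a rough sense of how many pages and extents your tables occupy, a query along these lines can help (approximate, since it lumps all allocation unit types together; the arithmetic assumes eight 8 KB pages per extent):

    -- Approximate pages and extents per table (an extent is eight 8 KB pages).
    SELECT t.name                           AS table_name,
           SUM(au.total_pages)              AS total_pages,
           SUM(au.total_pages) / 8          AS approx_extents,
           SUM(au.total_pages) * 8 / 1024.0 AS size_mb
    FROM   sys.tables           AS t
    JOIN   sys.partitions       AS p  ON p.object_id     = t.object_id
    JOIN   sys.allocation_units AS au ON au.container_id = p.partition_id
    GROUP BY t.name
    ORDER BY SUM(au.total_pages) DESC;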

So how do you know what your disk allocation units size is? Go to the start menu. Type cmd in the search box. At the command line type chkdsk and hit enter. In a few moments you should see something like this:

Check Disk

4096. That’s 4 KB. I am not getting the most from my database disks. It looks like I should have my allocation units reconfigured.

Database, Server or Query Optimization?

What can you do to get better performance from your database?

There are a number of things that can be done to improve database performance. Some of these involve changing configurations or creating supporting objects in your database and on your server. Others involve changing queries or when you run them. Some are relatively easy and others can be quite involved.

Rewriting a query, or several of them, is sometimes the easiest approach to conceptualize, but often there are numerous reasons not to. The queries might be built into an application that is not easily changed. There may be a lack of commitment from the owners of a report to sign off on changes. Creating indexes and statistics will often appear to be the most direct route to performance gains. There is of course a caveat: your indexing strategy for one query might negatively impact other queries against those tables. For example, should a new index save a minute of user time on a long-running query but cost a more frequently run query 5 seconds, the net result would be diminished overall performance should the improved query run more than 12 times less frequently than the negatively impacted one. Of course, from the point of view of the business, the consequences of the delays will likely have differing magnitudes. It might be worth slowing down an update to a display board to get answers that help close a deal.
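
As an illustration, here is a covering index aimed at one range query, together with the break-even arithmetic from the example above; the table and column names are hypothetical:

    -- A covering index aimed at a specific range query. Whether it is worth creating
    -- depends on how often the helped and hurt queries run.
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerId, TotalAmount);

    -- Break-even sketch: the index saves 60 s per run of the slow report but adds
    -- 5 s per run of a frequent update. It pays off only while
    --   60 * (report runs) > 5 * (update runs),
    -- i.e. the update runs fewer than 12 times for every report run.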

The best approach to optimizing your database performance will depend on your business needs. A good consultant will help you determine how database performance translates into satisfying those needs. They might even be aware that the volume of questions needed to optimally optimize your database might not fit your need for a database performance solution you can live with. A good consultant will be a good listener. Tell them about your database needs.

What business reason would you give for improving database performance?

There are numerous reasons an organization would want to improve database performance. While it is easy to write some of it off as less necessary than patience, that is certainly the exception rather than the rule. Your business can at most operate 24 hours per day. It can add capacity during its working hours by hiring more people, but in database terms this can lead to locking and blocking as more users try to work on the same data. This impacts those employees in that they have to wait, which can impact revenue in a number of ways. If your customers have to wait, or your employees are frustrated by the wait, this can drive customers away.

A reflex many organizations have when going through growth is to simply add hardware. This works to an extent, but can be prohibitive as more expensive licenses are often required to make use of more processors and memory. It also has its limits. Unmanaged, a database server will allocate the resources it has dynamically. Given more resources, it will allow queries to use more. Inefficient joins and unnecessary table scans can quickly fill up memory and lead to rampant parallelization, consuming all of the processor cores, with a larger percentage of time spent managing multi-processor operations. On an 8 processor/core machine this might appear as approximately 50% average utilization, but with an edition of SQL Server that has a 4 core maximum it is effectively 100%, as the software won’t allow full utilization of more than 4 processors at any given moment, though it might change which processors it is using from moment to moment.
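
One common way to rein in runaway parallelization is to cap it, either instance-wide or per query; the value of 4 below is illustrative, not a recommendation, and the query names are hypothetical:

    -- Cap parallelism instance-wide.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism', 4;
    RECONFIGURE;

    -- Or per query, with a hint.
    SELECT CustomerId, SUM(TotalAmount) AS Total
    FROM   dbo.Orders
    GROUP BY CustomerId
    OPTION (MAXDOP 4);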

What is Structured Query Language (SQL)?

Writing SQL is supposed to be easy. The Structured Query Language (SQL) is what is known as a 4th generation programming language. The SQL standard is maintained by ISO and ANSI. It comes in a number of varieties specific to its numerous implementations. Some variations include T-SQL from Microsoft, PL/SQL from Oracle, DB2 SQL from IBM, MySQL (most recently) from Oracle, and PostgreSQL. There are other languages which are derived from SQL. WQL (WMI Query Language), YQL (Yahoo Query Language), and FQL (Facebook Query Language) come to mind.

The reason these languages are supposed to be easy (at least at a more basic level) is that they are designed to achieve a finite number of operations on large sets of related data in a way that saves the user from devising or implementing algorithms for iterating, recursing, or relating those sets. Most implementations have several algorithms for the underlying member-wise operations and decide which of them to use to satisfy the high level request, ideally with the lowest cost plan. These implementations use what is known as cost based optimization.

These high level operations can be broadly classified as data manipulation and data definition operations. Each implementation will tend to extend to the fuzzy edges of these categories and beyond. Many implementations provide some sort of control-of-flow language and error handling capabilities, as well as facilities for automation and scheduling.

The data manipulation language (DML) of SQL involves 4 basic types of operations. These are sometimes referred to by the acronym CRUD (Create, Read, Update, Delete), which lends itself to an acronym better than the language elements INSERT, SELECT, UPDATE, and DELETE (CREATE is actually a DDL command). Another acronym that describes an important DML property is ACID, which stands for Atomicity, Consistency, Isolation, and Durability. Sometimes an S is added for Serializability. ACID(S) refers to the transactional nature of SQL. In fact, the T in T-SQL stands for Transact.

The transactional nature of SQL also applies to Data Definition Language (DDL) operations. These are often referred to as schema operations, especially among T-SQL users. A transaction is an all-or-nothing operation that traces its origin, via generally accepted accounting practices, to the Renaissance and Luca Pacioli. Pacioli is credited with the invention of double entry bookkeeping. This in turn was foundational in the development of the ACID compliance requirements of the SQL specification.

In double entry, as the name suggests, debits and credits are entered twice (once into each of two separate ledgers). Until a transaction has been entered in both places the transaction is not completely entered. Database transactions can involve quite a bit more than two entries, depending on the business logic. ACID helps to assure that all of the steps in an operation are completed before the operation and its effects take hold.
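
A minimal T-SQL sketch of that all-or-nothing behavior, assuming a hypothetical two-sided ledger table:

    -- Both ledger entries succeed or neither does; names and amounts are illustrative.
    BEGIN TRY
        BEGIN TRANSACTION;

        INSERT INTO dbo.Ledger (AccountId, Debit, Credit) VALUES (1001, 500.00,   NULL);
        INSERT INTO dbo.Ledger (AccountId, Debit, Credit) VALUES (2002,   NULL, 500.00);

        COMMIT TRANSACTION;   -- durable from this point on
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;  -- atomicity: no half-entered transaction
        THROW;
    END CATCH;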

Atomicity refers to the property of a transaction such that it operates as a complete operation or not at all. Consistency refers to the property that no in-between state exists between the start and end states. Isolation introduces the world of locking by requiring that the atomic and consistent transaction not affect, or be affected by, other transactions. This is also part of the reason some people don’t think the S is necessary: isolation (or so the argument goes) necessitates serializability, since isolated transactions cannot affect one another. Others counter that when transactions are not contending for specific data, concurrent transactions can be effectively simultaneous and would still need to be sequenced. Much of this is academic (at least until it causes performance or data problems), of course. Durability refers to the fact that a transaction cannot be rolled back after it has been committed; that is to say, once the SQL finishes, its effects are moved to non-volatile storage (disk).

Easy, right? Do some selects, inserts, updates, and deletes. No problem, until there is one. What’s great is that what worked until it didn’t (usually) expresses clearly what your intentions were. This can make solving those problems a simple matter of contacting the right expert and telling them there is a problem. The right expert will find your top problems and help you formulate a plan for addressing them.