Solving the Monty Hall Problem using Bayes Theorem

The ‘Monty Hall’ problem is best described by the wikipedia entry:

The Monty Hall problem is a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall. The problem was originally posed (and solved) in a letter by Steve Selvin to the American Statistician in 1975. It became famous as a question from reader Craig F. Whitaker’s letter quoted in Marilyn vos Savant‘s “Ask Marilyn” column in Parade magazine in 1990.

Suppose you’re on a game show, and you’re given the choice of three doors A, B and C.

Behind one door is a car; behind the other two are goats. You pick a door, say door A, and the host, who knows what’s behind the doors, opens another door, say door B, which has a goat. He then says to you, “Do you want to change your choice to door C?”

Is it to your advantage to switch? (It is!)

Many readers of vos Savant’s column refused to believe switching is beneficial and rejected her explanation. After the problem appeared in Parade, approximately 10,000 readers, including nearly 1,000 with PhDs, wrote to the magazine, most of them calling vos Savant wrong.[4] Even when given explanations, simulations, and formal mathematical proofs, many people still did not accept that switching is the best strategy.[5]Paul Erdős, one of the most prolific mathematicians in history, remained unconvinced until he was shown a computer simulation demonstrating vos Savant’s predicted result.[6]

First we need to define some notation:

A, B – events
P(A) – the probability of event A occurring
P(B) – the probability of event B occurring
P(A | B) – the probability of event A occurring, given that event B has already occurred
P(B | A) – the probability of event B occurring, given that event A has already occurred

Bayes Theorem is defined as:

The following reasoning is from Julian Havil’s book “Impossible?”

Assign symbols to the events:

A – the event “car is behind door A”
B – the event “car is behind door B”
C – the event “car is behind door C”
MA – the event “Monty opens door A” … similarly for MB , MC

Assume door A is chosen initially by the player, so Monty can open door B or C:

P(MB | A) = ½ ,   P(MB | B) = 0,   P(MB | C) = 1

So, since A, B and C are mutually exclusive events:

P(MB) = P(MB | A)P(A) + P(MB | B)P(B) + P(MB | C)P(C) = ½ x ⅓ + 0 x ⅓ + 1 x ⅓ = ½ 

Now, the player can stick or change. If they stick with door A, their probability of winning the car is:

P(A | MB) = P(MB | A)P(A) / P(MB) = (½ x ⅓) / ½ = 

If they switch to door C, their probability of winning the car is:

P(C | MB) = P(MB | C)P(C) / P(MB) = (1 x ⅓) / ½ =

Performance Improvements in .NET 6

There have been a large number of Performance Improvements in .NET 6 as evidenced in Stephen Toub’s blog post. Most of the time we don’t care about assembly level performance optimisations because the bottleneck is usually accessing some external resource, such as a database or web service.

If you need to benchmark .NET code take a look at a great tool, BenchmarkDotNet and also take a look at the book referenced there, Pro .NET Benchmarking as getting benchmarking correct can sometimes be quite tricky.

If you’re not yet on .NET 6, and you have code in a large loop that you want to squeeze some performance out of it (and it won’t make the code hard to understand and maintain!) here are a couple of simple tips:

  • Testing if n is even: replace (n % 2 == 0) with ((n & 1) == 0)
  • Dividing by 2: replace n / 2 by n >> 1 (but be aware of the unsigned / signed behaviour of right shift)

LINQPad script to Generate SQL Server Database Restore Script from Ola Hallengren’s Backup Solution

Unless you perform regular restores of your database backups, you don’t know that you actually have a valid backup. In a career spanning over 30 years, I’ve seen two occasions where a company was performing backups (or so they thought!) and sending tapes offsite, assuming they were good when in fact the tapes were blank!

The majority of SQL Server installations use Ola Hallengren’s maintenance solution (and certainly all the ones I’ve had anything to do with).

If you are doing regular (5 minutes or less) transaction log backups, a restore might involve applying quite a few transaction logs.

I’ve written a short LINQPad script here which will generate the TSQL to perform a database restore either from a point in time or the latest available, based upon the default locations and naming conventions used by Ola’s backups. It’s Differential backup aware, as well as creating the multiple Transaction Log restore statements. It’s also takes into account where backups are split into separate backup files (which is quite common). You specify the server name, the database name, the root folder where the backups are stored, and either a point in time or the latest.

Disclaimer: Use at your own risk AND test thoroughly!

Example output:

USE [master]

RESTORE DATABASE [AdventureWorks] FROM 
   DISK = N'C:\temp\Backup\K7\AdventureWorks\FULL\K7_AdventureWorks_FULL_20211118_151558.bak'
 WITH NORECOVERY, REPLACE

RESTORE DATABASE [AdventureWorks] FROM 
   DISK = N'C:\temp\Backup\K7\AdventureWorks\DIFF\K7_AdventureWorks_DIFF_20211118_152101.bak'
 WITH NORECOVERY

RESTORE DATABASE [AdventureWorks] FROM 
   DISK = N'C:\temp\Backup\K7\AdventureWorks\LOG\K7_AdventureWorks_LOG_20211118_152226.trn'
 WITH NORECOVERY, STOPAT = '2021-11-21 17:07:22'

RESTORE DATABASE [AdventureWorks] WITH RECOVERY

.NET: Disable Insecure TLS protocols

TLS1.1 and TLS1.0 (and lower) protocols are insecure and should no longer be used.

For .NET 4.7 or later, you do not need to set System.Net.ServicePointManager.SecurityProtocol. The default value (SecurityProtocolType.SystemDefault) allows the operating system to use whatever versions it has been configured for, including any new versions that may not have existed at the time your application was created.

If you want to explicitly code this in .NET, rather than specify the allowed protocols, disable the disallowed protocols before making any connections:

// TLS must be 1.2 or greater. Disable SSL3, TLS1.0 and TLS1.1 [Note: this is the default behaviour for .NET 4.7 or later] 
ServicePointManager.SecurityProtocol &= (~SecurityProtocolType.Ssl3 & ~SecurityProtocolType.Tls & ~SecurityProtocolType.Tls11);


Tuning SQL Server Queries 101

First, get an actual query execution plan. Look for warnings in the query plan.

Basic

  • First look for large scans and lookups: these can often be resolved by creating a new index or extending an existing one with additional included columns. Seeks are usually preferable to scans.
  • Then look for significant variance of Actual versus Estimated row counts: you can provide the optimiser with more accurate information by updating statistics, creating new statistics objects, adding statistics on computed columns, or by breaking the query up into simpler parts. Might be caused by ‘parameter sniffing’.
  • Then look for expensive operators in the query plan, especially those that consume memory such as sorts and hashes. Sorting can sometimes be avoided by altering/adding indexes.

More Advanced

Joe Sack has an excellent walkthrough here: The Case of the Cardinality Estimate Red Herring

SQLFrontline Freemium Edition

I’ve recently changed what I was calling the ‘Demo’ version of SQLFrontline, to a ‘Freemium’ model. The demo version only displayed one recommendation result in each of the four severity categories (Critical, High, Medium, Info).

The free version does not include all the features of the paid premium version obviously, but still provides some useful recommendations, providing advice on 40 checks.

Both use the same lightweight metadata collection.

The Free Version:

  • Performs 40 checks (out of 350+), but doesn’t show all affected objects if a check results in a recommendation
  • Deletes all collected metadata after collection
  • No reminder list is shown
  • Does not display a list of issues that are no longer present since last collection
  • Sends a single email for all servers
  • No database specific recommendations are made
  • Can only collect metadata for a limited number of servers

The Premium Version:

  • 350+ checks performed across the categories of Configuration, Reliability, Performance, Security, Server Info, Table/Index Design
  • New checks are constantly being added
  • Reminder list of recommendations that have been made previously and not yet fixed
  • List of issues fixed compared to the last collection
  • Can choose how long to store collected metadata, so that point in time reports can be made, along with automated estimates of DB growth over time
  • Can send 1 email per server or a single email for all servers
  • Ability to ‘mute’ recommendations on an individual basis, or entire check (for non-critical checks)
  • No practical limit on the number of servers

If you want to try it out, click this link to request a free access token.

Once you have an access token, here’s how to run it: How to run SQLFrontline

Don’t Embed SQL into SSRS Reports

Reasons not to embed SQL in SSRS reports (.rdl) and create stored procedures instead:

  • Easier to version control in SCC
  • Better visibility in the database
  • Easier to debug
  • Easier to fix any performance problems (and to hot fix, if necessary)
  • Easier to re-use functionality (which is often lengthy to produce) with other systems/APIs etc
  • Stored Procedures get given a name that accurately describes what they do
  • Easier to gather execution statistics
  • Easy to see what parameters are used to filter
  • Can be secured with explicit permissions
  • Easier to write automated testing against a stored procedure

All seem fairly obvious, but it’s surprising how many people still embed SQL into SSRS reports.