Windows File Splitter

Unlike Linux systems, Windows doesn’t have a built-in command line file splitter. Splitting large files into smaller chunks is something you often want to with data warehouses (such as Snowflake), in order to be able to use multiple threads for better bulk loading performance.

I saw a post by Greg Low, SDU_FileSplit – Free utility for splitting CSV and other text files in Windows where he had created one for Windows, but it wasn’t open-source.

I decided to create an open-source file splitter for Windows.

It’s a standalone .NET 5.0 executable, supports wildcards, can recurse sub-folders (careful with that option!) and automatically gzip compress output files.

Example use:

FileSplitter.exe -i c:\temp\*.csv -m 100000 -c -o c:\temp\TestFileSplitter

Splits all the files in specified folder c:\temp with extension .csv, and splits into max. 100K lines per file, storing the output files in folder c:\temp\TestFileSplitter

Date and Time Dimension

Almost every fact table in a data warehouse uses a date (or calendar) dimension, because most measurements are defined at specific points in time. A flexible calendar date dimension is at the heart of most data warehouse systems; it provides easy navigation of a fact table through user familiar dates, such as weeks, months, fiscal periods and special days (today, weekends, holidays etc.).

I’ve created a date dimension generator here at Github

It targets SQL Server, but should be easy to convert to other RDBMS.

It features:

  • User defined start and end dates
  • Computed Easter dates (for years 1901 to 2099)
  • Computed Chinese New year dates for years 1971 to 2099.
  • Computed public holidays for US, UK, Canada, Ireland, Malta, Philippines, Australia (with state specific for WA, NSW, QLD, SA, VIC).
  • Date labels in US, UK and ISO formats.

Things to Note:

  1. The [TodayFlag] needs to be updated once per day by a scheduled task (timezone dependent: might need a flag for each timezone).

  2. If you use an unusual Fiscal year (say 5-4-4), it will need to be loaded from an external source (such as an Excel/Google spreadsheet).

  3. The precise start date of the month of Ramadan is by proclamation, so these need to be added, year by year. It is possible to calculate but can be a day out, and can vary by region.

    https://travel.stackexchange.com/questions/46148/how-to-calculate-when-ramadan-finishes

    https://en.wikipedia.org/wiki/Ramadan_%28calendar_month%29