Ralf Glöckner

In this article I would like to show how to generate file names with a sequential version number per date by using regular expressions.

The benefit of this approach is that it works with varying name parts, so you don’t need to know the exact position of the date and the version number within the file name. Additionally, the date in the file name can differ from the file’s timestamp, so you don’t have to rely on the file system date for sorting.

For example:

  • SomeDataFile_20150410_v01.csv
  • SomeDataFile_20150410_v02.csv
  • SomeDataFile_20150411_v01.csv

In this example I use a script task in SSIS, which can be integrated into a process that archives old files and generates new ones.

Assumptions for this example:

The date is written in the format “yyyyMMdd” and the version number has two digits with the prefix “v”; both are delimited by “_”. We also know that the date and version are valid, so we don’t have to check for a valid date or for digits in the version number. This allows us to use simpler regex patterns.
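To illustrate this naming convention, here is a minimal standalone C# sketch (a console program using top-level statements, C# 9 or later, rather than the SSIS script itself). The helper BuildDateVersionPart is hypothetical and not part of the article’s script; it formats the date as “yyyyMMdd” and pads the version to two digits with the standard “D2” format string.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: builds the "<yyyyMMdd>_v<NN>" part of a file name.
// InvariantCulture keeps the date format independent of the current culture's calendar.
string BuildDateVersionPart(DateTime date, int version) =>
    date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + "_v" + version.ToString("D2");

// Usage: combine with the file name prefix and the extension.
Console.WriteLine("SomeDataFile_" + BuildDateVersionPart(new DateTime(2015, 4, 10), 2) + ".csv");
// prints SomeDataFile_20150410_v02.csv
```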

The goal is to check whether a file with a given date already exists and, if so, to determine its highest version number.

The process at a glance

Let’s say the new CSV file is written to one directory, and all previous versions of the file are moved to a separate archive folder before a new file is generated. The archive directory therefore contains all the files needed to determine the last version number.

The steps to get the last version number of a file for a given date are as follows:

  • Search the archive directory for all files of a given date.
  • Determine the version number of each file found (via regular expressions).
  • Take the highest version number and increment it to generate the next one.
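The three steps can be sketched in C# on an in-memory list of file names, leaving out the file system and SSIS. This is an illustration under the assumptions stated above, not the article’s source code; the function NextVersion is hypothetical, and it assumes that the first file of a day gets version 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch of the three steps on an in-memory list of file names (no file system access).
// Assumptions: names follow "<prefix>_<yyyyMMdd>_v<NN>...csv", and the first version of a day is 1.
int NextVersion(IEnumerable<string> fileNames, string date)
{
    // Step 1: keep only the files of the given date.
    IEnumerable<string> filesOfDate = fileNames.Where(fileName => fileName.Contains(date + "_v"));

    // Step 2: read the version number of each file via the regex used in this article.
    IEnumerable<int> versions = filesOfDate
        .Select(fileName => Regex.Match(fileName, @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)"))
        .Where(match => match.Success)
        .Select(match => int.Parse(match.Value));

    // Step 3: take the highest version number (0 if there is none) and add 1.
    return versions.DefaultIfEmpty(0).Max() + 1;
}

string[] names =
{
    "SomeDataFile_20150410_v01.csv",
    "SomeDataFile_20150410_v02.csv",
    "SomeDataFile_20150411_v01.csv"
};

Console.WriteLine(NextVersion(names, "20150410")); // 3
Console.WriteLine(NextVersion(names, "20150412")); // 1 (no file for this date yet)
```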

Prerequisites

If you don’t already have some data files, create sample files with names like the following:

  • SomeDataFile_20150410_v01.csv
  • SomeDataFile_20150410_v02.csv
  • SomeDataFile_20150411_v01.csv
  • SomeDataFile-ABCv14-20150427_v23 – Copy_v87 – Copy.01.csv
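If you prefer to create the sample files programmatically, a small standalone C# sketch like the following could be used. The temporary folder stands in for the archive directory and is an assumption of this example; the last name is written with plain hyphens instead of dashes.

```csharp
using System;
using System.IO;

// Hypothetical archive location: a new, uniquely named folder in the temp directory.
string archiveDir = Path.Combine(Path.GetTempPath(), "ArchiveSample_" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(archiveDir);

string[] sampleNames =
{
    "SomeDataFile_20150410_v01.csv",
    "SomeDataFile_20150410_v02.csv",
    "SomeDataFile_20150411_v01.csv",
    "SomeDataFile-ABCv14-20150427_v23 - Copy_v87 - Copy.01.csv"
};

foreach (string sampleName in sampleNames)
{
    // File.WriteAllText creates an empty file (or overwrites an existing one).
    File.WriteAllText(Path.Combine(archiveDir, sampleName), "");
}

Console.WriteLine($"Created {Directory.GetFiles(archiveDir).Length} files in {archiveDir}");
```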

 
Create an SSIS package and:

  • add a variable for the archive directory, “FlatFile_ArchiveDir”, containing the path to the archive directory,
  • add a variable for the date of the file whose last version you want, “DateOfFile”, containing the desired date,
  • add a variable for the output file, “FlatFile_Target”, which will hold the name of the new file,
  • add a script task and configure the first two variables for read access (ReadOnlyVariables) and the variable “FlatFile_Target” for read/write access (ReadWriteVariables).

The script

All the action takes place in the Main() method of the script task.

I will describe only the regex part in detail; all other steps are described briefly. The full source code can be found at the end of this article.

1. Searching and collecting the files

The archive directory is searched for files whose names contain a given prefix and the desired date. The result is the collection of all files generated for that date.

This collection is the basis for further analysis.
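A minimal sketch of this search in C#, assuming Directory.GetFiles with a wildcard pattern of the form prefix*date*.csv (the article’s own search code is not reproduced here). archiveDir, filePrefix and dateOfFile are hypothetical stand-ins for the SSIS variables “FlatFile_ArchiveDir” and “DateOfFile” and for the file name prefix; the sketch first creates a few sample files in a temporary folder so it can run on its own.

```csharp
using System;
using System.IO;

// Hypothetical stand-ins for the SSIS variables and the file name prefix.
string archiveDir = Path.Combine(Path.GetTempPath(), "ArchiveSearch_" + Guid.NewGuid().ToString("N"));
string filePrefix = "SomeDataFile";
string dateOfFile = "20150410";

// Set up a few sample files so the search has something to find.
Directory.CreateDirectory(archiveDir);
foreach (string sampleName in new[] { "SomeDataFile_20150410_v01.csv", "SomeDataFile_20150410_v02.csv", "SomeDataFile_20150411_v01.csv" })
    File.WriteAllText(Path.Combine(archiveDir, sampleName), "");

// The wildcard pattern selects all files of the given date with the given prefix.
// Directory.GetFiles returns full paths, so reduce them to the file names.
string[] foundFiles = Directory.GetFiles(archiveDir, filePrefix + "*" + dateOfFile + "*.csv");
for (int i = 0; i < foundFiles.Length; i++)
    foundFiles[i] = Path.GetFileName(foundFiles[i]);

// The order returned by the file system is not guaranteed, so sort for a stable result.
Array.Sort(foundFiles, StringComparer.Ordinal);
Console.WriteLine(string.Join(Environment.NewLine, foundFiles));
```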

2. Reading the version numbers of the files via regular expressions

For each file found we need its version number, which we read with a regular expression. Note that we capture only the version number itself, not the surrounding text used to locate it. This is achieved with “lookarounds” (lookbehind and lookahead), which check the text before or after a position without including it in the match.

For each file we do:

// Define search pattern for the version number
string pattVersNr = @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)";

// Read the version number of the current file name (Match.Value is already a string)
CurrFileVersionInfo = Regex.Match(FileName, pattVersNr).Value;

The version numbers will be saved in a list for further processing.
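Applied to several file names, collecting the versions could look like the following standalone sketch. The list of names is hypothetical and includes the copy-style sample name to show that the pattern still finds the version right after the date; skipping names without a match is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as in the article; the lookarounds keep only the two digits in Match.Value.
string pattVersNr = @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)";

// Hypothetical result of step 1 (file names found in the archive directory).
string[] foundFileNames =
{
    "SomeDataFile_20150410_v01.csv",
    "SomeDataFile_20150410_v02.csv",
    "SomeDataFile-ABCv14-20150427_v23 - Copy_v87 - Copy.01.csv"
};

// Collect the version number of every file in a list for further processing.
var versionNumbers = new List<string>();
foreach (string fileName in foundFileNames)
{
    Match versionMatch = Regex.Match(fileName, pattVersNr);
    if (versionMatch.Success) // skip names without a recognizable version number
        versionNumbers.Add(versionMatch.Value);
}

Console.WriteLine(string.Join(", ", versionNumbers)); // 01, 02, 23
```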

Explanation of the regex pattern:

(?<=\d{8}_v)

Locates the position of the version number by the prefix “_v”, which must directly follow a date formatted as an 8-digit number (\d{8}), without adding either to the result (lookbehind (?<=...)).

Matches: any 8-digit date in the format “yyyyMMdd” followed by “_v”, e.g. 20150422_v in 20150422_v08.

(0[1-9]|[1-9][0-9])

Matches exactly two digits, i.e. the version number including its leading zero, and adds them to the result.

Matches: any number from “01” to “99”, e.g. 05.

(?=.*\.csv)

Requires that the version number is followed by arbitrary text (.*) and then the csv extension (\.csv), without adding either to the match (lookahead (?=...)).

Matches: any text followed by “.csv”, e.g. AbC-123_xyz.0.csv.
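The effect of the lookarounds can be checked with Match.Index and Match.Length: the match starts at the first digit of the version number and is exactly two characters long. The following standalone sketch also tries a few hypothetical names that each violate one of the three parts of the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

string pattVersNr = @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)";
Match m = Regex.Match("SomeDataFile_20150410_v01.csv", pattVersNr);

// Only the two digits are part of the match: the date and "_v" (lookbehind)
// and the text up to ".csv" (lookahead) are checked but not consumed.
Console.WriteLine($"Value={m.Value}, Index={m.Index}, Length={m.Length}"); // Value=01, Index=23, Length=2

// Names that violate one of the three parts do not match at all.
Console.WriteLine(Regex.IsMatch("SomeDataFile_2015041_v01.csv", pattVersNr));  // False: only 7 digits before "_v"
Console.WriteLine(Regex.IsMatch("SomeDataFile_20150410_v00.csv", pattVersNr)); // False: 00 is not in 01-99
Console.WriteLine(Regex.IsMatch("SomeDataFile_20150410_v01.txt", pattVersNr)); // False: no ".csv" follows
```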

3. Determine the last version number and generate a new one

The list of version numbers is sorted to determine the last (highest) version number. This number is incremented by 1 and added to the name of the new file.
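A sketch of this step in C#, assuming the version numbers were collected as two-digit strings in a List<string>. The zero-padding is what makes a plain string sort correct here; without it, “9” would sort after “10”. Starting at version 1 when the list is empty is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical version numbers as read in step 2 (two-digit strings, unsorted).
var versionNumbers = new List<string> { "02", "10", "01" };

// All numbers have exactly two digits (leading zero included), so an ordinal string
// sort gives the same order as a numeric sort: "01" < "02" < "10".
versionNumbers.Sort(StringComparer.Ordinal);

// The last entry is the highest version. Assumption: with no file for the date yet, start at 1.
int lastVersion = versionNumbers.Count > 0 ? int.Parse(versionNumbers[versionNumbers.Count - 1]) : 0;
int newVersion = lastVersion + 1;

Console.WriteLine(newVersion); // 11
```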

4. Adding the new version to the file name

To add the number with a leading zero where needed, we can use a regular expression as well:

FilenameNew = FileNamePartStart + "_" + DateFileName + "_" + FileNamePartVersion
    + Regex.Match("0" + FilenameVersion, "[0-9]{2}$").Value + ".csv";

Here the pattern [0-9]{2}$ matches the last two digits of the search string ($ anchors the match at the end of the string). Searching in "0" + FilenameVersion therefore always returns a 2-digit version number: for version 7 the search string “07” yields “07”, and for version 12 the search string “012” yields “12”.
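The padding trick can be compared with the standard numeric format string “D2” in a short standalone sketch. PadWithRegex and PadWithFormat are hypothetical helper names; “D2” is shown as an alternative and is not what the article uses.

```csharp
using System;
using System.Text.RegularExpressions;

// The regex approach from the text: prepend "0" and keep the last two digits.
string PadWithRegex(int version) => Regex.Match("0" + version, "[0-9]{2}$").Value;

// A common alternative (not used in the article): the standard "D2" format string.
string PadWithFormat(int version) => version.ToString("D2");

foreach (int v in new[] { 1, 9, 10, 42 })
    Console.WriteLine($"{v} -> {PadWithRegex(v)} / {PadWithFormat(v)}");
```

Both variants rely on the two-digit assumption: for version 100 the regex returns “00”, while “D2” returns “100”.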

Looking further

The regular expression can be adjusted to fit your needs. Here are two examples.

Matching just the version number, without the date

To match the version number preceded only by “_v”, without requiring a date in front of it, use this lookbehind:

(?<=_v)

Be aware that this will return the wrong version number if the file name contains more than one occurrence of “_v” followed by two digits, since Regex.Match returns only the first match.

For example, for “SomeDataFile-ABC_v14v66-20150427_v54 – Copy_v87-000.1.csv” this pattern returns 14 instead of 54.
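The difference between the two lookbehinds can be reproduced with this standalone sketch, using the file name from the example (written with plain hyphens).

```csharp
using System;
using System.Text.RegularExpressions;

string fileName = "SomeDataFile-ABC_v14v66-20150427_v54 - Copy_v87-000.1.csv";

// Lookbehind with date: the version must directly follow an 8-digit date and "_v".
string withDate = Regex.Match(fileName, @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)").Value;

// Lookbehind without date: Regex.Match returns the first "_v" followed by two valid digits.
string withoutDate = Regex.Match(fileName, @"(?<=_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)").Value;

Console.WriteLine($"with date: {withDate}, without date: {withoutDate}"); // with date: 54, without date: 14
```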

Matching the version number directly at the end of the filename

To match the version number only if it is directly followed by the file extension, remove the arbitrary text (.*) from the lookahead:

(?=\.csv)
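A short standalone sketch comparing both lookaheads on two of the sample names (again with plain hyphens): with (?=\.csv) the copy-style name no longer yields a version number, since “.csv” does not directly follow the digits.

```csharp
using System;
using System.Text.RegularExpressions;

string loosePattern = @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=.*\.csv)"; // any text allowed before ".csv"
string strictPattern = @"(?<=\d{8}_v)(0[1-9]|[1-9][0-9])(?=\.csv)";  // ".csv" must follow directly

foreach (string name in new[] { "SomeDataFile_20150410_v01.csv", "SomeDataFile-ABCv14-20150427_v23 - Copy_v87 - Copy.01.csv" })
{
    Match loose = Regex.Match(name, loosePattern);
    Match strict = Regex.Match(name, strictPattern);
    // An unsuccessful Match has an empty Value.
    Console.WriteLine($"{name}: loose='{loose.Value}', strict='{strict.Value}'");
}
```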

Full source code

Put this in the header of the script:

using System.IO;
using System.Windows.Forms;
using System.Text.RegularExpressions;
using System.Collections.Generic;

The source code for the Main() method can be found here:  oraylis_numbering_files_via_regex.

Insert it into the Main() method before the line “Dts.TaskResult = (int)ScriptResults.Success;” and you are ready to go.