Lukas Lötters

Hello All,

in this blog post I’ll show you how you can use regular expressions in Power BI by using the R transformation steps. Power BI itself is not capable to filter or select by a regular expression. It is a favored feature by some users (see community), but until now there is no way to use regular expressions in Power BI.

Last Week I tried to import data from imdb.com. The IMDB hosts a free FTP Server where everyone can download ratings, budget numbers, gross sales and much more for plenty of movies and tv shows. The problem is that these text files aren’t formated very well. So I used Power BI to clean up a little bit. For this blog post I only used the Top 250 movies of all times. These movies are stored in the ratings.ls file. You can see my transformation steps in M below.

Clean data table from IMDB

Clean data table from IMDB

Everything worked fine until I tried to derive the release year from the movie title. If you look at the Top 250 this seems easy, because the titles in this small list are very well structured. The pattern is always „Title (Year)“. If you look at the full dataset things look a little bit different. I found no way to handle these irregularities  with the Query Editor. The optimal way I think is to use a regular expression  like this one \((19|20)\d{2}‘. It searches for a string which starts with a ‚(‚  followed by 19 or 20 and two more digits.

I was close to give up, but then I rembered a feature of Power BI which allows to run R scripts in context of the Query Editor, Link. With this „Run R Script“ step you can hand the full dataset from the editor to R. And there you can do some transformation with R and return the transformed dataset to Power BI. So I wrote a little R script.

Now there is a new step inside the Query Editor, which returns my dataset in a table.

R Script Step in Power BI

R Script Step in Power BI

Now I have all the information I need and I can create a little chart which shows me how the Top 250 are distributed over time. It seems like newer movies get better votes and rank over time. You can explore the Power BI report yourself by clicking on the screenshot of the Power BI report.

Table with new Year column

Table with new Year column

 

IMDB data visualized in Power BI by year, votes and rank

IMDB data visualized in Power BI by year, votes and rank – Click on the screenshot for the full report

In this blog post I showed you how you can use regular expressions in Power BI. I think this is a cool example how R Scripts can be used to make Power BI even better.

Thanks for reading! I wish you all a Merry Christmas and a happy new year.

I will follow up in 2017!