Alphabetical ranking

Today I faced the challenge of having to find the rank of a text string in a range of text strings. Or put another way, in what position would the entry sit alphabetically?

The RANK function only works on numeric values. So instead, we have to resort to logic.

=COUNTIF(A1:A5,"<="&A1)

This will figure out how many cells within the range A1:A5 are less than or equal to A1, the value that we're trying to rank.

A beautiful yet simple piece of logic, learned from "anhn" in VB Forums.

RANK

Summary

If you have a range of values, the RANK function tells you where in that range a specific number sits. It can either tell you it's the nth biggest number in the range. Or it can tell you it's the nth smallest number in the range.

The formula only works on numeric values. If text values appear in the range, then these values are ignored. If the formula attempts to evaluate the rank of a text value, then it will result in a #VALUE! error.

Syntax

=RANK(Cell,Range,Value)

Cell is the number that you want to find the rank of. It doesn't need to be a number physically within the range of values you're ranking it against. But that exact value must appear in the range. Otherwise, the formula will yield #N/A.

Range is the range that you want to rank the number within. It must be a continuous range, but can be square—it need not be a single row or column of data.

Value is a number that determines the order in which the item's ranking is given.

  • Value = 0. The formula will tell you how big the cell is. I.e. the cell is the nth biggest in the range
  • Value is anything other than 0. The formula will tell you how small the cell is. I.e. the cell is the nth smallest in the range.

Examples

Below are some examples of the RANK formula at work. It assumes the range A1:A5 contains the values 1, 2, 3, 4 and 5.

=RANK(2,A1:A5,0) gives the value 4. The number 2 is the fourth biggest in the range

=RANK(2,A1:A5,1) gives the value 2. The number 2 is the second biggest in the range

=RANK(7,A1:A5,1) gives the value #N/A. The number 7 does not appear in the range, so the formula fails.

 

PivotTables: the reverse. Creating raw data from a summary

I received an interesting if slightly odd conundrum from Kat the other day. The solution was quite artful, so I thought I'd share.

She had some summary data, each row of which contained a count representing the number of records that row represented. And she wanted to create raw data from that. To anonymise, let's use the following comparable scenario.

Imagine you have a summary of the stock in your warehouse. Each row contains the stock item's ID number, its name, its description, its location together with a count of how many of these items are located there. Kat wanted to create a dataset that contained one record for each individual item, each row containing all of the above details with the exception of the count.

Let's say the source data is in a sheet called Source, and the new dataset will be created in a sheet called Target within the same workbook.

The first job was to work out how many rows we'd need in our target dataset. Easy: that's just a sum of the count column in the source. The source had 248 data rows, and the target would need 438. So in my Target sheet, I put the numbers 1 through 438 in column A.

The aim would be to keep the rows in the same order, so the first few rows of the target would represent the stock in row 1 of the source, followed by rows representing the stock in row 2 of the source etc.

So in the Source sheet, I needed to figure out the start and end rows in the target that each summary row in the source represented. Row 1 had a quantity of 4, row 2 a quantity of 2. So summary row 1 would be represented by rows 1 through 4 in the target; summary row 2 by rows 5 and 6. Etc.

To do this, I created two columns, one to store each of the start and end dates. We manually type a 1 for the start position of row 1. Its end position is the sum of the quantity and its start position, less 1.

Formulae in action

We also needed to create the serial number of the row in the source. But we subtract 1 from this number. So the serial numbers of the rows in the source run from 0 through 247. More on why in a moment.

In the target, we insert a column that will contain the source row number that the row will represent—the crux of the challenge, if you will. Let's call this column "Source Row". And we can manually populate the first row of this with a 1.

Now here's the important logic that goes into the target rows. We look up the previous row's Source Row in the amended source row number just mentioned. And when we find it, we'll bring back that row's start position. If that number is greater than the current target row's serial number, then our Source Row reference should be the same as the row above. But when this condition fails, it should increment by 1.

Or in other words, if the start position of the previous row's summary row is greater than the serial number of the row we're on, then we should continue referencing the same source row as did the previous entry. Otherwise, let's increment by one. And in Excel:

=IF(VLOOKUP(B3,Source!$G$2:$H$249,2,0)>A4,B3,B3+1)

Copy that down and you have the serial numbers of the rows that you need to draw data from: four 1s, followed by two 2s, a 3, a 4 etc.

This can be used as the basis for a VLOOKUP to bring back the data itself from the source:

=VLOOKUP($B3,Source!$A:$E,2,0)

Here's a link to the file.

TRIM

Summary

This is an oft forgotten function. Quite simply, it gets rid of unwanted spaces. It does so in the following ways.

  • All leading spaces that appear at the start of the cell being trimmed are removed
  • All trailing spaces at the end of the cell being trimmed are removed
  • Any groups of spaces appearing midway through the string are reduced to a single space.

Syntax

=TRIM(Cell)

Cell is the cell containing the text string you're trimming.

Examples

Below are some examples of the value of cell, and the result of the TRIM function.

"   Chicken" becomes "Chicken"

"Chicken   " becomes "Chicken"

"   Chicken   Tikka   Masala   " becomes "Chicken Tikka Masala".

Unable to insert columns [solved]

Sometimes you'll encounter a spreadsheet that doesn't allow you to insert a new column. When you try, it will give you the following error:

To prevent possible loss of data, Excel cannot shift nonblank cells off of the worksheet. Select another location in which to insert new cells, or delete data from the end of your worksheet.

I adore its use of the "off of" artificial construct.

It's basically saying that there's something at the far right of your worksheet (column IV pre–Excel 2003; column XFD thereafter), or else at the very bottom (row 65,536 pre–Excel 2003; row 1,048,576 thereafter).

Often you don't know why it's happened, and you can't find any data anywhere near those extremes. Here's how to solve it.

  • Find the last column in which data appears (column AQ, say)
  • Highlight the next column to the right (column AR)
  • CTRL+Shift+Right
  • ALT, E, A, A.

This last step is an old shortcut that still works to this day. In earlier versions of Excel, it selected Edit | Clear | All. Essentially, it gets rid of the contents and formats of all cells highlighted.

The equivalent steps to sort out the rows are:

  • Find the last row in which data appears (row 100, say)
  • Highlight the subsequent row (row 101)
  • CTRL+Shift+Down
  • ALT, E, A, A.

To allow you to insert rows and columns, you'll need to save your spreadsheet and, in true Microsoft style, close it and open it up again. Now you should be able to add columns and rows to you heart's content.

Comparing date/time values with dates

I received a tweet last week from @rcdl:

How do you count how many date/time values match a given date?

It was a Thursday night and I was out having dinner with a bunch of friends. But I felt compelled to answer the call. SO I did.

First, we must remember that dates/times in Excel are stored as numbers, and the unit of those numbers is days. So any time within a day is simply that day's number, with a decimal extension to represent the amount of the day that has elapsed.

Let's assume we've got a column of date/time stamps running down column A. And We have a test-case date in cell D1. Out goal is to figure out how many of the entries in column A fall on the day in question.

Our first task is to round the date/times down, essentially getting rid of the time of day at which the events occurred. We do this using the following formula in cell B1:

=ROUNDDOWN(A1,0)

This rounds A1 down to zero decimal places. Using the ROUND function would round all times after midday up, to the following day. Which is not what we want.

That formula can then be copied down column B as far as is necessary.

Next, we need to count how many of the entries in column B match the entry in D1. So in E1, we use the following formula:

=COUNTIF($B:$B,D1)

That simply counts the number of entries in column B that are identical to D1.

Job done.

Shortcut: Copying unformatted text

Oftentimes, I need to copy a cell’s contents into another application, usually Word or my email client. And often, I simply want the contents, not the associated formatting.

For many applications (Word included), you have the Paste Special… option. But in simple cases (or where there’s no such feature, in Gmail for example), there’s an easier way.

  • In Excel, highlight the cell in question.
  • Hit F2.
  • CTRL+Shift+Home. (This highlights the cell’s entire contents.)
  • CTRL+C.
  • Go to destination application.
  • CTRL+V.

It looks like a lot of steps, but when you actually do it, it’s oodles quicker than using the ribbons to accomplish the same.

If you need to copy multiple cells into an application bereft of the Paste Special… facility, then do a regular copy and paste from Excel into Notepad, and then do a copy and paste from Notepad to the destination. Cumbersome, but it works.

Using financials in Mail Merges

Imagine you have a financial figure that you want to use in a mail merge (a salary, bonus, house prices etc.)

If you use them as they stand, they'll likely look rubbish. They won't show the currency and there won't be a lovely comma separating the thousands from the sub-thousands. Instead, create a new field containing the following formula (assuming the financial field you want to use is in B2):

="£"&ROUNDDOWN(B2/1000,0)&","&MOD(B2,1000)

Let's say the salary was £32,500. The ROUNDDOWN function will divide your salary by 1,000 and give you that figure—in this case 32. The MOD function also divides your salary by 1,000 but instead gives you the remainder—in this case 500. The & function is a shortcut for the CONCATENATE function, stringing multiple entries together. So it adds the £ at the beginning and the comma midway.

The formula would need some additional logic if the numbers were to stretch into the millions, but you get the picture.

PivotTables: A comprehensive guide (part 2 of 2)

In the last post, we covered the fundamental principles that supported PivotTables, and reached the stage at which the frame of the PivotTable was built.

This post will introduce you to the analysis itself.

So now, we have the wireframe of a PivotTable.

The frame of a PivotTable

There are four sections within a PivotTable. Each can house one or more fields. And to get the fields into those sections, you either drag them into the four boxes directly below the field list; or you drag them into the table itself. Each way yields the same result.

Page fields (or Report filters)

These are filters that are applied at the field level. You use this feature if you want your PivotTable to only look at a subset of rows in your source data: a specific geography; women; high-priority risks. You can drag multiple fields into this area, allowing you to, say, only analyse males in the US.

Column fields/labels

These will contain the fields whose values you want to see in the columns. If you want each column to show a different country, drag the "country" field into this section. And if you want to see cities within each country, then drag both fields into that section. The top-most field will form the first grouping, and so on.

Once selected, a dropdown box will appear next to the column title in the table itself. This can be used to sub-select those values that you want to appear. You may, for example, not want to report data for the US.

Row fields/labels

These will contain the fields whose values you want to see in the rows. If you want each row to show a different gender, drag the "gender" field into this section. And if you want to also group your data by age, drag both fields into that section. Again, the top-most field will form the first grouping, and so on.

As with the columns, a dropdown box will appear next to the row title in the table, which can be used to sub-select those values that you want to appear.

Data/values

Here's the bit that quite a few people find a bit odd: the data itself. It's also the crux of the puzzle.

If you've defined your filters and row and column headings, the frame of your table looks good, but without populating data/values. So here's how they work.

If you simply want to count records, then choose any field that's populated across all records (a unique reference number or gender, for example). If it's a text field, then the PivotTable will have no option but to count. (Have you ever tried to sum a bunch of countries?)

Intuitively (to me at least), it would make more sense if PivotTables defaulted to give a count of the records. But in the absence of this, you're forced to choose a field, then contents of which are counted. Note: it only counts rows that contain something in the column selected. So if you want to count every row, choose something that's populated exhaustively. So if you have a record with a missing gender and choose gender as your data/values field, that record will not be counted.

If instead you choose a field that's formatted as a numeric, then Excel will decide that you're probably looking to sum the data. So, for example, if you drag the Revenue column into the core of the table, it will show the total (sum of) revenue by geography (if geography is in the column labels).

PivotTables with sums

If you want to report on multiple fields, the core of the table can accommodate multiple fields. You may, for example, want to report both revenue and profit for each intersection of rows and columns.

If sums are not sufficient for you, then hit the down arrow next to the field name in the tray in which it appears below the PivotTables field list. Hit Value Field Settings. A list of other functions appears for selection on the Summarize by tab, the most useful being Max, Min and Average.

That same dialog box contains additional options on the Show values as tab. The dropdown at the top allows you to change the way in which your data appears. So instead of showing values you can, for example, show the percentage of the row or column that the value represents. It's worth having a play in this area once you're confident in the more common functionality.

If you have any questions, please get in touch. PivotTables are quite daunting, but once you become comfortable using them, they unleash such power.

PivotTables: A comprehensive guide (part 1 of 2)

If you are anything more than a very basic Excel user, then this post is important to you. It introduces undoubtedly the most powerful single element of Excel: PivotTables. If you're new to them, this is one post I heavily encourage you to read.

PivotTables bring analysis to the masses. All you need to use them is a set of organised data and an appetite to analyse it.

Whether your data is made up of people in an organisation; defects in an IT system; risks and issues associated with a project; stock items and their location; invoices and their status. All of these datasets are the lifeblood of PivotTables.

So what do PivotTables do? Quite simply, they allow you to quickly and easily summarise and interrogate data. You can find the gender balance of employees by region; the average closure rate of defects by assignee; the number of open issues by severity; the number of stock items within each warehouse; the maximum time it has taken each client to pay their invoices.

So, let's get started.

First up, you need a dataset. Your data needs to be organised in contiguous columns (no empty columns in the middle), each column containing a specific attribute of the data set: gender, defect assignee, issue severity, stock location, invoice issue date.

And each row needs to be an item pertinent to the dataset: an employee, a defect, an issue, a stock item, an invoice.

Once you've got that, the best thing to do is to format the dataset as a Table. On the Home tab, hit Format as Table, and select your favourite colour scheme. This action does three key things:

  • The range in which your data sits is immediately and automatically given a name. So instead of referring to it by cell references, you can simply reference its name
  • The Table gains some lovely aesthetics—a special title row and alternate row shading thereafter. If the default colours aren't suitable, you can define your own by hitting New Table Style at the bottom of the Format as Table box. And if you scroll down in the Table, the column names nudge up to replace the A, B, C … column labels. Which is a lovely touch
  • The data is treated as one. Filters and sorting automatically knows what to filter or sort, while adding new rows at the base of the Table—or columns to the right—ensure that they're kept with the family, and are added to the named range.

But while nice in themselves, many attributes of a Table are perfectly suited to PivotTables.

So let's create one.

Creation of PivotTables

While your cursor sits within the Table, from the Insert ribbon, hit PivotTable. This will bring up a dialog box.

PivotTables dialog

Ensure the New Worksheet radio button is selected and hit Enter. Wherever possible, I always prefer my PivotTables to sit within their own sheet. You'll learn later that they are dynamic beasts of variable and often unpredictable size, so they don't sit well alongside other data.

A new worksheet will be created containing a frame that will form the basis of your PivotTable.

The frame of a PivotTable

Each of your column names will appear in the list on the left. And the frame of the PivotTable itself will appear in the spreadsheet on the right. And the column names can be dragged and dropped, either into the table itself or into the four boxes below—each option results in the same outcome.

The latter post in this pairing introduces the analysis itself—which is where the arousal begins.