Excel is a love of life for everyone who wants to tame the data by making much larger data in short yet precise data without losing its meaning and value. But what if you want to go the other way around i.e. you have a very basic requirement and wants to search for it in millions of records?
Suppose you have some products and each product has different varieties and transactions are in thousands if not in millions. For understanding I have a small set as follows:
Now out of this stack you want to know how many varieties each product has?
Scratching your head already? This is exactly what we are learning today!
Understanding the requirement
Just to be clear that every reader understands the requirement and why is it a challenging one, I will explain it a little more. We have the data in two columns. First; products itself and secondly the varieties that are sold. As you can see there is a repetition of product and also the variety in the respective column.
Now what I want is to know, for example, how many types of Apples are sold to customer? As the data is small I can easily see that there are only two types of apples; green.
I was able to know it only because record is small enough to process. But think of records that are in thousands! Will you risk doing like this? I won’t! I have Excel with me to do this.
Method 1: Remove duplicates [Destructive]
Excel provides us with a tool that removes the duplicates. If we apply it we will be left with only unique values and we can count them easily by further processing the suppressed data using formulas. Following steps explain the process:
Step 1: Add a helper column. Name it “Count” and enter 1 in all cells down to fill the range.
Step 2: Select the data and go to Data tab and click Remove duplicates button. This will cut the data down to unique values only.
Step 3: Copy the first column and paste it somewhere empty inside the worksheet. And apply the remove duplicate operation again. Now you will be left with only unique products.
Step 4: On the right of data obtained in above step name the column “Distinct count” and put this formula in cell F2 and double click the fill handle to populate the range:
Now you have the distinct count of types each product you have or sold:
Method 1: Drawbacks
Biggest problem with this method is it changes the data and thus considered destructive. The only option is to copy the whole data and the apply this technique which is not always feasible.
Secondly you have to go through three sub-processes to get it done. Removing duplicates twice and then using SUMIF or similar formula to count the values. How often you have enough time to do things longer way?
Method 2: Advanced filter with advanced formula
So first method failed for being destructive. But we can take care of this if we use advanced filter instead of completely removing the data from worksheet. But this poses another challenge. We need to find a way to do conditional subtotal and there is no straight formula to do it. But we can still make it so follow along:
Step 1: Take the same range as we considered in Method 1 including the helper column. Select the range and convert it into table. This will make writing formula super easy because of structured references of tables (one of the coolest tricks out of many Excel tables can do). Also change the name of table by going to Design tab > properties group > Table name to “pp”.
Step 2: Select the table and go to Data tab > Sort & Filter group > click Advanced filter button. From the dialogue box that pops up, check the unique records only box and click OK.
Step 3: You might have realised that we have got the same data that we got in Method 1 but this time the data is not lost. Its just filtered out.
Step 4: Copy the product column and paste it under the filtered table. Select the column and remove the duplicates same way we did in Method 1.
Step 5: To make the distinct counts put this formula in cell B19 and drag it down to cell B21:
This formula is actually making the sum of count column from filtered data. Following illustration walks you through above steps:
Method 2 – Drawbacks
So we solved the destructive problem of Method 1 by employing advanced filter. But that posed another challenge and made the formula unnecessarily complex. Sure its done. But is there a way to make it even easier and better? Yes there is!
Method 3: Pivot tables
Yes Pivot tables can do distinct count in Excel 2013 and its crazy fast and super simple. Guess what! You don’t even need helper column for that
Step 1: Select the columns (just product and code column) and go to Insert tab > Tables group > click Pivot tables
Step 2: A new window will pop up. Here you can choose to insert the pivot table in existing sheet or a new sheet. I chose existing sheet and mentioned the cell for insertion. Don’t forget to check “Add this data to the Data Model”
Step 3: Will take a moment to process and normal pivot table interface will appear. Move the products field to rows quadrant and code field to values quadrant.
Step 4: Currently it is showing the simple count of products, but we need distinct count. For this click count of code filed and select “value field settings” and from the dialogue box select “distinct count” and click OK
Let me say that LOUDLY!
NO DATA DESTRUCTION NO FORMULA NO VBA AND YET DISTINCT COUNT AND THAT TO IN A FLASH!
EXCEL IS LOVE!
Haa! Apologize I got a little over-joyous
So in this article we learnt three methods to make distinct count. How do you make the distinct count or what method would have you adopted? Share your ideas in the comment section below to start the debate!