Better Content Inventories with Content Analysis Tool
Improve your workflow when using exports from Content Analysis Tool in Excel. Tips and tricks to clean-up its raw crawl data.
Content audits and content inventories are essential documents for understanding the scale, scope, fate and quality of content. However they notoriously laborious to create and maintain.
I typically use Content Analysis Tool (CAT) to automate an initial inventory, saving considerable effort and improving accuracy over a manual inventorying process.
However a raw job-level export typically needs cleaning-up to begin to make its data more usable and useful; both for the lone content wrangler and for any collaborative audits with other content or domain experts.
These tips will help improve your workflow as you begin to work with CATs raw crawl data from your site.
Content inventory clean-up tips
- Transform bytes to kilobytes (Kb)
- Condensed, readable and usable URLs
- Find duplicates and junk via conditional formatting
- Make MIME content types understandable
Transform bytes to kilobytes (Kb), megabytes (Mb) and gigabytes (Gb)
CAT has provided you with a file size as a string of numbers for a content item. This is hard to read.
Example: Column D “Size” : Row 2 “22146”
Transform 22146 (bytes) into 21.63K (kilobytes), an understandable and qualified number, using this formula:
'Raw catExport' and
!D1 to match where you are working in your inventory.
Condensed, readable and usable URLs via CONCATENATE and HYPERLINK functions
You find it frustrating to have long and complex URLs that:
- take up too much horizontal screen space.
- repeat useless data about the protocol, domain, parent folders.
- do not link to the live content for easy referencing of production.
You will be working with three worksheets:
- The original worksheet from CAT.
- A new worksheet with a complete copy of the original worksheet.
- An additional worksheet to store the part of the URL that you don’t want to see (but which a formula must reference).
Raw catExport and
URL in the example.
- Create a copy of your export in a new worksheet.
- In both worksheets use Find and Replace to remove the generic parts of the URL that you’d like to hide, e.g., “http://www.domain.com/root-folder”.
- In a new, third, worksheet create the part of the URL that you removed in step two so that it can be referenced by a formula, i.e., paste “http://www.domain.com/root-folder” into cell A1.
- In the cloned worksheet use the following formula to create the concatenated hyperlink:
=HYPERLINK(CONCATENATE(URL!$A$1,'Raw catExport'!A2),'Raw catExport'!A2)
Tip: If your worksheet names are a single word or phrase then you don’t need to wrap them in
'single quotes' in the formula.
Find duplicates and junk via conditional formatting
CATs crawl of your content has returned duplicate items or junk that you want to rapidly identity and exclude from your inventory.
Example: ../_edit type link suffixes for the Editors in your CMS:
- ../root-folder/interesting-folder/file1.html /_edit
- Identify a pattern of duplication or junk that you’d like to highlight or Filter, like /_edit.
- From the Home menu ribbon choose Conditional Formatting.
- Then select Highlight Cell Rules, Text that Contains… .
- Consider setting the beginning with, or ending with rules.
- Enter your pattern, like: ending with “ /_edit ”.
- Run the Conditional Formatting.
- Filter the column where the pattern occurs and review.
- Use Conditional Formatting to find duplicate items.
- Use colour to flag items and to Filter them.
Make MIME content types understandable
CAT returns full MIME content type descriptions in Column C “Type”. These are hard to read and could confuse a lay audience.
- Select a content type
- use Find and Replace to create an understandable equivalent
Examples (from above):
- Types marked “unknown” are often redirected items or dead links
- use Filters to reduce rows of irrelevant content types
Let me know in the comments if you have any improvements or suggestions for working with CAT exports, or with content inventories and audits more generally. Likewise if you need help with any of the advice in this blog post.
- Content Insight’s Content Analysis Tool Blog.