Categories
Resources

Better Content Inventories with ContentWRX Audit

Improve your workflow when using exports from Content Analysis Tool in Excel. Tips and tricks to clean-up its raw crawl data.

Content audits and content inventories are essential documents for understanding the scale, scope, fate and quality of content. However they notoriously laborious to create and maintain.

I typically use ContentWRX Audit to automate an initial inventory, saving considerable effort and improving accuracy over a manual inventorying process.

However, a raw job-level export typically needs cleaning-up to begin to make its data more usable and useful; both for the lone content wrangler and for any collaborative audits with other content or domain experts.

These tips will help improve your workflow as you begin to work with ContentWRX’s raw crawl data from your site.

Content inventory clean-up tips

  1. Transform bytes to kilobytes (Kb)
  2. Condensed, readable and usable URLs
  3. Find duplicates and junk via conditional formatting
  4. Make MIME content types understandable

Transform bytes to kilobytes (Kb), megabytes (Mb) and gigabytes (Gb)

Problem:

ContentWRX’s has provided you with a file size as a string of numbers for a content item. This is hard to read.

Example: Column D “Size” : Row 2 “22146”

Solution

Transform 22146 (bytes) into 21.63K (kilobytes), an understandable and qualified number, using this formula:

=IF(('Raw catExport'!D2>=POWER(2,30)),
TEXT(('Raw catExport'!D2/POWER(2,30)),"##0.00\G"),
IF(('Raw catExport'!D2>=POWER(2,20)),
TEXT(('Raw catExport'!D2/POWER(2,20)),"##0.00\M"),
IF(('Raw catExport'!D2>=1024),
TEXT(('Raw catExport'!D2/1024),"##0.00\K"),
TEXT(('Raw catExport'!D2),"##0.00\B")
)
)
)

Tip: change 'Raw catExport' and !D1 to match where you are working in your inventory.

Condensed, readable and usable URLs via CONCATENATE and HYPERLINK functions

Problem

You find it frustrating to have long and complex URLs that:

  • take up too much horizontal screen space.
  • repeat useless data about the protocol, domain, parent folders.
  • do not link to the live content for easy referencing of production.

Example: 

  • http://www.domain.com/root-folder/interesting-folder/file1.html
  • http://www.domain.com/root-folder/interesting-folder/file2.html
  • http://www.domain.com/root-folder/interesting-folder/file3.html

versus

Solution

You will be working with three worksheets:

  1. The original worksheet from ContentWRX.
  2. A new worksheet with a complete copy of the original worksheet.
  3. An additional worksheet to store the part of the URL that you don’t want to see (but which a formula must reference).

These are Cleaned catExport, Raw catExport and URL in the example.

The process
  1. Create a copy of your export in a new worksheet.
  2. In both worksheets use Find and Replace to remove the generic parts of the URL that you’d like to hide, e.g., “http://www.domain.com/root-folder”.
  3. In a new, third, worksheet create the part of the URL that you removed in step two so that it can be referenced by a formula, i.e., paste “http://www.domain.com/root-folder” into cell A1.
  4. In the cloned worksheet use the following formula to create the concatenated hyperlink:

=HYPERLINK(CONCATENATE(URL!$A$1,'Raw catExport'!A2),'Raw catExport'!A2)

Tip: If your worksheet names are a single word or phrase then you don’t need to wrap them in 'single quotes' in the formula.

Find duplicates and junk via conditional formatting

Problem

ContentWRX’s crawl of your content has returned duplicate items or junk that you want to rapidly identify and exclude from your inventory.

Example: ../_edit type link suffixes for the Editors in your CMS:

  • ../root-folder/interesting-folder/file1.html
  • ../root-folder/interesting-folder/file1.html[su_highlight background=”#ffef99″]/_edit[/su_highlight]

Solution

  1. Identify a pattern of duplication or junk that you’d like to highlight or Filter, like /_edit.
  2. From the Home menu ribbon choose Conditional Formatting.
  3. Then select Highlight Cell Rules, Text that Contains… .
  4. Consider setting the beginning with, or ending with rules.
  5. Enter your pattern, like: ending with “[su_highlight background=”#ffef99”]/_edit[/su_highlight]”.
  6. Run the Conditional Formatting.
  7. Filter the column where the pattern occurs and review.

Tips:

  • Use Conditional Formatting to find duplicate items.
  • Use [su_highlight background=”#ffef99″]colour[/su_highlight] to flag items and to Filter them.

Make MIME content types understandable

Problem

ContentWRX returns full MIME content-type descriptions in Column C “Type”. These are hard to read and could confuse a lay audience.

Examples:

  • application/x-javascript
  • text/html
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • image/jpeg

Solution

  • Select a content type
  • use Find and Replace to create an understandable equivalent

Examples (from above):

  • Script
  • Page
  • Spreadsheet
  • Image

Tips:

  • Types marked “unknown” are often redirected items or dead links
  • use Filters to reduce rows of irrelevant content types

Summary

Let me know in the comments if you have any improvements or suggestions for working with ContentWRX Audit exports, or with content inventories and audits more generally. Likewise, if you need help with any of the advice in this blog post.

Further Reading

By Rik Williams

I write about how to collaborate to design simple, usable and inclusive information experiences that make the lives of customers easier. Read more in Categories and Tags.

Leave a Reply

Your email address will not be published. Required fields are marked *