ContentAudit: first-class site crawling and SEO auditing for Umbraco

At the end of 2024 I was scrolling through Bluesky posts tagged with 'Umbraco' and came upon this thread:

Hey content people!

Any one recommend any good tools to help automate content audits / website inventory?

Screaming Frog is on the list

Interested in hearing about any other.

I know, jumped straight to asking for software choices (be assured we are thinking of everything else too!)

— Heledd Quaeck (@helivans.bsky.social) December 4, 2024 at 7:13 PM

My interest was sparked because of a conversation I had had with Matt Brailsford about whether the Umbraco Sustainability package could do whole site crawls for headless Umbraco sites, especially those with heavy use of virtual nodes.

I had used Screaming Frog in the past and wondered whether that level of crawling and auditing could be brought to a first-party tool built into Umbraco. Being closer to the content source means you can get insights that aren't always accessible to a third-party tool.

Introducing ContentAudit

ContentAudit is my new package that enables first-class site crawling and SEO auditing inside Umbraco.

This package has been in the making, on and off, for nearly six months, and I'm so excited to finally have it out in beta!

The ContentAudit dashboard shows a list of the number of pages crawled, a site score health dial and a list of top issues.

Getting Started

ContentAudit is on NuGet and can be installed from the command line or NuGet package manager.

dotnet add package Umbraco.Community.ContentAudit --version 1.0.0-beta
NuGet\Install-Package Umbraco.Community.ContentAudit -Version 1.0.0-beta

Once installed, a new section called Audit will appear in the section menu. Open its dashboard and you will be prompted to run a new audit. This may take some time depending on the size of your site. Because a site crawl can be intensive, it is recommended to try it on a staging or preproduction site before running it in production.

When the crawl is complete, each page will be populated with its SEO information and other important crawl factors.

An example of some of the data surfaced by ContentAudit. In this instance, it's Core Web Vitals scoring.

Data can also be seen outside of the section when editing content pages. This gives content editors a context-aware view of the pages they're working on as they go.

A workspace/Content App showing audit details alongside the content editing experience.

Issue Tracking

ContentAudit has a number of built-in issues that flag key SEO problems such as missing headings, missing meta descriptions, orphaned pages and more. The Issues page shows a breakdown of all of them and the number of pages affected. Issues are also fully extendable in code, which is covered below.

A list of issues within the ContentAudit section.

A breakdown of pages with missing H2s as seen in the Issues page

Extending

Issues are completely extendable by developers: create a class that implements IAuditPageIssue or IAuditImageIssue. These classes are picked up automatically on site startup and don't need to be registered in a composer.

Here's an example of one that flags pages where a custom keyword is missing:

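using System;
using System.Collections.Generic;
using System.Linq;
// Plus the ContentAudit namespaces that contain the types used below.

public class KeywordIssue : IAuditPageIssue
{
    // Fixed identifier for this issue; generate your own GUID for each new issue.
    public Guid Id => new Guid("cccb0159-f35f-45be-a32d-b2f7832eb123");

    public string Name => "Umbraco keyword missing";

    public string Description => "Pages that don't feature the keyword 'Umbraco'";

    public string Category => "Content";

    public IssueType Type => IssueType.Opportunity;

    public IssuePriority Priority => IssuePriority.Low;

    // This issue exposes no additional properties.
    public IEnumerable<AuditIssueProperty> ExposedProperties => default;

    // Returns the successfully crawled pages whose keyword data does not contain "umbraco".
    public IEnumerable<PageAnalysisDto> CheckPages(IEnumerable<PageAnalysisDto> pages)
    {
        return pages.Where(x => x.PageData.StatusCode == 200
            && x.ContentAnalysis.KeywordDensity?.Any(y => string.Equals(y.Key, "umbraco", StringComparison.OrdinalIgnoreCase)) == false);
    }
}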
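
One detail of the filter in CheckPages is worth spelling out: because of the `?.` operator, a page whose keyword data is null is not flagged, since `null == false` evaluates to false. The sketch below shows that behaviour in isolation. The record types here are hypothetical stand-ins, not the package's real DTOs, and it assumes KeywordDensity is a nullable dictionary keyed by keyword.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the package's DTOs; only the members the filter touches are modelled.
public record PageData(int StatusCode);
public record ContentAnalysis(Dictionary<string, int>? KeywordDensity);
public record PageAnalysis(string Url, PageData PageData, ContentAnalysis ContentAnalysis);

public static class Demo
{
    // Same shape of filter as CheckPages: status 200 and keyword data present but lacking the keyword.
    public static IEnumerable<PageAnalysis> MissingKeyword(IEnumerable<PageAnalysis> pages, string keyword) =>
        pages.Where(p => p.PageData.StatusCode == 200
            && p.ContentAnalysis.KeywordDensity?.Any(k =>
                   string.Equals(k.Key, keyword, StringComparison.OrdinalIgnoreCase)) == false);

    public static List<PageAnalysis> SamplePages() => new()
    {
        new("/about",   new PageData(200), new ContentAnalysis(new() { ["Umbraco"] = 3 })), // has keyword
        new("/contact", new PageData(200), new ContentAnalysis(new() { ["forms"] = 2 })),   // missing keyword
        new("/old",     new PageData(404), new ContentAnalysis(new() { ["forms"] = 1 })),   // not a 200
        new("/empty",   new PageData(200), new ContentAnalysis(null)),                      // no keyword data
    };

    public static void Main()
    {
        foreach (var page in MissingKeyword(SamplePages(), "umbraco"))
        {
            Console.WriteLine(page.Url); // prints only "/contact"
        }
    }
}
```

If pages without any keyword data should be flagged as well, comparing with `!= true` instead of `== false` would include them.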

There's more information on extending Issues in the documentation on GitHub.

New Features

Is there something missing from your favourite crawling tool? Need to be able to do keyword research, or link with Google Search Console?

Start a discussion on the repo. I would love to know what to work on next!


This has been a fun project to work on and I'm so glad to have it out (even if it's only in beta!)