Reinoud van Dalen

August 08, 2014

Using replacement characters in Sitecore the right way

A bold title, perhaps, but it seems this "issue" has not been solved properly: I haven't been able to find a solution online that feels right. So I figured it was time to dive in myself and fix this once and for all.

For quite a long time now we have taken care to make our page URLs "pretty": appealing to human visitors and understandable for bots. A lot of the required features are already given to us, out of the box, by Sitecore. However, one issue keeps returning to me: replacing spaces with dashes. I am not sure if it makes any difference to bots, but I have not met a client who likes the "%20" bits in their URLs. Instead they want those replaced by dashes.

EncodeNameReplacements

Sitecore actually has something for this: the encodeNameReplacements section in the web.config. There is a default list of replacements, and you can add the replacement of spaces by dashes to it like so (the last replace entry):

<!-- ENCODE NAME REPLACEMENTS
        Specifies text replacements to use when encoding special chars in friendly urls
-->
<encodeNameReplacements>
  <replace mode="on" find="&amp;" replaceWith=",-a-," />
  <replace mode="on" find="?" replaceWith=",-q-," />
  <replace mode="on" find="/" replaceWith=",-s-," />
  <replace mode="on" find="*" replaceWith=",-w-," />
  <replace mode="on" find="." replaceWith=",-d-," />
  <replace mode="on" find=":" replaceWith=",-c-," />
  <replace mode="on" find=" " replaceWith="-" />
</encodeNameReplacements>

So if I have an item called "Reinoud van Dalen" then (with some extra options configured) this will be translated to /reinoud-van-dalen.

But wait, what if my item name has a dash?

Now we run into a bit of a problem. Say I have an item called "Reinoud van Dalen-Developer". This is translated to /reinoud-van-dalen-developer, and when Sitecore starts processing the path, it reverses the replacement and looks for an item called "reinoud van dalen developer" (case insensitive), which it cannot find.
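To make the problem concrete, here is a minimal, dependency-free sketch of that round trip. It is not Sitecore's MainUtil implementation: the class DashRoundTripDemo and its two methods are hypothetical names, it models only the single space-to-dash rule, and the lowercasing stands in for the "extra options" mentioned above.

```csharp
using System;

// Hypothetical illustration only; this is NOT how Sitecore implements encoding.
public static class DashRoundTripDemo
{
    // Models the rule find=" " replaceWith="-", plus lowercasing of the url.
    public static string Encode(string itemName)
    {
        return itemName.Replace(" ", "-").ToLowerInvariant();
    }

    // Models reversing that rule: every dash in the url becomes a space,
    // including dashes that were part of the original item name.
    public static string Decode(string urlSegment)
    {
        return urlSegment.Replace("-", " ");
    }
}
```

Encoding "Reinoud van Dalen-Developer" with this sketch gives "reinoud-van-dalen-developer", and decoding that again gives "reinoud van dalen developer". The original dash is gone, because the decoded form cannot tell a real dash from a replaced space.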

The wrong way: InvalidItemNameChars

You could choose to deny the usage of dashes in item names, but apart from situations where you would have to process existing items and remove or replace the dashes, I think it's not the most elegant solution. If you would like to opt for this approach, then you can add the '-' character to the InvalidItemNameChars setting in the web.config:

<setting name="InvalidItemNameChars" value="\/:?&quot;&lt;&gt;|[]-" />

The right way: overwriting the ItemResolver

If we do not want to limit the usage of dashes (or any other character/string you feel like adding in the replaceWith of the encodeNameReplacements), then we need to overwrite the ItemResolver. Why? Because ultimately the standard ItemResolver will take the URL, decode it (using the replacements) and compare that to Sitecore item names. This should be the other way around: take the URL and compare that to encoded item names.
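The difference between the two directions can be sketched without any Sitecore dependencies. The class SegmentMatchDemo and its methods are hypothetical names, and the private Encode and Decode helpers model only the space-to-dash rule, not the full replacement list.

```csharp
using System;

// Hypothetical illustration of the two comparison strategies; not Sitecore code.
public static class SegmentMatchDemo
{
    // Simplified stand-ins for encoding and decoding with one rule (" " <-> "-").
    private static string Encode(string itemName)
    {
        return itemName.Replace(" ", "-");
    }

    private static string Decode(string urlSegment)
    {
        return urlSegment.Replace("-", " ");
    }

    // Standard direction: decode the url segment, compare with the raw item name.
    public static bool MatchesByDecodingUrl(string urlSegment, string itemName)
    {
        return Decode(urlSegment).Equals(itemName, StringComparison.OrdinalIgnoreCase);
    }

    // Reversed direction: encode the item name, compare with the raw url segment.
    public static bool MatchesByEncodingName(string urlSegment, string itemName)
    {
        return Encode(itemName).Equals(urlSegment, StringComparison.OrdinalIgnoreCase);
    }
}
```

For the segment "reinoud-van-dalen-developer" and the item name "Reinoud van Dalen-Developer", the first method fails to match while the second one succeeds. The price of the second direction is that item names have to be encoded while resolving, one level of children at a time.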

Below you can see the modified ItemResolver. I've added comments to explain how it works and, apart from some polishing, the only lines I've changed are the ones marked with a "code adjustment" comment (in GetChild, ResolveFullPath and ResolveLocalPath). You could also extend Sitecore's ItemResolver rather than overwrite it, but I chose to replace it because extending possibly doubles the amount of time needed to resolve an item.

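using System;
using Sitecore;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Globalization;
using Sitecore.IO;
using Sitecore.Pipelines.HttpRequest;
using Sitecore.SecurityModel;
using Version = Sitecore.Data.Version; //avoids ambiguity with System.Version

public class ItemResolver : HttpRequestProcessor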
{
    /// <summary>
    /// Overwrite the normal ItemResolver so we can compare encoded item display(names) instead of decoded url/paths
    /// </summary>
    /// <param name="args"></param>
    public override void Process(HttpRequestArgs args)
    {
        Assert.ArgumentNotNull(args, "args");

        //return if item is already set or we have no database or itempath to work with
        if (Context.Item != null || Context.Database == null || args.Url.ItemPath.Length == 0) return;

        Profiler.StartOperation("Resolve current item.");

        //Example item to find has this name: "Reinoud van Dalen-Developer"
        //1. try to get item by path with decoded ItemPath (i.e: /sitecore/content/home/reinoud van dalen developer) 
        var path = MainUtil.DecodeName(args.Url.ItemPath);
        var result = args.GetItem(path);

        if (result == null)
        {
            //2. try to get item by path with (encoded) ItemPath (i.e: /sitecore/content/home/reinoud-van-dalen-developer)
            path = args.Url.ItemPath;
            result = args.GetItem(path);
        }

        if (result == null)
        {
            //3. try to get item by path with (encoded) LocalPath (i.e: /reinoud-van-dalen-developer)
            path = args.LocalPath;
            result = args.GetItem(path);
        }
        if (result == null)
        {
            //4. try to get item by path with decoded LocalPath (i.e: /reinoud van dalen developer)
            path = MainUtil.DecodeName(args.LocalPath);
            result = args.GetItem(path);
        }

        var site = Context.Site;
        var siteRootPath = site != null ? site.RootPath : string.Empty;

        if (result == null)
        {
            //5. try to get item by path with siteRootPath and LocalPath combined (i.e: /sitecore/content/reinoud-van-dalen-developer)
            path = FileUtil.MakePath(siteRootPath, args.LocalPath, '/');
            result = args.GetItem(path);
        }

        if (result == null)
        {
            //6. try to get item by path with decoded siteRootPath and LocalPath combined (i.e: /sitecore/content/reinoud van dalen developer)
            path = MainUtil.DecodeName(FileUtil.MakePath(siteRootPath, args.LocalPath, '/'));
            result = args.GetItem(path);
        }

        //7. if all else fails then resolve by ItemPath and LocalPath for each segment, comparing to item (display)name
        if (result == null) result = ResolveUsingDisplayName(args);

        //8. still no luck? then fallback to the start item of the current site
        if (result == null && args.UseSiteStartPath && site != null) result = args.GetItem(site.StartPath);

        if (result != null) Tracer.Info("Current item is \"" + path + "\".");

        Context.Item = result;
        Profiler.EndOperation();
    }

    private Item GetChild(Item item, string itemName)
    {
        foreach (Item obj in item.Children)
        {
            //code adjustment: apply MainUtil.EncodeName on obj.DisplayName and obj.Name (so "Reinoud-van-Dalen-Developer" is compared to "reinoud-van-dalen-developer")
            if (MainUtil.EncodeName(obj.DisplayName).Equals(itemName, StringComparison.OrdinalIgnoreCase) || MainUtil.EncodeName(obj.Name).Equals(itemName, StringComparison.OrdinalIgnoreCase))
                return obj;
        }
        return null;
    }

    private Item GetSubItem(string path, Item root)
    {
        var obj = root;
        var str = path;
        char[] chArray = { '/' };
        foreach (string itemName in str.Split(chArray))
        {
            if (itemName.Length != 0)
            {
                obj = GetChild(obj, itemName);
                if (obj == null)
                    return null;
            }
        }
        return obj;
    }

    private Item ResolveFullPath(HttpRequestArgs args)
    {
        var itemPath = args.Url.ItemPath;
        if (string.IsNullOrEmpty(itemPath) || itemPath[0] != '/') return null;
        var num = itemPath.IndexOf('/', 1);
        if (num < 0) return null;
        var root = ItemManager.GetItem(itemPath.Substring(0, num), Language.Current, Version.Latest, Context.Database, SecurityCheck.Disable);

        //code adjustment: remove MainUtil.DecodeName on itemPath.Substring(num)
        return root == null ? null : GetSubItem(itemPath.Substring(num), root);
    }

    private Item ResolveLocalPath(HttpRequestArgs args)
    {
        var site = Context.Site;
        if (site == null) return null;
        var root = ItemManager.GetItem(site.RootPath, Language.Current, Version.Latest, Context.Database, SecurityCheck.Disable);

        //code adjustment: remove MainUtil.DecodeName on args.LocalPath
        return root == null ? null : GetSubItem(args.LocalPath, root);
    }

    private Item ResolveUsingDisplayName(HttpRequestArgs args)
    {
        Assert.ArgumentNotNull(args, "args");
        Item obj;
        using (new SecurityDisabler())
        {
            obj = ResolveLocalPath(args) ?? ResolveFullPath(args);
            if (obj == null)
                return null;
        }
        return args.ApplySecurity(obj);
    }
}
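For Sitecore to actually use this class, it has to take the place of the standard ItemResolver in the httpRequestBegin pipeline. A minimal sketch of an include (patch) file that does this is shown below. MyProject.Pipelines.ItemResolver and MyProject are placeholder names for your own namespace and assembly, and the type string in patch:instead is assumed to match the default processor registration in your Sitecore version, so check it against your own configuration first.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <!-- Placeholder type and assembly names: replace with your own -->
        <processor patch:instead="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']"
                   type="MyProject.Pipelines.ItemResolver, MyProject" />
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
```

Replacing the processor, rather than adding a second one next to it, is in line with the choice above to avoid resolving the same item twice.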

Media urls

This solves the page URL replacement issue. There is, however, one more problem: media URLs. Media items are given the same encoding, but resolving a media item is not done by the ItemResolver. Media requests have their own processor, which ultimately tries to find an item by path and does not try to find a match per path segment (like the ResolveUsingDisplayName function in the ItemResolver does). I think the reason for this is that it would hurt performance, especially if you have several images on a page.

To overcome this problem I ended up adding an extra (configurable) replacement list especially for page URLs and applied the extra encoding/replacing in an extended LinkProvider plus the modified ItemResolver. This way the media URLs were unaffected.
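The idea of a separate replacement list can be sketched as a small helper that both the extended LinkProvider and the modified ItemResolver would call instead of MainUtil.EncodeName. This is not the code I ended up with, only a dependency-free illustration: PageUrlNameEncoder is a hypothetical class, and how the list is read from configuration is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Sitecore: a replacement list for page urls only.
public class PageUrlNameEncoder
{
    private readonly IList<KeyValuePair<string, string>> replacements;

    public PageUrlNameEncoder(IList<KeyValuePair<string, string>> replacements)
    {
        if (replacements == null) throw new ArgumentNullException("replacements");
        this.replacements = replacements;
    }

    // Applies every find/replaceWith pair, in list order, to a single item name.
    public string Encode(string itemName)
    {
        var result = itemName ?? string.Empty;
        foreach (var pair in replacements)
        {
            result = result.Replace(pair.Key, pair.Value);
        }
        return result;
    }
}
```

With a list that holds only the pair (" ", "-"), the name "Reinoud van Dalen" becomes "Reinoud-van-Dalen". Because media URLs never pass through this helper, they keep the standard encodeNameReplacements behaviour.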

TAGS: sitecore linkprovider niceurls


Comments
Craig

Hello. My opinion would be to explain why you feel the wrong way is incorrect. Lots of devs and architects alike like to make things more complicated than necessary and 95% of the time from personal experience it's based on preference. I find a lot of people over engineer things simply to show off what they know. If you can't come up with a reason for not doing something other than "it's not the most elegant solution" then that is not a reason to do something different. Very rarely do developers in the real world who work on projects with deadlines have time to make things as elegant as possible. Simplicity is always king. I caution you against doing things simply because it's not "elegant". This leads you down a path that potentially makes your code hard to maintain.


Reinoud

Hi Craig, I understand what you're saying. Not trying to show off, but let me try to explain why: This particular issue is something I've encountered more than once and the question always starts with: why can't we make this work? And next we settle for using 'the wrong way' and I park the thought of figuring out a solution that does not prevent us from using dashes in item names. Most of the time we are halfway down the project and need to remove all the dashes in item names because we cannot allow them anymore. The solution is rather simple; however, the particular methods are not virtual, which means I had to copy a lot of code which is unchanged.


Reinoud van Dalen

Hi Anna, Good call, you are absolutely right. There is indeed a potential problem with items that result in the same url. A content editor or page editor warning would be nice for this (would be nice by default as well). In practice I find that the client receives the explanation for this situation rather well compared to explaining the inability of using dashes in item names.


Anna

Hi Reinoud, it's good to know there is another approach to solve this problem, because I always used tweaking of the InvalidItemNameChars and ItemNameValidation settings. But overriding the ItemResolver leads to another potential problem. In your case items with names "Reinoud van Dalen-Developer" and "Reinoud van Dalen Developer" have the same URL "/reinoud-van-dalen-developer" and you are not able to access both items. Now imagine these items are news and there are a thousand of them or they are hidden in a bucket... So both approaches force a developer to explain something to content managers, either "why they cannot use dashes in item names" or "why Sitecore thinks these two items have the same name". It is up to you to choose :)


W. Paap

A nice addition to this solution is to prevent duplicate item names, for example: https://sitecoreclimber.wordpress.com/2013/03/05/prevent-duplicates-items/ and http://sitecoreblog.patelyogesh.in/2015/10/dealing-with-duplicate-item-names-in.html


Betty

It surprises me that there's not another field on items to store their URL rather than using the name. That would feel much cleaner to me and would be a lot faster for page loads.


Esben

If anyone stumbles upon this, standard Sitecore 8.1 now seems to replace spaces by dashes by default. That is, the following line has been added to Sitecore.config: <replace mode="on" find=" " replaceWith="-" /> It also works with item names containing dashes.