Amplify a Wagtail/Django site - Validation (Part 2)

The driving force behind creating AMP pages for a content website is to appear at the top of Google search results. AMP pages appear in search results with a lightning bolt icon. Page speed is also one of the important ranking factors, and because the strict AMP specification makes AMP pages very fast, it is common to see the AMP version of a page near the top. AMP pages with structured data can also appear in the Top stories carousel, the host carousel of rich results, Visual stories, and rich results in mobile search results.

This blog post is the second part of the series: Amplify a Wagtail/Django site.

AMP pages in Google search

In the above GIF, notice the ⚡ icon next to the search results. Most of the results are AMP pages, and only the last search result is a normal page. Mobile users are highly likely to click only the top links. AMP pages therefore become a necessity for online content publishers who publish hundreds of articles daily.

How to make an AMP page discoverable to Google search?

The first requirement is a link tag in the non-AMP page with rel="amphtml" whose href contains the URL of the AMP version.

Add the following to the non-AMP page:

<link rel="amphtml" href="https://www.example.com/url/to/amp/document.html">

And this to the AMP page:

<link rel="canonical" href="https://www.example.com/url/to/non-amp/document.html">
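
To double-check the pairing, the two link tags can be read back from the rendered pages. The following is a minimal sketch using only Python's standard library; the helper names find_link and pages_are_paired are made up for this illustration and are not part of Wagtail, Django, or the wagtail-amp repo.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect the href of <link> tags, keyed by each rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens; keep the first href per token.
        for rel in (attr_map.get("rel") or "").lower().split():
            self.links.setdefault(rel, href)


def find_link(html, rel):
    """Return the href of the first <link rel="..."> in html, or None."""
    collector = LinkRelCollector()
    collector.feed(html)
    collector.close()
    return collector.links.get(rel)


def pages_are_paired(non_amp_html, non_amp_url, amp_html, amp_url):
    """True when the non-AMP page points at the AMP page and vice versa."""
    return (
        find_link(non_amp_html, "amphtml") == amp_url
        and find_link(amp_html, "canonical") == non_amp_url
    )


if __name__ == "__main__":
    page = '<head><link rel="amphtml" href="https://www.example.com/url/to/amp/document.html"></head>'
    print(find_link(page, "amphtml"))
```

In a Django test, the two HTML strings could come from the test client's responses for the normal URL and the AMP URL of the same page.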

The more important requirement is that the AMP page must be valid. When a mobile user clicks on an AMP search result, Google search retrieves the page from the Google AMP cache. Only valid AMP pages are cached; they are stored in the cache and preloaded efficiently to improve the user experience.
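
To see what the cache actually serves, it helps to open the cached copy directly. The sketch below builds the Google AMP cache URL for a page under a simplifying assumption: the hostname is plain ASCII and short. The complete rules (internationalised domains, very long hostnames) are in the Google AMP Cache documentation, and the function name amp_cache_url is hypothetical.

```python
from urllib.parse import urlsplit


def amp_cache_url(page_url):
    """Build the Google AMP cache URL for an AMP document.

    Simplified: assumes a plain ASCII hostname; punycode handling and the
    hash fallback for unusual hostnames are deliberately left out.
    """
    parts = urlsplit(page_url)
    host = parts.hostname
    # Cache subdomain: every "-" becomes "--", then every "." becomes "-".
    subdomain = host.replace("-", "--").replace(".", "-")
    # "c" marks a content document; "s/" is added when the origin uses HTTPS.
    secure = "s/" if parts.scheme == "https" else ""
    url = f"https://{subdomain}.cdn.ampproject.org/c/{secure}{host}{parts.path or '/'}"
    if parts.query:
        url += "?" + parts.query
    return url


if __name__ == "__main__":
    print(amp_cache_url("https://www.example.com/url/to/amp/document.html"))
```

If the page is valid and has been crawled, opening the printed URL in a browser should show the cached version.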

Render valid AMP pages in a Wagtail/Django-powered site

Valid means that the AMP page does not contain tags that are restricted by the AMP specification. AMP HTML is a subset of HTML: it puts some restrictions on the full set of tags and functionality available through HTML. For performance, AMP HTML does not allow author-written JavaScript beyond what is provided through the custom elements.

In a Wagtail-powered site, AMP templates should not use restricted tags. For example, the <img> tag should be replaced with <amp-img>. Note that <img> does not have an end tag, whereas <amp-img> requires one: </amp-img>.
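
A quick way to spot such tags in rendered HTML is a small scan with Python's standard library. This is only a rough sketch: the mapping below is a small illustrative subset of the AMP rules, it ignores exceptions such as fallbacks inside <noscript>, and it is no substitute for the official AMP validator. The names AMP_REPLACEMENTS and find_restricted_tags are invented for this example.

```python
from html.parser import HTMLParser

# Illustrative subset: HTML tags that have an AMP custom-element replacement.
AMP_REPLACEMENTS = {
    "img": "amp-img",
    "video": "amp-video",
    "audio": "amp-audio",
    "iframe": "amp-iframe",
}


class RestrictedTagFinder(HTMLParser):
    """Record every start tag that should be swapped for an AMP element."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in AMP_REPLACEMENTS:
            line, _column = self.getpos()
            self.problems.append((line, tag, AMP_REPLACEMENTS[tag]))


def find_restricted_tags(html):
    """Return a list of (line, tag, suggested_replacement) tuples."""
    finder = RestrictedTagFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


if __name__ == "__main__":
    sample = '<p>Hello</p>\n<img src="a.jpg">'
    for line, tag, replacement in find_restricted_tags(sample):
        print(f"line {line}: <{tag}> should become <{replacement}>")
```

Running this over the HTML of an AMP template during development gives an early hint before the page is sent to the real validator.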

  • Change tags for fields whose properties can be accessed: It is easy to change tags when we can access the field's properties directly. Wagtail's image tag renders an <img> tag by default, but with the "as" syntax we can access the rendition's properties and build the <amp-img> tag ourselves:
<!-- Template: https://github.com/Parbhat/wagtail-amp/blob/master/blog/templates/blog/blog_page_amp.html -->

{% for item in page.gallery_images.all %}
    <div>
        {% image item.image fill-600x500 as gallery_image %}
        <amp-img alt="{{ gallery_image.alt }}"
          src="{{ gallery_image.url }}"
          width="{{ gallery_image.width }}"
          height="{{ gallery_image.height }}"
          layout="responsive">
        </amp-img>
        <p>{{ item.caption }}</p>
    </div>
{% endfor %}
  • Change tags in Richtext and Streamfield: The main feature of a CMS is a WYSIWYG HTML editor for creating content. Wagtail uses Draftail for this in the Richtext field and in Streamfield's Richtext block. As the content created with these fields is stored as HTML, we do not have direct access to the images and other elements inside it. To create a valid AMP page, we can use a Python package called Beautiful Soup to modify the rendered HTML.
from bs4 import BeautifulSoup


def amplify_html(rendered_html):
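    # Name the parser explicitly: "html.parser" avoids the bs4 parser warning
    # and, unlike lxml, does not wrap the fragment in <html><body> tags.
    bs = BeautifulSoup(rendered_html, 'html.parser')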

    for image in bs.find_all('img', attrs={'src': True}):
        amp_img = bs.new_tag(
            'amp-img', src=image.get("src"),
            alt=image.get("alt", ""),
            layout="responsive",
            width=image.get("width", 550),
            height=image.get("height", 368)
        )
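        # Copy CSS classes only when the original image has some, so that
        # images without classes do not get an empty class attribute.
        if image.get("class"):
            amp_img['class'] = image["class"]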
        image.replace_with(amp_img)

    # ...
    # Complete code: https://github.com/Parbhat/wagtail-amp/blob/master/blog/utils.py

    return bs.decode_contents()

In the first part of the series, we learned how Wagtail serves pages. The serve method on a Page model is responsible for returning the HTML response. The body field is a Streamfield, and we can get its rendered HTML using the __html__() method. This HTML is non-AMP and can contain tags that are restricted in AMP pages, so we convert it with the amplify_html function and add the result to the context.

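from django.template.response import TemplateResponse
from django.utils.safestring import mark_safe
from wagtail.core.models import Page  # "wagtail.models" in Wagtail 3.0 and later

from .utils import amplify_html  # assumes amplify_html lives in blog/utils.py


class AmpBlogPage(Page):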
    # ...
    # Complete code: https://github.com/Parbhat/wagtail-amp/blob/8abd9d6b0ee36fe2183167ad6c5f4556dd825590/blog/models.py#L92

    def get_template_amp(self, request, *args, **kwargs):
        return 'blog/amp_blog_page_amp.html'

    def serve(self, request, *args, **kwargs):
        is_amp_request = kwargs.get('is_amp_request')
        if is_amp_request:
            kwargs.pop('is_amp_request')
            context = self.get_context(request, *args, **kwargs)
            body_html = self.body.__html__()
            body_amp_html = amplify_html(body_html)
            context['body_amp_html'] = mark_safe(body_amp_html)

            return TemplateResponse(
                request,
                self.get_template_amp(request, *args, **kwargs),
                context
            )
        return super(AmpBlogPage, self).serve(request, *args, **kwargs)

    # ...

body_amp_html is then used in the template instead of page.body:

<article>
    {{ body_amp_html }}
</article>

Similarly, we can get AMP HTML for a Richtext field. For a Richtext field, we first need to use the expand_db_html function to expand the database representation of the HTML into proper HTML.

Alternatively, we could store the AMP content in a separate field and fill it with a publish hook, so that the HTML is not parsed and modified during the request-response cycle. That extra field is not really necessary, though: the conversion is fast, and pages opened from Google search are served from the Google AMP cache rather than from our server. Each time a user accesses AMP content from the cache, the cache refreshes the content in the background, and the updated version is served to the next user once it has been cached.

For Streamfield, there is another option:

If your project uses Streamfield's Richtext block without the image and embed features, then you can skip the Beautiful Soup method. It is recommended to use the Richtext field with a small set of features and an Image chooser block for images. New Streamfield blocks should be created depending on the UI requirements. In this case, we only need to add is_amp_request to the context.

    def serve(self, request, *args, **kwargs):
        is_amp_request = kwargs.get('is_amp_request')
        if is_amp_request:
            kwargs.pop('is_amp_request')
            context = self.get_context(request, *args, **kwargs)
            context['is_amp_request'] = True

            return TemplateResponse(
                request,
                self.get_template_amp(request, *args, **kwargs),
                context
            )
        return super(AmpBlogPage, self).serve(request, *args, **kwargs)
<!-- Template for streamfield block like Service block -->
{% if is_amp_request %}
    {% image value.image fill-400x300 as amp_img %}
    <amp-img alt="{{ amp_img.alt }}"
      src="{{ amp_img.url }}"
      width="{{ amp_img.width }}"
      height="{{ amp_img.height }}"
      layout="responsive">
    </amp-img>
{% else %}
    {% image value.image fill-400x300 %}
{% endif %}

With these small changes, our AMP page is ready to fly high with Wagtail. In the next part of the series, we will add the CORS headers required by the amp-list and amp-form components.

Thanks for reading. You can also follow the GitHub repo. AMP-lify!

Useful links:

  1. Validate AMP pages
  2. Make your page discoverable
  3. Google AMP Cache
  4. Google search
  5. Understand how AMP looks in search results
  6. AMP HTML Specification
  7. How AMP pages are cached
By @Parbhat Puri
Tags: #wagtail, #django, #opensource, #web, #Programming, #code
