Sending Markdown-based mailings that look pretty

Sending nicely formatted emails that render correctly in (all) mail clients is a tedious task. To avoid spending a lot of time manually drafting emails in the provided Microsoft Outlook mail client, I wanted to see whether I could parse Markdown and render it into a good-looking mailing, leveraging the Markdown standard(1), all without having to manually edit the output before sending.

Markdown is a lightweight markup language that you can use to add formatting elements to plaintext documents. Created by John Gruber in 2004, Markdown is now one of the world’s most popular markup languages.

To start off, let's list a couple of requirements for my specific use case.

Requirements:

The mail content is written in Markdown
The mail output contains tables
The mail output contains images (including SVG) (1)
The resulting mail should be rendered nicely in both (mobile) Gmail and Microsoft Outlook (this covers >99% of the target audience)
The mail should be standalone (e.g. no loading of images from the web required)
The mail should be sent by a script using a mail host (2)

SVG is an XML-based vector image format for defining two-dimensional graphics, having support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium since 1999. SVG images are defined in a vector graphics format and stored in XML text files.
A mail host resolves email addresses and reroutes mail within your domain.

To start off, we'll break down the challenge into separate bite-sized pieces. The first thing to note is that formatted mails typically use HTML. What makes it challenging to end up with properly formatted mails across clients is that each client supports a different (small) subset of the HTML standard(1) and CSS(2).

The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.
Cascading Style Sheets (CSS) is a style sheet language used for specifying the presentation and styling of a document written in a markup language such as HTML or XML (including XML dialects such as SVG, MathML or XHTML). CSS is a cornerstone technology of the World Wide Web, alongside HTML and JavaScript.

After some initial analysis, I decided that Python(1) would be a suitable programming/scripting language to realise the requirements above, in particular because of its extensive, publicly available package repository.

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.

Challenge 1: Markdown to HTML

Parsing a Markdown file (.md) to HTML requires us to first read the file from disk using Python. This can be done quite simply with the following snippet.

def read_markdown_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        return file.read()

This function opens the file for reading using the UTF-8(1) encoding; the location of the file is specified by the variable file_path. Next, it reads the file contents and returns them.

UTF-8 is a variable-length character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit. UTF-8 is capable of encoding all 1,112,064 valid Unicode code points using one to four one-byte code units.
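
To see why the explicit encoding matters, here is a minimal round-trip sketch. The sample text and the temporary file are made up for illustration; only read_markdown_file comes from the snippet above.

```python
import os
import tempfile


def read_markdown_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        return file.read()


# Hypothetical sample content with non-ASCII characters
sample = "# Café menu\n\nPrice: 3 €\n"

# Write the sample to a temporary .md file, explicitly as UTF-8
fd, sample_path = tempfile.mkstemp(suffix=".md")
with os.fdopen(fd, "w", encoding="utf-8") as file:
    file.write(sample)

markdown_content = read_markdown_file(sample_path)
os.remove(sample_path)

# The accented and currency characters survive the round trip
print(markdown_content == sample)
```

Without encoding="utf-8", Python falls back to a platform-dependent default, which may garble such characters on some systems.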

To convert the contents returned by this function into HTML, we'll use a library. Multiple libraries exist for converting Markdown to HTML; the most popular ones are markdown and markdown2. For this use case, I decided to go for markdown2.

Following the instructions in the readme, we obtain the following code.

import markdown2

# Convert Markdown to HTML
converter = markdown2.Markdown()
html_content = converter.convert(markdown_content)  # (1)

markdown_content is the file content obtained through the function call markdown_content = read_markdown_file(file_path).

We first construct a converter instance; next we call its convert method and supply it with the Markdown content we just read. The result is stored in html_content.

Challenge 2: handling tables and images

Now that we managed to convert our Markdown to HTML, one will observe that the paths to the images in the Markdown are also present in the HTML output. Tables, however, are not handled properly yet.

In Markdown you can specify tables using the following formatting
(for reference see the extended syntax on markdownguide.org).

| Syntax    | Description |
| --------- | ----------- |
| Header    | Title       |
| Paragraph | Text        |

Which renders to something like

Syntax	Description
Header	Title
Paragraph	Text

Luckily, it is rather simple to support such tables in markdown2 by supplying extras in the converter.

import markdown2

converter = markdown2.Markdown(
    extras=["tables"]
)
html_content = converter.convert(markdown_content)

Challenge 3: SVG images

SVG images are generally not handled properly by mail clients. After some googling, I discovered this guide which nicely illustrates the (lack of) compliance of various mail clients and various ways of (not) embedding SVG images in an email.

Considering the standalone requirement and the mail clients that need to be supported, it is clear that embedding the original SVGs is too risky.

How about converting the SVG images into e.g. PNG format?(1)

Portable Network Graphics (PNG) is a raster-graphics file format that supports lossless data compression. PNG was developed as an improved, non-patented replacement for the Graphics Interchange Format (GIF); unofficially, the initials PNG stood for the recursive acronym "PNG's not GIF".

To do so, we first need to obtain the set of all images that are referenced in the Markdown file. I came up with the following.

import re

def extract_image_urls(markdown_content):
    # Regular expression to find image URLs in Markdown and HTML (1)
    image_pattern = re.compile(
        r'!\[.*?\]\((.*?)\)|<img.*?src=["\'](.*?)["\'].*?>', re.DOTALL
    )

    # Combine matches from both patterns
    return set(
        [
            match.group(1) or match.group(2)
            for match in image_pattern.finditer(markdown_content)
        ]
    )

The pattern might seem quite complex. I asked ChatGPT 3.5 to explain it for you:

This regular expression appears to be designed to match two different patterns commonly used to represent images in Markdown and HTML. Let's break down the two parts of the regex:
1. Markdown Image Syntax: !\[.*?\]\((.*?)\)
- !: Matches the exclamation mark, used to indicate an image in Markdown.
- \[.*?\]: Matches any text enclosed in square brackets. The .*? inside the brackets is a non-greedy match for any characters. The matched text is usually the alt text for an image in Markdown; it is matched but not captured.
- \((.*?)\): Matches any text enclosed in parentheses. Again, the .*? is used for non-greedy matching, capturing the URL inside the parentheses.
2. HTML Image Tag: <img.*?src=["\'](.*?)["\'].*?>
- <img: Matches the opening tag of an HTML image element.
- .*?: Matches any characters (non-greedy) between the opening tag and the src attribute.
- src=["\'](.*?)["\']: Matches the src attribute, which contains the URL of the image. The URL can be enclosed in either single or double quotes.
- .*?>: Matches any remaining characters (non-greedy) until the closing angle bracket of the image tag.
Combining the two patterns using the | (pipe) symbol means that the regex will match either the Markdown image syntax or the HTML image tag. The captured groups (indicated by parentheses) contain the URL of the image: group 1 for the Markdown syntax and group 2 for the HTML tag.

This function utilises regular expressions to obtain all image paths mentioned in the Markdown content. It is important to note that Markdown allows images to be embedded in two ways:

Using syntax ![alt text](path/to/image.ext)
Using plain HTML <img alt="alt text" src="path/to/image.ext"/>

The above regular expression captures both. After capturing all image paths, we put them in a set to prevent duplicates.
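
As a quick check, the function can be tried on a small, made-up Markdown string that uses both embedding styles and references one image twice. The sample content and file names are assumptions for illustration.

```python
import re


def extract_image_urls(markdown_content):
    # Same pattern as above: group 1 = Markdown syntax, group 2 = HTML <img> tag
    image_pattern = re.compile(
        r'!\[.*?\]\((.*?)\)|<img.*?src=["\'](.*?)["\'].*?>', re.DOTALL
    )
    return {
        match.group(1) or match.group(2)
        for match in image_pattern.finditer(markdown_content)
    }


# Hypothetical sample: two syntaxes, one image referenced twice
sample_markdown = """# Weekly report

![Chart](images/chart.svg)

<img alt="Logo" src="images/logo.png"/>

![Same chart again](images/chart.svg)
"""

image_paths = extract_image_urls(sample_markdown)
print(sorted(image_paths))  # ['images/chart.svg', 'images/logo.png']
```

Note that this pattern is a pragmatic shortcut rather than a full Markdown parser: an image with a title, such as ![alt](path "title"), would have the title captured as part of the path.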

We can then convert all listed .svg files into .png files using the Python package cairosvg.

import os
import tempfile

import cairosvg

def svg_to_png(svg_path, scale=2.0):
    # Create a temporary file for the PNG image
    png_fd, png_path = tempfile.mkstemp(suffix=".png")
    os.close(png_fd)

    # Convert SVG to PNG using cairosvg
    cairosvg.svg2png(url=svg_path, write_to=png_path, scale=scale)

    return png_path

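# Collect the image paths referenced in the Markdown content
image_paths = extract_image_urls(markdown_content)

svg_to_png_paths = dict()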
for image_path in image_paths:
    if image_path.lower().endswith(".svg"):
        # Convert SVG to PNG and attach
        png_path = svg_to_png(image_path)
        svg_to_png_paths[image_path] = png_path

The above code generates PNG files from the SVG files and stores them in a temporary location. To prevent clogging our system, we can clean up the created PNG files once the mail is sent, using the following.

import os

# Clean up temporary PNG files
for png_path in svg_to_png_paths.values():
    os.remove(png_path)

This removes the generated images from our disk.
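
An alternative I did not use in the script, sketched here under assumptions, is to let tempfile.TemporaryDirectory handle the cleanup. The helper write_placeholder_png is hypothetical and merely stands in for the cairosvg conversion, so that the sketch runs without extra packages.

```python
import os
import tempfile


def write_placeholder_png(png_path):
    # Hypothetical stand-in for the cairosvg conversion step:
    # writes only the PNG signature, not a valid image
    with open(png_path, "wb") as png_file:
        png_file.write(b"\x89PNG\r\n\x1a\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    png_path = os.path.join(tmp_dir, "chart.png")
    write_placeholder_png(png_path)
    existed_inside = os.path.exists(png_path)
    # ... build and send the mail here, while the files still exist ...

# The directory and everything in it are removed when the block exits
existed_after = os.path.exists(png_path)
print(existed_inside, existed_after)  # True False
```

The files are removed even if sending fails with an exception, and the PNG files can keep readable names instead of random temporary ones.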

Challenge 4: drafting the mail and attaching the images

To create a mail message we make use of the built-in Python email library.

from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SENDER = "sender@example.com"
TO = ["to-user@example.com"]
CC = ["cc-user@example.com"]

# Create a multipart message
message = MIMEMultipart()
message["From"] = SENDER
message["To"] = ", ".join(TO)
message["Cc"] = ", ".join(CC)
message["Subject"] = "Example email"

To include the images we call

# Attach images (PNG or converted from SVG)
svg_to_png_paths = dict()
unmodified_paths = dict()
for image_path in image_paths:
    if image_path.lower().endswith(".svg"):
        # Convert SVG to PNG and attach
        png_path = svg_to_png(image_path)
        svg_to_png_paths[image_path] = png_path
        with open(png_path, "rb") as png_file:
            img_data = png_file.read()
            img = MIMEImage(img_data, name=os.path.basename(png_path))
            message.attach(img)
    else:
        # Attach other image formats directly
        with open(image_path, "rb") as image_file:
            img_data = image_file.read()
            img = MIMEImage(img_data, name=os.path.basename(image_path))
            message.attach(img)
            unmodified_paths[image_path] = image_path

This code converts (if required) and attaches the images to the message.

Next, we note that attachments in mails do not support directory structures (i.e. folders). This means that all files (images) are attached using their basename (filename) only. To ensure that images are nicely discovered in our HTML, we need to modify the paths there as well.

import os

def update_html_paths(html_content, paths):
    # Update HTML content to reference the image paths
    for source, destination in paths.items():
        html_content = html_content.replace(
            source, os.path.basename(destination)
        )
    return html_content

# Update HTML content to reference PNG paths
html_content = update_html_paths(
    html_content, {**svg_to_png_paths, **unmodified_paths}
)
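
To make the effect concrete, here is a small run on made-up input. The HTML fragment and the temporary file name are assumptions for illustration.

```python
import os


def update_html_paths(html_content, paths):
    # Replace every source path by the basename of its destination
    for source, destination in paths.items():
        html_content = html_content.replace(
            source, os.path.basename(destination)
        )
    return html_content


sample_html = (
    '<p><img src="images/chart.svg" alt="Chart"> '
    '<img src="images/logo.png" alt="Logo"></p>'
)
sample_paths = {
    "images/chart.svg": "/tmp/tmpab12cd.png",  # hypothetical converted file
    "images/logo.png": "images/logo.png",  # unmodified image
}

updated_html = update_html_paths(sample_html, sample_paths)
print(updated_html)
# <p><img src="tmpab12cd.png" alt="Chart"> <img src="logo.png" alt="Logo"></p>
```

Because this is a plain text replacement, a path that happens to be a substring of another path (say a.png and data.png) could be rewritten unintentionally, so distinct file names are the safer choice.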

This all works great in Outlook. Gmail, however, is a bit troublesome here and requires additional headers for images to be displayed correctly. We can achieve this by adding a Content-ID header to each image and prefixing the filename (the src in the <img> tag) with cid:. We modify the above using

            img_data = png_file.read()
            img = MIMEImage(img_data, name=os.path.basename(png_path))
            # The Content-ID must match the cid: reference written into the HTML
            img.add_header("Content-ID", f"<{os.path.basename(png_path)}>")
            message.attach(img)
            ⋮
            img = MIMEImage(img_data, name=os.path.basename(image_path))
            img.add_header("Content-ID", f"<{os.path.basename(image_path)}>")
            message.attach(img)
            ⋮
        html_content = html_content.replace(
            source, "cid:" + os.path.basename(destination)
        )

Once the images are attached, we can attach the HTML content using the code below.

# Attach HTML content
message.attach(MIMEText(html_content, "html"))
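
The pieces of this section can be combined into a minimal, self-contained sketch. The image bytes are a placeholder (only the PNG signature) and the file name chart.png is made up; the point is merely to show how the Content-ID header and the cid: reference line up.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder bytes: just the PNG signature, not a valid image
PNG_BYTES = b"\x89PNG\r\n\x1a\n"

message = MIMEMultipart()

# Attach the image; the subtype is given explicitly so nothing has to be guessed
img = MIMEImage(PNG_BYTES, _subtype="png", name="chart.png")
img.add_header("Content-ID", "<chart.png>")
message.attach(img)

# The HTML refers to the image through the same identifier, prefixed by cid:
message.attach(MIMEText('<p><img src="cid:chart.png" alt="Chart"></p>', "html"))

for part in message.walk():
    print(part.get_content_type(), part.get("Content-ID"))
# multipart/mixed None
# image/png <chart.png>
# text/html None
```

The value between the angle brackets of the Content-ID header has to match the text after cid: exactly; otherwise the client cannot link the two.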

Challenge 5: sending the mail

Now that we're all set and got our draft message in place, it is time to send it!

Sending mail through Python is actually quite easy. It gets even simpler, since we can use a mail host without the need to provide credentials for authentication (as long as we are mailing from a whitelisted IP address). To accomplish this we use the built-in Python smtplib library.

import smtplib

HOST = "mailhost.example.com"
PORT = 25

# Connect to the SMTP server and send message
with smtplib.SMTP(HOST, PORT) as server:
    server.sendmail(from_addr=SENDER, to_addrs=TO + CC, msg=message.as_string())

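Observe that we combine the To and Cc addresses: for delivery, the mail server only needs the full list of recipients, while the To and Cc headers we set earlier tell the mail clients how to display them.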
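
If the same address can appear in both lists, the combined recipient list can be de-duplicated before sending. The helper envelope_recipients below is a hypothetical addition, not part of the original script, and the addresses are examples.

```python
TO = ["to-user@example.com", "shared@example.com"]
CC = ["cc-user@example.com", "shared@example.com"]


def envelope_recipients(*address_lists):
    # Flatten all lists, then drop duplicates while keeping first-seen order
    merged = [address for addresses in address_lists for address in addresses]
    return list(dict.fromkeys(merged))


recipients = envelope_recipients(TO, CC)
print(recipients)
# ['to-user@example.com', 'shared@example.com', 'cc-user@example.com']
```

The result can be passed as to_addrs to sendmail instead of TO + CC. Whether duplicates would cause any harm depends on the mail server, so treat this as a precaution.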

Bonus 1: make it look nice(r) in Outlook and Gmail

To achieve a visually more appealing mail we can decorate our HTML a bit with some additional formatting and styling.

The following snippet will do

custom_styling = """
    <style>
        table {
            border-collapse: collapse;
        }
        table table tr th, table table tr td {
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }
        th {
            background-color: #f2f2f2;
        }
    </style>
    """

# For nice rendering in mail client
html_prefix = '<table border="0" cellpadding="0" cellspacing="0" width="100%" style="border-collapse: collapse"><tr><td></td><td width="700">'
html_postfix = "</td><td></td></tr></table>"

html_content = custom_styling + html_prefix + html_content + html_postfix

To make the tables look just a bit nicer, I added some styling such as borders and background colours. Do observe the double table table selector. This is no mistake: it prevents borders from being added to the outer table, which is merely used to lay out the mail content at a fixed width.

To the frustration of many, layouts are a hell of a job to get right in mail clients, as most modern ways of building a layout (e.g. for websites) are unsupported. Based on a Stack Overflow answer, I constructed the fixed-width HTML layout above, which works in both Outlook and Gmail. I picked a width of 700px, as most modern devices will be able to render a width of 700px on the screen easily. Various sources also mention that the optimal width of mails lies between 600px and 700px.
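
If you want to experiment with other widths, the wrapper can be turned into a small function. This is a sketch: the name wrap_fixed_width and the width parameter are my additions here, while the markup is the same as above.

```python
def wrap_fixed_width(html_body, width=700):
    # Outer three-column table: the middle column carries the fixed width
    prefix = (
        '<table border="0" cellpadding="0" cellspacing="0" width="100%" '
        'style="border-collapse: collapse"><tr><td></td>'
        f'<td width="{width}">'
    )
    postfix = "</td><td></td></tr></table>"
    return prefix + html_body + postfix


wrapped = wrap_fixed_width("<p>Hello</p>", width=600)
print(wrapped)
```

The two empty outer cells absorb the remaining horizontal space, leaving the middle cell at the requested width.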

Bonus 2: extracting a subject

Coming up with a proper subject (in an automated manner) can be quite a challenge. To simplify this task, I decided to use the first heading that I could find as the mail subject. I used the following code

def extract_first_heading(markdown_content):
    # Regular expression to find the first heading in Markdown
    heading_pattern = re.compile(r"^\s*#{1,6}\s+(.*)$", re.MULTILINE)

    # Find the first heading match
    match = heading_pattern.search(markdown_content)

    # Return the heading text or None if not found
    return match.group(1) if match else None

Note that this regular expression is less complex. It matches headings in Markdown, which always start with one to six # characters followed by whitespace; the captured group is the heading text.
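
Since the function returns None when no heading is present, a fallback subject is useful. In the sketch below, DEFAULT_SUBJECT and subject_for are hypothetical names, and the sample strings are made up.

```python
import re


def extract_first_heading(markdown_content):
    heading_pattern = re.compile(r"^\s*#{1,6}\s+(.*)$", re.MULTILINE)
    match = heading_pattern.search(markdown_content)
    return match.group(1) if match else None


DEFAULT_SUBJECT = "Automated mailing"  # hypothetical fallback subject


def subject_for(markdown_content):
    # Use the first heading if there is one, otherwise the fallback
    heading = extract_first_heading(markdown_content)
    return heading.strip() if heading else DEFAULT_SUBJECT


print(subject_for("Intro text\n\n## Weekly report\n\nMore text\n"))  # Weekly report
print(subject_for("No headings here\n"))  # Automated mailing
```

Keep in mind that the pattern does not know about code blocks, so a line starting with # inside a code sample would also be picked up as a heading.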

Conclusion

Drafting nicely formatted emails is still a challenge today. However, automating the process and sticking to the layout once it has been properly tested are key to success. With this blog post, I hope to have clarified how tedious (often manual) jobs can be automated, by providing an example.

For the sake of completeness, I dropped the final script on GitHub.