
Wilson Mar

Yes, it’s a round-trip ticket


Overview

This post is about converting existing HTML into markdown text in a file like README.md.

I wrote this because I haven’t seen an approach like this described.

I’m having to convert hundreds of pages I’ve written in HTML since the 90’s.

Let me help you with this. Call me!

Why Markdown?

Many non-technical writers prefer writing Markdown text instead of using the mouse-enabled Microsoft Word. They say writing pure text allows them to keep their fingers firmly planted on the keyboard even as they apply formatting on the fly. Being able to format using text codes means they don’t have to stop typing or think about anything else to apply styles.

This tutorial is for such people.

Automatic conversion

You can copy HTML and paste into Dom Christie’s website for conversion to Markdown:

http://domchristie.github.io/to-markdown

Ordered lists

My favorite feature of Markdown is that it automatically numbers the items in ordered lists!

We can begin the items of an ordered list with any number, such as 0.

1. First item.
0. Second item.
9. Third item.

Markdown renders the coding above correctly as 1, 2, 3.
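As a rough illustration of that renumbering, here is a minimal Python sketch. The function name `render_numbers` is made up for this example, and it assumes a renderer that counts up from 1 and ignores the numbers typed in the source:

```python
import re

# Matches an ordered-list marker such as "1. " or "0. " at the start of a line.
MARKER = re.compile(r"^\d+\.\s+")

def render_numbers(lines):
    """Return list items numbered 1, 2, 3... regardless of the typed numbers."""
    rendered = []
    count = 0
    for line in lines:
        if MARKER.match(line):
            count += 1
            # Swap the typed marker for the running count.
            rendered.append(f"{count}. " + MARKER.sub("", line, count=1))
        else:
            rendered.append(line)
    return rendered

print(render_numbers(["1. First item.", "0. Second item.", "9. Third item."]))
```

Some parsers use the first typed number as the starting point, so results can differ from this sketch.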

Indentation

In order for numbering to continue, all lines under a list item must be indented.

Heading lines can be indented.

Use 3 spaces in front of 3 backticks.

4 or more spaces is a signal to highlight the sentence in a box, not to indent.

Also, Liquid markdown does not recognize indentation.

PROTIP: A workaround if you are not able to get automatic numbering: code the numbering yourself.

To keep Markdown from interpreting a paragraph starting with a number as a list, put a backslash in front of the dot, as in:

1492\. That was the year.
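When converting many pages, that backslash can be added by a script. This is a minimal Python sketch; `escape_leading_number` is a hypothetical helper name, and it assumes the paragraph is meant as ordinary text rather than a list item:

```python
import re

def escape_leading_number(paragraph):
    """Insert a backslash before the dot in a leading "1492." style marker."""
    # \1 is the number, \\ is a literal backslash, \2 is the space after the dot.
    return re.sub(r"^(\d+)\.(\s)", r"\1\\.\2", paragraph, count=1)

print(escape_leading_number("1492. That was the year."))
```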

Line breaks

Both styles of line break tags result in a new line (without a blank line in between):

the XHTML style:

Hello<br />there

or HTML-style tags:

Hello<br>there

Paragraphs

One reason Markdown text is easier to write than HTML is there is no need for <p> to force a blank line.

Just a blank line will do.

One can do a mass change of <p> in a text editor.

Remember to clean up ending </p> tags.
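For hundreds of files, the same mass change can be scripted instead of done by hand in an editor. This is a minimal Python sketch under simple assumptions (no nested or malformed tags); the function name `paragraphs_to_markdown` is invented for this example:

```python
import re

def paragraphs_to_markdown(html):
    """Replace <p> tags with blank lines and drop closing </p> tags."""
    # Remove the ending tags first so they do not linger in the text.
    text = re.sub(r"</p\s*>", "", html, flags=re.IGNORECASE)
    # Each opening <p> (with or without attributes) becomes a blank line.
    text = re.sub(r"\s*<p(\s[^>]*)?>\s*", "\n\n", text, flags=re.IGNORECASE)
    return text.strip()

print(paragraphs_to_markdown("<p>First paragraph.</p>\n<p>Second paragraph.</p>"))
```

The pattern requires `>` or whitespace right after the `p`, so tags such as `<pre>` are left alone.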

Bulk change HTML to Markdown programs

You can specify a URL to an HTML file:

It returns a page of Markdown text you can copy and paste to a Markdown file.

The author of that site provides his Python program at:

Download and run the program using this syntax (assuming Python is installed):

   chmod a+x html2text.py ; ./html2text.py erlang.html

PROTIP: Automatic approaches today are usually too automatic, converting what is better left in HTML.

Unordered Lists

CAUTION: Even though HTML can be written or pasted into Markdown (.md) files, that HTML must be more strictly correct than the HTML that internet browsers tolerate.

  • There must be a blank line before <ul> or <ol>.

  • For every <li> there needs to be a </li> or the rendering goes wacky.

  • There must be a blank line after anchor tags <a name=... and a heading text line.
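One way to catch a missing </li> before pasting is to count the tags. This is a minimal sketch using Python's standard `html.parser` module; the class and function names are made up for this example, and it only counts tags rather than validating the whole page:

```python
from html.parser import HTMLParser

class ListItemChecker(HTMLParser):
    """Count opening and closing <li> tags."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == "li":
            self.closed += 1

def unclosed_list_items(html):
    """Return how many <li> tags have no matching </li>."""
    checker = ListItemChecker()
    checker.feed(html)
    checker.close()
    return checker.opened - checker.closed

print(unclosed_list_items("<ul><li>one</li><li>two</ul>"))
```

A result other than 0 means the list needs attention before it goes into a Markdown file.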

PROTIP: Markdown recognizes different characters to parse into lists:

* Asterisk

+ plus sign

- minus sign

render as:

  • Asterisk

  • plus sign

  • minus sign

Special characters

Markdown treats these characters as ordinary text if there is a backslash escape character in front of them:

  • \\ backslash itself
  • \` backtick
  • \* asterisk
  • \_ underscore
  • \{ \} curly braces
  • \[ \] square brackets
  • \( \) parentheses
  • \# hash mark
  • \+ plus sign
  • \- minus sign (hyphen)
  • \. dot
  • \! exclamation mark
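The list above can be applied mechanically. This minimal Python sketch (with a hypothetical helper named `escape_markdown`) puts a backslash in front of every listed character. It is deliberately heavy-handed, since many of these characters only need escaping where Markdown would otherwise read them as formatting:

```python
# The characters from the list above; the first entry is the backslash itself.
SPECIAL = "\\`*_{}[]()#+-.!"

def escape_markdown(text):
    """Put a backslash in front of every Markdown special character."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

print(escape_markdown("2 * 3 is not *emphasis*"))
```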

PROTIP: If a URL contains attributes, convert & (ampersand) to &amp;

Tools are also helpful for seeing how Markdown converts some special characters into escape entities that begin with an & (ampersand):

  • < (less than) is turned into &lt;

  • > (greater than) is turned into &gt; because that’s used to signify block quotes in Markdown.

  • the ampersand itself turns to &amp;, as in link URLs.
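To preview which entities a piece of text turns into, Python's standard `html` module performs the same kind of conversion. This is only an illustration of the entities, not of any particular Markdown parser:

```python
import html

# quote=False leaves quotation marks alone and converts only <, > and &.
encoded = html.escape("a < b & c > d", quote=False)
print(encoded)

# unescape goes the other way, from entities back to plain characters.
decoded = html.unescape("&lt;p&gt; &amp; more")
print(decoded)
```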

Headings

Replace opening <h2> and similar tags with ## (called Atx-style headers).

Markdown recognizes up to 6 hash characters for 6 levels.

The closing ‘##’ is optional. It can be any number of hash characters.
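The heading swap is regular enough to script. This is a minimal Python sketch assuming each heading sits on one line with no nested tags; `headings_to_atx` is a name invented for this example:

```python
import re

def headings_to_atx(html):
    """Turn <h1>..<h6> elements into Atx-style Markdown headings."""
    def replace(match):
        level = int(match.group(1))
        # One hash per heading level, then the heading text.
        return "#" * level + " " + match.group(2).strip()

    # \1 makes the closing tag match the level of the opening tag.
    pattern = r"<h([1-6])(?:\s[^>]*)?>(.*?)</h\1\s*>"
    return re.sub(pattern, replace, html, flags=re.IGNORECASE | re.DOTALL)

print(headings_to_atx("<h2>Overview</h2>"))
```

Attributes on the heading tag, such as an id, are dropped by this sketch.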

Alternately, Setext-style headers are specified (“underlined”) by a series of equal signs (for first-level headers) or dashes (for second-level headers):


First-level H1 headers
=============

Second-level H2 headers
-------------
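If you prefer one header style throughout a file, underlined headers can be rewritten with hash marks. This is a minimal Python sketch with a hypothetical function name, `setext_to_atx`. It assumes every text line directly followed by a line of only equal signs or only dashes is a header:

```python
import re

def setext_to_atx(markdown):
    """Rewrite Setext-style (underlined) headers as Atx-style headers."""
    # A text line followed by a line of equal signs is a first-level header.
    markdown = re.sub(r"^(.+)\n=+[ \t]*$", r"# \1", markdown, flags=re.MULTILINE)
    # A text line followed by a line of dashes is a second-level header.
    markdown = re.sub(r"^(.+)\n-+[ \t]*$", r"## \1", markdown, flags=re.MULTILINE)
    return markdown

sample = "First-level H1 headers\n=============\n\nSecond-level H2 headers\n-------------"
print(setext_to_atx(sample))
```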

Tables in HTML

HTML tables render well from within a Markdown text document.

However, some HTML tables in the early days of the internet were used to format an entire page. Such coding would need surgery to look good, since tables are now intended to fit into a text column.

Bold and italics

CAUTION: Markdown coding is not processed within HTML tables.

So within tables continue to bold with

<strong>emphasized</strong> rather than Markdown __emphasized__ or **emphasized**

which renders as:

emphasized rather than Markdown emphasized or emphasized

Continue to italicize with:

<em>italicized</em> rather than Markdown _italicized_ or *italicized*

which renders as:

italicized rather than Markdown italicized or italicized
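Outside of tables, the same tags can be swapped for their Markdown equivalents in bulk. This is a minimal Python sketch assuming the tags carry no attributes and are not nested; the function name `emphasis_to_markdown` is made up for this example:

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

def emphasis_to_markdown(html):
    """Convert <strong> and <em> elements to Markdown emphasis."""
    text = re.sub(r"<strong>(.*?)</strong>", r"**\1**", html, flags=FLAGS)
    text = re.sub(r"<em>(.*?)</em>", r"*\1*", text, flags=FLAGS)
    return text

print(emphasis_to_markdown("<strong>emphasized</strong> and <em>italicized</em>"))
```

Run it only on text that sits outside of HTML tables, for the reason given in the caution above.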

Tools?

To see your markdown turn into HTML, use this online tool:

The easiest way to convert HTML to Markdown text is to use Aaron Swartz’s

My experience is that we’ll need to pretty much go through each line to make it look good in Markdown text.

PROTIP: Keep coding HTML to link to external sites and images.

Example of HTML:

<a target="_blank" title="hello" href="http://wilsonmar.github.io/">my site</a>

The biggest hassle with converting to Markdown text from HTML coding is that Markdown reverses the order of text and links.

 [mysite](http://wilsonmar.github.io/)

The same goes for the alternate “automatic” format Markdown offers to link:

<http://wilsonmar.github.io>
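For links where the reversal is wanted, it can be scripted. This is a minimal Python sketch assuming double-quoted href values and no nested tags inside the link text; `link_to_markdown` is a hypothetical name. Note that it throws away the target and title attributes:

```python
import re

# Group 1 captures the URL, group 2 the link text.
LINK = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>',
                  re.IGNORECASE | re.DOTALL)

def link_to_markdown(html):
    """Rewrite <a href="url">text</a> as [text](url)."""
    return LINK.sub(r"[\2](\1)", html)

anchor = '<a target="_blank" title="hello" href="http://wilsonmar.github.io/">my site</a>'
print(link_to_markdown(anchor))
```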

I’m reluctant to put external links in Markdown because they open in the same window, causing my site to lose visitors to that site.

![mysite logo](http://wilsonmar.github.io/favicon.png/ "optional title")

Notice that links to images have an exclamation point in front.

Markdown currently has no syntax for specifying the dimensions of an image.

To embed a YouTube video, use an HTML iframe.

<iframe width="560" height="315" src="https://www.youtube.com/embed/Onv9nhPIBp0" frameborder="0" allowfullscreen> </iframe>

To specify starting the video at a specific time (1 minute 2 seconds), use a link such as:

<a target="_blank" href="https://www.youtube.com/watch?v=Onv9nhPIBp0&t=1m2s">Link to YouTube</a>.

Horizontal rule

A line going across the page in HTML is:

<hr />

Blockquotes in HTML

Markdown ignores the HTML <blockquote> tag. So this appears as if it were not surrounded by the tag:

<block>
This is a block quote.
</block>

Different Parsers

The trouble with Markdown code is that different parsers render it differently into HTML.

In March 2016, GitHub switched to the Kramdown parser, which claims to incorporate the capabilities of other parsers:

Liquid Markdown Syntax

Markdown text in GitHub recognizes Liquid syntax as defined in:

This coding would process the text as HTML between a pair of Liquid {% %} tag markers:


{% highlight html %}
Hello
{% endhighlight %}

Liquid output markup can also be specified between two curly braces, such as:

{{ page.heading | upcase | truncate: 8 }}

The page.heading refers to the heading variable specified in the front matter at the top of the file.

To display Liquid markup in documentation:


{% highlight html %}{% raw %}
{{ page.heading | upcase | truncate: 8 }}
{% endraw %}{% endhighlight %}

In fact, Liquid is a rather simple yet complete programming language in its own right, with if/then/else, for loops, etc. The home page for the Liquid template language (written in Ruby):

JavaScript

What if we pasted JavaScript (wrapped between <script> tags) in Markdown?

Footnotes

This incorporates the thorough detail about markdown coding at:

A discussion forum about markdown is at:
