LaTeX to HTML

This is a little about mathematics, and a little about writing for the web, but mostly about the nuts and bolts of putting mathematics on the web. I want to record how, mainly with the pandoc program, I have converted some mathematics from a LaTeX file into html. Like “Computer Recovery” then, this post is a laboratory notebook.

The mathematics is a proof of Dirichlet’s 1837 theorem on primes in arithmetic progressions. This is the theorem that, if to some number you keep adding a number that is prime to it, there will be no end to the primes that you encounter in this way.

For some reason, I wanted to learn the proof. Maybe this had to do with having given courses on the Prime Number Theorem of 1896 at the Nesin Mathematics Village, but not having been able to teach there this summer, owing to the Covid pandemic. Dirichlet’s theorem could be part of a course at the Village.

I read the proof in Landau’s Elementary Number Theory, originally published in 1927, ninety years after Dirichlet’s theorem. I wrote out the argument, according to my understanding. If you want to read what I wrote, there are:

  • a pdf file (24 pages, size A5) based on the LaTeX file that I composed;

  • an html file derived from my LaTeX file by means of the pandoc program.

Recently I encountered a page of links to over two hundred expository articles by a mathematician. I looked at one article, and I wondered what kind of audience would both

  • need to be told that an automorphism of a field is a bijective homomorphism from the field to itself, and

  • already know the field of p-adic numbers.

I might have thought somebody who knew the p-adic numbers would also know something of Galois theory; but maybe not.

In my own article on Dirichlet’s theorem, I had already given accounts of

  • what the reader should know,

  • what I know as an amateur of number theory.

I am becoming increasingly aware that webpages can be visited from anywhere, although their composers seem unaware of this. If you are a newspaper, what city are you in? If a university, what state or province? If a business, what country? If you are a blog, what are you trying to do, and how can your visitors decide whether to spend time with you?

I have tried to make my own “About” page useful in this way.

I wrote my article on Dirichlet’s theorem to satisfy my curiosity. Then I remembered posting on this blog my memoir of life on a farm. I had obtained the html file from a LaTeX file using pandoc, and I had been pleased with the results. Notably, pandoc had kept my footnotes as such: they were at the end of the same html file.

I decided to try to convert the Dirichlet article to html.

WordPress allows the embedding of TeX and LaTeX code, but converts the code to images. The tex4ht program does this as well. I wanted to avoid images.

The pandoc program does not create images, but tries to express mathematics (along with everything else) as text. However, there are mathematical expressions that LaTeX accommodates, but html does not, or not so well. The pandoc program leaves those untouched.

I therefore edited my original LaTeX file, to turn all of the mathematics into something that pandoc could interpret. I tried to make it as easy as possible to switch between a LaTeX file as such—a file to be compiled by the latex program—and a file to be converted to html by pandoc.

Here is what I did.

  • Apparently pandoc can handle the commands that you define, even with arguments, but not with default (optional) arguments. For example, my

    \newcommand{\Zmod}[1][k]{\mathbb Z/#1\mathbb Z}

    didn’t work until I removed [k].

  • I like the compactitem and compactenum environments of the paralist package, but apparently pandoc does not recognize these, so I switched back to itemize and enumerate.

  • Fractions are the big challenge. I used

    \renewcommand{\frac}[2]{#1/#2}

    One will then need to use parentheses if a numerator or denominator is a sum; but I had this problem in only one case. The reader still has to understand a/bc as meaning a/(bc), although I have allowed (a/b)c to appear as a/bc (this happens when c is a summation with ∑).

  • Summations themselves are a problem; I rewrote each \sum_{j=i}^{n} as \sum_{i\leq j\leq n}. (One could also define a new command with three arguments here.)

  • For one use, I defined:

    \DeclareMathOperator{\nCk}{C}
    \renewcommand{\binom}[2]{\nCk(#1,#2)}

  • None of the environments align, gather, and multline for displaying more than one line of mathematics together gets interpreted by pandoc; therefore I put each line to be displayed into its own equation environment.

  • There is a similar problem with the cases environment; I recast using itemize.

  • Apparently pandoc cannot handle a negated symbol such as \not\equiv, so I used \nequiv from txfonts.

  • I rewrote \pmod{#1} as \;(\text{mod }#1).
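
The macro changes in the list above can be gathered into one preamble. What follows is only a sketch of one possible arrangement, not necessarily the one used for the Dirichlet article. The name \rangesum is hypothetical, standing for the three-argument summation command that the list mentions as an option, and redefining \pmod (rather than replacing each use in the text) is likewise an assumption. The sample body is a generic illustration, not taken from the article.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}

% Optional argument [k] removed, since pandoc rejected the default-argument form.
\newcommand{\Zmod}[1]{\mathbb Z/#1\mathbb Z}

% Hypothetical three-argument summation: index, lower limit, upper limit.
% For a print version one could define it as \sum_{#1=#2}^{#3} instead.
\newcommand{\rangesum}[3]{\sum_{#2\leq #1\leq #3}}

% --- Redefinitions meant for the html version only. ----------------------
% Commenting out this group should restore the usual printed forms.
\renewcommand{\frac}[2]{#1/#2}               % fraction written on one line
\DeclareMathOperator{\nCk}{C}
\renewcommand{\binom}[2]{\nCk(#1,#2)}        % binomial coefficient as C(n,k)
\renewcommand{\pmod}[1]{\;(\text{mod }#1)}   % assumption: redefine, not retype
% --------------------------------------------------------------------------

\begin{document}

In $\Zmod{k}$, the congruence $a\equiv b\pmod{k}$ becomes an equation,
and $\frac{a}{b}$ is printed as $a/b$.

% One displayed line per environment, in place of align or gather.
\begin{equation*}
  \rangesum{j}{0}{n}\binom{n}{j} = 2^n
\end{equation*}

\end{document}
```

Keeping the html-only redefinitions in one marked group is a design choice made for the sketch: it leaves a single place to edit when switching between compiling with latex and converting with pandoc.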

Doing that much gave a file that pandoc would render as pure html. However, there were remaining issues.

The program was not dealing properly with the bibliography I had created with BibTeX. Also pandoc gave my section headings <h1> tags. I took care of this by running the following command:

pandoc --base-header-level=2 --bibliography ../../../references.bib --filter pandoc-citeproc dirichlet-simple.tex -o dirichlet-simple.html

The online documentation says --base-header-level= is deprecated, and one should use --shift-heading-level-by= instead; but this didn’t work for me, perhaps because my installed pandoc is older than the documentation. (If pandoc didn’t come with my Ubuntu Linux installation, I may have installed it when converting the docx file of somebody else’s philosophy paper to LaTeX.)

At this point, the problem remained that pandoc did not deal properly with

  • theorem environments,

  • labels of equations.

I found discussion of these on the Google group for pandoc. I have not understood why they should be a problem, or at least why references should be a problem.

For example, pandoc prints the label of a theorem as plain text. It could print the label of an equation in the same way, but apparently it doesn’t.

One can apparently customize pandoc, but for now I don’t know how. Therefore I have done the following (probably not something to be done on a regular basis):

  • formatted and numbered by hand my theorems and lemmas;

  • formatted the proofs by hand, inserting $\Box$ at the end if this is text, and \qquad\Box if it’s an equation (pandoc could not deal with \qed);

  • changed \label{#1} (which I habitually place just after \begin{equation}) to \mylabel{#1}, defined as (#1)\qquad;

  • redefined \eqref{#1} as (#1);

  • changed all of my equation labels to the desired serial numbers;

  • changed every {equation} to {equation*}.
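
The hand-numbering steps in the list above might look as follows in a small file. This is a sketch under assumptions: the lemma and its proof are placeholders invented for illustration, not taken from the Dirichlet article, and only the definitions of \mylabel and \eqref follow the description given in the list.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}

% \mylabel prints the hand-assigned number at the start of the display.
\newcommand{\mylabel}[1]{(#1)\qquad}
% \eqref prints that same number in parentheses, with no cross-referencing.
\renewcommand{\eqref}[1]{(#1)}

\begin{document}

% Lemma formatted and numbered by hand instead of with a theorem environment.
\noindent\textbf{Lemma 1.} For every positive integer $n$,
\begin{equation*}
  \mylabel{1}
  \sum_{1\leq j\leq n} j = n(n+1)/2.
\end{equation*}

% Proof formatted by hand, closed with $\Box$ in place of \qed.
\noindent\emph{Proof.} Use induction on $n$: adding $n+1$ to both sides
of \eqref{1} gives the claim for $n+1$. $\Box$

\end{document}
```

Since the labels are now literal numbers, inserting a new equation means renumbering by hand, which fits the remark that this is probably not something to be done on a regular basis.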

After all of this, I cleaned up the html file created by pandoc by:

  • putting blank lines between paragraphs (for ease in reading the html file);

  • changing <span class = "math display"> to <div style= "text-align:center;">;

  • putting <div style = "text-align:justify; margin-left:10%; margin-right:10%;"> at the head, and </div> at the foot, as with all of my blog posts and pages.

Now I am using pandoc to create the present html file from the plain text file that I originally typed.
