LaTeX to HTML

This is a little about mathematics, and a little about writing for the web, but mostly about the nuts and bolts of putting mathematics on the web. I want to record how, mainly with the pandoc program, I have converted some mathematics from a LaTeX file into html. Like “Computer Recovery” then, this post is a laboratory notebook.

The mathematics is a proof of Dirichlet’s 1837 theorem on primes in arithmetic progressions. This is the theorem that, if to some number you keep adding a number that is prime to it, then, among the numbers that you encounter, there will be no end to those that are prime.
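
In symbols, the statement might be written as in the following LaTeX fragment. The letters a and k are chosen here only for illustration; nothing above fixes a notation.

```latex
% Dirichlet's theorem (1837); the letters a and k are illustrative choices.
If $a$ and $k$ are integers with $k > 0$ and $\gcd(a, k) = 1$,
then the arithmetic progression
\[ a,\ a + k,\ a + 2k,\ a + 3k,\ \dots \]
contains infinitely many primes.
```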

For some reason, I wanted to learn the proof. Maybe this had to do with having given courses, in 2018 and 2019, on the Prime Number Theorem of 1896; I taught at the Nesin Mathematics Village, but was not able to teach there this summer, owing to the Covid pandemic. Dirichlet’s theorem could be part of a course at the Village.

I read the proof in Landau’s Elementary Number Theory, a book originally published in 1927, which was ninety years after Dirichlet’s theorem. I wrote out the argument, according to my understanding. If you want to read what I wrote, there are:

a pdf file (24 pages, size A5) based on the LaTeX file that I composed;
an html file derived from my LaTeX file by means of the pandoc program.

Would it be worth your while to try to read this, in one form or other?

Recently I encountered a page of links to over two hundred expository articles by a mathematician. I looked at one article, and I wondered what kind of audience would both

need to be told that an automorphism of a field is a bijective homomorphism from the field to itself,
already know the field of p-adic numbers.

I might have thought somebody who knew the p-adic numbers would also know something of Galois theory; but maybe not.

In drafting my own article on Dirichlet’s theorem, before deciding to put it online, I had already given accounts of

what the reader should know,
what I know as an amateur of number theory.

I have become increasingly aware that webpages can be visited from anywhere, although their composers seem unaware of this. If you are a newspaper, what city are you in? If a university, what state or province? Even that question assumes you are in North America, which you might not be. If you are a blog, what are you trying to do, and how can your visitors decide whether to spend time with you?

I have tried to make my own “About” page useful in this way.

I wrote my article on Dirichlet’s theorem to satisfy my curiosity. Then I remembered posting on this blog my memoir of life on a farm. I had obtained the html file from a LaTeX file using pandoc, and I had been pleased with the results. Notably, pandoc had kept my footnotes as such: they were at the end of the same html file.

I decided to try to convert the Dirichlet article to html.

WordPress allows the embedding of TeX and LaTeX code, but converts the code to images. The tex4ht program does this as well. I wanted to avoid images.

The pandoc program does not create images, but tries to express mathematics (along with everything else) as text. However, there are mathematical expressions that LaTeX accommodates, but html does not, or not so well. The pandoc program leaves those untouched.

I therefore edited my original LaTeX file, to turn all of the mathematics into something that pandoc could interpret. I tried to make it as easy as possible to switch between a LaTeX file as such—a file to be compiled by the latex program—and a file to be converted to html by pandoc.

Here is what I did.

Apparently pandoc can handle the commands that you define, even with arguments; but not when an argument has a default value. For example, my

\newcommand{\Zmod}[1][k]{\mathbb Z/#1\mathbb Z}

didn’t work until I removed [k].
I like the compactitem and compactenum environments of the paralist package, but apparently pandoc does not recognize these, so I switched back to itemize and enumerate.
Fractions are the big challenge. I used

\renewcommand{\frac}[2]{#1/#2}

One will then need to use parentheses if a numerator or denominator is a sum; but I had this problem in only one case. The reader still has to understand a/bc as meaning a/(bc), although I have allowed (a/b)c to appear as a/b⋅c (this happens when c is a summation with ∑).
Summations themselves are a problem; I rewrote each \sum_{j=i}^{n} as \sum_{i\leq j\leq n}. (One could also define a new command with three arguments here.)
For one use, I defined:

\DeclareMathOperator{\nCk}{C}
\renewcommand{\binom}[2]{\nCk(#1,#2)}
None of the environments align, gather, and multline for displaying more than one line of mathematics together gets interpreted by pandoc; therefore I put each line to be displayed into its own equation environment.
There is a similar problem with the cases environment; I recast using itemize.
Apparently pandoc cannot handle a negated symbol such as \not\equiv, so I used \nequiv from txfonts.
I rewrote \pmod{#1} as \;(\text{mod }#1).
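
The changes listed above might be collected in a small test file along the following lines. This is a sketch under assumptions, not the file actually used: the command name \rsum for the three-argument summation is hypothetical, the redefinition of \pmod by \renewcommand is one possible way of doing the rewriting, and the sample formulas are chosen only for illustration. The amsmath and amssymb packages are assumed for \DeclareMathOperator, \text, and \mathbb.

```latex
\documentclass{article}
\usepackage{amsmath}  % \DeclareMathOperator, \text
\usepackage{amssymb}  % \mathbb

% No optional (default) argument, so \Zmod{k} is written out every time.
\newcommand{\Zmod}[1]{\mathbb Z/#1\mathbb Z}
% Fractions on one line; a sum in a numerator or denominator
% needs its own parentheses.
\renewcommand{\frac}[2]{#1/#2}
% Binomial coefficients in the form C(n,k).
\DeclareMathOperator{\nCk}{C}
\renewcommand{\binom}[2]{\nCk(#1,#2)}
% Hypothetical three-argument summation:
% \rsum{i}{j}{n} sums over i <= j <= n.
\newcommand{\rsum}[3]{\sum_{#1\leq #2\leq #3}}
% Modulus written with \text, in place of the usual \pmod.
\renewcommand{\pmod}[1]{\;(\text{mod }#1)}

\begin{document}
In $\Zmod{k}$ we have $a\equiv b\pmod{k}$ when $k$ divides $a-b$.
% Two displayed lines, each in its own environment instead of one align.
\begin{equation*}
  \rsum{0}{j}{n}\binom{n}{j} = 2^n
\end{equation*}
\begin{equation*}
  \rsum{1}{j}{n} j = \frac{n(n+1)}{2}
\end{equation*}
% A definition by cases, recast as a list instead of a cases environment.
The absolute value $|x|$ is
\begin{itemize}
\item $x$, if $x\geq 0$;
\item $-x$, if $x<0$.
\end{itemize}
\end{document}
```

Keeping the definitions in one place means that switching back to ordinary LaTeX output should need only the removal (or commenting out) of the \renewcommand lines for \frac, \binom, and \pmod.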

Doing that much gave a file that pandoc would render as pure html. However, there were remaining issues.

The program was not dealing properly with the bibliography I had created with BibTeX. Also pandoc gave my section headings <h1> tags. I took care of this by running the following command:

pandoc --base-header-level=2 --bibliography ../../../references.bib --filter pandoc-citeproc dirichlet-simple.tex -o dirichlet-simple.html

The online documentation says --base-header-level= is deprecated, and one should use --shift-heading-level-by=; but this didn’t work for me, presumably because my installed pandoc predates version 2.8, in which the newer option was introduced. (If pandoc didn’t come with my Ubuntu Linux installation, I may have installed it when converting the docx file of somebody else’s philosophy paper to LaTeX.)

At this point, the problem remained that pandoc did not deal properly with

theorem environments,
labels of equations.

I found discussion of these on the Google group for pandoc. I have not understood why they should be a problem, or at least why references should be a problem.

For example, pandoc prints the label of a theorem as plain text. It could print the label of an equation in the same way, but apparently it doesn’t.

One can apparently customize pandoc, but for now I don’t know how. Therefore I have done the following (probably not something to be done on a regular basis):

formatted and numbered by hand my theorems and lemmas;
formatted the proofs by hand, inserting $\Box$ at the end if this is text, and \qquad\Box if it’s an equation (pandoc could not deal with \qed);
changed \label{#1} (which I habitually place just after \begin{equation}) to \mylabel{#1}, defined as (#1)\qquad;
redefined \eqref{#1} as (#1);
changed all of my equation labels to the desired serial numbers;
changed every {equation} to {equation*}.
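
The steps just listed can be sketched as follows. This is an illustration under assumptions: the theorem, its number 1, the equation number 3, and the exact form of the definitions are all made up for the example; \eqref is assumed to come from amsmath and \Box from amssymb.

```latex
\documentclass{article}
\usepackage{amsmath}  % \eqref (redefined below)
\usepackage{amssymb}  % \Box

% The label text is itself the printed number, typed by hand.
\newcommand{\mylabel}[1]{(#1)\qquad}
\renewcommand{\eqref}[1]{(#1)}

\begin{document}
% Theorem heading formatted and numbered by hand.
\textbf{Theorem 1.} For every positive integer $n$,
\begin{equation*}\mylabel{3}
  \sum_{1\leq j\leq n} j = n(n+1)/2.
\end{equation*}

% Proof formatted by hand, ending in text, so $\Box$ closes it.
\emph{Proof.} Adding the sum to itself in reverse order gives $n$ terms,
each equal to $n+1$; halving yields \eqref{3}. $\Box$
\end{document}
```

Had the proof ended with a displayed equation, \qquad\Box would go inside that equation instead of $\Box$ after the text.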

After all of this, I cleaned up the html file created by pandoc by:

putting blank lines between paragraphs (for ease in reading the html file);
changing <span class = "math display"> to <div style= "text-align:center;">;
putting

<div style = "text-align:justify; margin-left:10%; margin-right:10%;">

at the head, and </div> at the foot, as with all of my blog posts and pages.

Now I am using pandoc to create the present html file from the plain text file that I originally typed.

Edited April 30, 2024

One Comment

Alexandre Borovik

Posted September 9, 2020 at 11:46 am | Permalink | Reply

Most useful — thanks!
