Important information: Google announced that, from September 2026, Android devices will require ALL apps to be signed by Google, effectively leading to an iOS situation. Value your right to a computer that does what you want; do not tolerate this monopolistic practice! Contact me if you don't understand why it is bad. Click to learn more.

More robust markdown parser

by roundabout, Wednesday, 27 March 2024, 17:44:02 (1711561442), pushed by roundabout, Wednesday, 31 July 2024, 06:54:42 (1722408882)

Author identity: vlad <vlad.muntoiu@gmail.com>

26299766e55f71d32139495f3be908283fa28c20

markdown.py

@@ -61,11 +61,9 @@ class Heading(Container):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            class Paragraph(Container):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    super().__init__("")
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def addLine(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content.extend([*parse_line(content), " "])
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content = parse_line(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __repr__(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    return "Paragraph:\n\t" + repr(self.content)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -75,6 +73,19 @@ class Paragraph(Container):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    return "p"
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        class Blockquote(Paragraph):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                super().__init__("")
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content = tokenise(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __repr__(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return "Blockquote:\n\t" + repr(self.content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            @property
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def tag_name(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return "blockquote"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            class Emphasis(Container):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __init__(self, content, value):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    super().__init__(content)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -140,7 +151,7 @@ class Link(Element):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    self.image = image
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __repr__(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return f"{'Image' if self.image else 'Link'}: {self.text} -> {self.destination}"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return f"{'Image' if self.image else 'Link'}: {self.content} -> {self.destination}"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                @property
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def tag_name(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -182,9 +193,10 @@ def parse_line(source):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if i.group("diff"):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(Diff(i.group("textDiff"), i.group("diff")))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if i.group("urlText"):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    tokens.append(Link(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                if i.group("imageFlag"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    tokens.append(Image(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.group("imageFlag"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(Image(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    else:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(Link(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens.append(source[lookup:])
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -194,27 +206,51 @@ def parse_line(source):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            def tokenise(source):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens = []
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            current_block = Element
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            current_block = Element()
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            lines = source.split("\n")
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            for line in source.split("\n"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            i = 0
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            while i < len(lines):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                line = lines[i]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if not line.strip():
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        # Void block
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        current_block = Element()
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    elif line.startswith("#") and leading(line.lstrip("#"), " "):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        content = line.lstrip("#").strip()
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        current_block = Heading(content, leading(line, "#"))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                elif line.startswith(">"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if not isinstance(current_block, Blockquote):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(current_block)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    content = ""
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    while i < len(lines) and lines[i].startswith(">"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        content += lines[i].lstrip(">").strip() + "\n"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block = Blockquote(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    else:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        if not isinstance(current_block, Paragraph):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            # Paragraph is default
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        current_block = Paragraph()
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block.addLine(line.strip())
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    content = ""
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    while i < len(lines) and not lines[i].startswith("#") and not lines[i].startswith(">") and lines[i].strip():
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        content += lines[i].strip() + "\n"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block = Paragraph(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -229,6 +265,10 @@ def make_html(ast):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        soup.append(i)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    elif hasattr(i, "content") and i.tag_name != "m-void":
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tag = soup.new_tag(str(i.tag_name))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.tag_name == "a":
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tag["href"] = i.destination
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.tag_name == "img":
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tag["src"] = i.destination
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        try:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            if isinstance(i.content, list):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                                tag.append(make_html(i.content))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -255,11 +295,14 @@ if __name__ == '__main__':

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            Lorem **i`p`sum**
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            dolor `sit` amet
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        > Make it as simple as possible, [but not simpler](https://wikipedia.org).
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        > > If you can't explain it simply, you don't understand it well enough.
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            ...
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            """
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                )
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # for i in ast:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            #     print(repr(i))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            for i in ast:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                print(repr(i))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # Now convert the AST to HTML
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                print(make_html(ast).prettify(formatter=beautifulsoup.formatter.HTMLFormatter(indent=4)))