By using this site, you agree to have cookies stored on your device, strictly for functional purposes, such as storing your session and preferences.

Dismiss

More robust markdown parser

roundabout,
created on Wednesday, 27 March 2024, 17:44:02 (1711561442), received on Wednesday, 31 July 2024, 06:54:42 (1722408882)
Author identity: vlad <vlad.muntoiu@gmail.com>

26299766e55f71d32139495f3be908283fa28c20

markdown.py

@@ -61,11 +61,9 @@ class Heading(Container):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            class Paragraph(Container):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    super().__init__("")
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def addLine(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content.extend([*parse_line(content), " "])
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content = parse_line(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __repr__(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    return "Paragraph:\n\t" + repr(self.content)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -75,6 +73,19 @@ class Paragraph(Container):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    return "p"
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        class Blockquote(Paragraph):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __init__(self, content):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                super().__init__("")
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                self.content = tokenise(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def __repr__(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return "Blockquote:\n\t" + repr(self.content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            @property
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            def tag_name(self):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return "blockquote"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            class Emphasis(Container):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __init__(self, content, value):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    super().__init__(content)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -140,7 +151,7 @@ class Link(Element):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    self.image = image
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def __repr__(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return f"{'Image' if self.image else 'Link'}: {self.text} -> {self.destination}"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                return f"{'Image' if self.image else 'Link'}: {self.content} -> {self.destination}"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                @property
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                def tag_name(self):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -182,9 +193,10 @@ def parse_line(source):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if i.group("diff"):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(Diff(i.group("textDiff"), i.group("diff")))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if i.group("urlText"):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    tokens.append(Link(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                if i.group("imageFlag"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    tokens.append(Image(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.group("imageFlag"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(Image(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    else:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(Link(i.group("urlText"), i.group("urlDestination")))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens.append(source[lookup:])
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -194,27 +206,51 @@ def parse_line(source):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            def tokenise(source):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens = []
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            current_block = Element
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            current_block = Element()
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            lines = source.split("\n")
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            for line in source.split("\n"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            i = 0
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            while i < len(lines):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                line = lines[i]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    if not line.strip():
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        # Void block
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        current_block = Element()
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    elif line.startswith("#") and leading(line.lstrip("#"), " "):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        content = line.lstrip("#").strip()
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        current_block = Heading(content, leading(line, "#"))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                elif line.startswith(">"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if not isinstance(current_block, Blockquote):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tokens.append(current_block)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    content = ""
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    while i < len(lines) and lines[i].startswith(">"):
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        content += lines[i].lstrip(">").strip() + "\n"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block = Blockquote(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    else:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        if not isinstance(current_block, Paragraph):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            # Paragraph is default
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        current_block = Paragraph()
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block.addLine(line.strip())
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    content = ""
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    while i < len(lines) and not lines[i].startswith("#") and not lines[i].startswith(">") and lines[i].strip():
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        content += lines[i].strip() + "\n"
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        i += 1
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    current_block = Paragraph(content)
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                tokens.append(current_block)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -229,6 +265,10 @@ def make_html(ast):

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        soup.append(i)
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                    elif hasattr(i, "content") and i.tag_name != "m-void":
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        tag = soup.new_tag(str(i.tag_name))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.tag_name == "a":
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tag["href"] = i.destination
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                    if i.tag_name == "img":
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                        tag["src"] = i.destination
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                        try:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            if isinstance(i.content, list):
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                                tag.append(make_html(i.content))
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -255,11 +295,14 @@ if __name__ == '__main__':

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            Lorem **i`p`sum**
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            dolor `sit` amet
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        > Make it as simple as possible, [but not simpler](https://wikipedia.org).
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        > > If you can't explain it simply, you don't understand it well enough.
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            ...
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            """
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                )
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # for i in ast:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            #     print(repr(i))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            for i in ast:
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                                print(repr(i))
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # Now convert the AST to HTML
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                print(make_html(ast).prettify(formatter=beautifulsoup.formatter.HTMLFormatter(indent=4)))