
Chapter 1 XML Processing Basics
1.1 Structure of an XML Document
A typical XML document:
<school>
    <student id="1001">
        <name>张三</name>
        <score math="90" english="85"/>
    </student>
</school>

1.2 Coding Conventions
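- XML declaration at the top of the file, e.g. <?xml version="1.0" encoding="UTF-8"?>
- Tag nesting: every start tag needs a matching end tag (or must be self-closing, like <score .../>), and elements must close in the reverse order they were opened
- Attribute values must always be quoted, using either single or double quotes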
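A conforming parser enforces these rules: it rejects a document that breaks them and does not try to repair it. The minimal sketch below uses the standard-library ElementTree to check three small inputs. The helper name is_well_formed is illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "declaration + proper nesting": '<?xml version="1.0" encoding="UTF-8"?><a><b/></a>',
    "crossed tags": "<a><b></a></b>",
    "unquoted attribute": "<student id=1001/>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. The crossed tags break the nesting rule, and the unquoted attribute breaks the quoting rule.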
Chapter 2 Core Parsing Methods
2.1 DOM Parsing (xml.dom)
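The snippet below reads data.xml from disk. To experiment without a file, you can pass the document as a string to xml.dom.minidom.parseString. The sketch below does this with the chapter 1 sample and also reads element text. In the DOM, that text lives in child Text nodes, not on the element itself.

```python
from xml.dom.minidom import parseString

SAMPLE = """<school>
    <student id="1001">
        <name>张三</name>
        <score math="90" english="85"/>
    </student>
</school>"""

doc = parseString(SAMPLE)
for s in doc.getElementsByTagName("student"):
    name_el = s.getElementsByTagName("name")[0]
    # Text content is stored in child Text nodes, so join their data
    name = "".join(n.data for n in name_el.childNodes if n.nodeType == n.TEXT_NODE)
    score = s.getElementsByTagName("score")[0]
    print(s.getAttribute("id"), name, score.getAttribute("math"))
```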
from xml.dom.minidom import parse

doc = parse("data.xml")
students = doc.getElementsByTagName("student")
for s in students:
    print(s.getAttribute("id"))

2.2 SAX Parsing (xml.sax)
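SAX never builds a tree. Instead, the parser calls handler methods as it reads, so the handler itself must collect any text you need. The sketch below is self-contained: it passes the chapter 1 sample to xml.sax.parseString and gathers each student's name. characters() may be called several times for a single text node, so the handler accumulates the pieces. The class name StudentCollector is illustrative.

```python
import xml.sax

SAMPLE = """<school>
    <student id="1001">
        <name>张三</name>
        <score math="90" english="85"/>
    </student>
</school>"""

class StudentCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.students = []   # one {"id": ..., "name": ...} dict per <student>
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "student":
            self.students.append({"id": attrs["id"], "name": ""})
        elif name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # May be called several times per text node, so keep every piece
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self._in_name = False
            self.students[-1]["name"] = "".join(self._buffer)

handler = StudentCollector()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.students)
```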
import xml.sax

class StudentHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        # Called for each start tag as the parser reads the file
        if name == "student":
            print("ID:", attrs["id"])

parser = xml.sax.make_parser()
parser.setContentHandler(StudentHandler())
parser.parse("data.xml")

2.3 ElementTree Parsing (xml.etree)
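ElementTree exposes attributes as a dict (attrib or get()) and text as .text, which makes it easy to turn records into plain Python data. The self-contained sketch below runs against the chapter 1 sample. Note that ET.fromstring returns the root element directly, not an ElementTree. Converting scores to int assumes they are always integers.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<school>
    <student id="1001">
        <name>张三</name>
        <score math="90" english="85"/>
    </student>
</school>"""

root = ET.fromstring(SAMPLE)  # the root Element, not an ElementTree
records = []
for student in root.findall("student"):
    score = student.find("score")
    records.append({
        "id": student.get("id"),
        "name": student.findtext("name"),  # text of the first <name> child
        # assumes every score attribute holds an integer
        "scores": {subject: int(value) for subject, value in score.attrib.items()},
    })
print(records)
```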
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()
for student in root.findall('student'):
    print(student.attrib['id'])

Chapter 3 Performance Comparison
3.1 Parsing Time Comparison (unit: ms)

Method        1 MB file   10 MB file   Memory usage
DOM           120         1350         High
SAX           85          820          Low
ElementTree   65          700          Medium
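Timings like these depend on the machine, the Python version and the shape of the document, so they are worth re-measuring on your own data. Below is a minimal harness. It assumes a synthetic document shaped like the chapter 1 sample; make_document and the other helper names are illustrative. It measures wall-clock time only, so the memory column would need a separate tool such as tracemalloc.

```python
import time
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

def make_document(n_students: int) -> bytes:
    """Build a synthetic document shaped like the chapter 1 sample."""
    rows = "".join(
        f'<student id="{i}"><name>s{i}</name><score math="{i % 101}" english="80"/></student>'
        for i in range(n_students)
    )
    return f"<school>{rows}</school>".encode("utf-8")

def with_dom(data: bytes) -> int:
    return len(parseString(data).getElementsByTagName("student"))

class _Counter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "student":
            self.count += 1

def with_sax(data: bytes) -> int:
    handler = _Counter()
    xml.sax.parseString(data, handler)
    return handler.count

def with_etree(data: bytes) -> int:
    return len(ET.fromstring(data).findall("student"))

def best_ms(func, data: bytes, repeat: int = 3) -> float:
    """Best-of-`repeat` wall-clock time in milliseconds."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        times.append((time.perf_counter() - start) * 1000)
    return min(times)

data = make_document(10_000)
for label, func in [("DOM", with_dom), ("SAX", with_sax), ("ElementTree", with_etree)]:
    print(f"{label:<12}{best_ms(func, data):8.1f} ms")
```

Each parser counts the same <student> records, so the three timings cover comparable work. Taking the best of several runs reduces noise from other processes.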
3.2 Error Handling
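ET.ParseError is a subclass of SyntaxError. Besides the message, it carries two attributes: position, a (line, column) tuple, and code, the expat error number. The self-contained sketch below reports where an in-memory document breaks. The helper name locate_error is illustrative.

```python
import xml.etree.ElementTree as ET

def locate_error(text: str):
    """Return None if `text` parses, otherwise (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as e:
        line, column = e.position
        return line, column, str(e)
    return None

broken = "<school>\n<student id='1001'></school>"
print(locate_error(broken))
```

The unclosed <student> makes </school> a mismatched tag on line 2.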
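import xml.etree.ElementTree as ET

try:
    ET.parse('broken.xml')
except ET.ParseError as e:
    # e.position is a (line, column) tuple
    print(f"Parse error at {e.position}: {e.msg}")

Chapter 4 Advanced Techniques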
4.1 Handling Namespaces
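A namespace prefix in an ElementTree query is only a local alias. What must match is the namespace URI, because ElementTree stores qualified tags as {uri}local. The self-contained sketch below assumes the document declares http://school.edu/schema as its default namespace. The {*} wildcard used at the end requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

DOC = """<school xmlns="http://school.edu/schema">
    <student id="1001"><name>张三</name></student>
</school>"""

root = ET.fromstring(DOC)
print(root.tag)  # the qualified tag includes the URI in braces

namespaces = {"ns": "http://school.edu/schema"}  # the prefix name is arbitrary
names = [s.find("ns:name", namespaces).text
         for s in root.findall("ns:student", namespaces)]

# Without the mapping, a plain 'student' path matches nothing
unqualified = root.findall("student")

# Python 3.8+: {*} matches the tag in any namespace (or none)
wildcard = [s.get("id") for s in root.findall("{*}student")]
print(names, unqualified, wildcard)
```

Unprefixed attributes such as id belong to no namespace, so get("id") works without a prefix.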
# The prefix 'ns' is a local alias; only the URI has to match the document
namespaces = {'ns': 'http://school.edu/schema'}
for student in root.findall('ns:student', namespaces):
    print(student.find('ns:name', namespaces).text)

4.2 XPath Queries
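ElementTree implements only a subset of XPath. Its predicates can test for:

- an attribute ([@id])
- an attribute value ([@id='1001'])
- a child element ([score])
- a child's text ([name='张三'])
- a position ([1], [last()])

They cannot compare values numerically, so a filter such as "math score above 90" has to be written in Python or run through a full XPath engine such as lxml. The self-contained sketch below exercises the supported forms. The extra students are made-up sample data.

```python
import xml.etree.ElementTree as ET

DOC = """<school>
    <student id="1001"><name>张三</name><score math="90" english="85"/></student>
    <student id="1002"><name>李四</name><score math="95" english="70"/></student>
    <student id="1003"><name>王五</name></student>
</school>"""
root = ET.fromstring(DOC)

def ids(elements):
    return [e.get("id") for e in elements]

print(ids(root.findall("student[@id='1002']")))  # attribute value
print(ids(root.findall("student[score]")))       # has a <score> child
print(ids(root.findall("student[name='王五']")))  # child text equals a value
print(ids(root.findall("student[last()]")))      # position among siblings
# Numeric comparison is not supported by ElementTree, so filter in Python
print(ids(s for s in root.findall("student[score]")
          if int(s.find("score").get("math")) > 90))
```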
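# Find students whose math score is above 90. ElementTree's XPath subset
# cannot compare values, so the comparison is done in Python.
high_scores = [s for s in root.findall('.//student')
               if int(s.find('score').get('math', '0')) > 90]

Chapter 5 Practical Case Studies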
5.1 Parsing Configuration Files
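The loader below expects a file whose root element contains <setting key="...">value</setting> children. It returns every value as a string, and item.text is None for an empty element.

The sketch that follows applies the same idea to a string. It adds two behaviours: empty settings become "", and selected keys are converted with caller-supplied functions. The config layout, the key names and the function name load_config_text are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

CONFIG = """<config>
    <setting key="host">localhost</setting>
    <setting key="port">8080</setting>
    <setting key="debug">true</setting>
    <setting key="banner"/>
</config>"""

def load_config_text(text, converters=None):
    """Parse <setting key="...">value</setting> elements into a dict."""
    converters = converters or {}
    config = {}
    for item in ET.fromstring(text).findall("setting"):
        key = item.get("key")
        value = (item.text or "").strip()  # an empty element yields ""
        convert = converters.get(key, str)  # keep strings unless told otherwise
        config[key] = convert(value)
    return config

config = load_config_text(CONFIG, {"port": int, "debug": lambda v: v == "true"})
print(config)
```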
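import xml.etree.ElementTree as ET

def load_config(config_file):
    # Expects <setting key="...">value</setting> elements directly under the root
    config = {}
    tree = ET.parse(config_file)
    for item in tree.findall('setting'):
        config[item.get('key')] = item.text
    return config

5.2 Data Exchange in Web Services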
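The handler below does not show how the reply document is built. With ElementTree, you create elements with Element and SubElement, then serialize them with tostring. Passing xml_declaration=True (Python 3.8+) together with encoding="utf-8" returns bytes that start with an XML declaration. The element names result, count and student, and the function name build_response, are purely illustrative.

```python
import xml.etree.ElementTree as ET

def build_response(student_ids):
    """Build an illustrative reply such as <result status="ok"><count>2</count>...</result>."""
    result = ET.Element("result", status="ok")
    ET.SubElement(result, "count").text = str(len(student_ids))
    for sid in student_ids:
        ET.SubElement(result, "student", id=sid)
    # Serialized as UTF-8 bytes with an XML declaration
    return ET.tostring(result, encoding="utf-8", xml_declaration=True)

incoming = ET.fromstring('<school><student id="1001"/><student id="1002"/></school>')
body = build_response([s.get("id") for s in incoming.findall("student")])
print(body.decode("utf-8"))
```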
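from flask import Flask, Response, request  # assumes Flask, whose route decorator this uses
import xml.etree.ElementTree as ET
app = Flask(__name__)

@app.route('/api/xml', methods=['POST'])
def handle_xml():
    root = ET.fromstring(request.data)  # for untrusted clients, prefer defusedxml (chapter 6)
    # Processing logic...
    response_xml = ET.Element('result')  # placeholder reply element
    return Response(ET.tostring(response_xml), mimetype='application/xml')

Chapter 6 Security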
6.1 Preventing XXE Attacks
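# The standard-library ET.XMLParser has no forbid_* options; they come from defusedxml
from defusedxml.ElementTree import XMLParser, parse

parser = XMLParser(
    forbid_dtd=True,       # reject any <!DOCTYPE> declaration
    forbid_entities=True,  # reject entity declarations
)
safe_tree = parse('input.xml', parser=parser)

6.2 Input Validation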
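defusedxml protects the parser itself, but it does not check that the content makes sense. A common next step is to validate the parsed tree against the structure you expect.

The minimal sketch below does this for the chapter 1 layout. Its rules (a numeric id, a non-empty name, scores between 0 and 100) and the function name validate_school are illustrative assumptions. It uses the standard-library fromstring only to stay dependency-free; for untrusted input the tree would come from defusedxml instead.

```python
import xml.etree.ElementTree as ET

def validate_school(root):
    """Return a list of problems; an empty list means the tree passed."""
    errors = []
    if root.tag != "school":
        errors.append(f"unexpected root <{root.tag}>")
    for i, student in enumerate(root.findall("student"), start=1):
        sid = student.get("id", "")
        if not sid.isdecimal():
            errors.append(f"student #{i}: id {sid!r} is not numeric")
        if not (student.findtext("name") or "").strip():
            errors.append(f"student #{i}: missing name")
        score = student.find("score")
        attrs = score.attrib.items() if score is not None else []
        for subject, value in attrs:
            if not value.isdecimal() or not 0 <= int(value) <= 100:
                errors.append(f"student #{i}: {subject} score {value!r} out of range")
    return errors

good = ET.fromstring('<school><student id="1001"><name>张三</name>'
                     '<score math="90" english="85"/></student></school>')
bad = ET.fromstring('<school><student id="x1"><score math="120"/></student></school>')
print(validate_school(good))
print(validate_school(bad))
```

Collecting every problem, instead of raising on the first one, lets a web service return a complete error report to the client.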
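from defusedxml.ElementTree import parse

# Same call as ET.parse; entity declarations and external references are rejected by default
safe_tree = parse('untrusted.xml')

Chapter 7 Optimization Strategies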
7.1 Streaming Large Files
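Calling elem.clear() empties each processed <student>, but the empty element stays attached to its parent. On very large files, the root therefore still accumulates one stub per record. A common refinement works in three steps:

1. Also request 'start' events.
2. Keep a reference to the root, which arrives with the first 'start' event.
3. Clear the root after each record.

The self-contained sketch below reads from an in-memory byte stream and only counts records. The name count_students is illustrative, and the pattern assumes records are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

def count_students(source):
    """Stream <student> records from a file path or binary file object."""
    count = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem                 # the first 'start' event is the root element
        if event == "end" and elem.tag == "student":
            count += 1                  # process the finished record here
            root.clear()                # drop the records parsed so far
    return count

data = b"<school>" + b"".join(
    b'<student id="%d"><name>s%d</name></student>' % (i, i) for i in range(1000)
) + b"</school>"
print(count_students(io.BytesIO(data)))
```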
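import xml.etree.ElementTree as ET

for event, elem in ET.iterparse('large.xml'):  # only 'end' events by default
    if elem.tag == 'student':
        process_student(elem)  # your own handler, not defined here
        elem.clear()  # free the element's contents right after processing

7.2 Parallel Parsing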
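The parallel snippet relies on a helper, chunk_file, that the text does not define. Each chunk it yields must be a well-formed document on its own, because splitting the file at arbitrary byte offsets would cut elements in half.

The sketch below is one possible helper. It assumes:

- every <student> record starts on a line beginning with <student and ends on a line ending with </student>, as in pretty-printed files like the chapter 1 sample;
- records are neither nested nor self-closing.

Records are re-wrapped in a <school> element so that each chunk parses independently. Any attributes or namespace declarations on the original root are not carried over.

```python
import io
import xml.etree.ElementTree as ET

def chunk_file(f, tag="student", per_chunk=1000, wrapper="school"):
    """Yield well-formed XML strings, each wrapping up to `per_chunk` records.

    Line-based heuristic: a record starts on a line beginning with <tag> or
    <tag ...> and ends on a line ending with </tag>. Records must not nest
    and must not be self-closing.
    """
    starts = (f"<{tag}>", f"<{tag} ")
    end = f"</{tag}>"
    records, current = [], []
    for line in f:
        stripped = line.strip()
        if current or stripped.startswith(starts):
            current.append(line)
            if stripped.endswith(end):
                records.append("".join(current))
                current = []
                if len(records) == per_chunk:
                    yield f"<{wrapper}>{''.join(records)}</{wrapper}>"
                    records = []
    if records:  # emit the last, possibly smaller, chunk
        yield f"<{wrapper}>{''.join(records)}</{wrapper}>"

sample = io.StringIO(
    "<school>\n"
    '<student id="1001">\n<name>张三</name>\n</student>\n'
    '<student id="1002">\n<name>李四</name>\n</student>\n'
    "</school>\n"
)
chunks = list(chunk_file(sample, per_chunk=1))
print([ET.fromstring(c).find("student").get("id") for c in chunks])
```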
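import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def parse_chunk(xml_chunk):
    return ET.fromstring(xml_chunk)

# chunk_file (not defined here) must yield self-contained, well-formed XML fragments.
# Parsing is CPU-bound, so in standard CPython threads usually give little speedup;
# ProcessPoolExecutor is the common alternative.
with open('huge.xml') as f:
    with ThreadPoolExecutor() as executor:
        results = list(executor.map(parse_chunk, chunk_file(f)))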





