Regex - Remove HTML Tags

| Token | Meaning |
| --- | --- |
| `<` | Matches the literal character "<". |
| `[^<]` | Negated character set: matches any single character except "<". |
| `+` | Matches one or more of the preceding token. |
| `?` | When placed immediately after a quantifier (`*`, `+`, `?`, or `{n,m}`), makes that quantifier non-greedy (lazy), so it matches as few times as possible. |
| `>` | Matches the literal character ">". |
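To see the pattern on its own, outside pandas, here is a minimal sketch using Python's standard `re` module. The helper name `strip_tags` and the sample strings are hypothetical. The last call drops the trailing `?` to show why the non-greedy quantifier matters when the text itself contains a ">".

```python
import re

# "<", then one or more non-"<" characters matched lazily,
# then the first ">" that follows.
TAG_PATTERN = re.compile(r"<[^<]+?>")


def strip_tags(text: str) -> str:
    """Remove every substring that looks like an HTML tag."""
    return TAG_PATTERN.sub("", text)


print(strip_tags("<p>Very <b>satisfied</b></p>"))  # Very satisfied

# Lazy "+?" stops at the first ">", so the text between the tags survives.
print(strip_tags("<b>5>3</b>"))  # 5>3

# Greedy "+" backtracks only as far as the last ">" before the next "<",
# so it also swallows "5>" from the text.
print(re.sub(r"<[^<]+>", "", "<b>5>3</b>"))  # 3
```

This is a heuristic rather than an HTML parser: input such as HTML comments (`<!-- a > b -->`) or a "<" inside an attribute value can leave fragments behind.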

```python
# Remove all HTML tags from the surveyAnswer column of DataFrame df.
# Pass regex=True explicitly: since pandas 2.0 the default is regex=False,
# which would treat the pattern as a literal string.
df["surveyAnswer"] = df["surveyAnswer"].str.replace(r"<[^<]+?>", "", regex=True)
```
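A self-contained version of the snippet, with a small made-up DataFrame so it can be run as-is. It assumes pandas is installed; the sample answers are hypothetical.

```python
import pandas as pd

# Hypothetical survey data containing HTML markup.
df = pd.DataFrame(
    {"surveyAnswer": ["<p>Yes</p>", "<b>No</b>, <i>not really</i>"]}
)

# Same pattern as above; regex=True makes pandas treat it as a regex.
df["surveyAnswer"] = df["surveyAnswer"].str.replace(r"<[^<]+?>", "", regex=True)

print(df["surveyAnswer"].tolist())  # ['Yes', 'No, not really']
```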
