Remove HTML Tags from String

 

I recently needed to work with some files containing extensive embedded HTML tags (< ... various settings, etc. >), and I needed to evaluate the information in them without the clutter of those tags.  So I wrote this little subprogram to strip them out.

To install and compile the source programs (the subprogram and a test program), run the bash script:  downloads/removeHTMLtags.setup  [md5: 11a497aba716903aded843965e7e696a].  Right-click the link and save the script in the directory where you want the source programs to reside, then execute it.  The script creates two files containing the source programs, runs the GnuCOBOL compiler to build them into the run unit for the test program, and finally runs the test program.  The GnuCOBOL compiler must be installed before you execute the script.
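As a rough sketch of those steps, the helper below checks a downloaded file against a published MD5 digest before anything is executed.  The function name verify_md5 is made up for illustration, and the sketch assumes the md5sum utility (GNU coreutils) is available and that the script was saved under the name removeHTMLtags.setup in the current directory.

```shell
# Hypothetical helper (not part of the setup script): succeeds only when
# the MD5 digest of file $1 equals the expected hex digest $2.
# Assumes md5sum from GNU coreutils is on the PATH.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Possible usage, assuming the script was saved in the current directory:
#   command -v cobc >/dev/null || echo "GnuCOBOL compiler (cobc) not found"
#   verify_md5 removeHTMLtags.setup 11a497aba716903aded843965e7e696a \
#       && bash removeHTMLtags.setup
```

Running the script only when the digest matches guards against a truncated or altered download.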

Output from the test/verification program:

jay@Phoenix ~ $ ./testSubs 
Before[<!DOCTYPE html>  <html lang="en">     <head>               <title>The Root of all Evil</title>     </head>          <body>               <!-- big bag of evil content -->        </body>      </html>]
After [                      The Root of all Evil]
jay@Phoenix ~ $ 
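For comparison only, the same kind of transformation can be approximated on the command line with sed.  This is not the COBOL subprogram's logic, just a sketch of the general idea: delete everything from a "<" to the next ">", then trim trailing blanks.  The function name strip_tags is made up for illustration, and the pattern assumes each tag starts and ends on one line and has no ">" inside an attribute value.

```shell
# Hypothetical sed-based approximation (not the COBOL subprogram).
# Reads text on stdin and writes it to stdout with tags removed.
strip_tags() {
    # 1st expression: delete every "<...>" run, including comments
    #                 that contain no ">" character.
    # 2nd expression: trim whitespace left at the end of the line.
    sed -e 's/<[^>]*>//g' -e 's/[[:space:]]*$//'
}

# Possible usage:
#   printf '%s\n' '<p>Hello <b>world</b></p>' | strip_tags
```

Whitespace that sat between tags is left in place, so text nested inside indented markup keeps its leading blanks.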

If you want to call the subprogram dynamically, move the object module (removeHTMLtags.so) to a directory listed in your COB_LIBRARY_PATH.
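One way to do that is sketched below.  The function name install_module and the directory in the usage comment are made up for illustration; any directory works as long as it ends up in COB_LIBRARY_PATH, which the GnuCOBOL runtime reads as a colon-separated list on Unix-like systems.

```shell
# Hypothetical helper: move module $1 into directory $2 and make sure
# that directory is listed in COB_LIBRARY_PATH for the current shell.
install_module() {
    module=$1
    libdir=$2
    mkdir -p "$libdir" || return 1
    mv "$module" "$libdir"/ || return 1
    # Prepend the directory unless it is already in the list.
    case ":${COB_LIBRARY_PATH:-}:" in
        *":$libdir:"*) ;;
        *) COB_LIBRARY_PATH="$libdir${COB_LIBRARY_PATH:+:$COB_LIBRARY_PATH}" ;;
    esac
    export COB_LIBRARY_PATH
}

# Possible usage (the target directory is an assumption):
#   install_module removeHTMLtags.so "$HOME/cobol/lib"
```

The export only lasts for the current shell session; to make it permanent, the same COB_LIBRARY_PATH setting would go in a shell startup file such as ~/.bashrc.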


This page was last updated on April 06, 2021.