<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[xml - JBay Solutions - The Dev Blog]]></title><description><![CDATA[JBay Solutions Development Blog on Java, Android, Play2 and others]]></description><link>http://blog.jbaysolutions.com/</link><generator>Ghost 0.7</generator><lastBuildDate>Wed, 16 Oct 2024 01:15:05 GMT</lastBuildDate><atom:link href="http://blog.jbaysolutions.com/tag/xml/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Reading and converting XML files to Excel in Java]]></title><description><![CDATA[<p>Today we're going to show how to read an XML file and convert its entries to rows in an Excel file.</p>

<p>The XML file is located at <a href="https://github.com/jbaysolutions/xml-to-excel/blob/master/Publication1.xml?raw=true">https://github.com/jbaysolutions/xml-to-excel/blob/master/Publication1.xml?raw=true</a>.</p>

<p><strong>The XML file's main nodes are "Substances", each one has a few</strong></p>]]></description><link>http://blog.jbaysolutions.com/2015/10/16/reading-and-converting-xml-files-to-excel/</link><guid isPermaLink="false">f0159837-7141-49e2-9357-cf838f7e62c0</guid><category><![CDATA[java]]></category><category><![CDATA[apache poi]]></category><category><![CDATA[xml]]></category><dc:creator><![CDATA[Gustavo Santos]]></dc:creator><pubDate>Fri, 16 Oct 2015 14:43:45 GMT</pubDate><content:encoded><![CDATA[<p>Today we're going to show how to read an XML file and convert its entries to rows in an Excel file.</p>

<p>The XML file is located at <a href="https://github.com/jbaysolutions/xml-to-excel/blob/master/Publication1.xml?raw=true">https://github.com/jbaysolutions/xml-to-excel/blob/master/Publication1.xml?raw=true</a>.</p>

<p><strong>The XML file's main nodes are "Substances". Each one has a few properties ("Name", "entry_force", "directive") and a list of "Product" elements. We're going to create an Excel row for each Product, and each row will also carry the details of the Product's parent Substance.</strong></p>

<p>Below is a sample of the XML structure:</p>

<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;Pesticides&gt;
&lt;Header&gt;
    &lt;Creation_Date&gt;09/07/2015 13:45&lt;/Creation_Date&gt;
&lt;/Header&gt;
&lt;Substances&gt;
    &lt;Name&gt;Garlic extract (++)&lt;/Name&gt;
    &lt;entry_force&gt;01/09/2008&lt;/entry_force&gt;
    &lt;directive&gt;Reg. (EC) No 839/2008&lt;/directive&gt;
    &lt;Product&gt;
        &lt;Product_name&gt;FRUITS, FRESH or FROZEN; TREE NUTS&lt;/Product_name&gt;
        &lt;Product_code&gt;0100000&lt;/Product_code&gt;
        &lt;MRL/&gt;
        &lt;ApplicationDate&gt;01/09/2008&lt;/ApplicationDate&gt;
    &lt;/Product&gt;
    &lt;Product&gt;
        &lt;Product_name&gt;Oranges (Bergamots, Bitter oranges/sour oranges, Blood oranges, Cara caras, Chinottos,
            Trifoliate oranges, Other hybrids of Citrus sinensis, not elsewhere mentioned,)
        &lt;/Product_name&gt;
        &lt;Product_code&gt;0110020&lt;/Product_code&gt;
        &lt;MRL/&gt;
        &lt;ApplicationDate&gt;01/09/2008&lt;/ApplicationDate&gt;
    &lt;/Product&gt;
    &lt;Product&gt;
        &lt;Product_name&gt;Lemons (Buddha's hands/Buddha's fingers, Citrons,)&lt;/Product_name&gt;
        &lt;Product_code&gt;0110030&lt;/Product_code&gt;
        &lt;MRL/&gt;
        &lt;ApplicationDate&gt;01/09/2008&lt;/ApplicationDate&gt;
    &lt;/Product&gt;
    &lt;Product&gt;
        &lt;Product_name&gt;Limes (Indian sweet limes/Palestine sweet limes, Kaffir limes, Sweet limes/mosambis, Tahiti
            limes,)
        &lt;/Product_name&gt;
        &lt;Product_code&gt;0110040&lt;/Product_code&gt;
        &lt;MRL/&gt;
        &lt;ApplicationDate&gt;01/09/2008&lt;/ApplicationDate&gt;
    &lt;/Product&gt;
&lt;/Substances&gt;
&lt;Substances&gt;
(...)
&lt;/Substances&gt;
</code></pre>

<p>As usual, we use <a href="http://poi.apache.org/" title="Apache POI">Apache POI</a> to create the Excel file.</p>

<p>You can get the sample project used in this post at <a href="https://github.com/jbaysolutions/xml-to-excel">GitHub</a>.</p>


<h2 id="downloadingthefile">Downloading the file</h2>

<p>We start by downloading the file from its original URL:</p>

<pre><code>File xmlFile = File.createTempFile("substances", ".xml");
String xmlFileUrl = "http://ec.europa.eu/food/plant/pesticides/eu-pesticides-database/public/?event=Execute.DownLoadXML&amp;id=1";
URL url = new URL(xmlFileUrl);
System.out.println("downloading file from " + xmlFileUrl + " ...");
FileUtils.copyURLToFile(url, xmlFile);
System.out.println("downloading finished, parsing...");
</code></pre>
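
<p>If you'd rather not depend on Commons IO for this step, the same download can be sketched with the JDK alone. This is only a minimal sketch, not the code from the sample project: the class name <code>XmlDownloader</code>, the method name <code>download</code> and the timeout values are our own assumptions.</p>

<pre><code>import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class XmlDownloader {

    // Hypothetical timeouts; tune them for your network.
    private static final int CONNECT_TIMEOUT_MS = 10_000;
    private static final int READ_TIMEOUT_MS = 60_000;

    /** Copies the resource at the given URL into a new temp file and returns its path. */
    static Path download(URL url) throws IOException {
        Path target = Files.createTempFile("substances", ".xml");
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(CONNECT_TIMEOUT_MS);
        connection.setReadTimeout(READ_TIMEOUT_MS);
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
</code></pre>

<p>The returned path can be handed to the parser with <code>path.toFile()</code>.</p>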


<h2 id="preparingtheexcelfile">Preparing the Excel file</h2>

<p>To create the Excel file we're writing to, we start by creating a new workbook and an empty sheet, then write the first row with the column headers:</p>

<pre><code>workbook = new XSSFWorkbook();

CellStyle style = workbook.createCellStyle();
Font boldFont = workbook.createFont();
boldFont.setBold(true);
style.setFont(boldFont);
style.setAlignment(CellStyle.ALIGN_CENTER);

Sheet sheet = workbook.createSheet();
rowNum = 0;
Row row = sheet.createRow(rowNum++);
Cell cell = row.createCell(SUBSTANCE_NAME_COLUMN);
cell.setCellValue("Substance name");
cell.setCellStyle(style);

cell = row.createCell(SUBSTANCE_ENTRY_FORCE_COLUMN);
cell.setCellValue("Substance entry_force");
cell.setCellStyle(style);

cell = row.createCell(SUBSTANCE_DIRECTIVE_COLUMN);
cell.setCellValue("Substance directive");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_NAME_COLUMN);
cell.setCellValue("Product name");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_CODE_COLUMN);
cell.setCellValue("Product code");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_MRL_COLUMN);
cell.setCellValue("MRL");
cell.setCellStyle(style);

cell = row.createCell(APPLICATION_DATE_COLUMN);
cell.setCellValue("Application Date");
cell.setCellStyle(style);
</code></pre>
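
<p>The column constants used above (<code>SUBSTANCE_NAME_COLUMN</code> and friends) are not shown in this post. One way to define them, sketched here with a hypothetical <code>Column</code> enum, is to keep each column's index and header text together. The zero-based indexes below simply follow the order in which the headers are written above, which is an assumption on our part.</p>

<pre><code>enum Column {
    SUBSTANCE_NAME("Substance name"),
    SUBSTANCE_ENTRY_FORCE("Substance entry_force"),
    SUBSTANCE_DIRECTIVE("Substance directive"),
    PRODUCT_NAME("Product name"),
    PRODUCT_CODE("Product code"),
    PRODUCT_MRL("MRL"),
    APPLICATION_DATE("Application Date");

    private final String header;

    Column(String header) {
        this.header = header;
    }

    /** Zero-based column index, taken from the declaration order (an assumption). */
    int index() {
        return ordinal();
    }

    /** Text written to the header row for this column. */
    String header() {
        return header;
    }
}
</code></pre>

<p>With something like this in place, the header row could be written in a single loop over <code>Column.values()</code>, calling <code>row.createCell(column.index())</code> and <code>cell.setCellValue(column.header())</code> for each column.</p>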


<h2 id="parsing">Parsing</h2>

<p>For this sample, the XML file is parsed using <a href="https://en.wikipedia.org/wiki/Document_Object_Model">DOM</a>.</p>

<p>We get a reference to the first sheet of the Excel workbook:</p>

<pre><code>Sheet sheet = workbook.getSheetAt(0);
</code></pre>

<p>We start by loading the XML document using DOM and getting the Substances node list:</p>

<pre><code>DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);

NodeList nList = doc.getElementsByTagName("Substances");
</code></pre>
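
<p>Because the XML comes from a remote server, it may be worth hardening the parser before loading the file. The following is a minimal sketch, assuming the file needs no DTD (the sample above declares none). The class name <code>SafeXmlLoader</code> is hypothetical, and the <code>disallow-doctype-decl</code> feature is the Xerces one understood by the JDK's built-in parser; other parser implementations may not support it.</p>

<pre><code>import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SafeXmlLoader {

    /** Parses the file with DOM, rejecting documents that declare a DOCTYPE. */
    static Document load(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        // Enable the JAXP secure-processing limits.
        dbFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Refuse any DOCTYPE declaration (assumes the file needs no DTD).
        dbFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        return dBuilder.parse(xmlFile);
    }
}
</code></pre>

<p>A file that does contain a DOCTYPE then fails with a <code>SAXException</code> instead of being processed.</p>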

<p>Then we iterate through the Substances list and get the Substance properties:</p>

<pre><code>for (int i = 0; i &lt; nList.getLength(); i++) {
    System.out.println("Processing element " + (i+1) + "/" + nList.getLength());
    Node node = nList.item(i);
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) node;
        String substanceName = element.getElementsByTagName("Name").item(0).getTextContent();
        String entryForce = element.getElementsByTagName("entry_force").item(0).getTextContent();
        String directive = element.getElementsByTagName("directive").item(0).getTextContent();

        NodeList prods = element.getElementsByTagName("Product");
</code></pre>

<p>The Product elements come back as a NodeList, which we iterate to read each product's details:</p>

<pre><code>for (int j = 0; j &lt; prods.getLength(); j++) {
    Node prod = prods.item(j);
    if (prod.getNodeType() == Node.ELEMENT_NODE) {
        Element product = (Element) prod;
        String prodName = product.getElementsByTagName("Product_name").item(0).getTextContent();
        String prodCode = product.getElementsByTagName("Product_code").item(0).getTextContent();
        String lmr = product.getElementsByTagName("MRL").item(0).getTextContent();
        String applicationDate = product.getElementsByTagName("ApplicationDate").item(0).getTextContent();
</code></pre>
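
<p>The lookups above call <code>getElementsByTagName(...).item(0)</code> directly, which throws a <code>NullPointerException</code> if a tag is missing. Long product names in the file also wrap across lines, so their text content carries line breaks and indentation. A small helper could take care of both. This is only a sketch: the class <code>XmlText</code> and method <code>childText</code> are hypothetical names, and returning an empty string for a missing tag is our own choice.</p>

<pre><code>import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlText {

    /**
     * Returns the text of the first descendant element with the given tag name,
     * with runs of whitespace collapsed to single spaces, or "" if there is none.
     */
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return "";
        }
        return matches.item(0).getTextContent().trim().replaceAll("\\s+", " ");
    }
}
</code></pre>

<p>A lookup then becomes, for example, <code>String prodName = XmlText.childText(product, "Product_name");</code>.</p>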

<p>Now that we have all the details we want to write to the Excel file, we create a row holding them:</p>

<pre><code>Row row = sheet.createRow(rowNum++);
Cell cell = row.createCell(SUBSTANCE_NAME_COLUMN);
cell.setCellValue(substanceName);

cell = row.createCell(SUBSTANCE_ENTRY_FORCE_COLUMN);
cell.setCellValue(entryForce);

cell = row.createCell(SUBSTANCE_DIRECTIVE_COLUMN);
cell.setCellValue(directive);

cell = row.createCell(PRODUCT_NAME_COLUMN);
cell.setCellValue(prodName);

cell = row.createCell(PRODUCT_CODE_COLUMN);
cell.setCellValue(prodCode);

cell = row.createCell(PRODUCT_MRL_COLUMN);
cell.setCellValue(lmr);

cell = row.createCell(APPLICATION_DATE_COLUMN);
cell.setCellValue(applicationDate);
</code></pre>

<p>When all the elements are written, we write the workbook to the filesystem:</p>

<pre><code>// try-with-resources closes the stream even if write() fails
try (FileOutputStream fileOut = new FileOutputStream("C:/Temp/Excel-Out.xlsx")) {
    workbook.write(fileOut);
} finally {
    workbook.close();
}
</code></pre>

<p>Finally, we delete the downloaded XML file:</p>

<pre><code>if (xmlFile.exists()) {
    System.out.println("delete file-&gt; " + xmlFile.getAbsolutePath());
    if (!xmlFile.delete()) {
        System.out.println("file '" + xmlFile.getAbsolutePath() + "' was not deleted!");
    }
}
</code></pre>


<h2 id="conclusion">Conclusion</h2>

<p>The sample project for this post is on <a href="https://github.com/jbaysolutions/xml-to-excel">GitHub</a>. Its main class, <a href="https://github.com/jbaysolutions/xml-to-excel/blob/master/src/main/java/com/jbaysolutions/xmlreader/XmlToExcelConverter.java">XmlToExcelConverter</a>, downloads and parses the XML file and creates the Excel file.</p>

<p>Feel free to copy and adapt the code to read other XML files! <br>
We hope it helps anyone facing the same problem!</p>

<h2 id="references">References</h2>

<p><a href="https://en.wikipedia.org/wiki/Document_Object_Model">DOM</a></p>

<p><a href="https://docs.oracle.com/javase/tutorial/jaxp/dom/">DOM Tutorial</a></p>

<p><a href="http://poi.apache.org/" title="Apache POI">Apache POI</a></p>]]></content:encoded></item></channel></rss>