JBay Solutions Development Blog on Java, Android, Play2 and others
RSS RSS RSS RSS

Reading and converting XML files to Excel in Java

Today we're going to show how to read a XML file and convert it's entries to lines on an excel file.

The XML file is located at https://github.com/jbaysolutions/xml-to-excel/blob/master/Publication1.xml?raw=true.

The XML file's main nodes are "Substances", each one has a few properties "Name", "entry_force", "directive" and a list of "Product". We're going to create an excel row for each Product. Each row will also have the Product parent Substance details.

Below is a sample of the XML structure:

<?xml version="1.0" encoding="UTF-8"?>
<Pesticides>
<Header>
    <Creation_Date>09/07/2015 13:45</Creation_Date>
</Header>
<Substances>
    <Name>Garlic extract (++)</Name>
    <entry_force>01/09/2008</entry_force>
    <directive>Reg. (EC) No 839/2008</directive>
    <Product>
        <Product_name>FRUITS, FRESH or FROZEN; TREE NUTS</Product_name>
        <Product_code>0100000</Product_code>
        <MRL/>
        <ApplicationDate>01/09/2008</ApplicationDate>
    </Product>
    <Product>
        <Product_name>Oranges (Bergamots, Bitter oranges/sour oranges, Blood oranges, Cara caras, Chinottos,
            Trifoliate oranges, Other hybrids of Citrus sinensis, not elsewhere mentioned,)
        </Product_name>
        <Product_code>0110020</Product_code>
        <MRL/>
        <ApplicationDate>01/09/2008</ApplicationDate>
    </Product>
    <Product>
        <Product_name>Lemons (Buddha's hands/Buddha's fingers, Citrons,)</Product_name>
        <Product_code>0110030</Product_code>
        <MRL/>
        <ApplicationDate>01/09/2008</ApplicationDate>
    </Product>
    <Product>
        <Product_name>Limes (Indian sweet limes/Palestine sweet limes, Kaffir limes, Sweet limes/mosambis, Tahiti
            limes,)
        </Product_name>
        <Product_code>0110040</Product_code>
        <MRL/>
        <ApplicationDate>01/09/2008</ApplicationDate>
    </Product>
</Substances>
<Substances>
(...)
</Substances>

As usual, we use Apache POI, to create the excel file.

You can get the sample project used in this post at GitHub.


Downloading the file

We start by downloading the file from it's original URL location:

File xmlFile = File.createTempFile("substances", "tmp");
String xmlFileUrl = "http://ec.europa.eu/food/plant/pesticides/eu-pesticides-database/public/?event=Execute.DownLoadXML&id=1";
URL url = new URL(xmlFileUrl);
System.out.println("downloading file from " + xmlFileUrl + " ...");
FileUtils.copyURLToFile(url, xmlFile);
System.out.println("downloading finished, parsing...");


Preparing the Excel file

To create the Excel file where we're writing, we start by creating a new workbook, an empty sheet and writing the first line with the column headers:

workbook = new XSSFWorkbook();

CellStyle style = workbook.createCellStyle();
Font boldFont = workbook.createFont();
boldFont.setBold(true);
style.setFont(boldFont);
style.setAlignment(CellStyle.ALIGN_CENTER);

Sheet sheet = workbook.createSheet();
rowNum = 0;
Row row = sheet.createRow(rowNum++);
Cell cell = row.createCell(SUBSTANCE_NAME_COLUMN);
cell.setCellValue("Substance name");
cell.setCellStyle(style);

cell = row.createCell(SUBSTANCE_ENTRY_FORCE_COLUMN);
cell.setCellValue("Substance entry_force");
cell.setCellStyle(style);

cell = row.createCell(SUBSTANCE_DIRECTIVE_COLUMN);
cell.setCellValue("Substance directive");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_NAME_COLUMN);
cell.setCellValue("Product name");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_CODE_COLUMN);
cell.setCellValue("Product code");
cell.setCellStyle(style);

cell = row.createCell(PRODUCT_MRL_COLUMN);
cell.setCellValue("MRL");
cell.setCellStyle(style);

cell = row.createCell(APPLICATION_DATE_COLUMN);
cell.setCellValue("Application Date");
cell.setCellStyle(style);


Parsing

For this sample, the XML file is parsed using DOM.

We get the reference to the excel file sheet:

Sheet sheet = workbook.getSheetAt(0);

We start by loading the XML document using DOM and getting the Substances node list:

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);

NodeList nList = doc.getElementsByTagName("Substances");

Then we iterate through the Substances list and get the Substance properties:

for (int i = 0; i < nList.getLength(); i++) {
    System.out.println("Processing element " + (i+1) + "/" + nList.getLength());
    Node node = nList.item(i);
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) node;
        String substanceName = element.getElementsByTagName("Name").item(0).getTextContent();
        String entryForce = element.getElementsByTagName("entry_force").item(0).getTextContent();
        String directive = element.getElementsByTagName("directive").item(0).getTextContent();

        NodeList prods = element.getElementsByTagName("Product");

When we get to the Product element, we get it as a NodeList and iterate it to get it's details:

for (int j = 0; j < prods.getLength(); j++) {
    Node prod = prods.item(j);
    if (prod.getNodeType() == Node.ELEMENT_NODE) {
        Element product = (Element) prod;
        String prodName = product.getElementsByTagName("Product_name").item(0).getTextContent();
        String prodCode = product.getElementsByTagName("Product_code").item(0).getTextContent();
        String lmr = product.getElementsByTagName("MRL").item(0).getTextContent();
        String applicationDate = product.getElementsByTagName("ApplicationDate").item(0).getTextContent();

Now that we have all the details we want to write on the excel file, we create a row with all the details:

Row row = sheet.createRow(rowNum++);
Cell cell = row.createCell(SUBSTANCE_NAME_COLUMN);
cell.setCellValue(substanceName);

cell = row.createCell(SUBSTANCE_ENTRY_FORCE_COLUMN);
cell.setCellValue(entryForce);

cell = row.createCell(SUBSTANCE_DIRECTIVE_COLUMN);
cell.setCellValue(directive);

cell = row.createCell(PRODUCT_NAME_COLUMN);
cell.setCellValue(prodName);

cell = row.createCell(PRODUCT_CODE_COLUMN);
cell.setCellValue(prodCode);

cell = row.createCell(PRODUCT_MRL_COLUMN);
cell.setCellValue(lmr);

cell = row.createCell(APPLICATION_DATE_COLUMN);
cell.setCellValue(applicationDate);

When all the elements are written, we write the excel to the filesystem:

FileOutputStream fileOut = new FileOutputStream("C:/Temp/Excel-Out.xlsx");
workbook.write(fileOut);
workbook.close();
fileOut.close();

Finally, we delete the downloaded XML file:

if (xmlFile.exists()) {
    System.out.println("delete file-> " + xmlFile.getAbsolutePath());
    if (!xmlFile.delete()) {
        System.out.println("file '" + xmlFile.getAbsolutePath() + "' was not deleted!");
    }
}


Conclusion

The sample project used in this post at GitHub has a main class XmlToExcelConverter to download, parse the file and create the excel file.

Feel free to copy and adapt the code to read other XML files!
Hope it helped anyone having the same issues as us!

References

DOM

DOM Tutorial

Apache POI



comments powered by Disqus