ClassCastException while reading XLSX files #25

Open
prassee opened this issue Jan 12, 2023 · 4 comments

@prassee

prassee commented Jan 12, 2023

While reading XLSX files, the job throws a ClassCastException. The artifact details, sample code, and error log are given below.
(Note: since the artifact is not available on Maven Central, I loaded the jar from the lib directory.)
Spark version: 3.2.0
Scala version: 2.12.15

  val s3Path = s"s3a://.../*.xlsx"

  val xlsStmts = spark.read
    .format("com.elastacloud.spark.excel")
    .option("cellAddress", "A1")                    // The first line of the table starts at cell A1
    .option("sheetNamePattern", """Xns""")          // Read data from all sheets matching this pattern
    .option("maxRowCount", 100)                     // Read only the first 100 records to determine the schema of the data
    .option("thresholdBytesForTempFiles", 50000000) // Byte threshold for temp files
    .load(s3Path)

Error Log

java.lang.ClassCastException: class org.apache.xmlbeans.impl.values.XmlComplexContentImpl cannot be cast to class elastashade.poi.schemas.vmldrawing.XmlDocument (org.apache.xmlbeans.impl.values.XmlComplexContentImpl and elastashade.poi.schemas.vmldrawing.XmlDocument are in unnamed module of loader java.net.URLClassLoader @10bf3464)
    at elastashade.poi.xssf.usermodel.XSSFVMLDrawing.read(XSSFVMLDrawing.java:147)
    at elastashade.poi.xssf.usermodel.XSSFVMLDrawing.<init>(XSSFVMLDrawing.java:123)
    at elastashade.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
    at elastashade.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:661)
    at elastashade.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:678)
    at elastashade.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:165)
    at elastashade.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:259)
    at elastashade.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(
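
The trace involves a class under `org.apache.xmlbeans` and classes under the relocated `elastashade.poi` package. One way to narrow this down might be to print which jar each class is loaded from on the driver. The following is only a minimal diagnostic sketch: the `ClassOrigin` object and its `locationOf` helper are hypothetical names, not part of the library, and the class names are copied from the log above.

```scala
object ClassOrigin {
  // Resolve a class without initializing it, so static initializers cannot fail here.
  private def load(className: String): Option[Class[_]] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try Some(Class.forName(className, false, loader))
    catch { case _: ClassNotFoundException | _: LinkageError => None }
  }

  /** Jar or directory the class was loaded from, if the JVM reports one. */
  def locationOf(className: String): Option[String] =
    for {
      cls      <- load(className)
      domain   <- Option(cls.getProtectionDomain)
      source   <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString

  def main(args: Array[String]): Unit = {
    // Class names taken from the error log; adjust as needed.
    val names = Seq(
      "org.apache.xmlbeans.impl.values.XmlComplexContentImpl",
      "elastashade.poi.schemas.vmldrawing.XmlDocument",
      "elastashade.poi.xssf.usermodel.XSSFVMLDrawing"
    )
    names.foreach { name =>
      val origin = locationOf(name).getOrElse("not found, or no code source reported")
      println(s"$name -> $origin")
    }
  }
}
```

If the `org.apache.xmlbeans` class resolves to a different jar than the `elastashade.poi` classes, that would point towards a classpath conflict, though this alone does not confirm the cause.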

@dazfuller
Contributor

Do you have a sample of the source file that you can share?

@prassee
Author

prassee commented Jan 13, 2023

It's mentioned in the description of the ticket.

@dazfuller
Contributor

Sorry, I meant the xlsx file specifically. The library happily parses all xlsx files I've given it, so it would be useful to see the kind of data that is leading to this error.

@prassee
Author

prassee commented Jan 16, 2023

Sorry again, the XLSX files contain financial transactions, so I cannot share them here.
However, the workbook has the following hierarchy of worksheets:

root
|_ Xns
|_ Xns inbound
|_ Xns outbound

Each worksheet mentioned above has the same columns:

Sl. No. 	Date	Cheque No.	Description	Amount	Category	Balance

Separately, I have the following questions:

  • How should the jar file be placed in a Spark Scala project (with SBT)? It would be helpful to see a sample project with the jar loaded along with its dependencies.
  • I also use this library with the sbt-assembly plugin. Does it matter how the jar is integrated with the plugin?
    I suspect this issue is mostly related to the shaded classpath.
    I know I have listed many possibilities, but any help is appreciated.
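
For reference, a minimal `build.sbt` sketch of the setup described in these points might look like the following. It assumes the jar sits under `lib/` and that the sbt-assembly plugin is already enabled in `project/plugins.sbt`. The project name is hypothetical, the versions are the ones from the report, and the merge strategy is only an illustration; nothing in this thread establishes that it affects the error.

```scala
// build.sbt (sketch; assumes sbt-assembly is added in project/plugins.sbt)
ThisBuild / scalaVersion := "2.12.15"

lazy val root = (project in file("."))
  .settings(
    name := "xlsx-reader-example", // hypothetical project name

    // Spark is supplied by the cluster, so it is kept out of the assembly jar.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0" % Provided,

    // sbt already treats jars under lib/ as unmanaged dependencies;
    // this line only makes that default explicit.
    unmanagedBase := baseDirectory.value / "lib",

    // Keep service registration files from all jars, and defer to the
    // plugin's default strategy for everything else.
    ThisBuild / assemblyMergeStrategy := {
      case PathList("META-INF", "services", _*) => MergeStrategy.concat
      case other =>
        val defaultStrategy = (ThisBuild / assemblyMergeStrategy).value
        defaultStrategy(other)
    }
  )
```

With this layout, unmanaged jars under `lib/` are on the compile classpath and are bundled by `sbt assembly` by default. A blanket rule that discards everything under `META-INF` is a common source of missing resources in fat jars, so it may be worth checking for one in an existing build.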
