SebastianTuesta/USING-PYSPARK-FIRST-TIME

PYSPARK

DOWNLOAD THIS

SET THE PARAMETERS

  • Add the following environment variables to the OS (in my case, Windows 10):

    • HADOOP_HOME: "C:\hadoop-2.7.0"
    • JAVA_HOME: "C:\Java\jdk1.8.0_171" (it must not be inside the Program Files folder!)
    • PYSPARK_PYTHON: "C:\Users\Sebastián\Anaconda3\python.exe"
    • SPARK_HOME: "C:\spark-2.4.3-bin-hadoop2.7\"
    • PYSPARK_DRIVER_PYTHON: "ipython"
    • PYSPARK_DRIVER_PYTHON_OPTS: "notebook"
  • Add to the PATH variable:

    • C:\spark-2.4.3-bin-hadoop2.7\bin
    • C:\hadoop-2.7.0\bin
    • C:\Java\jdk1.8.0_171\bin
  • Copy the files hadoop.dll and winutils.exe into C:\hadoop-2.7.0\bin. Both files can be found at: https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin

  • Add the following jar libraries to D:\spark-2.4.3-bin-hadoop2.7\jars:

  • If you want to use the s3a protocol, download these libraries from Maven and add them to Spark's jars folder:

    • aws-java-sdk-1.7.4.jar. Verify that your hadoop-aws version is compatible with this aws-java-sdk version.
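
The settings above are easy to get wrong by one character, so a quick check can help. The following is only a minimal sketch: `check_setup` and `REQUIRED_VARS` are hypothetical names introduced here, and the rules it applies (the four home variables are set, JAVA_HOME is not under Program Files, each `bin` folder is on PATH) are just a restatement of the list above.

```python
import os
from pathlib import PureWindowsPath

# Variable names taken from the list above.
REQUIRED_VARS = ["HADOOP_HOME", "JAVA_HOME", "PYSPARK_PYTHON", "SPARK_HOME"]


def check_setup(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The JDK must not live inside the Program Files folder.
    java_home = env.get("JAVA_HOME", "")
    if "program files" in java_home.lower():
        problems.append("JAVA_HOME points inside Program Files")

    # Normalize PATH entries (quotes, trailing slashes, letter case).
    path_entries = {
        str(PureWindowsPath(p.strip().strip('"'))).lower()
        for p in env.get("PATH", "").split(";")
        if p.strip()
    }
    # Each home directory should have its bin folder on PATH.
    for name in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"):
        home = env.get(name)
        if home:
            bin_dir = str(PureWindowsPath(home) / "bin").lower()
            if bin_dir not in path_entries:
                problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup(dict(os.environ)) or ["No problems found"]:
        print(problem)
```

Run it from a new terminal after saving the variables, since terminals that were already open keep the old environment.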

POWER BI

Power BI has problems with Anaconda's Python. The recommendation is to install Python from this WinPython release instead: https://github.com/winpython/winpython/releases/tag/2.1.20190928. Then replace the Python path configured in Power BI with the path of that installation.
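
To find the path to give Power BI, one option is to ask the WinPython interpreter where it lives. This is a small sketch, assuming Power BI expects the folder that contains python.exe; `python_home` is a hypothetical helper name.

```python
import os
import sys


def python_home(executable=sys.executable):
    """Return the folder that contains the given Python executable."""
    return os.path.dirname(os.path.abspath(executable))


# Run this with the WinPython interpreter and copy the printed folder.
print(python_home())
```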

CONNECT PYTHON WITH ORACLE 32 BITS

  • https://oracle.github.io/odpi/doc/installation.html#windows
  • https://stackoverflow.com/questions/33709391/using-multiple-python-engines-32bit-64bit-and-2-7-3-5
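
A 32-bit Oracle client needs a 32-bit Python interpreter, because the client libraries and the Python process must share the same architecture. Before installing anything, it may help to confirm which kind of interpreter is running. This is a minimal sketch; `interpreter_bits` is a hypothetical helper name.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


print(f"{sys.executable} is a {interpreter_bits()}-bit Python")
```

If the result is 64 and the Oracle client is 32-bit, a separate 32-bit Python installation is needed, which is what the second link discusses.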
