Running Spark in the cloud with Kubernetes
Running on Google Container Engine (GKE)
- Create a GKE container cluster.
- Obtain kubectl and configure it appropriately.
- Find the identity of the master associated with this project using
kubectl cluster-info. The output includes the line
Kubernetes master is running at https://
- Run spark-submit with the --master option set to
k8s://https://<master-ip>:443. The instructions for running spark-submit are provided in the Running on Kubernetes tutorial.
- Check that your driver pod, and subsequently your executor pods, are launched using
kubectl get pods.
- Read the stdout and stderr of the driver pod using
kubectl logs <name-of-driver-pod>, or stream the logs using
kubectl logs -f <name-of-driver-pod>.
- If you face OAuth token expiry errors when you run spark-submit, the token most likely needs to be refreshed. The easiest fix is to run any kubectl command, such as
kubectl version, and then retry your submission.
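
The steps above can be scripted. What follows is a minimal sketch, not part of the Spark or kubectl tooling: the helper names spark_master_from_cluster_info and submit_with_token_refresh are hypothetical. The first helper turns the text printed by kubectl cluster-info into the k8s:// value expected by --master. It assumes the bare https://<master-ip> form that GKE prints, and it also accepts the "control plane" wording used by newer kubectl releases. The second helper retries spark-submit once after running kubectl version. Retrying on any non-zero exit code is a simplification, because the sketch does not inspect the error to confirm that an expired token was the cause.

```python
import re
import subprocess

# Line printed by `kubectl cluster-info`. Newer kubectl releases print
# "Kubernetes control plane" instead of "Kubernetes master".
_MASTER_LINE = re.compile(
    r"Kubernetes (?:master|control plane) is running at (https://\S+)"
)
# kubectl may wrap parts of the line in ANSI color codes; strip them first.
_ANSI = re.compile(r"\x1b\[[0-9;]*m")


def spark_master_from_cluster_info(text):
    """Return the k8s:// master URL for spark-submit, or None if not found.

    Hypothetical helper. Assumes a bare https://host[:port] URL with no path.
    """
    match = _MASTER_LINE.search(_ANSI.sub("", text))
    if match is None:
        return None
    url = match.group(1).rstrip("/")
    # Append the HTTPS port only when the URL does not already carry one.
    if not re.search(r":\d+$", url):
        url += ":443"
    return "k8s://" + url


def submit_with_token_refresh(submit_args, run=subprocess.run):
    """Run spark-submit; on failure, run `kubectl version` and retry once.

    Hypothetical helper. `run` is injectable so the logic can be exercised
    without a cluster. Returns the exit code of the last spark-submit attempt.
    """
    cmd = ["spark-submit"] + list(submit_args)
    result = run(cmd)
    if result.returncode != 0:
        # Simplification: any failure triggers the refresh, not only
        # OAuth token expiry errors.
        run(["kubectl", "version"])
        result = run(cmd)
    return result.returncode
```

To use the sketch against a real cluster, capture the output with subprocess.run(["kubectl", "cluster-info"], capture_output=True, text=True).stdout, pass it to spark_master_from_cluster_info, and hand the result to submit_with_token_refresh as ["--master", master, ...] followed by the remaining spark-submit arguments from the Running on Kubernetes tutorial.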