Get started

Requirements

python-livy requires Python 3.6+.

It does not require any third-party libraries for its core features, but it is handier to use with some optional dependencies. See Install.

Install

This package is only hosted on Github.

pip supports installing packages directly from a Git server, so there is no strong reason for me to submit this package to PyPI.

pip from VCS

Full installation:

pip install -U 'git+https://github.com/tzing/python-livy.git#egg=livy[pretty,aws]'

Two “extras” are included:

pretty

Enhances the command line output with colors and a progress bar.

aws

Installs boto3, which is required by the upload_s3 plugin.

If you do not need these features, you can use the basic installation:

pip install -U 'git+https://github.com/tzing/python-livy.git#egg=livy'

From wheel package

If for some reason you cannot install directly from VCS, you can get the wheel package from the release page.

From source

The dependencies in this project are managed by poetry. Please refer to the official documentation to install poetry first:

git clone git@github.com:tzing/python-livy.git
cd python-livy
poetry install

Usage

Command line tool

First, set the livy server URL in local config file:

livy config set root.api_url http://ip-10-12-34-56.us-west-2.compute.internal:8998/

Note

The given URL should not contain any extra path (do not include /ui/batch).

All configurations are saved in ~/.config/python-livy.json. Settings can also be specified via arguments on each command, but we probably do not want to pass these values every time.
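After the command above, the config file would contain something like the following (the shape is inferred from the config examples later in this guide):

```json
{
  "root": {
    "api_url": "http://ip-10-12-34-56.us-west-2.compute.internal:8998/"
  }
}
```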

Then we can use it to read logs:

livy read-log 1234

By default, the read-log command keeps watching the logs until the batch finishes. This behavior can be turned off via an argument or configuration.

We can also submit a new task:

livy submit s3://example-bucket/test_script/main.py

It is a bit troublesome to upload the script ourselves, so we can utilize the plugin system:

livy config set submit.pre_submit livy.cli.plugin:upload_s3

This tool ships with the upload_s3 plugin, which automatically uploads the local script to AWS S3. This can be helpful if you are using EMR.

Note

Currently there are no plugins for native HDFS / GCP / Azure. Please file an issue or PR if you need one.

This plugin needs extra configuration that cannot be set via the command line. Open ~/.config/python-livy.json in an editor and add a pre-submit:upload_s3 section:

{
  "root": {
    "...": "existing configs, please do not change"
  },
  "pre-submit:upload_s3": {
    "bucket": "example-bucket",
    "folder_format": "{time:%Y%m%d%H%M%S}-{script_name}-{uuid}",
    "expire_days": 3
  }
}

There are three keys: bucket for the S3 bucket name, folder_format as the prefix under which to store the script(s), and expire_days to set the lifetime of the objects.
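To see how the folder_format template expands, here is a minimal sketch assuming standard str.format semantics (the exact placeholder handling inside python-livy may differ; the values below are illustrative):

```python
import datetime
import uuid

# The template from the config above
folder_format = "{time:%Y%m%d%H%M%S}-{script_name}-{uuid}"

prefix = folder_format.format(
    time=datetime.datetime(2021, 5, 1, 12, 30, 0),  # upload time
    script_name="main",                              # local script name
    uuid=uuid.uuid4(),                               # random suffix
)
print(prefix)  # e.g. 20210501123000-main-<random uuid>
```

Each upload thus lands under a unique, timestamped prefix in the bucket.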

After this configuration, we can simply use the command line tool to submit the task:

livy submit main.py

The log reader starts automatically after submission.

Note

The upload_s3 plugin uses boto3 to upload the files; you should run this tool with the s3:PutObject permission, or an error will be raised.
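For reference, a minimal IAM policy granting that permission might look like the following (a sketch; scope the Resource to your own bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```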

As library

We can use the core components in other scripts. They do not require any extra dependencies and can be accessed by importing the livy package.

Note that the plugin system is not triggered in the core library. For actions like submit, the script(s) must already be stored somewhere readable by the server.

>>> import livy
>>> client = livy.LivyClient("http://ip-10-12-34-56.us-west-2.compute.internal:8998/")
>>> client.create_batch("s3://example-bucket/test_script/main.py")
{
    "id": 55,
    "name": None,
    "owner": None,
    "proxyUser": None,
    "state": "starting",
    "appId": None,
    "appInfo": {
        "driverLogUrl": None,
        "sparkUiUrl": None
    },
    "log": [
        "stdout: ",
        "\nstderr: ",
        "\nYARN Diagnostics: "
    ]
}

>>> reader = livy.LivyBatchLogReader(client, 55)
>>> reader.read_until_finish()  # read logs and broadcast to log handlers
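Since the records are broadcast through log handlers, it helps to configure logging before starting the reader so the output is actually visible. A minimal sketch, assuming python-livy emits through the standard logging module (adjust the level and format to taste):

```python
import logging

# Attach a console handler to the root logger so broadcast records
# have somewhere to go; without any handler they may be discarded.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# With a handler in place, reader.read_until_finish() output appears
# on the console as the batch runs.
```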

For the API documentation, see Core library.

Advanced usage

Set default configs and repack

In some cases, we want to install this tool into multiple environments without setting the configurations every time. We can repack this tool with our own default configurations.

First, clone the repo:

git clone git@github.com:tzing/python-livy.git
cd python-livy

Create default-configuration.json under livy/. This is a hardcoded filename that is read by this tool but does not exist in the original repo.

Save everything we want in this file; for example:

{
  "root": {
    "api_url": "http://example.com:8998/"
  },
  "submit": {
    "pre_submit": [
      "livy.cli.plugin:upload_s3"
    ]
  },
  "pre-submit:upload_s3": {
    "bucket": "example-bucket",
    "folder_format": "{time:%Y%m%d%H%M%S}-{script_name}-{uuid}"
  }
}

Build this tool for distribution:

poetry build

Then find the wheel or tar file in dist/.