Core library

The core library is a set of classes that wrap API calls and handle messages. All of these classes can be used without extra dependencies.

livy.LivyClient

class livy.client.LivyClient

Client that wraps requests to the Livy server.

This implementation follows Livy API v0.7.0 spec.

__init__(url, verify=True, timeout=30.0)
Parameters
  • url (str) – URL of the Livy server

  • verify (Union[bool, ssl.SSLContext]) – Whether to verify SSL certificates; alternatively, a customized SSL context to use

  • timeout (float) – Connection timeout in seconds

Raises
  • TypeError – If an argument has an invalid data type

  • OperationError – If the URL scheme is not supported

Return type

None

Note

This package is designed to depend on as few third-party libraries as possible, so it does not use a high-level interface for handling requests. As a result, it cannot offer features as rich as those of requests.
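To illustrate how a verify argument of type Union[bool, ssl.SSLContext] can be interpreted with the standard library alone, here is a minimal sketch. The helper name resolve_ssl_context is hypothetical and not part of this package; the actual handling inside LivyClient may differ.

```python
import ssl
from typing import Union


def resolve_ssl_context(verify: Union[bool, ssl.SSLContext]) -> ssl.SSLContext:
    """Hypothetical helper: turn a `verify` value into an SSLContext."""
    if isinstance(verify, ssl.SSLContext):
        # The caller supplied a customized context; use it unchanged.
        return verify
    if verify is True:
        # The default context verifies certificates and host names.
        return ssl.create_default_context()
    if verify is False:
        # Disable verification; check_hostname must be cleared first.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    raise TypeError(
        f"verify must be bool or ssl.SSLContext, got {type(verify).__name__}"
    )
```

A context produced this way could be handed to the standard library's HTTPS connection classes, which is presumably the kind of low-level interface the note above refers to.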

check(capture=True)

Check if server is up.

Parameters

capture (bool) – Capture the exception and return a boolean. Set to False to raise livy.exception.RequestError on error.

Return type

bool
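Because check() returns a plain boolean by default, it fits naturally into a retry loop. The sketch below assumes only the check() method documented above; StubClient and wait_until_up are hypothetical names used so that the snippet runs without a Livy server.

```python
import time


class StubClient:
    """Hypothetical stand-in mimicking LivyClient.check(); fails a few times first."""

    def __init__(self, failures: int = 2) -> None:
        self._failures = failures

    def check(self, capture: bool = True) -> bool:
        if self._failures > 0:
            self._failures -= 1
            return False
        return True


def wait_until_up(client, attempts: int = 5, delay: float = 0.01) -> bool:
    """Call client.check() up to `attempts` times, sleeping between tries."""
    for _ in range(attempts):
        if client.check():  # capture=True by default, so errors become False
            return True
        time.sleep(delay)
    return False
```

With a real client, the same function could be called as wait_until_up(client) with a longer delay.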

create_batch(file, proxy_user=None, class_name=None, args=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, conf=None)

Request to create a batch.

Parameters
  • file (str) – File containing the application to execute

  • proxy_user (str) – User to impersonate when running the job

  • class_name (str) – Application Java/Spark main class

  • args (List[str]) – Command line arguments for the application

  • jars (List[str]) – Java dependencies to be used in this batch

  • py_files (List[str]) – Python dependencies to be used in this batch

  • files (List[str]) – Files to be used in this batch

  • driver_memory (str) – Amount of memory to use for the driver process

  • driver_cores (int) – Number of cores to use for the driver process

  • executor_memory (str) – Amount of memory to use per executor process

  • executor_cores (int) – Number of cores to use for each executor

  • num_executors (int) – Number of executors to launch for this batch

  • archives (List[str]) – Archives to be used in this batch

  • queue (str) – Name of the YARN queue to submit the batch to

  • name (str) – The session name to execute this batch

  • conf (Dict[str, str]) – Spark configuration properties

Returns

batch – Created batch object from the Livy server

Return type

dict
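The parameters above correspond to the fields of the Livy REST API's POST /batches request body, which uses camelCase names such as proxyUser and pyFiles. The sketch below shows one way such a mapping could be written; build_batch_body and to_camel_case are hypothetical helpers, not functions of this package, and the client's internal implementation may differ.

```python
from typing import Any, Dict


def to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase, e.g. 'proxy_user' becomes 'proxyUser'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_batch_body(file: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: build a JSON-ready body, omitting options left as None."""
    body: Dict[str, Any] = {"file": file}
    for key, value in options.items():
        if value is not None:
            body[to_camel_case(key)] = value
    return body
```

Dropping the None values keeps the request body limited to the options the caller actually set.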

Raises
delete_batch(batch_id)

Kill the batch job.

Parameters

batch_id (int) – Batch ID

Raises
Return type

None

get_batch_information(batch_id)

Get summary information to specific batch.

Parameters

batch_id (int) – Batch ID

Returns

batch – Batch information from the Livy server

Return type

dict

Raises
get_batch_log(batch_id, from_=None, size=None)

Get logs from the batch.

Parameters
  • batch_id (int) – Batch ID

  • from_ (Optional[int]) – Offset

  • size (int) – Maximum number of lines to return

Returns

logs – Log lines

Return type

List[str]
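The offset and size arguments make it possible to page through a long log. The following is a minimal sketch that assumes only the get_batch_log() signature documented above; StubClient and fetch_all_logs are hypothetical names, and the stub serves lines from memory so that the snippet runs without a server.

```python
from typing import List, Optional


class StubClient:
    """Hypothetical stand-in mimicking LivyClient.get_batch_log()."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def get_batch_log(self, batch_id: int, from_: Optional[int] = None,
                      size: Optional[int] = None) -> List[str]:
        start = from_ or 0
        end = None if size is None else start + size
        return self._lines[start:end]


def fetch_all_logs(client, batch_id: int, page_size: int = 100) -> List[str]:
    """Page through the log, advancing the offset until a short page arrives."""
    collected: List[str] = []
    while True:
        page = client.get_batch_log(batch_id, from_=len(collected), size=page_size)
        collected.extend(page)
        if len(page) < page_size:
            return collected
```

For a batch that is still running, new lines may appear after the loop returns, so the call would need to be repeated.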

Raises
get_batch_state(batch_id)

Get state of the batch.

Parameters

batch_id (int) – Batch ID

Returns

state – Current state; literally one of starting, running, dead, killed, or success.

Return type

str

Raises

Note

The official documentation does not provide a complete list of batch states.

The state list given here is based on observation in real practice (Livy's source code could be checked for confirmation). Since states from both the session list and the statement list have been observed, the complete list of possible states could be: not_started, starting, available, idle, waiting, busy, running, shutting_down, error, dead, killed, success, cancelling and cancelled.

is_batch_ended(batch_id)

Check batch state and return True if it is finished.

Parameters

batch_id (int) – Batch ID

Returns

finished – Task is over

Return type

bool
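A common use of is_batch_ended() is to wait for a batch and then inspect its final state. The sketch below assumes only the two methods documented above. StubClient and wait_for_batch are hypothetical, and treating dead, killed and success as the finished states is an assumption made for the stub, not a statement about the library.

```python
import time


class StubClient:
    """Hypothetical stand-in for LivyClient that walks through fixed states."""

    def __init__(self, states):
        self._states = list(states)

    def get_batch_state(self, batch_id: int) -> str:
        return self._states[0]

    def is_batch_ended(self, batch_id: int) -> bool:
        # Advance to the next state on every poll until the last one is reached.
        if len(self._states) > 1:
            self._states.pop(0)
        # Assumption: these three states mean the batch is over.
        return self._states[0] in {"dead", "killed", "success"}


def wait_for_batch(client, batch_id: int, interval: float = 0.01,
                   timeout: float = 5.0) -> str:
    """Poll until the batch ends, then return its final state."""
    deadline = time.monotonic() + timeout
    while not client.is_batch_ended(batch_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} did not end within {timeout} seconds")
        time.sleep(interval)
    return client.get_batch_state(batch_id)
```

The timeout guards against batches that never leave a non-final state.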

Raises

livy.LivyBatchLogReader

class livy.logreader.LivyBatchLogReader

Read Livy batch logs and publish to Python’s logging infrastructure.

__init__(client, batch_id, timezone=datetime.timezone.utc, prefix=None)
Parameters
Raises

TypeError – If an argument has an invalid data type

Return type

None

add_parsers(pattern, parser)

Add log parser to this reader.

Parameters
  • pattern (re.Pattern) – Regex pattern to match the log. Note that the pattern must match the line-start (^) symbol and be compiled with the multiline (re.M) flag.

  • parser (callable) – The parser that extracts fields from the re.Match object. It should take a re.Match as input and return a livy.logreader.LivyLogParseResult instance.

Returns

Nothing is returned. An exception is raised on any error.

Note

The pattern must wrap the entire log

The re.Match object is also used to locate the position in the log. Errors may occur if the regex pattern does not match entire log lines.

Do not directly emit logs from the parser

The fetched log might overlap with that of a previous read; this reader caches the processed results to prevent duplicate logs from being emitted.
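As an illustration of the requirements above, here is a minimal sketch of a pattern and parser pair. The log format is hypothetical, and ParseResult is a stand-in carrying the same four fields as livy.logreader.LivyLogParseResult so that the snippet runs without the package installed; with the real package, the parser would return LivyLogParseResult instead.

```python
import datetime
import logging
import re
from typing import NamedTuple, Optional


class ParseResult(NamedTuple):
    """Stand-in with the same fields as livy.logreader.LivyLogParseResult."""
    created: Optional[datetime.datetime]
    level: int
    name: Optional[str]
    message: str


LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO,
          "WARN": logging.WARNING, "ERROR": logging.ERROR}

# Hypothetical format: "2021-03-04 05:06:07 WARN my.logger: text".
# Anchored with ^ and $ and compiled with re.M so it covers whole lines.
PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<name>[\w.]+): (?P<message>.*)$",
    re.M,
)


def parse(match: "re.Match") -> ParseResult:
    """Extract fields from the match; return a result without emitting any log."""
    created = datetime.datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S")
    return ParseResult(
        created=created.replace(tzinfo=datetime.timezone.utc),
        level=LEVELS.get(match.group("level"), logging.INFO),
        name=match.group("name"),
        message=match.group("message"),
    )
```

Given a LivyBatchLogReader instance named reader, the pair would then be registered with reader.add_parsers(PATTERN, parse).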

read()

Read log once.

Returns

No data is returned. All logs are piped to Python’s logging.

Note

Livy does not split logs into separate objects. What can be retrieved from the server is a list of strings in which stdout and stderr output are mixed.

This function is designed to match each known format via regex, and to fall back to stdout/stderr if a line cannot be parsed.

Parsers are pluggable. To go beyond the built-in parsers, see the docstring of add_parsers().
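The match-then-fall-back strategy described in this note can be sketched as follows. This is a simplified, line-by-line illustration rather than the reader's actual implementation: both formats in KNOWN_FORMATS are hypothetical, and using INFO as the fallback level is an assumption.

```python
import logging
import re
from typing import Tuple

LEVELS = {"INFO": logging.INFO, "WARN": logging.WARNING, "ERROR": logging.ERROR}

# Hypothetical known formats, tried in order.
KNOWN_FORMATS = [
    re.compile(r"^(?P<level>INFO|WARN|ERROR) (?P<name>[\w.]+): (?P<message>.*)$"),
    re.compile(r"^\[(?P<level>INFO|WARN|ERROR)\] (?P<name>[\w.]+) - (?P<message>.*)$"),
]


def classify(line: str, section: str = "stdout") -> Tuple[int, str, str]:
    """Return (level, logger name, message) for one raw log line."""
    for pattern in KNOWN_FORMATS:
        match = pattern.match(line)
        if match:
            return (LEVELS[match.group("level")],
                    match.group("name"),
                    match.group("message"))
    # No format matched: fall back to the section the line came from.
    return logging.INFO, section, line
```

Lines that match no format keep their full text as the message and are attributed to the section they were found in.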

read_until_finish(block=True, interval=0.4)

Keep monitoring and read logs until the task is finished.

Parameters
  • block (bool) – Whether to block the current thread. A background thread is started if False.

  • interval (float) – Interval, in seconds, between log queries.

Returns

No data is returned. All logs are piped to Python’s logging.

See also

read()

stop_read()

Stop the background thread created by read_until_finish(). Only takes effect after the thread has been created.
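The interplay between a background reading loop and a stop call can be sketched with threading.Event. This is a generic illustration of the pattern, not the library's implementation; StubReader and BackgroundPoller are hypothetical names.

```python
import threading


class StubReader:
    """Hypothetical stand-in for a log reader; counts calls to read()."""

    def __init__(self) -> None:
        self.reads = 0

    def read(self) -> None:
        self.reads += 1


class BackgroundPoller:
    """Generic pattern: call reader.read() periodically until stopped."""

    def __init__(self, reader, interval: float = 0.4) -> None:
        self._reader = reader
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = None

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self._reader.read()
            # wait() returns early once stop() sets the event.
            self._stop_event.wait(self._interval)

    def stop(self) -> None:
        # Has no effect unless start() was called first.
        if self._thread is None:
            return
        self._stop_event.set()
        self._thread.join()
```

Waiting on the event instead of sleeping lets the stop request interrupt the pause between two reads.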

class livy.logreader.LivyLogParseResult

Log parse result.

static __new__(_cls, created, level, name, message)

Create new instance of LivyLogParseResult(created, level, name, message)

Parameters
created: datetime.datetime

Timestamp at which this log was created. May be None if the creation time cannot be determined; the system then fills in the creation time of the previous log.

level: int

Log level.

message: str

Log message.

name: str

Logger name. May be None if unknown, in which case it falls back to the corresponding section name in Livy’s log (stdout, stderr or YARN Diagnostics).
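The fallback rules for created and name can be sketched as below. ParseResult is a stand-in with the same fields as LivyLogParseResult, and fill_defaults is a hypothetical helper; its start argument, used when the very first entry has no timestamp, is an assumption and not part of the documented behavior.

```python
import datetime
from typing import Iterable, Iterator, NamedTuple, Optional


class ParseResult(NamedTuple):
    """Stand-in with the same fields as LivyLogParseResult."""
    created: Optional[datetime.datetime]
    level: int
    name: Optional[str]
    message: str


def fill_defaults(results: Iterable[ParseResult], section: str,
                  start: datetime.datetime) -> Iterator[ParseResult]:
    """Fill None fields: `created` from the previous entry, `name` from the section."""
    last_created = start
    for result in results:
        created = result.created or last_created
        last_created = created
        yield result._replace(created=created, name=result.name or section)
```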

Exceptions

exception livy.exception.Error

Base exception type for python-livy package

exception livy.exception.OperationError

The user misused the library.

exception livy.exception.RequestError

Error during data transfer.

__init__(code, reason, error=None)
Parameters
  • code (int) –

  • reason (str) –

Return type

None
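Since Error is described as the base exception type, a caller can handle the specific RequestError first and fall back to the base class. The sketch below uses stand-in classes that mirror the documented names and the constructor signature; that RequestError derives from Error and that the constructor arguments are stored as attributes of the same name are both assumptions.

```python
class Error(Exception):
    """Stand-in for livy.exception.Error."""


class RequestError(Error):
    """Stand-in for livy.exception.RequestError."""

    def __init__(self, code: int, reason: str, error: Exception = None) -> None:
        super().__init__(code, reason)
        # Assumption: constructor arguments are kept as attributes.
        self.code = code
        self.reason = reason
        self.error = error


def describe_failure(action) -> str:
    """Run `action` and summarize any package-level error it raises."""
    try:
        action()
    except RequestError as exc:
        # Most specific handler first.
        return f"request failed: {exc.code} {exc.reason}"
    except Error as exc:
        return f"livy error: {exc!r}"
    return "ok"
```

With a real client, action could be something like lambda: client.check(capture=False), which raises RequestError on failure according to the description of check().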

exception livy.exception.TypeError

Wrapped type error, for easier printing of additional information.

__init__(name, expect, got)
Parameters
Return type

None