Core library

The core library is a set of classes that wrap API calls and handle messages. All of these classes can be used without extra dependencies.

livy.LivyClient

class livy.client.LivyClient

Client that wraps requests to the Livy server

This implementation follows the Livy API v0.7.0 specification.

__init__(url, verify=True, timeout=30.0)

Parameters

url (str) – URL of the Livy server
verify (Union[bool, ssl.SSLContext]) – Whether to verify SSL certificates; alternatively, a customized SSL context to use
timeout (float) – Connection timeout in seconds

Raises

TypeError – If an argument has an invalid data type
OperationError – If the URL scheme is not supported

Return type

None

Note

This package is designed to use as few third-party libraries as possible, so it does not use a high-level interface to handle requests. As a result, it cannot offer the rich features that a library such as requests provides.
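Since verify accepts an ssl.SSLContext, a custom context can be built with the standard library alone. The following is a minimal sketch, not the package's own code: the helper name build_ssl_context and the server URL in the comment are hypothetical.

```python
import ssl
from typing import Optional


def build_ssl_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a certificate-verifying context, optionally trusting a private CA."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_ssl_context()
# The context would then be passed as the `verify` argument, for example
# (hypothetical URL):
#   LivyClient("https://livy.example.com:8998", verify=context)
```

Passing verify=False instead would skip certificate verification entirely, which is usually only acceptable for local testing.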

check(capture=True)

Check whether the server is up.

Parameters: capture (bool) – Capture the exception and return a boolean. Set to False to raise livy.exception.RequestError on error.
Return type: bool
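Because check() returns a boolean by default, it fits naturally into a retry loop. The sketch below assumes only the documented check(capture=True) behaviour; the helper name wait_until_up and its attempts and delay parameters are hypothetical.

```python
import time


def wait_until_up(client, attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll client.check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        # capture=True (the default) turns failures into a False return value.
        if client.check(capture=True):
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Any object that exposes a compatible check() method can be passed as client, which also makes the helper easy to test without a running server.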

create_batch(file, proxy_user=None, class_name=None, args=None, jars=None, py_files=None, files=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, archives=None, queue=None, name=None, conf=None)

Request to create a batch.

Parameters

file (str) – File containing the application to execute
proxy_user (str) – User to impersonate when running the job
class_name (str) – Application Java/Spark main class
args (List[str]) – Command line arguments for the application
jars (List[str]) – Java dependencies to be used in this batch
py_files (List[str]) – Python dependencies to be used in this batch
files (List[str]) – Files to be used in this batch
driver_memory (str) – Amount of memory to use for the driver process
driver_cores (int) – Number of cores to use for the driver process
executor_memory (str) – Amount of memory to use per executor process
executor_cores (int) – Number of cores to use for each executor
num_executors (int) – Number of executors to launch for this batch
archives (List[str]) – Archives to be used in this batch
queue (str) – Name of the YARN queue to submit the batch to
name (str) – Session name to use for this batch
conf (Dict[str, str]) – Spark configuration properties

Returns

batch – Created batch object from the Livy server

Return type

dict

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
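The parameters above map onto a single keyword-argument call. The following sketch wraps that call for a PySpark job. The helper name submit_pyspark_job, the resource values, and the job name are illustrative assumptions, and reading the batch ID from the "id" key assumes the returned dict mirrors the batch object of the Livy REST API.

```python
from typing import Dict, Optional, Sequence


def submit_pyspark_job(
    client,
    script: str,
    dependencies: Sequence[str] = (),
    spark_conf: Optional[Dict[str, str]] = None,
) -> int:
    """Submit a PySpark script and return the ID of the created batch."""
    batch = client.create_batch(
        file=script,
        # Leave optional arguments at None when there is nothing to send.
        py_files=list(dependencies) or None,
        executor_memory="2g",  # illustrative value
        num_executors=2,  # illustrative value
        name="example-job",  # illustrative value
        conf=dict(spark_conf) if spark_conf else None,
    )
    # Assumption: the returned dict carries the batch ID under "id".
    return batch["id"]
```

The returned ID is what the other batch methods (get_batch_state, get_batch_log, delete_batch) expect as batch_id.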

delete_batch(batch_id)

Kill the batch job.

Parameters

batch_id (int) – Batch ID

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error

Return type

None

get_batch_information(batch_id)

Get summary information about a specific batch.

Parameters

batch_id (int) – Batch ID

Returns

batch – Batch information from the Livy server

Return type

dict

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error

get_batch_log(batch_id, from_=None, size=None)

Get logs from the batch.

Parameters

batch_id (int) – Batch ID
from_ (Optional[int]) – Offset
size (int) – Maximum number of lines to return

Returns

logs – Log lines

Return type

List[str]

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
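The from_ and size parameters allow the log to be fetched in pages. The sketch below shows one way to walk through all currently available lines. It assumes that from_ is a zero-based line offset and that a page shorter than size marks the end of the log; neither is stated above. The helper name iter_batch_log is hypothetical.

```python
from typing import Iterator


def iter_batch_log(client, batch_id: int, page_size: int = 100) -> Iterator[str]:
    """Yield every currently available log line of a batch, page by page."""
    offset = 0
    while True:
        lines = client.get_batch_log(batch_id, from_=offset, size=page_size)
        yield from lines
        # Assumption: a short (or empty) page means the end of the log was reached.
        if len(lines) < page_size:
            break
        offset += len(lines)
```

For continuous monitoring of a running batch, livy.LivyBatchLogReader is the more convenient tool, since it also handles overlapping reads.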

get_batch_state(batch_id)

Get state of the batch.

Parameters

batch_id (int) – Batch ID

Returns

state – Current state; literally one of starting, running, dead, killed, or success.

Return type

str

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error

Note

The official documentation does not provide a complete list of batch states.

The state list given here is based on observation in practice (it could be confirmed against the Livy source code). Since states from both the session list and the statement list can be observed, the complete list of possible states could be: not_started, starting, available, idle, waiting, busy, running, shutting_down, error, dead, killed, success, cancelling, and cancelled.

is_batch_ended(batch_id)

Check batch state and return True if it is finished.

Parameters

batch_id (int) – Batch ID

Returns

finished – Whether the task is over

Return type

bool

Raises

TypeError – If an input parameter does not match the expected data type
RequestError – On connection error
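Together, is_batch_ended() and get_batch_state() are enough to build a simple wait loop. The sketch below polls until the batch ends and then returns its final state. The helper name wait_for_batch, the interval and timeout defaults, and the use of the built-in TimeoutError are assumptions made for illustration.

```python
import time


def wait_for_batch(
    client, batch_id: int, interval: float = 5.0, timeout: float = 3600.0
) -> str:
    """Block until the batch ends, then return its final state."""
    deadline = time.monotonic() + timeout
    while not client.is_batch_ended(batch_id):
        # Give up once the deadline has passed.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still running after {timeout} seconds"
            )
        time.sleep(interval)
    return client.get_batch_state(batch_id)
```

A caller would typically compare the result against success and treat dead or killed as failures.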

livy.LivyBatchLogReader

class livy.logreader.LivyBatchLogReader

Read Livy batch logs and publish to Python’s logging infrastructure.

__init__(client, batch_id, timezone=datetime.timezone.utc, prefix=None)

Parameters

client (livy.client.LivyClient) – Livy client that is pre-configured
batch_id (int) – Batch ID to be watched
timezone (datetime.tzinfo) – Server time zone
prefix (str) – Prefix to be added to logger name

Raises

TypeError – If an argument has an invalid data type

Return type

None

add_parsers(pattern, parser)

Add a log parser to this reader.

Parameters

pattern (re.Pattern) – Regex pattern to match the log. Note that the pattern must match the line start (^) symbol and be compiled with the multiline (re.M) flag.
parser (callable) – The parser that extracts fields from the re.Match object. It should take a re.Match as input and return a livy.logreader.LivyLogParseResult instance.

Returns

Nothing. An exception is raised on any error.

Note

The pattern must cover the entire log record: the re.Match object is also used to locate the position in the log, so a regex pattern that does not match entire log lines might cause errors.
Do not emit logs directly from the parser: the fetched log might overlap with that of a previous read, so this reader caches the processed results to prevent duplicated logs from being emitted.
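The requirements above (anchor at line start, compile with re.M, match whole lines, return a result instead of logging) can be sketched as follows. The log format is a made-up sample, and ParseResult is a hypothetical stand-in for livy.logreader.LivyLogParseResult, whose fields are not listed in this section; a real parser would have to return the library's own type.

```python
import datetime
import logging
import re
from typing import NamedTuple


class ParseResult(NamedTuple):
    """Hypothetical stand-in for livy.logreader.LivyLogParseResult."""

    created: datetime.datetime
    level: int
    name: str
    message: str


# Anchored at line start (^) and compiled with re.M; the pattern runs to the
# end of the line ($) so that each match covers an entire log line.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (DEBUG|INFO|WARNING|ERROR) (\S+): (.*)$",
    re.M,
)


def parse(match: re.Match) -> ParseResult:
    """Convert one match into a structured record without emitting any log."""
    created = datetime.datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    # Map the level name (e.g. "INFO") to the numeric logging level.
    level = getattr(logging, match.group(2))
    return ParseResult(created, level, match.group(3), match.group(4))
```

With a parser that returns the real result type, registration would be a single call of the form reader.add_parsers(PATTERN, parse).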

read()

Read log once.

Returns: Nothing. All logs are piped to Python’s logging.

Note

Livy does not split logs into separate objects. What can be retrieved from the server is a list of strings in which stdout and stderr output are mixed.

This function is designed to match each known format via regex, and to fall back to stdout/stderr if a line cannot be parsed.

Parsers are pluggable. To go beyond the built-in parsers, see the docstring of add_parsers().

read_until_finish(block=True, interval=0.4)

Keep monitoring and read logs until the task is finished.

Parameters

block (bool) – Whether to block the current thread. If False, a background thread is started to do the reading.
interval (float) – Interval, in seconds, between log queries.

Returns

Nothing. All logs are piped to Python’s logging.
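Since the reader returns nothing and sends everything to Python's logging, a handler has to be configured before read() or read_until_finish() is called, or the output is lost. The sketch below attaches an in-memory handler to the root logger. The ListHandler class and the logger name livy-demo.driver are hypothetical, and the final line only simulates a record that the reader might emit.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted records in memory; swap in any other handler in practice."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

# Attach to the root logger so records from any logger name are captured.
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Stand-in for a record that the reader would emit; the logger name is made up.
logging.getLogger("livy-demo.driver").info("Job 0 finished")
```

In a script, logging.basicConfig(level=logging.INFO) is usually enough; a custom handler like this one is mainly useful for tests or for forwarding the lines elsewhere.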

Exceptions

exception livy.exception.Error: Base exception type for the python-livy package

exception livy.exception.OperationError: The library is misused by the user.

exception livy.exception.RequestError

Error during data transfer

__init__(code, reason, error=None)

Parameters

code (int) –
reason (str) –

Return type

None

exception livy.exception.TypeError

Wrapped type error, which makes it easier to print more information

__init__(name, expect, got)

Parameters

name (str) –
expect (Union[str, type]) –
got (Union[str, type]) –

Return type

None
