API reference

Public surface of pylmrob.

Top-level functions

pylmrob.lmrob(formula, data, control=None, weights=None, na_action='drop', seed=None, **kwargs)

Fit a robust MM linear regression.

Parameters:

formula (str) – R-style formula, e.g. "y ~ x1 + x2 + x3". Parsed by formulaic.
data (DataFrame) – DataFrame containing the columns referenced by formula.
control (Control | None) – Algorithm parameters; defaults to Control() (KS2014 preset, engine_c=True).
weights (ndarray | None) – Optional non-negative per-case weights (length len(data)). Implemented via the sqrt(w)-transform that R’s lmrob uses: the transformed design (sqrt(w)*X, sqrt(w)*y) goes through the unweighted fit. Zero-weight rows are dropped. Compatible with both the default Cython engine and the legacy NumPy path (the transform is applied before any path dispatch, so the Cython kernel never needs to know about weights itself).
na_action (str) – "drop" (default) drops rows with any NA before fitting.
seed (int | Generator | None) – Seed for the resampling RNG.
kwargs (Any)

Return type:

LmRobResults
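The sqrt(w)-transform behind weights can be checked in a few lines of plain Python. This is a minimal sketch, not pylmrob's code. The helper names are hypothetical, and ordinary least squares stands in for the robust fit because only the transform is being illustrated. It shows that an unweighted fit on (sqrt(w)*X, sqrt(w)*y), with the intercept column scaled too, reproduces the weighted least-squares solution, and that zero-weight rows simply drop out.

```python
import math


def sqrt_w_transform(X, y, w):
    """Scale each row of X and each y_i by sqrt(w_i); drop rows with w_i == 0.

    Hypothetical helper illustrating the transform described for `weights`.
    """
    Xt, yt = [], []
    for row, yi, wi in zip(X, y, w):
        if wi < 0:
            raise ValueError("weights must be non-negative")
        if wi == 0:
            continue  # zero-weight rows are dropped
        s = math.sqrt(wi)
        Xt.append([s * xij for xij in row])  # the intercept column is scaled too
        yt.append(s * yi)
    return Xt, yt


def ols_2col(X, y):
    """Unweighted least squares for a two-column design via the normal equations.

    Stand-in for the robust fit: only the transform is being illustrated.
    """
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    return ((d * u - b * v) / det, (a * v - b * u) / det)


def wls_line(x, y, w):
    """Direct weighted least-squares (intercept, slope), for comparison."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return (ybar - slope * xbar, slope)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 50.0]
w = [1.0, 2.0, 1.0, 0.5, 0.0]      # the wild last case gets zero weight
X = [[1.0, xi] for xi in x]         # design with an intercept column

Xt, yt = sqrt_w_transform(X, y, w)
print(len(Xt))                      # 4: the zero-weight row is gone
print(ols_2col(Xt, yt))             # matches the direct weighted fit below
print(wls_line(x, y, w))
```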

pylmrob.anova(*fits, test='Wald')

Robust nested-model anova on a sequence of LmRobResults.

The first argument is the largest (full) model; subsequent arguments are progressively reduced models. Each adjacent pair must be strictly nested by term names: the smaller model's terms must form a proper subset of the larger model's terms.

Parameters:

fits (LmRobResults) – Two or more LmRobResults ordered from largest to smallest.
test (str) – "Wald" (default) or "Deviance". The Deviance test refits the reduced model via M-iteration at the full model’s scale; it requires the full model’s method to end with "M".

Return type:

AnovaTable
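For test="Wald", robustbase's anova.lmrob compares a nested pair by a Wald statistic on the coefficients the reduced model drops, using the full model's covariance. The sketch below assumes pylmrob follows the same recipe; it is plain Python with hypothetical names, not the package's implementation. It identifies the dropped terms by name, computes W = b^T V^{-1} b over those coefficients, and refers W to a chi-square distribution with as many degrees of freedom as dropped terms.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(h))                 # df = 1 term
    for i in range(1, (df - 1) // 2 + 1):           # raise df by 2 each step
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def wald_nested(coef, cov, full_terms, reduced_terms):
    """Wald test that the terms dropped by the reduced model are all zero."""
    if not set(reduced_terms) < set(full_terms):
        raise ValueError("reduced model must be strictly nested in the full model")
    idx = [i for i, t in enumerate(full_terms) if t not in reduced_terms]
    b = [coef[i] for i in idx]
    V = [[cov[i][j] for j in idx] for i in idx]
    stat = sum(bi * vi for bi, vi in zip(b, solve(V, b)))   # b^T V^{-1} b
    return stat, len(idx), chi2_sf(stat, len(idx))


# Made-up full-model output, for illustration only.
terms = ["(Intercept)", "x1", "x2", "x3"]
coef = [0.5, 2.0, 0.1, -0.3]
cov = [[0.04, 0.0, 0.0, 0.0],
       [0.0, 0.01, 0.0, 0.0],
       [0.0, 0.0, 0.04, 0.01],
       [0.0, 0.0, 0.01, 0.09]]
stat, df, p = wald_nested(coef, cov, terms, ["(Intercept)", "x1"])
print(f"Test.Stat={stat:.4f}  Df={df}  Pr(>chisq)={p:.4f}")
```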

Estimator class

class pylmrob.LmRob(control=None)

Bases: _LmRobBaseSklearn

scikit-learn-style estimator wrapper around lmrob().

Inherits BaseEstimator + RegressorMixin when scikit-learn is installed. Drops to a bare class otherwise so non-sklearn callers don’t pay the import cost.

Parameters: control (Control | None)
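The conditional-inheritance pattern described above can be sketched as follows. The class names are hypothetical stand-ins for LmRob and _LmRobBaseSklearn; the only real API used is sklearn.base.BaseEstimator and RegressorMixin, listed mixin-first as scikit-learn recommends.

```python
try:
    from sklearn.base import BaseEstimator, RegressorMixin

    # sklearn convention: mixins to the left of BaseEstimator
    _BASES = (RegressorMixin, BaseEstimator)
    HAVE_SKLEARN = True
except ImportError:
    _BASES = (object,)          # no scikit-learn: plain class, nothing imported
    HAVE_SKLEARN = False


class _ToyBaseSklearn(*_BASES):
    """Hypothetical stand-in for a conditionally derived base class."""


class ToyLmRob(_ToyBaseSklearn):
    def __init__(self, control=None):
        # Store constructor arguments unchanged, as sklearn's get_params expects.
        self.control = control


est = ToyLmRob()
print(HAVE_SKLEARN, hasattr(est, "get_params"))
```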

predict(X)

Predict on a new design matrix (raw, without intercept column).

LmRob always fits with an intercept, so we wrap X in a DataFrame with the same column names used at fit time and let the stored formula spec re-add the intercept.

Return type: ndarray
Parameters: X (ndarray)

score(X, y)

Standard R^2 on a test set, sklearn convention.

1 - SS_res / SS_tot, where SS_res = sum((y - y_hat)^2) and SS_tot = sum((y - mean(y))^2). This is the OLS coefficient of determination, not the robust R^2 reported by summary() (use self.result_.summary().r_squared for that). Returning OLS R^2 keeps LmRob compatible with sklearn utilities (cross_val_score, GridSearchCV) that assume the standard regressor scoring contract.

Return type:

float

Parameters:

X (ndarray)
y (ndarray)

Result objects

class pylmrob.results.LmRobResults(coef_, scale_, weights_, rweights_, residuals_, fitted_, cov_, df_residual_, converged_, n_iter_, nobs_, term_names_, control, init_=<factory>, rhs_spec_=None, design_x_=None, design_y_=None)

Bases: object

Output of an lmrob fit.

Attributes mirror R’s lmrob object where practical.

Parameters:

coef_ (np.ndarray)
scale_ (float)
weights_ (np.ndarray)
rweights_ (np.ndarray)
residuals_ (np.ndarray)
fitted_ (np.ndarray)
cov_ (np.ndarray)
df_residual_ (int)
converged_ (bool)
n_iter_ (int)
nobs_ (int)
term_names_ (list[str])
control (Control)
init_ (dict[str, object])
rhs_spec_ (object | None)
design_x_ (np.ndarray | None)
design_y_ (np.ndarray | None)

confint(level=0.95, method='wald', *, n_boot=1000, seed=None, n_workers=1, kind='percentile')

Confidence intervals for the regression coefficients.

Two methods:

"wald" (default): asymptotic normal CIs from the sandwich covariance, coef_ ± z * se, where z is the standard-normal quantile at (1 + level) / 2.
"bootstrap": case-resampling bootstrap; runs bootstrap() internally and returns CIs of the requested kind ("percentile" or "basic").

Parameters:

level (float) – Coverage level, e.g. 0.95.
method (str) – "wald" (default) or "bootstrap".
n_boot (int) – Forwarded to bootstrap() when method="bootstrap".
seed (int | None) – Forwarded to bootstrap() when method="bootstrap".
n_workers (int) – Forwarded to bootstrap() when method="bootstrap".
kind (str) – Bootstrap CI kind: "percentile" or "basic". Ignored for method="wald".

Return type:

ndarray
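The two interval recipes can be sketched in plain Python, assuming the textbook definitions: Wald intervals use the normal quantile, percentile intervals take bootstrap quantiles directly, and basic intervals reflect those quantiles around the point estimate. The helper names are hypothetical, and the linear-interpolation quantile (R's default type 7) may differ from the rule pylmrob applies.

```python
from statistics import NormalDist


def wald_ci(coef, se, level=0.95):
    """coef ± z * se with z the standard-normal quantile at (1 + level) / 2."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return [(b - z * s, b + z * s) for b, s in zip(coef, se)]


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted values (R's type 7)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_ci(theta_hat, replicates, level=0.95, kind="percentile"):
    """Percentile or basic bootstrap CI for a single coefficient."""
    reps = sorted(replicates)
    alpha = 1.0 - level
    q_lo = quantile(reps, alpha / 2.0)
    q_hi = quantile(reps, 1.0 - alpha / 2.0)
    if kind == "percentile":
        return (q_lo, q_hi)
    if kind == "basic":
        # Reflect the percentile endpoints around the point estimate.
        return (2.0 * theta_hat - q_hi, 2.0 * theta_hat - q_lo)
    raise ValueError(f"unknown kind {kind!r}")


print(wald_ci([1.0, -0.2], [0.5, 0.1]))
print(bootstrap_ci(40.0, [float(i) for i in range(101)], kind="basic"))
```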

conf_int(alpha=0.05)

statsmodels spelling of confint(); equivalent to confint(level=1 - alpha).

Return type: ndarray
Parameters: alpha (float)

predict(new_data, *, interval='none', level=0.95)

Predict on new data, optionally with confidence/prediction bands.

Accepts either:

a pandas DataFrame with the columns referenced by the original formula. The fit’s stored formulaic ModelSpec re-applies any factor encoding, interactions, I(x**2) transforms, etc.
a 2-D NumPy array already shaped (n, p), matching the original design (intercept column included if the formula had one).

Parameters:

interval (str) – "none" (default) returns the point predictions, shape (n,). "confidence" returns an (n, 3) array with columns (fit, lwr, upr) giving the confidence interval for the mean response at each new observation (Var = x^T cov x for a new design row x). "prediction" returns the same (n, 3) layout with the prediction interval for a single new observation (Var = sigma^2 + x^T cov x).
level (float) – Confidence level for the interval. Default 0.95. Bands use the t-distribution with df_residual_ degrees of freedom (mirroring the R predict.lm / predict.lmrob convention).
new_data (object)

Return type:

ndarray
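Given the per-row standard errors, the bands are fit - t * se and fit + t * se, with t the (1 + level) / 2 quantile of Student's t on df_residual_ degrees of freedom. The standard library has no t quantile, so this plain-Python sketch inverts a numerically integrated t CDF by bisection; scipy.stats.t.ppf would normally do that job. Function names are hypothetical and the inputs are made up.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, n=2000):
    """P(T <= t): Simpson's rule on [0, |t|] (n even) plus symmetry about 0."""
    a = abs(t)
    h = a / n
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = s * h / 3
    return 0.5 + half if t >= 0 else 0.5 - half


def t_quantile(p, df, lo=-1e3, hi=1e3, iters=80):
    """Invert t_cdf by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bands(fit, se, df_residual, level=0.95):
    """Rows (fit, lwr, upr) = fit -/+ t_{df, (1 + level)/2} * se."""
    q = t_quantile((1 + level) / 2, df_residual)
    return [(f, f - q * s, f + q * s) for f, s in zip(fit, se)]


# se would be sqrt(x^T cov x) for "confidence" or sqrt(sigma^2 + x^T cov x)
# for "prediction"; the numbers here are made up.
print(bands([3.0, 4.5], [0.2, 0.3], df_residual=10))
```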

predict_std(new_data, *, kind='confidence')

Standard deviation of the prediction at each new observation.

Returns sqrt(Var(X^T beta_hat)) (kind="confidence", default) or sqrt(sigma^2 + Var(X^T beta_hat)) (kind="prediction"). Use this when you want the raw SE and intend to build your own intervals (e.g. with a non-Gaussian distribution or for a Bayesian update); predict() with interval="confidence" already returns t-quantile-scaled bands.

Return type:

ndarray

Parameters:

new_data (object)
kind (str)
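The two standard errors differ only by the residual-variance term. A plain-Python sketch, assuming sigma is the fit's residual scale (scale_) and that each new row already includes the intercept column; the function names are hypothetical and the covariance is made up.

```python
import math


def quad_form(x, cov):
    """x^T cov x for a single design row x."""
    p = len(x)
    return sum(x[i] * cov[i][j] * x[j] for i in range(p) for j in range(p))


def predict_std(X_new, cov, sigma, kind="confidence"):
    """Per-row standard error of the fitted mean or of a new observation."""
    if kind not in ("confidence", "prediction"):
        raise ValueError(f"unknown kind {kind!r}")
    out = []
    for x in X_new:
        v = quad_form(x, cov)          # Var(x^T beta_hat)
        if kind == "prediction":
            v += sigma ** 2            # add the residual variance
        out.append(math.sqrt(v))
    return out


cov = [[0.04, -0.01], [-0.01, 0.01]]
X_new = [[1.0, 0.0], [1.0, 3.0]]       # rows include the intercept column
print(predict_std(X_new, cov, sigma=0.5))
print(predict_std(X_new, cov, sigma=0.5, kind="prediction"))
```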

diagnostics(outlier_threshold=2.5)

Per-observation diagnostic statistics.

Returns a pylmrob.diagnostics.DiagnosticsTable with leverage, robust Cook’s distance, standardized residuals, the robust weights, and a boolean outlier flag (|std_residuals| > outlier_threshold).

Requires the fit to have a stashed design matrix (design_x_); the default lmrob() call always stashes it.

Return type: object
Parameters: outlier_threshold (float)
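A sketch of the outlier flag. The reference does not say how residuals are standardized, so this assumes residual / scale with the robust scale estimate; leverage and Cook's distance are left out. The function name is hypothetical.

```python
def outlier_flags(residuals, scale, outlier_threshold=2.5):
    """Return (std_residuals, flags) where flags = |std_residual| > threshold."""
    std = [r / scale for r in residuals]          # assumed: residual / robust scale
    return std, [abs(s) > outlier_threshold for s in std]


std, flags = outlier_flags([0.3, -1.4, 4.1, -2.6], scale=0.5)
print(flags)  # [False, True, True, True]
```

The comparison is strict, so a standardized residual of exactly 2.5 is not flagged at the default threshold.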

bootstrap(n_boot=1000, level=0.95, seed=None, n_workers=1)

Method-style spelling of pylmrob.bootstrap().

Equivalent to pylmrob.bootstrap(self, n_boot=..., ...); matches the fit.anova() / fit.diagnostics() style.

Return type:

Any

Parameters:

n_boot (int)
level (float)
seed (int | Generator | None)
n_workers (int)
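The case-resampling bootstrap behind bootstrap() and confint(method="bootstrap") redraws whole observations with replacement and refits on each resample. A plain-Python sketch: random.Random stands in for the package's NumPy Generator, so the draws will not match pylmrob's, and a cheap OLS slope stands in for the lmrob refit. Names are hypothetical.

```python
import random


def case_bootstrap(x, y, estimator, n_boot=1000, seed=None):
    """Case-resampling bootstrap: redraw (x_i, y_i) pairs with replacement."""
    rng = random.Random(seed)           # seeded for reproducible replicates
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(estimator([x[i] for i in idx], [y[i] for i in idx]))
    return reps


def ols_slope(x, y):
    """Stand-in estimator; a robust bootstrap would refit lmrob per replicate."""
    xb = sum(x) / len(x)
    yb = sum(y) / len(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    if sxx == 0:
        return float("nan")             # degenerate resample: every x identical
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx


x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]        # exact line, so each replicate slope is 2
reps = case_bootstrap(x, y, ols_slope, n_boot=200, seed=1)
print(len(reps), min(reps), max(reps))
```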

anova(*others, test='Wald')

Method-style spelling of pylmrob.anova().

Equivalent to pylmrob.anova(self, *others, test=test); lets you write full.anova(reduced) instead of the free-function form, matching R’s idiom.

Return type:

object

Parameters:

others (LmRobResults)
test (str)

summary(style='r', detail='brief')

Return a SummaryLmRob matching R’s summary.lmrob.

Parameters:

style (str) – "r" (default): R-style summary.lmrob output, matching robustbase line-for-line where practical. "statsmodels": a fixed-width table matching the statsmodels.iolib.summary.Summary layout. Use this when piping pylmrob fits into statsmodels-shaped reporting code.
detail (str) – "brief" (default) emits the standard summary. "full" appends a footer with init method, init scale, MM iter count, and engine settings (engine_c, rng). Use this when debugging convergence or unexpected results.
The returned object stringifies to the chosen style via str() or print(); calling its render method overrides the stored choice without rebuilding the object.

Return type:

SummaryLmRob
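The store-then-override behaviour of the returned summary object can be sketched with a hypothetical toy class. Only the R-style line echoes R's wording; everything else is placeholder text, not pylmrob's layout.

```python
class ToySummary:
    """Hypothetical stand-in: stores a style, renders any style on request."""

    def __init__(self, scale, style="r", detail="brief"):
        self.scale = scale
        self.style = style      # choice used by str() / print()
        self.detail = detail

    def render(self, *, style="r", detail="brief"):
        if style == "r":
            text = f"Robust residual standard error: {self.scale:.4g}"
        elif style == "statsmodels":
            text = f"{'Scale:':<20}{self.scale:>12.4g}"
        else:
            raise ValueError(f"unknown style {style!r}")
        if detail == "full":
            text += "\n(init / engine footer omitted in this sketch)"
        return text

    def __str__(self):
        # Delegate to render() with the stored choices; nothing is rebuilt.
        return self.render(style=self.style, detail=self.detail)


s = ToySummary(1.234, style="statsmodels")
print(s)                    # stored style
print(s.render(style="r"))  # one-off override; s.style is unchanged
```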

class pylmrob.summary.SummaryLmRob(coefficients, term_names, scale, r_squared, adj_r_squared, df_residual, nobs, residuals, rweights, cov, converged, n_iter, control, has_intercept, style='r', detail='brief', init_info=None, engine_c=None, rng=None)

Bases: object

summary.lmrob analogue. Stringifies to an R-style printout by default; see the style parameter.

Parameters:

coefficients (np.ndarray)
term_names (list[str])
scale (float)
r_squared (float)
adj_r_squared (float)
df_residual (int)
nobs (int)
residuals (np.ndarray)
rweights (np.ndarray)
cov (np.ndarray)
converged (bool)
n_iter (int)
control (Control)
has_intercept (bool)
style (str)
detail (str)
init_info (dict[str, object] | None)
engine_c (bool | None)
rng (str | None)

render(*, style='r', detail='brief')

Render the summary table.

Parameters:

style (str) – "r" (default): R-style summary.lmrob output. "statsmodels": a fixed-width table matching the statsmodels.iolib.summary.Summary layout for users who pipe pylmrob fits into statsmodels-shaped reporting code.
detail (str) – "brief" (default) emits the standard summary. "full" appends a footer with the init method, init scale, MM iter count, and engine settings (engine_c, rng). Use this when debugging convergence.

Return type:

str

class pylmrob.anova.AnovaTable(table, term_lists, test, method)

Bases: object

Robust test table (Wald or Deviance) comparing nested lmrob fits.

Columns mirror R’s anova.lmrob output: pseudoDf, Test.Stat, Df, Pr(>chisq).

Parameters:

table (ndarray)
term_lists (list[list[str]])
test (str)
method (str)

Control parameters

class pylmrob.Control(setting=None, psi=None, tuning_chi=None, tuning_psi=None, init='S', method=None, nResample=500, max_it=50, k_max=200, refine_tol=1e-07, rel_tol=1e-07, solve_tol=1e-07, scale_tol=1e-10, zero_tol=1e-10, best_r_s=2, k_fast_s=1, k_m_s=20, mts=1000, subsampling='nonsingular', cov=None, eps_outlier=None, eps_x=None, seed=None, trace_lev=0, n_workers=1, fast_rng=False, rng='PCG64', engine_c=True, bb=0.5, extra=<factory>)

Bases: object

Parameters controlling an lmrob fit.

Defaults follow R’s lmrob.control(setting="KS2014").

Parameters:

setting (Literal['KS2011', 'KS2014', 'MM'] | None)
psi (Literal['bisquare', 'huber', 'hampel', 'optimal', 'ggw', 'lqq', 'welsh'] | None)
tuning_chi (float | tuple[float, ...] | None)
tuning_psi (float | tuple[float, ...] | None)
init (Literal['auto', 'S', 'M-S', 'L1'])
method (str | None)
nResample (int)
max_it (int)
k_max (int)
refine_tol (float)
rel_tol (float)
solve_tol (float)
scale_tol (float)
zero_tol (float)
best_r_s (int)
k_fast_s (int)
k_m_s (int)
mts (int)
subsampling (Literal['nonsingular', 'simple'])
cov (str | None)
eps_outlier (float | None)
eps_x (float | None)
seed (int | None)
trace_lev (int)
n_workers (int)
fast_rng (bool)
rng (Literal['PCG64', 'MT19937', 'R'])
engine_c (bool)
bb (float)
extra (dict[str, object])

get_params(deep=True)

Return Control’s public fields as a sklearn-style dict.

Return type: dict[str, object]
Parameters: deep (bool)

set_params(**params)

Set Control’s public fields in place, sklearn convention.

Return type: Control
Parameters: params (object)
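A minimal sketch of sklearn-style get_params/set_params on a dataclass, assuming Control is a dataclass (the <factory> default shown for extra suggests so). ToyControl is hypothetical and carries only three of Control's fields, with defaults copied from the signature above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyControl:
    """Hypothetical three-field stand-in for Control."""

    nResample: int = 500
    max_it: int = 50
    seed: Optional[int] = None

    def get_params(self, deep=True):
        # The toy has no nested estimators, so `deep` changes nothing here.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def set_params(self, **params):
        valid = {f.name for f in fields(self)}
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"invalid parameter {key!r} for ToyControl")
            setattr(self, key, value)
        return self                     # sklearn convention: return self


ctl = ToyControl()
print(ctl.get_params())
print(ctl.set_params(max_it=200, seed=42))
```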

classmethod preset(setting, **overrides)

Build a Control for a named preset.

Settings:

"KS2014": psi="bisquare" (matches robustbase 0.99-7 default).
"KS2011": same psi family, with a KS2011-specific cov estimator.
"MM": legacy MM defaults (psi="bisquare").

Return type:

Control

Parameters:

setting (Literal['KS2011', 'KS2014', 'MM'])
overrides (object)
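preset() can be sketched as a lookup table merged with keyword overrides, with overrides winning on conflicts. ToyControl and _PRESETS are hypothetical; only the psi values named in the list above are filled in, and the KS2011 cov difference is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset table; only psi values named in this reference appear.
_PRESETS = {
    "KS2014": {"psi": "bisquare"},
    "KS2011": {"psi": "bisquare"},
    "MM": {"psi": "bisquare"},
}


@dataclass
class ToyControl:
    setting: Optional[str] = None
    psi: Optional[str] = None
    max_it: int = 50

    @classmethod
    def preset(cls, setting, **overrides):
        if setting not in _PRESETS:
            raise ValueError(f"unknown setting {setting!r}")
        # Preset values first; keyword overrides win on conflicts.
        return cls(setting=setting, **{**_PRESETS[setting], **overrides})


print(ToyControl.preset("KS2014", max_it=100))
```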

Psi family kernels

Public interface to psi/chi/weight functions.

The numerical kernels live in pylmrob._psifuns (NumPy reference) and will be mirrored by pylmrob._core._psi (Cython) in a future performance pass.

R cross-reference:

psi.psi(x, family, k) <-> robustbase::Mpsi(x, k, family)
psi.rho(x, family, k) <-> robustbase::Mchi(x, k, family) (for the chi-shaped families used in lmrob; identical here)
psi.psi_prime(x, family, k) <-> robustbase::Mpsi(x, k, family, deriv=1)
psi.wgt(x, family, k) <-> robustbase::Mwgt(x, k, family)
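For orientation, here is the bisquare member of the family in plain Python, using the standard Tukey bisquare formulas. The names echo the cross-reference list, but these single-family signatures are a simplification, not pylmrob's API. rho is scaled to reach 1 outside [-k, k], which makes its derivative (6 / k^2) * psi rather than psi itself.

```python
def _u(x, k):
    return (x / k) ** 2


def psi(x, k):
    """Bisquare psi: x * (1 - (x/k)^2)^2 on [-k, k], 0 outside."""
    if abs(x) > k:
        return 0.0
    return x * (1.0 - _u(x, k)) ** 2


def psi_prime(x, k):
    """Derivative of psi: (1 - u) * (1 - 5u) with u = (x/k)^2, 0 outside."""
    if abs(x) > k:
        return 0.0
    u = _u(x, k)
    return (1.0 - u) * (1.0 - 5.0 * u)


def wgt(x, k):
    """psi(x) / x = (1 - u)^2, equal to 1 at x = 0 by continuity."""
    if abs(x) > k:
        return 0.0
    return (1.0 - _u(x, k)) ** 2


def rho(x, k):
    """Chi-style rho = 1 - (1 - u)^3, capped at 1; its derivative is (6 / k^2) * psi."""
    if abs(x) >= k:
        return 1.0
    return 1.0 - (1.0 - _u(x, k)) ** 3


print([round(wgt(x, 4.685), 4) for x in (0.0, 1.0, 3.0, 6.0)])
```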

pylmrob.psi.tuning_for_breakdown(family, breakdown=0.5)

Return tuning constants giving the requested breakdown (chi side).

Return type:

tuple[float, ...]

Parameters:

family (str)
breakdown (float)
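For the bisquare family, the chi-side constant for a given breakdown is usually found from the S-estimator consistency condition E[rho(Z)] = breakdown, with Z standard normal and rho scaled to a maximum of 1. The sketch below assumes that definition and solves it by bisection; names are hypothetical. For breakdown=0.5 it should land near the familiar 1.548.

```python
from statistics import NormalDist

_PHI = NormalDist()  # standard normal


def rho_bisquare(z, k):
    """Bisquare chi scaled so that rho = 1 for |z| >= k."""
    if abs(z) >= k:
        return 1.0
    u = (z / k) ** 2
    return 1.0 - (1.0 - u) ** 3


def expected_rho(k, n=2000):
    """E[rho(Z)] for Z ~ N(0, 1): Simpson's rule on [-k, k] plus both tails."""
    h = 2.0 * k / n
    s = 0.0
    for i in range(n + 1):
        z = -k + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        s += weight * rho_bisquare(z, k) * _PHI.pdf(z)
    tails = 2.0 * (1.0 - _PHI.cdf(k))   # rho = 1 outside [-k, k]
    return s * h / 3.0 + tails


def bisquare_k_for_breakdown(breakdown=0.5, lo=0.1, hi=20.0, iters=60):
    """Bisection on k: E[rho(Z)] shrinks as k grows."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_rho(mid) > breakdown:
            lo = mid            # rho too large on average: widen the kernel
        else:
            hi = mid
    return (lo + hi) / 2.0


k_chi = bisquare_k_for_breakdown(0.5)
print(round(k_chi, 4))
```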

pylmrob.psi.tuning_for_efficiency(family, efficiency=0.95)

Return tuning constants giving the requested asymptotic efficiency.

Currently only the canonical 95%-efficiency tuning that R uses by default is supported; other efficiency targets are planned alongside the full Control machinery.

Return type:

tuple[float, ...]

Parameters:

family (str)
efficiency (float)
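On the psi side, the Gaussian efficiency of an M-estimator is (E[psi'(Z)])^2 / E[psi(Z)^2]. This plain-Python sketch, with hypothetical names, solves that for the bisquare k by bisection; it should recover the 4.685 that corresponds to R's default 95% efficiency. It also accepts other targets, which the reference says pylmrob does not yet expose.

```python
from statistics import NormalDist

_PHI = NormalDist()  # standard normal


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def bisquare_efficiency(k):
    """Gaussian efficiency (E[psi'(Z)])^2 / E[psi(Z)^2] of the bisquare psi."""
    def dpsi_pdf(z):
        u = (z / k) ** 2
        return (1.0 - u) * (1.0 - 5.0 * u) * _PHI.pdf(z)

    def psi_sq_pdf(z):
        u = (z / k) ** 2
        return (z * (1.0 - u) ** 2) ** 2 * _PHI.pdf(z)

    # psi and psi' vanish outside [-k, k], so the integrals stop there.
    return _simpson(dpsi_pdf, -k, k) ** 2 / _simpson(psi_sq_pdf, -k, k)


def bisquare_k_for_efficiency(efficiency=0.95, lo=2.0, hi=10.0, iters=40):
    """Bisection on k: over this bracket, efficiency rises with k."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bisquare_efficiency(mid) < efficiency:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k_psi = bisquare_k_for_efficiency(0.95)
print(round(k_psi, 3))
```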