API reference
Public surface of pylmrob.
Top-level functions
- pylmrob.lmrob(formula, data, control=None, weights=None, na_action='drop', seed=None, **kwargs)
Fit a robust MM linear regression.
- Parameters:
formula (str) – R-style formula, e.g. "y ~ x1 + x2 + x3". Parsed by formulaic.
data (DataFrame) – DataFrame containing the columns referenced by formula.
control (Control | None) – Algorithm parameters; defaults to Control() (KS2014 preset, engine_c=True).
weights (ndarray | None) – Optional non-negative per-case weights (length len(data)). Implemented via the sqrt(w)-transform that R’s lmrob uses: the transformed design (sqrt(w)*X, sqrt(w)*y) goes through the unweighted fit. Zero-weight rows are dropped. Works with both the default Cython engine and the legacy NumPy path: the transform is applied before any path dispatch, so the Cython kernel never needs to handle weights itself.
na_action (str) – "drop" (default) drops rows with any NA before fitting.
seed (int | Generator | None) – Seed for the resampling RNG.
kwargs (Any)
- Return type: LmRobResults
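The sqrt(w)-transform can be checked with ordinary least squares standing in for the robust fit: minimizing sum(w_i * r_i^2) over the original rows gives the same coefficients as an unweighted fit on rows scaled by sqrt(w_i). This is an illustrative sketch only; sqrt_w_transform is a hypothetical helper, and np.linalg.lstsq replaces lmrob's MM fit purely so the two sides can be compared exactly.

```python
import numpy as np


def sqrt_w_transform(X, y, w):
    """Hypothetical helper: scale each row by sqrt(w_i) and drop zero-weight rows."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    keep = w > 0                      # zero-weight rows carry no information
    sw = np.sqrt(w[keep])
    return X[keep] * sw[:, None], y[keep] * sw


rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)
w = rng.uniform(0.0, 2.0, size=n)
w[:3] = 0.0                                             # three zero-weight cases

# Unweighted least squares on the transformed design (stand-in for the robust fit)...
Xt, yt = sqrt_w_transform(X, y, w)
beta_transformed = np.linalg.lstsq(Xt, yt, rcond=None)[0]

# ...equals weighted least squares from the normal equations X^T W X b = X^T W y.
beta_weighted = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Because the transform happens before the fit sees the data, any unweighted solver (here lstsq, in pylmrob the Cython or NumPy MM path) inherits weight support for free.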
- pylmrob.anova(*fits, test='Wald')
Robust nested-model anova on a sequence of LmRobResults. The first argument is the largest (full) model; subsequent arguments are progressively reduced models. Each adjacent pair must be strictly nested, as judged by term names.
- Parameters:
fits (LmRobResults) – Two or more LmRobResults, ordered from largest to smallest.
test (str) – "Wald" (default) or "Deviance". The Deviance test refits the reduced model via M-iteration at the full model’s scale; it requires the full model’s method to end with "M".
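For test="Wald", the textbook construction compares a full model with a nested reduced one through the quadratic form W = b_d^T V_dd^{-1} b_d, where b_d are the full-model coefficients that the reduced model drops and V_dd is the matching block of the full model's covariance; W is referred to a chi-squared distribution with as many degrees of freedom as dropped coefficients. The sketch below computes that statistic from term names. It illustrates the generic Wald test and is an assumption about, not a copy of, pylmrob's implementation; wald_statistic is a hypothetical helper.

```python
import numpy as np


def wald_statistic(coef, cov, full_terms, reduced_terms):
    """Hypothetical helper: Wald statistic for dropping the terms absent from the reduced model."""
    if not set(reduced_terms) < set(full_terms):
        raise ValueError("reduced model must be strictly nested in the full model")
    dropped = [i for i, t in enumerate(full_terms) if t not in reduced_terms]
    b = np.asarray(coef, dtype=float)[dropped]
    V = np.asarray(cov, dtype=float)[np.ix_(dropped, dropped)]
    stat = float(b @ np.linalg.solve(V, b))  # b^T V^{-1} b without forming the inverse
    return stat, len(dropped)


full_terms = ["(Intercept)", "x1", "x2", "x3"]
coef = np.array([1.0, 2.0, 0.3, -0.4])
cov = np.diag([0.04, 0.01, 0.09, 0.16])
stat, df = wald_statistic(coef, cov, full_terms, ["(Intercept)", "x1"])
```

With SciPy installed, scipy.stats.chi2.sf(stat, df) turns the statistic into a p-value.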
Estimator class
- class pylmrob.LmRob(control=None)
Bases: _LmRobBaseSklearn
scikit-learn-style estimator wrapper around lmrob(). Inherits BaseEstimator + RegressorMixin when scikit-learn is installed; otherwise it falls back to a bare class, so non-sklearn callers don’t pay the import cost.
- Parameters:
control (Control | None)
- predict(X)
Predict on a new design matrix (raw, without an intercept column).
LmRob always fits with an intercept, so we wrap X in a DataFrame with the same column names used at fit time and let the stored formula spec re-add the intercept.
- score(X, y)
Standard R^2 on a test set, following the sklearn convention: 1 - SS_res / SS_tot, where SS_res = sum((y - y_hat)^2) and SS_tot = sum((y - mean(y))^2). This is the OLS coefficient of determination, not the robust R^2 reported by summary() (use self.result_.summary().r_squared for that). Returning OLS R^2 keeps LmRob compatible with sklearn utilities (cross_val_score, GridSearchCV) that assume the regressor scorer contract.
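The score formula in plain NumPy, as a sketch of the convention rather than LmRob's code; ols_r2 is a hypothetical name. Predicting the mean of y everywhere scores exactly 0, and worse predictions go negative.

```python
import numpy as np


def ols_r2(y, y_hat):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot (classical, not robust)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


perfect = ols_r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = ols_r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

sklearn.metrics.r2_score(y, y_hat) computes the same quantity for a single target.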
Result objects
- class pylmrob.results.LmRobResults(coef_, scale_, weights_, rweights_, residuals_, fitted_, cov_, df_residual_, converged_, n_iter_, nobs_, term_names_, control, init_=<factory>, rhs_spec_=None, design_x_=None, design_y_=None)
Bases: object
Output of an lmrob fit. Attributes mirror R’s lmrob object where practical.
- Parameters:
coef_ (np.ndarray)
scale_ (float)
weights_ (np.ndarray)
rweights_ (np.ndarray)
residuals_ (np.ndarray)
fitted_ (np.ndarray)
cov_ (np.ndarray)
df_residual_ (int)
converged_ (bool)
n_iter_ (int)
nobs_ (int)
control (Control)
rhs_spec_ (object | None)
design_x_ (np.ndarray | None)
design_y_ (np.ndarray | None)
- confint(level=0.95, method='wald', *, n_boot=1000, seed=None, n_workers=1, kind='percentile')
Confidence intervals for the regression coefficients.
Two methods:
"wald" (default): asymptotic normal CIs from the sandwich covariance, coef ± z * se, where z is the standard-normal quantile at (1 + level) / 2.
"bootstrap": case-resampling bootstrap; runs bootstrap() internally and returns CIs of the requested kind ("percentile" or "basic").
- Parameters:
level (float) – Coverage level, e.g. 0.95.
method (str) – "wald" (default) or "bootstrap".
n_boot (int) – Forwarded to bootstrap() when method="bootstrap".
seed (int | None) – Forwarded to bootstrap() when method="bootstrap".
n_workers (int) – Forwarded to bootstrap() when method="bootstrap".
kind (str) – Bootstrap CI kind: "percentile" or "basic". Ignored for method="wald".
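The two interval recipes in plain form: a Wald interval coef ± z * se with z taken from the standard normal at (1 + level) / 2, and percentile and basic intervals computed from an (n_boot, p) array of bootstrap replicates. This sketches the textbook definitions under the assumption of NumPy's default linear quantile interpolation; wald_ci and bootstrap_ci are hypothetical helpers, and pylmrob's own quantile rule may differ.

```python
from statistics import NormalDist

import numpy as np


def wald_ci(coef, se, level=0.95):
    """coef +/- z * se, with z the standard-normal quantile at (1 + level) / 2."""
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    coef = np.asarray(coef, dtype=float)
    se = np.asarray(se, dtype=float)
    return np.column_stack([coef - z * se, coef + z * se])


def bootstrap_ci(estimate, replicates, level=0.95, kind="percentile"):
    """Percentile or basic CIs from an (n_boot, p) array of bootstrap replicates."""
    alpha = 1.0 - level
    lo_q, hi_q = np.quantile(replicates, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    if kind == "percentile":
        return np.column_stack([lo_q, hi_q])
    if kind == "basic":
        # reflect the replicate quantiles around the point estimate
        estimate = np.asarray(estimate, dtype=float)
        return np.column_stack([2.0 * estimate - hi_q, 2.0 * estimate - lo_q])
    raise ValueError(f"unknown kind {kind!r}")
```

The basic interval corrects for bias in the replicate distribution by reflecting it around the point estimate; when the replicates are symmetric about the estimate, the two kinds coincide.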
- predict(new_data, *, interval='none', level=0.95)
Predict on new data, optionally with confidence/prediction bands.
Accepts either:
a pandas DataFrame with the columns referenced by the original formula (the fit’s stored formulaic ModelSpec re-applies any factor encoding, interactions, I(x**2) transforms, etc.), or
a 2-D NumPy array already shaped (n, p), matching the original design (intercept column included if the formula had one).
- Parameters:
new_data (object) – A DataFrame or 2-D array, as described above.
interval (str) – "none" (default) returns the point predictions, shape (n,). "confidence" returns shape (n, 3) with columns (fit, lwr, upr), giving the confidence interval for the mean response at each new observation (Var = x^T cov x for each new row x). "prediction" returns shape (n, 3) with the prediction interval for a single new observation (Var = sigma^2 + x^T cov x).
level (float) – Confidence level for the interval. Default 0.95.
Bands use the t-distribution with df_residual_ degrees of freedom (mirroring the R predict.lm / predict.lmrob convention).
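The band arithmetic for interval="confidence" and interval="prediction" can be sketched as below. The t quantile is passed in rather than computed so the block needs only NumPy; with SciPy, scipy.stats.t.ppf((1 + level) / 2, df_residual) would supply it. prediction_bands is a hypothetical helper, and X_new is assumed to already include the intercept column, as in the NumPy-array input path.

```python
import numpy as np


def prediction_bands(X_new, coef, cov, sigma, t_quantile, interval="confidence"):
    """Hypothetical helper returning an (n, 3) array of (fit, lwr, upr)."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    fit = X_new @ np.asarray(coef, dtype=float)
    var = np.einsum("ij,jk,ik->i", X_new, cov, X_new)  # row-wise x^T cov x
    if interval == "prediction":
        var = var + sigma ** 2  # add the noise variance of a single new observation
    elif interval != "confidence":
        raise ValueError(f"unknown interval {interval!r}")
    half = t_quantile * np.sqrt(var)
    return np.column_stack([fit, fit - half, fit + half])
```

The prediction band is always at least as wide as the confidence band, because it adds sigma^2 to the same coefficient-uncertainty term.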
- predict_std(new_data, *, kind='confidence')
Standard deviation of the prediction at each new observation.
Returns sqrt(Var(x^T beta_hat)) for each new row x (kind="confidence", the default) or sqrt(sigma^2 + Var(x^T beta_hat)) (kind="prediction"). Use this when you want the raw SE and intend to build your own intervals (e.g. with a non-Gaussian distribution or for a Bayesian update); predict() with interval="confidence" already returns t-quantile-scaled bands.
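One use of the raw SE mentioned above is a Bayesian update. A minimal sketch, assuming the point prediction and its predict_std() value are treated as a Gaussian measurement of the mean response and combined with a normal prior on that mean (a conjugate normal-normal update); normal_update and the prior values are hypothetical, chosen for illustration.

```python
def normal_update(prior_mean, prior_sd, pred_mean, pred_se):
    """Conjugate normal-normal update treating (pred_mean, pred_se) as a Gaussian measurement."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / pred_se ** 2
    post_var = 1.0 / (prior_prec + data_prec)  # precisions add
    # posterior mean is the precision-weighted average of prior and prediction
    post_mean = post_var * (prior_prec * prior_mean + data_prec * pred_mean)
    return post_mean, post_var ** 0.5


post_mean, post_sd = normal_update(prior_mean=0.0, prior_sd=1.0, pred_mean=2.0, pred_se=1.0)
```

With a fitted result, pred_mean and pred_se would come from predict(new_data) and predict_std(new_data); the arithmetic also works elementwise on NumPy arrays.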
- diagnostics(outlier_threshold=2.5)
Per-observation diagnostic statistics.
Returns a pylmrob.diagnostics.DiagnosticsTable with leverage, robust Cook’s distance, standardized residuals, the robust weights, and a boolean outlier flag (|std_residuals| > outlier_threshold).
Requires the fit to have a stashed design matrix (design_x_); the default lmrob() call always stashes it.
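Two of the columns can be sketched directly. The outlier flag follows the stated rule; taking std_residuals as residuals_ / scale_ is an assumption, as is computing leverage as the diagonal of the hat matrix of sqrt(w) * X (pass the robust weights as w, or omit them for classical leverage). leverage and outlier_flags are hypothetical helpers, not the DiagnosticsTable implementation.

```python
import numpy as np


def leverage(X, weights=None):
    """Diagonal of the hat matrix of sqrt(w) * X, via a thin QR decomposition."""
    X = np.asarray(X, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * X)  # reduced QR: Q is (n, p)
    return np.sum(Q ** 2, axis=1)                 # h_ii = ||row i of Q||^2


def outlier_flags(residuals, scale, outlier_threshold=2.5):
    """Standardize by the robust scale and flag |r / scale| > threshold."""
    std_residuals = np.asarray(residuals, dtype=float) / scale
    return std_residuals, np.abs(std_residuals) > outlier_threshold


X = np.column_stack([np.ones(4), np.arange(4.0)])
h = leverage(X)
std_r, flags = outlier_flags([0.1, 5.0, -3.0], scale=1.0)
```

The leverages always sum to the number of columns p, which makes a quick sanity check on any implementation.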
- bootstrap(n_boot=1000, level=0.95, seed=None, n_workers=1)
Method-style spelling of pylmrob.bootstrap().
Equivalent to pylmrob.bootstrap(self, n_boot=..., ...); matches the fit.anova() / fit.diagnostics() style.
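pylmrob.bootstrap() is described under confint() as a case-resampling bootstrap. A minimal sketch of case resampling, with ordinary least squares standing in for the MM refit (an assumption) and without the n_workers parallelism; case_bootstrap and ols_fit are hypothetical names. The replicate array it returns is the input that percentile or basic intervals are computed from.

```python
import numpy as np


def case_bootstrap(X, y, fit, n_boot=1000, seed=None):
    """Resample rows (cases) with replacement and refit; returns an (n_boot, p) array."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        reps[b] = fit(X[idx], y[idx])
    return reps


def ols_fit(X, y):
    """Stand-in for the robust MM refit on each resample."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```

Resampling whole cases keeps each row's x and y together, so the design varies across replicates instead of being treated as fixed.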
- anova(*others, test='Wald')
Method-style spelling of pylmrob.anova().
Equivalent to pylmrob.anova(self, *others, test=test); lets you write full.anova(reduced) instead of the free-function form, matching R’s idiom.
- Parameters:
others (LmRobResults)
test (str)
- summary(style='r', detail='brief')
Return a SummaryLmRob matching R’s summary.lmrob.
- Parameters:
style (str) – "r" (default): R-style summary.lmrob output, matching robustbase line-for-line where practical. "statsmodels": a fixed-width table matching the statsmodels.iolib.summary.Summary layout. Use this when piping pylmrob fits into statsmodels-shaped reporting code.
detail (str) – "brief" (default) emits the standard summary. "full" appends a footer with the init method, init scale, MM iteration count, and engine settings (engine_c, rng). Use this when debugging convergence or unexpected results.
The returned object stringifies to the chosen style via str() or print(); calling its render method overrides the stored choice without rebuilding the object.
- Return type: SummaryLmRob
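The stored-style contract can be illustrated with a toy class: str() uses the style and detail chosen at construction, while render() takes its own arguments and leaves the stored choice untouched. ToySummary and its output format are hypothetical and do not reproduce SummaryLmRob's layout.

```python
class ToySummary:
    """Hypothetical stand-in illustrating the style/detail contract."""

    def __init__(self, coefficients, style="r", detail="brief"):
        self.coefficients = dict(coefficients)
        self.style, self.detail = style, detail  # stored choice used by str()

    def render(self, *, style="r", detail="brief"):
        if style not in ("r", "statsmodels"):
            raise ValueError(f"unknown style {style!r}")
        if detail not in ("brief", "full"):
            raise ValueError(f"unknown detail {detail!r}")
        header = "Coefficients:" if style == "r" else "=" * 30
        rows = [f"{name:<12}{value:>10.4f}" for name, value in self.coefficients.items()]
        footer = ["(init and engine settings would go here)"] if detail == "full" else []
        return "\n".join([header, *rows, *footer])

    def __str__(self):
        # print() and str() honour the stored choice; render() can override it per call
        return self.render(style=self.style, detail=self.detail)
```

Keeping the style on the object means a summary can be passed to generic reporting code that just calls print(), while render() stays available for one-off variants.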
- class pylmrob.summary.SummaryLmRob(coefficients, term_names, scale, r_squared, adj_r_squared, df_residual, nobs, residuals, rweights, cov, converged, n_iter, control, has_intercept, style='r', detail='brief', init_info=None, engine_c=None, rng=None)
Bases: object
summary.lmrob analogue. Stringifies to an R-style printout.
- Parameters:
coefficients (np.ndarray)
scale (float)
r_squared (float)
adj_r_squared (float)
df_residual (int)
nobs (int)
residuals (np.ndarray)
rweights (np.ndarray)
cov (np.ndarray)
converged (bool)
n_iter (int)
control (Control)
has_intercept (bool)
style (str)
detail (str)
engine_c (bool | None)
rng (str | None)
- render(*, style='r', detail='brief')
Render the summary table.
- Parameters:
style (str) – "r" (default): R-style summary.lmrob output. "statsmodels": a fixed-width table matching the statsmodels.iolib.summary.Summary layout, for users who pipe pylmrob fits into statsmodels-shaped reporting code.
detail (str) – "brief" (default) emits the standard summary. "full" appends a footer with the init method, init scale, MM iteration count, and engine settings (engine_c, rng). Use this when debugging convergence.
Control parameters
- class pylmrob.Control(setting=None, psi=None, tuning_chi=None, tuning_psi=None, init='S', method=None, nResample=500, max_it=50, k_max=200, refine_tol=1e-07, rel_tol=1e-07, solve_tol=1e-07, scale_tol=1e-10, zero_tol=1e-10, best_r_s=2, k_fast_s=1, k_m_s=20, mts=1000, subsampling='nonsingular', cov=None, eps_outlier=None, eps_x=None, seed=None, trace_lev=0, n_workers=1, fast_rng=False, rng='PCG64', engine_c=True, bb=0.5, extra=<factory>)
Bases: object
Parameters controlling an lmrob fit. Defaults follow R’s lmrob.control(setting="KS2014").
- Parameters:
setting (Literal['KS2011', 'KS2014', 'MM'] | None)
psi (Literal['bisquare', 'huber', 'hampel', 'optimal', 'ggw', 'lqq', 'welsh'] | None)
init (Literal['auto', 'S', 'M-S', 'L1'])
method (str | None)
nResample (int)
max_it (int)
k_max (int)
refine_tol (float)
rel_tol (float)
solve_tol (float)
scale_tol (float)
zero_tol (float)
best_r_s (int)
k_fast_s (int)
k_m_s (int)
mts (int)
subsampling (Literal['nonsingular', 'simple'])
cov (str | None)
eps_outlier (float | None)
eps_x (float | None)
seed (int | None)
trace_lev (int)
n_workers (int)
fast_rng (bool)
rng (Literal['PCG64', 'MT19937', 'R'])
engine_c (bool)
bb (float)
Psi family kernels
Public interface to psi/chi/weight functions.
The numerical kernels live in pylmrob._psifuns (NumPy reference)
and will be mirrored by pylmrob._core._psi (Cython) in a future
performance pass.
R cross-reference:
psi.psi(x, family, k) <-> robustbase::Mpsi(x, k, family)
psi.rho(x, family, k) <-> robustbase::Mchi(x, k, family) (for the chi-shaped families used in lmrob; identical here)
psi.psi_prime(x, family, k) <-> robustbase::Mpsi(x, k, family, deriv=1)
psi.wgt(x, family, k) <-> robustbase::Mwgt(x, k, family)
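To make the four kernels concrete, here is the bisquare (Tukey biweight) family written out in NumPy. It assumes pylmrob follows robustbase's normalization, in which chi is scaled to a maximum of 1, so psi equals (k^2 / 6) times the derivative of chi rather than chi' itself. The function names are hypothetical and are not pylmrob.psi's signatures.

```python
import numpy as np


def _inside(x, k):
    u = np.asarray(x, dtype=float) / k
    return u, np.abs(u) <= 1.0


def bisquare_psi(x, k):
    u, inside = _inside(x, k)
    return np.where(inside, k * u * (1.0 - u ** 2) ** 2, 0.0)  # x * (1 - (x/k)^2)^2


def bisquare_rho(x, k):
    """Chi normalized to a maximum of 1 (robustbase-style, by assumption)."""
    u, inside = _inside(x, k)
    return np.where(inside, 1.0 - (1.0 - u ** 2) ** 3, 1.0)


def bisquare_psi_prime(x, k):
    u, inside = _inside(x, k)
    return np.where(inside, (1.0 - u ** 2) * (1.0 - 5.0 * u ** 2), 0.0)


def bisquare_wgt(x, k):
    """psi(x) / x, with the limit 1 at x = 0."""
    u, inside = _inside(x, k)
    return np.where(inside, (1.0 - u ** 2) ** 2, 0.0)


k = 4.685
x = np.array([-6.0, -1.0, 0.0, 2.0, 6.0])
weights = bisquare_wgt(x, k)  # cases beyond |x| = k get weight 0
```

The weight function is what the IRLS steps of an M-fit multiply each case by, which is why redescending families such as bisquare can switch gross outliers off entirely.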
- pylmrob.psi.tuning_for_breakdown(family, breakdown=0.5)
Return tuning constants giving the requested breakdown (chi side).
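For a chi normalized to a maximum of 1, an S-estimator of scale attains breakdown point b when E[chi(Z / c)] = b with Z ~ N(0, 1). A minimal sketch that solves this for the bisquare chi by Simpson integration and bisection, using only the standard library. Whether tuning_for_breakdown solves the same equation or returns tabulated constants is an assumption, and the helper names are hypothetical. For b = 0.5 the result should sit near the commonly quoted constant 1.5476.

```python
import math


def _phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def expected_bisquare_chi(c):
    """E[chi(Z / c)] for Z ~ N(0, 1), with chi(u) = 1 - (1 - u^2)^3 for |u| <= 1 and 1 otherwise."""
    inside = _simpson(lambda z: (1.0 - (1.0 - (z / c) ** 2) ** 3) * _phi(z), -c, c)
    tail = math.erfc(c / math.sqrt(2.0))  # P(|Z| > c), where chi equals 1
    return inside + tail


def bisquare_c_for_breakdown(breakdown=0.5, lo=0.1, hi=10.0, tol=1e-10):
    # E[chi(Z / c)] falls as c grows, so raise lo while the expectation is still too large
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_bisquare_chi(mid) > breakdown:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c50 = bisquare_c_for_breakdown(0.5)
```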
- pylmrob.psi.tuning_for_efficiency(family, efficiency=0.95)
Return tuning constants giving the requested asymptotic efficiency.
For Phase 2 we only support the canonical 95%-efficiency tuning that R uses by default. Other efficiency targets will land in Phase 8 alongside the full Control machinery.
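The 95% target can be checked the same way: at the standard normal, an M-estimator's asymptotic efficiency is (E[psi'(Z)])^2 / E[psi(Z)^2]. The sketch below solves for the bisquare tuning constant under that definition. The helper names are hypothetical, and it is an assumption that pylmrob derives its constant this way rather than hard-coding it; the result should land near the familiar bisquare value 4.685.

```python
import math


def _phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def bisquare_efficiency(c):
    """Gaussian efficiency (E[psi'(Z)])^2 / E[psi(Z)^2] of the bisquare psi with tuning c."""
    def psi(z):
        return z * (1.0 - (z / c) ** 2) ** 2

    def dpsi(z):
        u2 = (z / c) ** 2
        return (1.0 - u2) * (1.0 - 5.0 * u2)

    # psi and psi' vanish for |z| > c, so integrating over [-c, c] is enough
    e_dpsi = _simpson(lambda z: dpsi(z) * _phi(z), -c, c)
    e_psi2 = _simpson(lambda z: psi(z) ** 2 * _phi(z), -c, c)
    return e_dpsi ** 2 / e_psi2


def bisquare_c_for_efficiency(efficiency=0.95, lo=1.0, hi=10.0, tol=1e-8):
    # efficiency rises with c, so raise lo while the efficiency is still too low
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bisquare_efficiency(mid) < efficiency:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c95 = bisquare_c_for_efficiency(0.95)
```

Breakdown and efficiency pull the tuning constant in opposite directions (about 1.55 versus about 4.69 for bisquare), which is why MM-estimation uses one constant for the S-step and another for the final M-step.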