A Note on Lending Club and Institutional Investment

Allen Sirolly / September 7, 2017

This post is related to ongoing research with Tzuo Hann Law (Boston College).

One of technology’s roles in modern finance is disrupting intermediation. In the case of household and small business credit, online marketplace lending platforms compete with traditional intermediaries such as banks or online money lenders, operating as streamlined crowdfunding platforms for lending. Investors can purchase debt instruments which represent pieces of an underlying loan and earn returns in the form of interest paid by borrowers, while the platforms earn revenue through various fees.

Lending Club—the largest platform in this “Lendscape”—publishes a rich data set of loan originations, complete with borrowers’ credit profiles, updated loan status, and recent payment information. It is publicly available from Lending Club Statistics and has been the subject of detailed analyses in blog posts, academic papers, and elsewhere. It has even been featured on Kaggle.

A less-advertised fact is that Lending Club also publishes a granular view of the month-by-month payment histories of individual loans, available from Additional Statistics (registered members only). This post leverages the payments data to show how portfolio selection differs for large institutional investors1 relative to small-money investors.

Critically, we obtained additional data from the site that can be used to identify whether each loan was funded through the “whole” or “fractional” program. This allows us to compare aggregate portfolio selection and performance between two classes of investors, institutional (corresponding to whole loan purchasers) and retail (fractional or note purchasers). In particular, there is strong evidence that institutional investors prefer to fund loans whose borrowers will prepay long before the maturity date, which can potentially be explained by structural features of the marketplace.2

The Data

The downloadable files are large—almost 6 GB each—but contain redundancies, as constant loan attributes are recorded for every (loan, payment) row. These “fixed” attributes can be spliced off and saved to a separate file, reducing the total file size to under 4 GB—small enough to fit in memory.3

library(data.table) # using v1.10.5 (dev)

ph_fixed = fread('paymentHistory_fixed.csv', key='loan_id')
ph_master = fread('paymentHistory.csv', key=c('loan_id','mob'))

As the file names suggest, paymentHistory_fixed.csv contains the fixed attributes for each loan id, and paymentHistory.csv contains the breakdown of borrowers’ month-by-month payments. The following variables are available:

In ph_fixed:

In ph_master:

The variable initial_list_status is available in the public data and identifies whether a loan was initially listed in the whole (W) or fractional (F) market. Loans listed “whole” become available for fractional funding (and vice versa) if there are no buyers within a certain time frame. To determine whether a loan was actually funded by a whole loan investor, we use this in conjunction with num_investors which is scraped from individual “loan detail” pages (example screencap below).4 The uptake of whole loans varies significantly over time and across loan grades but is typically 70-80 percent overall.

Sample “loanDetail.action” page which includes field showing number of investors

The following code chunk merges in these additional variables and transforms the data:

additional = fread('additional.csv')
setnames(additional, 'initial_list_status', 'ils')
ph_fixed = additional[ph_fixed, on='loan_id']

# Month of issuance
ph_fixed[, `:=`(issueMon = zoo::as.yearmon(issueddate, '%b%Y'), issueddate = NULL)]

# Number of payment periods elapsed since issuance
ph_fixed[, numPay:= 12*(zoo::as.yearmon('Aug 2017') - issueMon)]

# Correct discrepancies in num_investors (Found evidence that whole loans
# no longer pushed into fractional market after buffer period, starting
# Jun 2016. So every W loan must have 1 investor.)
ph_fixed[ils=='W' & issueMon >= 'Jun 2016', num_investors:= 1]

# Pool grades F and G (small sample size)
ph_fixed[grade %in% c('F','G'), grade:= 'FG']

# Identify investor types using initial_list_status & num_investors
ph_fixed[ils=='W' & num_investors==1, investor_type:= 'W']
ph_fixed[ils=='F' | num_investors>=2, investor_type:= 'F']

# Compute the fraction of funding via notes using sales reports data
ph_fixed[is.na(funded_amt_lc), funded_amt_lc:= 0]
ph_fixed[, funded_amt_inv:= funded_amt - funded_amt_lc]
ph_fixed[, note_wgt:= ifelse(is.na(sold_amt), 0, (sold_amt-funded_amt_lc)/funded_amt_inv)]


One important dimension along which aggregate portfolios might differ is the time to full repayment by borrowers, who must pay the installment at minimum but can prepay if desired. With prepayment, a loan may be fully paid well before the maturity date, and consequently investors will lose future interest payments.

Apropos of this idea, the following code chunk computes the fraction of loans, by grade and investor type, which reached a terminal state—either “fully paid” (fp) or “charged off” (co)—by month \(i\), for \(1 \leq i \leq T\). I restrict the sample to 3-year loans, so \(T = 36\). (There are also 5-year loans which I ignore for simplicity.) The filter numPay >= i ensures that only borrowers who could have possibly made \(i\) payments are counted in the denominator (i.e., excluding loans that were issued fewer than \(i\) months ago).

bigT = 36
w_id = ph_fixed[investor_type=='W' & term==bigT & issueMon >= 'Sep 2012',
                .(loan_id, numPay, grade)]
f_id = ph_fixed[investor_type=='F' & term==bigT & issueMon >= 'Sep 2012',
                .(loan_id, numPay, grade)]

# By grade g and months after issuance mob:
# w_fp, f_fp -- number of whole (fractional) loans fully paid
# w_co, f_co -- number of whole (fractional) loans charged off
terminal = CJ(grade=ph_fixed$grade, mob=1:bigT, unique=TRUE)
terminal[, c('w_fp','f_fp','w_co','f_co','nw','nf'):= NA_real_]

for (g in terminal[, unique(grade)]) {
  cat('Grade', g, '\n')
  for (i in 1:bigT) {
    cat(i); if (i<bigT) cat(',') else cat('\n')
    w_ids = w_id[numPay >=i & grade==g, loan_id]
    f_ids = f_id[numPay >=i & grade==g, loan_id]
    terminal[J(g,i), `:=`(
      w_fp = ph_master[J(w_ids)][mob <= i,
                                 sum(period_end_lstat=='Fully Paid')],
      f_fp = ph_master[J(f_ids)][mob <= i,
                                 sum(period_end_lstat=='Fully Paid')],
      w_co = ph_master[J(w_ids)][mob <= i,
                                 sum(period_end_lstat=='Charged Off')],
      f_co = ph_master[J(f_ids)][mob <= i,
                                 sum(period_end_lstat=='Charged Off')],
      nw = length(w_ids),
      nf = length(f_ids))]
  }; rm(i, w_ids, f_ids)
}; rm(g)

# Compute fractions from counts
terminal[, `:=`(
  w_fp = w_fp/nw, f_fp = f_fp/nf, w_co = w_co/nw, f_co = f_co/nf,
  all_fp = (w_fp + f_fp)/(nw + nf), all_co = (w_co + f_co)/(nw + nf))]

# Reshape wide --> long
terminal.long = melt(terminal, id=key(terminal), patterns('_fp','_co'),
                     value.name=c('Fully paid', 'Charged off'))

levels(terminal.long$variable) <- c('Whole','Fractional','All')
setkey(terminal.long, variable, grade, mob)

The plots below reveal two interesting empirical facts about aggregate portfolios. First, loans funded by institutional investors are paid off sooner than those funded by retail investors, even though the overall default rates (measured from the top) are slightly higher. This holds true within all loan grades, so it cannot be the result of differing selection with respect to grade.

ggplot(terminal.long[!'All'], aes(x=mob, y=100*`Fully paid`, color=variable)) +
  geom_line() +
  geom_point(data=function(d) d[mob==bigT]) +
  geom_line(aes(y=100*(1-`Charged off`)), lty='dashed') +
  geom_point(data=function(d) d[mob==bigT], aes(y=100*(1-`Charged off`))) +
  scale_color_manual('', values=c(Fractional='blue', Whole='red')) +
  scale_x_continuous(breaks=seq(0,bigT,6)) +
  scale_y_continuous(breaks=seq(0,100,20)) +
  facet_wrap('grade', ncol=2) +
  labs(x='Months since issuance', y='',
       title='Time to Terminal Status, 3-Year Loans, by Grade',
       subtitle='Percent fully paid (solid) or charged off (dashed)') +

The next code chunk below computes the fraction of whole loan offerings funded by institutional investors (“uptake”), conditional on vintage and the ex-post realization of months until full repayment. The top of the corresponding plot is dominated by bright red, suggesting that these investors are skilled at identifying loans that end up being repaid shortly after issuance. (“Skilled” in the sense of having high recall.) This helps to rule out the possibility that the entire selection of offerings is skewed toward such loans, which could be consistent with the first plot but not this one.

# data.table with 'fixed' attributes for fully paid loans,
# plus month of final payment (prepaid_mob)
ph_fixed_FP = ph_fixed[
  ph_master[period_end_lstat=='Fully Paid',
            .(loan_id, prepaid_mob=mob)][
              !duplicated(loan_id, fromLast=TRUE)],
  on='loan_id', nomatch=0]

W_uptake = ph_fixed_FP[ils=='W' & term==36 & prepaid_mob <= 36 &
                         vintage < '16Q3',
                       .(V1 = sum((num_investors==1)*funded_amt)/sum(funded_amt),
                         count = .N),
                       keyby=.(vintage, prepaid_mob)]

ggplot(W_uptake, aes(x=vintage, y=prepaid_mob)) +
    geom_tile(aes(fill=ifelse(count < 7, NA_real_, V1))) +
    scale_fill_gradient('', low='blue', high='red', na.value='transparent') +
    scale_y_reverse(breaks=seq(0,36,6)) +
    labs(x='Vintage', y='Months until fully paid',
         title='Institutional Investors Prefer Early Payment',
         subtitle='Fraction of whole loan offerings funded by 1 investor')

There are at least two explanations that are consistent with the above observations.

First is Lending Club’s policy governing servicing fees:5

Note investors generally pay us a servicing fee equal to 1% of payment amounts received from the borrower. Whole loan purchasers pay a monthly servicing fee of up to 1.3% per annum on the month-end principal balance of loans serviced.

When fees are a proportion of remaining principal, investors will prefer early payment—even if it means accepting higher default risk, as suggested by the first plot.6 (See this supplemental post for a mathematical explanation.) Thus servicing fees may act as a mechanism to influence portfolio selection for each of the two types of investors.

A second possible explanation concerns the availability of liquidity. On Lending Club, loans are 3-year and 5-year fixed income investments, and it may take that long to realize the full return from the initial capital. Retail investors have access to a Note Trading Platform via Folio Investing, which acts as a secondary market to provide liquidity. Institutional investors have no analogous avenues to trade whole loans, which means the only liquidity option is to fund loans whose borrowers will prepay. Like servicing fees, this can also be viewed as a structural feature of the market that might produce the observed patterns.


As large institutional investors have eclipsed7 small-money retail investors on Lending Club and other platforms, it has become increasingly important to understand how these two types interact through their investment choices. In one simplified view, they compete for high-quality loans that will yield the most attractive returns, prompting questions related to the fairness of marketplace design. This is an especially salient issue for Lending Club since whole loan investors are given priority in loan selection.

This post presented evidence that institutional and retail investors have different preferences over loan portfolios, and that these differences may be induced by features of Lending Club’s market design, namely servicing fees and access to liquidity. While the analysis is far from complete, it blunts the appeal of the pure competitive view, instead suggesting that platforms can deliver fair opportunities for attractive returns to both types of investors.

  1. Which include banks, pension funds, finance and insurance companies, endowments, asset managers, hedge funds, and others

  2. The pattern could also simply be due to differing risk preferences, but I think the structural/design explanations are more compelling. A more rigorous approach might exploit a policy change affecting market structure, but it’s unclear whether the relevant policies ever changed since the inception of the whole loans program in 2012.

  3. The script that does this is available here.

  4. The combination of initial_list_status and num_investors is only a proxy, since it’s possible for a single fractional investor to buy all of the notes for a loan that was initially listed whole. The Lending Club Sales Reports can be used to figure out the actual category of funding instrument—notes, whole loans, or trust certificates—which corresponds directly to investor type but is unreliable prior to September 2014. The analysis below uses the proxy but is robust to the Sales Reports alternative in the post-September 2014 frame.

  5. From page 45 of Form 10-K

  6. The loss of future interest payments due to prepayment is at least partially offset by the ability to reinvest in new loans.

  7. When the whole loan program started in 2012, roughly 20 percent of loan offerings were allocated to institutional investors. In 2016, the fraction was as high as 85 percent.