Reflections on Companion Type Systems

TLDR: When manipulating a sealed trait and its subclasses, it can be useful to define a dual type hierarchy on their companion objects. We can use reflection to test this “Companion Type System” and safely gather all of its members.

That was short, now keep this post so that you can read it later!

Example: Polymorphic Serialization in a Persistent Event Queue

In a previous post, we described how database updates had to be propagated through our system in order to be indexed with Lucene.

Now, in a similar way, we want to use a persistent event queue in order to propagate some updates from one storage system (e.g. database) to others (e.g. graph, feature extraction engine…). The queue needs to support several types of events:

sealed trait Event
case class AddUser(userId: Long) extends Event
case class AddFriendship(userId: Long, friendId: Long) extends Event
case class ForgetFriendship(userId: Long, friendId: Long) extends Event

Assuming we use Json to get our events in and out of the queue, we have to define a Json formatter for Event. For it to work in a polymorphic fashion, some information about the event subtype has to be serialized together with the actual instance. In this way, at deserialization, we can read that type information before picking up a deserializer that will instantiate the proper event subtype.

In short, we need two components for each subtype:
- a Json formatter able to translate a subclass instance to/from a serialized value.
- some sort of header that serves as a serializable identifier of the subtype.

Companion objects are a natural home for these helpers and here’s what our solution looks like:

import play.api.libs.json._

sealed trait EventCompanion[E <: Event] {
  def format: Format[E]
  def header: String
}

object EventCompanion {
  val all: Set[EventCompanion[_ <: Event]] = Set(AddUser, AddFriendship, ForgetFriendship)
  val byHeader: Map[String, EventCompanion[_ <: Event]] = {
    // Mapping a Set to its headers collapses duplicates, so equal sizes mean unique headers.
    require(all.map(_.header).size == all.size, "Type headers must be unique.")
    all.map { companion => companion.header -> companion }.toMap
  }
}

sealed trait Event { self =>
  type E >: self.type <: Event
  def companion: EventCompanion[E]
  def instance: E = self
}

object Event {
  val format: Format[Event] = new Format[Event] {
    def writes(event: Event): JsValue = Json.obj(
      "header" -> event.companion.header,
      "value" -> event.companion.format.writes(event.instance)
    )
    def reads(json: JsValue): JsResult[Event] = (json \ "header").validate[String].flatMap { header =>
      EventCompanion.byHeader(header).format.reads(json \ "value")
    }
  }
}

case class AddUser(userId: Long) extends Event {
  type E = AddUser
  def companion = AddUser
}

case object AddUser extends EventCompanion[AddUser] {
  val format = Json.format[AddUser]
  val header = "add_user"
}

case class AddFriendship(userId: Long, friendId: Long) extends Event {
  type E = AddFriendship
  def companion = AddFriendship
}

case object AddFriendship extends EventCompanion[AddFriendship] {
  val format = Json.format[AddFriendship]
  val header = "add_friendship"
}

case class ForgetFriendship(userId: Long, friendId: Long) extends Event {
  type E = ForgetFriendship
  def companion = ForgetFriendship
}

case object ForgetFriendship extends EventCompanion[ForgetFriendship] {
  val format = Json.format[ForgetFriendship]
  val header = "forget_friendship"
}

In order to write our polymorphic Event formatter, each event subtype’s companion object had to define the proper serialization helpers. Thus we ended up creating an EventCompanion trait that every companion inherits from.

Let’s take a step back and review this pattern.

Introducing Companion Type Systems

Scala’s OOP features make it possible for instances of different classes to share some common interface. Simply have each of these classes be a subtype of some common superclass (or trait) that defines the interface – classic stuff.

Now, it is sometimes desirable for the companion objects of these classes to share some common interface as well. We can achieve that in a similar fashion, having each companion object extend some common superclass. This set of typed companion objects is referred to as the “Companion Type System” of the class (yes, we have just made that up).

In short, the Companion Type System makes it possible to take advantage of polymorphism not only when manipulating dynamic instances (thanks to the main class hierarchy), but also when no instance is around (thanks to the static companion hierarchy).

This situation often arises in the context of serialization / deserialization of some class hierarchy. At FortyTwo, we rely on this pattern in several places: persistent event queues, storage and retrieval of vertex and edge data in our graph, conversion of typed database ids to global ids…

Companion Type Systems even use F-Bounded Types

So far, we have overlooked the use of an F-bounded type E in our example, both in the sealed trait Event and the companion trait EventCompanion. In short, an F-bounded type is used to refer to an unknown subclass in the context of some superclass. It provides us with finer reasoning assumptions and extra type-safety as the compiler can better check the consistency of the class API.

F-bounded types are also known as self-recursive types, though while this name makes sense for the main trait (Event), it does not for the companion trait (EventCompanion). Indeed in both cases, E refers to the subclasses of Event, thus we can only qualify it as “self-recursive” in Event. So we’ll stick with “F-bounded type” here, plus it sounds smart enough.

An F-bounded type can be introduced either as a type parameter (cf. EventCompanion) or as an abstract type member (cf. Event). We won’t discuss the advantages of one over the other in general. However, we found it hard to make the compiler happy when both F-bounded types are type parameters, mainly because of mutually dependent wildcards showing up here and there. In order to avoid this, it seems best to have the self-recursive type be a type member, like in our example. Besides, it is usually useful to make the F-bounded type of the companion trait a type parameter for it to be a part of the companions’ type signatures, as you may end up passing them around implicitly (again, like in our example).

Final Reflections

Finally, here is how the Companion Type System pattern lays out in our example:

[Figure: Companion Type System]

Now let’s say someone wants to define a new event, AddHostility. There are two main risks:
- not setting up F-bounded type E properly in the AddHostility subclass or its companion object
- forgetting to explicitly register AddHostility in the set of all companion objects in EventCompanion

Thanks to the bounds on E and the companion accessor defined in the Event trait, the compiler should actually catch most issues related to the first point. Regarding the second point, we would like to use reflection in order to gather all companion objects in EventCompanion and rewrite it as follows:

object EventCompanion {
  val all: Set[EventCompanion[_ <: Event]] = CompanionTypeSystem[Event, EventCompanion[_ <: Event]]("E")
  val byHeader: Map[String, EventCompanion[_ <: Event]] = {
    require(all.map(_.header).size == all.size, "Type headers must be unique.")
    all.map { companion => companion.header -> companion }.toMap
  }
}

Now as we reflect on the class hierarchy to collect each companion object, we may as well check a few assumptions on our companion type system and make sure that all F-bounded types have been set up correctly. These tests can provide useful specifications anytime we design a new companion type system.

Our implementation of CompanionTypeSystem is a good opportunity to explore the Scala reflection API.

Let’s outline the logic of:

 def apply[SealedClass: TypeTag, Companion: TypeTag](fBoundedType: String): Set[Companion]

As a reminder, in our example, this method is called with SealedClass = Event, Companion = EventCompanion[_ <: Event] and fBoundedType = "E".

In a first step, we reflect on the top level classes (traits) SealedClass and Companion in order to figure out whether fBoundedType is introduced as a type parameter or as an abstract type member. In both cases, we want to check assumptions on its upper bound and on whether or not it has been set up as a self-recursive type (only expected in SealedClass). This happens in:

def getTypeWithTypeParameterOrElseCheckTypeMember(owner: Type, fBoundedType: TypeName, upperBound: Type, isSelfRecursive: Boolean): Option[Type => Type]

In a second step, for each child of SealedClass, we must check that fBoundedType has been assigned the subclass’s own type, both in the subclass itself and in its companion object. While checking the value of an abstract type member is pretty straightforward, checking the value of a type parameter reduces to checking inheritance after setting the type argument. This is why getTypeWithTypeParameterOrElseCheckTypeMember returns a function if fBoundedType actually is a type parameter. That function enables us to substitute type parameter fBoundedType with a valid type argument and use the resulting type in inheritance checks.

We want to iterate over all the children of SealedClass. This is where sealing the main class/trait comes in handy, even though our experience suggests that this is something that you want anyway wherever a companion type system seems like a good idea.

Note that nothing requires the Companion trait to be sealed, even though EventCompanion is in our example and each object is a case object. This comes for free since a companion object has to be declared next to its class anyway. It provides extra convenience in pattern matching as we find ourselves passing these companions around, regarding them as rich type representatives of their class at runtime.

And that’s it, we test our assumptions on abstract type members and/or type parameters for each subclass and collect all the companion objects, which end up forming our Companion Type System.

Editor’s Note: Any convenient bending of theoretical concepts in the context of this post is the author’s sole responsibility and does not necessarily represent the views of FortyTwo. Check out Kifi!

From word2vec to doc2vec: an approach driven by Chinese restaurant process

Google’s word2vec project has generated a lot of interest in the text mining community. It’s a neural network language model that is “both supervised and unsupervised”: unsupervised in the sense that you only have to provide a big corpus, say English wiki, and supervised in the sense that the model cleverly generates supervised learning tasks from the corpus. How? There are two approaches, known as Continuous Bag of Words (CBOW) and Skip-Gram (see Figure 1 in this paper). CBOW forces the neural net to predict the current word from its surrounding words, and Skip-Gram forces the neural net to predict the surrounding words from the current word. Training is essentially classic back-propagation with a few optimization and approximation tricks (e.g. hierarchical softmax).
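To make the two training setups concrete, here is a minimal Python sketch of how supervised examples could be carved out of a raw token stream. It only illustrates the shape of the tasks; the function names and the window parameter are our own, and the real word2vec tool adds subsampling, dynamic windows and other tricks that are not shown here.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (surrounding words, current word) pairs: the CBOW task."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # a one-word corpus has no context to learn from
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (current word, one surrounding word) pairs: the Skip-Gram task."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox".split()
for context, target in cbow_examples(tokens, window=1):
    print("CBOW:", context, "->", target)
for center, neighbor in skipgram_examples(tokens, window=1):
    print("Skip-Gram:", center, "->", neighbor)
```

Either way, the labels come for free from the corpus itself, which is what makes the model “supervised” without any manual annotation.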

Word vectors generated by the neural net have nice semantic and syntactic behaviors. Semantically, “iOS” is close to “Android”. Syntactically, “boys” minus “boy” is close to “girls” minus “girl”. One can check out more examples here.

Although this provides high quality word vectors, there is still no clear way to combine them into a high quality document vector. In this article, we discuss one possible heuristic, inspired by a stochastic process called the Chinese Restaurant Process (CRP). The basic idea is to use CRP to drive a clustering process and then sum the word vectors in the right cluster.

Imagine we have a document about a chicken recipe. It contains words like “chicken”, “pepper”, “salt”, “cheese”. It also contains words like “use”, “buy”, “definitely”, “my”, “the”. The word2vec model gives us a vector for each word. One could naively sum up every word vector as the doc vector. This clearly introduces lots of noise. A better heuristic is to use a weighted sum, based on other information like idf or Part of Speech (POS) tag.
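As a point of comparison for what follows, here is a small Python sketch of the naive sum and the weighted sum. The 2-d vectors and the weights are made up for illustration (they are not real word2vec vectors or real idf values), and weighted_doc_vector is a hypothetical helper name.

```python
from typing import Dict, List

Vector = List[float]


def weighted_doc_vector(words: List[str],
                        word_vectors: Dict[str, Vector],
                        weights: Dict[str, float],
                        default_weight: float = 1.0) -> Vector:
    """Sum the vectors of all known words, each scaled by its weight."""
    if not word_vectors:
        return []
    dim = len(next(iter(word_vectors.values())))
    doc = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:  # out-of-vocabulary words are skipped
            continue
        w = weights.get(word, default_weight)
        for i, x in enumerate(vec):
            doc[i] += w * x
    return doc


# Toy data, for illustration only.
toy_vectors = {"chicken": [1.0, 0.0], "pepper": [0.8, 0.2], "the": [0.0, 1.0]}
toy_weights = {"chicken": 2.0, "pepper": 2.0, "the": 0.1}
words = ["the", "chicken", "the", "pepper"]

naive = weighted_doc_vector(words, toy_vectors, {})            # every weight is 1
weighted = weighted_doc_vector(words, toy_vectors, toy_weights)
print(naive)     # dominated by the second ("the") direction
print(weighted)  # dominated by the first ("food") direction
```

In this toy setup the weights push the document vector toward the food-related direction, but the result still depends entirely on how good the weights are.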

The question is: could we be more selective when adding terms? If this is a chicken recipe document, I shouldn’t even consider words like “definitely”, “use”, “my” in the summation. One can argue that idf based weights can significantly reduce the noise of boring words like “the” and “is”. However, for words like “definitely” or “overwhelming”, the idfs are not necessarily as small as you would hope.

It’s natural to think that if we can first group words into clusters, words like “chicken” and “pepper” may stay in one cluster, alongside other clusters of “junk” words. If we can identify the “relevant” clusters and sum up only the word vectors from those clusters, we should have a good doc vector.

This boils down to clustering the words in the document. One can of course use off-the-shelf algorithms like K-means, but most of these algorithms require a distance metric. Word2vec behaves nicely under cosine similarity, but this doesn’t necessarily mean it behaves as well under Euclidean distance (even after projection to the unit sphere, it’s perhaps best to use geodesic distance).

It would be nice if we could work directly with cosine similarity. We ran a quick experiment on clustering words driven by a CRP-like stochastic process. So far, it has worked surprisingly well.

Chinese Restaurant Process

Now let’s explain CRP. Imagine you go to a (Chinese) restaurant. There are already n tables, each with a different number of people. There is also an empty table. CRP has a hyperparameter r > 0, which can be regarded as the “imagined” number of people at the empty table. You go to one of the (n+1) tables with probability proportional to the number of people already at that table (for the empty table, that number is r). If you go to one of the n existing tables, you are done. If you decide to sit down at the empty table, the Chinese restaurant will automatically create a new empty table. In that case, the next customer who comes in will choose from (n+2) tables (including the new empty table).
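The seating rule above is easy to simulate. The following Python sketch is a plain CRP with no similarity involved yet; the function name and the fixed seed are our own choices for illustration.

```python
import random
from typing import List, Optional


def chinese_restaurant_process(num_customers: int, r: float,
                               rng: Optional[random.Random] = None) -> List[int]:
    """Seat customers one by one and return the number of people at each table."""
    if r <= 0:
        raise ValueError("r must be positive")
    rng = rng or random.Random()
    tables: List[int] = []  # tables[i] = number of people at table i
    for _ in range(num_customers):
        # Occupied tables are weighted by head count, the empty table by r.
        weights = tables + [r]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(tables):
            tables.append(1)  # sat at the empty table, so a new empty one appears
        else:
            tables[choice] += 1
    return tables


sizes = chinese_restaurant_process(100, r=1.0, rng=random.Random(42))
print(len(sizes), "tables, sizes:", sizes)
```

A larger r makes customers more willing to open a new table, so it tends to produce more and smaller tables.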

Inspired by CRP, we tried the following variations, which bring a similarity factor into the process. The common setup is the following: we are given M vectors to be clustered. We maintain two things: the cluster sums (not centroids!) and the vectors in each cluster. We iterate through the vectors. For the current vector V, suppose we have n clusters already. We find the cluster C whose cluster sum is most similar to V. Call this score sim(V, C).

Variant 1: V creates a new cluster with probability 1/(1 + n). Otherwise V goes to cluster C.

Variant 2: If sim(V, C) > 1/(1 + n), V goes to cluster C. Otherwise, with probability 1/(1 + n) it creates a new cluster and with probability n/(1 + n) it goes to C.

In either variant, if V goes to a cluster, we update the cluster sum and cluster membership.
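Both variants fit in one short function. This is a sketch of the procedure as described above, not our production code: crp_cluster and cosine are hypothetical names, sim is assumed to be cosine similarity against the cluster sum, and vectors are plain Python lists.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def crp_cluster(vectors: List[Vector], variant: int = 2,
                rng: Optional[random.Random] = None
                ) -> Tuple[List[Vector], List[List[int]]]:
    """Single pass over `vectors`; returns (cluster sums, member indices per cluster)."""
    rng = rng or random.Random()
    sums: List[Vector] = []
    members: List[List[int]] = []
    for idx, v in enumerate(vectors):
        n = len(sums)
        if n == 0:
            # With no clusters yet, 1/(1 + n) = 1: the first vector always opens one.
            create = True
            best = -1
        else:
            # Cluster C: the one whose sum is most similar to the current vector.
            best = max(range(n), key=lambda c: cosine(v, sums[c]))
            p_new = 1.0 / (1 + n)
            if variant == 1:
                create = rng.random() < p_new
            else:
                create = cosine(v, sums[best]) <= p_new and rng.random() < p_new
        if create:
            sums.append(list(v))
            members.append([idx])
        else:
            sums[best] = [s + x for s, x in zip(sums[best], v)]
            members[best].append(idx)
    return sums, members


toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cluster_sums, cluster_members = crp_cluster(toy, variant=2, rng=random.Random(0))
print(cluster_members)
```

Because the threshold 1/(1 + n) shrinks as clusters accumulate, variant 2 becomes less and less likely to open a new cluster as the pass goes on.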

There is one distinct difference to traditional CRP: if we don’t go to empty table, we deterministically go to the “most similar” table.

In practice, we find these variants create similar results. One difference is that variant 1 tends to produce more, smaller clusters, while variant 2 tends to produce fewer but larger clusters. The examples below are from variant 2.

For example, for a chicken recipe document, the clusters look like this:

  • ‘cayenne’, ‘taste’, ‘rating’, ‘blue’, ‘cheese’, ‘raved’, ‘recipe’, ‘powdered’, ‘recipe’, ‘dressing’, ‘blue’, ‘spicier’, ‘spoon’, ‘cup’, ‘cheese’, ‘cheese’, ‘blue’, ‘blue’, ‘dip’, ‘bake’, ‘cheese’, ‘dip’, ‘cup’, ‘blue’, ‘adding’, ‘mix’, ‘crumbled’, ‘pepper’, ‘oven’, ‘temper’, ‘cream’, ‘bleu’, ……
  • ‘the’, ‘a’, ‘that’, ‘in’, ‘a’, ‘use’, ‘this’, ‘if’, ‘scant’, ‘print’, ‘was’, ‘leftovers’, ‘bring’, ‘a’, ‘next’, ‘leftovers’, ‘with’, ‘people’, ‘the’, ‘made’, ‘to’, ‘the’, ‘by’, ‘because’, ‘before’, ‘the’, ‘has’, ‘as’, ‘amount’, ‘is’, ……
  • ‘stars’, ‘big’, ‘super’, ‘page’, ‘oct’, ‘see’, ‘jack’, ‘photos’, ‘extras’, ‘see’, ‘video’, ‘one’, ‘page’, ‘f’, ‘jun’, ‘stars’, ‘night’, ‘jul’, ……

Apparently, the first cluster is the most relevant. Now let’s take the cluster sum vector (which is the sum of all vectors from this cluster), and test whether it really preserves semantics. Below is a snippet from a Python console. We trained the word vectors using the C implementation on a fraction of English Wiki, and read the model file using the Python library gensim (gensim.models.word2vec). c[0] below denotes cluster 0.

>>> similar(c[0], model["chicken"])
>>> similar(c[0], model["recipe"] + model["chicken"])
>>> similar(c[0], model["recipe"] + model["fish"])
>>> similar(c[0], model["computer"])
>>> similar(c[0], model["scala"])

Looks like the semantic is preserved well. It’s convincing that we can use this as the doc vector.

The recipe document seems easy. Now let’s try something more challenging, like a news article. News articles tend to tell stories, and thus have less concentrated “topic words”. We tried the clustering on this article, titled “Signals on Radar Puzzle Officials in Hunt for Malaysian Jet”. We got 4 clusters:

  • ‘have’, ‘when’, ‘time’, ‘at’, ‘when’, ‘part’, ‘from’, ‘from’, ‘in’, ‘show’, ‘may’, ‘or’, ‘now’, ‘on’, ‘in’, ‘back’, ‘be’, ‘turned’, ‘for’, ‘on’, ‘location’, ‘mainly’, ‘which’, ‘to’,, ‘also’, ‘from’, ‘including’, ‘as’, ‘to’, ‘had’, ‘was’ ……
  • ‘radar’, ‘northwest’, ‘radar’, ‘sends’, ‘signal’, ‘signals’, ‘aircraft’, ‘data’, ‘plane’, ‘search’, ‘radar’, ‘saturated’, ‘handles’, ‘search’, ‘controlled’, ‘detection’, ‘data’, ‘nautical’, ‘patrol’, ‘detection’, ‘detected’, ‘floating’, ‘blips’, ‘plane’, ‘objects’, ‘jets’, ‘kinds’, ‘signals’, ‘air’, ‘plane’, ‘aircraft’, ‘radar’, ‘passengers’, ‘signal’, ‘plane’, ‘unidentified’, ‘aviation’, ‘pilots’, ‘ships’, ‘signals’, ‘satellite’, ‘radar’, ‘blip’, ‘signals’, ‘radar’, ‘signals’ ……
  • ‘of’, ‘the’, ‘of’, ‘of’, ‘of’, ‘the’, ‘a’, ‘the’, ‘senior’, ‘the’, ‘the’, ‘the’, ‘the’, ‘the’, ‘the’, ‘a’, ‘the’, ‘the’, ‘the’, ‘the’, ‘the’, ‘of’, ‘the’, ‘of’, ‘a’, ‘the’, ‘the’, ‘the’, ‘the’, ‘the’, ‘the’, ‘its’, ……
  • ‘we’, ‘authorities’, ‘prompted’, ‘reason’, ‘local’, ‘local’, ‘increasing’, ‘military’, ‘inaccurate’, ‘military’, ‘identifying’, ‘force’, ‘mistaken’, ‘expanded’, ‘significance’, ‘military’, ‘vastly’, ‘significance’, ‘force’, ‘surfaced’, ‘military’, ‘quoted’, ‘showed’, ‘military’, ‘fueled’, ‘repeatedly’, ‘acknowledged’, ‘declined’, ‘authorities’, ‘emerged’, ‘heavily’, ‘statements’, ‘announced’, ‘authorities’, ‘chief’, ‘stopped’, ‘expanding’, ‘failing’, ‘expanded’, ‘progress’, ‘recent’, ……

Again, this looks decent. Note that this is a simple 1-pass clustering process and we don’t have to specify the number of clusters! This could be very helpful for latency-sensitive services.

There is still a missing step: how to find out the relevant cluster(s)? We haven’t yet done extensive experiments on this part. A few heuristics to consider:

  • idf weight
  • POS tags. We don’t have to tag every single word in the document. Empirically, word2vec vectors tend to group syntactically as well, so we only have to sample a few tags from each cluster.
  • compare cluster sum vectors with title vector (although this depends on the quality of title)
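
The third heuristic, for instance, could look like the following Python sketch. We have not evaluated it; most_relevant_cluster is a hypothetical helper, the title vector is assumed to be built the same way as the cluster sums (e.g. by summing the title’s word vectors), and the numbers are toy values.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def most_relevant_cluster(cluster_sums: List[Vector], title_vector: Vector) -> int:
    """Index of the cluster whose sum vector is most similar to the title vector."""
    if not cluster_sums:
        raise ValueError("no clusters to choose from")
    return max(range(len(cluster_sums)),
               key=lambda c: cosine(cluster_sums[c], title_vector))


# Toy cluster sums and title vector, for illustration only.
toy_sums = [[5.0, 0.5], [0.2, 7.0], [1.0, 1.0]]
toy_title = [1.0, 0.0]
print(most_relevant_cluster(toy_sums, toy_title))
```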

There are other problems to think about: 1) how do we merge clusters? Based on similarity among cluster sum vectors? Or averaging similarity between cluster members? 2) what is the minimal set of words that can reconstruct cluster sum vector (in the sense of cosine similarity)? This could be used as a semantic keyword extraction method.

Conclusion: Google’s word2vec provides powerful word vectors. We are interested in using these vectors to generate high quality document vectors in an efficient way. We tried a strategy based on a variant of Chinese Restaurant Process and obtained interesting results. There are some open problems to explore, and we would like to hear what you think.


Hooking in to Play! Framework’s sbt plugin lifecycle

In the Play documentation, we see how to hook in to the Play application’s life cycle with Global’s onStart, onStop, etc. However, sometimes you want to hook into the Play sbt plugin lifecycle for development or build purposes. Most usefully, you may want to run another task or process every time Play compiles, starts up (pre-Global), or stops (post-Global).

If you want to simply hook into the sbt compile step, you can do so before or after:

// Define your task
val yourCompileTask = TaskKey[Unit]("Your compile task")

val yourProject = play.Project("yourApp", "1.0").settings(
  // Example using a sbt task:
  yourCompileTask := {
    println("++++ sbt compile start!")
  },
  (compile in Compile) <<= (compile in Compile) dependsOn (yourCompileTask),
  // Or just running Scala code:
  (compile in Compile) <<= (compile in Compile) map { result =>
    println(s"++++ sbt compile end: $result")
    result // pass the compile result through unchanged
  }
)

More powerful, however, is to create a PlayRunHook:

object SimplePlayRunHook {
  import java.net.InetSocketAddress
  import play.PlayRunHook
  def apply(base: File): PlayRunHook = {
    new PlayRunHook {
      override def beforeStarted(): Unit = {
        println(s"++++ simplePlayRunHook.beforeStarted: $base")
      }
      override def afterStarted(address: InetSocketAddress): Unit = {
        println(s"++++ simplePlayRunHook.afterStarted: $address, $base")
      }
      override def afterStopped(): Unit = {
        println("++++ simplePlayRunHook.afterStopped")
      }
    }
  }
}

val yourProject = play.Project("yourApp", "1.0").settings(
  playRunHooks <+= baseDirectory.map(base => SimplePlayRunHook(base))
)

Here’s what it looks like with both:

[kifi-backend] $ compile
++++ sbt compile start!
++++ sbt compile: Analysis: 3 Scala sources, 8 classes, 24 external source dependencies, 5 binary dependencies
[success] Total time: 8 s, completed Mar 3, 2014 11:51:16 AM
[kifi-backend] $ run

++++ simplePlayRunHook.beforeStarted: /Users/andrew/Documents/workspace/kifi/kifi-backend
--- (Running the application from SBT, auto-reloading is enabled) ---

[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
++++ simplePlayRunHook.afterStarted: /0:0:0:0:0:0:0:0:9000, /Users/andrew/Documents/workspace/kifi/kifi-backend

(Server started, use Ctrl+D to stop and go back to the console...)

++++ simplePlayRunHook.afterStopped

[success] Total time: 12 s, completed Mar 3, 2014 11:51:32 AM

A common use case of these hooks is to trigger an auxiliary build process, like for a compiles-to-JS language or build tool such as Gulp/Grunt.

Clean and re-usable Slick modules

This is the second post in a three part series about our database access layer and patterns we’ve adopted. If you missed the first, read about some of the patterns we use when working with our databases.

We’ve talked previously about how we’ve abstracted away our database backend and query/access libraries. Unfortunately, the recent Slick 2.0 upgrade meant our previous abstraction and every Repo (63 at last count) needed to be upgraded. So, we had a fix-it morning and had Slick 2.0 in production by lunch. Huzzah!

This post gets into more implementation details about our database access patterns. Some of this overlaps Eishay’s talk about how we used Slick 1.0, and updates the code he shared to Slick 2.0.

Our code that accesses the database lives in a Guice-injected class called a Repo. Our product, Kifi, handles very large imports of links that people want to become keeps. In order to protect our users’ data, we persist the raw links as fast as possible in a batch insert, before we normalize the URL (so that equivalent forms of the same URL are treated as one), de-duplicate, run our scraper service, etc. Here’s part of the RawKeepRepo implementation:

class RawKeepRepoImpl @Inject() (val db: DataBaseComponent, val clock: Clock) extends DbRepo[RawKeep] with RawKeepRepo {
  import db.Driver.simple._
  type RepoImpl = RawKeepTable
  class RawKeepTable(tag: Tag) extends RepoTable[RawKeep](db, tag, "raw_keep") {
    def userId = column[Id[User]]("user_id", O.NotNull)
    def url = column[String]("url", O.NotNull)
    def title = column[String]("title", O.Nullable)
    def isPrivate = column[Boolean]("is_private", O.NotNull)
    def importId = column[String]("import_id", O.Nullable)
    def source = column[BookmarkSource]("source", O.NotNull)
    def kifiInstallationId = column[ExternalId[KifiInstallation]]("installation_id", O.Nullable)
    def originalJson = column[JsValue]("original_json", O.Nullable)
    def * = (id.?, userId, createdAt, updatedAt, url, title.?, isPrivate, importId.?, source, kifiInstallationId.?, originalJson.?, state) <> ((RawKeep.apply _).tupled, RawKeep.unapply _)
  }
  def table(tag: Tag) = new RawKeepTable(tag)
  // then, methods like `insertAll(rawKeeps: Seq[RawKeep])` and `getOldUnprocessed(batchSize: Int, before: DateTime)` are implemented here
}

RawKeepRepo comes automatically with a few helpful methods: save(m: RawKeep), get(id: Id[RawKeep]), etc. How does that work? All of our repos extend DbRepo[T], which defines some basic functionality of repos.

trait Repo[M <: Model[M]] {
  def get(id: Id[M])(implicit session: RSession): M
  def all()(implicit session: RSession): Seq[M]
  def save(model: M)(implicit session: RWSession): M
  def count(implicit session: RSession): Int
  // we implement these for repos that have caches in front of methods:
  def invalidateCache(model: M)(implicit session: RSession): Unit
  def deleteCache(model: M)(implicit session: RSession): Unit
}

trait DbRepo[M <: Model[M]] extends Repo[M] with FortyTwoGenericTypeMappers {
  val db: DataBaseComponent
  val clock: Clock
  val profile = db.Driver.profile
  import db.Driver.simple._
  type RepoImpl <: RepoTable[M]
  def table(tag: Tag): RepoImpl
  lazy val rows: TableQuery[RepoImpl] = TableQuery(table)
  // and implementations of methods all Repos have:
  def save(model: M)(implicit session: RWSession): M = ???
  def count(implicit session: RSession): Int = ???
  def get(id: Id[M])(implicit session: RSession): M = ???
  def all()(implicit session: RSession): Seq[M] = ???
  abstract class RepoTable[M <: Model[M]](val db: DataBaseComponent, tag: Tag, name: String) extends Table[M](tag: Tag, db.entityName(name)) with FortyTwoGenericTypeMappers {
    def id = column[Id[M]]("ID", O.PrimaryKey, O.Nullable, O.AutoInc)
    def createdAt = column[DateTime]("created_at", O.NotNull)
    def updatedAt = column[DateTime]("updated_at", O.NotNull)
    def state = column[State[M]]("state", O.NotNull)
  }
}

That’s a lot to take in, so we’ll break it down. In RawKeepRepo, we inject a DataBaseComponent which wraps the database driver and dialect. We then implement a RawKeepTable that extends RepoTable, which provides definitions of columns not automatically provided (such as userId, url, etc), as well as the * binder that Slick needs. We also define a def table that takes a tag (which Slick provides) and returns an instance of the table.

You may notice that all public methods in the DbRepo take an implicit RSession. This is actually a trait, extended by RWSession and ROSession. We define sessions as read-only or read-write outside of the Repo level, so we can be transactional when necessary and combine several queries in one connection.

What does this give us? An instance of RawKeepRepoImpl has common columns like id, createdAt, updatedAt, and state defined automatically, so we’re able to keep our classes small. Additionally, it has convenience getters and setters implemented already, such as rawKeepRepo.get(someRawKeepId). Using the repo is quite easy:

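// `db: Database` is assumed here: the snippet uses `db` without showing where it comes from.
class KeepImporter @Inject() (db: Database, rawKeepRepo: RawKeepRepo) {
  def saveUrl(userId: Id[User], url: String, source: BookmarkSource, isPrivate: Boolean = true) = {
    db.readWrite { implicit session =>
      val keep = rawKeepRepo.save(RawKeep(userId = userId, url = url, source = source, isPrivate = isPrivate))
      // and if we wanted to query by id (assuming `id` is an Option, as `id.?` suggests):
      rawKeepRepo.get(keep.id.get)
    }
  }
}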

Next time, we’ll combine all of this together in a sample project you can play around with.

Automated Backup and Restoration of Lucene Indices with Amazon S3

Self-Maintained Indexing in Real-Time: Streaming Updates

At FortyTwo, we rely on a main database hosted on RDS to store Kifi’s critical user data. For instance, every keep (a user-page pair) is saved there after the Kifi browser extension that sits on your computer sends it to our service (via a “Keeper” machine).

Meanwhile, the Kifi search engine is built on top of Apache Lucene. Abstracting over sharding, each search machine (“Finder” machine) holds a local copy of several indices, which represent the information stored in the database in a form that the semantic engine can leverage.

As a consequence, we need to maintain a number of Lucene indices in sync with the database, in real-time, as the database gets updated with every user action. Each database update, processed by some Keeper machine, must be echoed to every Finder machine so that the change can be reflected in local index files.

[Figure: Kifi Real-Time Indexing]

On the one hand, each database model that requires indexing keeps track of a sequence number. Every time an instance of the model is created or updated, it is assigned the next number and the sequence is incremented.

On the other hand, on each Finder machine, each local index is maintained by an indexer (an actor). The indexer is interested in one or several models that are required to build its index, so periodically it must grab and process the latest updates (“events”) for each of these models. The indexer uses a model’s sequence number to remember how much of the update event stream has been consumed. It simply keeps track of the highest sequence number it has seen so far for each model, so that it can ask the Keepers for all instances modified since. As a new batch of events comes in, updated instances are processed into “indexable” entities that are committed to the index. On commit, the indexer also sets each sequence number to the highest seen in the batch. And repeat.
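The catch-up loop described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: `Update`, `Keeper`, `SketchIndexer` and `updatesSince` are hypothetical names, the "index" is a plain in-memory map rather than a Lucene index, and the Keeper is modeled as a list of update events instead of a database.

```scala
// Hypothetical names throughout; this is not our production code.
final case class Update(id: Long, seq: Long, payload: String)

// Stands in for the Keeper-side query "all instances modified since seq".
class Keeper(updates: Vector[Update]) {
  def updatesSince(seq: Long, batchSize: Int): Vector[Update] =
    updates.filter(_.seq > seq).sortBy(_.seq).take(batchSize)
}

class SketchIndexer(keeper: Keeper, batchSize: Int) {
  private var highestSeq = 0L                 // 0 means the index is built from scratch
  private var index = Map.empty[Long, String] // stands in for the Lucene index

  def sequenceNumber: Long = highestSeq
  def indexed: Map[Long, String] = index

  // Fetch one batch, commit it, then advance the sequence number.
  // Returns the number of updates processed.
  def catchUpOnce(): Int = {
    val batch = keeper.updatesSince(highestSeq, batchSize)
    if (batch.nonEmpty) {
      index = batch.foldLeft(index)((acc, u) => acc + (u.id -> u.payload))
      highestSeq = batch.map(_.seq).max
    }
    batch.size
  }

  // Keep asking for batches until there is nothing new.
  def catchUp(): Unit = while (catchUpOnce() > 0) {}
}
```

Because the only state an indexer needs is the highest sequence number it has committed, a restarted indexer can resume from wherever its index files left off.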

In this way, each Finder machine independently processes database updates and maintains its local indices in a very robust way, which greatly benefits our continuous deployment process. Indeed, every time a Finder machine is deployed and its Play! application restarted, indexers pick up the index files on disk, check the latest sequence numbers and automatically start bugging the Keepers in order to catch up with the latest updates.

All is for the best in the best of all possible worlds.

Well, almost. When a new machine spins up, indices are built from scratch. All sequence numbers are initialized to zero and each indexer starts catching up. They will indeed catch up, but since our corpus has grown significantly together with our user base, this takes several days. If we were to lose all Finder machines, Kifi would be crippled for a while as indices are rebuilt.

Short of such a disaster, we also want to take advantage of Amazon EC2 “Spot” instances. Spot instances are bid for in real time and are not guaranteed in any way. While usually much cheaper than On-Demand or even Reserved instances, they can be reclaimed by Amazon without notice if prices spike above our maximum bid. Thus our indexing system needs to become resilient to market volatility.

Faster Recovery and Replication: Index Snapshots

The solution is again quite simple. We need to perform regular backups of each index so that indexers on new machines do not have to start from scratch, but can instead consume only the events that occurred after the latest backup. Thus, some of the Finder machines should periodically upload their index files to Amazon S3, where they can be picked up by new machines.

Integrity is our main concern here. While an indexer is writing to an index, Lucene does not guarantee that the files on disk are in a consistent state. If index files were uploaded at this point, the backup might end up corrupt. Thus we need to make sure that indexing stops while we proceed.

Our solution is to bake a backup mechanism directly into Lucene index directories and have the indexer in charge of each directory execute the backup procedure on commit. Since indexers are actors, they process one message at a time, which guarantees that indexing is blocked for as long as the directory is being backed up. Thus we introduce a new trait:

import java.io.File

trait BackedUpDirectory {
  def getDirectory(): File
  def scheduleBackup(): Unit
  def cancelBackup(): Unit
  def doBackup(): Boolean // true if a backup was actually performed
  def restoreFromBackup(): Unit
}

Here is a simple implementation with built-in compression (free IO helpers!):

import java.util.concurrent.atomic.AtomicBoolean

trait ArchivedDirectory extends BackedUpDirectory {
  protected def getArchive(): File
  protected def saveArchive(archive: File): Unit
  private val shouldBackup = new AtomicBoolean(false)
  def scheduleBackup(): Unit = shouldBackup.set(true)
  def cancelBackup(): Unit = shouldBackup.set(false)
  // Performs the backup only if one was scheduled, and clears the flag atomically
  def doBackup(): Boolean = if (shouldBackup.getAndSet(false)) {
    val dir = getDirectory()
    val tarGz = IO.compress(dir)
    saveArchive(tarGz)
    true
  } else false
  def restoreFromBackup(): Unit = {
    val dir = getDirectory()
    val tarGz = getArchive()
    IO.uncompress(tarGz, dir.getParentFile.getAbsolutePath)
  }
}

getArchive and saveArchive can be implemented using your storage of choice. We rely on an ObjectStore backed by S3. We can now extend Lucene’s Directory interface and MMapDirectory class:

trait IndexStore extends ObjectStore[IndexDirectory, File]
trait IndexDirectory extends Directory with BackedUpDirectory
class IndexDirectoryImpl(dir: File, store: IndexStore) extends MMapDirectory(dir) with ArchivedDirectory with IndexDirectory {
  protected def getArchive(): File = store.get(this).get
  protected def saveArchive(tarFile: File): Unit = store += (this, tarFile)
}
class S3IndexStoreImpl(val bucketName: S3Bucket, val amazonS3Client: AmazonS3) extends S3FileStore[IndexDirectory] with IndexStore {
  def idToKey(indexDirectory: IndexDirectory): String = indexDirectory.getDirectory().getName + ".tar.gz"
}

Each indexer is enriched with the following methods from our Indexer[T] trait:

trait Indexer[T] {
  val indexDirectory: IndexDirectory
  def backup(): Unit = {
    log.info("Index will be backed up on next commit")
    indexDirectory.scheduleBackup() // only sets the flag; the upload happens in onCommit
  }
  override def onCommit(successful: Seq[Indexable[T]]): Unit = {
    try {
      if (indexDirectory.doBackup()) log.info(s"${indexDirectory.getDirectory().getName} has been backed up")
    } catch {
      case e: Throwable => log.error("Index directory could not be backed up", e)
    }
  }
}

We use Akka’s scheduler to call backup on every indexer periodically, and the backup is processed after the next successful commit. A different period can be set for every index, depending on how fast it is changing. On each backup, we actually report the size of each index to our analytics so that we can easily monitor its growth.

Now thanks to our BackedUpDirectory trait, restoring an index is very easy. When a Finder’s Play! application starts and indexers are instantiated, we check for each index whether its directory is already present locally (the machine has been restarted and must simply catch up with the latest updates) or not (this is a new machine, the directory should be restored from S3 before the machine starts catching up from there). As mentioned in previous posts, we use Guice to manage dependencies at runtime, thus each indexer provider relies on the following method to get its IndexDirectory:

trait IndexModule extends ScalaModule with Logging {
  protected def getIndexDirectory(dirPath: String, indexStore: IndexStore): IndexDirectory = {
    val dir = new File(dirPath).getCanonicalFile
    val indexDirectory = new IndexDirectoryImpl(dir, indexStore)
    if (!dir.exists()) {
      // New machine: try to restore the latest snapshot before catching up
      try {
        indexDirectory.restoreFromBackup()
        log.info(s"$dir was restored from S3")
      } catch { case e: Exception =>
        log.error(s"Could not restore $dir from S3", e)
      }
    }
    indexDirectory
  }
}

That’s pretty much it. Within a few minutes, we can spin up new Finder machines that take care of themselves and quickly get Lucene fully up and running, just so that we can find your keeps!

We can index your stuff too, check out Kifi now!

Emojis in Play! test logs

Several months ago, Play! secretly added emoji support in the test logs. To add this revolutionary feature to your project, throw play.Project.emojiLogs in your project’s settings.
[Figure: Emoji Tests]

James Roper cautions that the feature is unsupported and may be removed, so you may prefer to include it directly in your Build.scala:

  val emojiLogs = logManager ~= { lm =>
    new LogManager {
      def apply(data: sbt.Settings[Scope], state: State, task: Def.ScopedKey[_], writer: java.io.PrintWriter) = {
        val l = lm.apply(data, state, task, writer)
        // The lookbehind makes the first group capture the whole number ("10", not just "0")
        val FailuresErrors = "(?s).*(?<!\\d)(\\d+) failures?, (\\d+) errors?.*".r
        new Logger {
          def filter(s: String) = {
            val filtered = s.replace("\033[32m+\033[0m", "\u2705 ")
              .replace("\033[33mx\033[0m", "\u274C ")
              .replace("\033[31m!\033[0m", "\uD83D\uDCA5 ")
            filtered match {
              case FailuresErrors("0", "0") => filtered + " \uD83D\uDE04"
              case FailuresErrors(_, _) => filtered + " \uD83D\uDE22"
              case _ => filtered
            }
          }
          def log(level: Level.Value, message: => String) = l.log(level, filter(message))
          def success(message: => String) = l.success(message)
          def trace(t: => Throwable) = l.trace(t)
          override def ansiCodesSupported = l.ansiCodesSupported
        }
      }
    }
  }
// and your project:
  val yourProject = play.Project("sample", appVersion, yourDependencies).settings(
    // ... your current settings ...
    emojiLogs
  )

Database patterns in Scala

This is the first in a three-part series about our database access layer and the patterns we’ve adopted (part 2: Clean and re-usable Slick modules). We use Slick to type-check and build our queries, but many of these ideas may help with other libraries.

There’s a decent amount of material out there about Slick basics (how to query), but not much about building a larger, multi-module app with it. This post is a soft introduction to some of the lessons we’ve learned and the patterns that keep us sane.

Amazon Elastic Load Balancer auto-registration

At FortyTwo we operate in continuous deployment mode, so we need robust mechanisms to support both fast deployment and service continuity. Since our Kifi services sit behind Amazon Elastic Load Balancers (ELBs), an important problem is to make sure load balancers don’t route traffic to inactive instances. In a typical setup, ELBs are configured to perform regular health checks on their assigned instances, and update their routing policy accordingly.

Using the strictest possible settings, health checks are performed every 6 seconds, fail if no response is received after 2 seconds, and an instance is deregistered from its ELB after 2 failed health checks. Even with such a configuration, an ELB keeps sending traffic to a stopped instance for 8 to 14 seconds, depending on when the first health check occurs after shutdown. This means that each deployment may cause some requests to be lost. A quick fix to this problem is to make the instance stop responding to health checks long enough for the ELB to deregister it, while still processing other requests. However, this does not adapt well to changes in the health check configuration, and also makes restarting services slower.
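The 8 to 14 second window follows from simple arithmetic, sketched below. The helper and its parameter names are hypothetical, and it assumes that health checks are sent at a fixed interval regardless of how long the previous one took to time out.

```scala
object ElbTiming {
  // Seconds during which the ELB may still route traffic to a stopped instance.
  def secondsUntilDeregistered(
      firstCheckDelay: Double, // time between shutdown and the next health check
      interval: Double = 6,    // seconds between health checks
      timeout: Double = 2,     // seconds before an unanswered check fails
      failuresNeeded: Int = 2  // failed checks before deregistration
  ): Double =
    firstCheckDelay + (failuresNeeded - 1) * interval + timeout

  // Best case: a health check starts right as the instance stops.
  val best = secondsUntilDeregistered(firstCheckDelay = 0)  // 8.0
  // Worst case: the instance stops right after answering a health check.
  val worst = secondsUntilDeregistered(firstCheckDelay = 6) // 14.0
}
```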

An additional issue is that the ELB will keep health checking inactive instances, and reregister them as soon as they restart. If an instance needs some warm up time (to health check itself, warm up indexes, and so on), it is good to control exactly when it starts receiving external traffic.

A better solution is to use the AWS Java API to make instances automatically register themselves to their ELB on startup and deregister on shutdown. So instead of directly configuring the ELB to serve a predefined set of instances, we tag instances with their ELB name, and make them register/deregister as shown below (the code is written in Scala, using the Java API):

import scala.collection.JavaConversions._ // implicit Scala/Java collection conversions
import scala.concurrent.Await
import scala.concurrent.duration._

// retrieve instance id using AWS instance metadata and Play! WS API
val metadataRequest = WS.url("")
val instanceId = Await.result(metadataRequest.get(), 1 minute).body // careful when blocking
// set up AWS EC2 and ELB clients
val EC2Client = new AmazonEC2Client
val ELBClient = new AmazonElasticLoadBalancingClient
val region = Region.getRegion(Regions.<your_region>)
EC2Client.setRegion(region)
ELBClient.setRegion(region)
// retrieve instance's ELB tag
val tagsRequest = new DescribeTagsRequest(Seq(
    new Filter("key", Seq("ELB")),
    new Filter("resource-id", Seq(instanceId))
))
val result = EC2Client.describeTags(tagsRequest)
val ELBName = result.getTags.head.getValue // assuming tag is there
// register to ELB
def registerToLoadBalancer() = {
    val instance = new Instance(instanceId)
    val request = new RegisterInstancesWithLoadBalancerRequest(ELBName, Seq(instance))
    ELBClient.registerInstancesWithLoadBalancer(request)
}
// deregister from ELB
def deregisterFromLoadBalancer() = {
    val instance = new Instance(instanceId)
    val request = new DeregisterInstancesFromLoadBalancerRequest(ELBName, Seq(instance))
    ELBClient.deregisterInstancesFromLoadBalancer(request)
}

This can easily be hooked into a Play! application’s start and stop events as follows:

object Global extends GlobalSettings {
  override def onStart(app: Application) {
    // ...
    registerToLoadBalancer() // ready to receive external traffic
    // ...
  }
  override def onStop(app: Application) {
    // ...
    deregisterFromLoadBalancer() // stop receiving external traffic
    // ...
  }
}

…and restarting instances becomes much more graceful.

Launching Kifi

Over the last year we’ve been working hard to launch the first version of Kifi (Keep It Find It) so you can search like normal, and find like never before.

Both the product and the technology behind it are incredibly powerful. We’ll soon start describing some of the technical decisions we made and the way we use specific technologies.

Kifi is about helping you keep and find things. On top of that there’s a layer of social collaboration and communication between users on keeps. To achieve all that we developed a strong search engine supporting online index updates and a variety of machine learning mechanisms such as the Flower Filter. For the backend we needed a fast programming language that works nicely in a multi-threaded world, crunches numbers quickly and is quick to iterate with. We picked Scala. Along with Scala we’re using Play Framework (2.2 at the moment) and Slick (just upgraded to 2.0).

There are many other interesting technologies we’re using or developing, notably on the search, frontend, and mobile fronts. More on them in future blog posts.

Hope you’ll enjoy using Kifi as much as we enjoyed building it!

The FortyTwo Engineering team

Custom configuration in Play Framework

In our Play! applications, we often want to be able to enable or disable features for specific environments. We decided to take a simple approach to this: a config file on the filesystem. Play provides the onLoadConfig method in GlobalSettings, which you can override in your Global object to modify the config before your application starts.

Here’s an example of how we use it to customize configs for certain development and testing environments:

  // Get a file within the .fortytwo folder in the user's home directory
  def getUserFile(filename: String): File =
    new File(Seq(System.getProperty("user.home"), ".fortytwo", filename).mkString(File.separator))
  override def onLoadConfig(config: Configuration, path: File, classloader: ClassLoader, mode: Mode.Mode) = {
    val localConfig = Configuration(ConfigFactory.parseFile(getUserFile("local.conf")))
    super.onLoadConfig(config ++ localConfig, path, classloader, mode)
  }

This just adds it to the application’s config, overriding the default values. If the config file doesn’t exist, it adds an empty config. Since Play’s Configuration wraps the Typesafe Config library, we use ConfigFactory to load the config, then wrap that in a Configuration.

You can now read the new config values as usual in your application code:

  import play.api.Play.current
  def coolFeatureEnabled = current.configuration.getBoolean("coolFeatureEnabled").getOrElse(false)
  def myAction() = Action { request =>
    if (coolFeatureEnabled) {
      // do cool stuff
    } else {
      // do regular stuff
    }
  }

In addition to loading from a file, you can see how you could use the same strategy to load a configuration from Amazon S3 or another external source. You can use ConfigFactory.parseString to parse any string as a configuration.
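As a rough illustration of the parseString route, here is a minimal sketch that uses the Typesafe Config library directly. The `maxKeeps` key and the inline strings are made up for the example; in practice the override text would be whatever you downloaded from your external source. It relies on `withFallback` to get the same "override wins" behavior described above.

```scala
import com.typesafe.config.ConfigFactory

object ParseStringExample {
  // Hypothetical defaults, standing in for the application's own config.
  val defaults = ConfigFactory.parseString(
    """coolFeatureEnabled = false
      |maxKeeps = 100""".stripMargin)

  // Hypothetical override text; it could just as well have been fetched from S3.
  val overrides = ConfigFactory.parseString("coolFeatureEnabled = true")

  // Values in `overrides` win; anything missing falls back to `defaults`.
  val merged = overrides.withFallback(defaults)
}
```

Wrapping `merged` in a Play Configuration then gives you the same accessors as in the snippet above.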