LinkedIn today is announcing the release of new open-source software for machine learning called FeatureFu. It’s meant to help programmers with feature engineering, the process of deriving new, high-quality inputs from existing data so that algorithms can work efficiently.
FeatureFu can come in handy for a wide variety of purposes, including classification, clustering, and normalization, as LinkedIn senior software engineer Bing Zhao wrote in a blog post on FeatureFu.
This is the latest open-source tool that LinkedIn has made available for the rest of the world to use. Other releases include Kafka, Samza, Voldemort, and most recently Pinot.
Other companies that regularly release open-source software include Facebook and Airbnb, which open-sourced the machine learning package Aerosolve in June.
Zhao demonstrates the power of FeatureFu with an example derived from LinkedIn itself:
For example, in homepage feeds ranking for a professional social network, we often want to capture members’ preferences for different feed types (e.g. news article from an influencer, recent job change from a connection) by counting the number of historical likes and number of impressions of each feed type for the member.
The raw counts usually need to be combined into a like-per-impression ratio with smoothing before it can be used as a stable feature, with a mathematical formula like: (1+likes)/(10+impressions). Normally, the formula has to be coded into an online feature serving system and any change of the formula will need a code change/deployment, which requires significant operational overhead. With Expr and FeatureFu, we will only need to write the formula as an s-expression “(/ (+ 1 likes) (+ 10 impressions))”, and include it in the model configuration file, any further change to the formula – like additional smoothing by taking logarithm of the counts – will just need a configuration change of the s-expression itself: “(- (log2 (+ 10 impressions)) (log2 (+ 1 likes)))”, which is much more flexible and agile.
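To make the idea concrete, here is a minimal sketch of how a formula stored as an s-expression string can be evaluated at serving time. It is written in Python for brevity and is not FeatureFu's Java API or its Expr library: the function names (`tokenize`, `parse`, `evaluate`, `score`), the operator table, and the sample counts are all hypothetical, and the variable is spelled `impressions` in both formulas.

```python
import math

# Operators this toy evaluator understands (an assumed subset for illustration).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "log2": lambda a: math.log2(a),
}

def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens from the front and build a nested list of atoms."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return token

def evaluate(node, variables):
    """Evaluate a parsed expression against a dict of raw feature values."""
    if isinstance(node, list):
        op, *args = node
        return OPS[op](*(evaluate(arg, variables) for arg in args))
    if node in variables:
        return variables[node]
    return float(node)  # anything that is not a variable is a numeric literal

def score(formula, variables):
    return evaluate(parse(tokenize(formula)), variables)

# The two formulas from the quoted example, kept as plain configuration strings.
RATIO = "(/ (+ 1 likes) (+ 10 impressions))"
LOG_FORM = "(- (log2 (+ 10 impressions)) (log2 (+ 1 likes)))"

member_counts = {"likes": 3, "impressions": 22}  # made-up sample counts
ratio = score(RATIO, member_counts)        # (1 + 3) / (10 + 22)
log_form = score(LOG_FORM, member_counts)  # log2(32) - log2(4)
```

Switching from the ratio to the log form only changes the string passed to `score`; the evaluator code stays the same, which is the flexibility the quoted passage describes.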
Written completely in Java, FeatureFu will be updated over time with new feature generation capabilities, Zhao wrote.
FeatureFu is available on GitHub, and Zhao’s blog post covers it in more detail.