AM++: A Generalized Active Message Framework

Transcript

1 AM++: A Generalized Active Message Framework
Jeremiah Willcock, Torsten Hoefler, Nicholas Edmonds, and Andrew Lumsdaine

2 Large Scale Computing
- Not just for PDEs anymore
- Many new, important HPC applications are data-driven (“informatics applications”)
  - Social network analysis
  - Bioinformatics

3 Data-Driven Applications
- Different from “traditional” applications
  - Communication highly data-dependent
  - Little memory locality
  - Impractical to load balance
  - Many small messages to random nodes
- Computational ecosystem is a bad match for informatics applications
  - Hardware
  - Software
  - Programming paradigms
  - Problem solving approaches

4 Two-Sided (BSP) Breadth-First Search

    while any rank's queue is not empty:
        for i in ranks: out_queue[i] ← empty
        for vertex v in in_queue[*]:
            if color(v) is white:
                color(v) ← black
                for vertex w in neighbors(v):
                    append w to out_queue[owner(w)]
        for i in ranks: start receiving in_queue[i] from rank i
        for j in ranks: start sending out_queue[j] to rank j
        synchronize and finish communications
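The pseudocode above can be tried out in a single process. The following is a minimal Python sketch, not code from the talk: the ranks are simulated with dictionaries, the `owner` mapping (vertex id modulo the number of ranks) is an assumption, the seeding of the source vertex is added for the demonstration, and the exchange of queues stands in for the paired sends, receives, and final synchronization.

```python
from collections import defaultdict

def bsp_bfs(adjacency, num_ranks, source):
    """Level-synchronous BFS over vertices partitioned across simulated ranks."""
    owner = lambda v: v % num_ranks          # assumed distribution of vertices
    color = {v: "white" for v in adjacency}  # white = unvisited, black = visited
    level = {}
    # in_queue[r] holds the vertices rank r must examine in this superstep.
    in_queue = {r: [] for r in range(num_ranks)}
    in_queue[owner(source)].append(source)
    depth = 0
    while any(in_queue.values()):
        # out_queue[r][dest]: vertices that rank r will send to rank dest.
        out_queue = {r: defaultdict(list) for r in range(num_ranks)}
        for r in range(num_ranks):
            for v in in_queue[r]:
                if color[v] == "white":
                    color[v] = "black"
                    level[v] = depth
                    for w in adjacency[v]:
                        out_queue[r][owner(w)].append(w)
        # Simulated communication: all outgoing queues are exchanged at once,
        # standing in for the sends, receives, and the synchronize step.
        in_queue = {
            dest: [w for r in range(num_ranks) for w in out_queue[r][dest]]
            for dest in range(num_ranks)
        }
        depth += 1
    return level
```

Every rank must take part in the exchange in every superstep, even when it has nothing to send, which is the cost the two-sided formulation pays for its simplicity.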

5 Two-Sided (BSP) Breadth-First Search
[Figure: ranks 0 to 3; steps shown: get neighbors, redistribute queues, combine received queues]

6 Messaging Models
- Two-sided
  - MPI
  - Explicit sends and receives
- One-sided
  - MPI-2 one-sided, ARMCI, PGAS languages
  - Remote put and get operations
  - Limited set of atomic updates into remote memory
- Active messages
  - GASNet, DCMF, LAPI, Charm++, X10, etc.
  - Explicit sends, implicit receives
  - User-defined handler called on receiver for each message

7 Active Messages
- Created by von Eicken et al., for Split-C (1992)
- Messages sent explicitly
- Receivers register handlers but are not involved with individual messages
- Messages often asynchronous for higher throughput
[Figure: timeline between Process 1 and Process 2: send, message handler, reply, reply handler]
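The request/reply pattern on this slide can be illustrated with a small single-process simulation. This is a hedged sketch rather than any real active-message API: `Process`, `register`, `send`, and `deliver_all` are hypothetical names, and a shared `deque` plays the role of the network.

```python
from collections import deque

class Process:
    """A simulated process that owns a table of registered handlers."""
    def __init__(self, rank, network):
        self.rank = rank
        self.network = network
        self.handlers = {}
        self.log = []

    def register(self, name, fn):
        self.handlers[name] = fn

    def send(self, dest, name, payload):
        # Explicit, asynchronous send: enqueue the message and return at once.
        self.network.append((dest, self.rank, name, payload))

def deliver_all(processes, network):
    """Drain the network, running the named handler on each destination."""
    while network:
        dest, src, name, payload = network.popleft()
        processes[dest].handlers[name](processes[dest], src, payload)

def on_request(proc, src, payload):
    # Message handler: runs on the receiver and sends a reply to the sender.
    proc.send(src, "reply", payload * 2)

def on_reply(proc, src, payload):
    # Reply handler: runs back on the original sender.
    proc.log.append((src, payload))

network = deque()
procs = [Process(0, network), Process(1, network)]
for p in procs:
    p.register("request", on_request)
    p.register("reply", on_reply)

procs[0].send(1, "request", 21)   # process 1 never posts a receive
deliver_all(procs, network)
```

Process 1 contains no code that waits for this particular message; registering the handler once is its only involvement.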

8 Active Message Breadth-First Search

    handler vertex_handler(vertex v):
        if color(v) is white:
            color(v) ← black
            append v to new_queue

    while any rank's queue is not empty:
        new_queue ← empty
        begin active message epoch
        for vertex v in queue:
            for vertex w in neighbors(v):
                tell owner(w) to run vertex_handler(w)
        end active message epoch
        queue ← new_queue
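A runnable counterpart to the active-message pseudocode, again as a single-process Python sketch under stated assumptions: ranks are simulated, `owner` is assumed to be vertex id modulo the number of ranks, an epoch is modeled by draining a queue of in-flight messages, and the initial delivery of the source vertex is added for the demonstration.

```python
from collections import deque

def am_bfs(adjacency, num_ranks, source):
    """BFS in which each discovered edge becomes a message to the owner rank."""
    owner = lambda v: v % num_ranks           # assumed vertex distribution
    color = {v: "white" for v in adjacency}
    level = {}
    pending = deque()                         # in-flight (rank, vertex) messages
    depth = 0
    new_queue = {r: [] for r in range(num_ranks)}

    def vertex_handler(rank, v):
        # Runs on the rank that owns v, once per incoming message.
        if color[v] == "white":
            color[v] = "black"
            level[v] = depth
            new_queue[rank].append(v)

    def end_epoch():
        # The epoch ends only after every message sent in it has been handled.
        while pending:
            rank, v = pending.popleft()
            vertex_handler(rank, v)

    pending.append((owner(source), source))   # seed: deliver the source vertex
    end_epoch()
    queue = new_queue
    while any(queue.values()):
        depth += 1
        new_queue = {r: [] for r in range(num_ranks)}
        # Active message epoch: one message per outgoing edge.
        for r in range(num_ranks):
            for v in queue[r]:
                for w in adjacency[v]:
                    pending.append((owner(w), w))
        end_epoch()
        queue = new_queue
    return level
```

Compared with the two-sided version, the color check moves into the handler on the receiving side, so the sender needs no per-destination queues of its own.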

9 Active Message Breadth-First Search
[Figure: ranks 0 to 3; steps shown: get neighbors, send vertex messages, active message handler checks color maps, insert into queues]

10 Low-Level vs. High-Level AM Systems
- Active messaging systems (loosely) on a spectrum of features vs. performance
- Low-level systems typically have restrictions on message handler behavior, explicit buffer management, etc.
- High-level systems often provide dynamic load balancing, service discovery, authentication/security, etc.
[Figure: spectrum from Low to High: DCMF, GASNet, Java RMI, Charm++/X10]

11 The AM++ Framework
- AM++ provides a “middle ground” between low- and high-level systems
  - Gets performance from low-level systems
  - Gets programmability from high-level systems
- High-level features can be built on top of AM++
[Figure: AM++ marked on the spectrum from Low to High: DCMF, GASNet, Java RMI, Charm++/X10]

12 Key Characteristics
- For use by applications
- AM handlers can send messages
- Mix of generative (template) and object-oriented approaches
  - Object-orientation for flexibility and type erasure
  - Templates for optimal performance
- Flexible/application-specific message coalescing
- Messages sent to processes, not objects

13 Example
[Code listing not captured in the transcript; its callouts were:]
- Create message transport (not restricted to MPI)
- Coalescing layer (and underlying message type)
- Message handler
- Messages are nested to depth 0
- Epoch scope

14 AM++ Design

15 Transport
- Interface to underlying communication layer
  - MPI and GASNet currently
- Designed to send large messages produced by higher-level components
- Object-oriented techniques allow run-time flexibility (type erasure)
- MPI-style progress model
  - Progress thread optional
  - User must call into AM++

16 Message Types
- Handler registration for messages within transport
- Type-safe interface to reduce user casts and errors
- Automatic data buffer handling

17 Termination Detection/Epochs
- AM++ handlers can send messages
  - When have they all been sent and handled?
- Termination detection: a standard distributed computing problem
- Some applications send a fixed depth of nested messages
- Time divided into epochs
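To make the question "when have they all been sent and handled?" concrete, here is a single-process Python sketch of the counting invariant behind an epoch. It is an illustration only: the `Epoch` class and its methods are hypothetical, and a real distributed termination detector must also reach agreement across ranks, which this sketch does not attempt.

```python
from collections import deque

class Epoch:
    """Tracks messages so the epoch can end once all of them are handled."""
    def __init__(self):
        self.in_flight = deque()
        self.sent = 0
        self.handled = 0

    def send(self, handler, payload):
        self.sent += 1
        self.in_flight.append((handler, payload))

    def finish(self):
        # Handlers may send further messages, so `sent` can grow while we
        # drain. The epoch is over only when handled catches up with sent.
        while self.handled < self.sent:
            handler, payload = self.in_flight.popleft()
            handler(self, payload)
            self.handled += 1
        return self.handled

def countdown(epoch, n):
    # A handler that sends one nested message until n reaches zero.
    if n > 0:
        epoch.send(countdown, n - 1)
```

Sending `countdown` with `n = 3` produces a chain of four messages, and `finish` returns only after the last nested one has run.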

18 Message Coalescing
- Standard way to amortize overheads
  - Trade off latency for throughput
- Layered on transport and message type
- Can be specific to application or message type
- Handlers apply to one small message at a time
- Sends are of a single small message
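The idea of coalescing can be sketched in a few lines of Python. This is an assumed design for illustration, not the AM++ interface: `CoalescingSender`, `flush`, `flush_all`, and `deliver` are hypothetical names, and `transport_send` is any callable that accepts a destination and a batch.

```python
class CoalescingSender:
    """Buffers small messages per destination and sends them in batches."""
    def __init__(self, capacity, transport_send):
        self.capacity = capacity              # messages per batch
        self.transport_send = transport_send  # callable(dest, list_of_messages)
        self.buffers = {}

    def send(self, dest, message):
        # The caller still sends one small message at a time.
        buf = self.buffers.setdefault(dest, [])
        buf.append(message)
        if len(buf) >= self.capacity:
            self.flush(dest)

    def flush(self, dest):
        buf = self.buffers.get(dest)
        if buf:
            self.transport_send(dest, buf)
            self.buffers[dest] = []

    def flush_all(self):
        # Needed at the end of an epoch so partial buffers are not stranded.
        for dest in list(self.buffers):
            self.flush(dest)

def deliver(batch, handler):
    # Receiver side: the handler is still applied to one message at a time.
    for message in batch:
        handler(message)
```

A larger `capacity` means fewer transport-level sends but a longer wait before a buffered message leaves the sender, which is the latency-for-throughput trade the slide mentions.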

19 Message Handler Optimizations
- Coalescing uses generative programming and C++ templates for performance on high message rates
- Small-message handler type is known statically
  - Simple loop calls handler
  - Compiler can optimize using standard techniques

20 Message Reductions
- Some applications have messages that are
  - Idempotent: duplicate messages can be ignored
  - Reducible: some messages can be combined
- Detect some at sender
  - Cache
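One way to detect some duplicates at the sender is a small cache of recently sent messages. The sketch below assumes a direct-mapped cache and hashable messages; `DuplicateFilter` and `offer` are hypothetical names, and nothing here is claimed to match how AM++ implements its cache.

```python
_EMPTY = object()  # sentinel so that any hashable message, even None, works

class DuplicateFilter:
    """Sender-side direct-mapped cache that drops recently sent duplicates."""
    def __init__(self, cache_size, send):
        self.cache_size = cache_size
        self.send = send                    # callable(message)
        self.slots = [_EMPTY] * cache_size

    def offer(self, message):
        """Send the message unless the cache shows it was sent recently."""
        slot = hash(message) % self.cache_size
        if self.slots[slot] == message:
            return False                    # duplicate: safe to ignore
        # A miss may evict another message; that only costs a redundant
        # send later, which is harmless when messages are idempotent.
        self.slots[slot] = message
        self.send(message)
        return True
```

The cache catches only some duplicates, never all of them, so the receiving handler must still tolerate repeats.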

21 AM++ and Threads
- AM++ is thread-safe
- Models for thread use:
  - Run separate handlers in separate threads
  - Split a single message across several threads
- Coalescing buffer sizes affect parallelism in both models

22 Evaluation: Message Latency
[Figure: message latency plot. Caption: GASNet 1.14.0 testam section L, single-data-rate InfiniBand]

23 Evaluation: Message Bandwidth
[Figure: message bandwidth plot. Caption: GASNet 1.14.0 testam section L, single-data-rate InfiniBand]

24 Breadth-First Search: Strong Scaling
[Figure: strong-scaling plot. Caption: 2^27 vertices, degree 4, dual-socket dual-core, single-data-rate InfiniBand]

25 Breadth-First Search: Weak Scaling
[Figure: weak-scaling plot. Caption: 2^25 vertices/node, degree 4, dual-socket dual-core, single-data-rate InfiniBand]

26 Delta Stepping: Strong Scaling
[Figure: strong-scaling plot. Caption: 2^27 vertices, degree 4, dual-socket dual-core, single-data-rate InfiniBand]

27 Delta Stepping: Weak Scaling
[Figure: weak-scaling plot. Caption: 2^24 vertices/node, degree 4, dual-socket dual-core, single-data-rate InfiniBand]

28 Conclusion
- Generative programming techniques used to design a flexible active messaging framework, AM++
  - “Middle ground” between previous low-level and high-level systems
  - Features can be composed on that framework
- Performance comparable to other systems
