Ben Yossef GoodBadUgly

Transcript

1 CELF ELC Europe 2009 On Threads, Processes and Co-Processes Gilad Ben-Yossef Codefidence Ltd. (C) 2008,2009 Codefidence Ltd.

2 About Me
- Chief Coffee Drinker of Codefidence Ltd.
- Co-author of Building Embedded Linux Systems, 2nd edition
- Israeli FOSS NPO Hamakor co-founder
- August Penguin co-founder
- git blame .* -- FOSS
  - Linux
  - Asterisk
  - cfgsh (RIP) ...

3 Linux Process and Threads
[Diagram: two processes, 123 and 124, one with four threads and one with a single thread. Each thread has its own stack, state, signal mask and priority. Each process has its own memory, file descriptors and signal handlers, shared by all of its threads.]

4 The Question
Should we use threads?
Given an application that calls for several tasks, should we implement each task as a thread in the same process or as a separate process?

5 What Do The Old Wise Men Say?
- "If you think you need threads then your processes are too fat." (Rob Pike, co-author of The Practice of Programming and The Unix Programming Environment)
- "A Computer is a state machine. Threads are for people who can't program state machines." (Linux kernel programmer Alan Cox)

6 Spot the Difference (1)
- The Linux scheduler always operates at a thread granularity level.
- You can start a new process without loading a new program, just like with a thread.
  - fork() vs. exec()
- Processes can communicate with each other just like threads.
  - Shared memory, mutexes, semaphores, message queues etc. work between processes too.
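The fork()-without-exec() point can be illustrated with a minimal sketch: the child runs a function from the same program image and hands a result back through memory that stays shared across fork(). The names child_task and run_task_in_child are hypothetical, and an anonymous MAP_SHARED mapping is just one of several ways to share memory between processes.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical task body: runs in the child, inside the same program image. */
static void child_task(int *shared_counter)
{
    *shared_counter += 41;
}

/*
 * Start child_task() in a new process using fork() and no exec().
 * Returns the value the child left in shared memory, or -1 on failure.
 */
int run_task_in_child(void)
{
    /* A MAP_SHARED | MAP_ANONYMOUS mapping remains shared after fork(). */
    int *counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    *counter = 1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(counter, sizeof(*counter));
        return -1;
    }
    if (pid == 0) {
        /* Child: no new program is loaded, we simply call a function. */
        child_task(counter);
        _exit(0);
    }

    /* Parent: wait for the child, then read what it wrote. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = *counter;

    munmap(counter, sizeof(*counter));
    return result;
}
```

Calling run_task_in_child() from the parent yields the counter as modified by the child, which shows that the two processes communicate much as two threads would.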

7 Spot the Difference (2)
- Process creation time is roughly double that of a thread.
  - ... but Linux process creation time is still very small.
  - ... but most embedded systems pre-create all tasks in advance anyway.
- Each process has its own virtual memory address space.
- Threads (of the same process) share the virtual address space.

8 Physical and Virtual Memory
[Diagram: the CPU reaches the physical address space (RAM 0, RAM 1, Flash, I/O memory 1 to 3) through the MMU (Memory Management Unit), which maps the virtual address spaces of Process 0, Process 1 and Process 2, each starting at 0x00000000, onto it.]
All the processes have their own virtual address space, and run as if they had access to the whole address space.
This slide is © 2004 – 2009 Free Electrons.

9 Page Tables
[Diagram: the CPU issues virtual addresses to the MMU, which puts physical addresses on the memory bus.]
Virtual page address | Physical page frame address | Address Space Number (ASN) * | Permission
0x8000 | 0x5340 | 12 | RWXC (Read, Write, Execute, Cached)
0x8000 | 0x3390 | 15 | RX (Read Only, Execute, Non Cached)
* Does not exist on all architectures.

10 Translation Look-aside Buffers
- The MMU caches the content of page tables in a CPU local cache called the TLB.
- TLBs can be managed in hardware automatically (x86, SPARC) or by software (MIPS, PowerPC).
- Making changes to the page tables might require a TLB cache flush, if the architecture does not support TLB ASN (Alpha, Intel Nehalem, AMD SVM).
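To make the role of the ASN tag concrete, here is a toy model of a TLB lookup. It is only a sketch: the structure layout, the TLB size and the function names are assumptions made for illustration and do not describe any real MMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy TLB entry; the field layout is illustrative, not any real MMU's. */
struct tlb_entry {
    bool valid;
    uint32_t asn;     /* address space number tag */
    uint32_t vpage;   /* virtual page address */
    uint32_t pframe;  /* physical page frame address */
};

#define TLB_SIZE 4

struct tlb {
    struct tlb_entry e[TLB_SIZE];
};

/* Look up (asn, vpage). On a hit, store the frame and return true. */
bool tlb_lookup(const struct tlb *t, uint32_t asn, uint32_t vpage,
                uint32_t *pframe)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        const struct tlb_entry *e = &t->e[i];
        if (e->valid && e->asn == asn && e->vpage == vpage) {
            *pframe = e->pframe;
            return true;
        }
    }
    return false;
}

/* Without ASN tags, an address space switch has to drop every entry. */
void tlb_flush(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = false;
}
```

With ASN tags, two entries for the same virtual page (say 0x8000 under ASN 12 and under ASN 15) can coexist and resolve to different frames, so switching between the two address spaces needs no flush. Without the tag, tlb_flush() would have to run on every switch and each task would start with a cold TLB.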

11 Cache Indexes
- Data and instruction caches may use either the virtual or the physical address as the key to the cache.
- Physically indexed caches don't care about different address spaces.
- Virtually indexed caches cannot keep more than one alias to the same physical address.
  - Or need to carefully manage aliasing using tagging.

12 LMBench
- LMBench is a suite of simple, portable benchmarks by Larry McVoy and Carl Staelin.
- lat_ctx measures context switching time.
- The original lat_ctx supported only measurement of inter-process context switches.
- I have extended it to also measure inter-thread context switches.
- All bugs are mine, not Larry's and Carl's :-)

13 How lat_ctx works (1)
[Diagram: Task 1, Task 2 and Task 3 arranged in a ring, each holding an array of data.]
1. Perform calculation on a variable size array.
2. Pass token on a Unix pipe to next task.
Can set task size and number of tasks.

14 How lat_ctx works (2)
- Both the data and the instruction cache get polluted by some amount before the token is passed on.
  - The data cache gets polluted by approximately the process "size".
  - The instruction cache gets polluted by a constant amount, approximately 2.7 thousand instructions.
- The benchmark measures only the context switch time, not including the overhead of doing the work.
  - A warm up run with hot caches is used as a reference.
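The token ring described above can be sketched as follows. This is not the lat_ctx source: it only shows the ring of tasks connected by pipes, leaves out all timing and cache-pollution logic, and uses a simple increment as a stand-in for the calculation on the array. The names relay, ring_pass_token and MAX_TASKS are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TASKS 64

/* Child task: receive the token, do some "work", pass it on. */
static void relay(int in_fd, int out_fd, int rounds)
{
    long token;
    for (int r = 0; r < rounds; r++) {
        if (read(in_fd, &token, sizeof(token)) != (ssize_t)sizeof(token))
            _exit(1);
        token++;   /* stand-in for the calculation on the array */
        if (write(out_fd, &token, sizeof(token)) != (ssize_t)sizeof(token))
            _exit(1);
    }
    _exit(0);
}

/*
 * Pass a token around a ring of ntasks processes (the caller is task 0)
 * for the given number of rounds. Every task increments the token once
 * per round. Returns the final token value, or -1 on error.
 */
long ring_pass_token(int ntasks, int rounds)
{
    int pipes[MAX_TASKS][2];   /* pipe i carries the token from task i */
    pid_t pids[MAX_TASKS];
    long token = 0;
    long result = -1;
    int created = 0, forked = 1;

    if (ntasks < 1 || ntasks > MAX_TASKS || rounds < 0)
        return -1;

    for (; created < ntasks; created++)
        if (pipe(pipes[created]) < 0)
            goto out;

    for (; forked < ntasks; forked++) {
        pids[forked] = fork();
        if (pids[forked] < 0)
            goto out;
        if (pids[forked] == 0)
            relay(pipes[forked - 1][0], pipes[forked][1], rounds);
    }

    for (int r = 0; r < rounds; r++) {
        token++;
        if (write(pipes[0][1], &token, sizeof(token)) != (ssize_t)sizeof(token))
            goto out;
        if (read(pipes[ntasks - 1][0], &token, sizeof(token)) != (ssize_t)sizeof(token))
            goto out;
    }
    result = token;

out:
    /* On failure, stop any children still blocked on a pipe. */
    for (int i = 1; i < forked; i++) {
        if (result < 0)
            kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
    for (int i = 0; i < created; i++) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return result;
}
```

Each hop forces a context switch, because the next task cannot run until the token arrives on its pipe. A benchmark built on this shape would time the loop and subtract the cost of the work itself.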

15 lat_ctx Accuracy
- The numbers produced by this benchmark are somewhat inaccurate.
  - They vary by about 10 to 15% from run to run.
- The possible reasons for the inaccuracies are detailed in the LMBench documentation.
  - They aren't really sure either.

16 Context Switch Costs
- Using the modified lat_ctx we can measure the difference between the context switch times of threads and processes.
- Two systems used:
  - Intel x86 Core2 Duo 2 GHz
    - Dual core
    - Hardware TLB
  - Freescale PowerPC MPC8568 MDS
    - Single core
    - Software TLB

17 X86 Core2 Duo 2 GHz, 0k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. Data labels on the chart: 1.83, 1.81, 1.81, 1.76, 1.74, 1.69.]

18 X86 Core2 Duo 2 GHz, 16k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. All values shown lie between 2.19 and 2.24 usec.]

19 X86 Core2 Duo 2 GHz, 128k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. Data labels on the chart: 16.61, 15.85, 3.73, 3.28, 1.99, 1.92.]

20 PPC MPC8568 MDS, 0k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. Data labels on the chart: 3.44, 3.26, 3.24, 3.05, 2.61, 2.36.]

21 PPC MPC8568 MDS, 16k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. Data labels on the chart: 17.87, 17.77, 14.25, 13.7, 13.35, 13.18.]

22 PPC MPC8568 MDS, 128k data
[Bar chart: context switch time in usec, threads vs. processes, for 20, 10 and 5 threads/processes. Data labels on the chart: 228.23, 227.68, 227.63, 222.41, 178.06, 167.88.]

23 Conclusions
- Context switch times differ between threads and processes.
- It is not a priori obvious that threads are better.
- The difference is quite small.
- The results vary between architectures and platforms.
- So why do we really use threads?

24 Why Do People Use Threads?
It's the API, Silly.
The POSIX thread API offers a simple mental model.

25 API Complexity: Task Creation
fork()
- Zero parameters.
- Set everything yourself after process creation.
- New process begins with a virtual copy of parent at same location.
- Copy on write semantics require shared memory setup.
pthread_create()
- Can specify most attributes for new thread during create.
- Can specify function for new thread to start with.
- Easy to grasp address space sharing.
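The contrast between the two creation calls can be sketched side by side. In this minimal example a thread and a forked child each write to the same ordinary global variable; the function names and the 256 KiB stack size are arbitrary choices made for illustration.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value;   /* ordinary global, not placed in shared memory */

static void *thread_body(void *arg)
{
    *(int *)arg = 2;      /* same address space: the creator sees this */
    return NULL;
}

/* Thread: attributes and start routine are supplied at creation time. */
int value_after_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    value = 1;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setstacksize(&attr, 256 * 1024);
    int err = pthread_create(&tid, &attr, thread_body, &value);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return -1;
    pthread_join(tid, NULL);
    return value;
}

/* Process: fork() takes no parameters; the child continues from the same
 * location with a copy-on-write copy of the parent's memory. */
int value_after_fork(void)
{
    value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;        /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return value;
}
```

After the thread runs, the creator observes the new value. After the forked child runs, the parent still sees the old one, which is why processes need explicit shared memory setup to exchange data.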

26 API Complexity: Shared Memory
- Threads share all the memory.
- You can easily set up a segment of shared memory for processes using shm_open().
  - Share only what you need!
- BUT... each process may map the shared segment at a different virtual address.
  - Pointers to shared memory cannot be shared!
  - A simple linked list becomes complex.
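One common workaround for the pointer problem is to store offsets from the start of the segment instead of raw pointers. The sketch below shows a linked list built that way. All names are hypothetical, the allocator is a trivial bump allocator, and there is no locking, so a real multi-process user would also need something like a process-shared mutex.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset 0 holds the list header, so it can double as "no node". */
#define SHM_NULL 0u

struct shm_node {
    int value;
    uint32_t next;     /* byte offset of the next node from the segment base */
};

struct shm_list {
    uint32_t head;     /* offset of the first node, or SHM_NULL */
    uint32_t used;     /* bump allocator: bytes handed out so far */
    uint32_t size;     /* total segment size in bytes */
};

/* Turn an offset into a pointer that is valid in this process only. */
static struct shm_node *node_at(void *base, uint32_t off)
{
    return off == SHM_NULL ? NULL : (struct shm_node *)((char *)base + off);
}

/* Place an empty list header at the start of the segment. */
int shm_list_init(void *base, uint32_t size)
{
    struct shm_list *l = base;
    if (size < sizeof(*l))
        return -1;
    l->head = SHM_NULL;
    l->used = sizeof(*l);
    l->size = size;
    return 0;
}

/* Allocate a node inside the segment and push it at the head. */
int shm_list_push(void *base, int value)
{
    struct shm_list *l = base;
    if (l->size - l->used < sizeof(struct shm_node))
        return -1;
    uint32_t off = l->used;
    l->used += sizeof(struct shm_node);

    struct shm_node *n = node_at(base, off);
    n->value = value;
    n->next = l->head;
    l->head = off;
    return 0;
}

/* Walk the list using whatever base address this process mapped. */
long shm_list_sum(void *base)
{
    const struct shm_list *l = base;
    long sum = 0;
    for (struct shm_node *n = node_at(base, l->head); n != NULL;
         n = node_at(base, n->next))
        sum += n->value;
    return sum;
}
```

Because every link is relative to the base, the list reads the same wherever the segment is mapped. The cost is that every traversal step goes through node_at(), which is the extra complexity the slide refers to.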

27 API Complexity: File Handles
- In Unix, everything is a file.
- Threads share file descriptors.
- Processes do not.
  - Although a Unix domain socket can be used to pass file descriptors between processes.
- System V semaphore undo values are not shared either, unlike with threads.
- Can make life more complicated.
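Passing a descriptor over a Unix domain socket uses an SCM_RIGHTS control message with sendmsg() and recvmsg(). The helper names send_fd and recv_fd below are hypothetical, and the sketch transfers exactly one descriptor per message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected Unix domain socket. */
int send_fd(int sock, int fd)
{
    char byte = 0;   /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file as the sender's. Typical use is a socketpair() created before fork(), with one end kept by each process.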

28 Processes API Advantages
Processes
- PID visible in the system.
- Can set process name via program_invocation_name.
- Easy to identify a process in the system.
Threads
- No way to give a task a unique name.
- Kernel thread id not related to internal thread handle.
- Difficult to ID a thread in the system.
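As an illustration of naming a process so that it is easy to spot, here is a minimal sketch. Note the swap: it uses the Linux-specific prctl(PR_SET_NAME) call rather than program_invocation_name mentioned above, and it assumes a single-threaded process. The wrapper names are hypothetical.

```c
#include <string.h>
#include <sys/prctl.h>

/* The kernel keeps at most 16 bytes for the name, including the NUL. */
#define TASK_NAME_MAX 16

/* Set the name shown for the calling task, e.g. in /proc/self/comm. */
int set_task_name(const char *name)
{
    if (strlen(name) >= TASK_NAME_MAX)
        return -1;   /* refuse rather than let the kernel truncate */
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/* Read the name back; buf must hold at least TASK_NAME_MAX bytes. */
int get_task_name(char buf[TASK_NAME_MAX])
{
    return prctl(PR_GET_NAME, (unsigned long)buf, 0, 0, 0);
}
```

A task created with fork() could call set_task_name() as its first step, after which the name and the PID together identify it in the system.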

29 The CoProc Library
- CoProc is a proof of concept library that provides an API implementing share-as-you-need semantics for tasks.
- Wrapper around Linux clone() and friends.
- Co-processes offer a golden path between traditional threads and processes.
- Check out the code at: http://github.com/gby/coproc
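To show what share-as-you-need can mean at the clone() level, here is a minimal sketch. It is not the CoProc implementation: it only demonstrates that clone() flags pick what the new task shares, here the file descriptor table (CLONE_FILES) but not memory (no CLONE_VM). All names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TASK_STACK_SIZE (64 * 1024)

struct task_arg { int fd_to_close; };
struct share_result { int fd_close_seen; int memory_write_seen; };

static int private_value;

/* Task body: changes a global and closes a descriptor. */
static int task_body(void *arg)
{
    const struct task_arg *a = arg;
    private_value = 2;
    close(a->fd_to_close);
    return 0;
}

/*
 * Run task_body() in a new task created with clone(), optionally sharing
 * the file descriptor table, and report which side effects the creator
 * can observe afterwards. Returns 0 on success, -1 on error.
 */
int run_task(int share_files, struct share_result *res)
{
    int p[2];
    int rc = -1;
    int flags = SIGCHLD | (share_files ? CLONE_FILES : 0);

    if (pipe(p) < 0)
        return -1;
    char *stack = malloc(TASK_STACK_SIZE);
    if (stack == NULL)
        goto out;

    struct task_arg arg = { .fd_to_close = p[1] };
    private_value = 1;

    /* The stack grows downwards on common platforms: pass its top. */
    pid_t pid = clone(task_body, stack + TASK_STACK_SIZE, flags, &arg);
    if (pid >= 0 && waitpid(pid, NULL, 0) == pid) {
        /* F_GETFD fails with EBADF if the task's close() affected us. */
        res->fd_close_seen = (fcntl(p[1], F_GETFD) == -1);
        res->memory_write_seen = (private_value == 2);
        if (res->fd_close_seen)
            p[1] = -1;
        rc = 0;
    }
    free(stack);
out:
    close(p[0]);
    if (p[1] >= 0)
        close(p[1]);
    return rc;
}
```

With share_files set, the creator sees the descriptor closed by the task while its own memory stays untouched. With share_files clear, the task behaves like an ordinary forked child and neither change is visible.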

30 CoProc Highlights
- A managed shared memory segment, guaranteed to be mapped at the same virtual address.
- Coproc PID and name visible to system.
- Decide to share file descriptors or not at coproc creation time.
- Set attributes at coproc creation time.
  - Detached/joinable, address space size, core size, max CPU time, stack size, scheduling policy, priority, and CPU affinity supported.

31 CoProc API Overview
    int coproc_init(size_t shm_max_size);
    pid_t coproc_create(char *coproc_name, struct coproc_attributes *attrib, int flags, int (*start_routine)(void *), void *arg);
    int coproc_exit(void);
    void *coproc_alloc(size_t size);
    void coproc_free(void *ptr);
    int coproc_join(pid_t pid, int *status);

32 CoProc Attributes
    struct coproc_attributes {
        rlim_t address_space_size;    /* The maximum size of the process address space in bytes */
        rlim_t core_file_size;        /* Maximum size of core file */
        rlim_t cpu_time;              /* CPU time limit in seconds */
        rlim_t stack_size;            /* The maximum size of the process stack, in bytes */
        int scheduling_policy;        /* Scheduling policy. */
        int scheduling_param;         /* Scheduling priority or nice level */
        cpu_set_t cpu_affinity_mask;  /* The CPU mask of the co-proc */
    };
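Attributes of this kind map naturally onto setrlimit() and sched_setaffinity(). The sketch below is an assumption about how such limits could be applied inside a freshly created task, not a description of what CoProc does. It covers only two attributes, and struct task_limits and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical subset of per-task attributes, for illustration only. */
struct task_limits {
    rlim_t core_file_size;   /* RLIMIT_CORE soft limit, in bytes */
    int pin_to_one_cpu;      /* non-zero: restrict to one allowed CPU */
};

/* Change one soft limit, leaving the hard limit untouched. */
static int set_soft_limit(int resource, rlim_t value)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) < 0)
        return -1;
    rl.rlim_cur = value;
    return setrlimit(resource, &rl);
}

/* Apply the limits to the calling process. */
int apply_task_limits(const struct task_limits *lim)
{
    if (set_soft_limit(RLIMIT_CORE, lim->core_file_size) < 0)
        return -1;
    if (lim->pin_to_one_cpu) {
        cpu_set_t allowed, one;
        if (sched_getaffinity(0, sizeof(allowed), &allowed) < 0)
            return -1;
        CPU_ZERO(&one);
        /* Keep only the first CPU we are currently allowed to use. */
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (CPU_ISSET(cpu, &allowed)) {
                CPU_SET(cpu, &one);
                break;
            }
        }
        if (sched_setaffinity(0, sizeof(one), &one) < 0)
            return -1;
    }
    return 0;
}

/*
 * Fork a child, apply the limits there, and return the number of CPUs
 * in the child's affinity mask (capped at 254), or -1 on any error.
 */
int cpus_seen_by_limited_child(const struct task_limits *lim)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        cpu_set_t now;
        struct rlimit rl;
        if (apply_task_limits(lim) < 0 ||
            getrlimit(RLIMIT_CORE, &rl) < 0 ||
            rl.rlim_cur != lim->core_file_size ||
            sched_getaffinity(0, sizeof(now), &now) < 0)
            _exit(255);
        int n = CPU_COUNT(&now);
        _exit(n > 254 ? 254 : n);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

Applying the limits in the child right after creation keeps the parent's own limits and affinity unchanged, which matches the idea of setting attributes per task at creation time.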

33 CoProc Simple Usage Example
    int pid, ret;
    char *test_mem;
    struct coproc_attributes test_coproc_attr = { ... };

    coproc_init(1024 * 1024);

    test_mem = coproc_alloc(1024);
    if (!test_mem) abort();

    pid = coproc_create("test_coproc", &test_coproc_attr,
                        COPROC_SHARE_FS, test_coproc_func, test_mem);
    if (pid < 0) abort();

    coproc_join(pid, &ret);
    coproc_free(test_mem);

34 Credits
The author wishes to acknowledge the contribution of the following people:
- Larry McVoy and Carl Staelin, for LMBench.
- Joel Issacason, for an eye opening paper about a different approach to the same issue.
- Rusty Russell, for libantithreads, yet another approach to the same issue.
- Free Electrons, for the Virt/Phy slide.
- Sergio Leone, Clint Eastwood, Lee Van Cleef, and Eli Wallach, for the movie :-)

35 Thank You For Listening!
Questions?
- Codefidence Ltd.: http://codefidence.com
- Community site: http://tuxology.net
- Email: [email protected]
- Phone: +972-52-8260388
- Twitter: @giladby
- SIP: [email protected]
- Skype: gilad_codefidence
