
PASCO 2015 Presentations9:30am Thursday June 25th, 2015, in K9509.A Compact Parallel Implementation of F4Roman PearceA parallel implementation for polynomial multiplication modulo a primeMarshall LawOptimizing and Parallelizing the Modular GCD AlgorithmMichael Monagan
Abstracts A Compact Parallel Implementation of F4 We present a compact parallel implementation of the F4 algorithm for computing Groebner bases. Our C library is just 30KB and computes mod a 31 bit prime. We use Maple code to perform Chinese remaindering and rational reconstruction to compute over Q. We show how to easily parallelize the sparse linear algebra using Cilk, and we discuss our other experiments in parallelization. A parallel implementation for polynomial multiplication modulo a prime. We present a parallel implementation in Cilk C of a modular algorithm for multiplying two polynomials in Z_q[x] for integer q>1, for multicore computers. Our algorithm uses Chinese remaindering. It multiplies modulo primes p_1,p_2,... in parallel and uses a parallel FFT for each prime. Our software multiplies two polynomials of degree 10^9 modulo a 32 bit integer q in 83 seconds on a 20 core computer. Optimizing and Parallelizing the Modular GCD Algorithm Our goal is to design and implement a high performance modular GCD algorithm for polynomial GCD computation in Z_p[x1,x2,...,xn] for multicore computers which will be used to compute the GCD of polynomials over Z. For n=2 we have designed and implemented in C a highly optimized serial code for primes p<2^63. For n>2 we parallelized in Cilk C Brown's dense modular GCD algorithm using our serial bivariate code at the base. For n=3, we obtain good parallel speedup on multicore computers with 16 and 20 cores. We also compare our code with the GCD codes in Maple and Magma. 